Performance Profiling in Python

Performance Profiling in Python

Ever been in the position of needing to troubleshoot poorly performing Python code, or out of curiosity wanted to see what is taking the longest to execute?

This is where you would want to go down the path of profiling your Python code, there are several tools available to do this, including some built right into the Python standard library. The standard library includes the cProfile and profile modules to expose profiling utilities to users, this post will focus on cProfile.

What is Profiling

Profiling refers to collecting statistics about the execution of a program, specifically how frequently a section of code executes for, and for how long it executes.

There are two different types of profiling, deterministic, and statistical.

Deterministic Profiling

A deterministic profiler monitors all function calls, returns, and exception events. This results in a highly accurate execution profile with precise measurements between each event. This does add some overhead to code that is being profiled. In the context of a Python application however, the interpreted nature of the language already adds significant overhead that makes the additional processing overhead of a deterministic profiler negligible.

Statistical Profiling

Statistical profilers take random samples of the programs instruction pointer and then infers runtime statistics based on this. As this method is not monitoring every event within a section of code it has significantly less overhead than deterministic profiling with the tradeoff of less precision.

Capturing a Profile

cProfile provides several ways of capturing a profile. The simplest of them being invoking a .py file with the cProfile module to get a profile of an entire scripts execution, or calling

Invoking a .py file can be done by executing the file from a terminal with the following command (replacing with the relevant filename).

python -m cProfile

Capturing a profile of the fib() function in the above file could also be achieved by importing the cProfile module and passing the function call as a string to

An additional way to capture and output profiles at runtime is to implement a decorator that will capture the execution profile of a decorated function and output the results. This is very useful for cases where you may want to gather runtime statistics on an ad-hoc basis for individual functions.

Interpreting The Results

Executing the script with the cProfile module results in the following output.

Here we can see that calling fib(35) results in 29,860,705 function calls, with 3 primitive calls. A primitive call is one that was not invoked by recursion. This tells us that the current implementation of fib resulted in 29,860,702 recursive function calls to calculate the first 35 Fibonacci numbers. As we can see that this far from efficient with significant room for improvement (outside the scope of this post).

The next line is indicating which column in the output the results are ordered by, the default is standard name which is the far right column in the results.

The actual results of the profile are represented in a tabular format with a line for each function that was executed. Each column represents a different statistic about the execution of the function.

Number of calls that were made to the function, recursive calls are represented with two numbers separated by a /.  E.g. 29860703/1 where the first number is the number of recursive calls made to the function, and the second the number of primitive calls.

This is the total time spent in the function, excluding any time that was spent in sub-functions executed by the current function.

The per call value, it is the result of tottime/ncalls and indicates the average time for each of the function calls made.
Note: this value is rounded to 3 decimal places so may display as 0.000.

This is the cumulative time spent in the function, including any calls to sub-functions, and recursive calls to the current function.

Perhaps somewhat confusing, there is a second percall column. This one is the result of cumtime/primitive calls. From this stat, we can see the actual per call time of the function inclusive of any recursive calls or sub-function calls were made within it.

The filename, line number, and function that the profile results line relates to.

What to do with all this?

With the basic knowledge of how to capture a profile, and interpret the results, we can approach performance problems in a Python application pragmatically way, identifying areas that are potential candidates for the root cause of a performance issue.

There are also other third-party utilities available for profiling Python code, a personal favourite is PyFlame from the Uber engineering team. Such tools fall outside the scope of this post, however, they often will provide more advanced insights into the execution of a Python application.

Show Comments