Profile

Objectives

  • Learn how to profile Python code using cProfile

  • Learn how to visualise cProfile results using SnakeViz

  • Examine the most expensive function call via line_profiler

Instructor note

  • 15 min teaching/type-along

Using cProfile to investigate performance

While %timeit provides good benchmarking information for single lines or single functions, larger codebases have more complex function hierarchies that require more sophisticated tools to traverse properly. Python ships with two built-in profiling tools that implement the same interface: cProfile and profile. Both help to identify performance bottlenecks in the code.

In this lesson, we will use cProfile due to its smaller overhead (profile, on the other hand, is more extensible). The standard syntax to call it is:

$ python -m cProfile [-o <outputFile>] <python_module>

By default, cProfile writes the results to stdout; the optional -o flag redirects the output to a file instead. A report can then be generated from that file using the pstats module.
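The same machinery is also available from within Python, which is handy for quick experiments. A minimal sketch (slow_function is a hypothetical stand-in, not part of the lesson material):

import cProfile

def slow_function():
    # Hypothetical workload: sum of squares
    return sum(i * i for i in range(5_000_000))

# Profile one call and print a report sorted by internal time;
# pass a filename as the second argument to dump raw stats to disk instead
cProfile.run("slow_function()", sort="tottime")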

Type-Along

Let’s profile the wordcount script and write the results to a file.

Warning

Use the shell variant. The profiling output from Jupyter, although it seems to work, is hard to decipher.

The %run magic supports profiling out-of-the-box using the -p flag. The script can be run as:

In [1]: %run -p -D wordcount.prof source/wordcount.py data/concat.txt processed_data/concat.dat
 
*** Profile stats marshalled to file 'wordcount.prof'.

We can call cProfile as:

$ python -m cProfile -o wordcount.prof source/wordcount.py data/concat.txt processed_data/concat.dat

We can then generate a report using the pstats module:

$ python -m pstats wordcount.prof
# Welcome to the profile statistics browser.
# wordcount.prof% sort tottime
# wordcount.prof% stats
# Wed Sep 25 11:52:27 2024    wordcount.prof

#          53473208 function calls in 8.410 seconds

#    Ordered by: internal time

#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#   1233410    4.151    0.000    7.204    0.000 source/wordcount.py:41(update_word_counts)
#  32068660    1.799    0.000    1.799    0.000 {method 'replace' of 'str' objects}
#   7747363    0.570    0.000    0.570    0.000 {method 'lower' of 'str' objects}
#   7747363    0.428    0.000    0.428    0.000 {method 'strip' of 'str' objects}
#   1530212    0.271    0.000    0.271    0.000 source/wordcount.py:23(<genexpr>)
#   1233411    0.256    0.000    0.256    0.000 {method 'split' of 'str' objects}
#         1    0.184    0.184    7.388    7.388 source/wordcount.py:59(calculate_word_counts)
#    382553    0.133    0.000    0.404    0.000 {method 'join' of 'str' objects}
#         1    0.126    0.126    0.580    0.580 source/wordcount.py:16(save_word_counts)
# ...
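In the listing, tottime is the time spent inside the function itself, while cumtime also includes all the functions it calls. The interactive commands above map one-to-one onto the pstats API, which is convenient when post-processing reports in scripts; a minimal sketch, assuming wordcount.prof is in the current directory:

import pstats

stats = pstats.Stats("wordcount.prof")
stats.strip_dirs()             # shorten file paths in the report
stats.sort_stats("tottime")    # same as 'sort tottime' in the browser
stats.print_stats(10)          # show only the ten most expensive entries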

Discussion

Profiling introduces a non-negligible overhead on the code being executed. Thus, the absolute values for the time spent in each function should be taken with a grain of salt; the real objective is understanding the relative amount of time spent in each function call.

Using SnakeViz to visualise performance reports

SnakeViz is a browser-based visualiser of performance reports generated by cProfile. It is already included among the dependencies installed in this virtual/Conda environment.

Type-Along

SnakeViz has an IPython magic to profile code and open the visualisation in a browser directly. To use it, we just need to load the relevant extension and run it:

In [4]: %load_ext snakeviz
In [5]: %snakeviz wordcount.word_count("data/concat.txt", "processed_data/concat.dat", 1)

Warning

This will run only if the source IPython instance has access to a local web browser. This also means that, e.g., if you are on Windows and following the tutorial in WSL, this will not work.

We can run SnakeViz as:

$ snakeviz wordcount.prof --server

The output will contain a clickable link to the visualisation.

SnakeViz output

Based on the output, we can clearly see that the update_word_counts() function is where most of the runtime of the script is spent.

Using line_profiler to inspect the expensive function

Once the main performance-intensive function is identified, we can further examine it to find bottlenecks. This can be done using the line_profiler tool, which returns a line-by-line breakdown of where time is spent.
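Besides the IPython magic and the kernprof script shown below, line_profiler also exposes a small Python API. A hedged sketch, assuming the wordcount module is importable (e.g. with source/ on sys.path):

from line_profiler import LineProfiler

import wordcount

lp = LineProfiler()
# Register the function whose lines we want timed
lp.add_function(wordcount.update_word_counts)
# Run the workload under the profiler, then print the line-by-line breakdown
lp.runcall(wordcount.word_count,
           "data/concat.txt", "processed_data/concat.dat", 1)
lp.print_stats()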

Type-Along

Let’s profile the update_word_counts function line by line.

The line_profiler package provides a magic to be used in IPython. First, the magic needs to be loaded:

In [1]: %load_ext line_profiler

The profiling run is launched with the %lprun magic, whose syntax is very close to that of the %run magic introduced above. Notice that we have to state explicitly, via the -f flag, which functions we want stepped through line by line:

In [5]: %lprun -f wordcount.update_word_counts wordcount.word_count("data/concat.txt", "processed_data/concat.dat", 1)

To use line_profiler from the command line, the functions to be profiled need to be explicitly marked. For this reason, we need to import the relevant decorator at the beginning of the wordcount.py script:

from line_profiler import profile

Note

The tool will also work without the import: when the script is run through kernprof, a profile decorator is injected into the builtins. The explicit import, however, keeps the script runnable on its own.

We can now mark the update_word_counts function with the @profile decorator:

@profile
def update_word_counts(line, counts):

Profiling is performed with a script called kernprof. It is used as follows:

$ kernprof -lvr source/wordcount.py data/concat.txt processed_data/concat.dat

The -l flag ensures that the function is profiled line by line, the -v flag prints the results to stdout, and the -r flag enables rich (colourised) output.

Wrote profile results to wordcount.py.lprof
Timer unit: 1e-06 s

Total time: 12.2802 s
File: source/wordcount.py
Function: update_word_counts at line 40

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    40                                           @profile
    41                                           def update_word_counts(line, counts):
    42                                               """
    43                                               Given a string, parse the string and update a dictionary of word
    44                                               counts (mapping words to counts of their frequencies). DELIMITERS are
    45                                               removed before the string is parsed. The function is case-insensitive
    46                                               and words in the dictionary are in lower-case.
    47                                               """
    48  33302070    2574252.9      0.1     21.0      for purge in DELIMITERS:
    49  32068660    4405499.9      0.1     35.9          line = line.replace(purge, " ")
    50   1233410     392268.8      0.3      3.2      words = line.split()
    51   8980773     819407.4      0.1      6.7      for word in words:
    52   7747363    1457841.0      0.2     11.9          word = word.lower().strip()
    53   7747363    1355462.5      0.2     11.0          if word in counts:
    54   7364810    1211000.7      0.2      9.9              counts[word] += 1
    55                                                   else:
    56    382553      64505.7      0.2      0.5              counts[word] = 1

Based on the output, we can conclude that most of the time is spent replacing delimiters.
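One way to act on this finding is to collapse the per-delimiter replace() loop into a single pass with str.translate. A hedged sketch follows, with a stand-in delimiter set rather than the actual DELIMITERS from wordcount.py:

# Stand-in for the DELIMITERS defined in wordcount.py
DELIMITERS = list(".,;:?!()[]{}")

# Build the translation table once, outside the per-line function
TABLE = str.maketrans({d: " " for d in DELIMITERS})

def update_word_counts(line, counts):
    # One pass over the line instead of one replace() per delimiter
    for word in line.translate(TABLE).split():
        word = word.lower().strip()
        counts[word] = counts.get(word, 0) + 1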

Keypoints

  • The cProfile module can provide information on how costly each function call is.

  • Profile reports can be inspected using the pstats module in tabular form or with SnakeViz for a graphical visualisation.

  • The line_profiler tool can be used to inspect line-by-line performance overhead.