Performance fundamentals
Objectives
Learn Python-specific performance aspects
Instructor note
10 min teaching/type-along
Understanding the Python interpreter
Performance bottlenecks in Python
Have you ever written Python scripts that look something like this?
```python
def read_xyz_from_text_file():
    f = open("mydata.dat", "r")
    for line in f.readlines():
        fields = line.split(",")
        x, y, z = fields[0], fields[1], fields[2]
        # some analysis with x, y and z
    f.close()
```
Compared to C/C++/Fortran, this for-loop will probably be orders of magnitude slower!
This happens because, during execution, CPython mostly interprets instructions: the source code is first compiled to bytecode, and the interpreter's evaluation loop then executes that bytecode one instruction at a time, with only a limited level of optimization.
Important
While doing so, the interpreter
- evaluates the result expression by expression,
- packs and unpacks every intermediate result as an instance of `object` (Python) / `PyObject` (CPython API) behind the scenes.
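You can inspect these instructions yourself with the standard library's `dis` module (a minimal sketch; the function and its body are just an example of ours):

```python
import dis

def rms(x, y, z):
    return ((x**2 + y**2 + z**2) / 3) ** 0.5

# Print the bytecode the evaluation loop executes; every arithmetic
# instruction pops and pushes boxed Python objects on a value stack.
dis.dis(rms)
```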
Type-Along
Try out the following code in Python Tutor
```python
import io

def rms_from_text_file(f):
    """Compute root-mean-square of comma-separated values."""
    rms = 0
    n_samples = 0
    for line in f.readlines():
        fields = line.split(",")
        x, y, z = float(fields[0]), float(fields[1]), float(fields[2])
        # compute root-mean-square value
        rms += ((x**2 + y**2 + z**2) / 3) ** 0.5
        n_samples += 1
    return rms / n_samples

fake_file = io.StringIO("""\
0.27194615,0.85939776,0.76905204
0.51586611,0.59174447,0.06501842
0.23109192,0.8260391,0.08045166
""")

avg_rms = rms_from_text_file(fake_file)
```
Be aware that Python Tutor shows a simplified view of the execution, since it does not step into individual expressions. Still, it gives you an idea of the intermediate objects being returned and of how the interpreter steps through the code.
Discussion
In the previous episode, we described I/O-, memory- and CPU-bound bottlenecks. For the above use case and algorithm (not necessarily this exact code), what kind of performance issues arise
- when the file becomes long, with several million lines?
- when the file is stored on a network filesystem which is slow to respond?
- when, instead of 3 fields `x, y, z`, you have to read 10 million fields from every line of the text?
Solution
We can only guess at this point, but we can expect the above code to be
- CPU bound: the `for` loop becomes a hotspot, and vanilla CPython without a JIT does not optimize it.
- I/O bound: if more time is spent waiting for the output of the `f.readlines()` method.
- Memory bound: if a line of data does not fit in memory, the code needs to handle it in batches. The program would have to be rewritten with a nested for-loop whose batch size depends on the available memory.
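As a small illustration of the memory point: `f.readlines()` materializes the whole file as a list of strings, whereas iterating over the file object streams one line at a time. A minimal sketch of a streaming variant of the function above (same assumptions about the data format):

```python
def rms_streaming(f):
    """Like rms_from_text_file, but holds only one line in memory."""
    rms = 0.0
    n_samples = 0
    for line in f:  # iterate lazily instead of calling f.readlines()
        x, y, z = (float(v) for v in line.split(","))
        rms += ((x**2 + y**2 + z**2) / 3) ** 0.5
        n_samples += 1
    return rms / n_samples
```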
We have to keep in mind that performance depends a lot on the
- input data
- algorithm
Using a better container for the input data, or a better algorithm with lower computational complexity, can often outperform purely technical solutions.
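As an example of a better container (a hedged sketch; the `.npy` file name is ours): converting the text file to a binary format such as numpy's `.npy` removes the text-parsing cost from every subsequent load:

```python
import numpy as np

arr = np.loadtxt("mydata.dat", delimiter=",")  # parse the text once
np.save("mydata.npy", arr)                     # store in binary form

arr = np.load("mydata.npy")  # later loads skip text parsing entirely
```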
Structured approach towards optimization
The first priority is to look for a more efficient
- data container, data structure, database, etc.
- algorithm
If the above are not an option, then we move on to performance optimization (a small sketch follows the list).
- First, we evaluate the overall performance by benchmarking.
- Then, we measure the performance at either the function/method level or the line level by profiling.
- Finally, we generate optimized code.
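For the first two steps, the standard library already provides `timeit` for benchmarking and `cProfile` for function-level profiling. A minimal sketch, reusing `rms_from_text_file` from above on synthetic data of our own invention:

```python
import cProfile
import io
import timeit

data = "0.27,0.85,0.76\n" * 100_000  # synthetic input in the same format

# Benchmarking: total wall-clock time over 10 runs
t = timeit.timeit(lambda: rms_from_text_file(io.StringIO(data)), number=10)
print(f"10 runs took {t:.3f} s")

# Profiling: per-function breakdown of where the time goes
cProfile.run("rms_from_text_file(io.StringIO(data))")
```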
Any Python code can be replaced with optimized, compiled instructions. This is done by ahead-of-time (AOT) or just-in-time (JIT) compilation. The question which remains to be answered is: at which level? One can optimize
- whole programs (Nuitka, Shed Skin)
- the interpreter, compiling the slowest loops (PyPy)
- user-defined functions / methods (numba, transonic)
- calls to compiled functions (numpy / Python)
We will take a look at some of these approaches in the coming episodes.
Keypoints
- Develop a strategy on how to optimize.
- Go shopping:
  - Look for better ways of reading data or better algorithms.
  - Look for tools and libraries to help you alleviate the performance bottlenecks.