Performance fundamentals

Objectives

  • Learn about Python-specific performance aspects

Instructor note

  • 10 min teaching/type-along

Understanding the Python interpreter

Performance bottlenecks in Python

Have you ever written Python scripts that look something like this?

def read_xyz_from_text_file():
    f = open("mydata.dat", "r")
    for line in f.readlines():
        fields = line.split(",")
        x, y, z = fields[0], fields[1], fields[2]
        # some analysis with x, y and z
    f.close()

Compared to equivalent code in C/C++/Fortran, this for-loop will probably be orders of magnitude slower!

This happens because, during execution, CPython mostly interprets instructions one at a time, with only a limited level of optimization. Here is a simplified schematic of the execution pipeline:

Source code in .py files
  → Tokenizer
  → Abstract Syntax Tree
  → Byte-code (cached in __pycache__)
  ⇄ Machine code in the Python Virtual Machine
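You can observe the first stages of this pipeline yourself with the standard-library ast and dis modules; a minimal sketch:

import ast
import dis

source = "x = 2 + 3"

# Source code -> Abstract Syntax Tree
tree = ast.parse(source)
print(ast.dump(tree))

# Abstract Syntax Tree -> byte-code
code = compile(tree, "<demo>", "exec")

# Inspect the byte-code instructions the virtual machine will execute
dis.dis(code)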

Important

While doing so, the interpreter

  • evaluates results expression by expression, and

  • packs and unpacks every intermediate result as an instance of object (Python) / PyObject (CPython API) behind the scenes.
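A minimal sketch of what this boxing means in practice (the exact sizes vary with the CPython version and platform):

import sys

x = 2.0
y = 3.0
r = x + y  # the result of the expression is boxed into a new PyObject

print(isinstance(r, object))  # True: every value is a full object
print(sys.getsizeof(r))       # e.g. 24 bytes for a float on 64-bit CPython
print(sys.getsizeof(1))       # e.g. 28 bytes even for a small int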

Type-Along

Try out the following code in Python Tutor

import io

def rms_from_text_file(f):
    """Compute root-mean-square of comma-separated values."""
    rms = 0
    n_samples = 0
    
    for line in f.readlines():
        fields = line.split(",")
        x, y, z = float(fields[0]), float(fields[1]), float(fields[2])
        # compute root-mean-square value
        rms += ((x**2 + y**2 + z**2) / 3) ** 0.5
        n_samples += 1
        
    return rms / n_samples


fake_file = io.StringIO("""\
0.27194615,0.85939776,0.76905204
0.51586611,0.59174447,0.06501842
0.23109192,0.8260391,0.08045166
""")

avg_rms = rms_from_text_file(fake_file)

Be aware that this is a simplified view of the execution, since it does not go down to the expression level. However, it gives you an idea of the intermediate objects being created and of how the interpreter steps through the code.

Discussion

In the previous episode, we described I/O-, memory- and CPU-bound bottlenecks. For the above use case and algorithm (not necessarily this exact code), what kind of performance issue arises,

  1. when the file becomes long, with several million lines?

  2. when the file is stored on a network filesystem that is slow to respond?

  3. when instead of 3 fields, x, y, z, you have to read 10 million fields for every line of the text?

Structured approach towards optimization

The first priority is to look for a more efficient:

  1. Data container, data structure, database, etc.

  2. Algorithm

If neither of these is an option, then we move on to performance optimization.

  1. First we evaluate the overall performance by benchmarking.

  2. Then we measure the performance at either the function/method level or the line level by profiling.

  3. Finally we generate optimized code.
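The first two steps can be carried out with the standard library alone. Below is a minimal sketch, assuming the rms_from_text_file function from the Type-Along above is defined in the same session; it benchmarks the whole function with timeit and then profiles it with cProfile on synthetic data:

import cProfile
import io
import random
import timeit

# Synthetic comma-separated data with 10,000 lines, kept in memory
lines = "\n".join(
    ",".join(str(random.random()) for _ in range(3)) for _ in range(10_000)
)

# 1. Benchmarking: total wall-clock time over 10 runs
elapsed = timeit.timeit(
    "rms_from_text_file(io.StringIO(lines))",
    globals=globals(),
    number=10,
)
print(f"10 runs took {elapsed:.3f} s")

# 2. Profiling: time spent per function call inside one run
cProfile.run("rms_from_text_file(io.StringIO(lines))")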

Python code can be replaced with optimized instructions through ahead-of-time (AOT) or just-in-time (JIT) compilation. The question that remains to be answered is at which level. One can optimize:

  • whole programs (Nuitka, Shed Skin)

  • the interpreter, which compiles the slowest loops just in time (PyPy)

  • modules (Cython, Pythran)

  • user-defined functions / methods (Numba, Transonic)

  • expressions (NumExpr)

  • calls to compiled functions (NumPy / Python)
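As a first taste of the last item, here is a sketch of the same root-mean-square computation expressed with NumPy, so that the loop over samples runs in compiled code rather than in the interpreter (assuming the same comma-separated layout as in the Type-Along):

import io

import numpy as np

def rms_from_text_file_numpy(f):
    """Compute the mean root-mean-square using compiled NumPy loops."""
    # Parse all rows at once into an (n_samples, 3) float array
    data = np.loadtxt(f, delimiter=",")
    # Row-wise RMS, then the average over all samples; loops run in C
    return np.sqrt((data ** 2).mean(axis=1)).mean()

fake_file = io.StringIO(
    "0.27194615,0.85939776,0.76905204\n"
    "0.51586611,0.59174447,0.06501842\n"
    "0.23109192,0.8260391,0.08045166\n"
)
avg_rms = rms_from_text_file_numpy(fake_file)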

We will take a look at some of these approaches in the coming episodes.

Keypoints

  • Develop a strategy on how to optimize.

  • Go shopping.

    • Look for better ways of reading data or better algorithms.

    • Look for tools and libraries to help you alleviate the performance bottlenecks.