
Profiling Python Code

Profiling is a way to determine how time is spent in a program. With these statistics, we can discover the "hot spots" of a program and think about ways of improvement. Sometimes, a hot spot in an unexpected location may hint at a bug in the program as well.

In this tutorial, we will see how we can use the profiling facility in Python. Specifically, you will see:

  • How we can compare small code fragments using the timeit module
  • How we can profile the entire program using the cProfile module
  • How we can invoke a profiler inside an existing program
  • What the profiler cannot do

Let's get started.

Profiling Python Code. Photo by Prashant Saini. Some rights reserved.

Tutorial Overview

This tutorial is in four parts; they are:

  • Profiling small fragments
  • The profile module
  • Using profiler inside code
  • Caveats

Profiling small fragments

When you are asked about the different ways of doing the same thing in Python, one perspective is to check which one is more efficient. In Python's standard library, we have the timeit module that allows us to do some simple profiling.

For example, to concatenate many short strings, we can use the join() function from strings or use the + operator. So how do we know which is faster? Consider the following Python code:
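The original code listing was not preserved here; a minimal sketch consistent with the description (using the variable name longstr mentioned below) would be:

```python
# build one long string by repeatedly appending with the + operator
longstr = ""
for x in range(1000):
    longstr += str(x)
print(longstr[:20])  # prints 01234567891011121314
```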

This will produce a long string 012345.... in the variable longstr. An alternative way to write this is:
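Again as a sketch, the join() version of the same construction would be:

```python
# build the same long string with join() over a list comprehension
longstr = "".join([str(x) for x in range(1000)])
print(longstr[:20])  # prints 01234567891011121314
```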

To compare the two, we can do the following on the command line:
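The exact commands were not preserved; they would be along these lines, passing each statement to timeit as a separate argument:

```shell
python -m timeit 'longstr=""' 'for x in range(1000): longstr += str(x)'
python -m timeit '"".join([str(x) for x in range(1000)])'
```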

These two commands will produce the following output:

The above commands load the timeit module and pass a single line of code to it for measurement. In the first case, we have two lines of statements, and they are passed to the timeit module as two separate arguments. By the same rationale, the first command can also be presented as three lines of statements (by breaking the for-loop into two lines), but the indentation of each line needs to be quoted correctly:
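For example (note the quoted leading spaces on the third argument, which timeit joins into an indented loop body):

```shell
python -m timeit 'longstr=""' 'for x in range(1000):' '    longstr += str(x)'
```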

The output of timeit is the best performance among multiple runs (default to be 5). Each run executes the provided statements multiple times (the count is determined dynamically). The time is reported as the average to execute the statements once in the best run.

While it is true that the join() function is faster than the + operator for string concatenation, the timing above is not a fair comparison. It is because we use str(x) to make short strings on the fly during the loop. The better way is the following:
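A fairer comparison moves the str(x) calls into setup code, roughly as follows:

```shell
python -m timeit -s 'strings = [str(x) for x in range(1000)]' 'longstr=""' 'for x in strings: longstr += x'
python -m timeit -s 'strings = [str(x) for x in range(1000)]' '"".join(strings)'
```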

which produces:

The -s option allows us to provide the "setup" code, which is executed before the profiling and not timed. In the above, we create the list of short strings before we start the loop. Hence the time to create those strings is not measured in the "per loop" timing. From the above, we see that the join() function is two orders of magnitude faster than the + operator. The more common use of the -s option is to import libraries. For example, we can compare the square root function from Python's math module, from numpy, and using the exponential operator ** as follows:
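The comparison might look like the following (numpy is assumed to be installed):

```shell
python -m timeit '3.1416 ** 0.5'
python -m timeit -s 'import math' 'math.sqrt(3.1416)'
python -m timeit -s 'import numpy' 'numpy.sqrt(3.1416)'
```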

The above produces the following measurement, in which we see that math.sqrt() is fastest while numpy.sqrt() is slowest in this particular example:

If you wonder why numpy is slowest, it is because numpy is optimized for arrays. You will see its exceptional speed in the following alternative:
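The array version would look roughly like this (again assuming numpy is available):

```shell
python -m timeit -s 'import numpy' -s 'x = numpy.random.random(1000)' 'numpy.sqrt(x)'
```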

where the result is:

If you prefer, you can also run timeit in Python code. For example, the following will be similar to the above, but give you the raw total timing for each run:
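Using timeit.repeat() from within Python, a sketch of such a measurement:

```python
import timeit

# each of the 5 runs executes the statement 10000 times;
# repeat() returns the raw total time of each run in seconds
measurements = timeit.repeat('"".join([str(x) for x in range(1000)])', number=10000)
print(measurements)
print("Best: %.1f usec per loop" % (min(measurements) / 10000 * 1e6))
```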

In the above, each run executes the statement 10000 times; the result is as follows, in which you can see a result of roughly 98 usec per loop in the best run:

The profile module

Focusing on a statement or two for performance is a microscopic view. Chances are, we have a long program and want to see what is causing it to run slow. That happens before we can consider alternative statements or algorithms.

A program running slow can generally be due to two reasons: a part is running slow, or a part is running too many times and that adds up to take too much time. We call these "performance hogs" the hot spots. Let's look at an example. Consider the following program that uses a hill-climbing algorithm to find hyperparameters for a perceptron model:
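The original listing was not preserved. As a stand-in, below is a simplified, self-contained sketch of such a program: the function names objective() and hillclimbing() match those discussed later, but the objective here is a synthetic function rather than the cross-validated perceptron of the original.

```python
# hillclimb.py -- simplified sketch of a hill-climbing hyperparameter search.
# The original program evaluated a scikit-learn Perceptron inside objective();
# a synthetic objective stands in here to keep the sketch self-contained.
import random

def objective(cfg):
    # stand-in for the cross-validated accuracy of a model configured
    # with learning rate eta and penalty alpha; peaks near (0.3, 0.7)
    eta, alpha = cfg
    return -((eta - 0.3) ** 2 + (alpha - 0.7) ** 2)

def step(cfg, step_size):
    # take a random step from the current configuration
    return [c + random.gauss(0, step_size) for c in cfg]

def hillclimbing(objective, n_iterations, step_size):
    # start from a random configuration and keep any step that improves it
    solution = [random.random(), random.random()]
    solution_eval = objective(solution)
    for _ in range(n_iterations):
        candidate = step(solution, step_size)
        candidate_eval = objective(candidate)
        if candidate_eval >= solution_eval:
            solution, solution_eval = candidate, candidate_eval
    return solution, solution_eval

if __name__ == "__main__":
    best, score = hillclimbing(objective, 100, 0.1)
    print("Best: %s, score: %.5f" % (best, score))
```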

Assume we saved this program in the file hillclimb.py; we can run the profiler on the command line as follows:
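The command would be the following (the first line only creates a stand-in hillclimb.py so this snippet runs on its own; use your real program instead):

```shell
# stand-in for hillclimb.py so the snippet is self-contained
printf 'print(sum(range(1000)))\n' > hillclimb.py
# run the whole script under the cProfile profiler
python -m cProfile hillclimb.py
```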

and the output would be the following:

The normal output of the program will be printed first, and then the profiler's statistics will be printed. From the first row, we see that the function objective() in our program has run 101 times, taking a total of 4.89 seconds. But these 4.89 seconds are mostly spent in the functions it called; the total time spent in the function itself is merely 0.001 second. The functions from dependent modules are also profiled, hence you see a lot of numpy functions above too.

The above output is long and may not be useful to you, as it can be difficult to tell which function is the hot spot. Indeed we can sort the above output. For example, to see which function is called the most number of times, we can sort by ncalls:
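That is, adding the -s sort option (again with a stand-in hillclimb.py so the snippet is self-contained):

```shell
printf 'print(sum(range(1000)))\n' > hillclimb.py   # stand-in for the program above
python -m cProfile -s ncalls hillclimb.py
```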

Its output is as follows; it says the get() function from a Python dict is the most used function (but it only consumed 0.03 seconds in total out of the 5.6 seconds it took to finish the program):

The other sort options are as follows:

Sort string   Meaning
calls         Call count
cumulative    Cumulative time
cumtime       Cumulative time
file          File name
filename      File name
module        File name
ncalls        Call count
pcalls        Primitive call count
line          Line number
name          Function name
nfl           Name/file/line
stdname       Standard name
time          Internal time
tottime       Internal time

If the program takes some time to finish, it would not be reasonable to run it many times just to see the profiling result in a different sort order. Indeed, we can save the profiler's statistics for further processing, as follows:
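The -o option writes the statistics to a file; the filename hillclimb.stats here is an assumption (the first line again only creates a stand-in hillclimb.py):

```shell
printf 'print(sum(range(1000)))\n' > hillclimb.py   # stand-in for the program above
python -m cProfile -o hillclimb.stats hillclimb.py
```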

Similar to above, this will run the program. But instead of printing the statistics to the screen, it saves them into a file. Afterwards, we can use the pstats module like the following to open the statistics file, which gives us a prompt to manipulate the data:
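Assuming the statistics were saved as hillclimb.stats, the browser is started with (the first two lines only recreate a stand-in statistics file so the snippet runs on its own):

```shell
printf 'print(sum(range(1000)))\n' > hillclimb.py   # stand-in for the program above
python -m cProfile -o hillclimb.stats hillclimb.py
python -m pstats hillclimb.stats
```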

For example, we can use the sort command to change the sort order and use stats to print what we saw above:
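The same sorting and filtering can also be done non-interactively with the pstats API; a self-contained sketch, with a hypothetical work() function standing in for the profiled program:

```python
import cProfile
import pstats

def work():
    # hypothetical workload to profile
    return sum(i * i for i in range(10000))

# collect statistics only around the call and save them to a file
pr = cProfile.Profile()
pr.enable()
work()
pr.disable()
pr.dump_stats("work.stats")

# load the saved statistics, sort by call count, and filter with a
# regular expression so only matching function names are printed
p = pstats.Stats("work.stats")
p.sort_stats("ncalls").print_stats("work")
```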

You will notice that the stats command above allows us to provide an extra argument. The argument can be a regular expression to search for functions, such that only those matched will be printed. Hence it is a way to provide a search string to filter.

This pstats browser allows us to see more than just the table above. The callers and callees commands show us which function calls which function, how many times it is called, and how much time it spent. Hence we can consider that a breakdown of the function-level statistics. It is useful if you have a lot of functions that call one another and want to know how the time is spent in different scenarios. For example, this shows that the objective() function is called only by the hillclimbing() function, but the hillclimbing() function calls several other functions:
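Outside the interactive browser, the equivalent information is available through the print_callers() and print_callees() methods of pstats; a sketch with hypothetical inner() and outer() functions:

```python
import cProfile
import pstats

def inner():
    return sum(range(1000))

def outer():
    # calls inner() ten times
    return [inner() for _ in range(10)]

pr = cProfile.Profile()
pr.enable()
outer()
pr.disable()
pr.dump_stats("callgraph.stats")

p = pstats.Stats("callgraph.stats")
p.print_callers("inner")   # shows that inner() is called only by outer()
p.print_callees("outer")   # shows which functions outer() calls
```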

Using profiler inside code

The above examples assume you have the complete program saved in a file and profile the entire program. Sometimes, we focus on only a part of the program. For example, if we load a large module, it takes time to bootstrap, and we want to ignore this in profiling. In this case, we can invoke the profiler only for certain lines. An example is as follows, modified from the program above:
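A sketch of that pattern, using a cProfile.Profile() object with enable() and disable() around just the lines of interest; the objective here is a synthetic stand-in for the perceptron evaluation of the original program:

```python
import cProfile as profile
import random

def objective(x):
    # synthetic stand-in for an expensive objective function
    return sum((x - i / 1000.0) ** 2 for i in range(1000))

def hillclimbing(objective, n_iterations, step_size):
    solution = random.random()
    solution_eval = objective(solution)
    for _ in range(n_iterations):
        candidate = solution + random.uniform(-step_size, step_size)
        candidate_eval = objective(candidate)
        if candidate_eval <= solution_eval:
            solution, solution_eval = candidate, candidate_eval
    return solution, solution_eval

# any slow bootstrap (imports, data loading) would happen here, unprofiled

pr = profile.Profile()
pr.enable()                     # profile only the lines below
best, score = hillclimbing(objective, 100, 0.1)
pr.disable()                    # stop profiling before reporting

print("Best: %.5f, score: %.5f" % (best, score))
pr.print_stats(sort="cumtime")
```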

It will output the following:


Caveats

Using the profiler with TensorFlow models may not produce what you would expect, especially if you have written your own custom layer or custom function for the model. If you did it correctly, TensorFlow is supposed to build the computation graph before your model is executed, and hence the logic will be changed. The profiler output will therefore not show your custom classes.

Similarly for some advanced modules that involve binary code: the profiler can see that you called some functions and will mark them as "built-in" methods, but it cannot go any further into the compiled code.

Below is a short code of the LeNet5 model for the MNIST classification problem. If you try to profile it and print the top 15 rows, you will see that a wrapper occupies the majority of the time, and nothing can be shown beyond it:

In the result below, TFE_Py_Execute is marked as a "built-in" method, and it consumes 30.1 sec out of the total run time of 39.6 sec. Note that the tottime is the same as the cumtime, meaning that from the profiler's perspective, it seems all the time is spent in this function and it does not call any other functions. This illustrates the limitation of Python's profiler.

Finally, Python's profiler gives you only the statistics on time but not memory usage. You may need to look for another library or tool for this purpose.

Further Reading

The standard library modules timeit, cProfile, and pstats have their documentation in Python's documentation:

The standard library's profiler is very powerful, but not the only one. If you want something more visual, you can try out the Python Call Graph module. It can produce a picture of how functions call one another using the GraphViz tool:

The limitation of not being able to dig into the compiled code can be overcome by not using Python's profiler but instead one for compiled programs. My favorite is Valgrind:

but to use it, you may need to recompile your Python interpreter to turn on debugging support.


Summary

In this tutorial, we learned what a profiler is and what it can do. Specifically,

  • We know how to compare small code with the timeit module
  • We saw Python's cProfile module can provide detailed statistics on how time is spent
  • We learned to use the pstats module against the output of cProfile to sort or filter


