Profiling is a technique to figure out how time is spent in a program. With these statistics, we can find the "hot spots" of a program and consider ways of improvement. Sometimes, a hot spot in an unexpected location may hint at a bug in the program as well.
In this tutorial, we will see how we can use the profiling facility in Python. Specifically, you will see:

- How we can compare small code fragments using the `timeit` module
- How we can profile the entire program using the `cProfile` module
- How we can invoke a profiler inside an existing program
- What the profiler cannot do

Let's get started.

Profiling Python Code. Photo by Prashant Saini. Some rights reserved.
Tutorial Overview
This tutorial is in four parts; they are:

- Profiling small fragments
- The profile module
- Using a profiler inside code
- Caveats
Profiling small fragments
When you are asked about different ways of doing the same thing in Python, one perspective is to check which one is more efficient. In Python's standard library, we have the `timeit` module that allows us to do some simple profiling.

For example, to concatenate many short strings, we can use the `join()` function from strings or use the `+` operator. So how do we know which is faster? Consider the following Python code:
```python
longstr = ""
for x in range(1000):
    longstr += str(x)
```
This will produce a long string `012345....` in the variable `longstr`. An alternative way to write this is:
```python
longstr = "".join([str(x) for x in range(1000)])
```
To compare the two, we can do the following on the command line:
```
python -m timeit 'longstr=""' 'for x in range(1000): longstr += str(x)'
python -m timeit '"".join([str(x) for x in range(1000)])'
```
These two commands will produce the following output:
```
1000 loops, best of 5: 265 usec per loop
2000 loops, best of 5: 160 usec per loop
```
The above commands load the `timeit` module and pass a single line of code to it for measurement. In the first case, we have two lines of statements, and they are passed to the `timeit` module as two separate arguments. By the same rationale, the first command can also be presented as three lines of statements (by breaking the for-loop into two lines), but the indentation of each line needs to be quoted correctly:
```
python -m timeit 'longstr=""' 'for x in range(1000):' ' longstr += str(x)'
```
The output of `timeit` is to find the best performance among multiple runs (default to be 5). Each run is to run the provided statements multiple times (which is dynamically determined). The time is reported as the average to execute the statements once in the best run.
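If you want to control the measurement yourself, the number of executions per run and the number of runs can be fixed with the `-n` and `-r` options, respectively. For example:

```
python -m timeit -n 10000 -r 5 '"".join([str(x) for x in range(1000)])'
```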
While it is true that the `join()` function is faster than the `+` operator for string concatenation, the timing above is not a fair comparison. It is because we use `str(x)` to make short strings on the fly during the loop. A better way to do it is the following:
```
python -m timeit -s 'strings = [str(x) for x in range(1000)]' 'longstr=""' 'for x in strings:' ' longstr += str(x)'
python -m timeit -s 'strings = [str(x) for x in range(1000)]' '"".join(strings)'
```
which produces:
```
2000 loops, best of 5: 173 usec per loop
50000 loops, best of 5: 6.91 usec per loop
```
The `-s` option allows us to provide the "setup" code, which is executed before the profiling and not timed. In the above, we create the list of short strings before we start the loop. Hence the time to create those strings is not measured in the "per loop" timing. From the above, we see that the `join()` function is two orders of magnitude faster than the `+` operator. The more common use of the `-s` option is to import libraries. For example, we can compare the square root function from Python's math module, from numpy, and using the exponentiation operator `**` as follows:
```
python -m timeit '[x**0.5 for x in range(1000)]'
python -m timeit -s 'from math import sqrt' '[sqrt(x) for x in range(1000)]'
python -m timeit -s 'from numpy import sqrt' '[sqrt(x) for x in range(1000)]'
```
The above produces the following measurements, in which we see that `math.sqrt()` is fastest while `numpy.sqrt()` is slowest in this particular example:
```
5000 loops, best of 5: 93.2 usec per loop
5000 loops, best of 5: 72.3 usec per loop
200 loops, best of 5: 974 usec per loop
```
If you wonder why numpy is slowest, it is because numpy is optimized for arrays. You will see its exceptional speed in the following alternative:
```
python -m timeit -s 'import numpy as np; x=np.array(range(1000))' 'np.sqrt(x)'
```
where the result is:
```
100000 loops, best of 5: 2.08 usec per loop
```
If you prefer, you can also run `timeit` in Python code. For example, the following will be similar to the above but give you the raw total timing for each run:
```python
import timeit
measurements = timeit.repeat('[x**0.5 for x in range(1000)]', number=10000)
print(measurements)
```
In the above, each run is to execute the statement 10000 times; the result is as follows. You can see the result of roughly 98 usec per loop in the best run:
```
[1.0888952040000106, 0.9799715450000122, 1.0921516899999801, 1.0946189250000202, 1.2792069260000005]
```
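The per-loop figure that the command-line tool reports is simply the best (smallest) total divided by the number of executions per run. A minimal sketch of computing it ourselves:

```python
import timeit

# total time of each run, where each run executes the statement 10000 times
measurements = timeit.repeat('[x**0.5 for x in range(1000)]', number=10000)
# the best run divided by the executions per run gives the per-loop time
print('%.1f usec per loop' % (min(measurements) / 10000 * 1e6))
```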
The profile module
Focusing on a statement or two for performance is taking a microscopic perspective. Chances are, we have a long program and want to see what is causing it to run slow. That happens before we can consider alternative statements or algorithms.

A program running slow can generally be due to two reasons: A part is running slow, or a part is running too many times, adding up to take too much time. We call these "performance hogs" the hot spot. Let's look at an example. Consider the following program that uses a hill-climbing algorithm to find hyperparameters for a perceptron model:
```python
# manually search perceptron hyperparameters for binary classification
from numpy import mean
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron

# objective function
def objective(X, y, cfg):
    # unpack config
    eta, alpha = cfg
    # define model
    model = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)
    # define evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    # calculate mean accuracy
    result = mean(scores)
    return result

# take a step in the search space
def step(cfg, step_size):
    # unpack the configuration
    eta, alpha = cfg
    # step eta
    new_eta = eta + randn() * step_size
    # check the bounds of eta
    if new_eta <= 0.0:
        new_eta = 1e-8
    if new_eta > 1.0:
        new_eta = 1.0
    # step alpha
    new_alpha = alpha + randn() * step_size
    # check the bounds of alpha
    if new_alpha < 0.0:
        new_alpha = 0.0
    # return the new configuration
    return [new_eta, new_alpha]

# hill climbing local search algorithm
def hillclimbing(X, y, objective, n_iter, step_size):
    # starting point for the search
    solution = [rand(), rand()]
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution, step_size)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))
    return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define the total iterations
n_iter = 100
# step size in the search space
step_size = 0.1
# perform the hill climbing search
cfg, score = hillclimbing(X, y, objective, n_iter, step_size)
print('Done!')
print('cfg=%s: Mean Accuracy: %f' % (cfg, score))
```
Assume we saved this program in the file `hillclimb.py`; we can run the profiler on the command line as follows:
```
python -m cProfile hillclimb.py
```
and the output will be the following:
```
>10, cfg=[0.3792455490265847, 0.21589566352848377] 0.78400
>17, cfg=[0.49105438202347707, 0.1342150084854657] 0.79833
>26, cfg=[0.5737524712834843, 0.016749795596210315] 0.80033
>47, cfg=[0.5067828976025809, 0.05280380038497864] 0.80133
>48, cfg=[0.5427345321546029, 0.0049895870979695875] 0.81167
Done!
cfg=[0.5427345321546029, 0.0049895870979695875]: Mean Accuracy: 0.811667
         2686451 function calls (2638255 primitive calls) in 5.500 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      101    0.001    0.000    4.892    0.048 hillclimb.py:11(objective)
        1    0.000    0.000    5.501    5.501 hillclimb.py:2(<module>)
      100    0.000    0.000    0.001    0.000 hillclimb.py:25(step)
        1    0.001    0.001    4.894    4.894 hillclimb.py:44(hillclimbing)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(<module>)
      303    0.000    0.000    0.008    0.000 <__array_function__ internals>:2(all)
      303    0.000    0.000    0.005    0.000 <__array_function__ internals>:2(amin)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(any)
        4    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(atleast_1d)
     3333    0.003    0.000    0.018    0.000 <__array_function__ internals>:2(bincount)
      103    0.000    0.000    0.001    0.000 <__array_function__ internals>:2(concatenate)
        3    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(copyto)
      606    0.001    0.000    0.010    0.000 <__array_function__ internals>:2(cumsum)
        6    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(dot)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(empty_like)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(inv)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(linspace)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(lstsq)
      101    0.000    0.000    0.005    0.000 <__array_function__ internals>:2(mean)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(ndim)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(outer)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(polyfit)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(polyval)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(prod)
      303    0.000    0.000    0.002    0.000 <__array_function__ internals>:2(ravel)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(result_type)
      303    0.001    0.000    0.001    0.000 <__array_function__ internals>:2(shape)
      303    0.000    0.000    0.035    0.000 <__array_function__ internals>:2(sort)
        4    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(trim_zeros)
     1617    0.002    0.000    0.112    0.000 <__array_function__ internals>:2(unique)
...
```
The normal output of the program will be printed first, and then the profiler's statistics will be printed. From the first row, we see that the function `objective()` in our program has run 101 times, taking a total of 4.89 seconds. But these 4.89 seconds are mostly spent in the functions it called; the total time spent in the function itself is merely 0.001 second. The functions from dependent modules are also profiled. Hence you see a lot of numpy functions above too.
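To make the distinction between the tottime and cumtime columns concrete, below is a minimal self-contained sketch; the functions `inner()` and `outer()` are made up for illustration:

```python
import cProfile

def inner():
    # burns CPU in its own frame, so its time counts toward its own tottime
    total = 0
    for i in range(100000):
        total += i * i
    return total

def outer():
    # spends almost no time itself but calls inner() repeatedly,
    # so it shows a tiny tottime but a large cumtime
    return [inner() for _ in range(20)]

# profile the call and print the statistics table
cProfile.run('outer()')
```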
The above output is long and may not be useful to you, as it can be difficult to tell which function is the hot spot. Indeed, we can sort the above output. For example, to see which function is called the greatest number of times, we can sort by `ncalls`:
```
python -m cProfile -s ncalls hillclimb.py
```
Its output is as follows, which says the `get()` function from a Python dict is the most used function (but it only consumed 0.03 seconds in total out of the 5.6 seconds that the program took to finish):
```
         2685349 function calls (2637153 primitive calls) in 5.609 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   247588    0.029    0.000    0.029    0.000 {method 'get' of 'dict' objects}
   246196    0.028    0.000    0.028    0.000 inspect.py:2548(name)
   168057    0.018    0.000    0.018    0.000 {method 'append' of 'list' objects}
   161738    0.018    0.000    0.018    0.000 inspect.py:2560(kind)
   144431    0.021    0.000    0.029    0.000 {built-in method builtins.isinstance}
   142213    0.030    0.000    0.031    0.000 {built-in method builtins.getattr}
...
```
The other sort options are as follows:
Sort string | Meaning
---|---
calls | Call count
cumulative | Cumulative time
cumtime | Cumulative time
file | File name
filename | File name
module | File name
ncalls | Call count
pcalls | Primitive call count
line | Line number
name | Function name
nfl | Name/file/line
stdname | Standard name
time | Internal time
tottime | Internal time
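For example, to rank the functions by their cumulative time instead:

```
python -m cProfile -s cumtime hillclimb.py
```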
If the program takes some time to finish, it is not reasonable to run the program many times just to find the profiling result in a different sort order. Indeed, we can save the profiler's statistics for further processing, as follows:
```
python -m cProfile -o hillclimb.stats hillclimb.py
```
Similar to the above, this will run the program. But it will not print the statistics to the screen; rather, it saves them into a file. Afterward, we can use the `pstats` module like the following to open up the statistics file and get a prompt to manipulate the data:
```
python -m pstats hillclimb.stats
```
For example, we can use the sort command to change the sort order and use stats to print what we saw above:
```
Welcome to the profile statistics browser.
hillclimb.stat% help

Documented commands (type help <topic>):
========================================
EOF  add  callees  callers  help  quit  read  reverse  sort  stats  strip

hillclimb.stat% sort ncall
hillclimb.stat% stats hillclimb
Thu Jan 13 16:44:10 2022    hillclimb.stat

         2686227 function calls (2638031 primitive calls) in 5.582 seconds

   Ordered by: call count
   List reduced from 3456 to 4 due to restriction <'hillclimb'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      101    0.001    0.000    4.951    0.049 hillclimb.py:11(objective)
      100    0.000    0.000    0.001    0.000 hillclimb.py:25(step)
        1    0.000    0.000    5.583    5.583 hillclimb.py:2(<module>)
        1    0.000    0.000    4.952    4.952 hillclimb.py:44(hillclimbing)

hillclimb.stat%
```
You will notice that the `stats` command above allows us to provide an extra argument. The argument can be a regular expression to search for the functions, such that only those matched will be printed. Hence it is a way to provide a search string to filter.
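If you prefer to stay in Python rather than the interactive browser, the same can be done with the `pstats` API. Here is a minimal sketch, assuming the statistics were saved to `hillclimb.stats` as above:

```python
import pstats

# load the saved statistics file
stats = pstats.Stats('hillclimb.stats')
# sort by call count and print only the functions matching "hillclimb"
stats.sort_stats('ncalls').print_stats('hillclimb')
```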
This `pstats` browser allows us to see more than just the table above. The `callers` and `callees` commands show us which function calls which function, how many times it is called, and how much time is spent. Hence we can consider that as a breakdown of the function-level statistics. It is useful if you have a lot of functions that call one another and want to know how the time is spent in different scenarios. For example, this shows that the `objective()` function is called only by the `hillclimbing()` function, but the `hillclimbing()` function calls several other functions:
```
hillclimb.stat% callers objective
   Ordered by: call count
   List reduced from 3456 to 1 due to restriction <'objective'>

Function                    was called by...
                                ncalls  tottime  cumtime
hillclimb.py:11(objective)  <-     101    0.001    4.951  hillclimb.py:44(hillclimbing)

hillclimb.stat% callees hillclimbing
   Ordered by: call count
   List reduced from 3456 to 1 due to restriction <'hillclimbing'>

Function                       called...
                                   ncalls  tottime  cumtime
hillclimb.py:44(hillclimbing)  ->     101    0.001    4.951  hillclimb.py:11(objective)
                                      100    0.000    0.001  hillclimb.py:25(step)
                                        4    0.000    0.000  {built-in method builtins.print}
                                        2    0.000    0.000  {method 'rand' of 'numpy.random.mtrand.RandomState' objects}

hillclimb.stat%
```
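The same breakdown is also available programmatically, through the `print_callers()` and `print_callees()` methods of a `pstats.Stats` object:

```python
import pstats

stats = pstats.Stats('hillclimb.stats')
# which functions call objective(), and how often
stats.print_callers('objective')
# which functions hillclimbing() calls in turn
stats.print_callees('hillclimbing')
```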
Using a profiler inside code
The examples above assume you have the complete program saved in a file and can profile the entire program. Sometimes, we focus on only a part of the entire program. For example, if we load a large module, it takes time to bootstrap, and we want to ignore this in profiling. In this case, we can invoke the profiler only for certain lines. An example is as follows, which is modified from the program above:
```python
# manually search perceptron hyperparameters for binary classification
import cProfile as profile
import pstats
from numpy import mean
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron

# objective function
def objective(X, y, cfg):
    # unpack config
    eta, alpha = cfg
    # define model
    model = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)
    # define evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    # calculate mean accuracy
    result = mean(scores)
    return result

# take a step in the search space
def step(cfg, step_size):
    # unpack the configuration
    eta, alpha = cfg
    # step eta
    new_eta = eta + randn() * step_size
    # check the bounds of eta
    if new_eta <= 0.0:
        new_eta = 1e-8
    if new_eta > 1.0:
        new_eta = 1.0
    # step alpha
    new_alpha = alpha + randn() * step_size
    # check the bounds of alpha
    if new_alpha < 0.0:
        new_alpha = 0.0
    # return the new configuration
    return [new_eta, new_alpha]

# hill climbing local search algorithm
def hillclimbing(X, y, objective, n_iter, step_size):
    # starting point for the search
    solution = [rand(), rand()]
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution, step_size)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))
    return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define the total iterations
n_iter = 100
# step size in the search space
step_size = 0.1
# perform the hill climbing search with profiling
prof = profile.Profile()
prof.enable()
cfg, score = hillclimbing(X, y, objective, n_iter, step_size)
prof.disable()
# print program output
print('Done!')
print('cfg=%s: Mean Accuracy: %f' % (cfg, score))
# print profiling output
stats = pstats.Stats(prof).strip_dirs().sort_stats("cumtime")
stats.print_stats(10)  # top 10 rows
```
It will output the following:
```
>0, cfg=[0.3776271076534661, 0.2308364063203663] 0.75700
>3, cfg=[0.35803234662466354, 0.03204434939660264] 0.77567
>8, cfg=[0.3001050823005957, 0.0] 0.78633
>10, cfg=[0.39518618870158934, 0.0] 0.78633
>12, cfg=[0.4291267905390187, 0.0] 0.78633
>13, cfg=[0.4403131521968569, 0.0] 0.78633
>16, cfg=[0.38865272555918756, 0.0] 0.78633
>17, cfg=[0.38871654921891885, 0.0] 0.78633
>18, cfg=[0.4542440671724224, 0.0] 0.78633
>19, cfg=[0.44899743344802734, 0.0] 0.78633
>20, cfg=[0.5855375509507891, 0.0] 0.78633
>21, cfg=[0.5935318064858227, 0.0] 0.78633
>23, cfg=[0.7606367310048543, 0.0] 0.78633
>24, cfg=[0.855444293727846, 0.0] 0.78633
>25, cfg=[0.9505501566826242, 0.0] 0.78633
>26, cfg=[1.0, 0.0244821888204496] 0.79800
Done!
cfg=[1.0, 0.0244821888204496]: Mean Accuracy: 0.798000
         2179559 function calls (2140124 primitive calls) in 4.941 seconds

   Ordered by: cumulative time
   List reduced from 581 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    4.941    4.941 hillclimb.py:46(hillclimbing)
      101    0.001    0.000    4.939    0.049 hillclimb.py:13(objective)
      101    0.001    0.000    4.931    0.049 _validation.py:375(cross_val_score)
      101    0.002    0.000    4.930    0.049 _validation.py:48(cross_validate)
      101    0.005    0.000    4.903    0.049 parallel.py:960(__call__)
      101    0.235    0.002    3.089    0.031 parallel.py:920(retrieve)
     3030    0.004    0.000    2.849    0.001 _parallel_backends.py:537(wrap_future_result)
     3030    0.020    0.000    2.845    0.001 _base.py:417(result)
     2602    0.016    0.000    2.819    0.001 threading.py:280(wait)
    12447    2.796    0.000    2.796    0.000 {method 'acquire' of '_thread.lock' objects}
```
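As a side note, from Python 3.8 onward, `cProfile.Profile` can also be used as a context manager, which is a tidier way to express the `enable()`/`disable()` pair above. A minimal sketch, reusing the functions defined in the listing above:

```python
import cProfile as profile
import pstats

# profile only the code inside the with-block (Python 3.8+)
with profile.Profile() as prof:
    cfg, score = hillclimbing(X, y, objective, n_iter, step_size)

stats = pstats.Stats(prof).strip_dirs().sort_stats("cumtime")
stats.print_stats(10)
```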
Caveats
Using the profiler with Tensorflow models may not produce what you would expect, especially if you have written your own custom layer or custom function for the model. If you did it correctly, Tensorflow is supposed to build the computation graph before your model is executed, and hence the logic will be changed. The profiler output will therefore not show your custom classes.

Similarly for some advanced modules that involve binary code. The profiler can see that you called some functions and mark them as "built-in" methods, but it cannot go any further into the compiled code.

Below is a short code of the LeNet5 model for the MNIST classification problem. If you try to profile it and print the top 15 rows, you will see that a wrapper is occupying the majority of the time, and nothing can be shown beyond that:
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, AveragePooling2D, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping

# Load and reshape data to shape of (n_sample, height, width, n_channel)
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = np.expand_dims(X_train, axis=3).astype('float32')
X_test = np.expand_dims(X_test, axis=3).astype('float32')

# One-hot encode the output
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# LeNet5 model
model = Sequential([
    Conv2D(6, (5,5), input_shape=(28,28,1), padding="same", activation="tanh"),
    AveragePooling2D((2,2), strides=2),
    Conv2D(16, (5,5), activation="tanh"),
    AveragePooling2D((2,2), strides=2),
    Conv2D(120, (5,5), activation="tanh"),
    Flatten(),
    Dense(84, activation="tanh"),
    Dense(10, activation="softmax")
])
model.summary(line_length=100)

# Training
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
earlystopping = EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=20, batch_size=32, callbacks=[earlystopping])

# Evaluate
print(model.evaluate(X_test, y_test, verbose=0))
```
In the result below, `TFE_Py_Execute` is marked as a "built-in" method, and it consumes 30.1 sec out of the total run time of 39.6 sec. Note that the tottime is the same as the cumtime, meaning from the profiler's perspective, it seems all the time is spent in this function, and it does not call any other functions. This illustrates the limitation of Python's profiler.
```
         5962698 function calls (5728324 primitive calls) in 39.674 seconds

   Ordered by: cumulative time
   List reduced from 12295 to 15 due to restriction <15>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   3212/1    0.013    0.000   39.699   39.699 {built-in method builtins.exec}
        1    0.003    0.003   39.699   39.699 mnist.py:4(<module>)
     52/4    0.005    0.000   35.470    8.868 /usr/local/lib/python3.9/site-packages/keras/utils/traceback_utils.py:58(error_handler)
        1    0.089    0.089   34.334   34.334 /usr/local/lib/python3.9/site-packages/keras/engine/training.py:901(fit)
11075/9531   0.032    0.000   33.406    0.004 /usr/local/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py:138(error_handler)
     4689    0.089    0.000   33.017    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py:882(__call__)
     4689    0.023    0.000   32.771    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py:929(_call)
     4688    0.042    0.000   32.134    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/function.py:3125(__call__)
     4689    0.075    0.000   30.941    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/function.py:1888(_call_flat)
     4689    0.158    0.000   30.472    0.006 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/function.py:553(call)
     4689    0.034    0.000   30.152    0.006 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:33(quick_execute)
     4689   30.105    0.006   30.105    0.006 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_Execute}
  3185/24    0.021    0.000    3.902    0.163 <frozen importlib._bootstrap>:1002(_find_and_load)
  3169/10    0.014    0.000    3.901    0.390 <frozen importlib._bootstrap>:967(_find_and_load_unlocked)
  2885/12    0.009    0.000    3.901    0.325 <frozen importlib._bootstrap_external>:844(exec_module)
```
Finally, Python's profiler gives you only the statistics on time, but not memory usage. You may need to look for another library or tool for this purpose.
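For instance, the standard library's `tracemalloc` module can attribute memory allocations to source lines. A minimal sketch:

```python
import tracemalloc

tracemalloc.start()
# run the code whose memory usage we want to attribute, e.g., building a large list
data = [str(x) for x in range(100000)]
snapshot = tracemalloc.take_snapshot()
# report the source lines with the largest total allocation
for stat in snapshot.statistics('lineno')[:5]:
    print(stat)
```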
Further Readings
The standard library modules `timeit`, `cProfile`, and `pstats` have their documentation in Python's documentation:
The standard library's profiler is very powerful, but not the only one. If you want something more visual, you can try out the Python Call Graph module. It can produce a picture of how functions call each other using the GraphViz tool:
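A typical run looks like the following (a sketch, assuming the `pycallgraph` package and GraphViz are installed; note that the package is no longer actively maintained):

```python
from pycallgraph import PyCallGraph
from pycallgraph.output import GraphvizOutput

# render the call graph of the wrapped code into a PNG via GraphViz
with PyCallGraph(output=GraphvizOutput(output_file='callgraph.png')):
    print(sum(x ** 0.5 for x in range(1000)))
```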
The limitation of not being able to dig into the compiled code can be addressed by not using Python's profiler but instead using one for compiled programs. My favorite is Valgrind:
but to use it, you may have to recompile your Python interpreter to turn on debugging support.
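For example, a script can be run under Valgrind's callgrind tool as follows; the generated `callgrind.out.<pid>` file can then be inspected with `callgrind_annotate` or KCachegrind:

```
valgrind --tool=callgrind python3 hillclimb.py
```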
Summary
In this tutorial, we learned what a profiler is and what it can do. Specifically,

- We know how to compare small code fragments with the `timeit` module
- We saw that Python's `cProfile` module can provide detailed statistics on how time is spent
- We learned to use the `pstats` module against the output of `cProfile` to sort or filter the results