Python Performance Profiling Using PySpy#


Squirro is primarily written in Python. If any part of Squirro runs slower than expected, consumes more CPU than expected, or hangs for a long time without good reason, then PySpy is a great utility for investigating what is going on.

In general, any problem report to Support or Engineering related to performance or hangs will yield much faster results when it includes a PySpy top snapshot, a thread dump, or a flamegraph.


See the py-spy GitHub repository: Sampling Profiler for Python Programs.


Installation#

pip install py-spy

Note: On a Squirro server, it’s best to create a dedicated Virtualenv:

/opt/rh/rh-python36/root/bin/virtualenv ~/pyspy
source ~/pyspy/bin/activate
pip install py-spy

Before You Start#

To begin, you’ll need the PID of the target process:


# General form: ps -elf | grep <service>d
ps -elf | grep topicd
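Alternatively, pgrep can return the PID directly. A self-contained sketch, using a background sleep as a stand-in for a real service such as topicd:

```shell
# Any long-running process works for this demo; "sleep 30" stands in
# for a real service name such as topicd.
sleep 30 &
dummy_pid=$!
# pgrep -f matches against the full command line and prints matching PIDs.
pgrep -f 'sleep 30'
kill "$dummy_pid"
```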

Common Flags#

-p, --pid <pid>              PID of a running python program to spy on
--full-filenames             Show full Python filenames
-r, --rate <rate>            The number of samples to collect per second [default: 100]
-s, --subprocesses           Profile subprocesses of the original process
-n, --native                 Collect stack traces from native extensions written in Cython, C or C++


Top View#

Great for these situations:

  • What is this service doing?

  • Why is a service/process slow?

py-spy top -p 29121


Total Samples 991
GIL: 2.20%, Active: 19.78%, Threads: 16

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
  8.79%   8.79%   0.330s    0.330s   readinto (
  4.40%   4.40%   0.150s    0.150s   run (topic/
  0.00%   0.00%   0.120s    0.120s   run (flup/server/
  0.00%   8.79%   0.000s    0.330s   _get_and_deliver (kombu/transport/virtual/
  0.00%   8.79%   0.000s    0.330s   _read_status (http/
  0.00%   8.79%   0.000s    0.330s   _delete_source_id (topic/
  0.00%   0.00%   0.000s    0.120s   run_flup_daemon (common/
  0.00%   4.40%   0.000s    0.080s   POST (topic/
  0.00%   8.79%   0.000s    0.330s   get (kombu/utils/
  0.00%   4.40%   0.000s    0.080s   wrapped_f (common/
  0.00%   8.79%   0.000s    0.330s   delete (elasticsearch/client/
  0.00%   8.79%   0.000s    0.330s   consume (kombu/

Here we can see that a source is being deleted (the _delete_source_id frame), and that most of the sampled time is spent in the Elasticsearch delete call.

Dump a Stacktrace#

Great for these situations:

  • A process is stuck

py-spy dump -p 1529


Thread 1614 (active): "local.3"
    readinto (
    _read_status (http/
    begin (http/
    getresponse (http/
    _make_request (urllib3/
    urlopen (urllib3/
    perform_request (elasticsearch/connection/
    perform_request (elasticsearch/
    perform_request (common/tracing/
    delete (elasticsearch/client/
    _wrapped (elasticsearch/client/
    retry (index/
    delete_document (index/
    delete_document (index/item/
    update_item (topic/
    _delete_source_id (topic/
    _on_task_delete_source_items (topic/
    _on_task (common/
    on_task (common/
    _process_task (common/
    <listcomp> (kombu/
    receive (kombu/
    _receive_callback (kombu/
    _callback (kombu/transport/virtual/
    _deliver (kombu/transport/virtual/
    _get_and_deliver (kombu/transport/virtual/
    get (kombu/utils/
    _poll (kombu/transport/virtual/
    drain_events (kombu/transport/virtual/
    _drain_channel (kombu/transport/virtual/
    get (kombu/utils/
    drain_events (kombu/transport/virtual/
    drain_events (kombu/
    consume (kombu/
    run (kombu/
    run (common/
    _bootstrap_inner (
    _bootstrap (

The stack is printed innermost frame first; you can see the thread is blocked on a network socket, waiting for the response to an Elasticsearch delete command.
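When a service runs several worker processes, it can help to dump all of them at once. A sketch, assuming py-spy is on the PATH and you run it as root; the pattern topicd is an example, adjust it to your service:

```shell
# Dump every process whose command line matches the service name,
# one output file per PID.
for pid in $(pgrep -f topicd); do
    py-spy dump --pid "$pid" > "/tmp/pyspy-dump-${pid}.txt"
done
```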


Record a Flamegraph#

Great for these situations:

  • Investigating slow performance

  • Understanding the bottlenecks during a load test

  • Performance regression testing: new code vs. old code

  • Generally pinpointing where time is spent. Note: time spent is not equal to CPU load.
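The last point can be seen directly, independent of py-spy: wall-clock time includes off-CPU waits such as sleep or network I/O, while CPU time does not:

```shell
# Wall-clock time vs CPU time: sleeping consumes wall time but almost no
# CPU, which is why a frame can dominate a flamegraph while the CPU is idle.
python3 - <<'EOF'
import time
wall0, cpu0 = time.perf_counter(), time.process_time()
time.sleep(1)  # blocks: wall time passes, CPU time barely moves
wall, cpu = time.perf_counter() - wall0, time.process_time() - cpu0
print(f"wall={wall:.2f}s cpu={cpu:.3f}s")
EOF
```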

py-spy record -p 1529 -o /tmp/flamegraph.svg
Sampling process 100 times a second. Press Control-C to exit.
Stopped sampling because Control-C pressed
Wrote flamegraph data to '/tmp/flamegraph.svg'. Samples: 6864 Errors: 0

You can use -f speedscope for a more interactive experience and analyze the output file on the Speedscope site.


Full View:


Zoomed in, we can see that it’s spending a lot of time in the delete item function of Squirro, which in turn is waiting on Elastic due to many roundtrips:


Tips and Caveats#

  • You need root privileges to attach to processes you don’t own, so run py-spy with sudo

  • Start recording during a time of action. If you record with py-spy and do nothing on the server, the flamegraph will mostly show flup/kombu doing a whole lot of nothing.

  • For py-spy record, consider using the --function switch: it shows the function name instead of the line number. This can be easier to read if you’re not familiar with the codebase.

  • Don’t forget the --subprocesses flag if you have multi-process services, e.g. topicd, datasourced, machinelearningd, or the data loader with the parallel flag

  • Use the --native flag if C libraries are at play.
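To experiment with these commands safely, a toy workload gives py-spy something obvious to find. A sketch using a hypothetical /tmp/busy.py (not part of Squirro):

```shell
# Create a toy workload with an obviously hot function.
cat > /tmp/busy.py <<'EOF'
import time

def hot():
    # CPU-bound loop: shows up with a high %Own in "py-spy top".
    return sum(i * i for i in range(200_000))

while True:
    hot()
    time.sleep(0.1)  # off-CPU wait: visible in "py-spy dump"
EOF

# Run it in the background and attach, e.g.:
#   python3 /tmp/busy.py &
#   sudo py-spy top --pid $!
python3 -m py_compile /tmp/busy.py && echo "workload ready"
```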