Python Performance Profiling Using PySpy#

Intro#

Squirro is primarily written in Python. If any aspect of Squirro runs slower than expected, consumes more CPU than expected, or hangs for a long time without a good reason, PySpy is a great utility to investigate what is going on.

In general: any problem report to Support or Engineering related to performance or hangs will yield much faster results if it includes a PySpy top snapshot, a stack dump, or a flamegraph.

Home#

See the py-spy GitHub repository: Sampling Profiler for Python Programs.

Installation#

Squirro recommends working with Python virtual environments to ensure a clean and isolated setup for your projects. For more information regarding virtual environments in Python, refer to the documentation on the python.org website. To get started with PySpy, first connect to your Squirro instance using the command line interface, then install PySpy using the following commands:

  1. Create a virtual environment:

    python3.11 -m venv ~/pyspy
    
  2. Activate the virtual environment:

    source ~/pyspy/bin/activate
    
  3. Install PySpy:

    pip install py-spy
    
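As an optional sanity check that py-spy is available inside the activated virtual environment, print its version and the list of subcommands:

py-spy --version
py-spy --help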

Before You Start#

To begin, you need the PID of the target process:

Example:

# ps -elf | grep <service>d
ps -elf | grep topicd
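
Alternatively, pgrep prints the PIDs directly; a quick sketch, again using topicd as the example service:

# list the PIDs of all topicd processes
pgrep -f topicd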

Common Flags#

-p, --pid <pid>              PID of a running python program to spy on
    --full-filenames         Show full Python filenames
-r, --rate <rate>            The number of samples to collect per second [default: 100]
-s, --subprocesses           Profile subprocesses of the original process
-n, --native                 Collect stack traces from native extensions written in Cython, C or C++
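
These flags can be combined. For example, a sketch that watches a process together with its worker subprocesses at a reduced sampling rate; <pid> is a placeholder for the PID found above:

# replace <pid> with the PID of the target process
py-spy top --pid <pid> --subprocesses --rate 50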

TOP#

Great for these situations:

  • What is this service doing?

  • Why is a service/process slow?

py-spy top -p 29121

Output:

Total Samples 991
GIL: 2.20%, Active: 19.78%, Threads: 16

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
  8.79%   8.79%   0.330s    0.330s   readinto (socket.py:586)
  4.40%   4.40%   0.150s    0.150s   run (topic/background.py:534)
  0.00%   0.00%   0.120s    0.120s   run (flup/server/threadedserver.py:76)
  0.00%   8.79%   0.000s    0.330s   _get_and_deliver (kombu/transport/virtual/base.py:406)
  0.00%   8.79%   0.000s    0.330s   _read_status (http/client.py:268)
  0.00%   8.79%   0.000s    0.330s   _delete_source_id (topic/background.py:116)
  0.00%   0.00%   0.000s    0.120s   run_flup_daemon (common/daemonlauncher.py:89)
  0.00%   4.40%   0.000s    0.080s   POST (topic/items.py:849)
  0.00%   8.79%   0.000s    0.330s   get (kombu/utils/scheduling.py:56)
  0.00%   4.40%   0.000s    0.080s   wrapped_f (common/__init__.py:142)
  0.00%   8.79%   0.000s    0.330s   delete (elasticsearch/client/__init__.py:596)
  0.00%   8.79%   0.000s    0.330s   consume (kombu/mixins.py:197)

Here we can see that a source is being deleted.
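
If the shortened paths make it hard to tell which module a frame belongs to, you can rerun the same command with full filenames (same example PID as above):

py-spy top -p 29121 --full-filenames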

Dump a Stacktrace#

Great for these situations:

  • A process is stuck

py-spy dump -p 1529

Output:

Thread 1614 (active): "local.3"
    readinto (socket.py:586)
    _read_status (http/client.py:268)
    begin (http/client.py:307)
    getresponse (http/client.py:1346)
    _make_request (urllib3/connectionpool.py:421)
    urlopen (urllib3/connectionpool.py:677)
    perform_request (elasticsearch/connection/http_urllib3.py:229)
    perform_request (elasticsearch/transport.py:362)
    perform_request (common/tracing/elasticsearch_tracing.py:84)
    delete (elasticsearch/client/__init__.py:596)
    _wrapped (elasticsearch/client/utils.py:92)
    retry (index/retry.py:57)
    delete_document (index/writer.py:183)
    delete_document (index/item/writer.py:152)
    update_item (topic/background.py:88)
    _delete_source_id (topic/background.py:116)
    _on_task_delete_source_items (topic/background.py:70)
    _on_task (common/queue.py:1071)
    on_task (common/queue.py:992)
    _process_task (common/queue.py:528)
    <listcomp> (kombu/messaging.py:590)
    receive (kombu/messaging.py:590)
    _receive_callback (kombu/messaging.py:624)
    _callback (kombu/transport/virtual/base.py:633)
    _deliver (kombu/transport/virtual/base.py:983)
    _get_and_deliver (kombu/transport/virtual/base.py:406)
    get (kombu/utils/scheduling.py:56)
    _poll (kombu/transport/virtual/base.py:402)
    drain_events (kombu/transport/virtual/base.py:745)
    _drain_channel (kombu/transport/virtual/base.py:1001)
    get (kombu/utils/scheduling.py:56)
    drain_events (kombu/transport/virtual/base.py:963)
    drain_events (kombu/connection.py:323)
    consume (kombu/mixins.py:197)
    run (kombu/mixins.py:175)
    run (common/background.py:174)
    _bootstrap_inner (threading.py:916)
    _bootstrap (threading.py:884)

The stack is shown in reverse order compared to a normal Python traceback (innermost frame first): the thread is blocked on a network socket, waiting for the response to an Elasticsearch delete command.
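
If the stack alone is not conclusive, py-spy can also show the local variables of each frame; a sketch using the same example PID (be aware that the output may contain sensitive values):

py-spy dump -p 1529 --locals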

Record#

Great for these situations:

  • Investigate slow performance

  • Understanding the bottlenecks during a load test

  • Performance regression testing: New code vs old code

  • Generally pinpointing where time is spent. Note: time spent (wall clock) is not the same as CPU load.

py-spy record -p 1529 -o /tmp/flamegraph.svg
Sampling process 100 times a second. Press Control-C to exit.
...
Stopped sampling because Control-C pressed
Wrote flamegraph data to '/tmp/flamegraph.svg'. Samples: 6864 Errors: 0

You can use --format speedscope (-f speedscope) for a more interactive experience and analyze the output file on the Speedscope site (see the sketch at the end of this section).

Output:

Full View:

[Flamegraph image: full view]

Zoomed in, we can see that a lot of time is spent in Squirro’s delete item function, which in turn is waiting on Elasticsearch due to many round trips:

[Flamegraph image: zoomed in on the delete item function]
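
Two further variations of the record command can be useful; a sketch, with the duration and output filenames chosen arbitrarily:

# stop sampling automatically after 60 seconds instead of waiting for Control-C
py-spy record -p 1529 --duration 60 -o /tmp/flamegraph.svg

# write a Speedscope profile instead of an SVG flamegraph
py-spy record -p 1529 --format speedscope -o /tmp/profile.speedscope.json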

Tips and Caveats#

  • You need root privileges to attach to processes owned by other users, such as the Squirro services

  • Start recording during a time of action. If you record with py-spy and do nothing on the server, the flamegraph will mostly show flup/kombu doing a whole lot of nothing.

  • For record, consider using the --function switch: it aggregates samples by function instead of by line number, which can be easier to read if you’re not familiar with the codebase.

  • Don’t forget the --subprocesses flag if you profile multi-process services, e.g. topicd, datasourced, machinelearningd, or the data loader with the parallel flag.

  • Use the --native flag if native C libraries or extensions are at play (see the combined sketch below).
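
Putting these tips together, a sketch of a record invocation for a multi-process service; the PID is a placeholder, and the full path to the binary from the installation step is used because sudo does not inherit the activated virtual environment:

# replace <topicd-pid> with the PID of the main topicd process
sudo ~/pyspy/bin/py-spy record --pid <topicd-pid> --subprocesses --native --function -o /tmp/topicd.svg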