Python Performance Profiling Using PySpy#

Intro#

Squirro is primarily written in Python. If any aspect of Squirro runs slower than expected, uses more CPU than expected, or hangs for a long time without a good reason, PySpy is a great utility for investigating what is going on.

In general: any problem report to Support or Engineering related to performance or hangs will yield much faster results if it includes a PySpy top snapshot, a stack dump, or a flamegraph.

Home#

See the py-spy GitHub repository, Sampling Profiler for Python Programs: https://github.com/benfred/py-spy

Installation#

pip install py-spy

Note: On a Squirro server, it’s best to create a dedicated Virtualenv:

/opt/rh/rh-python36/root/bin/virtualenv ~/pyspy
source ~/pyspy/bin/activate
pip install py-spy
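
To verify the installation from the dedicated virtualenv (the ~/pyspy path matches the example above), you can print the installed version:

~/pyspy/bin/py-spy --version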

Before You Start#

To begin, you’ll need the PID of the target process:

Example:

# ps -elf | grep <service>d
ps -elf | grep topicd
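
Alternatively, pgrep can return just the PID, which is handy for scripting (topicd is used as an example service here):

pgrep -f topicd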

Common Flags#

-p, --pid <pid>              PID of a running python program to spy on
--full-filenames             Show full Python filenames
-r, --rate <rate>            The number of samples to collect per second [default: 100]
-s, --subprocesses           Profile subprocesses of the original process
-n, --native                 Collect stack traces from native extensions written in Cython, C or C++
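
These flags can be combined; as a sketch, the following samples a process and all of its subprocesses at a higher rate (the PID is a placeholder):

py-spy top --pid 29121 --rate 200 --subprocesses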

TOP#

Great for these situations:

  • What is this service doing?

  • Why is a service/process slow?

py-spy top -p 29121

Output:

Total Samples 991
GIL: 2.20%, Active: 19.78%, Threads: 16

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
  8.79%   8.79%   0.330s    0.330s   readinto (socket.py:586)
  4.40%   4.40%   0.150s    0.150s   run (topic/background.py:534)
  0.00%   0.00%   0.120s    0.120s   run (flup/server/threadedserver.py:76)
  0.00%   8.79%   0.000s    0.330s   _get_and_deliver (kombu/transport/virtual/base.py:406)
  0.00%   8.79%   0.000s    0.330s   _read_status (http/client.py:268)
  0.00%   8.79%   0.000s    0.330s   _delete_source_id (topic/background.py:116)
  0.00%   0.00%   0.000s    0.120s   run_flup_daemon (common/daemonlauncher.py:89)
  0.00%   4.40%   0.000s    0.080s   POST (topic/items.py:849)
  0.00%   8.79%   0.000s    0.330s   get (kombu/utils/scheduling.py:56)
  0.00%   4.40%   0.000s    0.080s   wrapped_f (common/__init__.py:142)
  0.00%   8.79%   0.000s    0.330s   delete (elasticsearch/client/__init__.py:596)
  0.00%   8.79%   0.000s    0.330s   consume (kombu/mixins.py:197)

Here we can see that a source is being deleted: _delete_source_id accounts for most of the sampled time, which is ultimately spent waiting on a socket read for an Elasticsearch delete call.
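
If you only care about threads that are actually executing Python code, recent py-spy versions also offer a --gil flag that limits sampling to threads holding the GIL; a sketch with top:

py-spy top -p 29121 --gil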

Dump a Stacktrace#

Great for these situations:

  • A process is stuck

py-spy dump -p 1529

Output:

Thread 1614 (active): "local.3"
    readinto (socket.py:586)
    _read_status (http/client.py:268)
    begin (http/client.py:307)
    getresponse (http/client.py:1346)
    _make_request (urllib3/connectionpool.py:421)
    urlopen (urllib3/connectionpool.py:677)
    perform_request (elasticsearch/connection/http_urllib3.py:229)
    perform_request (elasticsearch/transport.py:362)
    perform_request (common/tracing/elasticsearch_tracing.py:84)
    delete (elasticsearch/client/__init__.py:596)
    _wrapped (elasticsearch/client/utils.py:92)
    retry (index/retry.py:57)
    delete_document (index/writer.py:183)
    delete_document (index/item/writer.py:152)
    update_item (topic/background.py:88)
    _delete_source_id (topic/background.py:116)
    _on_task_delete_source_items (topic/background.py:70)
    _on_task (common/queue.py:1071)
    on_task (common/queue.py:992)
    _process_task (common/queue.py:528)
    <listcomp> (kombu/messaging.py:590)
    receive (kombu/messaging.py:590)
    _receive_callback (kombu/messaging.py:624)
    _callback (kombu/transport/virtual/base.py:633)
    _deliver (kombu/transport/virtual/base.py:983)
    _get_and_deliver (kombu/transport/virtual/base.py:406)
    get (kombu/utils/scheduling.py:56)
    _poll (kombu/transport/virtual/base.py:402)
    drain_events (kombu/transport/virtual/base.py:745)
    _drain_channel (kombu/transport/virtual/base.py:1001)
    get (kombu/utils/scheduling.py:56)
    drain_events (kombu/transport/virtual/base.py:963)
    drain_events (kombu/connection.py:323)
    consume (kombu/mixins.py:197)
    run (kombu/mixins.py:175)
    run (common/background.py:174)
    _bootstrap_inner (threading.py:916)
    _bootstrap (threading.py:884)

The stack is printed in reverse order (innermost frame first): you can see the thread is blocked on a network socket, waiting for the response to an Elasticsearch delete command.
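
If the stack trace alone is not conclusive, recent py-spy versions can also print the local variables of each frame via the --locals flag of dump:

py-spy dump -p 1529 --locals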

Record#

Great for these situations:

  • Investigate slow performance

  • Understanding the bottlenecks during a load test

  • Performance regression testing: New code vs old code

  • Generally pinpointing where time is spent. Note: Time is not equal to CPU load.

py-spy record -p 1529 -o /tmp/flamegraph.svg
Sampling process 100 times a second. Press Control-C to exit.
...
Stopped sampling because Control-C pressed
Wrote flamegraph data to '/tmp/flamegraph.svg'. Samples: 6864 Errors: 0

You can use -f speedscope for a more interactive experience and analyze the output file on the Speedscope site (https://www.speedscope.app).
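
For example, to capture a speedscope profile instead of an SVG flamegraph (the output path is arbitrary):

py-spy record -p 1529 -f speedscope -o /tmp/profile.speedscope.json

If you prefer a fixed-length capture over stopping with Control-C, record also accepts a --duration <seconds> option.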

Output (default SVG flamegraph):

Full View:

image1

Zoomed in, we can see that a lot of time is spent in Squirro’s delete item function, which in turn is waiting on Elasticsearch due to many round trips:

image2

Tips and Caveats#

  • You need root privileges

  • Start recording during a time of activity. If you record with py-spy while nothing is happening on the server, the flamegraph will mostly show flup/kombu doing a whole lot of nothing.

  • For record, consider using the --function switch: it aggregates samples by function name instead of by line number, which can be easier to read if you’re not familiar with the codebase.

  • Don’t forget the --subprocesses flag if you are profiling multi-process services, e.g. topicd, datasourced, machinelearningd, or the data loader with the parallel flag.

  • Use the --native flag if C libraries are at play (a combined example follows below).
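
Putting the tips together, a recording of a multi-process service with native extensions might look like this sketch (PID and output path are placeholders):

py-spy record --pid 1529 --subprocesses --native --function -o /tmp/flamegraph.svg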