Python Performance Profiling Using PySpy#
Intro#
Squirro is primarily written in Python. If any part of Squirro runs slower than expected, uses more CPU than expected, or hangs for a long time without good reason, py-spy is a great utility for investigating what is going on.
In general: any problem report to Support or Engineering related to performance or hangs will yield much faster results when it includes a py-spy top snapshot, a thread dump, or a flamegraph.
Home#
See the py-spy GitHub repository: Sampling Profiler for Python Programs (https://github.com/benfred/py-spy).
Installation#
pip install py-spy
Note: On a Squirro server, it’s best to create a dedicated Virtualenv:
/opt/rh/rh-python36/root/bin/virtualenv ~/pyspy
source ~/pyspy/bin/activate
pip install py-spy
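To confirm the installation worked, check that the binary runs (assuming the ~/pyspy virtualenv created above):
~/pyspy/bin/py-spy --version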
Before You Start#
To begin, you need the PID of the target process.
Example:
# ps -elf | grep <service>d, e.g. for the topic service:
ps -elf | grep topicd
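Alternatively, if pgrep is available it returns just the PID (topicd is used here as an example service):
pgrep -f topicd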
Common Flags#
-p, --pid <pid> PID of a running python program to spy on
--full-filenames Show full Python filenames
-r, --rate <rate> The number of samples to collect per second [default: 100]
-s, --subprocesses Profile subprocesses of the original process
-n, --native Collect stack traces from native extensions written in Cython, C or C++
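These flags can be combined. As a sketch, profiling a multi-process service at a higher sampling rate might look like this (the PID is illustrative):
py-spy top --pid 29121 --subprocesses --rate 200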
TOP#
Great for these situations:
What is this service doing?
Why is a service/process slow?
py-spy top -p 29121
Output:
Total Samples 991
GIL: 2.20%, Active: 19.78%, Threads: 16
%Own %Total OwnTime TotalTime Function (filename:line)
8.79% 8.79% 0.330s 0.330s readinto (socket.py:586)
4.40% 4.40% 0.150s 0.150s run (topic/background.py:534)
0.00% 0.00% 0.120s 0.120s run (flup/server/threadedserver.py:76)
0.00% 8.79% 0.000s 0.330s _get_and_deliver (kombu/transport/virtual/base.py:406)
0.00% 8.79% 0.000s 0.330s _read_status (http/client.py:268)
0.00% 8.79% 0.000s 0.330s _delete_source_id (topic/background.py:116)
0.00% 0.00% 0.000s 0.120s run_flup_daemon (common/daemonlauncher.py:89)
0.00% 4.40% 0.000s 0.080s POST (topic/items.py:849)
0.00% 8.79% 0.000s 0.330s get (kombu/utils/scheduling.py:56)
0.00% 4.40% 0.000s 0.080s wrapped_f (common/__init__.py:142)
0.00% 8.79% 0.000s 0.330s delete (elasticsearch/client/__init__.py:596)
0.00% 8.79% 0.000s 0.330s consume (kombu/mixins.py:197)
Here you can see that a source is being deleted: _delete_source_id dominates the %Total column, while the time itself is spent in readinto (socket.py), i.e. waiting on the network. (%Own counts time in the function itself; %Total also includes its callees.)
Dump a Stacktrace#
Great for these situations:
A process is stuck or hanging
py-spy dump -p 1529
Output:
Thread 1614 (active): "local.3"
readinto (socket.py:586)
_read_status (http/client.py:268)
begin (http/client.py:307)
getresponse (http/client.py:1346)
_make_request (urllib3/connectionpool.py:421)
urlopen (urllib3/connectionpool.py:677)
perform_request (elasticsearch/connection/http_urllib3.py:229)
perform_request (elasticsearch/transport.py:362)
perform_request (common/tracing/elasticsearch_tracing.py:84)
delete (elasticsearch/client/__init__.py:596)
_wrapped (elasticsearch/client/utils.py:92)
retry (index/retry.py:57)
delete_document (index/writer.py:183)
delete_document (index/item/writer.py:152)
update_item (topic/background.py:88)
_delete_source_id (topic/background.py:116)
_on_task_delete_source_items (topic/background.py:70)
_on_task (common/queue.py:1071)
on_task (common/queue.py:992)
_process_task (common/queue.py:528)
<listcomp> (kombu/messaging.py:590)
receive (kombu/messaging.py:590)
_receive_callback (kombu/messaging.py:624)
_callback (kombu/transport/virtual/base.py:633)
_deliver (kombu/transport/virtual/base.py:983)
_get_and_deliver (kombu/transport/virtual/base.py:406)
get (kombu/utils/scheduling.py:56)
_poll (kombu/transport/virtual/base.py:402)
drain_events (kombu/transport/virtual/base.py:745)
_drain_channel (kombu/transport/virtual/base.py:1001)
get (kombu/utils/scheduling.py:56)
drain_events (kombu/transport/virtual/base.py:963)
drain_events (kombu/connection.py:323)
consume (kombu/mixins.py:197)
run (kombu/mixins.py:175)
run (common/background.py:174)
_bootstrap_inner (threading.py:916)
_bootstrap (threading.py:884)
The stack is printed innermost frame first. Reading it, you can see the thread is sitting on a network socket, waiting for the response to an Elasticsearch delete command.
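If a process looks stuck, take a few dumps several seconds apart: an identical stack each time means the thread is genuinely blocked rather than merely busy. A simple loop, reusing the example PID from above:
for i in 1 2 3; do py-spy dump -p 1529; sleep 5; done
py-spy dump also accepts a --locals flag to show the local variables of each frame, which can add useful context.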
Record#
Great for these situations:
Investigate slow performance
Understanding the bottlenecks during a load test
Performance regression testing: New code vs old code
Generally pinpointing where time is spent. Note: time spent is not the same as CPU load; a thread can spend most of its time waiting on I/O while using almost no CPU.
py-spy record -p 1529 -o /tmp/flamegraph.svg
Sampling process 100 times a second. Press Control-C to exit.
...
Stopped sampling because Control-C pressed
Wrote flamegraph data to '/tmp/flamegraph.svg'. Samples: 6864 Errors: 0
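If you prefer a fixed-length recording over pressing Control-C, the --duration flag stops sampling after the given number of seconds:
py-spy record -p 1529 --duration 60 -o /tmp/flamegraph.svg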
You can use -f speedscope for a more interactive experience and analyze the output file on the Speedscope site (https://www.speedscope.app/).
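For example (the output filename is just a suggestion):
py-spy record -p 1529 -f speedscope -o /tmp/profile.speedscope.json
Then drag the resulting file onto https://www.speedscope.app/ to browse it interactively.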
Output: the recorded flamegraph, full view and zoomed in. Zoomed in, you can see that most of the time is spent in Squirro's delete item function, which in turn is waiting on Elasticsearch due to many round trips.
Tips and Caveats#
You need root privileges to attach to the target process (see the sudo example after this list).
Start recording during a time of action. If you record with py-spy and do nothing on the server, the flamegraph will mostly show flup/kombu doing a whole lot of nothing.
For record, consider using the --function switch; it aggregates by function instead of by line number, which can be easier to read if you're not familiar with the codebase.
Don't forget the --subprocesses flag if you have multi-process services, e.g. topicd, datasourced, machinelearningd, or the data loader with the parallel flag.
Use the --native flag if C libraries are at play.
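Since root privileges are needed but py-spy lives in your user's virtualenv, invoke it via its full path under sudo (assuming the ~/pyspy virtualenv from the Installation section):
sudo ~/pyspy/bin/py-spy top --pid <pid>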