Python Performance Profiling Using PySpy#
Intro#
Squirro is primarily written in Python. If any aspect of Squirro runs slower than expected, uses more CPU than expected, or hangs for a long time without a good reason, PySpy is a great utility for investigating what is going on.
In general: any problem report to Support or Engineering related to performance or hangs will yield much faster results if it includes a PySpy top snapshot, a stack dump, or a flamegraph.
Home#
See the py-spy repository on GitHub: Sampling Profiler for Python Programs.
Installation#
Squirro recommends working with Python virtual environments to ensure a clean and isolated setup for your projects. For more information regarding virtual environments in Python, refer to the documentation on the python.org website. To get started with PySpy, first connect to your Squirro instance using the command line interface, then install PySpy using the following commands:
Create a virtual environment:
python3.11 -m venv ~/pyspy
Activate the virtual environment:
source ~/pyspy/bin/activate
Install PySpy:
pip install py-spy
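As a quick sanity check (assuming the virtual environment is still active), you can print the built-in help, which lists the available subcommands such as record, dump and top:
py-spy --help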
Before You Start#
To begin, you’ll need the PID of the target process:
Example:
# ps -elf | grep <service>d
ps -elf | grep topicd
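As an alternative sketch, assuming pgrep is available on the host, you can look up the PID directly; -f matches against the full command line and -o returns the oldest (usually the parent) matching process:
pgrep -o -f topicd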
Common Flags#
-p, --pid <pid> PID of a running python program to spy on
--full-filenames Show full Python filenames
-r, --rate <rate> The number of samples to collect per second [default: 100]
-s, --subprocesses Profile subprocesses of the original process
-n, --native Collect stack traces from native extensions written in Cython, C or C++
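As an illustrative combination of these flags (the PID is just a placeholder), the following lowers the sampling rate and shows full Python filenames while watching a process:
py-spy top --pid 29121 --rate 50 --full-filenames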
TOP#
Great for these situations:
What is this service doing?
Why is a service/process slow?
py-spy top -p 29121
Output:
Total Samples 991
GIL: 2.20%, Active: 19.78%, Threads: 16
%Own %Total OwnTime TotalTime Function (filename:line)
8.79% 8.79% 0.330s 0.330s readinto (socket.py:586)
4.40% 4.40% 0.150s 0.150s run (topic/background.py:534)
0.00% 0.00% 0.120s 0.120s run (flup/server/threadedserver.py:76)
0.00% 8.79% 0.000s 0.330s _get_and_deliver (kombu/transport/virtual/base.py:406)
0.00% 8.79% 0.000s 0.330s _read_status (http/client.py:268)
0.00% 8.79% 0.000s 0.330s _delete_source_id (topic/background.py:116)
0.00% 0.00% 0.000s 0.120s run_flup_daemon (common/daemonlauncher.py:89)
0.00% 4.40% 0.000s 0.080s POST (topic/items.py:849)
0.00% 8.79% 0.000s 0.330s get (kombu/utils/scheduling.py:56)
0.00% 4.40% 0.000s 0.080s wrapped_f (common/__init__.py:142)
0.00% 8.79% 0.000s 0.330s delete (elasticsearch/client/__init__.py:596)
0.00% 8.79% 0.000s 0.330s consume (kombu/mixins.py:197)
Here we can see that a source is being deleted.
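The PID lookup and the top command can be combined into one illustrative one-liner. Root privileges are usually required (see Tips and Caveats), and sudo does not inherit the activated virtual environment, so the full path to the py-spy binary from the installation above is used; topicd is just an example service:
sudo ~/pyspy/bin/py-spy top --pid "$(pgrep -o -f topicd)"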
Dump a Stacktrace#
Great for these situations:
A process is stuck:
py-spy dump -p 1529
Output:
Thread 1614 (active): "local.3"
readinto (socket.py:586)
_read_status (http/client.py:268)
begin (http/client.py:307)
getresponse (http/client.py:1346)
_make_request (urllib3/connectionpool.py:421)
urlopen (urllib3/connectionpool.py:677)
perform_request (elasticsearch/connection/http_urllib3.py:229)
perform_request (elasticsearch/transport.py:362)
perform_request (common/tracing/elasticsearch_tracing.py:84)
delete (elasticsearch/client/__init__.py:596)
_wrapped (elasticsearch/client/utils.py:92)
retry (index/retry.py:57)
delete_document (index/writer.py:183)
delete_document (index/item/writer.py:152)
update_item (topic/background.py:88)
_delete_source_id (topic/background.py:116)
_on_task_delete_source_items (topic/background.py:70)
_on_task (common/queue.py:1071)
on_task (common/queue.py:992)
_process_task (common/queue.py:528)
<listcomp> (kombu/messaging.py:590)
receive (kombu/messaging.py:590)
_receive_callback (kombu/messaging.py:624)
_callback (kombu/transport/virtual/base.py:633)
_deliver (kombu/transport/virtual/base.py:983)
_get_and_deliver (kombu/transport/virtual/base.py:406)
get (kombu/utils/scheduling.py:56)
_poll (kombu/transport/virtual/base.py:402)
drain_events (kombu/transport/virtual/base.py:745)
_drain_channel (kombu/transport/virtual/base.py:1001)
get (kombu/utils/scheduling.py:56)
drain_events (kombu/transport/virtual/base.py:963)
drain_events (kombu/connection.py:323)
consume (kombu/mixins.py:197)
run (kombu/mixins.py:175)
run (common/background.py:174)
_bootstrap_inner (threading.py:916)
_bootstrap (threading.py:884)
The stack is reversed: you can see the thread is sitting on a network socket, waiting for the response to an Elasticsearch delete command.
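When attaching a dump to a support ticket, it can help to redirect the output to a file and, if your py-spy version supports it, include the local variables of each frame with --locals; the output path is just an example:
py-spy dump --pid 1529 --locals > /tmp/topicd-stacktrace.txt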
Record#
Great for these situations:
Investigate slow performance
Understanding the bottlenecks during a load test
Performance regression testing: New code vs old code
Generally pinpointing where time is spent. Note: Time is not equal to CPU load.
py-spy record -p 1529 -o /tmp/flamegraph.svg
Sampling process 100 times a second. Press Control-C to exit.
...
Stopped sampling because Control-C pressed
Wrote flamegraph data to '/tmp/flamegraph.svg'. Samples: 6864 Errors: 0
You can use -f speedscope for a more interactive experience and analyze the output file on the Speedscope site.
Output:
Full view: the flamegraph shows the entire recording. Zoomed in, we can see that a lot of time is spent in the delete item function of Squirro, which in turn is waiting on Elasticsearch due to many round trips.
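To use the Speedscope format mentioned above, a command along these lines can be used (duration, PID and output path are placeholders):
py-spy record --pid 1529 --duration 60 -f speedscope -o /tmp/profile.speedscope.json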
Tips and Caveats#
You need root privileges
Start recording during a time of action. If you record with py-spy and do nothing on the server, the flamegraph will mostly show flup/kombu doing a whole lot of nothing.
For record, consider using the --function switch; it shows the function name instead of the line number, which can be easier to read if you’re not familiar with the codebase.
Don’t forget the --subprocesses flag if you have multi-process services, e.g. topicd, datasourced, machinelearningd, or the data loader with the parallel flag.
Use the --native flag if C libraries are at play (a combined example follows this list).
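Putting these tips together, a recording of a multi-process service with native stacks and function-level aggregation could look roughly like this (service name, duration and output path are only examples):
sudo ~/pyspy/bin/py-spy record --pid "$(pgrep -o -f datasourced)" --subprocesses --native --function --duration 120 -o /tmp/datasourced-flamegraph.svg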