High Performance Configuration#

Within a production deployment, a Squirro installation can be further optimized for improved performance.

This page describes various ways system administrators can improve performance.

Note

There are no hard and fast rules for performance tuning. The best way to tune a system is to measure it and then make changes based on the results.

Elasticsearch Tuning#

One of the most important aspects of performance tuning is to ensure that Elasticsearch is optimized correctly.

Elasticsearch Best Practices#

Elasticsearch itself publishes documentation on best practices for performance tuning.

Note

Some of these recommendations may not apply to your setup or may need to be executed on the backend by Squirro Solutions Engineers. Contact Squirro Support if there’s a specific item you’d like help with or to discuss.

Tune for Search Speed#

Elasticsearch’s How to Tune for Search Speed guide includes the following recommendations:

Give memory to the filesystem cache.
Avoid page cache thrashing by using modest readahead values on Linux.
Use faster hardware.
Search as few fields as possible.
Consider mapping identifiers as keyword (Non-analyzed labels in Squirro).
Avoid scripts.
Force-merge read-only indices.
Warm up global ordinals and the filesystem cache.

Reference: To learn more about these recommendations, see Elasticsearch’s official How to Tune for Search Speed Guide.
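As an illustration of the force-merge recommendation, the following minimal sketch builds the request for Elasticsearch's force-merge API (`POST /<index>/_forcemerge`). The index name is hypothetical, and the sketch only constructs the request without sending it. In a Squirro deployment this kind of operation may need to be carried out by Squirro Solutions Engineers.

```python
from typing import Tuple
from urllib.parse import quote, urlencode


def force_merge_request(index: str, max_num_segments: int = 1) -> Tuple[str, str]:
    """Return the HTTP method and path for an Elasticsearch force-merge call."""
    if max_num_segments < 1:
        raise ValueError("max_num_segments must be at least 1")
    query = urlencode({"max_num_segments": max_num_segments})
    # The index name is URL-encoded so it is safe to place in the path.
    return "POST", f"/{quote(index, safe='')}/_forcemerge?{query}"


# "my-read-only-index" is a hypothetical index name used for illustration.
method, path = force_merge_request("my-read-only-index")
print(method, path)
```

Force-merging is only advisable for indices that no longer receive writes, which is why the recommendation is limited to read-only indices.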

Tune for Indexing Speed#

Elasticsearch’s How to Tune for Indexing Speed guide includes the following recommendations:

Unset or increase the refresh interval.
Disable replicas for initial loads.
Disable swapping.
Give memory to the filesystem cache.
Use auto-generated ids.
Use faster hardware.
Ensure correct indexing buffer size.
Use cross-cluster replication to prevent searching from stealing resources from indexing.
Avoid hot spotting.

Reference: To learn more about these recommendations, see Elasticsearch’s official How to Tune for Indexing Speed Guide.
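The first two recommendations can be sketched as request bodies for Elasticsearch's update index settings API (`PUT /<index>/_settings`). This is a minimal sketch, not a Squirro tool: the index name is hypothetical, the restore values assume Elasticsearch's defaults (a 1s refresh interval and one replica), and nothing is sent to a cluster.

```python
import json

# Settings applied before a large initial load: no periodic refresh, no replicas.
BULK_LOAD_SETTINGS = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval: str = "1s", replicas: int = 1) -> dict:
    """Build the settings body that re-enables refresh and replicas after the load."""
    return {"index": {"refresh_interval": refresh_interval, "number_of_replicas": replicas}}


def settings_request(index: str, body: dict):
    """Return method, path, and JSON payload for the update index settings API."""
    return "PUT", f"/{index}/_settings", json.dumps(body, sort_keys=True)


# "items" is a hypothetical index name used for illustration.
print(settings_request("items", BULK_LOAD_SETTINGS))
print(settings_request("items", restore_settings()))
```

Remember to apply the restore settings once the load finishes; otherwise new documents do not become searchable and the index has no redundancy.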

Squirro Search Settings#

Squirro offers a set of configurable options via Setup → Setting → Project Configuration that allow you to fine-tune various aspects of search and aggregation behavior to optimize the search performance.

Search Settings#

The search settings control the behavior of Elasticsearch queries executed by Squirro.

You can, for example, define the number of concurrent search requests.

Or if you don’t need information about the exact number of matched documents, you can increase the search speed by specifying the maximum number of documents to collect or the number of hits matching the query to count accurately.

Reference: topic.search.search-settings

pydantic model SearchConfig#

Fields:

max_concurrent_shard_requests (int)
profile (bool)
request_cache (bool)
request_timeout (int)
terminate_after (int)
track_total_hits (str)

field max_concurrent_shard_requests: int = 6#: Defines the number of shard requests per node that this search executes concurrently.

field profile: bool = False#: Provides detailed timing information about the execution of individual components in a search request.

field request_cache: bool = True#: If true, the caching of search results is enabled for requests where size is 0.

field request_timeout: int = 35#: Specifies the period of time to wait for a response from each shard.

field terminate_after: int = 0#: Maximum number of documents to collect for each shard. To disable early termination of query execution, set the value to 0.

field track_total_hits: str = '500000'#: Number of hits matching the query to count accurately. If true, the exact number of hits is returned at the cost of some performance. If false, the response does not include the total number of hits matching the query.
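To make the relationship between these fields and an Elasticsearch search request more concrete, here is a minimal sketch. `SearchConfigSketch`, `parse_track_total_hits`, and `to_search_params` are hypothetical names introduced for illustration; they are not Squirro's implementation. The sketch assumes that `track_total_hits` accepts the strings `true` and `false` or a number, as described above, and it leaves out `request_timeout` because its mapping to a request option is not described here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class SearchConfigSketch:
    """Illustrative stand-in using the field names and defaults listed above."""
    max_concurrent_shard_requests: int = 6
    profile: bool = False
    request_cache: bool = True
    request_timeout: int = 35
    terminate_after: int = 0
    track_total_hits: str = "500000"


def parse_track_total_hits(value: str) -> Union[bool, int]:
    """Turn 'true', 'false', or a number string into a bool or an int."""
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return int(lowered)


def to_search_params(cfg: SearchConfigSketch) -> dict:
    """Map the settings onto Elasticsearch search options of the same name."""
    params = {
        "max_concurrent_shard_requests": cfg.max_concurrent_shard_requests,
        "profile": cfg.profile,
        "request_cache": cfg.request_cache,
        "track_total_hits": parse_track_total_hits(cfg.track_total_hits),
    }
    # A value of 0 means "do not terminate early", so the option is omitted.
    if cfg.terminate_after > 0:
        params["terminate_after"] = cfg.terminate_after
    return params


print(to_search_params(SearchConfigSketch(terminate_after=10000)))
```

Lowering `track_total_hits` or setting `terminate_after` trades exact hit counts for speed, so these options fit best where the interface only needs an approximate total.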

Aggregation Settings#

The aggregation settings govern how Squirro performs Elasticsearch aggregations.

For displaying statistics about your data, you typically don’t need to aggregate all documents, only a sufficient representation of the whole index.

In this case, you can apply the sampler to the aggregations so that Elasticsearch limits the number of aggregated documents.

Reference: topic.search.agg-settings

pydantic model AggregationsConfig#

Fields:

random_sampler_probability (float)
sample_shard_size (int)

field random_sampler_probability: Annotated[float] = 1#

The probability that a document will be included in the aggregated data. Must be less than or equal to 0.5 or exactly 1.0. To disable random sampling, set the value to 0.0.

Constraints:

le = 1.0

field sample_shard_size: int = 0#: Limits how many top-scoring documents are collected in the sample processed on each shard. This option is ignored if random sampling is used. To disable aggregation sampling, set the value to 0.
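The two sampling options correspond to Elasticsearch's `random_sampler` and `sampler` aggregations, which wrap other aggregations. The sketch below shows what such a wrapper could look like. The function names and the `sampled` aggregation name are hypothetical, the validation follows the constraint as stated above (Elasticsearch may enforce its own limits), and how Squirro combines the two settings internally is not shown.

```python
def validate_probability(probability: float) -> None:
    """Accept 0.0 (disabled), values up to 0.5, or exactly 1.0."""
    if probability == 0.0 or probability == 1.0 or 0.0 < probability <= 0.5:
        return
    raise ValueError("probability must be 0.0, in (0, 0.5], or exactly 1.0")


def with_random_sampler(aggs: dict, probability: float) -> dict:
    """Wrap aggregations in a random_sampler unless sampling is disabled."""
    validate_probability(probability)
    if probability == 0.0:
        return aggs
    return {"sampled": {"random_sampler": {"probability": probability}, "aggs": aggs}}


def with_top_sampler(aggs: dict, shard_size: int) -> dict:
    """Wrap aggregations in a sampler limited to the top-scoring docs per shard."""
    if shard_size <= 0:
        return aggs
    return {"sampled": {"sampler": {"shard_size": shard_size}, "aggs": aggs}}


# "by_source" and the "source" field are hypothetical names used for illustration.
terms_agg = {"by_source": {"terms": {"field": "source"}}}
print(with_random_sampler(terms_agg, 0.1))
print(with_top_sampler(terms_agg, 200))
```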

Caching#

Various caches can be tuned to improve performance, including the following:

Authentication Cache
Sources Cache
Facets Mapping Cache
Query Cache

Authentication Cache#

The topic and provider APIs will contact the user API on every request to validate the provided authentication or refresh token and its access to the requested project.

To avoid this, the auth_cache can be enabled with the following lines in /etc/squirro/common.ini:

[caches]
auth_cache = {"type": "MemoryLRU", "max_items": 1000, "timeout": "5m"}

Caveats

Only successful authentications are cached.
Tokens that expire remain valid for up to the configured timeout (5 minutes in the example above) beyond their actual expiration.
User permission changes in projects take up to the configured timeout to take effect.
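The caveats follow directly from how a time-limited LRU cache behaves. The following minimal sketch is not Squirro's `MemoryLRU` implementation; `TimedLRUCache` and its parameters are hypothetical names that mirror the `max_items` and `timeout` settings. It shows why a cached authentication result keeps being served until its timeout elapses.

```python
import time
from collections import OrderedDict


class TimedLRUCache:
    """Least-recently-used cache whose entries expire after a fixed timeout."""

    def __init__(self, max_items=1000, timeout=300.0, clock=time.monotonic):
        self.max_items = max_items
        self.timeout = timeout  # seconds; 300 corresponds to "5m"
        self.clock = clock      # injectable so the behavior can be tested
        self._items = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: the caller must re-validate
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (self.clock() + self.timeout, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)  # evict the least recently used


cache = TimedLRUCache(max_items=1000, timeout=300.0)
cache.put("token-abc", {"user": "alice"})  # hypothetical token and payload
print(cache.get("token-abc"))
```

Until an entry expires, the cache answers without consulting the user API, so a revoked token or changed permission is only noticed after the timeout. A shorter timeout narrows that window at the cost of more lookups.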

Sources Cache#

The topic API needs to check the existing sources, facets, and other information about the project when responding to queries.

While not a big overhead, that can still add up.

To enable a cache for these lookups, the project metadata cache can be set up in /etc/squirro/topic.ini as follows:

[caches]
metadata_cache = {"type": "MemoryLRU", "timeout": "5m"}

Caveats

New sources are delayed: items for these sources are not visible for up to the configured timeout (5 minutes in the example above).
New facets do not work correctly for querying or when displaying items for up to the configured timeout.
Adding new communities is affected: items are not tagged with newly added communities for up to the configured timeout.
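Because the cache value is written as a JSON-like object inside an INI file, it can be worth checking that it parses before restarting the service. The sketch below uses Python's standard `configparser` and `json` modules on an inline copy of the snippet above. That Squirro parses the value as JSON, and that timeout suffixes other than `m` exist, are assumptions made for this illustration.

```python
import configparser
import json

SAMPLE_INI = """
[caches]
metadata_cache = {"type": "MemoryLRU", "timeout": "5m"}
"""


def read_cache_settings(ini_text: str, name: str) -> dict:
    """Parse one entry of the [caches] section as a JSON object."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return json.loads(parser["caches"][name])


def timeout_seconds(value: str) -> int:
    """Convert a value such as '5m' to seconds (the suffix set is assumed)."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(value[:-1]) * units[value[-1]]


settings = read_cache_settings(SAMPLE_INI, "metadata_cache")
print(settings["type"], timeout_seconds(settings["timeout"]))
```

A malformed value raises `json.JSONDecodeError`, which is easier to diagnose here than in a service log after a restart.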

Query Cache#

Query responses are by default cached in Redis.

When access control is applied, the cache provides little benefit yet still consumes considerable resources.

As a result, it may be faster to disable this cache in /etc/squirro/common.ini as follows:

[index]
query_results_cache_enabled = false

Leveraging Multiple CPU Cores#

Out of the box, each Squirro service uses only one CPU core.

This makes sense on smaller machines and in development environments, as Squirro consists of many small services.

However, under high load on a production server with lots of CPU cores (e.g. 8+) this can become a limiting factor.

If response times become slow and you observe Squirro Python processes at 100% CPU, it might be time to allow the service to fork multiple processes.

How to Fork Multiple Processes#

Squirro services can be configured to fork multiple processes in two ways:

Using the Gunicorn web server.
Using the flup web server and the Apache MPM Prefork framework.

While forking is possible for most Squirro services, not all services have been thoroughly tested with forking.

The following services have been tested and support forking:

provider
machinelearning

Gunicorn Web Server#

Starting with version 3.10.3, it is possible to use Gunicorn as a web server. Gunicorn can serve Squirro services based on either WSGI or ASGI.

Leveraging Gunicorn as a web server provides the ability to spawn multiple worker processes or threads for a specific service. This effectively enhances the service’s ability to handle a larger volume of requests. However, it’s important to be aware that this increased capacity comes with the trade-off of higher resource consumption.

To use Gunicorn, add the following section to the service's configuration file, e.g. /etc/squirro/topic.ini:

[server]
web_server = gunicorn

[gunicorn]
workers = 2
threads = 1

Then, restart the service.

After this change, you will see three topic service processes: one main process and two workers.

Reference: The [gunicorn] section accepts any valid settings for Gunicorn. Gunicorn also provides documentation about choosing the number of workers and the number of threads.
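Gunicorn's documentation suggests `(2 x number of cores) + 1` as a starting point for the worker count. The sketch below computes that value with an optional cap. The function name and the `cap` parameter are hypothetical. Because a Squirro server runs many services side by side, treat the result as an upper bound for a single service rather than a target, and measure before raising the value.

```python
import os


def suggested_gunicorn_workers(cores=None, cap=None):
    """Apply Gunicorn's (2 x cores) + 1 rule of thumb, optionally capped."""
    cores = cores or os.cpu_count() or 1
    workers = 2 * cores + 1
    return min(workers, cap) if cap else workers


# With no arguments, the detected core count of the current machine is used.
print(suggested_gunicorn_workers())
print(suggested_gunicorn_workers(cores=8, cap=4))
```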

Warning

For services based on WSGI, the default Gunicorn workers are synchronous. For ASGI services, Gunicorn is configured to use uvicorn workers. While Gunicorn allows using multiple threads per worker for gthread workers (i.e. synchronous workers), it is not recommended to use more than one thread per uvicorn worker (as it is an asynchronous worker, and might lead to locking). For that reason, the threads setting is ignored for ASGI services.

Warning

For services based on WSGI, serving with Gunicorn will require changing the service’s Nginx configuration to use proxy_pass instead of FastCGI. The topic and frontend services already have such a configuration under /etc/nginx/conf.d/<service-name>-proxypass.conf[.disabled] (having .disabled as a suffix ensures that Nginx does not pick up the configuration). For other services, you will need to create a similar configuration file.

Warning

Currently, whenever a Gunicorn worker starts, all of the service’s setup and background logic is executed. While this may not be the most efficient approach, it ensures that each worker maintains a consistent context. We are actively exploring ways to optimize this process without compromising the consistency of the worker contexts.

flup Web Server#

The default server used by Squirro services that are based on WSGI is flup, which provides a preforked server model conceptually similar to Apache httpd.

To fork multiple processes, add the following section to the service's configuration file, e.g. /etc/squirro/topic.ini:

[server]
fork = true
min_spare = 2
max_spare = 2

Then, restart the service.

After this change, you will see three topic service processes: one main process and two workers.

Reference: Learn more about the [server] options by reading the Apache MPM Prefork documentation.

Tip

Be judicious with increases. Try two to start, measure, then increase further if needed.

FastAPI-based Services#

Since version 3.9.5, the machinelearning service has migrated to utilize the FastAPI web framework. Squirro services that employ FastAPI are served by default by Uvicorn, an ASGI web server. Consequently, they do not support any options listed in the [server] section above (such as fork, min_spare, and max_spare). Although these options can be configured, they will have no effect.

To leverage multi-core CPUs, it is recommended to set the web_server to gunicorn and increase the number of workers, rather than using uvicorn with multiple workers, as Uvicorn currently does not restart terminated workers.

In any case, it is possible to spawn multiple Uvicorn workers by adding the following section to the service's configuration file, e.g. /etc/squirro/machinelearning.ini:

[server]
uvicorn_workers = 2

For Squirro versions 3.10.1 and earlier, it is also possible to utilize the Gunicorn web server instead of Uvicorn for FastAPI-based services. For these versions, Gunicorn is configured in a different section of the service configuration file and only supports the following options (which can be omitted, to use the default values):

[server]
web_server = gunicorn
uvicorn_workers = 2
max_requests = 1000
max_requests_jitter = 50
timeout = 60
