API for Caching and Custom State Management#

When building data loader plugins, it can be useful to store key-value pairs, either long term for state management or temporarily for caching. For this purpose, two stores are made available to the DataSource class.


This API provides you with:

  • Ability to temporarily store key-value pairs in the scope of a data loader plugin, e.g. for de-duplication, throttling, or caching.

  • Ability to “permanently” store key-value pairs, e.g. for custom state management used by data loader plugins.

  • Redis and file-system backed implementations.

  • Sane defaults for the command line (a file-backed implementation rather than the Redis-based one) for client machines where Redis might not be available.

Using The Stores#

The two stores are made available to the data loader plugin as key_value_cache and key_value_store.

Example of using the stores (shown for key_value_cache; key_value_store works analogously):

# Retrieving a value
my_key = self.key_value_cache.get('my_key')

# Setting the value
self.key_value_cache['my_key'] = 'hello world'

Storing The State (key_value_store)#

key_value_store can be thought of as the permanent store of information. It should be used for application critical data.

The most common use case of key_value_store is preserving the state of the last run of the data loader. This is especially critical in long-running data loading jobs that index thousands of items. Losing the state of what was last loaded can be costly, both in time spent reloading and in charges incurred for extra API calls (if you have a paid subscription with the data provider).

The data written into key_value_store is kept until the user explicitly clears the data. Resetting a data source (using the “Reset” option in the user interface or --reset on the command line) will achieve this.
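
As a minimal sketch of this use case, the loader below remembers the ID of the last item it processed and skips older items on the next run. The class `IncrementalLoader`, the method `fetch_items`, and the key `last_seen_id` are hypothetical names chosen for illustration, and a plain dict stands in for key_value_store so that the snippet runs outside Squirro. It relies only on `get` with a default and on item assignment, as shown in the example above.

```python
class IncrementalLoader:
    """Hypothetical loader used for illustration; not part of the Squirro API."""

    def __init__(self, key_value_store):
        # In a real plugin the store is provided on the data source instance.
        self.key_value_store = key_value_store

    def fetch_items(self):
        # Stand-in for a slow or paid upstream API (assumption).
        return [{"id": i, "title": f"Item {i}"} for i in range(1, 6)]

    def load_new_items(self):
        # Fall back to 0 when no state from a previous run exists.
        last_seen = self.key_value_store.get("last_seen_id", 0)
        new_items = [item for item in self.fetch_items() if item["id"] > last_seen]
        if new_items:
            # Persist the progress so that the next run can resume from here.
            self.key_value_store["last_seen_id"] = new_items[-1]["id"]
        return new_items


store = {}  # dict stand-in for self.key_value_store
loader = IncrementalLoader(store)
first_run = loader.load_new_items()   # all five items are new
second_run = loader.load_new_items()  # nothing new since the first run
```

Resetting the data source would clear the stored ID, so the following run would load everything again.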

Caching Responses (key_value_cache)#

key_value_cache, on the other hand, is used for cases where preserving some data is useful and saves time, but is not critical.

A common example is a loader that repeatedly asks a server for metadata that is expected to change only every few days on average. Instead of fetching this metadata with every call, you can cache it and reuse the cached response.

The data is kept for the duration of the TTL (time to live), which defaults to a week. However, the data may be evicted earlier: once a specific memory threshold is reached, Squirro applies an LRU (least recently used) eviction algorithm to this data.
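
The metadata scenario could be sketched as follows. `MetadataClient`, `_fetch_metadata`, and the `metadata:` key prefix are hypothetical, and a plain dict again stands in for key_value_cache. Because entries can expire or be evicted at any time, the code treats a cache miss as the normal case and falls back to the server.

```python
class MetadataClient:
    """Hypothetical helper used for illustration; not part of the Squirro API."""

    def __init__(self, key_value_cache):
        self.key_value_cache = key_value_cache
        self.remote_calls = 0  # counts how often the server is actually contacted

    def _fetch_metadata(self, source_id):
        # Stand-in for the real request to the metadata server (assumption).
        self.remote_calls += 1
        return {"source": source_id, "fields": ["title", "body"]}

    def get_metadata(self, source_id):
        cache_key = f"metadata:{source_id}"
        cached = self.key_value_cache.get(cache_key)
        if cached is not None:
            return cached
        # Cache miss: the entry was never stored, has expired, or was evicted.
        metadata = self._fetch_metadata(source_id)
        self.key_value_cache[cache_key] = metadata
        return metadata


cache = {}  # dict stand-in for self.key_value_cache
client = MetadataClient(cache)
first = client.get_metadata("crm")   # contacts the server
second = client.get_metadata("crm")  # served from the cache
```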

Store Methods#

The following methods are available on both stores:

  • get: Return the value for the given key or, if the key does not exist, the provided default.

  • put: Store the given value for the key.

  • clear: Remove all stored key/value pairs.

  • del: Remove the given key.

Backend Configuration#

On the Squirro server, when using the data loader user interface, both stores are backed by Redis.

If you are running the Squirro data loader in command line mode, you can use the following options of the squirro_data_load tool to control the implementation of the key-value stores. The implementation backend (store_backend) defaults to filesystem in command line mode.

Key-value store options:

--store-backend {filesystem,redis}
--redis-key-value-store-host REDIS_STORE_SERVER
--redis-key-value-store-port REDIS_STORE_PORT
--redis-key-value-store-password REDIS_STORE_PASSWORD
--redis-key-value-store-db REDIS_STORE_DATABASE

Key-value cache options:

--cache-backend {filesystem,redis}
--redis-key-value-cache-host REDIS_CACHE_SERVER
--redis-key-value-cache-port REDIS_CACHE_PORT
--redis-key-value-cache-password REDIS_CACHE_PASSWORD
--redis-key-value-cache-db REDIS_CACHE_DATABASE
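
If the data loader is launched from a script, the backend flags listed above can be assembled programmatically. The following is only a sketch: the helper `backend_args` and the host name `redis.example.com` are made up for illustration, and the other arguments that squirro_data_load needs for an actual load are omitted.

```python
import shlex


def backend_args(kind, host, port, db):
    """Build the Redis backend flags for one store; kind is "store" or "cache"."""
    return [
        f"--{kind}-backend", "redis",
        f"--redis-key-value-{kind}-host", host,
        f"--redis-key-value-{kind}-port", str(port),
        f"--redis-key-value-{kind}-db", str(db),
    ]


# Hypothetical host; ports and database numbers are example values only.
args = backend_args("store", "redis.example.com", 6379, 6)
args += backend_args("cache", "redis.example.com", 6380, 6)

# Only the backend-related part of the command line is shown here.
command = shlex.join(["squirro_data_load", *args])
print(command)
```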

The same options can also be set in the dataloader.ini file in your ~/.squirro/ folder, as shown below.

# Possible values: `filesystem`, `redis`
store_backend = filesystem

# Only applicable on *nix systems. On Windows, the roaming data directory of
# the current user is used
store_directory = ~/.squirro/dataloader_key_value_store

# Redis connection info for the Redis instance backing the persistent key value
# store
redis_key_value_store_host = localhost
redis_key_value_store_port = 6379
redis_key_value_store_password =
redis_key_value_store_db = 6

# Possible values: `filesystem`, `redis`
cache_backend = filesystem

# Only applicable on *nix systems. On Windows, the roaming data directory of
# the current user is used
cache_directory = ~/.squirro/dataloader_key_value_cache

# Redis connection info for the redis instance backing the key value cache
redis_key_value_cache_host = localhost
redis_key_value_cache_port = 6380
redis_key_value_cache_password =
redis_key_value_cache_db = 6

# TTL for keys stored in the cache
expiry_seconds = 60