class squirro.lib.nlp.utils.cache.CacheDocument(func)

Bases: object

Decorator for caching documents in NLP steps.

The decorator provides a cache mechanism for NLP steps. It can be applied to a method that takes a single document as its parameter and returns the processed document. When such a method returns, its result is cached. On later calls, the cached document is returned without reprocessing unless one of three factors has changed: the step name, the step configuration, or the input document.
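To make the invalidation rule concrete, here is a minimal sketch of a class-based caching decorator. It assumes documents are JSON-serializable dicts and that a step exposes `name` and `config` attributes. `CacheDocumentSketch` and `CountingStep` are hypothetical names: this is not Squirro's implementation, and it stores results in a plain in-process dict instead of a cache client.

```python
import functools
import json


class CacheDocumentSketch:
    """Hypothetical caching decorator keyed by step name, config and document."""

    def __init__(self, func):
        functools.update_wrapper(self, func)
        self.func = func
        self._cache = {}  # in-process stand-in for a real cache client

    def __get__(self, instance, owner):
        # Bind to the step instance so the decorator works on methods.
        if instance is None:
            return self
        return functools.partial(self.__call__, instance)

    def __call__(self, step, doc):
        # A change in any of the three factors produces a different key.
        key = (
            step.name,
            json.dumps(step.config, sort_keys=True),
            json.dumps(doc, sort_keys=True),
        )
        if key not in self._cache:
            self._cache[key] = self.func(step, doc)
        return self._cache[key]


class CountingStep:
    """Toy step that counts how often its body really runs."""

    def __init__(self, name, config):
        self.name = name
        self.config = config
        self.calls = 0

    @CacheDocumentSketch
    def process_doc(self, doc):
        self.calls += 1
        return {**doc, "nlp": doc["body"].lower()}
```

Because the key combines the step name, the serialized config, and the serialized document, changing any one of them causes a cache miss, while repeated identical calls return the stored object without running the method body again.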


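from squirro.lib.nlp.steps.base import Step
from squirro.lib.nlp.utils.cache import CacheDocument


class DummyStep(Step):
    """Dummy step

    Parameters:
        type (str): `dummy`
        input_fields (list,["body"]): `input_field`
        output_fields (list,["nlp"]): `output_field`
    """

    def __init__(self, config) -> None:
        super().__init__(config)

    # Cached: identical step name, config and document skip reprocessing.
    @CacheDocument
    def process_doc(self, doc):
        return doc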

Steps do not use the document cache by default. To enable it, add the option "cache_document": true to the step configuration. An expiration time for cached documents can also be set with the "document_cache_expiration" option.

{
    "step": "normalizer",
    "type": "dummy",
    "cache_document": true,
    "document_cache_expiration": 100,
    "input_fields": ["body"],
    "output_fields": ["nlp"]
}
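A sketch of how a step might read these two options, assuming the configuration arrives as a parsed JSON dict. `document_cache_settings` is a hypothetical helper; the unit of "document_cache_expiration" is not stated in this reference, so the sketch passes the number through unchanged.

```python
import json
from typing import Optional, Tuple


def document_cache_settings(config: dict) -> Tuple[bool, Optional[int]]:
    """Hypothetical helper returning (enabled, expiration) from a step config.

    Caching stays off unless "cache_document" is true; expiration is None
    when "document_cache_expiration" is not set.
    """
    enabled = bool(config.get("cache_document", False))
    expiration = config.get("document_cache_expiration")
    return enabled, expiration


# JSON `true` becomes Python `True` after parsing.
step_config = json.loads("""
{
    "step": "normalizer",
    "type": "dummy",
    "cache_document": true,
    "document_cache_expiration": 100
}
""")
```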

A cache client that implements the expected interface can be passed to the step through the pipeline. The decorator first looks for the cache client in the pipeline.cache_client attribute. If the pipeline or its cache client does not exist, the decorator falls back to the default in-memory cache with expiration.

If the input document is not fully processed (its fully_processed attribute is False), it is not cached.
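A minimal sketch of that guard, assuming a document object with a boolean `fully_processed` attribute. `Doc` and `maybe_cache` are hypothetical names, and the cache here is a plain dict.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Doc:
    """Hypothetical stand-in for an NLP document."""
    doc_id: str
    fields: Dict[str, Any] = field(default_factory=dict)
    fully_processed: bool = True


def maybe_cache(cache: Dict[str, Any], key: str, input_doc: Doc, result: Doc) -> bool:
    """Store result under key unless input_doc is not fully processed."""
    if not input_doc.fully_processed:
        return False  # skip caching for partially processed input
    cache[key] = result
    return True
```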

Attributes Summary

CACHE_PREFIX

Methods Summary

__call__(self_step, doc)

Call self as a function.

Attributes Documentation

CACHE_PREFIX = 'document_cache_'
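CACHE_PREFIX presumably namespaces document-cache entries inside a shared cache client. The sketch below assumes the key is the prefix followed by a SHA-256 digest of the step name, step config, and input document; `document_cache_key` is a hypothetical helper, not part of the documented API.

```python
import hashlib
import json
from typing import Any, Dict

CACHE_PREFIX = "document_cache_"


def document_cache_key(step_name: str, step_config: Dict[str, Any], doc: Dict[str, Any]) -> str:
    """Hypothetical key: prefix + SHA-256 hex digest of the three cache factors."""
    payload = json.dumps(
        {"step": step_name, "config": step_config, "doc": doc},
        sort_keys=True,  # key order in dicts does not change the digest
        default=str,  # stringify values json cannot encode natively
    )
    return CACHE_PREFIX + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Hashing keeps keys a fixed length regardless of document size, which suits external cache backends that limit key length.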

Methods Documentation

__call__(self_step, doc)

Call self as a function.

Return type