CacheDocument

class CacheDocument(func)

Bases: object

Decorator for caching documents in NLP steps.

The decorator provides a caching mechanism for NLP steps. It can be applied to a method that takes a single document as a parameter and returns the processed document. After such a method returns, its result is cached. On later calls, the cached document is returned without reprocessing unless one of three factors has changed: the step name, the step config, or the input document.
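One way to picture why a change in any of the three factors causes reprocessing is a cache key derived from all of them. The following is a minimal sketch under that assumption, not the library's actual key scheme: `make_cache_key` and its parameters are hypothetical names, and only the `document_cache_` prefix comes from this page.

```python
import hashlib
import json

CACHE_PREFIX = "document_cache_"  # value documented for CacheDocument.CACHE_PREFIX


def make_cache_key(step_name, step_config, doc_fields):
    """Hypothetical helper: build a key from the three documented factors."""
    # Serialize deterministically so equal inputs always give the same key.
    payload = json.dumps(
        {"step": step_name, "config": step_config, "doc": doc_fields},
        sort_keys=True,
        default=str,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return CACHE_PREFIX + digest
```

With a key like this, changing the step name, any config value, or any document field yields a different key, so the old entry is simply never looked up again.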

Example:

from squirro.lib.nlp.steps.base import Step
from squirro.lib.nlp.utils.cache.services import CacheDocument


class DummyStep(Step):
    '''
    Dummy step

    Parameters:
        type (str): `dummy`
        input_fields (list,["body"]): `input_field`
        output_fields (list,["nlp"]): `output_field`
    '''

    def __init__(self, config) -> None:
        super().__init__(config)

    @CacheDocument
    def process_doc(self, doc):
        return doc

Steps do not use the document cache by default. To enable it, add the option "cache_document": true to the step configuration. A cache expiration time can also be set in the configuration with the document_cache_expiration option.

{
    "step": "normalizer",
    "type": "dummy",
    "cache_document": true,
    "document_cache_expiration": 100,
    "input_fields": [
        "input_field"
    ],
    "output_fields": [
        "output_field"
    ]
}
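The opt-in behaviour can be sketched as a small lookup on the step configuration. This is an illustration only: `cache_settings` is a hypothetical helper, and the fallback used when no expiration is configured is an assumption, since the actual default is not stated here.

```python
def cache_settings(config):
    """Hypothetical helper: read the two documented cache options."""
    # Caching is off unless "cache_document" is explicitly set to true.
    enabled = bool(config.get("cache_document", False))
    # Assumption: None stands for "no explicit expiration configured".
    expiration = config.get("document_cache_expiration") if enabled else None
    return enabled, expiration


step_config = {
    "step": "normalizer",
    "type": "dummy",
    "cache_document": True,
    "document_cache_expiration": 100,
    "input_fields": ["input_field"],
    "output_fields": ["output_field"],
}

enabled, expiration = cache_settings(step_config)
```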

A cache client that follows the expected interface can be passed to the step through the pipeline. The decorator first looks for the client in the pipeline.cache_client attribute. If the pipeline or the cache client does not exist, it falls back to the default in-memory cache with expiration.

If the input document is not fully processed (attribute fully_processed is False), it will not be cached.
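The effect of the fully_processed check can be illustrated with a simplified, function-based decorator. This is a sketch under stated assumptions, not the CacheDocument implementation: `cache_if_fully_processed` is a hypothetical name, a plain dict stands in for the cache client, the document `id` attribute used as key is invented for the example, and documents lacking a fully_processed attribute are assumed to count as fully processed.

```python
import functools
from types import SimpleNamespace


def cache_if_fully_processed(func):
    """Hypothetical simplified analogue of the caching decorator."""
    store = {}  # plain dict standing in for a cache client

    @functools.wraps(func)
    def wrapper(self_step, doc):
        # Documents that are not fully processed bypass the cache entirely.
        if not getattr(doc, "fully_processed", True):
            return func(self_step, doc)
        # Assumption: `doc.id` identifies the input document.
        if doc.id in store:
            return store[doc.id]
        result = func(self_step, doc)
        store[doc.id] = result
        return result

    return wrapper


class CountingStep:
    """Toy step that counts how often real processing happens."""

    def __init__(self):
        self.calls = 0

    @cache_if_fully_processed
    def process_doc(self, doc):
        self.calls += 1
        return doc


step = CountingStep()
complete = SimpleNamespace(id="a", fully_processed=True)
partial = SimpleNamespace(id="b", fully_processed=False)

step.process_doc(complete)
step.process_doc(complete)  # second call is served from the cache
step.process_doc(partial)
step.process_doc(partial)   # processed again, since it was never cached
```

In this sketch the fully processed document is processed once, while the partially processed one is processed on every call.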

Attributes Summary

CACHE_PREFIX

Methods Summary

__call__(self_step, doc)

Call self as a function.

Attributes Documentation

CACHE_PREFIX = 'document_cache_'

Methods Documentation

__call__(self_step, doc)

Call self as a function.

Return type: Document