# CacheDocument

class squirro.lib.nlp.utils.cache.CacheDocument(func)

Bases: object

Decorator for caching documents in NLP steps.

The decorator provides a caching mechanism for NLP steps. It can be applied to a method that takes a single document as a parameter and returns the processed document. After such a method returns, the result is cached. On subsequent calls, the cached document is returned without reprocessing unless one of three factors has changed: the step name, the step configuration, or the input document.

Example:

```python
from squirro.lib.nlp.steps.base import Step
from squirro.lib.nlp.utils.cache.services import CacheDocument


class DummyStep(Step):
    '''
    Dummy step

    Parameters:
        type (str): dummy
        input_fields (list,["body"]): input_field
        output_fields (list,["nlp"]): output_field
    '''

    def __init__(self, config) -> None:
        super().__init__(config)

    @CacheDocument
    def process_doc(self, doc):
        return doc
```


Steps do not use the document cache by default. To enable it, the option `"cache_document": true` must be added to the step configuration. A custom cache expiration time can also be set in the configuration:

```json
{
    "step": "normalizer",
    "type": "dummy",
    "cache_document": true,
    "document_cache_expiration": 100,
    "input_fields": [
        "input_field"
    ],
    "output_fields": [
        "output_field"
    ]
}
```
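Since the cache is invalidated whenever the step name, the step configuration, or the input document changes, a natural implementation is a composite cache key over all three. The helper below is a hypothetical sketch of such a key (the actual key format used by `CacheDocument` is not documented here); only the `document_cache_` prefix is taken from the class's documented `CACHE_PREFIX` attribute:

```python
import hashlib
import json


def document_cache_key(step_name, step_config, doc_body):
    """Hypothetical composite cache key: it changes whenever the step
    name, the step configuration, or the input document changes."""
    payload = json.dumps(
        {"step": step_name, "config": step_config, "doc": doc_body},
        sort_keys=True,  # stable serialization so equal configs hash equally
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return "document_cache_" + digest
```

With a key like this, re-running the same step on the same document is a cache hit, while editing the step config or the document body produces a different key and forces reprocessing.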


A cache client that implements the expected interface can be passed to the step via its pipeline. The decorator first looks for the client in the pipeline.cache_client attribute. If no pipeline or cache client exists, it falls back to a default in-memory cache with expiration.

If the input document is not fully processed (its fully_processed attribute is False), it is not cached.
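Put together, the decorator's call flow can be approximated as follows. This is an illustrative sketch, not the actual implementation: the `MemoryCache` class and its `get`/`set` interface are assumptions, and only the behavior described above (opt-in via config, pipeline client lookup, in-memory fallback with expiration, and the fully_processed guard) is taken from this documentation.

```python
import time


class MemoryCache:
    """Assumed stand-in for the default in-memory cache with expiration."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, expiration=None):
        # Store the value together with an absolute expiry deadline, if any.
        deadline = time.monotonic() + expiration if expiration else None
        self._data[key] = (value, deadline)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if deadline is not None and time.monotonic() > deadline:
            del self._data[key]  # expired entries are dropped on access
            return None
        return value


_DEFAULT_CACHE = MemoryCache()  # shared fallback so entries survive across calls


def cached_process_doc(step, doc, process, cache_key):
    """Hypothetical sketch of the decorator's flow around process_doc."""
    if not step.config.get("cache_document"):
        return process(doc)  # caching is opt-in per step

    # Prefer a cache client supplied via the pipeline; otherwise fall back
    # to the default in-memory cache with expiration.
    cache = getattr(getattr(step, "pipeline", None), "cache_client", None)
    if cache is None:
        cache = _DEFAULT_CACHE

    cached = cache.get(cache_key)
    if cached is not None:
        return cached  # cache hit: return without reprocessing

    result = process(doc)
    # Documents that are not fully processed are never cached.
    if getattr(doc, "fully_processed", True):
        cache.set(cache_key, result,
                  expiration=step.config.get("document_cache_expiration"))
    return result
```

Under these assumptions, a second call with an unchanged key is served from the cache and the wrapped method is not invoked again.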

Attributes Summary

CACHE_PREFIX

Methods Summary

__call__(self_step, doc): Call self as a function.

Attributes Documentation

CACHE_PREFIX = 'document_cache_'

Methods Documentation

__call__(self_step, doc)

Call self as a function.

Return type: Document