# CacheDocument

class squirro.lib.nlp.utils.cache.CacheDocument(func)

Bases: object

Decorator for caching documents in NLP steps.

The decorator provides a caching mechanism for NLP steps. It can be applied to a method that takes a single document as a parameter and returns the processed document. After such a method returns, the result is cached. On subsequent calls, the cached document is returned without reprocessing unless one of three factors has changed: the step name, the step configuration, or the input document.

Example:

```python
from squirro.lib.nlp.steps.base import Step
from squirro.lib.nlp.utils.cache.services import CacheDocument


class DummyStep(Step):
    '''
    Dummy step

    Parameters:
        type (str): dummy
        input_fields (list,["body"]): input_field
        output_fields (list,["nlp"]): output_field
    '''

    def __init__(self, config) -> None:
        super().__init__(config)

    @CacheDocument
    def process_doc(self, doc):
        return doc
```


Steps do not use the document cache by default. To enable it, the option `"cache_document": true` must be added to the step configuration. A custom cache expiration time can also be set in the configuration:

```json
{
    "step": "normalizer",
    "type": "dummy",
    "cache_document": true,
    "document_cache_expiration": 100,
    "input_fields": [
        "input_field"
    ],
    "output_fields": [
        "output_field"
    ]
}
```
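Since the cache is invalidated whenever the step name, the step configuration, or the input document changes, a natural implementation is a composite cache key over all three. The helper below is a hypothetical sketch of such a key (the actual key format used by `CacheDocument` is not documented here); only the `document_cache_` prefix is taken from the class's documented `CACHE_PREFIX` attribute:

```python
import hashlib
import json


def document_cache_key(step_name, step_config, doc_body):
    """Hypothetical composite cache key: it changes whenever the step
    name, the step configuration, or the input document changes."""
    payload = json.dumps(
        {"step": step_name, "config": step_config, "doc": doc_body},
        sort_keys=True,  # stable serialization so equal configs hash equally
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return "document_cache_" + digest
```

With a key like this, re-running the same step on the same document is a cache hit, while editing the step config or the document body produces a different key and forces reprocessing.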


A cache client that implements the expected interface can be passed to the step via its pipeline. The decorator first looks for the client in the pipeline.cache_client attribute. If no pipeline or cache client exists, it falls back to a default in-memory cache with expiration.

If the input document is not fully processed (its fully_processed attribute is False), it is not cached.
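Put together, the decorator's call flow can be approximated as follows. This is an illustrative sketch, not the actual implementation: the `MemoryCache` class and its `get`/`set` interface are assumptions, and only the behavior described above (opt-in via config, pipeline client lookup, in-memory fallback with expiration, and the fully_processed guard) is taken from this documentation.

```python
import time


class MemoryCache:
    """Assumed stand-in for the default in-memory cache with expiration."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, expiration=None):
        # Store the value together with an absolute expiry deadline, if any.
        deadline = time.monotonic() + expiration if expiration else None
        self._data[key] = (value, deadline)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if deadline is not None and time.monotonic() > deadline:
            del self._data[key]  # expired entries are dropped on access
            return None
        return value


_DEFAULT_CACHE = MemoryCache()  # shared fallback so entries survive across calls


def cached_process_doc(step, doc, process, cache_key):
    """Hypothetical sketch of the decorator's flow around process_doc."""
    if not step.config.get("cache_document"):
        return process(doc)  # caching is opt-in per step

    # Prefer a cache client supplied via the pipeline; otherwise fall back
    # to the default in-memory cache with expiration.
    cache = getattr(getattr(step, "pipeline", None), "cache_client", None)
    if cache is None:
        cache = _DEFAULT_CACHE

    cached = cache.get(cache_key)
    if cached is not None:
        return cached  # cache hit: return without reprocessing

    result = process(doc)
    # Documents that are not fully processed are never cached.
    if getattr(doc, "fully_processed", True):
        cache.set(cache_key, result,
                  expiration=step.config.get("document_cache_expiration"))
    return result
```

Under these assumptions, a second call with an unchanged key is served from the cache and the wrapped method is not invoked again.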

Attributes Summary

CACHE_PREFIX

Methods Summary

__call__(self_step, doc): Call self as a function.

Attributes Documentation

CACHE_PREFIX = 'document_cache_'

Methods Documentation

__call__(self_step, doc)

Call self as a function.

Return type: Document