NlpServiceSpacy

class squirro.lib.nlp.steps.external.NlpServiceSpacy(config)

Bases: squirro.lib.nlp.steps.batched_step.BatchedStep

Step that annotates documents by calling an external NLP Service endpoint backed by spaCy.

It sends a batch of Document objects to spaCy as a request of the shape {"docs": LIST_OF_DOCS, "fields": LIST_OF_FIELDS} and returns the annotated batch of Document objects.
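As a rough illustration of that request shape, the sketch below builds the JSON body from a batch. It assumes documents are plain dicts rather than squirro Document objects and that LIST_OF_FIELDS holds the input fields (the keys) of field_mapping; build_payload is a hypothetical helper, and the serialization the step actually performs may differ.

```python
import json
from typing import Any, Dict, List


def build_payload(docs: List[Dict[str, Any]], field_mapping: Dict[str, str]) -> str:
    """Hypothetical helper: serialize a batch as {"docs": [...], "fields": [...]}."""
    # Assumption: the fields to annotate are the input side of field_mapping.
    fields = list(field_mapping)
    return json.dumps({"docs": docs, "fields": fields})


payload = build_payload(
    [{"id": "1", "title": "Hello", "body": "Hello world"}],
    {"body": "nlp__body", "title": "nlp__title"},
)
```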

Parameters
  • type (str, "external") – remote_spacy

  • step (str, "nlp_service_spacy") – remote_spacy

  • field_mapping (dict) – Mapping of input fields to output fields.

  • endpoint (str, None) – Custom NLP Service endpoint to be invoked. If not defined, the default endpoint loaded from the ini file is used.

  • max_concurrent (int, 10) – Maximum number of concurrent requests.

  • language_processor_mapping (dict, {}) – Defines which spaCy processor is used for each detected language (fields.language). Note: this option bypasses the pipeline__field setting.

  • pipeline__field (str, None) – Name of the field from which the spaCy processor to invoke is read, allowing programmatic selection per document. If the field is empty, pipeline__default is used.

  • pipeline__default (str, None) – Default pipeline, used if no pipeline__field is specified or its value is null.

  • disable_pipes__field (str, None) – Name of the field from which the list of pipes to disable is read, allowing programmatic selection per document. If the field is empty, disable_pipes__default is used.

  • disable_pipes__default (list, []) – Pipes disabled by default, used if no disable_pipes__field is specified or its value is null.
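The selection rules in the parameter list above amount to a precedence order, sketched below for a single document. This is a minimal sketch under assumptions: a document's fields are a plain dict, and resolve_processor is a hypothetical helper, not part of squirro.lib.nlp. The list does not say what happens when the detected language has no entry in language_processor_mapping; the sketch assumes a fallback to pipeline__default.

```python
from typing import Any, Dict, List, Optional, Tuple


def resolve_processor(
    fields: Dict[str, Any],
    language_processor_mapping: Dict[str, str],
    pipeline_field: Optional[str] = None,
    pipeline_default: Optional[str] = None,
    disable_pipes_field: Optional[str] = None,
    disable_pipes_default: Optional[List[str]] = None,
) -> Tuple[Optional[str], List[str]]:
    """Hypothetical helper: pick the spaCy processor and disabled pipes for one document."""
    if language_processor_mapping:
        # A language mapping bypasses pipeline__field entirely.
        pipeline = language_processor_mapping.get(fields.get("language"))
    else:
        # Read the processor name from the configured field, if one is set.
        pipeline = fields.get(pipeline_field) if pipeline_field else None
    if not pipeline:
        # Empty or null: fall back to pipeline__default
        # (for languages missing from the mapping, this fallback is an assumption).
        pipeline = pipeline_default

    disabled = fields.get(disable_pipes_field) if disable_pipes_field else None
    if not disabled:
        # Empty or null: fall back to disable_pipes__default.
        disabled = disable_pipes_default or []
    return pipeline, list(disabled)


example = resolve_processor(
    {"language": "de"},
    {"de": "de:fast", "en": "en:fast"},
    pipeline_default="en:fast",
    disable_pipes_default=["ner"],
)
```

With the mapping set, a German document resolves to ("de:fast", ["ner"]) regardless of any pipeline__field value.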

Example

{
    "name": "nlp_service_spacy",
    "step": "nlp_service_spacy",
    "type": "nlp_service_spacy",
    "field_mapping": {"body": "nlp__body", "title": "nlp__title"},
    "language_processor_mapping": {
        "de": "de:fast",
        "en": "en:fast"
    },
    "pipeline__default": "en:fast",
    "disable_pipes__default": ["ner"]
}
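The example above does not set max_concurrent, so the default limit of 10 concurrent requests applies. How the step enforces this limit internally is not documented here; the sketch below shows one common way to bound concurrency, using asyncio.Semaphore. The names send_all and fake_send are hypothetical, and fake_send only stands in for the real HTTP call to the NLP Service.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def send_all(
    batches: List[Any],
    send: Callable[[Any], Awaitable[Any]],
    max_concurrent: int = 10,
) -> List[Any]:
    """Send every batch while keeping at most max_concurrent requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(batch: Any) -> Any:
        # Waits here whenever max_concurrent requests are already running.
        async with semaphore:
            return await send(batch)

    # gather returns results in the same order as the input batches.
    return list(await asyncio.gather(*(guarded(b) for b in batches)))


# Demo with a fake request function that records how many calls overlap.
in_flight = 0
peak = 0


async def fake_send(batch: int) -> int:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # stands in for network latency
    in_flight -= 1
    return batch * 2


results = asyncio.run(send_all(list(range(25)), fake_send, max_concurrent=10))
```

Using a semaphore keeps all requests scheduled up front while capping how many are actually running, so results stay in input order without manual chunking.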

Methods Summary

process_batch(batch)

Process a batch of documents.

Methods Documentation

process_batch(batch)

Process a batch of documents. If not overridden, this defaults to calling self.process_doc on each document in the batch.

Parameters

batch (list(Document)) – List of documents

Returns

List of processed documents

Return type

list(Document)
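The fallback described for process_batch can be pictured as follows. This is a minimal sketch, not the real BatchedStep: BatchedStepSketch and UppercaseTitle are hypothetical classes, and documents are modelled as plain dicts instead of squirro Document objects.

```python
from typing import Any, Dict, List

# Assumption: documents are modelled as plain dicts instead of squirro Document objects.
Doc = Dict[str, Any]


class BatchedStepSketch:
    """Hypothetical stand-in for BatchedStep; not the real squirro class."""

    def process_doc(self, doc: Doc) -> Doc:
        raise NotImplementedError

    def process_batch(self, batch: List[Doc]) -> List[Doc]:
        # Fallback: apply process_doc to each document in the batch, in order.
        return [self.process_doc(doc) for doc in batch]


class UppercaseTitle(BatchedStepSketch):
    """Toy step that only defines process_doc and inherits the batch fallback."""

    def process_doc(self, doc: Doc) -> Doc:
        return {**doc, "title": str(doc.get("title", "")).upper()}


processed = UppercaseTitle().process_batch([{"title": "first"}, {"title": "second"}])
```

A step that talks to a remote service, such as this one, benefits from working on the whole batch at once, since one request can carry many documents instead of one request per document.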