RemoteSpacy

class RemoteSpacy(config)

Bases: BatchedStep

Step that uses an external API endpoint.

It sends a batch of Document objects to spaCy in the shape {"docs": LIST_OF_DOCS, "fields": LIST_OF_FIELDS} and returns the annotated batch of Document objects.
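The request shape described above can be sketched in Python as follows. This is only an illustration: build_payload is a hypothetical helper, and the documents are modelled as plain dicts because the actual serialization of Document is not specified here.

```python
import json


def build_payload(docs, fields):
    """Wrap documents and field names in the {"docs": ..., "fields": ...} shape."""
    return {"docs": list(docs), "fields": list(fields)}


# Hypothetical documents, modelled as plain dicts for illustration only.
docs = [
    {"id": "1", "title": "Hello", "body": "spaCy is an NLP library."},
    {"id": "2", "title": "World", "body": "It annotates text."},
]

payload = build_payload(docs, ["body", "title"])

# The JSON string that a client could send as the request body.
request_body = json.dumps(payload)
```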

Parameters
  • step (str, "external") – external

  • type (str, "remote_spacy") – remote_spacy

  • field_mapping (dict) – Mapping of input field to output field

  • max_concurrent (int, 10) – Maximum concurrent requests

  • language_service_mapping (dict, {}) – Custom service URL to use for each detected language. If the language is not found in the mapping, default_worker is used and the service URL is read from the configuration service.

  • default_worker (str, "fast") – Worker used to read the service URL from the configuration service when the detected language is not found in language_service_mapping.

  • disable_pipes__field (str, None) – Name of the field from which the disabled pipelines are read (programmatic selection). Falls back to disable_pipes__default if empty.

  • disable_pipes__default (list, []) – Pipes that are disabled by default, used when disable_pipes__field is not specified or its value is null.
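The two fallback rules in the parameter list (service URL by language, and disabled pipes by field) could look roughly like the sketch below. The function names, the lookup_worker_url callable standing in for the configuration service, and the worker URL are all hypothetical. The sketch also assumes that a missing, null, or empty field value falls back to the default list.

```python
def resolve_service(language, language_service_mapping, default_worker, lookup_worker_url):
    """Return the service URL for a language, or the default worker's URL."""
    url = language_service_mapping.get(language)
    if url is not None:
        return url
    # lookup_worker_url is a hypothetical stand-in for the configuration service.
    return lookup_worker_url(default_worker)


def resolve_disabled_pipes(doc, disable_pipes_field, disable_pipes_default):
    """Return the pipes to disable for one document (modelled as a dict)."""
    if disable_pipes_field:
        value = doc.get(disable_pipes_field)
        if value:  # assumption: missing, null, or empty falls back to the default
            return list(value)
    return list(disable_pipes_default)


# Hypothetical data for illustration.
mapping = {"en": "http://localhost:8000"}
worker_urls = {"fast": "http://spacy-fast.internal:8000"}

en_url = resolve_service("en", mapping, "fast", worker_urls.get)
de_url = resolve_service("de", mapping, "fast", worker_urls.get)
pipes = resolve_disabled_pipes({"skip_pipes": ["parser"]}, "skip_pipes", ["ner"])
```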

Example

{
    "step": "external",
    "type": "remote_spacy",
    "field_mapping": {"body": "nlp__body", "title": "nlp__title"},
    "language_service_mapping": {
        "en": "http://localhost:8000"
    },
    "default_worker": "fast",
    "disable_pipes__default": ["ner"]
}
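To illustrate how field_mapping relates input fields to output fields, the sketch below parses a configuration like the one above and copies annotations into the mapped output fields. apply_field_mapping and the annotation values are hypothetical; the real format of the annotations returned by spaCy is not described here.

```python
import json

config = json.loads("""
{
    "step": "external",
    "type": "remote_spacy",
    "field_mapping": {"body": "nlp__body", "title": "nlp__title"},
    "language_service_mapping": {"en": "http://localhost:8000"},
    "default_worker": "fast",
    "disable_pipes__default": ["ner"]
}
""")


def apply_field_mapping(doc, annotations, field_mapping):
    """Store the annotation for each input field under its mapped output field."""
    result = dict(doc)
    for input_field, output_field in field_mapping.items():
        if input_field in annotations:
            result[output_field] = annotations[input_field]
    return result


# Hypothetical document and placeholder annotations.
doc = {"title": "Hello", "body": "Hello world"}
annotations = {"body": {"tokens": ["Hello", "world"]}}

annotated = apply_field_mapping(doc, annotations, config["field_mapping"])
```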

Methods Summary

get_client(language, worker)

Return type

SpacyClient

process_batch(batch)

Process a batch of documents.

Methods Documentation

get_client(language, worker)
Return type

SpacyClient

process_batch(batch)

Process a batch of documents. If not overridden, this defaults to calling self.process_doc on each document in the batch.

Parameters

batch (list(Document)) – List of documents

Returns

List of processed documents

Return type

list(Document)
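The default described for process_batch (fall back to process_doc for each document) can be sketched with a minimal stand-in class. BatchedStepSketch and both subclasses are hypothetical and are not the library's BatchedStep; documents are again modelled as plain dicts.

```python
class BatchedStepSketch:
    """Hypothetical stand-in for a batched step base class."""

    def process_doc(self, doc):
        raise NotImplementedError

    def process_batch(self, batch):
        # Default behaviour: handle each document individually.
        return [self.process_doc(doc) for doc in batch]


class UppercaseTitle(BatchedStepSketch):
    """Defines only process_doc, so the default process_batch is used."""

    def process_doc(self, doc):
        return {**doc, "title": doc["title"].upper()}


class TagBatchSize(BatchedStepSketch):
    """Overrides process_batch to handle the whole batch in one pass."""

    def process_batch(self, batch):
        size = len(batch)
        return [{**doc, "batch_size": size} for doc in batch]


batch = [{"title": "hello"}, {"title": "world"}]
per_doc_result = UppercaseTitle().process_batch(batch)
whole_batch_result = TagBatchSize().process_batch(batch)
```

A step that talks to a remote service would typically take the second route, since handling the batch as a whole allows documents to be grouped into fewer requests.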