Base types
Introduces the fundamental types of libNLP.
Document
Document class
class squirro.lib.nlp.document.Document(doc_id, fields=None, skipped=False, copy_fields=True, fully_processed=True)
A Document is the internal representation of data inside libNLP. Documents are streamed through a Pipeline, with each Step acting on them in memory.
Example:
from squirro.lib.nlp.document import Document

document = Document(0, {"a_field": "this is a field"})
Pipeline
Pipeline class
class squirro.lib.nlp.pipeline.Pipeline(configs, path=None, cache_client=None, ml_workflow_id=None)
The Pipeline class is defined by a sequential list of Steps. It handles streaming each Document through each Step, as well as loading and saving Step configurations.
Example:
from squirro.lib.nlp.document import Document
from squirro.lib.nlp.pipeline import Pipeline

# Each step config names the step, its type, and the fields it acts on.
steps = [{
    "step": "normalizer",
    "type": "punctuation",
    "fields": ["a_field"]
}]
pipeline = Pipeline(steps, path='.')
documents = [Document(0, {"a_field": "this is a field!"})]
documents = pipeline.process(documents)
print(documents)
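The streaming model described above can be illustrated without libNLP installed. The following is a toy sketch, not libNLP's implementation: `ToyDocument`, `strip_punctuation`, `lowercase`, and `run_pipeline` are hypothetical stand-ins that only mirror the idea of documents passing through an ordered list of steps in memory.

```python
import string
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a document: an id plus a dict of fields."""
    doc_id: int
    fields: dict = field(default_factory=dict)


def strip_punctuation(doc, field_name="a_field"):
    """Toy step: remove ASCII punctuation from one field, in place."""
    table = str.maketrans("", "", string.punctuation)
    doc.fields[field_name] = doc.fields[field_name].translate(table)
    return doc


def lowercase(doc, field_name="a_field"):
    """Toy step: lowercase one field, in place."""
    doc.fields[field_name] = doc.fields[field_name].lower()
    return doc


def run_pipeline(steps, documents):
    """Lazily pass each document through every step, in order."""
    for doc in documents:
        for step in steps:
            doc = step(doc)
        yield doc


docs = [ToyDocument(0, {"a_field": "This is a field!"})]
processed = list(run_pipeline([strip_punctuation, lowercase], docs))
print(processed[0].fields)
```

Because the toy steps mutate the document they receive, the object that comes out of the pipeline is the same object that went in. The order of the step list determines the order of the transformations.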
terminate()
Terminate any extra running processes in the steps.
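One way to make sure terminate() runs even when processing raises is to wrap the pipeline in a small context manager. This is a sketch only: `terminating` is a hypothetical helper, not part of libNLP, and `FakePipeline` is a stand-in so the example runs without libNLP installed. The helper assumes nothing beyond an object that exposes a terminate() method.

```python
from contextlib import contextmanager


@contextmanager
def terminating(pipeline):
    """Yield the pipeline and always call its terminate() afterwards."""
    try:
        yield pipeline
    finally:
        pipeline.terminate()


class FakePipeline:
    """Stand-in used only to demonstrate the helper without libNLP."""

    def __init__(self):
        self.terminated = False

    def process(self, documents):
        return list(documents)

    def terminate(self):
        self.terminated = True


fake = FakePipeline()
with terminating(fake) as p:
    result = p.process(["doc"])
print(fake.terminated)
```

The same helper could be given a real Pipeline instance in place of the stand-in.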
Runner
Runner class
class squirro.lib.nlp.runner.Runner(config, ml_workflow_id=None)
The Runner controls libNLP runs. It provides train, test, infer, and clean functions.
Example:
from squirro.lib.nlp.runner import Runner

config = {
    "dataset": {
        "train": "data/train",
        "test": "data/test",
        "infer": "data/infer"
    },
    "analyzer": {
        "type": "classification",
        "tag_field": "pred_class",
        "label_field": "class"
    },
    "pipeline": [
        {
            "step": "loader",
            "type": "csv",
            # Load every column the classifier reads, plus the label.
            "fields": ["sepal length", "sepal width", "petal length", "petal width", "class"]
        },
        {
            "step": "classifier",
            "type": "sklearn",
            "input_fields": ["sepal length", "sepal width", "petal length", "petal width"],
            "label_field": "class",
            "model_type": "SVC",
            "model_kwargs": {"probability": True},
            "output_field": "pred_class",
            "explanation_field": "explanation"
        }
    ]
}

runner = Runner(config)
try:
    for doc in runner.train():
        print(doc)
    print(runner.test())
    for doc in runner.infer():
        print(doc)
finally:
    # Always clean up, even if a run fails.
    runner.clean()
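Before handing a config like this to a Runner, it can help to check that the classifier only reads fields that a loader step declares. The helper below is a hypothetical sketch operating on plain dictionaries; it is not part of libNLP. It assumes that a loader's "fields" list names the fields available to later steps, which is a heuristic: intermediate steps may also create fields.

```python
def missing_input_fields(config):
    """Return classifier input fields that no earlier loader step declares."""
    loaded = set()
    missing = []
    for step in config.get("pipeline", []):
        if step.get("step") == "loader":
            loaded.update(step.get("fields", []))
        elif step.get("step") == "classifier":
            for name in step.get("input_fields", []):
                if name not in loaded and name not in missing:
                    missing.append(name)
    return missing


# Illustrative config with made-up field names.
example = {
    "pipeline": [
        {"step": "loader", "type": "csv", "fields": ["x", "y", "class"]},
        {
            "step": "classifier",
            "type": "sklearn",
            "input_fields": ["x", "y", "z"],
            "label_field": "class",
        },
    ]
}
print(missing_input_fields(example))
```

An empty result means every classifier input field was declared by a loader that appears earlier in the pipeline list.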
property cache_client
Placeholder for the Runner’s cache client. Subclasses can implement their own cache client using the property.
- Return type