Overview

Download the libNLP wheel and install it using pip:

$ pip install squirro.lib.nlp-SQUIRRO_VERSION-py2.py3-none-any.whl

libNLP is structured as a pipeline: the user specifies a sequence of steps that load and transform unstructured data. The data can then be classified, clustered, and so on, and the results are ultimately saved either to disk (in CSV or JSON format) or to Squirro.
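To make the pipeline idea concrete, here is a minimal, library-independent sketch in plain Python. It is not libNLP's API: the names `load_step`, `lowercase_step`, and `run_pipeline` are hypothetical, and documents are assumed to be plain dictionaries. It only illustrates how the output of one step becomes the input of the next.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Step = Callable[[List[Document]], List[Document]]


def load_step(_: List[Document]) -> List[Document]:
    # Hypothetical loader: ignores its input and produces documents.
    return [{"body": "Hello World"}, {"body": "Second DOC"}]


def lowercase_step(docs: List[Document]) -> List[Document]:
    # Hypothetical transform: adds a lowercased copy of the body field.
    return [{**doc, "body_lower": doc["body"].lower()} for doc in docs]


def run_pipeline(steps: List[Step]) -> List[Document]:
    # Feed the output of each step into the next one, in order.
    docs: List[Document] = []
    for step in steps:
        docs = step(docs)
    return docs


result = run_pipeline([load_step, lowercase_step])
```

In this sketch each step receives the full list of documents and returns a new list, so steps can be reordered or swapped without touching the runner.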

The pipeline configuration is specified in JSON format. For example, to train a model on the canonical Iris flower data set, we can use the following:

{
  "dataset": {
    "train": "data/train",
    "test": "data/test"
  },
  "analyzer": {
    "type": "classification",
    "tag_field": "pred_class",
    "label_field": "class"
  },
  "pipeline": [
    {
      "step": "loader",
      "type": "csv",
      "fields": [
        "sepal length",
        "sepal width",
        "petal length",
        "petal width",
        "class"
      ]
    },
    {
      "step": "classifier",
      "type": "sklearn",
      "input_fields": [
        "sepal length",
        "sepal width",
        "petal length",
        "petal width"
      ],
      "label_field": "class",
      "model_type": "SVC",
      "model_kwargs": {
        "probability": true
      },
      "output_field": "pred_class",
      "explanation_field": "explanation"
    }
  ]
}
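In the configuration above, the field names have to line up across sections: the classifier reads fields that the loader provides, and the analyzer's `tag_field` and `label_field` match the classifier's `output_field` and `label_field`. The following sketch checks those relationships using only the Python standard library. The helper `check_config` is hypothetical and not part of libNLP, and the rules it enforces are assumptions inferred from this one example rather than documented requirements.

```python
import json

# A trimmed copy of the configuration shown above.
CONFIG_TEXT = """
{
  "analyzer": {"type": "classification",
               "tag_field": "pred_class", "label_field": "class"},
  "pipeline": [
    {"step": "loader", "type": "csv",
     "fields": ["sepal length", "sepal width",
                "petal length", "petal width", "class"]},
    {"step": "classifier", "type": "sklearn",
     "input_fields": ["sepal length", "sepal width",
                      "petal length", "petal width"],
     "label_field": "class", "output_field": "pred_class"}
  ]
}
"""


def check_config(config):
    """Return a list of problems found; an empty list means none."""
    problems = []
    # Index steps by name (assumes one loader and one classifier).
    steps = {step["step"]: step for step in config["pipeline"]}
    loader, classifier = steps["loader"], steps["classifier"]
    analyzer = config["analyzer"]

    # Assumption: the classifier can only read fields the loader provides.
    needed = classifier["input_fields"] + [classifier["label_field"]]
    for field in needed:
        if field not in loader["fields"]:
            problems.append(f"field not loaded: {field}")

    # Assumption: the analyzer compares the classifier output to the label.
    if analyzer["tag_field"] != classifier["output_field"]:
        problems.append("analyzer tag_field differs from classifier output_field")
    if analyzer["label_field"] != classifier["label_field"]:
        problems.append("analyzer label_field differs from classifier label_field")
    return problems


config = json.loads(CONFIG_TEXT)
problems = check_config(config)
```

Running a check like this before training can catch a mistyped field name early, instead of part-way through a pipeline run.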