KEE CLI Tool

Using the kee command-line interface (CLI) tool, invoked as squirro_kee, you can set up Known Entity Extraction (KEE) and deploy it to a Squirro project. The tool is included in the Squirro Toolbox, which is available in the Downloads space.

For a comprehensive tutorial about setting up KEE using the command line tool, head over to the KEE CLI Tool Tutorial.

Basic Usage

The KEE tool is invoked from a command-line window. For example, the following command runs the test suite of a KEE project:

squirro_kee -v test --stats

This example has three components:

  • -v is a common argument that turns on verbose logging.

  • test is the sub-command to run.

  • --stats is an argument to the test sub-command.
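The following Python sketch models this structure with the standard argparse module. It is a hypothetical reconstruction for illustration, not the tool's source code, and it includes only a few of the flags documented on this page. Because -v is registered on the top-level parser and --stats on the test sub-parser, each one has to appear on its own side of the sub-command, which is why common arguments always come first.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented squirro_kee layout."""
    parser = argparse.ArgumentParser(prog="squirro_kee")
    # Common arguments live on the top-level parser, so they go before the sub-command.
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("--log-file")
    sub = parser.add_subparsers(dest="command", required=True)

    test = sub.add_parser("test")
    # Sub-command arguments live on the sub-parser, so they go after it.
    test.add_argument("--stats", action="store_true")
    return parser


args = build_parser().parse_args(["-v", "test", "--stats"])
print(args.verbose, args.command, args.stats)  # 1 test True
```

In this sketch, squirro_kee test -v is rejected as an unrecognized argument instead of being read as a verbosity flag.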

Note

What is a KEE Project?

A Known Entity Extraction project is a single folder on your computer that contains all the data, configuration, and code required to create a KEE lookup database.

KEE Project Example

Every KEE project includes at least the following:

  • A CSV file of known entities to look for within the unstructured data

  • A JSON configuration file

  • A db/ folder that includes the lookup database once it is compiled

Many KEE projects also include additional content, such as:

  • A pipelet file (a Python script that performs the known entity extraction)

  • A fixtures/ folder that includes example Squirro items used to test the KEE configuration
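A quick way to check that a folder has the minimum layout listed above is a small script like the one below. It is a hypothetical helper, not part of the KEE tool: the file name config.json comes from the Configuration section, while treating any *.csv file in the project root as the entity list is an assumption.

```python
from pathlib import Path
from typing import List


def check_kee_project(project_dir: str) -> List[str]:
    """Return the required items missing from a KEE project folder (hypothetical helper)."""
    root = Path(project_dir)
    problems = []
    # Required: a CSV file of known entities (assumed to sit in the project root).
    if not any(root.glob("*.csv")):
        problems.append("no CSV file of known entities")
    # Required: the JSON configuration file.
    if not (root / "config.json").is_file():
        problems.append("missing config.json")
    # Required: the db/ folder that holds the compiled lookup database.
    if not (root / "db").is_dir():
        problems.append("missing db/ folder")
    return problems
```

An empty list means the required pieces are in place; the optional pipelet file and fixtures/ folder are not checked.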

Configuration

The basis of a KEE project is the config.json file. This configuration file must be present in each KEE project before you run the kee command.

See the KEE Config Reference for documentation on how to create this file or the KEE Tutorial for a hands-on example.
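Several sub-commands described below (upload, rerun and get_fixture) rely on the "squirro" section of config.json. The sketch below loads the file and flags a missing section before you run one of them. The helper names are hypothetical, and the sketch deliberately does not check the keys inside the section, because their names are defined in the KEE Config Reference rather than on this page.

```python
import json
from pathlib import Path

# Sub-commands that, according to this page, need the "squirro" section of config.json.
NEEDS_SQUIRRO_SECTION = {"upload", "rerun", "get_fixture"}


def load_config(project_dir):
    """Load config.json from a KEE project folder."""
    with open(Path(project_dir) / "config.json", encoding="utf-8") as fh:
        return json.load(fh)


def missing_prerequisites(config, command):
    """Return a message if `command` needs a section that the config lacks (hypothetical check)."""
    if command in NEEDS_SQUIRRO_SECTION and not config.get("squirro"):
        return f"{command} requires a 'squirro' section in config.json"
    return None
```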

Common Arguments

The kee utility accepts the optional arguments listed below. You must specify any of these arguments before the sub-command and its options.

-h

Show the help message and exit.

--version

Output the tool version and exit.

--verbose, -v

Increase log verbosity.

  • Not specified: the tool outputs all warnings and errors.

  • Specified once or more: informational messages are also output.

  • Specified twice or more (-vv): debugging messages are shown.

  • Specified three times or more (-vvv): more information is included in all messages.

--log-file

Path to a file on disk where the log output is stored. If this is not specified, log messages are shown on the console.
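The verbosity rules above map naturally onto Python's standard log levels. The sketch below is an assumed implementation, not the tool's code: WARNING when -v is absent, INFO for -v, DEBUG for -vv, and a more detailed message format from -vvv onwards, with a --log-file path redirecting output from the console to a file.

```python
import logging


def configure_logging(verbose=0, log_file=None):
    """Configure the root logger from a -v count and an optional --log-file path (assumed mapping)."""
    if verbose >= 2:
        level = logging.DEBUG    # -vv: debugging messages
    elif verbose == 1:
        level = logging.INFO     # -v: informational messages
    else:
        level = logging.WARNING  # no -v: warnings and errors only
    if verbose >= 3:
        # -vvv: add timestamp, logger name and source location to every message
        fmt = "%(asctime)s %(levelname)s %(name)s %(filename)s:%(lineno)d %(message)s"
    else:
        fmt = "%(levelname)s %(message)s"
    # filename=None logs to the console (stderr); force=True replaces earlier handlers
    logging.basicConfig(level=level, format=fmt, filename=log_file, force=True)
    return level
```

For example, configure_logging(2, "kee.log") would write debugging messages to kee.log instead of the console.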

Sub-Commands

A sub-command is always required. The following sub-commands are available:

  • compile

  • upload

  • rerun

  • test

  • get_fixture

Compile

squirro_kee -v compile

Compile the lookup database from the input data of the KEE project in the current working directory of the terminal.

This sub-command does not accept any additional arguments, so the invocation shown above is its only usage.

Upload#

squirro_kee -v upload

Upload the KEE project to a Squirro server. This creates a pipelet on the server which can then be added to any Squirro project.

For this to work, you must provide the required information in the "squirro" section of the config.json file (see the KEE Config Reference).

You can specify the following arguments for the upload command:

--no-compile

Don’t compile the lookup database. By default, the compile sub-command is automatically executed when uploading.

Rerun#

squirro_kee -v rerun

Re-apply the KEE tagging to the Squirro project. Use this to apply new configuration changes to items that were processed earlier. For this to work, you must provide the required information in the "squirro" section of the config.json file (see the KEE Config Reference).

You can specify the following arguments for the rerun command:

--query

The Squirro query for which the KEE extractions should be rerun. Every item that matches this query will be processed.

You can omit the query if the version and version_keyword parameters of the kee section are defined. In this case, KEE runs on all items that have not yet been tagged with the current value of version in the keyword named by version_keyword. If versioning is not set up, you must provide a query.

--no-compile

Don’t compile the lookup database. By default, the compile sub-command is automatically executed when rerunning.
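The rule for when --query may be omitted can be expressed as a small check. This sketch builds the argument list for a rerun call; the helper name is hypothetical, and it only tests whether version and version_keyword are present in the kee section, as described above.

```python
def rerun_args(config, query=None, no_compile=False):
    """Build argv for `squirro_kee rerun`, enforcing the documented --query rule (hypothetical helper)."""
    kee = config.get("kee", {})
    # Versioning counts as set up when both parameters exist in the kee section.
    versioned = "version" in kee and "version_keyword" in kee
    if query is None and not versioned:
        raise ValueError(
            "--query is required unless kee.version and kee.version_keyword are defined"
        )
    argv = ["squirro_kee", "-v", "rerun"]
    if query is not None:
        argv += ["--query", query]
    if no_compile:
        argv.append("--no-compile")
    return argv
```

The resulting list could then be passed to subprocess.run(argv, check=True) from inside the KEE project folder.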

Test#

squirro_kee -v test

Run the test suite of the current KEE project. By default the test cases are located in the fixtures directory. The KEE Testing documentation explains how those test cases can be created.

You can specify the following arguments for the test command:

[fixtures…]

A list of fixture files that should be run. If this is not specified, all the fixtures are tested. The following example runs the test on just two fixture files:

squirro_kee test fixtures/acme.json fixtures/other_corp.json
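The fixture selection described above can be sketched as follows. This is an assumed behaviour, not the tool's code: when no files are given, the sketch treats every *.json file in the fixtures directory as a fixture (matching the file names in the example) and returns them in sorted order.

```python
from pathlib import Path


def select_fixtures(fixtures=None, fixtures_dir="fixtures"):
    """Return the fixture files to test (assumed behaviour).

    Explicitly listed files win; otherwise every *.json file in fixtures_dir is used.
    """
    if fixtures:
        return [Path(f) for f in fixtures]
    return sorted(Path(fixtures_dir).glob("*.json"))
```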

--no-compile

Don’t compile the lookup database. By default the compile sub-command is automatically executed when running the tests. That can be slow for large databases, in which case you may want to disable the compilation using this flag.

--snapshot

Creates a new snapshot from the current test results. Snapshots are stored on disk (in the snapshots folder by default) and are used to compare the KEE result quality over time.

--snapshot-message, -m

Add a comment to the snapshot. This implies --snapshot as well, so the following command is a short version of creating a snapshot and adding a comment:

squirro_kee test -m "Tuned ngrams"
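The relationship between -m and --snapshot can be modelled with argparse by switching the snapshot flag on after parsing. The parser below is a hypothetical reconstruction of the test sub-command's documented options, not the tool's source.

```python
import argparse

# Hypothetical stand-in for the test sub-command's parser.
parser = argparse.ArgumentParser(prog="squirro_kee test")
parser.add_argument("fixtures", nargs="*")
parser.add_argument("--no-compile", action="store_true")
parser.add_argument("--snapshot", action="store_true")
parser.add_argument("--snapshot-message", "-m")
parser.add_argument("--diff", action="store_true")
parser.add_argument("--stats", action="store_true")


def parse_test_args(argv):
    args = parser.parse_args(argv)
    # A snapshot message implies a snapshot, so -m turns --snapshot on.
    if args.snapshot_message is not None:
        args.snapshot = True
    return args
```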

--diff

Compare the snapshot to the previous snapshot. This outputs how much better or worse the match quality has become.

--stats

Output a summary of all missed keywords. This gives a quick overview of which kinds of entities are not yet detected as expected.
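To illustrate the idea behind --stats (not its actual output format), the sketch below counts missed keywords and lists the most frequent first. The input, a list of (keyword, value) pairs that a fixture expected but the extraction did not produce, is a hypothetical stand-in for the tool's internal results.

```python
from collections import Counter


def missed_keyword_stats(missed):
    """Count missed entities per keyword, most frequent first.

    `missed` is a hypothetical list of (keyword, value) pairs that were
    expected by a fixture but not extracted.
    """
    by_keyword = Counter(keyword for keyword, _ in missed)
    return by_keyword.most_common()
```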

--trace STRING

Turns on detailed logging whenever the given candidate is being processed. For example, if the lookup database contains an entry called “Acme Inc”, invoking kee as follows produces verbose log output every time that entry is examined:

squirro_kee test --trace "Acme Inc"

If this option is present, then the --verbose flag does not have any effect.
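One way to get the behaviour described for --trace is a logging filter that lets detailed records through only for the traced candidate. The sketch below is an assumed design, not the tool's implementation: records are expected to carry the candidate in a hypothetical candidate attribute (passed via extra=), and warnings and errors are still shown for every candidate. Since the filter, not the log level, decides what appears, the -v setting no longer has any effect, which matches the note above.

```python
import logging


class TraceFilter(logging.Filter):
    """Pass records about the traced candidate, plus all warnings and errors."""

    def __init__(self, candidate):
        super().__init__()
        self.candidate = candidate

    def filter(self, record):
        if record.levelno >= logging.WARNING:
            return True
        # `candidate` is a hypothetical attribute supplied via extra={"candidate": ...}
        return getattr(record, "candidate", None) == self.candidate


def enable_trace(logger, candidate):
    """Open every level for the traced candidate; the filter hides everything else."""
    logger.setLevel(logging.DEBUG)
    for handler in logger.handlers:
        handler.setLevel(logging.DEBUG)
        handler.addFilter(TraceFilter(candidate))
```

A processing step would then log with, for example, logger.debug("scoring %s", name, extra={"candidate": name}).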

Get fixture

squirro_kee -v get_fixture 'item-id1' 'item-id2'

Download one or more items from the configured Squirro project and store them in the fixtures folder. For this to work, you must provide the required information in the "squirro" section of the config.json file (see the KEE Config Reference).

Consult the KEE Testing documentation for information on the fixtures.

You can specify the following arguments for the get_fixture command:

[item_ids...]

List of item identifiers to download (required).