KEE CLI Tool#
Using the kee command line (CLI) tool, you can set up and deploy Known Entity Extraction (KEE) to a Squirro project. The tool is included in the Squirro Toolbox, which is available in the Downloads space.
For a comprehensive tutorial about setting up KEE using the command line tool, head over to the KEE CLI Tool Tutorial.
Basic Usage#
The KEE tool is invoked from a command line window. For example, the following command runs the test suite of a KEE project:
squirro_kee -v test --stats
This example has three components:
-v is a common argument that turns on verbose logging.
test is the sub-command to run.
--stats is an argument to the test sub-command.
Note
What is a KEE Project?
A Known Entity Extraction project is a single folder on your computer that has all the required data, configuration and code to create a KEE lookup database.
KEE Project Example#
Every KEE Project will include at least the following files:
A CSV file of known entities to look for within the unstructured data
A JSON configuration file
A db/ folder that includes the lookup database once it is compiled
Many KEE projects will also include additional content such as:
A pipelet file (a Python script that performs the known entity extraction)
A fixtures/ folder that includes example Squirro items used to test the KEE configuration
Configuration#
The basis of a KEE project is the config.json file. This configuration file must be added to each KEE project before you execute the kee command.
See the KEE Config Reference for documentation on how to create this file or the KEE Tutorial for a hands-on example.
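Before running the tool, it can help to confirm that a project folder has the minimum layout described above. The following Python sketch is not part of the KEE tool; the function name missing_project_parts is hypothetical, and it assumes that the entities CSV sits at the top level of the project folder and that the configuration file is named config.json.

```python
from pathlib import Path


def missing_project_parts(project_dir):
    """Return the expected KEE project parts that are absent from project_dir."""
    root = Path(project_dir)
    missing = []
    # At least one CSV file of known entities (assumed to sit at the top level).
    if not any(root.glob("*.csv")):
        missing.append("CSV file of known entities")
    # The JSON configuration file, assumed to be named config.json.
    if not (root / "config.json").is_file():
        missing.append("config.json")
    # The folder that holds the lookup database once it is compiled.
    if not (root / "db").is_dir():
        missing.append("db/ folder")
    return missing
```

Calling missing_project_parts(".") from inside a project folder returns an empty list when all three parts are present.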
Common Arguments#
The kee utility accepts the optional arguments listed below. You must specify any of these arguments before the sub-command and its options.
-h
Show the help message and exit.
--version
Output the tool version and exit.
--verbose, -v
Increase log verbosity.
Not specified: the tool outputs all warnings and errors.
Specified once or more: informational messages are also output.
Specified twice or more (-vv): debugging messages are shown.
Specified three times or more (-vvv): more information is included in all messages.
--log-file
Path to a log file on disk, where the log output is to be stored. If this is not specified, the log messages are shown on the console.
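When the tool is called from a script, the ordering rule above (common arguments first, then the sub-command and its options) is easy to get wrong. The following Python sketch assembles an argument list in that order. The helper build_kee_command is hypothetical, it uses the executable name squirro_kee as it appears in the examples on this page, and it assumes that --log-file takes the path as a separate value.

```python
def build_kee_command(sub_command, sub_args=(), verbosity=0, log_file=None):
    """Assemble an argv list with the common arguments before the sub-command."""
    command = ["squirro_kee"]
    # -v, -vv, or -vvv, depending on the requested verbosity level.
    if verbosity > 0:
        command.append("-" + "v" * verbosity)
    # Assumption: the log file path follows --log-file as a separate value.
    if log_file is not None:
        command.extend(["--log-file", str(log_file)])
    # The sub-command and its own options always come last.
    command.append(sub_command)
    command.extend(sub_args)
    return command
```

The resulting list can be passed to subprocess.run(command, check=True) from the project folder.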
Sub-Commands#
You always need to specify a sub-command. The following sub-commands exist:
compile
upload
rerun
test
get_fixture
Compile#
squirro_kee -v compile
Compile the lookup database from the input data. This applies to whichever KEE project is present in the current working directory of the terminal.
This command does not accept any additional arguments; the invocation shown above is its only usage.
Upload#
squirro_kee -v upload
Upload the KEE project to a Squirro server. This creates a pipelet on the server which can then be added to any Squirro project.
You must provide the required information in the "squirro" section of the config.json file (see the KEE Config Reference) for this to work.
You can specify the following arguments for the upload command:
--no-compile
Don’t compile the lookup database. By default the compile sub-command is automatically executed when uploading.
Rerun#
squirro_kee -v rerun
Re-apply the KEE tagging to the Squirro project. This is used to apply new configuration changes to old items.
You must provide the required information in the "squirro" section of the config.json file (see the KEE Config Reference) for this to work.
You can specify the following arguments for the rerun command:
--query
The Squirro query for which the KEE extractions should be rerun. Every item that matches this query will be processed.
You can omit the query if the version and version_keyword parameters of the kee section are defined. In this case, KEE runs on all items that have not yet been tagged with the current value of version in the version_keyword. If versioning is not set up, you have to submit a query.
--no-compile
Don’t compile the lookup database. By default the compile sub-command is automatically executed when uploading.
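The rule for when --query may be omitted can be checked before calling the tool. The following Python sketch works on an already parsed config.json. The function name rerun_arguments is hypothetical, and the sketch assumes that version and version_keyword sit directly under a top-level "kee" key, which this page implies but does not spell out.

```python
def rerun_arguments(config, query=None):
    """Return the rerun sub-command arguments, or raise if a query is required."""
    if query:
        return ["rerun", "--query", query]
    # Assumption: the versioning parameters live under the top-level "kee" key.
    kee_section = config.get("kee", {})
    versioning_ready = (
        kee_section.get("version") is not None
        and kee_section.get("version_keyword") is not None
    )
    if not versioning_ready:
        raise ValueError(
            "Versioning is not set up: pass a query or define "
            "version and version_keyword in the kee section."
        )
    # With versioning set up, the query can be left out.
    return ["rerun"]
```

The config dictionary can be obtained with json.load on the project's config.json file.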
Test#
squirro_kee -v test
Run the test suite of the current KEE project. By default the test cases are located in the fixtures directory. The KEE Testing documentation explains how those test cases can be created.
You can specify the following arguments for the test command:
[fixtures…]
A list of fixture files that should be run. If this is not specified, all the fixtures are tested. The following example runs the test on just two fixture files:
squirro_kee test fixtures/acme.json fixtures/other_corp.json
--no-compile
Don’t compile the lookup database. By default the compile sub-command is automatically executed when running the tests. That can be slow for large databases, in which case you may want to disable the compilation using this flag.
--snapshot
Creates a new snapshot from the current test results. Snapshots are stored on disk (in the snapshots folder by default) and are used to compare the KEE result quality over time.
--snapshot-message, -m
Add a comment to the snapshot. This implies --snapshot as well, so the following command is a short version of creating a snapshot and adding a comment:
squirro_kee test -m "Tuned ngrams"
--diff
Compare the snapshot to the previous snapshot. This outputs how much better or worse the match quality has become.
--stats
Outputs a summary of all the missed keywords. This provides a quick overview of which kinds of entities are not yet detected as they should be.
--trace STRING
Turns on detailed logging whenever the given candidate is being processed. For example if the lookup database contains an entry called “Acme Inc” then invoking kee as follows will result in a verbose log file every time that entry is looked at:
squirro_kee test --trace "Acme Inc"
If this option is present, then the --verbose flag does not have any effect.
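For scripted test runs, the options above can be combined programmatically. The following Python sketch builds the argument list for the test sub-command. The helper build_test_arguments is hypothetical. It relies on the documented rule that -m implies --snapshot, and it assumes that fixture paths may follow the options.

```python
def build_test_arguments(fixtures=(), snapshot=False, message=None,
                         diff=False, stats=False, no_compile=False, trace=None):
    """Build the argument list for the test sub-command from keyword options."""
    args = ["test"]
    if no_compile:
        args.append("--no-compile")
    if message is not None:
        # -m implies --snapshot, so the explicit flag is not needed.
        args.extend(["-m", message])
    elif snapshot:
        args.append("--snapshot")
    if diff:
        args.append("--diff")
    if stats:
        args.append("--stats")
    if trace is not None:
        args.extend(["--trace", trace])
    # Assumption: fixture files can be listed after the options.
    args.extend(fixtures)
    return args
```

For example, build_test_arguments(message="Tuned ngrams") yields the same arguments as the short form shown for --snapshot-message.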
Get fixture#
squirro_kee -v get_fixture 'item-id1' 'item-id2'
Download one or more items from the configured Squirro project and store them in the fixtures folder. You must provide the required information in the "squirro" section of the config.json file (see the KEE Config Reference) for this to work.
Consult the KEE Testing documentation for information on the fixtures.
You can specify the following arguments for the get_fixture command:
[item_ids...]
List of item identifiers to download (required).