Data Loader CLI Tool
Important: Although the Data Loader Command Line Interface (CLI) Tool is still supported, Squirro strongly encourages you to load data using the Squirro UI.
The Data Loader CLI Tool is used to load data from multiple sources into Squirro. Detailed video tutorials on data loading and on developing data loader plugins are available at https://learn.squirro.com.
Label Configuration
Note
Labels were previously referred to as facets in the Squirro UI. You will still see references to facets in the code, and in some places within the Squirro UI. All facets can be treated as labels.
The Data Loader CLI Tool uses config files for label creation. These files use the Hjson format, a more human-friendly superset of JSON. The following sections describe the attributes used in these files and give some examples.
Labels Config File
Sample File
Below is an example of a valid Hjson file that is not valid JSON: notice the trailing comma after the last label entry, which JSON does not allow. Hjson also lets you omit the quotes around keys and values and the commas at the end of lines.
{
"InteractionSubject": {
"name": "interaction_subject",
"display_name": "Interaction Subject",
"group_name": "Interactions",
"visible": true,
"searchable":true,
"typeahead":true
},
"InteractionType": {
"name": "interaction_type",
"display_name": "Interaction Type",
"group_name": "Interactions",
"visible": true,
"searchable":true,
"typeahead":false
},
"Date": {
"name": "creation_date",
"display_name": "Date of Creation",
"data_type": "datetime",
"input_format_string": "%Y-%m-%dT%H:%M:%S",
"default_value":"1900/01/01",
"visible": true
},
"EntitledUsers": {
"name": "users",
"display_name": "Users",
"auth": true,
"auth_value_mandatory": true,
"import": false
},
"InternalAttendees": {
"name": "attendees",
"display_name": "Attendees",
"group_name": "Interactions",
"pivotal_group": "Members",
"delimiter": ";",
"visible": true
},
"NoAtendees": {
"name": "number_of_attendees",
"display_name": "Number of Attendees",
"data_type": "int",
"visible": true,
"searchable":false
},
}
Best Practices for Label Naming
Follow the best practices below for label names, and put a user-friendly, readable name in “display_name”, keeping in mind its maximum length of 50 characters.
Best practices for the label names:
Only use lower case
Use an underscore as a separator to join multiple words
Avoid using space as a separator
Avoid having any special characters in the label name
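The naming rules above can be checked mechanically. The following Python sketch uses two hypothetical helpers (to_label_name and is_good_label_name are not part of the Data Loader) that derive a label name from a column heading by lower-casing it, joining words with underscores and dropping special characters. It also enforces the 50-character maximum listed for name in the attribute reference. Note that non-ASCII letters such as “ü” are treated as separators here, so it is still better to pick clean names by hand.

```python
import re

MAX_NAME_LENGTH = 50  # maximum length listed for the "name" attribute


def to_label_name(raw: str) -> str:
    """Derive a label name from a column heading (hypothetical helper):
    lower case only, words joined by "_", no special characters."""
    name = raw.strip().lower()
    # Replace every run of characters outside a-z, 0-9 and "_" with "_".
    name = re.sub(r"[^a-z0-9_]+", "_", name)
    # Collapse repeated underscores and trim them from both ends.
    name = re.sub(r"_+", "_", name).strip("_")
    if not name:
        raise ValueError(f"cannot derive a label name from {raw!r}")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"label name {name!r} exceeds {MAX_NAME_LENGTH} characters")
    return name


def is_good_label_name(name: str) -> bool:
    """True if the name is lower-case words joined by single underscores."""
    return re.fullmatch(r"[a-z0-9]+(_[a-z0-9]+)*", name) is not None


print(to_label_name("Interaction Subject"))  # interaction_subject
print(is_good_label_name("Interaction Subject"))  # False
```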
Structure: Label Keys and Label Attributes
In the example above, the curly brackets at the root of the file open a dictionary. This dictionary contains a number of label keys, each of which maps to its own dictionary of key/value pairs: the label attributes.
Label Keys are used to identify data elements in the data source the labels are being applied to, for example the column headings in an Excel spreadsheet.
Label Attributes define the behaviour of the label and how Squirro should handle it. See the reference below for more details.
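To make the split between label keys and label attributes concrete, here is a minimal Python sketch, assuming the labels config has already been parsed into a dict and each source row is a dict keyed by column heading. The function row_to_labels is hypothetical and only honours a few attributes from the reference below (name, default_value, auth_value_mandatory, import, delimiter); it illustrates the described behaviour and is not the Data Loader's implementation.

```python
from typing import Optional

# Hypothetical subset of the sample labels config, already parsed into a
# dict: label key (column heading) -> label attributes.
LABELS = {
    "InteractionType": {"name": "interaction_type"},
    "InternalAttendees": {"name": "attendees", "delimiter": ";"},
    "EntitledUsers": {
        "name": "users",
        "auth": True,
        "auth_value_mandatory": True,
        "import": False,
    },
}


def row_to_labels(row: dict, labels: dict) -> Optional[dict]:
    """Map one source row (column heading -> raw value) to
    label name -> list of values. Returns None when a mandatory auth
    value is empty, i.e. when the item would not be uploaded.
    Illustrative sketch only."""
    result = {}
    for key, attrs in labels.items():
        raw = row.get(key)  # the label key selects the source column
        if raw is None:
            raw = attrs.get("default_value")  # fall back to the default, if any
        if attrs.get("auth_value_mandatory") and raw in (None, ""):
            return None  # skip the whole item
        if raw is None or not attrs.get("import", True):
            continue  # nothing to load, or column not loaded as a label
        delimiter = attrs.get("delimiter")
        if delimiter:
            values = [part.strip() for part in str(raw).split(delimiter)]
        else:
            values = [raw]
        result[attrs["name"]] = values
    return result


row = {"InteractionType": "Call", "InternalAttendees": "Ann; Bob",
       "EntitledUsers": "ann@example.com"}
print(row_to_labels(row, LABELS))
# {'interaction_type': ['Call'], 'attendees': ['Ann', 'Bob']}
```

The EntitledUsers column is still checked for a value even though import is false, because auth_value_mandatory decides whether the item is uploaded at all.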
Character Encoding and Label Keys
Be careful when using characters outside the printable ASCII range, such as umlauts (ü) or diacritics (é), in label keys. JSON files are normally encoded as UTF-8, while Excel, for example, defaults to an ANSI encoding such as Windows-1252. This means that if you use a value such as “Zürich” in a label key in your JSON, it will only match the corresponding column in the input data if that data is also encoded as UTF-8. Recent versions of Excel 2016 can export data as “CSV UTF-8”, but if in doubt, simply restrict column names to characters in the printable ASCII range (i.e. avoid umlauts).
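The mismatch described above comes down to bytes. This standard-library Python sketch shows that “Zürich” is stored differently in UTF-8 and in Windows-1252 (Python codec cp1252), what happens when Windows-1252 bytes are read as UTF-8, and a hypothetical is_printable_ascii check you could run over your label keys. The last line assumes the file starts with a UTF-8 byte order mark, as Excel's “CSV UTF-8” export typically writes; the utf-8-sig codec strips it.

```python
def is_printable_ascii(text: str) -> bool:
    """True if every character is printable ASCII (space through "~")."""
    return all(" " <= ch <= "~" for ch in text)


heading = "Zürich"
utf8_bytes = heading.encode("utf-8")      # b'Z\xc3\xbcrich'
cp1252_bytes = heading.encode("cp1252")   # b'Z\xfcrich'

# The same text produces different bytes in the two encodings ...
print(utf8_bytes != cp1252_bytes)  # True

# ... so Windows-1252 bytes read as UTF-8 no longer spell "Zürich";
# the undecodable byte becomes the U+FFFD replacement character.
print(cp1252_bytes.decode("utf-8", errors="replace"))  # Z\ufffdrich

# A leading byte order mark is removed by the "utf-8-sig" codec.
print(("\ufeff" + heading).encode("utf-8").decode("utf-8-sig") == heading)  # True

print(is_printable_ascii("Zurich"), is_printable_ascii("Zürich"))  # True False
```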
Label Attributes Reference
Each label in the labels config file may be configured with the following label attributes.
Key | Data Type | Default | Description
analyzed | Boolean | true | Only applies to data_type string. If set to false, the field is not analyzed for searching and can only be used for display, filtering and aggregation. This speeds up index operations and keeps the index size smaller.
auth | Boolean | false | If set to true, this field is used to determine user access to the Squirro item.
auth_value_mandatory | Boolean | | If the value of the field in the data source is empty, the item is not uploaded.
data_type | String | ‘string’ | Data type of the label. One of ‘string’, ‘int’, ‘float’, ‘datetime’, ‘geo_point’, ‘weighted’.
default_value | String | None | Default value of the label if the value is None.
delimiter | String | | Specified for columns that have multiple values, i.e. pivotal and auth columns.
display_name | String | | Name shown to the user in the front-end. Maximum 50 characters.
format_instr | String | | Formatting instruction that defines how the Squirro UI displays the label value. Only relevant for int, float and date labels. Learn more about Labels.
group_name | String | | Name of the group this label belongs to. Maximum 50 characters.
import | Boolean | true | Specifies whether a column is loaded as a label.
input_format_string | String or list of strings | | Only for datetime columns. Format of the date string coming from the source (used for CSV and Excel). When importing from a database, the date values are already of type datetime and don't need a format. Can be specified as a list of format strings: the data loader then tries each format string in order and uses the first one that doesn't result in a parsing error. Note that a list of format strings is currently not supported in the Squirro UI; you can achieve the same result by adding the Time Formats pipelet to your pipeline workflow.
name | String | | Name used to display and access the label in the Squirro UI, for example to identify labels in the search field and on item tags. Maximum 50 characters.
pivotal_group | String | | If this attribute exists, the column is also a pivotal column. This means that the values of that column, delimited by ‘delimiter’, were joined from a detail table to generate only one line in the source.
searchable | Boolean | false | If true, the values of this label are searchable.
typeahead | Boolean | false | If true, this label is included in the typeahead search function of the various search fields. Added in version 2.4.0 (Aspen).
visible | Boolean | true | If false, this label is hidden in the front-end.
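The fallback behaviour of a list-valued input_format_string can be sketched with Python's datetime.strptime. The helper parse_with_formats below is hypothetical, not the Data Loader's own code: it tries each format in order and returns the first successful parse, and raises only if none of the formats match.

```python
from datetime import datetime
from typing import Sequence, Union


def parse_with_formats(value: str, formats: Union[str, Sequence[str]]) -> datetime:
    """Try each strptime format in order; return the first successful parse."""
    if isinstance(formats, str):
        formats = [formats]  # a single format string behaves like a 1-item list
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format does not match; try the next one
    raise ValueError(f"{value!r} matches none of the formats {list(formats)}")


formats = ["%Y-%m-%dT%H:%M:%S", "%Y/%m/%d"]
print(parse_with_formats("2021-03-04T10:15:00", formats))  # 2021-03-04 10:15:00
print(parse_with_formats("1900/01/01", formats))           # 1900-01-01 00:00:00
```

Order matters: put the most specific or most common format first, since the first format that parses wins.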
Pipelet Configuration
The config file used for pipelets must be strict JSON. The following sections describe the attributes used in this file and give an example.
Pipelets Configuration File
When you create a new pipelet, you must also create a config file that references it.
Sample File
Below is a valid JSON file used for pipelet configuration.
{
"DummyPipeletClass":{
"file_location":"pipelets\\dummy_pipelet.py",
"stage":"before templating",
"config": {
"restricted_fields_columns": ["HeaderRestrictedFieldNames", "DetailRestrictedFieldNames"]
}
},
"DummyPipeletClass2":{
"class_name": "DummyPipeletClass",
"file_location":"pipelets\\dummy_pipelet.py",
"stage":"after templating",
"config": {
"restricted_fields_columns": ["HeaderRestrictedFieldNames", "DetailRestrictedFieldNames"]
}
}
}
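A pipelet referenced by this file is a Python class that receives the "config" dictionary as a constructor argument and implements a consume method. The sketch below assumes the item passed to consume is a dict. Because the real PipeletV1 base class comes from the Squirro SDK, it is replaced here by a local stand-in so the example runs on its own; the behaviour of DummyPipeletClass (dropping the columns named in restricted_fields_columns) is purely hypothetical.

```python
class PipeletV1:
    """Local stand-in for the Squirro SDK base class (assumption: in a real
    pipelet, import PipeletV1 from the Squirro SDK instead)."""

    def __init__(self, config):
        self.config = config

    def consume(self, item):
        raise NotImplementedError


class DummyPipeletClass(PipeletV1):
    """Hypothetical pipelet: removes the columns listed in the
    restricted_fields_columns config attribute from each item."""

    def __init__(self, config):
        super().__init__(config)
        # "config" is the dictionary from the pipelets config file.
        self.restricted_fields_columns = config.get("restricted_fields_columns", [])

    def consume(self, item):
        for column in self.restricted_fields_columns:
            item.pop(column, None)  # ignore columns the item does not have
        return item


pipelet = DummyPipeletClass(
    {"restricted_fields_columns": ["HeaderRestrictedFieldNames", "DetailRestrictedFieldNames"]}
)
print(pipelet.consume({"title": "Q3 review", "HeaderRestrictedFieldNames": "salary"}))
# {'title': 'Q3 review'}
```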
Structure
The pipelets config file is in JSON format. In the example above, the curly brackets at the root of the file open a dictionary. This dictionary contains a number of keys, each representing a pipelet class, and each key maps to a dictionary of attributes, which are described below.
Reference
The pipelets config file contains the following attributes.
Key | Data Type | Description | Notes
config | Dictionary | Contains the attributes of the pipelet. This dictionary is passed to the pipelet as a constructor argument. |
file_location | String | The location of the Python file. This is mandatory. It can be the full path or the path relative to the directory from which the tool is executed. |
stage | String | Specifies when the pipelet's consume method is executed. Possible values: ‘before templating’, ‘after templating’. |
class_name | String | The name of the main Python class of the pipelet. The class is expected to be a child class of PipeletV1. If class_name is missing, the key of the top-level dictionary is used as the class name (DummyPipeletClass in the example above). | Introduced in version 2.6.5. In earlier versions, the top-level key was always used. Specifying this attribute allows running the same pipelet multiple times, each with a different config.
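The class_name fallback can be illustrated with a short Python sketch that parses a pipelets config with the standard json module (which, like the Data Loader, rejects non-strict JSON) and resolves which class each entry refers to. The function resolve_pipelets is hypothetical; it only reads the attributes in the table above and does not load or run any pipelet.

```python
import json


def resolve_pipelets(config_text: str) -> list:
    """List, per top-level entry, the class to use plus its file, stage and
    config. class_name falls back to the top-level key when missing."""
    entries = json.loads(config_text)  # strict JSON: bad escapes or missing commas fail
    resolved = []
    for key, attrs in entries.items():
        if "file_location" not in attrs:
            raise ValueError(f"pipelet {key!r} is missing file_location")
        resolved.append({
            "key": key,
            "class_name": attrs.get("class_name", key),
            "file_location": attrs["file_location"],
            "stage": attrs.get("stage"),
            "config": attrs.get("config", {}),
        })
    return resolved


sample = r'''{
  "DummyPipeletClass": {"file_location": "pipelets\\dummy_pipelet.py",
                        "stage": "before templating"},
  "DummyPipeletClass2": {"class_name": "DummyPipeletClass",
                         "file_location": "pipelets\\dummy_pipelet.py",
                         "stage": "after templating"}
}'''
for p in resolve_pipelets(sample):
    print(p["key"], "->", p["class_name"], "@", p["stage"])
# DummyPipeletClass -> DummyPipeletClass @ before templating
# DummyPipeletClass2 -> DummyPipeletClass @ after templating
```

Both entries resolve to the same class with different stages, which is exactly what class_name enables. Note the doubled backslash in the Windows path: a single backslash followed by "d" is an invalid escape in strict JSON.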