How To Write a Custom Data Loader Plugin
This page describes in detail how to build a custom loader for data formats and inputs that are not supported out of the box.
Prerequisites
Follow the steps outlined in the Data Loader Command Line Interface Tool Tutorial.
It is highly recommended that you begin by installing the Squirro Toolbox package.
It is also recommended that you create a Python virtual environment to work in, so that the installed packages stay isolated.
Introduction
For each new data loader plugin, create a new Python file.
The Data Loader Plugin Boilerplate template can be used to get started.
SDK reference
The plugin is implemented as a subclass of the DataSource class. A number of methods must be implemented to provide the intended functionality. These methods are all documented in the DataSource class.
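As a rough illustration of the shape such a class might take, the sketch below defines a hypothetical ExampleSource plugin that serves rows from memory. The import path squirro.dataloader.data_source and the method names (connect, disconnect, getDataBatch, getSchema, getJobId, getArguments) are assumptions made for this example and should be checked against the DataSource class documentation. The fallback base class exists only so that the sketch runs without the SDK installed.

```python
import hashlib

try:
    # Assumed import path; verify it against the SDK reference.
    from squirro.dataloader.data_source import DataSource
except ImportError:
    class DataSource:
        """Stand-in base class so the sketch runs without the SDK."""


class ExampleSource(DataSource):
    """Hypothetical plugin that serves a fixed in-memory list of rows."""

    def __init__(self):
        self.rows = []

    def connect(self, inc_column=None, max_inc_value=None):
        # A real plugin would open a file, database or API session here.
        self.rows = [
            {"id": str(i), "title": f"Item {i}", "body": f"Text {i}"}
            for i in range(25)
        ]

    def disconnect(self):
        # Release whatever connect() acquired.
        self.rows = []

    def getDataBatch(self, batch_size):
        # Yield lists holding at most batch_size rows each.
        for start in range(0, len(self.rows), batch_size):
            yield self.rows[start:start + batch_size]

    def getSchema(self):
        # Names of the fields that every row provides.
        return ["id", "title", "body"]

    def getJobId(self):
        # A stable identifier for this particular load job.
        return hashlib.sha256(b"example-source").hexdigest()

    def getArguments(self):
        # This example takes no custom command line arguments.
        return []
```

The Data Loader Plugin Boilerplate template remains the authoritative starting point; this sketch only shows how the pieces relate to each other.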
Frontend-compatible loaders
Uploading
To make a data loader plugin available to users in the user interface, upload it to the server.
This is done using the squirro_asset command line tool.
The following command uploads a data loader plugin:
squirro_asset dataloader_plugin upload --folder pubmed --token $TOKEN --cluster $CLUSTER
Preview
The command line and frontend data loads differ in their technical implementation, but those differences are not visible to users. The main consideration when writing a UI-compatible loader is the preview mode.
See Data Loader Plugin Preview for details.
Preview mode is a UI feature that lets the user have a peek at the data before it is ingested into the system. It shows a preview of the first 10 items. For most use cases this should not present difficulties, but there are a few cases which might result in data loss.
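To make the "first 10 items" behaviour concrete, here is a minimal sketch of how a preview could be drawn from a loader that yields batches. It does not describe how the UI implements preview mode; preview_items and batches_of are hypothetical helpers written for this example.

```python
from itertools import chain, islice

PREVIEW_LIMIT = 10  # the preview shows the first 10 items


def preview_items(batches, limit=PREVIEW_LIMIT):
    """Return at most `limit` items from an iterable of batches."""
    # chain.from_iterable flattens the batches lazily, and islice stops
    # as soon as `limit` items have been taken.
    return list(islice(chain.from_iterable(batches), limit))


def batches_of(items, batch_size):
    """Hypothetical stand-in for a loader that yields lists of items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


items = [{"id": i} for i in range(25)]
preview = preview_items(batches_of(items, 4))
```

Because the batches are consumed lazily, only as many batches as are needed to fill the preview are pulled from the source. A loader whose reads have side effects should keep in mind that a preview still triggers those first reads.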
Caching & Data Storage
Data loader plugins often need to cache information or store progress information. Two types of stores are available inside a data loader plugin for these purposes:
key_value_cache
key_value_store
Both are covered in API for Caching and Custom State Management.
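The sketch below shows one way the two stores might divide the work: the cache holds results that are expensive to recompute, and the store holds progress information between runs. It assumes both objects behave like dictionaries, which should be confirmed in API for Caching and Custom State Management. CheckpointedReader and the plain dicts standing in for the stores are hypothetical.

```python
class CheckpointedReader:
    """Hypothetical helper; plain dicts stand in for the plugin's stores."""

    def __init__(self, key_value_store, key_value_cache):
        self.key_value_store = key_value_store  # progress information
        self.key_value_cache = key_value_cache  # cached lookup results
        self.lookups = 0  # counts the simulated expensive calls

    def expensive_lookup(self, author_id):
        # Only do the expensive work when the cache has no entry yet.
        if author_id not in self.key_value_cache:
            self.lookups += 1
            self.key_value_cache[author_id] = f"Author {author_id}"
        return self.key_value_cache[author_id]

    def read_new(self, rows):
        # Skip every row at or below the checkpoint of the previous run.
        last_id = self.key_value_store.get("last_id", -1)
        new_rows = [
            dict(row, author=self.expensive_lookup(row["author_id"]))
            for row in rows
            if row["id"] > last_id
        ]
        if new_rows:
            # Remember how far this run got.
            self.key_value_store["last_id"] = max(r["id"] for r in new_rows)
        return new_rows


store, cache = {}, {}  # assumed dict-like stand-ins
reader = CheckpointedReader(store, cache)
rows = [
    {"id": 0, "author_id": "a"},
    {"id": 1, "author_id": "a"},
    {"id": 2, "author_id": "b"},
]
first_run = reader.read_new(rows)
second_run = reader.read_new(rows)
```

In this sketch the second call returns nothing, because the checkpoint in the store already covers every row, and the repeated author is looked up only once thanks to the cache.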