Troubleshooting and FAQ

Profile: Project Creator, Python Engineer

This page contains some common issues and questions regarding pipelets.

Project creators are typically responsible for resolving issues with pipelets, though Python engineers may also be involved in writing them.

Pipelet Not Uploading

If you are having trouble uploading a pipelet, you may receive an error message similar to the following:

ERROR: Could not upload pipelet: b'{"error": "{\"error\": \"Unknown pipelet loading error\"}"}'

If so, follow the steps below to identify and resolve the issue:

  1. Check the outputs of the pipelet validate and pipelet upload commands for any errors as follows:

pipelet -vvvv validate <your pipelet name>.py
pipelet -vvvv upload --token <your_token> --cluster <cluster> pipelet.py "Pipelet Name"
  2. Look through the output of both commands and try to identify any errors. If you are unable to identify the error, proceed to the next step.

  3. Check plumber.log for errors. To reach the logs, SSH into your instance:

$ ssh username@host

then checking the log files:

[user@host] $ less /var/log/squirro/plumber/plumber.log

Note

Alternatively, you may be able to catch the error as it happens: run [user@host] $ squirro_tail_errors on the instance, then attempt the upload again.

  4. At this point, you should have identified the error and be able to resolve it. If not, see the Module Errors troubleshooting steps below.
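The log check in the steps above can also be scripted. The following is a minimal sketch, not part of the Squirro tooling: the helper names and the list of markers are assumptions, and it presumes that problem lines in plumber.log contain plain-text markers such as ERROR or Traceback.

```python
from collections import deque

# Path taken from the steps above.
PLUMBER_LOG = "/var/log/squirro/plumber/plumber.log"

# Assumption: lines worth reading contain one of these plain-text markers.
MARKERS = ("ERROR", "Traceback", "ImportError", "ModuleNotFoundError")


def recent_problem_lines(lines, limit=20):
    """Return the last `limit` lines that contain any of the markers."""
    matches = deque(maxlen=limit)  # keeps only the most recent matches
    for line in lines:
        if any(marker in line for marker in MARKERS):
            matches.append(line.rstrip("\n"))
    return list(matches)


def scan_log(path=PLUMBER_LOG, limit=20):
    """Read a log file and return its most recent problem lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return recent_problem_lines(handle, limit)
```

Run on the instance, scan_log() returns the most recent matching lines, which is usually enough to spot a failing import or a traceback without paging through the whole file.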

Module Errors

Errors found in the plumber logs often relate to a module that is not installed on the server or, where modules are versioned, to a module installed in the wrong version.

To identify these issues, log on to the remote server as shown in the section above, then follow the steps below:

  1. Activate the server's Python environment, so that you can check which modules are already installed, by running the following:

[user@host] $ squirro_activate
  2. List the installed packages using the following command:

[(virtualenv38) user@host] $ pip list | column
  3. Check whether your pipelet imports any package that is not listed. If it does, contact Squirro Support for assistance with installing the module.
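The comparison in the last step can be automated. The sketch below is one possible approach, assuming it is run with the Python interpreter of the activated environment: it parses the pipelet source with the standard ast module and asks importlib whether each top-level import can be found. The function names are hypothetical.

```python
import ast
import importlib.util


def top_level_imports(source):
    """Collect the top-level module names imported by Python source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) refer to local files and are skipped.
            names.add(node.module.split(".")[0])
    return names


def missing_imports(source):
    """Return the imported modules that cannot be found in this environment."""
    return sorted(
        name
        for name in top_level_imports(source)
        if importlib.util.find_spec(name) is None
    )
```

For example, missing_imports(open("pipelet.py").read()) lists the modules to raise with Squirro Support. Note that import names do not always match the package names shown by pip list (yaml is provided by PyYAML, for instance), so checking importability is often more direct than comparing names by eye.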

500 Errors

If you are receiving 500 errors, the likely cause is an NGINX issue, most often a timeout. This is particularly true of servers under heavy load, where timeouts are more likely.

If you believe this may be the case, you can try increasing the timeout value by following the steps below:

  1. Modify step_plumber_timeout_secs in /etc/squirro/ingester.ini under the [pipeline] heading.

  2. For NGINX (the Squirro webserver), open /etc/nginx/conf.d/plumber-fastcgi.inc and edit the value for fastcgi_read_timeout. Its default is 3600, so you can increase it to between 1.5x (5400) and 2x (7200) as a starting point.
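To make the arithmetic in the steps above concrete, here is a small sketch that reads the current value from an ini file and computes the 1.5x and 2x candidates. The sample ini text stands in for /etc/squirro/ingester.ini, and the value it contains is a placeholder rather than a documented default; the function names are hypothetical.

```python
import configparser

# Illustrative stand-in for /etc/squirro/ingester.ini. The value is a
# placeholder for this sketch, not a documented default.
SAMPLE_INI = """
[pipeline]
step_plumber_timeout_secs = 3600
"""


def read_plumber_timeout(ini_text):
    """Read step_plumber_timeout_secs from the [pipeline] section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getint("pipeline", "step_plumber_timeout_secs")


def candidate_timeouts(current_secs):
    """Return the 1.5x and 2x starting points for a timeout value."""
    return (current_secs * 3) // 2, current_secs * 2


current = read_plumber_timeout(SAMPLE_INI)
low, high = candidate_timeouts(current)
print(f"current={current}s, try between {low}s and {high}s")
```

The same calculation applies to fastcgi_read_timeout: starting from its default of 3600, the candidates are 5400 and 7200.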

Reference: To learn more, see the Pipeline section of the common.ini config documentation.

If this doesn’t resolve the issue, contact Squirro Support for assistance.

Questions

How can I remove a facet from an item using a pipelet?

You can remove a facet within a pipelet by setting it to an empty list:

item["keywords"][facetName] = []

where facetName is the name of the facet you want to remove.
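As a minimal, self-contained illustration, the sketch below applies the same assignment to a plain dictionary shaped like an item. The helper name and the sample item are hypothetical; inside a real pipelet, the assignment would sit in the consume() method.

```python
def remove_facet(item, facet_name):
    """Clear one facet on an item by setting it to an empty list."""
    # setdefault guards against items that carry no keywords at all.
    item.setdefault("keywords", {})[facet_name] = []
    return item


# Hypothetical sample item for illustration only.
item = {
    "title": "Example",
    "keywords": {"region": ["EMEA"], "topic": ["pipelets"]},
}
remove_facet(item, "region")
print(item["keywords"])
```

Only the named facet is cleared; the other facets on the item are left untouched.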

My pipelet is validated but upload fails.

The pipelet validate command reports no errors, but when uploading the pipelet, I see the following error:

2021-12-06 12:43:16,023 ERROR: Could not upload pipelet: b'{"error": "{\\"error\\": \\"Unknown pipelet loading error\\"}"}'

Check the /var/log/squirro/plumber/plumber.log file for errors. It will likely point to the cause, for example, a failing import of a custom library that needs to be installed first.

My pipelet is not executed when run by a data loader.

Check the following components.

Configuration file:

Make sure to reference the correct pipelet configuration file from your load.sh script. For example, add:

--pipelets-file 'pipelets.json' \

Item transformation:

If you see the following error:

INFO:squirro_data_load[23476]:--pipelets-file is ignored when item transformation takes place on the server. Please add the pipelet as a step of the workflow.

make sure to include the transform step in your load.sh script:

--transform-items \

Consume method:

To use pipelets with the data loader, the consume() method must be implemented. The consume_multiple() method is only executed when the pipelet is uploaded to the server and included in the pipeline workflow.
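One way to support both execution paths is to keep the per-item logic in consume() and have consume_multiple() delegate to it. The sketch below uses a plain class so that it runs standalone; a real pipelet would extend the base class provided by the Squirro SDK. The class name, the "processed" facet, and the consume_multiple() signature shown here are assumptions for illustration.

```python
class ExamplePipelet:
    """Stand-in for a pipelet class; a real one extends the SDK base class."""

    def consume(self, item):
        """Process a single item. This is the method the data loader needs."""
        # Hypothetical transformation: tag the item with a marker facet.
        item.setdefault("keywords", {})["processed"] = ["true"]
        return item

    def consume_multiple(self, items):
        """Process a batch by delegating to consume() for each item."""
        # Assumed signature: takes an iterable of items, returns a list.
        return [self.consume(item) for item in items]


pipelet = ExamplePipelet()
print(pipelet.consume({"title": "single"}))
print(pipelet.consume_multiple([{"title": "a"}, {"title": "b"}]))
```

With this layout, the pipelet behaves the same whether it runs through the data loader or as a step of the pipeline workflow on the server.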