RheinInsights Connector

The RheinInsights connector indexes enterprise content from a RheinInsights instance into Squirro. It supports document-level access control by syncing the RheinInsights principal and group model into the Squirro ACL layer. The connector setup is wizard-driven: it creates the connector on the RheinInsights side and links it to a Squirro data source in a single flow.

Note

This connector is not available out of the box. It requires a RheinInsights instance installed and configured alongside your Squirro deployment. To set up this integration, visit the Squirro Support website and submit a technical support request.

Setting Up a Connector

Navigate to the Data tab inside the Server space, click the Configure a New Data Source button, look for the Enterprise section, and click the RheinInsights entry. Squirro redirects you to the connector configuration wizard.
Step 1 - Create connector

Enter the connection details for your RheinInsights instance:
- Base URL
  
  The root URL of your RheinInsights instance (for example, https://rheininsights.example.com).
- Auth Token
  
  The authorization token used to authenticate against the RheinInsights API.
- Connector Title
  
  A human-readable name for this connector.
- Connector ID
  
  A numeric identifier (auto-generated, but can be customized).
- Source Type
  
  The content source type to configure.
- Search Engine
  
  The search engine type to configure.
Click Next. The plugin validates the connector against the RheinInsights API and creates it before proceeding.
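The Step 1 fields can be thought of as a small settings record that gets a basic sanity check before anything is sent to RheinInsights. The sketch below is hypothetical: the class name `ConnectorSettings`, its attribute names, and the specific checks are assumptions for illustration. The authoritative validation is the one the plugin performs against the RheinInsights API.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class ConnectorSettings:
    """Hypothetical container for the Step 1 wizard fields."""

    base_url: str
    auth_token: str
    title: str
    connector_id: str
    source_type: str
    search_engine: str

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means the form looks complete."""
        issues = []
        parsed = urlparse(self.base_url)
        # The Base URL must be absolute, e.g. https://rheininsights.example.com
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            issues.append("Base URL must be an absolute http(s) URL")
        if not self.auth_token.strip():
            issues.append("Auth Token is required")
        if not self.title.strip():
            issues.append("Connector Title is required")
        # The ID is auto-generated as a number; a customized ID must stay numeric.
        if not self.connector_id.isdigit():
            issues.append("Connector ID must be numeric")
        for label, value in (("Source Type", self.source_type),
                             ("Search Engine", self.search_engine)):
            if not value.strip():
                issues.append(f"{label} is required")
        return issues
```

For example, a Base URL entered without `https://` is flagged here, which is cheaper to catch locally than through a failed API call.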
Step 2 - Content source

Configure the content source settings. Fields are fetched live from the RheinInsights API for your connector, so the form reflects the exact configuration schema your instance supports. Fill in the required fields and click Next. The plugin validates your input before proceeding.
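Because the Step 2 form is built from a schema fetched at runtime, its required-field check can be written generically over whatever fields the schema lists. A minimal sketch, assuming a hypothetical schema shape (a list of dicts with `name` and `required` keys); the actual RheinInsights schema format is not described on this page.

```python
from typing import Any


def missing_required(schema: list[dict[str, Any]], values: dict[str, Any]) -> list[str]:
    """Return the names of required schema fields that have no usable value.

    `schema` uses a hypothetical shape: [{"name": ..., "required": bool}, ...].
    """
    missing = []
    for field in schema:
        if not field.get("required", False):
            continue
        value = values.get(field["name"])
        # Treat None and blank strings as "not filled in".
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field["name"])
    return missing
```

With an invented schema such as `[{"name": "space_key", "required": True}, {"name": "root_url", "required": True}]`, submitting only a blank `space_key` reports both fields as missing, so the form adapts to any schema without hard-coded field names.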
Step 3 - Search engine

Configure the search engine settings. The Squirro cluster URL, tenant, project ID, and source ID are injected automatically. Fill in any remaining required fields and click Finish & Save. After saving, the plugin redirects you back to Squirro and the data source starts running: Squirro triggers crawls on the RheinInsights side and begins indexing the crawled content.
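The automatic injection in Step 3 amounts to overlaying Squirro-side values on the form input. A sketch with hypothetical key names (`squirro_cluster_url`, `tenant`, `project_id`, `source_id`); letting the injected values win over user input is an assumption about how conflicts are resolved.

```python
def build_search_engine_config(user_values: dict, cluster_url: str, tenant: str,
                               project_id: str, source_id: str) -> dict:
    """Merge user-entered Step 3 fields with values Squirro injects automatically.

    Key names are hypothetical. Injected values are applied last, so they
    override anything the user entered under the same key.
    """
    injected = {
        "squirro_cluster_url": cluster_url,
        "tenant": tenant,
        "project_id": project_id,
        "source_id": source_id,
    }
    return {**user_values, **injected}
```

Applying the injected values last keeps the Squirro identifiers consistent with the data source being created, even if a user edits a field with the same name.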

Server Configuration

Pre-Filling Wizard Defaults

You can pre-configure the RheinInsights connection details so that the Base URL and Auth Token fields are pre-filled in the wizard UI for all users. Add the following section to ~/.squirro/frontend.ini on the Squirro frontend server:

[rhein_insights]
base_url = https://rhein-insights.example.com
auth_token = YOUR_AUTH_TOKEN_HERE

If these values are not set, the fields are empty and users must provide them manually through the wizard.
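To check what the wizard would pre-fill, the section can be read with Python's standard `configparser`. A sketch, assuming the file is standard INI; the helper name `wizard_defaults` is hypothetical. Missing keys, a missing section, or a missing file all fall back to empty strings, mirroring the behavior described above.

```python
import configparser
import os


def wizard_defaults(path: str = "~/.squirro/frontend.ini") -> dict[str, str]:
    """Return the Base URL / Auth Token pre-fill values, or "" when unset."""
    # interpolation=None so tokens containing "%" are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips files that do not exist.
    parser.read(os.path.expanduser(path))
    return {
        "base_url": parser.get("rhein_insights", "base_url", fallback=""),
        "auth_token": parser.get("rhein_insights", "auth_token", fallback=""),
    }
```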

Dataloader Configuration

The dataloader also requires access to the RheinInsights API to trigger crawl jobs at ingestion time. Add the same section to /etc/squirro/common.ini on your Squirro cluster:

[rhein_insights]
base_url = https://rhein-insights.example.com
auth_token = YOUR_AUTH_TOKEN_HERE

After editing frontend.ini and common.ini, restart the frontend and data source services to apply the settings:

sudo systemctl restart sqfrontendd
sudo systemctl restart sqdatasourced
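Unlike the wizard, which simply leaves fields empty, the dataloader needs both values to trigger crawls. The sketch below is a hypothetical helper (`dataloader_settings`, standard-library `configparser` only) that treats a missing section or blank key as an error, which can help when diagnosing crawls that never start.

```python
import configparser


def dataloader_settings(text: str) -> tuple[str, str]:
    """Parse common.ini contents and return (base_url, auth_token).

    Raises ValueError if the [rhein_insights] section or a key is missing or blank.
    """
    # interpolation=None so tokens containing "%" are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("rhein_insights"):
        raise ValueError("common.ini has no [rhein_insights] section")
    section = parser["rhein_insights"]
    values = []
    for key in ("base_url", "auth_token"):
        value = section.get(key, "").strip()
        if not value:
            raise ValueError(f"[rhein_insights] {key} is missing or empty")
        values.append(value)
    return values[0], values[1]
```

On the cluster, this could be run as `dataloader_settings(open("/etc/squirro/common.ini").read())`; only the `[rhein_insights]` section is inspected.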

Advanced Options

The following plugin arguments can be tuned when configuring the data source. All are optional and have sensible defaults for most deployments.

Option	Default	Description
`vendor`	`rheininsights`	Vendor name used as the subdirectory under buffer paths for content and principal state files.
`es_bulk_batch_size`	`1000`	Batch size for bulk-indexing ACL user data into Elasticsearch. Decrease this value if users belong to many groups on average; smaller batches reduce server load and help prevent bulk-indexing timeouts.
`idle_timeout`	`10`	Seconds to wait after the last new file before stopping the file monitor. Lower values make the plugin react faster but may miss files that arrive slightly late.
`initial_timeout`	`20`	Seconds to wait for the first content file to appear after triggering a crawl. If no file arrives within this window, the crawl is considered empty and ingestion stops.
`max_timeout`	`120`	Hard upper limit in seconds for monitoring content files. The plugin stops regardless of activity once this limit is reached.
`acl_es_index_prefix`	`rhein_insights_users`	Prefix for the Elasticsearch index that stores user/group ACL data. The same prefix is used when the source is deleted, to locate and clean up the associated ACL index.
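The interplay of `initial_timeout`, `idle_timeout`, and `max_timeout` can be illustrated with a small simulation. This is a sketch under stated assumptions, not the plugin's file monitor: it assumes arrival times are known up front, that `initial_timeout` and `max_timeout` are measured from the crawl trigger, and that `idle_timeout` restarts with each new file. The function names are hypothetical.

```python
def _next_deadline(accepted: list[float], initial_timeout: float,
                   idle_timeout: float, max_timeout: float) -> tuple[float, str]:
    """Return when monitoring would stop if no new file arrived, and which timer fires."""
    if accepted:
        # After the first file, the idle timer restarts with every new file.
        deadline, reason = accepted[-1] + idle_timeout, "idle_timeout"
    else:
        # Before the first file, only the initial window applies.
        deadline, reason = initial_timeout, "initial_timeout"
    # The hard limit wins whenever it would fire first (or at the same moment).
    if deadline >= max_timeout:
        return max_timeout, "max_timeout"
    return deadline, reason


def monitor(arrivals: list[float], initial_timeout: float = 20,
            idle_timeout: float = 10, max_timeout: float = 120) -> tuple[list[float], str]:
    """Simulate which content files are collected after a crawl is triggered.

    `arrivals` are file arrival times in seconds after the trigger. Returns the
    accepted arrival times and the timer that ended monitoring.
    """
    accepted: list[float] = []
    for t in sorted(arrivals):
        deadline, reason = _next_deadline(accepted, initial_timeout, idle_timeout, max_timeout)
        if t > deadline:
            # This file, and every later one, arrives after monitoring stopped.
            return accepted, reason
        accepted.append(t)
    return accepted, _next_deadline(accepted, initial_timeout, idle_timeout, max_timeout)[1]
```

For example, `monitor([5, 12], idle_timeout=3)` returns `([5], "idle_timeout")`: the second file arrives 7 seconds after the first, so a lower idle timeout finishes sooner but misses it.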

ACL Enforcement

The RheinInsights connector syncs access control data alongside content. During each ingestion run, a principal crawl is triggered in parallel with the content crawl. Principal state files describe which users belong to which groups. This data is stored in a dedicated Elasticsearch index ({acl_es_index_prefix}-{source_id}) and is used at search time to restrict results to the items a given user is permitted to see.
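The index naming scheme and the search-time check can be sketched as follows. This is a hypothetical illustration: the function names are invented, and it assumes each content item carries a list of allowed principal IDs (an `acl` field) that is matched against the user's `allow_acl` entries. The connector's actual matching logic is not described on this page.

```python
def acl_index_name(source_id: str, prefix: str = "rhein_insights_users") -> str:
    """Name of the per-source ACL index: {acl_es_index_prefix}-{source_id}."""
    return f"{prefix}-{source_id}"


def visible_items(items: list[dict], user_allow_acl: list[str]) -> list[dict]:
    """Keep only items whose hypothetical "acl" principals overlap the user's allow_acl."""
    principals = set(user_allow_acl)
    # Items without an "acl" field are treated as not visible in this sketch.
    return [item for item in items if principals.intersection(item.get("acl", []))]
```

Because `allow_acl` already contains the user ID and the user's groups, a single set intersection covers both per-user and per-group grants.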

Each user document in the ACL index stores the user ID and all parent group IDs in the allow_acl field. Users that no longer exist in the principal crawl results are deleted from the index incrementally.
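The incremental update described above can be sketched in three steps: expand each user's groups to all parent groups, compute which previously indexed users have disappeared, and split the writes into batches of `es_bulk_batch_size`. This is a hypothetical sketch: the input shapes (`memberships` and `group_parents` dictionaries) are assumptions, as is reading "all parent group IDs" as the transitive closure over nested groups.

```python
from itertools import islice
from typing import Iterable, Iterator


def ancestor_groups(groups: Iterable[str], group_parents: dict[str, list[str]]) -> set[str]:
    """Return the given groups plus every group they are nested in (cycle-safe)."""
    seen: set[str] = set()
    stack = list(groups)
    while stack:
        group = stack.pop()
        if group in seen:
            continue
        seen.add(group)
        stack.extend(group_parents.get(group, []))
    return seen


def build_user_docs(memberships: dict[str, list[str]],
                    group_parents: dict[str, list[str]]) -> dict[str, dict]:
    """Map user ID -> document whose allow_acl holds the user ID and all parent groups."""
    docs = {}
    for user, groups in memberships.items():
        acl = {user} | ancestor_groups(groups, group_parents)
        docs[user] = {"allow_acl": sorted(acl)}
    return docs


def stale_users(indexed: Iterable[str], current: Iterable[str]) -> set[str]:
    """Users still in the index but absent from the latest principal crawl."""
    return set(indexed) - set(current)


def batches(items: Iterable, size: int = 1000) -> Iterator[list]:
    """Yield lists of at most `size` items, mirroring es_bulk_batch_size."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk
```

Users with many nested groups produce large `allow_acl` lists, so each batch grows in bytes as well as in document count; this is why lowering `es_bulk_batch_size` helps in such deployments.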