Data Virtualization#
The Squirro Data Virtualization add-on extends the capabilities of Retrieval Augmented Generation (RAG) for your AI workflows by providing frictionless access to a vast array of structured and unstructured data, in real-time or near real-time.
Understanding the Data Virtualization Add-on#
The Squirro Data Virtualization add-on leverages modern data virtualization techniques, giving access to a unified view of data from multiple sources. Easily create API endpoints capable of answering queries by rerouting them to the appropriate underlying data sources and allowing you to aggregate, join, and transform data from anywhere in real time. Out of the box, Data Virtualization supports the most widely used databases, data formats, and data lakes.
One of the key benefits of the add-on is that it eliminates the need for data ingestion and indexing.
API Management Interface#
Squirro provides a range of predefined API endpoints and queries that cover common use cases, allowing you to get started quickly. The system offers a lot of flexibitiy, so you can edit the default ones and adjust them according to your specific requirements or create brand new ones to cover specific use cases unique to your organization.
Refer to the in-product documentation for more information on navigating and using the API management interface.
Configuration file#
To activate the add-on, you need to adjust the settings in the configuration file, as described in the following steps. Note that server-level administrator privileges are required to complete these instructions.
Connect to the server via SSH and create the
deployments
directory.
sudo mkdir /etc/squirro/genai.d/deployments
Get the
default.yaml
configuration file associated with your Squirro platform version. If you do not already have it, visit the Squirro Support website and submit a technical support request.Edit the YAML file and locate the following line.
- $runnable: data/tools/raw_labs
Adjust the following settings:
enabled
- Activate the tool by changing the value totrue
.api_prefix
- Add the value for the corresponding branch associated with your API endpoint,"/main/"
for example. Note that the value for the parameter must be in between forward slashes.specs_url
- Insert the URL for the OpenAPI specifications.group_by_tags
- When set totrue
it activates the hierarchical workflow.description
- The input passed to the LLM, informing the model what the tool is and its intended use.tags_descriptions
- List all the tags associated with their description. They direct the LLM to the appropriate API endpoint based on its analysis of the user input.
Save your edits and upload the file to the deployments folder.
Recursively give ownership of the deployments folder to the
sqgenai
user andsquirro
group.
sudo chown -R sqgenai:squirro /etc/squirro/genai.d/deployments
Create the
docker-override.env
file inside the/etc/squirro/genai.d/
directory if it does not already exist. That file stores the environment variables used to configure the add-on. This approach ensures that your environment variables are preserved and continue to work as expected after updates or upgrades of the Squirro components.Edit the
docker-override.env
file and add a new line for the API key. Ensure each line is in the formatVARIABLE_NAME=VALUE
without spaces around the equal sign.
RAW_API_KEY=JT7dZE9cxF4yCZ6heBDzTVJhQywfThisIsAnExample
Save your edits.
Restart the
sqgenaid
service.
sudo systemctl restart sqgenaid
Example#
Here is an extract of a configuration file:
- $runnable: data/tools/raw_labs
enabled: true
api_prefix: "/main/"
specs_url: "https://prod-zurich-212.api.raw-labs.com/raw/2/api/admin/endpoints/open-api?file_format=json&security_scheme=api_key"
group_by_tags: false
description: |
Retrieves information, especially structured data, from a variety of APIs. Use this tool whenever it looks like you need to access real time data or any other data that might be stored in a database.
Data to be obtained from here is:
- Information from our Salesforce.com CRM, including leads / SQLs, customers / accounts, opportunities.
- Information about our physical office spaces, including restaurants in the vicinity.
tags_descriptions:
salesforce: Information about our customers from our Salesforce.com CRM
lead: Access data about leads including sales qualified leads (SQLs)
opportunity: Information about Salesforce opportunities
office: Information about our physical office spaces, restaurants
cluster:
$placeholder: squirro_cluster
env_lookup: "SQUIRRO_CLUSTER"
refresh_token:
$placeholder: squirro_refresh_token
env_lookup: "SQUIRRO_REFRESH_TOKEN"
openai_chat_model:
model:
$placeholder: llm_name
env_lookup: "OPENAI_MODEL"
api_key:
$placeholder: llm_api_key
env_lookup: "OPENAI_API_KEY"
temperature:
$placeholder: llm_temperature
base_url:
$placeholder: llm_base_url
env_lookup: "OPENAI_API_BASE"
top_p:
$placeholder: llm_top_p
provider:
$placeholder: llm_provider
Getting Started#
To activate the Data Virtualization add-on for your Squirro instance, contact Squirro Support and submit a technical support request. Once your system is ready, our solutions engineers will assist with the setup process to ensure a smooth integration and optimal configuration.