Data Virtualization

Data Virtualization#

The Squirro Data Virtualization add-on extends the capabilities of Retrieval Augmented Generation (RAG) for your AI workflows by providing frictionless access to a vast array of structured and unstructured data, in real-time or near real-time.

Understanding the Data Virtualization Add-on#

The Squirro Data Virtualization add-on leverages modern data virtualization techniques, giving access to a unified view of data from multiple sources. Easily create API endpoints capable of answering queries by rerouting them to the appropriate underlying data sources and allowing you to aggregate, join, and transform data from anywhere in real time. Out of the box, Data Virtualization supports the most widely used databases, data formats, and data lakes.

One of the key benefits of the add-on is that it eliminates the need for data ingestion and indexing.

API Management Interface#

Squirro provides a range of predefined API endpoints and queries that cover common use cases, allowing you to get started quickly. The system offers a lot of flexibitiy, so you can edit the default ones and adjust them according to your specific requirements or create brand new ones to cover specific use cases unique to your organization.

Squirro API Management Platform for the Data Virtualization

Refer to the in-product documentation for more information on navigating and using the API management interface.

Configuration file#

To activate the add-on, you need to adjust the settings in the configuration file, as described in the following steps. Note that server-level administrator privileges are required to complete these instructions.

Connect to the server via SSH and create the deployments directory.

sudo mkdir /etc/squirro/genai.d/deployments

Get the default.yaml configuration file associated with your Squirro platform version. If you do not already have it, visit the Squirro Support website and submit a technical support request.
Edit the YAML file and locate the following line.

- $runnable: data/tools/raw_labs

Adjust the following settings:
1. enabled - Activate the tool by changing the value to true.
2. api_prefix - Add the value for the corresponding branch associated with your API endpoint, "/main/" for example. Note that the value for the parameter must be in between forward slashes.
3. specs_url - Insert the URL for the OpenAPI specifications.
4. group_by_tags - When set to true it activates the hierarchical workflow.
5. description - The input passed to the LLM, informing the model what the tool is and its intended use.
6. tags_descriptions - List all the tags associated with their description. They direct the LLM to the appropriate API endpoint based on its analysis of the user input.
Save your edits and upload the file to the deployments folder.
Recursively give ownership of the deployments folder to the sqgenai user and squirro group.

sudo chown -R sqgenai:squirro /etc/squirro/genai.d/deployments

Create the docker-override.env file inside the /etc/squirro/genai.d/ directory if it does not already exist. That file stores the environment variables used to configure the add-on. This approach ensures that your environment variables are preserved and continue to work as expected after updates or upgrades of the Squirro components.
Edit the docker-override.env file and add a new line for the API key. Ensure each line is in the format VARIABLE_NAME=VALUE without spaces around the equal sign.

RAW_API_KEY=JT7dZE9cxF4yCZ6heBDzTVJhQywfThisIsAnExample

Save your edits.
Restart the sqgenaid service.

sudo systemctl restart sqgenaid

Example#

Here is an extract of a configuration file:

- $runnable: data/tools/raw_labs
  enabled: true
  api_prefix: "/main/"
  specs_url: "https://prod-zurich-212.api.raw-labs.com/raw/2/api/admin/endpoints/open-api?file_format=json&security_scheme=api_key"
  group_by_tags: false
  description: |
      Retrieves information, especially structured data, from a variety of APIs. Use this tool whenever it looks like you need to access real time data or any other data that might be stored in a database.
      Data to be obtained from here is:
          - Information from our Salesforce.com CRM, including leads / SQLs, customers / accounts, opportunities.
          - Information about our physical office spaces, including restaurants in the vicinity.
  tags_descriptions:
      salesforce: Information about our customers from our Salesforce.com CRM
      lead: Access data about leads including sales qualified leads (SQLs)
      opportunity: Information about Salesforce opportunities
      office: Information about our physical office spaces, restaurants
  cluster:
      $placeholder: squirro_cluster
      env_lookup: "SQUIRRO_CLUSTER"
  refresh_token:
      $placeholder: squirro_refresh_token
      env_lookup: "SQUIRRO_REFRESH_TOKEN"
  openai_chat_model:
    model:
      $placeholder: llm_name
      env_lookup: "OPENAI_MODEL"
    api_key:
      $placeholder: llm_api_key
      env_lookup: "OPENAI_API_KEY"
    temperature:
      $placeholder: llm_temperature
    base_url:
      $placeholder: llm_base_url
      env_lookup: "OPENAI_API_BASE"
    top_p:
      $placeholder: llm_top_p
    provider:
      $placeholder: llm_provider

Getting Started#

To activate the Data Virtualization add-on for your Squirro instance, contact Squirro Support and submit a technical support request. Once your system is ready, our solutions engineers will assist with the setup process to ensure a smooth integration and optimal configuration.