PoolParty Knowledge Graph#

This page outlines how the structured expressiveness of a PoolParty knowledge graph can be combined with the unstructured text understanding of Squirro’s Insight Engine by integrating the two technologies.

Together, they create a semantic search experience that returns more relevant results than matching on text alone.

Reference: Learn more about the basics of knowledge graphs or the semantic AI company providing the knowledge graph integration, PoolParty.

The Query Triangle#

The foundation of a knowledge graph and insight engine integration is Content, Context, and Intent.

This is known as the Query Triangle.

  • Content is the user’s actual query terms. Applying context and intent to the content improves result quality compared with returning only textually matching documents.

  • Context is generally defined by the user’s role or team assignments. This allows for personalization of results.

  • Intent is the workflow goal a user has. Are they trying to find a specific document, research a topic generally, or do they need an answer to a specific question?

Query Triangle Diagram

Squirro’s out-of-the-box query processing can extract intent and context by learning from content and user behavior.

Layering a knowledge graph on top of query processing workflows further improves search quality.

Most importantly, where a knowledge graph already exists, the integration works immediately without the need for initial machine learning training.

Reference: Learn more about Squirro Query Processing.

Integration Features#

The following features become available once the two systems are integrated, all of which lead to more relevant search results:

  • Mapping of query terms to their graph concepts. This makes search more precise for concepts that are represented in the graph.

  • Synonym handling for concepts. A graph’s labels can be used to handle synonyms of concepts.

  • Expanded queries using concept relationships. The relationships between graph concepts are used to expand the user’s queries automatically and transparently.

  • Role-based concept expansion. User roles can be represented in the graph with relationships to concepts. This way specific concepts (and their documents) can be boosted in relevance or, if desired, the search scope can be limited to matching concepts.
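To make the first three features more concrete, here is a minimal Python sketch of mapping query terms to concepts through their labels and expanding them along a relationship. It is not the integration’s code: the `Concept` class, the example.org URIs, and the choice to follow narrower concepts are illustrative assumptions, and the real integration calls the PoolParty Extractor and Recommender APIs instead.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical in-memory stand-in for a graph concept."""
    uri: str
    pref_label: str
    alt_labels: list = field(default_factory=list)  # synonyms (altLabel)
    narrower: list = field(default_factory=list)    # URIs of narrower concepts


def expand_query(query, concepts):
    """Map query terms to concept URIs via their labels, then add narrower concepts."""
    text = query.lower()
    result = []
    for concept in concepts:
        labels = [concept.pref_label, *concept.alt_labels]
        # Whole-word match on any label, so every synonym maps to the same concept.
        if any(re.search(r"\b" + re.escape(label.lower()) + r"\b", text) for label in labels):
            for uri in [concept.uri, *concept.narrower]:
                if uri not in result:
                    result.append(uri)
    return result


concepts = [
    Concept("https://example.org/ev", "electric vehicle", ["EV"], ["https://example.org/bev"]),
    Concept("https://example.org/bev", "battery electric vehicle", ["BEV"]),
]
print(expand_query("EV sales in Europe", concepts))
# ['https://example.org/ev', 'https://example.org/bev']
```

The query never mentions battery electric vehicles, yet documents tagged with that narrower concept become reachable through the synonym "EV".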

Prerequisites#

This page will walk you through setting up a Squirro/PoolParty integration.

You will need the following components to successfully set up the integration:

Architecture#

To understand the integration of the two different technologies, it’s important to first understand the individual components before looking at the joint architecture.

Squirro Item Model#

Gather, Understand, Act Diagram

Squirro indexes predominantly unstructured data. Each unique object from a source system, such as a document, call log, or news story, is modeled as an Item.

Reference: Learn more about Item Format.

These items can have labels attached to them, either at the document level or at the sentence level. In the Squirro context, sentence-level labels are usually referred to as entities.

Labels can be derived from the original data being indexed into Squirro or be attached to the items with a variety of different enrichments in the Squirro pipeline.

Reference: Learn more about Labels.

AI Studio#

An important component in the context of a PoolParty integration is AI Studio, whose text classification models classify sentences and documents into different topics using a supervised classification approach.

Reference: Learn more about the AI Studio.

PoolParty Knowledge Graph#

PoolParty’s Taxonomy Editor is built on top of an RDF graph database.

The default ontology used by PoolParty is SKOS, the Simple Knowledge Organization System.

In SKOS, every knowledge object is represented as a Concept. Every concept is represented by a URI as its unique identifier.

Reference: Learn more about SKOS.

These concepts can have a multitude of properties attached.

In the context of the Squirro integration, the most important ones are the relationships (broader, narrower) and the labels (prefLabel, altLabel, etc.).
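Because the graph is stored as RDF, each of these properties is a triple of subject, predicate, and object. The following Python sketch uses plain tuples instead of an RDF library, with hypothetical example.org concept URIs, to show labels and a broader relationship as triples. It also derives narrower concepts from broader statements, since SKOS defines skos:narrower as the inverse of skos:broader.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical example graph: each statement is a (subject, predicate, object) triple.
triples = [
    ("https://example.org/vehicles", SKOS + "prefLabel", "Vehicles"),
    ("https://example.org/ev", SKOS + "prefLabel", "Electric vehicle"),
    ("https://example.org/ev", SKOS + "altLabel", "EV"),
    ("https://example.org/ev", SKOS + "broader", "https://example.org/vehicles"),
]


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def narrower(triples, concept):
    """Derive narrower concepts from skos:broader statements (the two are inverses)."""
    return [s for s, p, o in triples if p == SKOS + "broader" and o == concept]


print(objects(triples, "https://example.org/ev", SKOS + "altLabel"))  # ['EV']
print(narrower(triples, "https://example.org/vehicles"))  # ['https://example.org/ev']
```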

Joint Architecture#

The joint architecture combines different Squirro and PoolParty services to enable a single search experience, as shown in the diagram below:

Joint Architecture Squirro and PoolParty

One of the primary goals of the integration is to make Concepts a first-class object in the Squirro Insight Engine.

To achieve this, the integration works on two levels:

  • Index time

  • Query time

As part of the indexing, all content is run through a combination of Squirro’s AI Studio and the PoolParty Extractor, as visualized in the chart below:

Indexing Diagram

As a result, all matching concept URIs are attached to the Squirro items before they are indexed.
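As an illustration of this step, the sketch below adds concept URIs to an item represented as a Python dictionary. It assumes labels live in a `keywords` dictionary that maps label names to lists of values, following Squirro’s item format, and uses the `concepts` label described in the Communities section. The helper name, item content, and URIs are hypothetical.

```python
def attach_concepts(item, concept_uris, label="concepts"):
    """Add concept URIs to an item's label values, skipping duplicates."""
    keywords = item.setdefault("keywords", {})
    values = keywords.setdefault(label, [])
    for uri in concept_uris:
        if uri not in values:
            values.append(uri)
    return item


item = {"title": "Example headline", "body": "<p>Example body text.</p>"}
attach_concepts(item, ["https://example.org/ev", "https://example.org/bev"])
attach_concepts(item, ["https://example.org/ev"])  # already present, not added again
print(item["keywords"])
# {'concepts': ['https://example.org/ev', 'https://example.org/bev']}
```

Deduplicating matters because the Extractor, AI Studio, and Recommender steps can all identify the same concept for one item.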

News Story Example#

To visualize this enrichment, consider the following example document indexed in Squirro.

Step 1: Indexing

Without any enrichment, the following news story is represented as a Squirro item with its title, date, link, and body:

Step 2: AI Studio Text Classification

As a second step, the text is run through text classification models in Squirro’s AI Studio.

This results in the text content being labeled with some signals:

Step 3: Entities to Concepts

These signals are now linked to the knowledge graph in the “Entities to Concepts” step:

Step 4: PoolParty Extractor Signal Analysis and Linking

Now, the PoolParty Extractor analyzes the signals’ text to link to additional concepts from the graph:

Querying#

As part of the querying process, the user’s query is run through the PoolParty Extractor and a new PoolParty Recommender API, as shown in the diagram below:

Querying Diagram

This creates a list of concept URIs that are then used as the query against the search index.
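A minimal sketch of turning that list into a query string is shown below. It assumes that each URI is matched against the `concepts` label with a quoted `label:"value"` clause and that the clauses are joined with `OR`; verify the exact form against the Squirro query syntax documentation. In practice, the query processing workflow builds this query for you.

```python
def concepts_query(concept_uris, label="concepts"):
    """Build one quoted label clause per concept URI and OR them together."""
    return " OR ".join(f'{label}:"{uri}"' for uri in concept_uris)


print(concepts_query(["https://example.org/ev", "https://example.org/bev"]))
# concepts:"https://example.org/ev" OR concepts:"https://example.org/bev"
```

Quoting each URI keeps characters such as `:` and `/` inside the value rather than letting the query parser interpret them.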

Squirro Setup#

This section explains how to set up a Squirro instance and connect it with a PoolParty taxonomy.

Prerequisites#

The following general steps are assumed to have been completed already. As they are not specific to this integration, they are not covered on this page.

  1. Set up a Squirro instance and a Squirro project. For the Squirro project, a Cognitive Search template is recommended.

    Reference: For private cloud or local installations, see Install and Manage Squirro with Ansible. To install a Cognitive Search template on Squirro’s managed cloud service, see How to Install A Squirro Application.

  2. Set up multiple AI Studio models and integrate them into the main processing pipeline.

    Reference: Learn more about the AI Studio.

  3. Set up the relevant data connections and load data into the project.

    Reference: See How to Load Data Using the UI for more information.

Upload Custom Components#

Some components are provided as a separate package.

Reference: Download the package. This will require your license key as the username and password.

To upload these to your Squirro project, follow the steps below:

  1. Extract the downloaded package locally. This should create a folder called squirro-poolparty which contains a project template (squirro-poolparty.sqproj) and a studio plugin (poolparty_communities).

  2. Open your Squirro project and navigate to Setup → Settings → Import Project.

  3. Drag and drop the project template squirro-poolparty.sqproj into the drop area.

  4. Click the Import button.

Configuration Values#

Various Squirro enrichments and query steps need access to the PoolParty server.

To manage the access credentials and configuration centrally in the configuration service, follow the steps below:

  1. Navigate to the Setup space.

  2. Click Settings → Project Configuration.

  3. To add a new value, press the plus button on the top right.

  4. Add the following configurations:

| Configuration Name | Data Type | Description |
|---|---|---|
| poolparty.server | String | The PoolParty server URL, for example, https://acme.poolparty.biz. |
| poolparty.recommender-server | String | The PoolParty recommendation service URL, for example, https://acme-recommender.poolparty.biz. |
| poolparty.username | String | The username to access the PoolParty APIs. |
| poolparty.password | String | The password to access the PoolParty APIs. |
| poolparty.project-id | String | The PoolParty project identifier. See Locate Your Project Identifier for how to obtain it. |
| poolparty.signals-type-uri | String | The RDF type used to model Squirro signals in the knowledge graph. See the PoolParty Setup section later on this page for details. |
| poolparty.concept-mapping | Dictionary | The properties to return when looking up communities in the user interface. See the Configuring Additional Properties section later on this page for details. |

PoolParty Extractor#

The PoolParty Extractor pipeline step uses PoolParty’s extractor API to identify graph concepts that are mentioned in the documents being indexed.

The best-matching concept URIs are then added to the Squirro item as document-level labels.

This is the foundation for the concept-based query solution.

Note

The PoolParty extractor is a pipelet. To learn more about pipelets, Squirro’s custom Python-based data processing plugins, see Pipelets.

Adding the Pipelet#

To use this pipelet, add it to the relevant pipeline workflow (or workflows).

Tip

If you are using the default Squirro Cognitive Search template, a good workflow to use for this is the NLP-tagger workflow, which is shared between the Document Handling and Standard workflows.

This pipelet should run before all of the classifications, so add it to the Relate section, then edit its configuration.

Pipelet Configuration#

All of the configuration options, from the server name up to and including the password, have a default value that ensures the configuration is fetched from the configuration service.

This ensures that the values entered earlier into the project configuration are used here.

Note: The password field is masked. However, its default value is ${poolparty.password}, which fetches the corresponding value from the project configuration.
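The placeholder behavior can be pictured with the following Python sketch, which replaces each `${name}` reference with the value from a dictionary standing in for the project configuration. It illustrates the behavior only and is not Squirro’s implementation; the `resolve` helper and the configuration values are hypothetical.

```python
import re

# Matches ${name}, where name may contain dots and hyphens (e.g. poolparty.project-id).
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(value, config):
    """Replace each ${name} placeholder with the matching configuration value."""
    def lookup(match):
        name = match.group(1)
        if name not in config:
            raise KeyError(f"missing project configuration value: {name}")
        return str(config[name])
    return PLACEHOLDER.sub(lookup, value)


# Hypothetical project configuration and pipelet defaults.
config = {"poolparty.server": "https://acme.poolparty.biz", "poolparty.password": "example-password"}
defaults = {"server": "${poolparty.server}", "password": "${poolparty.password}"}
resolved = {key: resolve(value, config) for key, value in defaults.items()}
print(resolved["server"])  # https://acme.poolparty.biz
```

Keeping only the placeholder in the pipelet means a password change in the project configuration takes effect without editing each pipeline step.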

Number of Concepts to Add Per Class

The option Number of concepts to add per class limits the number of concepts tagged on each item.

The purpose is to tag an item only with the most relevant concepts found in the document, for example, the most frequently mentioned ones.

The limit does not apply to concepts identified in the document title, as they are assumed to always be relevant.

Tip: Set the value to 0 to always tag all concepts.

Limit Concept Tagging to These Classes

If Limit concept tagging to these classes is defined, only concepts of those classes are added to the concepts list. Additionally, the Number of concepts to add per class limit is then applied per class rather than overall.

This option is useful to make concept tagging more relevant in very broad knowledge graphs where some concepts are not to be used for searches by the users.
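The following Python sketch shows one way to read these two options together. It is an interpretation, not the pipelet’s code: it assumes each extracted mention is a `(uri, class, in_title)` tuple, ranks concepts by how often they are mentioned, always keeps title concepts without counting them against the limit, and treats 0 as "no limit". The URIs are hypothetical.

```python
from collections import Counter


def select_concepts(mentions, max_per_class, classes=None):
    """Pick the concepts to tag on an item.

    mentions: (concept_uri, concept_class, in_title) tuples, one per occurrence.
    max_per_class: keep at most this many body concepts per group; 0 keeps all.
    classes: optional allow-list; when given, the limit applies per class.
    """
    if classes is not None:
        mentions = [m for m in mentions if m[1] in classes]
    title_uris = {uri for uri, _, in_title in mentions if in_title}
    class_of = {uri: cls for uri, cls, _ in mentions}
    counts = Counter(uri for uri, _, _ in mentions)

    # Group body concepts by class (or into a single group), most frequent first.
    groups = {}
    for uri, _ in counts.most_common():
        if uri in title_uris:
            continue  # title concepts are always kept and not counted
        key = class_of[uri] if classes is not None else None
        groups.setdefault(key, []).append(uri)

    selected = set(title_uris)
    for uris in groups.values():
        selected.update(uris if max_per_class == 0 else uris[:max_per_class])
    return sorted(selected)


mentions = [
    ("https://example.org/acme", "Organization", True),
    ("https://example.org/globex", "Organization", False),
    ("https://example.org/globex", "Organization", False),
    ("https://example.org/initech", "Organization", False),
    ("https://example.org/gadget", "Product", False),
    ("https://example.org/gadget", "Product", False),
    ("https://example.org/widget", "Product", False),
]
print(select_concepts(mentions, 1))
# ['https://example.org/acme', 'https://example.org/globex']
print(select_concepts(mentions, 1, classes={"Organization", "Product"}))
# ['https://example.org/acme', 'https://example.org/gadget', 'https://example.org/globex']
```

Without a class list, a single limit of 1 leaves room for only one body concept overall; with classes set, each class keeps its own top concept.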

Custom Relations to Expand

The Custom relations to expand option can be used to tag concepts linked in the taxonomy. This is in addition to the SKOS broader relationship which is always expanded.

An example of this could be to also tag every identified product with the company that manufactures it.

This could be achieved with a relationship URI such as https://acme.poolparty.biz/Organizations/isProductOf (a placeholder that must be adjusted to your own taxonomy).
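A minimal Python sketch of this expansion is shown below. It assumes the graph is available as a dictionary from `(concept, predicate)` pairs to related concept URIs, and it follows skos:broader and each custom relation for one hop only; whether the pipelet follows relations further is not specified here. The example.org URIs are hypothetical, and the isProductOf URI is the placeholder from the text.

```python
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"
IS_PRODUCT_OF = "https://acme.poolparty.biz/Organizations/isProductOf"  # placeholder relation


def expand_relations(concept_uris, graph, custom_relations=()):
    """Add concepts reachable in one hop via skos:broader and each custom relation."""
    predicates = [SKOS_BROADER, *custom_relations]
    result = list(concept_uris)
    for uri in concept_uris:
        for predicate in predicates:
            for related in graph.get((uri, predicate), []):
                if related not in result:
                    result.append(related)
    return result


# Hypothetical graph: (subject, predicate) -> related concept URIs.
graph = {
    ("https://example.org/widget-x", SKOS_BROADER): ["https://example.org/widgets"],
    ("https://example.org/widget-x", IS_PRODUCT_OF): ["https://example.org/acme-corp"],
}
print(expand_relations(["https://example.org/widget-x"], graph, [IS_PRODUCT_OF]))
# ['https://example.org/widget-x', 'https://example.org/widgets', 'https://example.org/acme-corp']
```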

Entities to Concepts#

The second pipeline step to add is the Entities to Concepts pipelet.

This step runs after all the AI Studio model classifications have executed and does two things:

  1. It maps the extracted signals to their corresponding signal URIs in the knowledge graph. Additionally, for matching signals, all SKOS broader relationships up to the top of the knowledge tree are added.

Important

For this first step to work correctly, the poolparty.signals-type-uri configuration value needs to be set to the URI which is used as the type of all signal concepts in the graph.

  2. For each extracted signal, it uses the PoolParty Recommender API to add additional matching concepts. All of the identified concept URIs are then added to the Squirro item as document-level labels.

Note

This second step adds concepts much more aggressively than the previous one, but only does so for text around extracted signals.
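The first of the two steps can be sketched in Python as follows. The sketch assumes signals are matched to graph concepts by their skos:prefLabel and that signal concepts carry an rdf:type equal to the poolparty.signals-type-uri value; the type URI, concept URIs, and signal name are hypothetical, and the Recommender API call of the second step is omitted.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"


def signal_concepts(signal, triples, signals_type_uri):
    """Map a signal name to its concept URI, then add all broader concepts up to the top."""
    signal_uris = {s for s, p, o in triples if p == RDF_TYPE and o == signals_type_uri}
    matches = [s for s, p, o in triples
               if s in signal_uris and p == SKOS_PREF_LABEL and o.lower() == signal.lower()]
    result, queue = [], matches[:1]
    while queue:
        uri = queue.pop(0)
        if uri in result:
            continue  # guards against cycles and shared ancestors
        result.append(uri)
        queue.extend(o for s, p, o in triples if s == uri and p == SKOS_BROADER)
    return result


SIGNAL_TYPE = "https://example.org/schema/Signal"  # hypothetical poolparty.signals-type-uri
triples = [
    ("https://example.org/signals/layoffs", RDF_TYPE, SIGNAL_TYPE),
    ("https://example.org/signals/layoffs", SKOS_PREF_LABEL, "Layoffs"),
    ("https://example.org/signals/layoffs", SKOS_BROADER, "https://example.org/signals/workforce"),
    ("https://example.org/signals/workforce", SKOS_BROADER, "https://example.org/signals/corporate-events"),
]
print(signal_concepts("layoffs", triples, SIGNAL_TYPE))
```

The type check explains why the Important note above matters: if poolparty.signals-type-uri does not match the type used in the graph, no signal concept is found and nothing is added.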

Adding the Pipelet#

To use this pipelet, add it to the relevant pipeline workflow (or workflows). If you are using the default Cognitive Search template, a good workflow to use for this is the NLP-tagger, which is shared between the Document Handling and Standard workflows.

This pipelet should run after all of the classifications, so add it to the Recommend section, then edit its configuration.

Pipelet Configuration#

All of the configuration options have a default which ensures the configuration is fetched from the configuration service.

This ensures that the values entered earlier into the project configuration are used here.

Note: The password field is masked. However, its default value is ${poolparty.password}, which fetches the corresponding value from the project configuration.

Query Processing Workflow#

When a query is entered, it is expanded to identify the concepts mentioned in the input text. These concepts are then used to search against the previously added labels. This query expansion is done through a query processing workflow.

Reference: Learn more about Query Processing Workflows.

The project template imported earlier (squirro-poolparty.sqproj) already contains the correct query processing workflow.

To enable it, follow the steps below:

  1. Access the ML Workflows project section by navigating to Setup → AI Studio → ML Workflows.

  2. Find the workflow titled PoolParty Query Pipeline in the table.

  3. Click the Set Active button for that workflow. With this change, all the user queries are now processed through PoolParty and expanded with the knowledge graph concepts.

  4. After this is completed, open the project settings again, and find the setting topic.search.query-workflow-timeout.

The default timeout for query processing is 500 milliseconds. With a large knowledge graph, the expansion may currently take more time than that.

Increase the workflow timeout to account for this if needed.

Communities#

Communities is a feature which allows end users to personalize their Squirro experience by following topics of interest.

Reference: Learn more about Communities.

Setting up Communities#

The PoolParty Extractor step creates labels on the indexed documents for each configured class, tagging the items with the URIs of the matching concepts.

Additionally, the concepts label is always defined and contains all matching concepts.

Squirro communities can now be set up to provide additional metadata for each major concept class. Let’s assume for example that Limit concept tagging to these classes was set to Organization,Product to create labels for all organizations and products on the Squirro items.

Based on these, communities can be created as follows:

  1. Navigate to Setup → Data → Communities.

  2. Press the Plus button at the top right to add a new community topic.

  3. Enter a name, for example, Organizations.

  4. Select the label this should map to, for example, organization.

  5. Keep the rest of the values as default. This means Fetch label values should be selected and Automatically label ingested data with these communities must not be selected.

  6. Press Next.

After you complete these steps, the communities are imported and show a list of URIs.

Repeat these steps for any other type you would like to expose to the user.

Community Augmentation#

If you go to the Dashboard space and open the Communities at this point, a list of URIs is shown to the user.

The next step will change this to show the actual community names and additional information.

Querying Diagram

Reference: Learn more about community augmentation.

Install Community Augmentation Plugin#

The first step is to upload the studio plugin that was downloaded initially together with the project export. Do so by following the steps below:

  1. Install the Squirro Toolbox. For this, see Squirro Toolbox and follow the instructions within Install In a Python Environment.

  2. On the command line, navigate to the folder where the downloaded file has been extracted. From that folder run the following command:

```shell
squirro_asset -v studio_plugin upload --token "$TOKEN" --cluster "$CLUSTER" -f poolparty_communities
```

Important

When running this command, replace $TOKEN and $CLUSTER with the appropriate values as explained in Connecting to Squirro.

  3. If the command runs correctly, it should print the following message:

```
Created or updated studio_plugin - poolparty_communities
```

Enable Community Augmentation#

Next, community augmentation needs to be enabled on the server.

To do so, follow the steps below:

  1. Open your project and navigate to the Setup space.

  2. Select Settings → Project Configuration.

  3. Search for the following two values and set them as shown in this table:

| Configuration Name | Value |
|---|---|
| topic.community.augment-community-data | TRUE |
| topic.community.augment-community-data-url | /studio/poolparty_communities/augment/<your-Squirro-projectID> |

Reference: To find your project ID to insert, see Locate Your Project Identifier.

  4. Refresh the list of communities and verify that the name of each concept is shown correctly. The name is fetched from the PoolParty RDF endpoint when the concept is shown to the user.

Note

These requests are cached, meaning that changes to either the configuration or graph values may not show immediately.

Configuring Additional Properties#

The project configuration poolparty.concept-mapping can be used to return additional properties from the knowledge graph. These can be used by custom widgets to show the values to the user on result pages or in the Community 360 pages.

The dictionary maps each output key, which is returned as a community property, to the predicate URI used to retrieve its value from the concept.

Note: There is currently no support for path traversal in this context.

The following is an example of a value that would retrieve multiple additional properties from the graph:

```json
{
  "name": "skos:prefLabel",
  "photo": "https://acme.poolparty.biz/Organizations/logo",
  "cik": "https://acme.poolparty.biz/Organizations/cik",
  "dateFirstAdded": "https://acme.poolparty.biz/Organizations/dateFirstAdded",
  "foundedInYear": "https://acme.poolparty.biz/Organizations/foundedInYear",
  "isHeadquarteredIn": "https://acme.poolparty.biz/Organizations/isHeadquarteredIn",
  "sector": "https://acme.poolparty.biz/Organizations/sector",
  "subSector": "https://acme.poolparty.biz/Organizations/subSector",
  "symbol": "https://acme.poolparty.biz/Organizations/symbol"
}
```

The name and photo properties have a special meaning as they are used for the main community name and its image respectively.

All the other keys can be chosen freely and will be provided to the community widgets without any changes.

Note

The name defaults to skos:prefLabel if no configuration has been provided.

For the predicate URIs, the following prefixes are defined:

| Prefix | URI |
|---|---|
| dc | http://purl.org/dc/elements/1.1/ |
| dcterms | http://purl.org/dc/terms/ |
| owl | http://www.w3.org/2002/07/owl# |
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| rdfs | http://www.w3.org/2000/01/rdf-schema# |
| skos | http://www.w3.org/2004/02/skos/core# |
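To show how a mapping such as the JSON example is applied to a single concept, here is a minimal Python sketch. It is not the plugin’s code: it expands a subset of the prefixes (`dc`, `rdfs`, `skos`), assumes the concept’s values are already available as a dictionary keyed by full predicate URI, skips keys that have no value, and adds the documented default of skos:prefLabel for `name`. The concept values are hypothetical.

```python
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand_predicate(predicate):
    """Expand 'prefix:local' using PREFIXES; full URIs are returned unchanged."""
    prefix, sep, local = predicate.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return predicate


def community_properties(concept_values, mapping):
    """Apply a concept-mapping dictionary to one concept's predicate values.

    concept_values: full predicate URI -> value, for a single concept.
    mapping: output key -> predicate (prefixed or full URI).
    """
    mapping = {"name": "skos:prefLabel", **mapping}  # name defaults to skos:prefLabel
    result = {}
    for key, predicate in mapping.items():
        uri = expand_predicate(predicate)
        if uri in concept_values:
            result[key] = concept_values[uri]
    return result


concept_values = {
    "http://www.w3.org/2004/02/skos/core#prefLabel": "Acme Corp",
    "https://acme.poolparty.biz/Organizations/symbol": "ACME",
}
mapping = {"symbol": "https://acme.poolparty.biz/Organizations/symbol"}
print(community_properties(concept_values, mapping))  # {'name': 'Acme Corp', 'symbol': 'ACME'}
```

Full URIs such as `https://...` pass through unchanged because `https` is not a defined prefix.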

PoolParty Setup#

For help setting up the PoolParty end of the integration, contact Squirro Support.