TopicApiBaseMixin

class squirro_client.topic.TopicApiBaseMixin

Bases: object

Methods Summary

assert_permission([project_id, …])

Ensure the user has the right permissions on the project.

delete_item(project_id, item_id)

Deletes an item.

delete_pipelet(name)

Delete a pipelet.

get_encrypted_query(project_id[, query, …])

Encrypts and signs the query and returns it.

get_item(project_id, item_id, **kwargs)

Returns the requested item for the provided project.

get_items(project_id, **kwargs)

Returns items for the provided project.

get_pipelet(name)

Return details for one pipelet.

get_pipelets()

Return all available pipelets.

get_projects()

Return all projects.

get_typeahead_suggestions(project_id, …[, …])

Get the typeahead suggestions for a query searchbar_query in the project identified by the id project_id.

get_version()

Get current squirro version and build number.

modify_item(project_id, item_id[, star, …])

Updates the flags, entities, and/or keywords of an item.

modify_items(project_id, items[, …])

Updates the flags and/or keywords of a list of items.

query(project_id[, query, query_context, …])

Returns items for the provided project.

query_entities(project_id[, query, fields, …])

Query entity and return aggregations of some entity fields

query_for_you(project_id[, query, …])

Wraps the query endpoint and adds an additional user-behaviour centric view on the data.

recommend(project_id[, item_id, …])

Returns recommended items for the provided ids or text.

recommend_entities(project_id[, method, …])

Recommend entity property based on input entity properties

recommend_entities_explore(project_id[, …])

Explore results of entity recommendation

recommend_entities_methods(project_id)

Returns the available entity recommendation methods.

recommend_facets(project_id[, method, …])

Recommend facet value based on input facets

recommend_facets_explore(project_id[, …])

Explore results of facet recommendation

recommend_facets_methods(project_id)

Returns the available facet recommendation methods.

recommendation_methods(project_id)

Returns the available recommendation methods.

scan(project_id[, query, scroll, …])

Returns an iterator to scan through all items of a project.

Methods Documentation

assert_permission(project_id=None, user_permissions=None, project_permissions=None)

Ensure the user has the right permissions on the project.

Parameters
  • project_id – Project identifier.

  • user_permissions – User permissions required.

  • project_permissions – Project permissions required.

Returns

True if the permissions are met.

Example:

>>> client.assert_permission('2aEVClLRRA-vCCIvnuEAvQ',
...     user_permissions='admin')

Or with multiple permissions (at least one permission needs to match):

>>> client.assert_permission('2aEVClLRRA-vCCIvnuEAvQ',
...     project_permissions=['items.read', 'project.read'])
delete_item(project_id, item_id)

Deletes an item.

Parameters
  • project_id – Project identifier.

  • item_id – Item identifier.

Example:

>>> client.delete_item(
...     '2aEVClLRRA-vCCIvnuEAvQ', 'haG6fhr9RLCm7ZKz1Meouw')
delete_pipelet(name)

Delete a pipelet.

This will break existing enrichments if they still make use of this pipelet.

Example:

>>> client.delete_pipelet('textrazor')
get_encrypted_query(project_id, query=None, aggregations=None, fields=None, created_before=None, created_after=None, options=None, **kwargs)

Encrypts and signs the query and returns it. If set, the aggregations, created_before, created_after, fields and options are included in the encrypted query as well.

Parameters
  • project_id – Project identifier.

  • query – query to encrypt.

For additional parameters see self.query().

Returns

A dictionary which contains the encrypted query

Example:

>>> client.get_encrypted_query(
        '2aEVClLRRA-vCCIvnuEAvQ',
        query='test_query')
{'encrypted_query': 'YR4h147YAldsARmTmIrOcJqpuntiJULXPV3ZrX_'
'blVWvbCavvESTw4Jis6sTgGC9a1LhrLd9Nq-77CNX2eeieMEDnPFPRqlPGO8V'
'e2rlwuKuVQJGQx3-F_-eFqF-CE-uoA6yoXoPyYqh71syalWFfc-tuvp0a7c6e'
'eKAO6hoxwNbZlb9y9pha0X084JdI-_l6hew9XKZTXLjT95Pt42vmoU_t6vh_w1'
'hXdgUZMYe81LyudvhoVZ6zr2tzuvZuMoYtP8iMcVL_Z0XlEBAaMWAyM5hk_tAG'
'7AbqGejZfUrDN3TJqdrmHUeeknpxpMp8nLTnbFMuHVwnj2hSmoxD-2r7BYbolJ'
'iRFZuTqrpVi0='}
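
The returned token is meant to be handed to query() through its encrypted_query parameter. A minimal sketch of that round trip follows; the helper name query_with_encrypted and the FakeClient class are hypothetical and exist only so the snippet runs without a Squirro server. A real client object would take the place of FakeClient.

```python
from typing import Any, Dict


def query_with_encrypted(client: Any, project_id: str, query: str,
                         **query_kwargs: Any) -> Dict[str, Any]:
    """Encrypt `query` first, then run it through `client.query`."""
    encrypted = client.get_encrypted_query(project_id, query=query)
    # The documented response carries the token under 'encrypted_query'.
    token = encrypted["encrypted_query"]
    return client.query(project_id, encrypted_query=token, **query_kwargs)


class FakeClient:
    """Hypothetical stand-in that mimics the two documented methods."""

    def get_encrypted_query(self, project_id, query=None, **kwargs):
        return {"encrypted_query": "token-for-" + str(query)}

    def query(self, project_id, encrypted_query=None, **kwargs):
        # Echo what was received so the flow can be inspected.
        return {"items": [], "received": encrypted_query,
                "count": kwargs.get("count")}


result = query_with_encrypted(
    FakeClient(), "2aEVClLRRA-vCCIvnuEAvQ", "test_query", count=5)
print(result["received"])
```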
get_item(project_id, item_id, **kwargs)

Returns the requested item for the provided project.

Parameters
  • project_id – Project identifier.

  • item_id – Item identifier.

  • kwargs – Query parameters. All keyword arguments are passed on verbatim to the API. See the [[Items#Get Item|Get Item]] resource for all possible parameters.

Keyword Arguments
  • highlight_query (Union[str,dict]) –

    Either a single query string in Squirro query syntax, or a dictionary containing more metadata. Currently supported keys are:

    • highlight_query.query: Squirro query syntax

    • highlight_query.like: long piece of text to perform concept search on

Returns

A dictionary which contains the individual item.

Example:

>>> client.get_item(
...     '2aEVClLRRA-vCCIvnuEAvQ', 'haG6fhr9RLCm7ZKz1Meouw')
{'item': {'created_at': '2012-10-06T08:27:58',
           'id': 'haG6fhr9RLCm7ZKz1Meouw',
           'link': 'https://www.youtube.com/watch?v=Zzvhu42dWAc',
           'read': True,
           'item_score': 0.5,
           'score': 0.56,
           'sources': [{'id': 'oMNOQ-3rQo21q3UmaiaLHw',
                         'link': 'https://gdata.youtube.com/...',
                         'provider': 'feed',
                         'title': 'Uploads by mymemonic'},
                        {'id': 'H4nd0CasQQe_PMNDM0DnNA',
                         'link': None,
                         'provider': 'savedsearch',
                         'title': 'Squirro Alerts for "memonic"'}
                       ],
           'starred': False,
           'thumbler_url': '[long url]...jpg',
           'title': 'Web Clipping - made easy with Memonic',
           'webshot_height': 360,
           'webshot_url': 'http://webshot.trunk....jpg',
           'webshot_width': 480}}
get_items(project_id, **kwargs)

Returns items for the provided project.

DEPRECATED. Use the query method instead; it is more powerful.

Parameters
  • project_id – Project identifier.

  • kwargs – Query parameters. All keyword arguments are passed on verbatim to the API. See the [[Items#List Items|List Items]] resource for all possible parameters.

Returns

A dictionary which contains the items for the project.

Example:

>>> client.get_items('2aEVClLRRA-vCCIvnuEAvQ', count=1)
{'count': 1,
 'eof': False,
 'items': [{'created_at': '2012-10-06T08:27:58',
             'id': 'haG6fhr9RLCm7ZKz1Meouw',
             'link': 'https://www.youtube.com/...',
             'read': True,
             'item_score': 0.5,
             'score': 0.56,
             'sources': [{'id': 'oMNOQ-3rQo21q3UmaiaLHw',
                           'link': 'https://gdata.youtube...',
                           'provider': 'feed',
                           'title': 'Uploads by mymemonic'},
                          {'id': 'H4nd0CasQQe_PMNDM0DnNA',
                           'link': None,
                           'provider': 'savedsearch',
                           'title': 'Squirro Alerts for "memonic"'
                          }],
             'starred': False,
             'thumbler_url': '[long url]...jpg',
             'title': 'Web Clipping - made easy with Memonic',
             'webshot_height': 360,
             'webshot_url': 'http://webshot.trunk....jpg',
             'webshot_width': 480}],
 'now': '2012-10-11T14:39:54'}
get_pipelet(name)

Return details for one pipelet.

Returns

A dictionary with pipelet details.

Example:

>>> client.get_pipelet('textrazor')
{'description': 'Entity extraction with `TextRazor`.',
 'description_html': '<p>Entity extraction with '
                     '<code>TextRazor</code>.</p>',
 'id': 'tenant01/textrazor',
 'name': 'textrazor',
 'source': 'from squirro.sdk.pipelet import PipeletV1\n\n…'}

get_pipelets()

Return all available pipelets.

These pipelets can be used for enrichments of type pipelet.

Returns

A dictionary where the value for pipelets is a list of pipelets.

Example:

>>> client.get_pipelets()
{'pipelets': [{'id': 'tenant01/textrazor',
                'name': 'textrazor'}]}
get_projects()

Return all projects.

get_typeahead_suggestions(project_id, searchbar_query, cursor_pos, max_suggestions=None, options=None, filter_query=None)

Get the typeahead suggestions for a query searchbar_query in the project identified by the id project_id.

Parameters
  • project_id – Project identifier from which the typeahead suggestions should be returned.

  • searchbar_query – The full query that goes into a searchbar. The searchbar_query is parsed automatically; the suggestions returned are for the field at cursor_pos, filtered by the rest of the query. searchbar_query cannot be None.

  • cursor_pos – The position in the searchbar_query at which the typeahead is needed. cursor_pos follows a 0-index convention, i.e. the first position in the searchbar_query is 0. cursor_pos should be a non-negative integer.

  • max_suggestions – Maximum number of typeahead suggestions to be returned. max_suggestions should be a non-negative integer.

  • options

    Dictionary of options that influence the result-set. Valid options are:

    • template_params dict containing the query template parameters

    • select_suggesters (optional) list containing suggesters to be used.

    If no suggesters are selected, all suggesters are executed by default. Available suggesters:

    • "facet_value": Complete on facet values that have "autocomplete" enabled (Exact Search)

    • "facet_name": Help to find the correct facet

    • "content_phrase": Complete on key-phrases + item-title (Fuzzy Search) added via NLP-Tagger

    • "saved_search": Complete on queries saved by the user

    • "popular_query": Complete on popular queries on the project, filtered by user_id (optional)

    • "search_history": Triggered only to pre-populate and show default suggestions. Returns the last N searches, filtered by project_id and user_id.

  • filter_query – Squirro query to limit the typeahead suggestions. Must be of type string. Defaults to None if not specified. As an example, this parameter can be used to filter the typeahead suggestions by a dashboard query on a Squirro dashboard.

Returns

A dict of suggestions

Example:

# Default usage

>>> client.get_typeahead_suggestions(project_id='Sz7LLLbyTzy_SddblwIxaA',
                                 searchbar_query='Country:India c',
                                 cursor_pos=15)
{'suggestions': [
    {'type': 'facetvalue', 'key': 'Country:India City:Calcutta',
     'value': 'city:Calcutta', 'score': 12,
     'cursor_pos': 26, 'group': 'country'},
    {'type': 'facetvalue', 'key': 'Country:India Name:Caesar',
     'value': 'name:Caesar', 'score': 8,
     'cursor_pos': 24, 'group': 'country'},
    {'type': 'facetname', 'key': 'Country:India city:',
     'value': 'City', 'score': 6, 'cursor_pos': 19,
     'group': 'Fields'}
]}

Example:

# Autocomplete on one specific suggester like `content_phrase`
# - The "content_phrase" suggester completes on key-phrases and item titles added by
#  `NLP-Tagger <https://squirro.atlassian.net/wiki/spaces/DOC/pages/2396061784/Content-based+Typeahead>`__
# - Autocompletion supports in-order phrase & fuzzy string matching
# - Highlighted tokens are returned as <b> html tags

>>> client.get_typeahead_suggestions(project_id=project_id,
         searchbar_query="pla",
         cursor_pos=0, # cursor_pos not used for this suggester, but parameter is required on API level
         options={"select_suggesters": ["content_phrase"]})
{"suggestions":[
      {
         "type":"text",
         "key":"\"plant material\"",
         "value":"<b>plant</b> material",
         "group_name":"By Content",
      },
      {
         "type":"text",
         "key":"\"plant proteins\"",
         "value":"<b>plant</b> proteins",
         "group_name":"By Content",
      }]
}
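
The two documented calls above differ only in where the cursor sits and which suggesters are selected. The sketch below assembles those keyword arguments; typeahead_kwargs is a hypothetical helper, not part of the client. It assumes that "cursor at the end of the text" corresponds to cursor_pos equal to the query length, which matches the first documented example (15 for 'Country:India c').

```python
from typing import Any, Dict, List, Optional


def typeahead_kwargs(searchbar_query: str,
                     cursor_pos: Optional[int] = None,
                     suggesters: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for get_typeahead_suggestions (hypothetical helper)."""
    if searchbar_query is None:
        raise ValueError("searchbar_query cannot be None")
    if cursor_pos is None:
        # Assumption: a cursor at the end of the text equals the text length.
        cursor_pos = len(searchbar_query)
    if cursor_pos < 0:
        raise ValueError("cursor_pos must not be negative")
    kwargs: Dict[str, Any] = {
        "searchbar_query": searchbar_query,
        "cursor_pos": cursor_pos,
    }
    if suggesters:
        # Restrict execution to the named suggesters, e.g. "content_phrase".
        kwargs["options"] = {"select_suggesters": list(suggesters)}
    return kwargs


default_call = typeahead_kwargs("Country:India c")
phrase_call = typeahead_kwargs("pla", cursor_pos=0, suggesters=["content_phrase"])
print(default_call)
print(phrase_call)
```

The resulting dictionaries could then be passed along as client.get_typeahead_suggestions(project_id=..., **default_call).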
get_version()

Get current squirro version and build number.

Returns

A dictionary containing ‘version’, ‘build’ and ‘components’. ‘components’ is intended for numeric comparison.

Example:

>>> client.get_version()
{
    "version": "2.4.5",
    "build": "2874",
    "components": [2, 4, 5]
}
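
Because ‘components’ is a list of integers, it can be compared element by element, which avoids the pitfalls of comparing version strings. A small sketch, where version_at_least is a hypothetical helper and info copies the documented response:

```python
from typing import Any, Dict, Sequence


def version_at_least(version_info: Dict[str, Any], minimum: Sequence[int]) -> bool:
    """Return True if the reported version is at least `minimum`."""
    # Tuples compare element by element, so (2, 4, 5) < (2, 10, 0).
    return tuple(version_info["components"]) >= tuple(minimum)


info = {"version": "2.4.5", "build": "2874", "components": [2, 4, 5]}
print(version_at_least(info, (2, 4, 0)))
print(version_at_least(info, (2, 10, 0)))
```

Comparing the ‘version’ strings directly would rank "2.4.5" above "2.10.0", which is why the integer list is the safer basis.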
modify_item(project_id, item_id, star=None, read=None, keywords=None, entities=None, force_cache_clear=None)

Updates the flags, entities, and/or keywords of an item.

You can only update star, read, and keywords. The new values will overwrite all old values.

Parameters
  • project_id – Project identifier.

  • item_id – Item identifier.

  • star – Starred flag for the item, either True or False.

  • read – Read flag for the item, either True or False.

  • keywords – Updates to the keywords of the item.

  • entities – Updates to the entities of the item.

  • force_cache_clear – Deprecated. Clearing all relevant caches is now the default behavior.

Example:

>>> client.modify_item(
...     '2aEVClLRRA-vCCIvnuEAvQ', 'haG6fhr9RLCm7ZKz1Meouw',
...     star=True,
...     read=False,
...     entities=[],
...     keywords={'Canton': ['Zurich'], 'Topic': None,
...               'sports': [{'hockey': 0.9}, {'baseball': 0.1}]})
modify_items(project_id, items, batch_size=10000, force_cache_clear=None)

Updates the flags and/or keywords of a list of items.

You can only update star, read, and keywords. The new values will overwrite all old values.

Parameters
  • project_id – Project identifier.

  • items – List of items.

  • batch_size – An optional batch size (defaults to MAX_UPDATE_COUNT)

  • force_cache_clear – Deprecated. Clearing all relevant caches is now the default behavior.

Example:

>>> client.modify_items(
...     '2aEVClLRRA-vCCIvnuEAvQ', [
...     {
...         'id': 'haG6fhr9RLCm7ZKz1Meouw',
...         'star': True,
...         'read': False,
...         'keywords': {'Canton': ['Berne'], 'Topic': None,
...                      'sports': [{'hockey': 0.3},
...                                 {'baseball': 0.5}]}
...     },
...     {
...         'id': 'masnnawefna9MMf3lk',
...         'star': False,
...         'read': True,
...         'keywords': {'Canton': ['Zurich'], 'Topic': None,
...                      'sports': [{'hockey': 0.9},
...                                 {'baseball': 0.1}]}
...     }],
...     batch_size=1000
... )
query(project_id, query=None, query_context=None, aggregations=None, start=None, count=None, fields=None, highlight=None, next_params=None, created_before=None, created_after=None, options=None, encrypted_query=None, child_count=None, timing=False, **kwargs)

Returns items for the provided project.

This is the successor to the get_items method and should be used in its place.

For information on the item structure, see the Item-Format reference.

Parameters
  • project_id (str) – Project identifier.

  • query (Optional[str]) – Optional query to run.

  • query_context (Optional[Dict[str, Optional[str]]]) –

    Dictionary containing context of the query:

    • searchbar_query : Terms that user typed in a searchbar.

    • dashboard_filters : Additional filter-queries provided via dashboard or widget configuration.

    • like : Additional input to perform approximate search on.

      For now, like is considered to be a long string (e.g. paragraphs).

  • start – Zero based starting point.

  • count – Maximum number of items to return.

  • child_count – Maximum number of entities to return with items.

  • fields – Fields to return.

  • highlight – Dictionary containing highlight information. Keys are: query (boolean) if True the response will contain highlight information. smartfilters (list): List of Smart Filter names to be highlighted.

  • spellcheck – (Dictionary) Apply spellchecking on the given query terms. Spellchecking is performed against the provided field (full elastic field name).

  • options

    Dictionary of options that influence the result-set. Valid options are:

    • fold_near_duplicates to fold near-duplicates together and filter them out of the result-stream. Defaults to False.

    • abstract_size to set the length of the returned abstract in number of characters. Defaults to the configured default_abstract_size (500).

    • update_cache if False the result won’t be cached. Used for non-interactive queries that iterate over a large number of items. Defaults to True.

  • encrypted_query – Optional encrypted query returned by the get_encrypted_query method. If provided, this parameter overrides the query parameter and query_template_params (part of the options parameter). Returns a 403 if the encrypted query has expired or has been tampered with.

  • next_params – Parameters that were sent with the previous response as next_params.

  • created_before – Restrict result set to items created before created_before.

  • created_after – Restrict result set to items created after created_after.

  • timing (bool) – Boolean, specifies if detailed execution time profiling is reported (True). Disabled by default (False). The reported timing profile consists of a call stack with important subroutine execution times measured in milliseconds [ms]. If enabled, the report is returned with key timing_report.

  • kwargs – Additional query parameters. All keyword arguments are passed on verbatim to the API.

Examples on how to query items

Search for documents and return specific fields:

# Possible field values: ["title","body","abstract","keywords","starred","read",
#                        "item_id","external_id","created_at","modified_at"]
>>> client.query(
            project_id="DSuNrcnlSc6x5SJZh02IyQ",
            query="virus",
            count=1,
            fields=["title"])
{'count': 1,
 'items': [{'id': '7rjxIjg_gPjrfjTk3dsTTA',
   'title': "FDA Adviser: Vaccine To Be OK'd In Days, But 'Normal' May Not Return Until Next Fall",
   'sources': [{'id': '4fp-1YiASwS-kfNEXYus_g', 'title': 'News Source'}]}],
 'now': '2022-03-21T16:53:52',
 'eof': False,
 'total': 2254,
 'next_params': {'expected_num_results': 2254, 'start': 1},
 'query_executed': 'virus',
 'from_cache': False,
 'time_ms': 221}
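
The response above reports 'eof' and 'total', which is enough to page through a result set with start and count. The following is a sketch only: iter_query_items is a hypothetical helper, and FakeClient is a stand-in that imitates the documented response keys so the loop can run offline. For walking through every item of a project, the scan method listed in the summary is the dedicated tool.

```python
from typing import Any, Dict, Iterator, Optional


def iter_query_items(client: Any, project_id: str, query: Optional[str] = None,
                     page_size: int = 100, **kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield items page by page until the response reports eof."""
    start = 0
    while True:
        response = client.query(project_id, query=query, start=start,
                                count=page_size, **kwargs)
        items = response.get("items", [])
        for item in items:
            yield item
        # Stop on the documented 'eof' flag, or defensively on an empty page.
        if response.get("eof", True) or not items:
            break
        start += len(items)


class FakeClient:
    """Hypothetical stand-in returning pages shaped like the documented response."""

    def __init__(self, titles):
        self._items = [{"id": str(i), "title": t} for i, t in enumerate(titles)]

    def query(self, project_id, query=None, start=None, count=None, **kwargs):
        start = start or 0
        page = self._items[start:start + count]
        return {"count": len(page), "items": page,
                "eof": start + count >= len(self._items),
                "total": len(self._items)}


titles = [item["title"]
          for item in iter_query_items(FakeClient(["a", "b", "c"]), "project", page_size=2)]
print(titles)
```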

Detect spelling errors and provide refined search terms:

# Example query `vimus` matches zero items in the index:
# --> Suggest similar terms from the index --> return term `virus` to be used for query refinement

>>> client.query(
            project_id="DSuNrcnlSc6x5SJZh02IyQ",
            query="vimus",
            count=2,
            spellcheck={"text": "vimus", "field": "body.en.unstemmed"})
{'count': 0,
 'items': [],
 'now': '2022-03-21T16:55:38',
 'eof': True,
 'total': 0,
 'next_params': {},
 'query_executed': 'vimus',
 'from_cache': False,
 'spellcheck': [
    {'original': 'vimus',
    'corrected': 'virus'}
    ],
 'time_ms': 226}
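
The 'spellcheck' entries in this response can be used to build a refined query for a second attempt. A minimal sketch, assuming the response shape shown above and whitespace-separated query terms; refine_query is a hypothetical helper, not a client method:

```python
from typing import Any, Dict


def refine_query(query: str, response: Dict[str, Any]) -> str:
    """Replace misspelled terms with the corrections from the response."""
    corrections = {entry["original"]: entry["corrected"]
                   for entry in response.get("spellcheck", [])}
    # Terms without a correction are kept unchanged.
    return " ".join(corrections.get(term, term) for term in query.split())


response = {
    "count": 0,
    "items": [],
    "spellcheck": [{"original": "vimus", "corrected": "virus"}],
}
print(refine_query("vimus", response))
print(refine_query("vimus outbreak", response))
```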
query_entities(project_id, query=None, fields=None, aggregations=None, start=None, count=None, **kwargs)

Query entity and return aggregations of some entity fields

Parameters
  • project_id – Project identifier.

  • count – Number of entities to return.

  • start – Zero-based starting point of the returned entities.

  • fields – List of fields to return.

  • query – Query to match entities. Uses item query syntax, e.g. entity:{type:career}

  • aggregations – Aggregation of entity fields. For a numeric property, add the prefix numeric_ to the field name, e.g. numeric_properties.salary. Two aggregation methods are supported: “terms” and “stats” (for numeric properties). The default method is “terms”.

Returns

List of entities and aggregations

Example:

aggregations = {
    "city": {
        "fields": "properties.city",
        "size": 3
    },
    "salary": {
        "fields": "numeric_properties.salary",
        "method": "stats"
    },
    "job": {
        "fields": "properties.job",
        "size": 3
    },
}

>>> client.query_entities(project_id='2aEVClLRRA-vCCIvnuEAvQ',
...     query='entity:{properties.name:And*}', count=3,
...     aggregations=aggregations)

response = {
    "count": 3,
    "entities": [
        {
            "confidence": 0.8,
            "name": "Andrea Warren",
            "external_id": "entity_288",
            "extracts": [
                {
                    ...
                }
            ],
            "properties": {
                "person_id": "id_andrea warren",
                "city": "Cooperville",
                "job": "Tax inspector",
                "name": "Andrea Warren",
                "salary": 511937
            },
            "item_id": "-xkKQf2SBlS-ZRkIfw4Suw",
            "relevance": 0.8,
            "child_id": "wQ_atc8Nuk4eqj_xSugMOg",
            "type": "career",
            "id": "entity_288"
        },
        ...
    ],
    "total": 1213,
    "aggregations": {
        "salary": {
            "stats": {
                "count": 969,
                "max": 998787.0,
                "sum": 490231470.0,
                "avg": 505914.8297213622,
                "min": 130.0
            }
        },
        "job": {
            "values": [
                {
                    "key": "Conservation officer, nature",
                    "value": 6
                },
                {
                    "key": "Geneticist, molecular",
                    "value": 6
                },
                {
                    "key": "Immigration officer",
                    "value": 6
                }
            ]
        },
        ...
    },
    "time_ms": 62
}
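
The aggregations dictionary in this example follows a regular pattern: text properties use the properties. prefix with the default “terms” method, while numeric properties use numeric_properties. with “stats”. The sketch below builds the same dictionary; entity_aggregation is a hypothetical helper, and the rule that numeric properties always use “stats” is taken from the example rather than from a stated requirement.

```python
from typing import Any, Dict, Optional


def entity_aggregation(prop: str, numeric: bool = False,
                       size: Optional[int] = None) -> Dict[str, Any]:
    """Build one aggregation spec for query_entities (hypothetical helper)."""
    prefix = "numeric_properties" if numeric else "properties"
    spec: Dict[str, Any] = {"fields": f"{prefix}.{prop}"}
    if numeric:
        # Assumption from the documented example: numeric fields use "stats".
        spec["method"] = "stats"
    if size is not None:
        spec["size"] = size
    return spec


aggregations = {
    "city": entity_aggregation("city", size=3),
    "salary": entity_aggregation("salary", numeric=True),
    "job": entity_aggregation("job", size=3),
}
print(aggregations)
```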
query_for_you(project_id, query=None, query_context=None, aggregations=None, start=None, count=None, fields=None, highlight=None, next_params=None, created_before=None, created_after=None, options=None, encrypted_query=None, child_count=None, timing=False, **kwargs)

Wraps the query endpoint and adds an additional user-behaviour centric view on the data.

It makes it easy to:
  • search only within the items last read by the user

  • search only within top trending items on the project (tbd)

Experimental - likely to be changed in the future!

Parameters

kwargs – The endpoint supports the same arguments as the query endpoint (except pagination).

Returns

Matching query items annotated with item_read metadata and sorted in descending order by user read time:

  • item_read.last_read_by_user : timestamp of last item.read event (by user)

  • item_read.read_count_by_user : amount of item.read events (by user)

Example:

>>> client.query_for_you(project_id=f"{project_id}", count=1, fields=["title"])

{
'count': 1,
'items': [
    {'id': 'z9CNx-NMK5ZwCGpg5FrqUw',
    'title': 'Climate change accelerates',
    'sources': [{'id': 'Y2zh9RFbTVqG2hBmuoATNQ', 'title': 'CSV'}],
    'item_read': {
        'last_read_by_user': '2022-07-04T15:58:24',
        'read_count_by_user': 3
    },
    'read': True
}]}
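
The item_read metadata can be pulled out of the response with ordinary dictionary access. A small sketch using the documented response; read_history is a hypothetical helper, and defaulting a missing read count to 0 is an assumption of this sketch:

```python
from typing import Any, Dict, List, Optional, Tuple


def read_history(response: Dict[str, Any]) -> List[Tuple[Optional[str], Optional[str], int]]:
    """Return (title, last_read_by_user, read_count_by_user) per item."""
    rows = []
    for item in response.get("items", []):
        item_read = item.get("item_read", {})
        rows.append((item.get("title"),
                     item_read.get("last_read_by_user"),
                     item_read.get("read_count_by_user", 0)))
    return rows


response = {
    "count": 1,
    "items": [{
        "id": "z9CNx-NMK5ZwCGpg5FrqUw",
        "title": "Climate change accelerates",
        "sources": [{"id": "Y2zh9RFbTVqG2hBmuoATNQ", "title": "CSV"}],
        "item_read": {"last_read_by_user": "2022-07-04T15:58:24",
                      "read_count_by_user": 3},
        "read": True,
    }],
}
print(read_history(response))
```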
recommend(project_id, item_id=None, external_id=None, text=None, method=None, related_fields=None, count=10, fields=None, created_before=None, options=None, created_after=None, query=None, aggregations=None, method_params=None, **kwargs)

Returns recommended items for the provided ids or text.

Parameters
  • project_id – Project identifier.

  • item_id – ID of item used for recommendation (optional).

  • external_id – External ID of item used for recommendation if item_id is not provided (optional)

  • text – Text content used for recommendation if neither item_id nor external_id is provided (optional)

  • method – Recommendation method (optional).

  • method_params – Dictionary of method parameters used for recommendations (optional).

  • related_fields – Fields used to find relationships between items for recommendation. If this parameter is not set, the title and the body of the item are used.

  • count – Maximum number of items to return.

  • fields – Fields to return.

  • options

    Dictionary of options that influence the result-set. Valid options are:

    • fold_near_duplicates to fold near-duplicates together and filter them out of the result-stream. Defaults to False.

  • created_before – Restrict result set to items created before created_before.

  • created_after – Restrict result set to items created after created_after.

  • query – Search query to restrict the recommendation set.

  • aggregations – Aggregation of faceted fields
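
The parameter descriptions imply a precedence between the three possible inputs: item_id first, external_id if no item_id is given, and text if neither ID is given. The sketch below makes that choice explicit before calling recommend; recommendation_seed is a hypothetical helper, and treating a call with none of the three as an error is an assumption of this sketch, not documented behavior.

```python
from typing import Dict, Optional


def recommendation_seed(item_id: Optional[str] = None,
                        external_id: Optional[str] = None,
                        text: Optional[str] = None) -> Dict[str, str]:
    """Pick the single input that recommend() would act on (hypothetical helper)."""
    if item_id is not None:
        return {"item_id": item_id}
    if external_id is not None:
        # Used only when item_id is not provided.
        return {"external_id": external_id}
    if text is not None:
        # Used only when neither ID is provided.
        return {"text": text}
    raise ValueError("one of item_id, external_id or text is required")


seed = recommendation_seed(external_id="doc-42", text="fallback text")
print(seed)
```

The result could then be forwarded as client.recommend(project_id, count=5, **seed).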

recommend_entities(project_id, method=None, count=10, explanation_count=1, data=None)

Recommend entity property based on input entity properties

Parameters
  • project_id – Project identifier.

  • method

    Method of recommendation. Possible values:

    • conprob: use conditional probability for scoring

    • composition: use sum of individual feature scores for scoring

    • ml_classification: use squirro machine learning service with classification workflow

    • ml_regression_aggregation: use squirro machine learning service with regression aggregation workflow

  • count – number of recommendations to return

  • explanation_count – number of explanations to return for each recommendation; explanations are sorted by score, default is 1

  • data

    input data, JSON object containing the following fields:

    • input_features: dictionary of input entity properties. Each feature is a property name and a list of values. Accepts ranges of values, using elasticsearch range query syntax.

    • entity_type: type of entity to filter data for recommendation.

    • filter_query: query to filter the data set for recommendations; adheres to squirro query syntax (optional)

    • target_feature: name of target property

    • return_features: list of properties to return in the recommendation. If this field is not set, the name of the target property is used.

    • ml_workflow_id: Identifier of the machine learning workflow. Can be None for “adhoc” recommendation methods (e.g. conprob, composition), which do not need machine learning training.

Returns

Recommendation response

Example:

data = {
    "input_features": {
        "job": ["Head of Sales", "Head of Marketing"],
        "city": ["Zurich", "London"],
        "salary": [{
            "gte": 80000,
            "lte": 120000
        }]
    },
    "filter_query": "$item_created_at>=2018-03-20T00:00:00",
    "target_feature": "person_id",
    "return_features": ["name"],
    "ml_workflow_id": None,
    "entity_type": "career"
}

>>> client.recommend_entities(
...     project_id='2aEVClLRRA-vCCIvnuEAvQ',
...     method='conprob', data=data, count=3)

response = {
    "count": 3,
    "time_ms": 79,
    "params": {...},
    "total": 989,
    "method": {
        "last_updated": null,
        "name": "conprob",
        "ml_workflow_id": null
    },
    "recommendations": [{
        "target_feature": "person_id",
        "score": 1.0,
        "explanations": [
            {
                "score": 0.7713846764962218,
                "feature": "city",
                "value": "Zurich"
            },
            {
                "score": 0.7461064995415513,
                "feature": "job",
                "value": "Head of Sales"
            },
            {
                "score": 0.7289157048296231,
                "feature": "salary",
                "value": {
                    "gte": 80000,
                    "lte": 100000
                }
            }
        ],
        "return_features": {
            "name": "Amber Duncan"
        },
        "target_value": "person_1234"},
        ...
    ]
}
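
The data payload has a fixed set of documented fields, so it can be assembled by a small builder that leaves out optional fields when they are not set. This is a sketch under that assumption; value_range and entity_recommendation_data are hypothetical helpers, and whether the service accepts a payload without the optional keys is not confirmed by this reference.

```python
from typing import Any, Dict, List, Optional


def value_range(gte: Optional[float] = None, lte: Optional[float] = None) -> Dict[str, float]:
    """Build a range value in the gte/lte form used by the documented example."""
    bounds: Dict[str, float] = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return bounds


def entity_recommendation_data(input_features: Dict[str, List[Any]],
                               target_feature: str,
                               entity_type: str,
                               return_features: Optional[List[str]] = None,
                               filter_query: Optional[str] = None,
                               ml_workflow_id: Optional[str] = None) -> Dict[str, Any]:
    """Assemble the `data` argument for recommend_entities."""
    data: Dict[str, Any] = {
        "input_features": input_features,
        "target_feature": target_feature,
        "entity_type": entity_type,
        # None is documented as acceptable for conprob and composition.
        "ml_workflow_id": ml_workflow_id,
    }
    if return_features is not None:
        data["return_features"] = list(return_features)
    if filter_query is not None:
        data["filter_query"] = filter_query
    return data


data = entity_recommendation_data(
    input_features={
        "job": ["Head of Sales", "Head of Marketing"],
        "city": ["Zurich", "London"],
        "salary": [value_range(gte=80000, lte=120000)],
    },
    target_feature="person_id",
    entity_type="career",
    return_features=["name"],
    filter_query="$item_created_at>=2018-03-20T00:00:00",
)
print(data)
```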
recommend_entities_explore(project_id, method=None, count=10, data=None, start=None, next_params=None, highlight=None)

Explore results of entity recommendation

Parameters
  • project_id – Project identifier.

  • method

    Method of recommendation. Possible values:

    • conprob: use conditional probability for scoring

    • composition: use sum of individual feature scores for scoring

  • count – number of entities to return

  • data

    input data, JSON object containing the following fields:

    • input_features: dictionary of input entity properties. Each feature is a property name and a list of values. Accepts ranges of values, using elasticsearch range query syntax.

    • entity_type: type of entity to filter data for recommendation.

    • filter_query: query to filter the data set for recommendations; adheres to squirro query syntax (optional)

    • target_feature: name of target property

    • target_value: value of target property

    • filter_features: dictionary of entity properties used for filtering entities. Same format as input_features.

    • ml_workflow_id: Identifier of the machine learning workflow. Can be None for “adhoc” recommendation methods (e.g. conprob, composition), which do not need machine learning training.

  • start – Zero based starting point.

  • next_params – Parameters that were sent with the previous response as next_params.

  • highlight – Dictionary containing highlight information. Keys are: query (boolean) if True the response will contain highlight information. smartfilters (list): List of Smart Filter names to be highlighted.

Returns

List of items and entities that satisfy the input.

Example:

data = {
    "input_features": {
        "job": ["Head of Sales", "Head of Marketing"],
        "city": ["Zurich", "London"],
        "salary": [{
            "gte": 80000,
            "lte": 120000
        }]
    },
    "filter_query": "$item_created_at>=2018-03-20T00:00:00",
    "target_feature": "person_id",
    "target_value": "a_squirro_employee",
    "filter_features": {
        "job": ["Head of Sales"]
    },
    "ml_workflow_id": None,
    "entity_type": "career"
}

>>> client.recommend_entities_explore(
...     project_id='2aEVClLRRA-vCCIvnuEAvQ',
...     method='conprob', data=data, count=10)
recommend_entities_methods(project_id)

Returns the available entity recommendation methods.

Parameters

project_id – Project identifier.

recommend_facets(project_id, method=None, count=10, explanation_count=1, data=None)

Recommend facet value based on input facets

Parameters
  • project_id – Project identifier.

  • method

    Method of recommendation. Possible values:

    • conprob: use conditional probability for scoring

    • composition: use sum of individual feature scores for scoring

    • ml_classification: use squirro machine learning service with classification workflow

    • ml_regression_aggregation: use squirro machine learning service with regression aggregation workflow

  • count – number of recommendations to return

  • explanation_count – number of explanations to return for each recommendation; explanations are sorted by score, default is 1

  • data

    input data, JSON object containing the following fields:

    • input_features: dictionary of input facets. Each feature is a facet name and a list of values. Accepts ranges of values, using elasticsearch range query syntax.

    • filter_query: query to filter the data set for recommendations; adheres to squirro query syntax (optional)

    • target_feature: name of target facet

    • return_features: list of facets to return in the recommendation. If this field is not set, the name of the target facet is used.

    • ml_workflow_id: Identifier of the machine learning workflow. Can be None for “adhoc” recommendation methods (e.g. conprob, composition), which do not need machine learning training.

Returns

Recommendation response

Example:

data = {
    "input_features": {
        "Job": ["Head of Sales", "Head of Marketing"],
        "City": ["Zurich", "London"],
        "Salary": [{
            "gte": 80000,
            "lte": 120000
        }]
    },
    "filter_query": "$item_created_at>=2018-03-20T00:00:00",
    "target_feature": "Person_Id",
    "return_features": ["Name"],
    "ml_workflow_id": None
}

>>> client.recommend_facets(
...     project_id='2aEVClLRRA-vCCIvnuEAvQ',
...     method='conprob', data=data, count=3)

response = {
    "count": 3,
    "time_ms": 79,
    "params": {...},
    "total": 989,
    "method": {
        "last_updated": null,
        "name": "conprob",
        "ml_workflow_id": null
    },
    "recommendations": [{
        "target_feature": "Person_Id",
        "score": 1.0,
        "explanation": [
            {
                "score": 0.7713846764962218,
                "feature": "City",
                "value": "Zurich"
            },
            {
                "score": 0.7461064995415513,
                "feature": "Job",
                "value": "Head of Sales"
            },
            {
                "score": 0.7289157048296231,
                "feature": "Salary",
                "value": {
                    "gte": 80000,
                    "lte": 100000
                }
            }
        ],
        "return_features": {
            "Name": "Amber Duncan"
        },
        "target_value": "1234"
    },
    ...
    ]
}
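
The response above can be reduced to one line per recommendation. The following is a minimal sketch, assuming the response layout shown in the example; summarize_recommendations is a hypothetical helper and not part of squirro_client.

```python
def summarize_recommendations(response):
    """Return a (target_value, score, top_explanation) tuple per recommendation.

    Hypothetical helper: it assumes the response layout shown in the
    example above and is not part of squirro_client.
    """
    summary = []
    for rec in response.get("recommendations", []):
        explanations = rec.get("explanation", [])
        # Explanations are documented as sorted by score, so the first
        # entry is taken as the strongest one.
        top = explanations[0] if explanations else None
        summary.append((rec.get("target_value"), rec.get("score"), top))
    return summary


# Trimmed copy of the example response, written as a Python dict.
response = {
    "count": 3,
    "recommendations": [{
        "target_feature": "Person_Id",
        "score": 1.0,
        "explanation": [
            {"score": 0.7713846764962218, "feature": "City", "value": "Zurich"},
            {"score": 0.7461064995415513, "feature": "Job", "value": "Head of Sales"},
        ],
        "return_features": {"Name": "Amber Duncan"},
        "target_value": "1234",
    }],
}

for target_value, score, top in summarize_recommendations(response):
    print(target_value, score, top["feature"], top["value"])
```
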
recommend_facets_explore(project_id, method=None, count=10, data=None, start=None, next_params=None, highlight=None)

Explore results of facet recommendation

Parameters
  • project_id – Project identifier.

  • method

    Method of recommendation. Possible values:

    • conprob: use conditional probability for scoring

    • composition: use sum of individual feature scores for scoring

    • ml_classification: use the Squirro machine learning service with a classification workflow

    • ml_regression_aggregation: use the Squirro machine learning service with a regression aggregation workflow

  • count – number of recommendations to return

  • data

    Input data, a JSON object containing the following fields:

    • input_features: dictionary of input facets. Each feature is a facet name and a list of values. Ranges of values are accepted, using Elasticsearch range query syntax.

    • filter_query: query to filter the data set for recommendations, following Squirro query syntax (optional)

    • target_feature: name of the target facet

    • target_value: value of the target facet

    • filter_features: dictionary of facets used to filter items. Same format as input_features.

    • return_features: list of facets to return in the recommendation. If this field is not set, the name of the target facet is used.

    • ml_workflow_id: identifier of the machine learning workflow. Can be None for “adhoc” recommendation methods (e.g. conprob, composition), which do not need machine learning training.

  • start – Zero based starting point.

  • next_params – Parameters that were sent with the previous response as next_params.

  • highlight – Dictionary containing highlight information. Keys are: query (boolean): if True, the response will contain highlight information; smartfilters (list): list of Smart Filter names to be highlighted.

Returns

List of items whose facets satisfy the input

Example:

data = {
    "input_features": {
        "Job": ["Head of Sales", "Head of Marketing"],
        "City": ["Zurich", "London"],
        "Salary": [{
            "gte": 80000,
            "lte": 120000
        }]
    },
    "filter_query": "$item_created_at>=2018-03-20T00:00:00",
    "target_feature": "Person_Id",
    "target_value": "Squirro",
    "filter_features": {
        "Job": ["Head of Sales"]
    },
    "ml_workflow_id": None
}

>>> client.recommend_facets_explore(
...     project_id='2aEVClLRRA-vCCIvnuEAvQ',
...     method='conprob', data=data, count=10)
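
The data dictionary for recommend_facets_explore shares most of its fields with the one for recommend_facets. The following is a minimal sketch of deriving one from the other, under the assumption that a target value from an earlier recommendation is being explored. build_explore_data is a hypothetical helper and not part of squirro_client.

```python
import copy


def build_explore_data(recommend_data, target_value, filter_features=None):
    """Derive an explore payload from a recommend_facets payload.

    Hypothetical helper (not part of squirro_client): it copies the input
    and adds the target_value and filter_features fields described above.
    """
    data = copy.deepcopy(recommend_data)  # leave the caller's dictionary untouched
    data["target_value"] = target_value
    if filter_features is not None:
        data["filter_features"] = filter_features
    return data


recommend_data = {
    "input_features": {
        "Job": ["Head of Sales", "Head of Marketing"],
        "City": ["Zurich", "London"],
    },
    "target_feature": "Person_Id",
    "ml_workflow_id": None,
}

explore_data = build_explore_data(
    recommend_data,
    target_value="1234",
    filter_features={"Job": ["Head of Sales"]},
)
```

The resulting explore_data could then be passed as the data argument, as in the call shown above.
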
recommend_facets_methods(project_id)

Returns the available facet recommendation methods.

Parameters

project_id – Project identifier.

recommendation_methods(project_id)

Returns the available recommendation methods.

Parameters

project_id – Project identifier.

scan(project_id, query=None, scroll='5m', preserve_order=False, count=1000, fields=None, highlight=None, created_before=None, created_after=None, options=None, encrypted_query=None, child_count=100)

Returns an iterator to scan through all items of a project.

Note: For smartfilter queries this still returns at most 10000 results.

Parameters
  • project_id – The id of the project you want to scan

  • query – An optional query string to limit the items to a matching subset.

  • scroll – The time window for which the search context is kept active in Elasticsearch. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html for more details.

  • preserve_order – If True, the scroll paginates while preserving the order. Note that this can be an extremely expensive operation and can easily lead to unpredictable results; use with caution.

  • count – The number of results fetched per batch. You only need to adjust this if, for example, you have very large documents. The maximum value that can be set is 10’000.

  • fields – Fields to return

  • highlight – Dictionary containing highlight information. Keys are: query (boolean): if True, the response will contain highlight information; smartfilters (list): list of Smart Filter names to be highlighted.

  • created_before – Restrict result set to items created before created_before.

  • created_after – Restrict result set to items created after created_after.

  • options – Dictionary of options that influence the result-set. Valid options are: abstract_size to set the length of the returned abstract in number of characters. Defaults to the configured default_abstract_size (500).

  • child_count – Maximum number of matching entities to return with items. The maximum value that can be set is 100.

Returns

An iterator over all (matching) items.

Open issues/current limitations:
  • ensure this works for encrypted queries too.

  • support fold_near_duplicate option

  • support smart filter queries with more than 10k results

Example:

>>> for item in client.scan(project_id='Sz7LLLbyTzy_SddblwIxaA', query='country:CH AND plants',
...                         count=500, scroll='1m', preserve_order=True):
...     pass  # process matched item
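
Because scan returns an iterator, items can be grouped into fixed-size batches without loading the whole result set into memory. The following is a minimal sketch using only the standard library; in_batches is a hypothetical helper and the generator of dictionaries stands in for the iterator returned by client.scan.

```python
from itertools import islice


def in_batches(items, batch_size):
    """Yield lists of up to batch_size items from any iterable.

    Hypothetical helper, not part of squirro_client.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice consumes at most batch_size items from the shared iterator.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for client.scan(...): any iterator of items works the same way.
fake_items = ({"id": str(i)} for i in range(5))
batches = list(in_batches(fake_items, 2))
```

With a real client, the generator would be replaced by the client.scan(...) call from the example above.
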