TopicApiBaseMixin#
- class TopicApiBaseMixin#
Bases:
object
Methods Summary
assert_permission([project_id, ...]) – Ensure the user has the right permissions on the project.
delete_item(project_id, item_id) – Deletes an item.
delete_pipelet(name) – Delete a pipelet.
get_encrypted_query(project_id[, query, ...]) – Encrypts and signs the query and returns it.
get_item(project_id, item_id, **kwargs) – Returns the requested item for the provided project.
get_items(project_id, **kwargs) – Returns items for the provided project.
get_pipelet(name) – Return details for one pipelet.
get_pipelets() – Return all available pipelets.
get_projects() – Return all projects.
get_typeahead_suggestions(project_id, ...[, ...]) – Get the typeahead suggestions for a query searchbar_query in the project identified by the id project_id.
get_version() – Get current squirro version and build number.
modify_item(project_id, item_id[, star, ...]) – Updates the flags, entities, and/or keywords of an item.
modify_items(project_id, items[, ...]) – Updates the flags and/or keywords of a list of items.
multi_query(project_id, queries[, rank_method]) – Perform multiple queries and combine the results based on provided parameters.
query(project_id[, query, query_context, ...]) – Returns items for the provided project.
query_entities(project_id[, query, fields, ...]) – Query entities and return aggregations of some entity fields.
query_for_you(project_id[, query, ...]) – Wraps the query endpoint and adds an additional user-behaviour centric view on the data.
recommend(project_id[, item_id, ...]) – Returns recommended items for the provided ids or text.
recommend_entities(project_id[, method, ...]) – Recommend entity property based on input entity properties.
recommend_entities_explore(project_id[, ...]) – Explore results of entity recommendation.
recommend_entities_methods(project_id) – Returns the available entity recommendation methods.
recommend_facets(project_id[, method, ...]) – Recommend facet value based on input facets.
recommend_facets_explore(project_id[, ...]) – Explore results of facet recommendation.
recommend_facets_methods(project_id) – Returns the available facet recommendation methods.
recommendation_methods(project_id) – Returns the available recommendation methods.
scan(project_id[, query, scroll, ...]) – Returns an iterator to scan through all items of a project.
Methods Documentation
- assert_permission(project_id=None, user_permissions=None, project_permissions=None)#
Ensure the user has the right permissions on the project.
- Parameters:
project_id – Project identifier.
user_permissions – User permissions required.
project_permissions – Project permissions required.
- Returns:
True if the permissions are met.
Example:
>>> client.assert_permission('2aEVClLRRA-vCCIvnuEAvQ', user_permissions='admin')
Or with multiple permissions (at least one permission needs to match):
>>> client.assert_permission('2aEVClLRRA-vCCIvnuEAvQ', project_permissions=['items.read', 'project.read'])
- delete_item(project_id, item_id)#
Deletes an item.
- Parameters:
project_id – Project identifier.
item_id – Item identifier.
Example:
>>> client.delete_item(
...     '2aEVClLRRA-vCCIvnuEAvQ', 'haG6fhr9RLCm7ZKz1Meouw')
- delete_pipelet(name)#
Delete a pipelet.
This will break existing enrichments if they still make use of this pipelet.
Example:
>>> client.delete_pipelet('textrazor')
- get_encrypted_query(project_id, query=None, aggregations=None, fields=None, created_before=None, created_after=None, options=None, **kwargs)#
Encrypts and signs the query and returns it. If set, the aggregations, created_before, created_after, fields and options are part of the encrypted query as well.
- Parameters:
project_id – Project identifier.
query – query to encrypt.
For additional parameters see self.query().
- Returns:
A dictionary which contains the encrypted query
Example:
>>> client.get_encrypted_query(
...     '2aEVClLRRA-vCCIvnuEAvQ', query='test_query')
{'encrypted_query': 'YR4h147YAldsARmTmIrOcJqpuntiJULXPV3ZrX_'
'blVWvbCavvESTw4Jis6sTgGC9a1LhrLd9Nq-77CNX2eeieMEDnPFPRqlPGO8V'
'e2rlwuKuVQJGQx3-F_-eFqF-CE-uoA6yoXoPyYqh71syalWFfc-tuvp0a7c6e'
'eKAO6hoxwNbZlb9y9pha0X084JdI-_l6hew9XKZTXLjT95Pt42vmoU_t6vh_w1'
'hXdgUZMYe81LyudvhoVZ6zr2tzuvZuMoYtP8iMcVL_Z0XlEBAaMWAyM5hk_tAG'
'7AbqGejZfUrDN3TJqdrmHUeeknpxpMp8nLTnbFMuHVwnj2hSmoxD-2r7BYbolJ'
'iRFZuTqrpVi0='}
- get_item(project_id, item_id, **kwargs)#
Returns the requested item for the provided project.
- Parameters:
project_id – Project identifier.
item_id – Item identifier.
kwargs – Query parameters. All keyword arguments are passed on verbatim to the API. See the Get Item resource for all possible parameters.
- Keyword Arguments:
highlight_query (Union[str, dict]) – Can be a single query string using Squirro syntax, or a dictionary containing more metadata. Currently supported keys are:
highlight_query.query: Squirro query syntax.
highlight_query.like: a long piece of text to perform concept search on.
- Returns:
A dictionary which contains the individual item.
Example:
>>> client.get_item(
...     '2aEVClLRRA-vCCIvnuEAvQ', 'haG6fhr9RLCm7ZKz1Meouw')
{'item': {'created_at': '2012-10-06T08:27:58',
          'id': 'haG6fhr9RLCm7ZKz1Meouw',
          'link': 'https://www.youtube.com/watch?v=Zzvhu42dWAc',
          'read': True,
          'item_score': 0.5,
          'score': 0.56,
          'sources': [{'id': 'oMNOQ-3rQo21q3UmaiaLHw',
                       'title': 'Uploads by mymemonic',
                       'photo': '<path_to_access_datasource_picture>'},
                      {'id': 'H4nd0CasQQe_PMNDM0DnNA',
                       'title': 'Squirro Alerts for "memonic"',
                       'photo': '<path_to_access_datasource_picture>'}],
          'starred': False,
          'thumbler_url': '[long url]...jpg',
          'title': 'Web Clipping - made easy with Memonic',
          'webshot_height': 360,
          'webshot_url': 'http://webshot.trunk....jpg',
          'webshot_width': 480}}
- get_items(project_id, **kwargs)#
Returns items for the provided project.
DEPRECATED. Use the more powerful query method instead.
- Parameters:
project_id – Project identifier.
kwargs – Query parameters. All keyword arguments are passed on verbatim to the API. See the List Items resource for all possible parameters.
- Returns:
A dictionary which contains the items for the project.
Example:
>>> client.get_items('2aEVClLRRA-vCCIvnuEAvQ', count=1)
{'count': 1,
 'eof': False,
 'items': [{'created_at': '2012-10-06T08:27:58',
            'id': 'haG6fhr9RLCm7ZKz1Meouw',
            'link': 'https://www.youtube.com/...',
            'read': True,
            'item_score': 0.5,
            'score': 0.56,
            'sources': [{'id': 'oMNOQ-3rQo21q3UmaiaLHw',
                         'link': 'https://gdata.youtube...',
                         'provider': 'feed',
                         'title': 'Uploads by mymemonic'},
                        {'id': 'H4nd0CasQQe_PMNDM0DnNA',
                         'link': None,
                         'provider': 'savedsearch',
                         'title': 'Squirro Alerts for "memonic"'}],
            'starred': False,
            'thumbler_url': '[long url]...jpg',
            'title': 'Web Clipping - made easy with Memonic',
            'webshot_height': 360,
            'webshot_url': 'http://webshot.trunk....jpg',
            'webshot_width': 480}],
 'now': '2012-10-11T14:39:54'}
- get_pipelet(name)#
Return details for one pipelet.
- Returns:
A dictionary with pipelet details.
Example:
>>> client.get_pipelet('textrazor')
{'description': 'Entity extraction with `TextRazor`.',
 'description_html': '<p>Entity extraction with <code>TextRazor</code>.</p>',
 'id': 'tenant01/textrazor',
 'name': 'textrazor',
 'source': 'from squirro.sdk.pipelet import PipeletV1\n...'}
- get_pipelets()#
Return all available pipelets.
These pipelets can be used for enrichments of type pipelet.
- Returns:
A dictionary where the value for pipelets is a list of pipelets.
Example:
>>> client.get_pipelets()
{'pipelets': [{'id': 'tenant01/textrazor', 'name': 'textrazor'}]}
- get_projects()#
Return all projects.
- get_typeahead_suggestions(project_id, searchbar_query, cursor_pos, max_suggestions=None, options=None, filter_query=None)#
Get the typeahead suggestions for a query searchbar_query in the project identified by the id project_id.
- Parameters:
project_id – Project identifier from which the typeahead suggestions should be returned.
searchbar_query – The full query that goes into a searchbar. The searchbar_query is automatically parsed, and the suggestions for the field defined by cursor_pos, filtered by the rest of the query, are returned. searchbar_query cannot be None.
cursor_pos – The position in the searchbar_query for which the typeahead is needed. The cursor_pos parameter follows a zero-index convention, i.e. the first position in the searchbar_query is 0. cursor_pos must be a non-negative integer.
max_suggestions – Maximum number of typeahead suggestions to return. max_suggestions must be a non-negative integer.
options –
Dictionary of options that influence the result-set. Valid options are:
template_params dict containing the query template parameters
select_suggesters (optional) list containing suggesters to be used.
select_facets (optional) list containing facets to be used. Skip this setting or use all to match on all visible facets.
If no suggesters are selected, all suggesters are executed by default. Available suggesters:
"facet_value": Complete on facet values that have "autocomplete" enabled (exact search).
"facet_value_lenient": Complete on facet values that are "analyzed" and "visible" (lenient exact search; ignores order, and all tokens within the searchbar_query are matched as prefixes).
"facet_name": Help to find the correct facet.
"content_phrase": Complete on key-phrases (fuzzy search) added via NLP-Tagger.
"saved_search": Complete on queries saved by the user.
"popular_query": Complete on popular queries on the project, filtered by user_id (optional).
"search_history": Triggered only to pre-populate and show default suggestions. Returns the last N searches, filtered by project_id and user_id.
"title": Complete on document title.
"collection": Complete on user collection name.
filter_query – Squirro query to limit the typeahead suggestions. Must be of type string. Defaults to None if not specified. As an example, this parameter can be used to filter the typeahead suggestions by a dashboard query on a Squirro dashboard.
- Returns:
A dict of suggestions
Example:
# Default usage
>>> client.get_typeahead_suggestions(project_id='Sz7LLLbyTzy_SddblwIxaA',
...                                  searchbar_query='Country:India c',
...                                  cursor_pos=15)
{'suggestions': [
    {'type': 'facetvalue', 'key': 'Country:India City:Calcutta',
     'value': 'city:Calcutta', 'score': 12, 'cursor_pos': 26,
     'group': 'country'},
    {'type': 'facetvalue', 'key': 'Country:India Name:Caesar',
     'value': 'name:Caesar', 'score': 8, 'cursor_pos': 24,
     'group': 'country'},
    {'type': 'facetname', 'key': 'Country:India city:',
     'value': 'City', 'score': 6, 'cursor_pos': 19,
     'group': 'Fields'}]}
Example:
# Autocomplete on one specific suggester like `content_phrase`
# - The "content_phrase" suggester completes on key-phrases added by
#   `NLP-Tagger <https://squirro.atlassian.net/wiki/spaces/DOC/pages/2396061784/Content-based+Typeahead>`__
# - Autocompletion supports in-order phrase & fuzzy string matching
# - Highlighted tokens are returned as <b> html tags
>>> client.get_typeahead_suggestions(project_id=project_id,
...                                  searchbar_query="pla",
...                                  # cursor_pos not used for this suggester,
...                                  # but the parameter is required on API level
...                                  cursor_pos=0,
...                                  options={"select_suggesters": ["content_phrase"]})
{"suggestions": [
    {"type": "text",
     "key": ' "plant material"',
     "value": "<b>plant</b> material",
     "group_name": "By Content"},
    {"type": "text",
     "key": ' "plant proteins"',
     "value": "<b>plant</b> proteins",
     "group_name": "By Content"}]}
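The zero-index convention of cursor_pos can be illustrated without a server. The helper below is not part of the SDK; it is a minimal sketch of how a cursor position selects the partial token that the typeahead completes on:

```python
def token_at_cursor(searchbar_query: str, cursor_pos: int) -> str:
    """Return the whitespace-delimited token that contains cursor_pos.

    Illustrates the zero-indexed cursor convention used by
    get_typeahead_suggestions; this helper is NOT part of the SDK.
    """
    if not 0 <= cursor_pos <= len(searchbar_query):
        raise ValueError("cursor_pos must lie within the query string")
    # Token boundaries: nearest space before and after the cursor.
    start = searchbar_query.rfind(" ", 0, cursor_pos) + 1
    end = searchbar_query.find(" ", cursor_pos)
    if end == -1:
        end = len(searchbar_query)
    return searchbar_query[start:end]

# 'Country:India c' with cursor_pos=15 points at the trailing 'c',
# so the suggesters complete on that partial token.
print(token_at_cursor("Country:India c", 15))  # -> c
```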
- get_version()#
Get current squirro version and build number.
- Returns:
A dictionary containing 'version', 'build' and 'components'. 'components' is used for numeric comparison.
Example:
>>> client.get_version()
{
    "version": "2.4.5",
    "build": "2874",
    "components": [2, 4, 5]
}
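Version strings compare lexicographically ("2.10.0" sorts before "2.4.5" as a string), which is why the components list is the value to use for numeric comparison. A minimal sketch, with a hypothetical helper name:

```python
def is_at_least(components, required):
    """Compare a numeric version (e.g. the `components` list returned by
    get_version) against a required minimum, element by element.

    The helper name is hypothetical; only the `components` field itself
    comes from the API.
    """
    return tuple(components) >= tuple(required)

# String comparison is misleading; tuple comparison is not.
assert "2.10.0" < "2.4.5"                  # lexicographic: wrong ordering
assert is_at_least([2, 10, 0], [2, 4, 5])  # numeric: correct ordering
```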
- modify_item(project_id, item_id, star=None, read=None, keywords=None, entities=None, force_cache_clear=None, title=None, generated_summary=None)#
Updates the flags, entities, and/or keywords of an item.
You can only update star, read, and keywords. The new values will overwrite all old values.
- Parameters:
project_id – Project identifier.
item_id – Item identifier.
star – Starred flag for the item, either True or False.
read – Read flag for the item, either True or False.
keywords – Updates to the keywords of the item.
entities – Updates to the entities of the item.
force_cache_clear – Deprecated; clearing all relevant caches is the default behavior now.
Example:
>>> client.modify_item(
...     '2aEVClLRRA-vCCIvnuEAvQ', 'haG6fhr9RLCm7ZKz1Meouw',
...     star=True,
...     read=False,
...     entities=[],
...     keywords={'Canton': ['Zurich'], 'Topic': None,
...               'sports': [{'hockey': 0.9}, {'baseball': 0.1}]})
- modify_items(project_id, items, batch_size=10000, force_cache_clear=None)#
Updates the flags and/or keywords of a list of items.
You can only update star, read, and keywords. The new values will overwrite all old values.
- Parameters:
project_id – Project identifier.
items – List of items.
batch_size – An optional batch size (defaults to MAX_UPDATE_COUNT)
force_cache_clear – Deprecated; clearing all relevant caches is the default behavior now.
Example:
>>> client.modify_items(
...     '2aEVClLRRA-vCCIvnuEAvQ', [
...         {'id': 'haG6fhr9RLCm7ZKz1Meouw',
...          'star': True,
...          'read': False,
...          'keywords': {'Canton': ['Berne'], 'Topic': None,
...                       'sports': [{'hockey': 0.3},
...                                  {'baseball': 0.5}]}},
...         {'id': 'masnnawefna9MMf3lk',
...          'star': False,
...          'read': True,
...          'keywords': {'Canton': ['Zurich'], 'Topic': None,
...                       'sports': [{'hockey': 0.9},
...                                  {'baseball': 0.1}]}}],
...     batch_size=1000)
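The batch_size parameter caps how many item updates go out per request. The chunking behaves like this sketch (illustrative pure Python, not the SDK's internal code):

```python
def batched(items, batch_size):
    """Yield successive slices of at most batch_size items, the way
    modify_items splits a large update list into multiple requests.
    Illustrative only; not the SDK's internal implementation."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

updates = [{"id": str(i), "read": True} for i in range(25)]
# With batch_size=10, 25 updates go out as three requests: 10, 10 and 5.
print([len(chunk) for chunk in batched(updates, 10)])  # -> [10, 10, 5]
```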
- multi_query(project_id, queries, rank_method='rrf')#
Perform multiple queries and combine the results based on provided parameters.
Warning
This is an experimental feature and can change/be removed between releases without notice.
- Parameters:
project_id – Project identifier.
queries – List of queries to perform and combine.
rank_method – Method used to combine the results of the individual queries (defaults to "rrf").
- Returns:
List of combined items from multiple queries.
Example:
>>> client.multi_query(
...     project_id="DSuNrcnlSc6x5SJZh02IyQ",
...     queries=[
...         {"query": "food"},
...         {"query": "food or climate"},
...     ],
...     rank_method="rrf",
... )
{
    "results": [
        [
            {"id": "z9CNx-NMK5ZwCGpg5FrqUw", "title": "Food allergies"},
            {"id": "z9CNx-NMK5ZwCGpg5FrqUw", "title": "Food allergies"},
        ],
        [
            None,
            {"id": "DSuNrcnlSc6x5SJZh02IyQ", "title": "Climate change"},
        ],
    ]
}
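The default rank_method, rrf, is reciprocal rank fusion: each item is scored by summing 1 / (k + rank) over the queries in which it appears, so items ranked highly by several queries rise to the top. A minimal sketch of the idea; the constant k=60 is the value commonly used in the literature, not necessarily what the server uses:

```python
def rrf_fuse(result_lists, k=60):
    """Combine ranked lists of item ids via reciprocal rank fusion.

    Illustrative sketch of the `rrf` rank method, not server code; the
    k=60 default is an assumption taken from the RRF literature.
    """
    scores = {}
    for results in result_lists:
        for rank, item_id in enumerate(results, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# 'b' appears near the top of both lists, so it wins overall.
print(rrf_fuse([["a", "b", "c"], ["b", "d"]]))  # -> ['b', 'a', 'd', 'c']
```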
- query(project_id, query=None, query_context=None, aggregations=None, start=None, count=None, fields=None, highlight=None, next_params=None, created_before=None, created_after=None, options=None, encrypted_query=None, child_count=None, timing=False, explain=False, profile=False, **kwargs)#
Returns items for the provided project.
This is the successor to the get_items method and should be used in its place.
For information on the item-structure see Item-Format reference
- Parameters:
project_id (str) – Project identifier.
query (Optional[str]) – Optional query to run. If provided alongside query_context, will throw an error.
query_context (Optional[Dict[str, Any]]) – Dictionary with more context of the user's input / intent:
searchbar_query: Terms that the user typed in a searchbar.
dashboard_filters: Additional filter-queries provided via dashboard or widget configuration.
community_query: Selected community in the Community 360 dashboard.
like: Additional input to perform approximate search on. For now, like is considered to be a long string (e.g. paragraphs).
parsed: The parsed, analysed and enriched representation of the searchbar_query (response of the configured query-processing workflow).
start – Zero based starting point.
count – Maximum number of items to return.
child_count – Maximum number of entities to return with items.
highlight – Dictionary containing highlight information. Keys are: query (boolean) if True the response will contain highlight information. smartfilters (list): List of Smart Filter names to be highlighted.
spellcheck – If True check the spelling of the provided query.
options –
Dictionary of options that influence the result-set. Valid options are:
fold_near_duplicates to fold near-duplicates together and filter them out of the result-stream. Defaults to False.
abstract_size to set the length of the returned abstract in number of characters. Defaults to the configured default_abstract_size (500).
update_cache if False the result won’t be cached. Used for non-interactive queries that iterate over a large number of items. Defaults to True.
- response_format: Format of the response. Valid options are:
document: The response has a document-based structure.
paragraph: The response is structured as a collection of individual paragraphs.
- search_scope: Scope of the search. Valid options are:
document: Search is performed on the standard documents.
paragraph: Search is conducted using paragraphs.
encrypted_query – Optional encrypted query returned by the get_encrypted_query method. This parameter overrides the query parameter and query_template_params (as part of the options parameter), if provided. Returns a 403 if the encrypted query is expired or has been tampered with.
next_params – Parameters that were sent with the previous response as next_params.
created_before – Restrict result set to items created before created_before.
created_after – Restrict result set to items created after created_after.
timing (bool) – Specifies whether detailed execution-time profiling is reported (True). Disabled by default (False). The reported timing profile consists of a call stack with important subroutine execution times measured in milliseconds [ms]. If enabled, the report is returned with key timing_report.
explain (bool) – If set to True, add an explanation of the search to the result.
profile (bool) – Runs the Elasticsearch Profiler on the executed query and returns an in-depth report on the actual internal Elasticsearch operations and their timings. Disabled by default.
kwargs – Additional query parameters. All keyword arguments are passed on verbatim to the API.
Examples on how to query items
Search for documents and return specific fields:
# Possible field values: ["title", "body", "abstract", "keywords", "starred",
# "read", "item_id", "external_id", "created_at", "modified_at"]
>>> client.query(
...     project_id="DSuNrcnlSc6x5SJZh02IyQ",
...     query="virus", count=1, fields=["title"])
{'count': 1,
 'items': [{'id': '7rjxIjg_gPjrfjTk3dsTTA',
            'title': "FDA Adviser: Vaccine To Be OK'd In Days, But 'Normal' May Not Return Until Next Fall",
            'sources': [{'id': '4fp-1YiASwS-kfNEXYus_g',
                         'title': 'News Source',
                         'photo': '<path_to_access_datasource_picture>'}]}],
 'now': '2022-03-21T16:53:52',
 'eof': False,
 'total': 2254,
 'next_params': {'expected_num_results': 2254, 'start': 1},
 'query_executed': 'virus',
 'from_cache': False,
 'time_ms': 221}
Detect spelling errors and provide refined search terms:
# Example query `vimus` matches zero items in the index:
# --> Suggest similar terms from the index
# --> return term `virus` to be used for query refinement
>>> client.query(
...     project_id="DSuNrcnlSc6x5SJZh02IyQ",
...     query="vimus", count=2, spellcheck=True)
{'count': 0,
 'items': [],
 'now': '2022-03-21T16:55:38',
 'eof': True,
 'total': 0,
 'next_params': {},
 'query_executed': 'vimus',
 'from_cache': False,
 'spellcheck': [{'original': 'vimus', 'corrected': 'virus'}],
 'time_ms': 226}
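Draining a paginated result set is done by feeding next_params from each response back into the next call. The loop below is an illustrative pattern, shown against a stub standing in for client.query so it runs without a server; the helper names are hypothetical:

```python
def fetch_all(query_fn, project_id, **kwargs):
    """Drain a paginated query endpoint by feeding next_params back in.

    query_fn stands in for client.query; this loop is an illustrative
    usage pattern, not SDK-internal code.
    """
    items, next_params = [], None
    while True:
        resp = query_fn(project_id, next_params=next_params, **kwargs)
        items.extend(resp["items"])
        next_params = resp.get("next_params")
        if not next_params:  # empty dict or missing -> no more pages
            break
    return items

# A stub standing in for client.query, returning two pages.
pages = [
    {"items": [1, 2], "next_params": {"start": 2}},
    {"items": [3], "next_params": {}},
]

def fake_query(project_id, next_params=None, **kwargs):
    return pages[0] if next_params is None else pages[1]

print(fetch_all(fake_query, "proj"))  # -> [1, 2, 3]
```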
- query_entities(project_id, query=None, fields=None, aggregations=None, start=None, count=None, **kwargs)#
Query entities and return aggregations of some entity fields.
- Parameters:
project_id – Project identifier.
count – Number of entities to return.
start – Zero-based starting point of the returned entities.
fields – List of fields to return.
query – Query to match entities. Uses item query syntax, e.g. entity:{type:career}
aggregations – Aggregations of entity fields. For numeric properties you need to add the prefix numeric_ to the field name, e.g. numeric_properties.salary. Two aggregation methods are supported: "terms" and "stats" (for numeric properties). The default method is "terms".
- Returns:
List of entities and aggregations
Example:
aggregations = {
    "city": {"fields": "properties.city", "size": 3},
    "salary": {"fields": "numeric_properties.salary", "method": "stats"},
    "job": {"fields": "properties.job", "size": 3},
}
>>> client.query_entities(project_id='2aEVClLRRA-vCCIvnuEAvQ',
...                       query='entity:{properties.name:And*}', count=3,
...                       aggregations=aggregations)
response = {
    "count": 3,
    "entities": [
        {
            "confidence": 0.8,
            "name": "Andrea Warren",
            "external_id": "entity_288",
            "extracts": [{...}],
            "properties": {
                "person_id": "id_andrea warren",
                "city": "Cooperville",
                "job": "Tax inspector",
                "name": "Andrea Warren",
                "salary": 511937
            },
            "item_id": "-xkKQf2SBlS-ZRkIfw4Suw",
            "relevance": 0.8,
            "child_id": "wQ_atc8Nuk4eqj_xSugMOg",
            "type": "career",
            "id": "entity_288"
        },
        ...
    ],
    "total": 1213,
    "aggregations": {
        "salary": {
            "stats": {
                "count": 969,
                "max": 998787.0,
                "sum": 490231470.0,
                "avg": 505914.8297213622,
                "min": 130.0
            }
        },
        "job": {
            "values": [
                {"key": "Conservation officer, nature", "value": 6},
                {"key": "Geneticist, molecular", "value": 6},
                {"key": "Immigration officer", "value": 6}
            ]
        },
        ...
    },
    "time_ms": 62
}
- query_for_you(project_id, query=None, query_context=None, aggregations=None, start=None, count=None, fields=None, highlight=None, next_params=None, created_before=None, created_after=None, options=None, encrypted_query=None, child_count=None, timing=False, **kwargs)#
Wraps the query endpoint and adds an additional user-behaviour centric view on the data.
- It allows you to easily:
search only within the items last read by the user
search only within the top trending items on the project (tbd)
- Parameters:
kwargs – The endpoint supports the same arguments as the query endpoint (except pagination).
- Returns:
Matching query items annotated with item_read metadata and sorted descending by user read time:
item_read.last_read_by_user: timestamp of the last item.read event (by user)
item_read.read_count_by_user: number of item.read events (by user)
Example:
>>> client.query_for_you(project_id=f"{project_id}", count=1, fields=["title"])
{'count': 1,
 'items': [{'id': 'z9CNx-NMK5ZwCGpg5FrqUw',
            'title': 'Climate change accelerates',
            'sources': [{'id': 'Y2zh9RFbTVqG2hBmuoATNQ',
                         'title': 'CSV',
                         'photo': '<path_to_access_datasource_picture>'}],
            'item_read': {'last_read_by_user': '2022-07-04T15:58:24',
                          'read_count_by_user': 3},
            'read': True}]}
- recommend(project_id, item_id=None, external_id=None, text=None, method=None, related_fields=None, count=10, fields=None, created_before=None, options=None, created_after=None, query=None, aggregations=None, method_params=None, **kwargs)#
Returns recommended items for the provided ids or text.
- Parameters:
project_id – Project identifier.
item_id – ID of item used for recommendation (optional).
external_id – External ID of the item used for recommendation if item_id is not provided (optional).
text – Text content used for recommendation if neither item_id nor external_id is provided (optional).
method – Recommendation method (optional).
method_params – Dictionary of method parameters used for recommendations (optional).
related_fields – Fields used to find relationships between items for recommendation. If this parameter is not set, the title and the body of the item are used.
count – Maximum number of items to return.
fields – Fields to return.
options –
Dictionary of options that influence the result-set. Valid options are:
fold_near_duplicates to fold near-duplicates together and filter them out of the result-stream. Defaults to False.
created_before – Restrict result set to items created before created_before.
created_after – Restrict result set to items created after created_after.
query – Search query to restrict the recommendation set.
aggregations – Aggregation of faceted fields
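The parameter descriptions imply a precedence among the three possible inputs: item_id is used first, then external_id, then text. A small sketch of that precedence (illustrative only, not SDK code; the helper name is hypothetical):

```python
def recommendation_input(item_id=None, external_id=None, text=None):
    """Pick the recommendation seed the way the docstring describes:
    item_id wins, then external_id, then raw text.

    Illustrative helper, NOT part of the SDK.
    """
    if item_id is not None:
        return ("item_id", item_id)
    if external_id is not None:
        return ("external_id", external_id)
    if text is not None:
        return ("text", text)
    raise ValueError("one of item_id, external_id or text is required")

# external_id is used because no item_id was given, even though text is set.
print(recommendation_input(external_id="doc-42", text="fallback"))
# -> ('external_id', 'doc-42')
```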
- recommend_entities(project_id, method=None, count=10, explanation_count=1, data=None)#
Recommend entity property based on input entity properties
- Parameters:
project_id – Project identifier.
method –
Method of recommendation. Possible values:
conprob: use conditional probability for scoring
composition: use sum of individual feature scores for scoring
ml_classification: use the Squirro machine learning service with a classification workflow
ml_regression_aggregation: use the Squirro machine learning service with a regression-aggregation workflow
count – Number of recommendations to return.
explanation_count – Number of explanations to return for each recommendation; explanations are sorted by score. Defaults to 1.
data –
Input data, a JSON object containing the following fields:
input_features: dictionary of input entity properties. Each feature is a property name and a list of values. Ranges of values are accepted, using Elasticsearch range query syntax.
entity_type: type of entity to filter data for recommendation.
filter_query: query to filter the data set for recommendations, adhering to Squirro query syntax (optional).
target_feature: name of the target property.
return_features: list of properties to return in the recommendation. If this field is not set, the name of the target property is used.
ml_workflow_id: Identifier of the machine learning workflow. Can be None for "adhoc" recommendation methods (e.g. conprob, composition) that do not need machine learning training.
- Returns:
Recommendation response
Example:
data = {
    "input_features": {
        "job": ["Head of Sales", "Head of Marketing"],
        "city": ["Zurich", "London"],
        "salary": [{"gte": 80000, "lte": 120000}]
    },
    "filter_query": "$item_created_at>=2018-03-20T00:00:00",
    "target_feature": "person_id",
    "return_features": ["name"],
    "ml_workflow_id": None,
    "entity_type": "career"
}
>>> client.recommend_entities(
...     project_id='2aEVClLRRA-vCCIvnuEAvQ',
...     method='conprob', data=data, count=3)
response = {
    "count": 3,
    "time_ms": 79,
    "params": {...},
    "total": 989,
    "method": {
        "last_updated": null,
        "name": "conprob",
        "ml_workflow_id": null
    },
    "recommendations": [{
        "target_feature": "person_id",
        "score": 1.0,
        "explanations": [
            {"score": 0.7713846764962218, "feature": "city", "value": "Zurich"},
            {"score": 0.7461064995415513, "feature": "job", "value": "Head of Sales"},
            {"score": 0.7289157048296231, "feature": "salary", "value": {"gte": 80000, "lte": 100000}}
        ],
        "return_features": {"name": "Amber Duncan"},
        "target_value": "person_1234"},
        ...
    ]
}
- recommend_entities_explore(project_id, method=None, count=10, data=None, start=None, next_params=None, highlight=None)#
Explore results of entity recommendation
- Parameters:
project_id – Project identifier.
method –
Method of recommendation. Possible values:
conprob: use conditional probability for scoring
composition: use sum of individual feature scores for scoring
count – Number of entities to return.
data –
Input data, a JSON object containing the following fields:
input_features: dictionary of input entity properties. Each feature is a property name and a list of values. Ranges of values are accepted, using Elasticsearch range query syntax.
entity_type: type of entity to filter data for recommendation.
filter_query: query to filter the data set for recommendations, adhering to Squirro query syntax (optional).
target_feature: name of the target property.
target_value: value of the target property.
filter_features: dictionary of entity properties used for filtering entities. Similar format as input_features.
ml_workflow_id: Identifier of the machine learning workflow. Can be None for "adhoc" recommendation methods (e.g. conprob, composition) that do not need machine learning training.
start – Zero based starting point.
next_params – Parameters that were sent with the previous response as next_params.
highlight – Dictionary containing highlight information. Keys are: query (boolean) if True the response will contain highlight information. smartfilters (list): List of Smart Filter names to be highlighted.
- Returns:
List of items and entities satisfied input
Example:
data = {
    "input_features": {
        "job": ["Head of Sales", "Head of Marketing"],
        "city": ["Zurich", "London"],
        "salary": [{"gte": 80000, "lte": 120000}]
    },
    "filter_query": "$item_created_at>=2018-03-20T00:00:00",
    "target_feature": "person_id",
    "target_value": "a_squirro_employee",
    "filter_features": {"job": ["Head of Sales"]},
    "ml_workflow_id": None,
    "entity_type": "career"
}
>>> client.recommend_entities_explore(
...     project_id='2aEVClLRRA-vCCIvnuEAvQ',
...     method='conprob', data=data, count=10)
- recommend_entities_methods(project_id)#
Returns the available entity recommendation methods.
- Parameters:
project_id – Project identifier.
- recommend_facets(project_id, method=None, count=10, explanation_count=1, data=None)#
Recommend facet value based on input facets
- Parameters:
project_id – Project identifier.
method –
Method of recommendation. Possible values:
conprob: use conditional probability for scoring
composition: use sum of individual feature scores for scoring
ml_classification: use the Squirro machine learning service with a classification workflow
ml_regression_aggregation: use the Squirro machine learning service with a regression-aggregation workflow
count – Number of recommendations to return.
explanation_count – Number of explanations to return for each recommendation; explanations are sorted by score. Defaults to 1.
data –
Input data, a JSON object containing the following fields:
input_features: dictionary of input facets. Each feature is a facet name and a list of values. Ranges of values are accepted, using Elasticsearch range query syntax.
filter_query: query to filter the data set for recommendations, adhering to Squirro query syntax (optional).
target_feature: name of the target facet.
return_features: list of facets to return in the recommendation. If this field is not set, the name of the target facet is used.
ml_workflow_id: Identifier of the machine learning workflow. Can be None for "adhoc" recommendation methods (e.g. conprob, composition) that do not need machine learning training.
- Returns:
Recommendation response
Example:
data = {
    "input_features": {
        "Job": ["Head of Sales", "Head of Marketing"],
        "City": ["Zurich", "London"],
        "Salary": [{"gte": 80000, "lte": 120000}]
    },
    "filter_query": "$item_created_at>=2018-03-20T00:00:00",
    "target_feature": "Person_Id",
    "return_features": ["Name"],
    "ml_workflow_id": None
}
>>> client.recommend_facets(
...     project_id='2aEVClLRRA-vCCIvnuEAvQ',
...     method='conprob', data=data, count=3)
response = {
    "count": 3,
    "time_ms": 79,
    "params": {...},
    "total": 989,
    "method": {
        "last_updated": null,
        "name": "conprob",
        "ml_workflow_id": null
    },
    "recommendations": [{
        "target_feature": "Person_Id",
        "score": 1.0,
        "explanation": [
            {"score": 0.7713846764962218, "feature": "City", "value": "Zurich"},
            {"score": 0.7461064995415513, "feature": "Job", "value": "Head of Sales"},
            {"score": 0.7289157048296231, "feature": "Salary", "value": {"gte": 80000, "lte": 100000}}
        ],
        "return_features": {"Name": "Amber Duncan"},
        "target_value": "1234"},
        ...
    ]
}
- recommend_facets_explore(project_id, method=None, count=10, data=None, start=None, next_params=None, highlight=None)#
Explore results of facet recommendation
- Parameters:
project_id – Project identifier.
method –
Method of recommendation. Possible values:
conprob: use conditional probability for scoring
composition: use sum of individual feature scores for scoring
ml_classification: use the Squirro machine learning service with a classification workflow
ml_regression_aggregation: use the Squirro machine learning service with a regression-aggregation workflow
count – Number of recommendations to return.
data –
Input data, a JSON object containing the following fields:
input_features: dictionary of input facets. Each feature is a facet name and a list of values. Ranges of values are accepted, using Elasticsearch range query syntax.
filter_query: query to filter the data set for recommendations, adhering to Squirro query syntax (optional).
target_feature: name of the target facet.
target_value: value of the target facet.
filter_features: dictionary of facets used to filter items. Similar format as input_features.
return_features: list of facets to return in the recommendation. If this field is not set, the name of the target facet is used.
ml_workflow_id: Identifier of the machine learning workflow. Can be None for "adhoc" recommendation methods (e.g. conprob, composition) that do not need machine learning training.
start – Zero based starting point.
next_params – Parameters that were sent with the previous response as next_params.
highlight – Dictionary containing highlight information. Keys are: query (boolean) if True the response will contain highlight information. smartfilters (list): List of Smart Filter names to be highlighted.
- Returns:
List of items with facets satisfied input
Example:
data = {
    "input_features": {
        "Job": ["Head of Sales", "Head of Marketing"],
        "City": ["Zurich", "London"],
        "Salary": [{"gte": 80000, "lte": 120000}]
    },
    "filter_query": "$item_created_at>=2018-03-20T00:00:00",
    "target_feature": "Person_Id",
    "target_value": "Squirro",
    "filter_features": {"Job": ["Head of Sales"]},
    "ml_workflow_id": None
}
>>> client.recommend_facets_explore(
...     project_id='2aEVClLRRA-vCCIvnuEAvQ',
...     method='conprob', data=data, count=10)
- recommend_facets_methods(project_id)#
Returns the available facet recommendation methods.
- Parameters:
project_id – Project identifier.
- recommendation_methods(project_id)#
Returns the available recommendation methods.
- Parameters:
project_id – Project identifier.
- scan(project_id, query=None, scroll='1m', preserve_order=False, preserve_scroll_order=False, count=1000, fields=None, highlight=None, created_before=None, created_after=None, options=None, encrypted_query=None, child_count=100)#
Returns an iterator to scan through all items of a project.
Note: For smartfilter queries this still returns at most 10,000 results.
- Parameters:
project_id – The id of the project you want to scan
query – An optional query string to limit the items to a matching subset.
scroll – A time to use as a window to keep the search context active in Elasticsearch. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html for more details.
preserve_order –
This will cause the scroll to paginate with preserving the order. Note that this can be an extremely expensive operation and can easily lead to unpredictable results, use with caution.
Deprecated since version 3.6.4: Use the preserve_scroll_order parameter.
preserve_scroll_order – This will cause the scroll to paginate with preserving the order. Note that this can be an extremely expensive operation and can easily lead to unpredictable results, use with caution.
count – The number of results fetched per batch. You only need to adjust this if you e.g. have very big documents. The maximum value that can be set is 10,000.
fields – Fields to return
highlight – Dictionary containing highlight information. Keys are: query (boolean) if True the response will contain highlight information. smartfilters (list): List of Smart Filter names to be highlighted.
created_before – Restrict result set to items created before created_before.
created_after – Restrict result set to items created after created_after.
options – Dictionary of options that influence the result-set. Valid options are: abstract_size to set the length of the returned abstract in number of characters. Defaults to the configured default_abstract_size (500).
child_count – Maximum number of matching entities to return with items. The maximum value that can be set is 100.
- Returns:
An iterator over all (matching) items.
- Open issues/current limitations:
ensure this works for encrypted queries too.
support fold_near_duplicate option
support smart filter queries with more than 10k results
Example:
>>> for item in client.scan(project_id='Sz7LLLbyTzy_SddblwIxaA',
...                         query='country:CH AND plants',
...                         count=500, scroll='1m',
...                         preserve_scroll_order=True):
...     # process matched item
...     pass
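Conceptually, scan flattens the scrolled batches (up to count items each) into one continuous stream, which is why callers never deal with pagination themselves. An illustrative, non-SDK sketch of that flattening:

```python
def flatten_batches(batches):
    """Yield items one at a time from an iterable of result batches,
    mirroring how scan hides Elasticsearch scroll pagination from the
    caller. Illustrative only; not the SDK's internal code."""
    for batch in batches:
        yield from batch

# Three scroll batches; the final page is short because the results ran out.
batches = [[1, 2, 3], [4, 5], [6]]
print(list(flatten_batches(batches)))  # -> [1, 2, 3, 4, 5, 6]
```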