Question Answering#

Profiles: Project Creator, Search User

This page provides an overview of the Question Answering (QA) feature in Squirro.

Project creators can configure the feature and choose where it’s displayed. Search users can use the feature to discover answers to queries typed as questions.

Reference: For project creator configuration instructions, see How To Use The Question Answering Feature.

Introduction#

Question Answering is a feature that allows users to discover direct answers to questions by phrasing their search query in the form of a question.

In the Cognitive Search template, it can be accessed by clicking the dialogue icon (highlighted in orange) as shown in the example below:

[Image: Question Answering dialogue icon highlighted in orange in the Cognitive Search interface]

In this example, asking the meaning of FDA and clicking the QA icon provides a result that shows the meaning of FDA is Food and Drug Administration.

Clicking the answer will open the document in the viewer with the passage used to provide the answer highlighted.

Tip: You only need to phrase your query in the form of a question; you do not need to add a question mark (?) at the end of your query.

Note

There will not always be an answer available for your query. The widget’s ability to provide an answer depends both on how the question is phrased and on the information available in your project’s data sources.

Background#

Question Answering is a general-purpose task that consists of answering a question by extracting the answer as a span from a contextual text that is supplied together with the question, as in the following example:

Question: "What color is it?"
Context: "The tomato is red."

Answer: "red"

If the answer cannot be found as part of the context, the question is deemed unanswerable and the answer is an empty string, as in the example:

Question: "What is his nickname?"
Context: "The tomato is red."

Answer: "" (unanswerable)

Note, however, that QA is not a closed-book Question Answering model: it always requires a context, supplied along with the question, from which the answer can possibly be extracted as a span of words.

Furthermore, QA is not a generative Question Answering model: it can only answer with text taken verbatim from the context.
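
To make the extractive behavior concrete, the following sketch runs the two examples above through the Hugging Face transformers question-answering pipeline. This illustrates the general task only, not Squirro’s internal implementation, and the model name is an example choice:

from transformers import pipeline

# Load an extractive QA pipeline; deepset/roberta-base-squad2 is a
# publicly available model fine-tuned on SQuAD2.0 (example choice).
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

context = "The tomato is red."

# Answerable: the predicted answer is a span of the context.
print(qa(question="What color is it?", context=context))
# e.g. {'score': ..., 'start': 14, 'end': 17, 'answer': 'red'}

# Unanswerable: with handle_impossible_answer=True the pipeline can
# return an empty answer when no span in the context fits.
print(qa(question="What is his nickname?", context=context,
         handle_impossible_answer=True))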

Frequently Asked Questions#

Is it necessary to fine-tune a QA model for a specific use-case?#

In general, this is not required. If the questions and context texts are formulated in natural (English) language, the model should be able to predict the answer from the context or say “unanswerable” if the context does not contain an extractable answer.

Only if the target distribution is very different from the training distribution might fine-tuning of a new model be necessary, e.g. if the questions and context are formulated in a different language or if the English terms used are radically different from what might be expected from a natural English speaker.

How many samples are needed for fine-tuning a QA model?#

It is not possible to give a definitive answer to this question, as it depends on many nonlinearly related factors.

However, SQuAD2.0, the de facto standard dataset for the training and evaluation of QA models, can serve as a guide.

It contains a total of about 150,000 pairs of questions and corresponding contexts.

Of these 150,000 samples, about 50,000 questions are unanswerable from the supplied context and require the model to predict no answer.

The dataset is split into a training set, a validation set, and a test set containing 129,941, 6,078, and 5,915 sample pairs, respectively.

Fine-tuning a model to achieve comparable performance to models trained on the SQuAD2.0 task will likely require constructing a dataset with similar properties.
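
For reference, the publicly released portion of SQuAD2.0 can be inspected with the Hugging Face datasets library (a minimal sketch; the public release exposes train and validation splits only):

from datasets import load_dataset

# Download SQuAD2.0 from the Hugging Face Hub.
squad = load_dataset("squad_v2")
train = squad["train"]

print(len(train))  # number of question/context pairs in the training split

# Unanswerable questions carry an empty list of answer texts.
unanswerable = sum(1 for ex in train if not ex["answers"]["text"])
print(unanswerable)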

Default Configuration#

The QA feature can be configured via the Squirro Project Configuration.

Project configuration key: topic.search.qa-configuration

{
    "min_score": 0.1,
    "use_title_as_fallback": true,
    "num_answers": 3,
    "abstract_size": 5000,
    "lookup_answer_in_top_items": 3,
    "lookup_answer_in_top_abstracts": 3,
    "order_answers_by": ["es_score", "qa_score"],
    "perform_on_languages": ["en"],
    "consider_query_type": false,
    "min_query_length": 3
}
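
To override individual settings without retyping the whole block, you can merge your changes over these defaults before saving the value under topic.search.qa-configuration (a minimal sketch in Python; how the value is fetched from and stored in your project is not shown here):

import json

DEFAULTS = {
    "min_score": 0.1,
    "use_title_as_fallback": True,
    "num_answers": 3,
    "abstract_size": 5000,
    "lookup_answer_in_top_items": 3,
    "lookup_answer_in_top_abstracts": 3,
    "order_answers_by": ["es_score", "qa_score"],
    "perform_on_languages": ["en"],
    "consider_query_type": False,
    "min_query_length": 3,
}

# Override only the keys you want to change; all others keep their defaults.
overrides = {"num_answers": 5, "min_score": 0.2}
config = {**DEFAULTS, **overrides}

print(json.dumps(config, indent=4))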

Configuration Reference#

min_score

Specifies the minimum score an answer must receive from the model to be considered relevant. The threshold can be disabled by setting min_score to 0.

Type: number (float)

use_title_as_fallback

Specifies whether to search for an answer in the title if the item abstract is not provided.

Type: boolean

num_answers

Specifies the number of returned answers.

Type: number (int)

abstract_size

Specifies the size of the item abstract that is used to search for an answer.

Type: number (int)

lookup_answer_in_top_items

Specifies the number of top items that are used to search for an answer.

Type: number (int)

lookup_answer_in_top_abstracts

Specifies the number of abstracts per item that are used to search for an answer.

Type: number (int)

order_answers_by

Specifies what scores are used to order answers. If more than one value is given, the ordering is performed on multiple scores. Possible values are: es_score, qa_score.

Type: list (of strings)
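
As an illustration of how min_score and a multi-score ordering could interact, the sketch below filters and sorts a list of candidate answers. The field names, scores, and the descending sort are example choices for illustration, not Squirro’s internal code:

# Hypothetical candidate answers, each carrying both scores.
answers = [
    {"answer": "Food and Drug Administration", "es_score": 12.4, "qa_score": 0.93},
    {"answer": "federal agency", "es_score": 15.1, "qa_score": 0.05},
    {"answer": "drug regulator", "es_score": 9.8, "qa_score": 0.71},
]

min_score = 0.1
order_answers_by = ["es_score", "qa_score"]

# Drop answers whose model score falls below the threshold...
relevant = [a for a in answers if a["qa_score"] >= min_score]

# ...then order by each listed score in turn (highest first).
relevant.sort(key=lambda a: tuple(a[k] for k in order_answers_by), reverse=True)

for a in relevant:
    print(a["answer"])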

perform_on_languages

Specifies the list of languages in which a question must be asked for an answer search to be performed. To find an answer regardless of the question’s language, set this option to null. Specifying an empty array disables the answer search for all languages (QA is disabled).

Type: list (of strings)

consider_query_type

Specifies whether the type of the user query is taken into account. If this option is set to true, only natural language questions are considered. Otherwise, query classification is bypassed and the query is used for question answering regardless of the predicted question type.

Type: boolean

min_query_length

Specifies the minimum number of words in a query to be considered for question answering. This rule is bypassed if the query ends with a question mark symbol (?).

Type: number (int)
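
The interaction between min_query_length and the question mark bypass can be summarized in a few lines (a minimal sketch of the rule as documented above, not the actual implementation):

def passes_length_check(query: str, min_query_length: int = 3) -> bool:
    """Return True if the query qualifies for question answering."""
    # A trailing question mark bypasses the length rule entirely.
    if query.strip().endswith("?"):
        return True
    # Otherwise the query must contain at least min_query_length words.
    return len(query.split()) >= min_query_length

print(passes_length_check("FDA?"))             # True (question mark bypass)
print(passes_length_check("what is the FDA"))  # True (4 words)
print(passes_length_check("meaning FDA"))      # False (2 words, no "?")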