Question Answering#

Profiles: Project Creator, Search User

This page provides an overview of the Question Answering (QA) feature in Squirro.

Project creators can configure the feature and choose where it’s displayed. Search users can use the feature to discover answers to queries typed as questions.

Reference: For project creator configuration instructions, see How To Use The Question Answering Feature.

Introduction#

Question Answering is a feature that allows users to discover direct answers to questions by phrasing their search query in the form of a question.

In the Cognitive Search template, it can be accessed by clicking the dialogue icon (highlighted in orange) as shown in the example below:

[Screenshot: Question Answering icon in the Cognitive Search template]

In this example, asking for the meaning of FDA and clicking the QA icon returns a result showing that FDA stands for Food and Drug Administration.

Clicking the answer will open the document in the viewer with the passage used to provide the answer highlighted.

Tip: You only need to phrase your query in the form of a question; you do not need to add a question mark (?) at the end of your query.

Note

There will not always be an answer available for your query. The widget’s ability to provide an answer depends both on how the question is phrased and on the information available in your project’s data sources.

Background#

Question Answering is a general-purpose task that consists of answering a question by extracting the answer as a span from a contextual text that is supplied together with the question, as in the following example:

Question: "What color is it?"
Context: "The tomato is red."

Answer: "red"

If the answer cannot be found as part of the context, the question is deemed unanswerable and the answer is an empty string, as in the example:

Question: "What is his nickname?"
Context: "The tomato is red."

Answer: "" (unanswerable)

Note that QA is not a closed-book Question Answering model: it always requires a context, supplied along with the question, from which the answer can be extracted as a span of words.

Furthermore, QA is not a generative Question Answering model: it can only answer with text taken verbatim from the context.

Frequently Asked Questions#

Is it necessary to fine-tune a QA model for a specific use-case?#

In general, this is not required. If the questions and context texts are formulated in natural (English) language, the model should be able to predict the answer from the context or say “unanswerable” if the context does not contain an extractable answer.

Fine-tuning a new model may be necessary only if the target distribution differs substantially from the training distribution, e.g. if the questions and contexts are formulated in a different language, or if the English terminology used differs radically from what a natural English speaker would be expected to use.

How many samples are needed for fine-tuning a QA model?#

It is not possible to give a definitive answer to this question, as it depends on many nonlinearly related factors.

However, SQuAD2.0, the de facto standard dataset for training and evaluating QA models, can serve as a guide.

It contains a total of about 150,000 pairs of questions and corresponding contexts.

Of these 150,000 samples, about 50,000 questions are unanswerable from the supplied context and require the model to predict no answer.

The dataset is split into a training set, a validation set, and a test set containing 129,941, 6,078, and 5,915 sample pairs, respectively.

Fine-tuning a model to achieve comparable performance to models trained on the SQuAD2.0 task will likely require constructing a dataset with similar properties.
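For orientation, the following sketch uses the Hugging Face datasets library to load the publicly available SQuAD2.0 distribution and count unanswerable samples. The loader, its split names, and its split sizes are assumptions about the public release and may differ slightly from the figures quoted above.

```python
# A minimal sketch, assuming the Hugging Face "datasets" library, of how
# the public SQuAD2.0 splits can be inspected. The public loader exposes
# "train" and "validation" splits; exact counts may differ from the
# figures quoted in the text above.
from datasets import load_dataset

squad = load_dataset("squad_v2")

for split_name, split in squad.items():
    # In SQuAD2.0, unanswerable questions have an empty answers["text"] list.
    unanswerable = sum(1 for ex in split if len(ex["answers"]["text"]) == 0)
    print(f"{split_name}: {len(split)} samples, {unanswerable} unanswerable")
```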

Configuration#

The QA feature can be configured in Setup > Settings > Project Configuration.

The relevant Project Configuration key is topic.search.qa-configuration.

Reference: Configuration Schema
pydantic model QAPluginConfig#

Question Answering configuration.

For more information about the Question Answering feature in Squirro see: https://docs.squirro.com/en/latest/technical/search/features/qa.html

Fields:
field abstract_size: int = 5000#

Item abstract size that is used to search for an answer.

field consider_query_type: bool = False#

Take the type of user query into account. Only natural language questions are considered if this option is selected. Otherwise, the query classification is bypassed, and the query is used for question answering, regardless of the predicted question type.

field drop_ratio: int = 0#

How far the score of an answer may fall below that of the highest-scoring answer before the answer is dropped. If set to 0, the drop ratio is disabled.

field handle_impossible_answer: bool = True#

Handle questions that are impossible to answer.

field lookup_answer_in_top_abstracts: int = 3#

Number of abstracts in the item that are used to search for an answer.

field lookup_answer_in_top_items: int = 3#

Number of items that are used to search for an answer.

field min_query_length: int = 3#

Minimum number of words in a query to be considered for question answering. This rule is bypassed if the query ends with a question mark symbol (?).

field min_score: float = 0.1#

The minimum score returned by the model for an answer to be considered relevant. The threshold can be disabled by setting this option to 0.

field num_answers: int = 3#

Number of returned answers.

field order_answers_by: List[QAScoreField] = [QAScoreField.ES_SCORE, QAScoreField.QA_SCORE]#

Which scores are used to order answers. If more than one value is given, answers are ordered by multiple scores.

field perform_on_languages: List[str] = ['en']#

List of languages in which the question may be asked to search for an answer. To find answers regardless of the language of the question, set this option to *. If this option is left empty, the search for answers is not performed for any language (QA is disabled).

field use_title_as_fallback: bool = True#

Search for an answer in the title if the item abstract is not provided.
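As an illustration, the defaults above could be collected into a single configuration value for topic.search.qa-configuration. This is a hypothetical sketch: the keys mirror the QAPluginConfig fields, but the exact accepted format (including the serialized names of the QAScoreField values) should be verified against your Squirro installation.

```python
# Hypothetical sketch: QAPluginConfig defaults expressed as a plain dict,
# suitable for serializing to JSON for topic.search.qa-configuration.
# The "order_answers_by" values are assumed serializations of
# QAScoreField.ES_SCORE and QAScoreField.QA_SCORE.
qa_configuration = {
    "abstract_size": 5000,
    "consider_query_type": False,
    "drop_ratio": 0,
    "handle_impossible_answer": True,
    "lookup_answer_in_top_abstracts": 3,
    "lookup_answer_in_top_items": 3,
    "min_query_length": 3,
    "min_score": 0.1,
    "num_answers": 3,
    "order_answers_by": ["es_score", "qa_score"],
    "perform_on_languages": ["en"],
    "use_title_as_fallback": True,
}
```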