Question Answering (QA) is a type of information retrieval. Given a collection of documents (such as the World Wide Web), a QA system should be able to retrieve answers to questions posed in natural language. QA is regarded as requiring more complex natural language processing (NLP) techniques than other types of information retrieval, such as document retrieval, and it is sometimes regarded as the next step beyond search engines.
Closed-domain question answering deals with questions within a specific domain (for example, medicine or automotive maintenance) and can be seen as an easier task because NLP systems can exploit domain-specific knowledge such as ontologies.
Open-domain question answering deals with questions about nearly everything and can only rely on general ontologies. On the other hand, these systems have much more data available from which to extract the answer.
The first QA systems were developed in the 1960s and they were
basically natural-language interfaces to expert systems that were
tailored to specific domains. In contrast, current QA systems use text
documents as their underlying knowledge source and combine various
natural language processing techniques to search for the answers.
Current QA systems include a question classifier module that
determines the type of question and the type of answer. After the
question is analysed, the system typically uses several modules that
apply increasingly complex NLP techniques on a gradually reduced
amount of text. First, a document retrieval module uses search engines
to identify the documents or paragraphs in the document set
that are likely to contain the answer. Subsequently, a filter
preselects small text fragments that contain strings of the same type
as the expected answer. For example, if the question is "Who invented
Penicillin", the filter returns text that contains names of people.
Finally, an answer extraction module looks for further clues in
the text to determine if the answer candidate can indeed answer the
question.
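
The stages described above can be illustrated with a minimal Python sketch.
It is not the design of any particular QA system: the toy corpus, the
stopword list, the keyword-overlap ranking, and the regular expressions
standing in for a named-entity recogniser are all assumptions made for
illustration, as are the function names.

```python
import re

# Hypothetical toy corpus; a real system would query a search engine instead.
DOCUMENTS = [
    "Alexander Fleming discovered penicillin in 1928.",
    "Penicillin is an antibiotic used to treat bacterial infections.",
    "Alexander Graham Bell patented the telephone in 1876.",
]

# Assumed stopword list, kept tiny on purpose.
STOPWORDS = {"who", "what", "when", "where", "is", "a", "an", "the",
             "of", "in", "was", "by", "to"}

# Crude regex stand-ins for a named-entity recogniser (an assumption).
TYPE_PATTERNS = {
    "PERSON": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),
    "DATE": re.compile(r"\b\d{4}\b"),
}


def classify_question(question):
    """Question classifier: map the question word to an expected answer type."""
    words = question.lower().split()
    first = words[0] if words else ""
    return {"who": "PERSON", "when": "DATE"}.get(first, "OTHER")


def keywords(text):
    """Lower-cased content words of a text, without stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def retrieve(question, documents, top_k=2):
    """Document retrieval: rank documents by keyword overlap with the question."""
    q = keywords(question)
    scored = [(len(q & keywords(d)), d) for d in documents]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


def filter_candidates(passages, answer_type):
    """Filter: keep only strings whose type matches the expected answer type."""
    pattern = TYPE_PATTERNS.get(answer_type)
    if pattern is None:
        return []
    return [(m.group(), p) for p in passages for m in pattern.finditer(p)]


def extract_answer(question, candidates):
    """Answer extraction: prefer the candidate whose passage best matches the question."""
    if not candidates:
        return None
    q = keywords(question)
    best = max(candidates, key=lambda c: len(q & keywords(c[1])))
    return best[0]


def answer(question, documents=DOCUMENTS):
    """Run the stages in order, each one working on less text than the last."""
    answer_type = classify_question(question)
    passages = retrieve(question, documents)
    candidates = filter_candidates(passages, answer_type)
    return extract_answer(question, candidates)


if __name__ == "__main__":
    print(answer("Who invented Penicillin"))
```

Each stage narrows the text handed to the next one, so the more expensive
checks only run on a handful of candidates. In a realistic system the regex
filter would likely be replaced by a proper named-entity recogniser and the
overlap score by a stronger ranking function.
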
Some systems use templates to find the final answer. If you posed the
question "What is a dog?", the system would detect the substring "What
is a X" and look for documents which start with "X is a Y".
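
A template of this kind could be sketched with regular expressions, as in
the following Python example. The exact patterns are assumptions: the sketch
accepts "a" or "an" in the question, tolerates a leading article in the
document, and treats everything up to the first full stop as Y. The function
names are hypothetical.

```python
import re

# Question template "What is a X" (also accepting "an"); X is captured as a group.
QUESTION_TEMPLATE = re.compile(r"^what is an? (?P<x>.+?)\??$", re.IGNORECASE)


def definition_pattern(question):
    """Turn a "What is a X" question into an "X is a Y" answer pattern."""
    match = QUESTION_TEMPLATE.match(question.strip())
    if match is None:
        return None  # the question does not fit this template
    x = re.escape(match.group("x"))
    # Anchored at the start, since the template looks at how a document begins.
    return re.compile(rf"^(?:an? |the )?{x} is (?P<y>an? [^.]+)", re.IGNORECASE)


def find_definitions(question, documents):
    """Return the Y part of every document that starts with "X is a Y"."""
    pattern = definition_pattern(question)
    if pattern is None:
        return []
    answers = []
    for doc in documents:
        match = pattern.match(doc.strip())
        if match:
            answers.append(match.group("y"))
    return answers


if __name__ == "__main__":
    docs = [
        "A dog is a domesticated mammal. It barks.",
        "My dog is a beagle.",
        "Cats are independent.",
    ]
    print(find_definitions("What is a dog?", docs))
```

Only the first document is matched here, because the second one does not
begin with the "X is a Y" pattern. Templates like this are precise but
brittle, which is one reason they are usually combined with other
techniques.
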
Other systems use the results of a web search as a
means to expand the amount of text available and therefore increase
the likelihood of finding the correct answer.
More sophisticated systems are capable of performing inference
(such as abduction) and exploiting world
knowledge.
Architecture