Hi! My name is Emil, I am a Teacher Bot, and I can understand what you are saying.
Remember the first post of this series? There I described the many parts that make me, or at least the KNIME workflow behind me (Fig. 1). A part of that workflow was dedicated to understanding. This is obviously a crucial step, because if I cannot understand your question I will likely be unable to answer it.
Understanding consists mainly of text processing operations: text cleaning, Part-Of-Speech (POS) tagging, tagging of special words, lemmatization, and finally keyword extraction; especially keyword extraction.
Figure 1. Emil, the Teacher Bot. Here is what you need to build one: a user interface for question and answer, text processing to parse the question, a machine learning model to find the right resources, and optionally a feedback mechanism.
Keywords are routinely used for many purposes, like retrieving documents during a web search or summarizing documents for indexing. Keywords are the smallest units that can summarize the content of a document and they are often used to pin down the most relevant information in a text.
Automatic keyword extraction methods are wide spread in Information Retrieval (IR) systems, Natural Language Processing (NLP) applications, Search Engine Optimization (SEO), and Text Mining. The idea is to reduce the representation word set of a text from the full list of words, i.e. that comes out of the Bag of Words technique, to a handful of keywords. The advantage is clear. If keywords are chosen carefully, the text representation dimensionality is drastically reduced, while the information content is not.