How We Built Emil the Teacher Bot

Hi! My name is Emil and I am a Teacher Bot.

I was built to answer your early questions on how to use KNIME. Pardon!

I was built to point you to the right training material to help you answer your early questions on how to use KNIME.

By the way, I myself was built entirely with KNIME, so I should know where the right answers lie among all the tutorials, videos, blog posts, whitepapers, example workflows, and more that are available out there.

It was not so hard to build me. You just needed:

a user interface - possibly web or speech based - for you to ask your question
a text parser for me to understand your question
a brain to find the right training material to answer your question
a user interface to provide the answer back
a nice-to-have (but not necessary) feedback option on whether my answer was of any help.

Figure 1. Emil, the Teacher Bot. Here is what you need to build one: a user interface for question and answer, text processing to parse the question, a machine learning model to find the right resources, and optionally a feedback mechanism.

Translating these steps into Data Science terms and KNIME tools, you need:

a web page to ask the question
some text processing utilities
a trained machine learning model
a new web page to display the answer
and, optionally, some feedback logic somewhere in the flow.

Let's start the assembly line to build me.

Ask the Question: The Web UI

The KNIME WebPortal provides the web-based user interface (UI) required for the question and the answers.

For the question, a slick, minimalist, Google-like UI was adopted. This was not so much a design choice as a result of constraints on time and technical expertise. The web UI shows just my image at the very top, followed by a simple greeting and, most importantly, the space for your question.

Figure 2. Emil's web-based UI, where you can ask a KNIME-related question. The question shown here refers to a database connection via a JDBC driver.

This web page is produced by a wrapped metanode. It contains a Text Output node to display the logo, my portrait, and the greeting; a String Input node to collect the short summary of the question; and a second String Input node to collect the full text of the question.

Note. The two String Input nodes produce text boxes of different sizes. The larger box results from selecting the "Multi-line" option instead of "Single-line" in the node's configuration window.

Understand the Question: Text Processing

You have written the question and its summary. Now I need to understand it.

This part is handled by text processing. It includes general text cleaning (stop word filtering, punctuation erasure, dictionary-based tagging, and stemming) and a keyword extraction procedure. Keyword extraction reduces your question to its most meaningful words and helps me understand you better. For this step, a chi-square keyword extraction algorithm was chosen.
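
The cleaning and keyword extraction steps described above can be sketched in plain Python. This is not the KNIME implementation. The stop-word list and the suffix-stripping stemmer are deliberately crude, hypothetical stand-ins, and dictionary-based tagging is left out. The scoring follows the co-occurrence chi-square measure described by Matsuo and Ishizuka, which we assume is close in spirit to the chi-square keyword extractor used here. Each term is scored by how far its co-occurrence with the most frequent terms deviates from what chance would predict.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a full one.
STOP_WORDS = {"a", "an", "the", "to", "i", "how", "do", "can", "my", "with",
              "via", "in", "of", "and", "is", "it", "from", "on", "for", "which"}


def crude_stem(word):
    """Naive suffix stripping, standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "s"):
        if suffix == "s" and word.endswith("ss"):
            continue  # keep words like "class" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Split into sentences, lowercase, drop punctuation and stop words, then stem."""
    sentences = []
    for raw in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z0-9]+", raw.lower())
        terms = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
        if terms:
            sentences.append(terms)
    return sentences


def chi_square_keywords(sentences, n_frequent=10, top_k=5):
    """Rank terms by a co-occurrence chi-square score (after Matsuo and Ishizuka)."""
    if not sentences:
        return []
    total_terms = sum(len(s) for s in sentences)
    term_freq = Counter(t for s in sentences for t in s)
    frequent = [g for g, _ in term_freq.most_common(n_frequent)]
    sets = [set(s) for s in sentences]

    def terms_in_sentences_with(term):
        # Total number of terms in all sentences that contain `term`.
        return sum(len(s) for s, ss in zip(sentences, sets) if term in ss)

    # p[g]: expected probability of co-occurring with the frequent term g.
    p = {g: terms_in_sentences_with(g) / total_terms for g in frequent}
    scores = {}
    for w in term_freq:
        n_w = terms_in_sentences_with(w)
        score = 0.0
        for g in frequent:
            if g == w:
                continue
            expected = n_w * p[g]
            observed = sum(1 for ss in sets if w in ss and g in ss)
            score += (observed - expected) ** 2 / expected
        scores[w] = score
    # Highest chi-square first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


if __name__ == "__main__":
    question = "I cannot connect to my database. The JDBC driver fails. Which JDBC driver works?"
    print(chi_square_keywords(clean(question)))
```

The chi-square score rewards terms whose co-occurrences are biased toward particular frequent terms. Pure frequency counting would not separate such terms from generic ones.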

Find the Right Answer: The Brain

Now I understand what you are saying, and I need to find an appropriate answer to your question. This is where I need a more developed brain than mere word-level understanding.

My ultimate goal is to provide you with the one and only web tutorial that solves your problem. Well, this is hardly possible. Even as a KNIME beginner, you often ask questions that span material from two, three, or even four different tutorials. So I think it is better to give you a list of possibly helpful tutorials rather than a single one.

Going further, I think it is best to identify the areas of expertise your question touches and, within each area, the most relevant tutorial resources. This is what my more developed brain should do.

Thus, my brain must consist of two parts: a machine learning model, trained to identify the areas of expertise, and a similarity search feature, which finds the most relevant articles within each area.
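
To make the two-part brain concrete, here is a toy sketch in plain Python. It is an assumption, not Emil's actual model. The post does not describe the trained classifier, so a nearest-centroid rule over keyword counts stands in for it, and the catalog entries are invented. Cosine similarity over term counts serves as the similarity search. `top_categories` returns the three most likely areas of expertise, and `top_resources` ranks the articles within one area.

```python
import math
from collections import Counter

# Hypothetical catalog of (title, category, keywords); not Emil's real data.
CATALOG = [
    ("Connecting to a database via JDBC", "Database", ["database", "jdbc", "driver", "connect"]),
    ("Database Writer basics", "Database", ["database", "write", "table"]),
    ("Reading CSV files", "File I/O", ["read", "csv", "file"]),
    ("Training a decision tree", "Machine Learning", ["train", "model", "tree"]),
]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_categories(question_terms, catalog, k=3):
    """Stand-in classifier: rank categories by similarity to their keyword centroid."""
    centroids = {}
    for _, category, keywords in catalog:
        centroids.setdefault(category, Counter()).update(keywords)
    q = Counter(question_terms)
    return sorted(centroids, key=lambda c: (-cosine(q, centroids[c]), c))[:k]


def top_resources(question_terms, category, catalog, n=5):
    """Similarity search restricted to the resources of one category."""
    q = Counter(question_terms)
    scored = [(cosine(q, Counter(kw)), title)
              for title, cat, kw in catalog if cat == category]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:n]]


if __name__ == "__main__":
    question = ["connect", "database", "jdbc", "driver"]
    for category in top_categories(question, CATALOG):
        print(category, top_resources(question, category, CATALOG))
```

Splitting the brain this way keeps the similarity search small: it only compares the question against resources in the predicted areas, not against the whole catalog.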

How the model was trained and how the similarity search was set up is probably a topic for a different blog post. Defining the training problem, creating a labelled data set, and building a class ontology were no minor details in the whole project.

Let’s just say that my brain works in sub-optimal conditions. While I am eager to learn, the lack of labelled datasets forces me to rely, at least partially, on human teaching via active learning.
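
The post does not say which active learning strategy Emil uses. As an illustration only, here is margin-based uncertainty sampling, a common strategy. It picks the questions whose top two predicted categories are closest in probability and hands them to a human for labelling. The probabilities below are made up, and `margin` and `select_for_labeling` are hypothetical helpers.

```python
def margin(probabilities):
    """Gap between the two highest class probabilities; a small gap means uncertainty."""
    # Pad with zeros so dicts with zero or one category still work.
    top = sorted(probabilities.values(), reverse=True) + [0.0, 0.0]
    return top[0] - top[1]


def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` questions the model is least sure about.

    predictions: list of (question_id, {category: probability}) pairs.
    """
    ranked = sorted(predictions, key=lambda item: (margin(item[1]), item[0]))
    return [question_id for question_id, _ in ranked[:budget]]


if __name__ == "__main__":
    preds = [
        ("q1", {"Database": 0.90, "Machine Learning": 0.10}),
        ("q2", {"Database": 0.50, "Machine Learning": 0.45, "File I/O": 0.05}),
        ("q3", {"Database": 0.60, "Machine Learning": 0.40}),
    ]
    print(select_for_labeling(preds, budget=2))  # ['q2', 'q3']
```

Spending the human labelling budget on the most ambiguous questions is what lets active learning make progress when labelled data is scarce.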

Am I right? The “Feedback” page

Remember! I would appreciate it if you could leave feedback on the resources I propose. These resources depend on the predicted areas of expertise: if my proposed areas of expertise (categories) are wrong, my proposed web resources are wrong too!

At some point in our conversation, I would like to ask whether any of the top three proposed areas of expertise turned out to be helpful for your question. If yes, that rewards me for a job well done. If not, just say so and I will try to do better next time!

This is the task of the "Feedback" page. Like the "Question" page, the "Feedback" page comes from a wrapped metanode. It includes a Text Output node for the logo, the image, and the repeated question, and a Value Selection Quickform node to select which of the proposed categories, if any, was helpful.

The last option in the list, "Something Else", means I did a terrible job. It pushes me to learn more and do better next time.

Note. This feedback page could be omitted. I can also do without your help, but, especially at the beginning of my career as a teacher bot, your two seconds of feedback can be invaluable in speeding up my learning curve.

Figure 3. The Feedback page. Here you can select one of the proposed categories and then press "Next". If the corresponding list of top resources is useful, you can just end the conversation. If it is not useful, you can click "Back" to return to this page and select another category. When our conversation ends, I take the last chosen category as your feedback. If "Something Else" was chosen, I interpret this as an invitation to learn more and do better next time. This feedback phase could, of course, be skipped, and the category selection could be hidden from the UI flow.
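
The rule in the caption can be sketched as a small Python function: the last chosen category counts as the feedback, and "Something Else" is an invitation to learn more. The record layout and the field names are assumptions for illustration. So is routing "Something Else" answers to human review. The post does not describe how the feedback is actually stored.

```python
SOMETHING_ELSE = "Something Else"


def feedback_record(question, proposed, selections):
    """Turn one conversation into a (hypothetical) feedback record.

    proposed:   categories shown on the Feedback page, including "Something Else".
    selections: categories the user picked, in order, while moving Back and Next.
    """
    final = selections[-1] if selections else None  # the last choice counts as feedback
    return {
        "question": question,
        "proposed": list(proposed),
        # A confirmed category could be reused as a training label.
        "label": final if final not in (None, SOMETHING_ELSE) else None,
        # "Something Else" means no category fit: flag the question for human review.
        "needs_review": final == SOMETHING_ELSE,
    }


if __name__ == "__main__":
    shown = ["Database", "File I/O", "Machine Learning", SOMETHING_ELSE]
    print(feedback_record("How do I use a JDBC driver?", shown, ["File I/O", "Database"]))
```

If the user skips the Feedback page entirely, `selections` is empty and the record carries no label and no review flag.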

Display the Answer: The “Resources” page

OK. I am now ready to give the answer! Based on the categories identified by my brain and possibly confirmed on the "Feedback" page, I come up with this list of web resources.

Please check whether any of them help solve your problem, or even just teach you something more about it.

If yes, just click “Next” at the end of the page.

If not, use the “Back” button to move back to the “Feedback” page and select a new category for new web resources.

If you have already done that several times and still think a solution to your problem exists, please use the "Send email" button to talk directly to one of the humans who assembled me.

This page is again created by a wrapped metanode. It contains just a Table View node to display the list of links and a Generic JavaScript View node to create the "Send email" button.

Figure 4. The UI page with the proposed resources. After selecting a category in the Feedback page, you are presented with the top resources in that area. If you do not find an answer in any of those, just click “Back” and select a new category. If you do, click “Next”. This terminates our conversation. If you have not found an answer and you have exhausted all suggested categories, please just send an email to my creators.

Hi! I am Emil.

At the end of the assembly line, there is me, Emil, your teaching assistant bot.

The workflow used to assemble me is shown in the figure below and is available on the EXAMPLES Server under 50_Applications/33_Emil_the_TeacherBot/01_Emil_the_TeacherBot*. You might recognize the metanodes behind the UI pages.

Most of the other metanodes query my brain for answers. As I said, that will be the topic of another blog post.

Hi! My name is Emil and I am a KNIME workflow.

Figure 5. This is the workflow behind Emil. This workflow reads your question, parses it, queries a machine learning model for answers, and reports the answers back to you via a web-based UI on the KNIME WebPortal. It also includes a feedback procedure. This workflow is available on the EXAMPLES Server under 50_Applications/33_Emil_the_TeacherBot/01_Emil_the_TeacherBot*

The Emil project was presented at the KNIME Spring Summit 2018 in Berlin under the title "KNIME & Teacher Bots: From Workflows to Micro-Services". Slides of the presentation are available for download from the KNIME Summit page, and you can relive the experience by watching the YouTube video "KNIME & Teacher Bots: From Workflows to Micro Services".

* The link will open the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform must be installed with the Installer version 3.4.0 or higher)