09 Jul 2018, admin

Authors: Alexander Fillbrunn, Anna Martin

With the FIFA World Cup in full swing, quite a few people are enjoying betting games to add some additional suspense to the tournament. But to make informed guesses about the outcome of the games, it is helpful to know how the teams fared in previous world cups and preliminaries.

To give the fans an edge and show them the relevant information, we created an interactive world map that shows the statistics for the different teams. In particular, the application provides the following features:

  • A choropleth map displaying goals, points, or wins for each country in a given range of years
  • A slider with two handles to select the year range
  • A popup window, shown when hovering over a country, that displays which opponents that country's national team beat most often
  • Finally, a bar chart displaying the yearly distribution of goals, wins, or points a national team scored, also shown when the user moves the mouse over the corresponding country on the map.
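Under the hood, the first and last features boil down to the same filter-and-aggregate step: keep only the matches inside the selected year range, then sum the chosen statistic per country. Here is a minimal sketch in plain JavaScript; the match records and country codes are made up for illustration and stand in for the real World Cup data flowing through the workflow:

```javascript
// Aggregate a statistic (goals, wins, or points) per country
// for matches that fall inside the year range chosen on the slider.
// These records are invented for illustration only.
const matches = [
  { country: 'BRA', year: 2002, goals: 18, win: 1 },
  { country: 'BRA', year: 2014, goals: 11, win: 1 },
  { country: 'GER', year: 2014, goals: 18, win: 1 },
  { country: 'GER', year: 1998, goals: 8,  win: 0 }
];

function statByCountry(rows, stat, fromYear, toYear) {
  return rows
    .filter(r => r.year >= fromYear && r.year <= toYear)
    .reduce((acc, r) => {
      acc[r.country] = (acc[r.country] || 0) + r[stat];
      return acc;
    }, {});
}

// Goals per country between the two slider handles (2000 and 2014):
const goals = statByCountry(matches, 'goals', 2000, 2014);
```

Swapping `'goals'` for `'win'` (or a points column) yields the other two map statistics with the same function.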

Not to keep you in suspense any further, Figure 1 shows a screenshot of the final map! At the end of this blog post you will also find the interactive version of the visualization. To try out all the features and trace the paths of champions, you can download the workflow from the EXAMPLES server at 03_Visualization/04_Geolocation/08_FIFA_World_Cup.

The rest of the blog post takes you through the crucial code snippets that make the visualization come alive.

Figure 1. The final visualization with countries colored by the number of goals scored. Here we are hovering over Russia, which brings up the popup window with a pie chart of the countries Russia's national team has defeated over the years and a bar chart with the number of goals scored per year.


03 Jul 2018, admin

Ever sat next to a friend or colleague at the computer and been awed when you suddenly realised the way they do certain tasks is much better? We recently asked KNIME users to share their tips and tricks for using KNIME. In this series of posts we’ll be showing you how the experts use KNIME, in the hope that by sharing ideas you’ll discover some handy techniques.

So where do bunny ears come into this?

How to Enable Flow Variable Ports on a Node

By Alexander Franke

Flow variables are used in KNIME Analytics Platform to parameterize workflows when node settings need to be determined dynamically. They are carried along branches in a workflow via data connections (the black edges between nodes) and also via explicit variable connections (the red edges between nodes).

The “bunny ears” are the flow variable ports on a node (Fig. 1). They are hidden by default so you usually cannot see them.

To enable flow variable ports in any node, you can:

  • Right-click the node and select “Show Flow Variable Ports” in the context menu.
  • Start a data connection at a flow variable port (red circle) and drop it at the top left corner of the receiving node.

Ta-dah! Bunny ears.

Find out more about flow variables in Chapter 7.1 Workflow Parameterization: Flow Variables of the KNIME e-learning course.

Figure 1. To show the flow variable ports of any node, use the option “Show Flow Variable Ports” in the context menu or start the connection at a flow variable port (red circle) and drop it at the top of the receiving node. This will make the flow variable ports appear.


25 Jun 2018, greglandrum

How do I know this workflow still does what it’s supposed to do?


Most of the time, I use KNIME Analytics Platform to do exploratory work or build one-off workflows. Once I’ve finished one of these workflows, the real value it has - aside from being a good source of “spare parts” for future work - is that I can go back later and see what I did and how I did it. Some workflows are different though, and end up being used over and over again. When one of these workflows enters “production” and becomes an important part of my work, I want to be sure that it’s still doing what it’s supposed to do. This is particularly true when something in my environment changes, e.g. I install a new version of KNIME, move to a new computer, or update some of the community or KNIME Labs nodes that end up liberally sprinkled throughout my workflows. I used to do this testing manually, but then I realized that KNIME itself provides the tools I need to automatically validate my workflows. This blog post is about doing exactly that.


18 Jun 2018, admin

Authors: Vincenzo Tursi, Kathrin Melcher, Rosaria Silipo

Remember Emil the Teacher Bot? Well, today we want to talk about how Emil’s brain was created! The goal when creating Emil’s brain was to enable him to associate the most suitable resources on the KNIME website with keywords from the questions he is asked.

Before we continue, let’s recap what we’ve talked about so far in this Teacher Bot series:

  • The first blog post, “Emil the Teacher Bot”, describes how the workflow behind Emil combines a web browser based GUI with text processing, keyword extraction, and machine learning algorithms.
  • The second post, “Keyword Extraction for Understanding”, discusses the automatic keyword extraction algorithms available in KNIME Analytics Platform and our choice criterion.
  • The third post in the series, “An Ontology for Emil”, stresses the importance of defining a class ontology before starting a classification project.

As we’ve shown in these previous blog posts, our source was the questions posted on the KNIME Forum between 2013 and 2017. Questions were imported and treated as unanswered, as only a few answers contained links to educational resources, and only some of those links referred to up-to-date educational material. At the same time, we also adopted a class ontology with 20 classes.

The goal of Emil’s brain then became twofold:

  1. Associate the right class from the ontology to the keywords summarizing each question.
  2. Within each predicted class, explore the educational resources on the KNIME site and extract the four most relevant ones.

Today, let’s concentrate on goal #1: associating the right class from the ontology to the input question, i.e. to the keywords summarizing the question.
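To make goal #1 concrete, here is a deliberately naive sketch: score each ontology class by how many of its representative terms overlap with the question's keywords, and pick the winner. The class names and term lists below are invented for illustration; the actual workflow trains a machine learning model on the forum questions instead:

```javascript
// Toy keyword-to-class association: the class whose representative
// terms overlap most with the extracted keywords wins.
// Class names and term lists are hypothetical examples.
const ontology = {
  'Data Access':   ['csv', 'database', 'reader', 'file'],
  'Visualization': ['chart', 'plot', 'view', 'color'],
  'Modeling':      ['model', 'training', 'regression', 'tree']
};

function classify(keywords) {
  let best = null, bestScore = -1;
  for (const [cls, terms] of Object.entries(ontology)) {
    const score = keywords.filter(k => terms.includes(k)).length;
    if (score > bestScore) { best = cls; bestScore = score; }
  }
  return best;
}

const predicted = classify(['csv', 'reader', 'plot']);
```

A trained classifier replaces the overlap count with a learned scoring function, but the input (keywords) and output (one of the 20 ontology classes) stay the same.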


11 Jun 2018, ScottF

In this blog series we’ll be experimenting with the most interesting blends of data and tools. Whether it’s mixing traditional sources with modern data lakes, open-source devops on the cloud with protected internal legacy tools, SQL with noSQL, web-wisdom-of-the-crowd with in-house handwritten notes, or IoT sensor data with idle chatting, we’re curious to find out: will they blend? Want to find out what happens when IBM Watson meets Google News, Hadoop Hive meets Excel, R meets Python, or MS Word meets MongoDB?

Follow us here and send us your ideas for the next data blending challenge you’d like to see at willtheyblend@knime.com.

Today: BIRT meets Tableau and JavaScript. How was the Restaurant?

The Challenge

How would your favorite restaurant perform in a health inspection? Are you sure you are eating in one of the most reliable and safest restaurants in town? If you live in Austin, Texas, we can check.

Over the last three calendar years, data.austintexas.gov has published a dataset of inspection scores for Austin (TX) restaurants. Inspection scores range between 0 (theoretically) and 100. An inspection score lower than 70 requires corrective measures; repeated low scores may even necessitate the closure of the restaurant.

We will use this dataset to visually explore:

  • How many restaurants have been inspected in each ZIP code area;
  • Of those restaurants, how many have scored between 70 and 80, 80 and 90, and 90 and 100, as well as how many have scored a perfect 100;
  • And finally, the average scores for ZIP code locations in the Austin area.

For each one of these goals we will use a:

  • pie chart;
  • grouped bar chart;
  • geolocation map.
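The counting behind the pie chart and the grouped bar chart can be sketched in a few lines of JavaScript. The inspection records and ZIP codes below are invented for this sketch; the real numbers come from the data.austintexas.gov dataset:

```javascript
// Count inspections per score band and per ZIP code.
// The records below are hypothetical stand-ins for the real data.
const inspections = [
  { zip: '78701', score: 72 },
  { zip: '78701', score: 95 },
  { zip: '78704', score: 100 },
  { zip: '78704', score: 85 }
];

// Map a score to its reporting band (a perfect 100 gets its own band).
function band(score) {
  if (score === 100) return '100';
  if (score >= 90) return '90-100';
  if (score >= 80) return '80-90';
  if (score >= 70) return '70-80';
  return 'below 70';
}

const perBand = {}, perZip = {};
for (const r of inspections) {
  perBand[band(r.score)] = (perBand[band(r.score)] || 0) + 1;
  perZip[r.zip] = (perZip[r.zip] || 0) + 1;
}
```

`perZip` feeds the pie chart, `perBand` the grouped bar chart; the average score per ZIP for the map is one more `reduce` away.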

So far so good. Now we need to choose the graphical tool for such representations.

Since data manipulation within a reporting environment might be clunky, many KNIME users prefer to run the data manipulation comfortably from KNIME Analytics Platform, and later export the data to their preferred reporting tool.

There are many ways to produce graphics in a report with KNIME Analytics Platform. Today we will consider three: native graphics in BIRT, JavaScript based images exported to BIRT, and native graphics in Tableau.

BIRT (Business Intelligence Reporting Tool) is a reporting tool that is, to a certain extent, distributed as open source. The KNIME Report Designer extension integrates BIRT within KNIME Analytics Platform. The “Open Report” button in the KNIME tool bar takes you to the BIRT report environment; the “KNIME” button in the BIRT report environment takes you back to the KNIME workbench.

Tableau Desktop is a reporting tool, which requires a commercial license. 14-day trial licenses are also available at the Tableau Desktop site. The KNIME Tableau extension allows you to communicate with Tableau Desktop via TDE files or directly with the Tableau Server.

It is also possible to produce the graphic image in the KNIME workflow via the JavaScript nodes and subsequently export the image to the reporting tool of your choice.

In today’s Will They Blend, we want to highlight the process of generating a few simple graphics using BIRT, JavaScript, and Tableau. Can we produce three simple charts using those tools? What might that involve? Let’s find out!

Topic. Data Visualization using BIRT, Tableau, and JavaScript

Challenge. Create a pie chart, a bar chart, and a geolocation map using BIRT, Tableau, and JavaScript

Access Mode. KNIME Tableau extension, Report Designer extension, JavaScript Views extension, Open Street Map Integration extension


04 Jun 2018, Marten Pfannenschmidt

Some time ago, we set our mind to solving a popular Kaggle challenge offered by a Japanese restaurant chain: predict how many future visitors a restaurant will receive.

This is a classic demand prediction problem: how much energy will be required in the next N days, how many milk boxes will be in demand tomorrow, how many customers will visit our restaurants tonight? We already know how to use KNIME Analytics Platform to solve this kind of time series analytics problem (see the whitepaper on energy prediction). So, this time we decided to go for a different approach: a mixed approach.

Thanks to the open architecture of KNIME Analytics Platform, we can plug in almost any open source analytics tool, such as Python, R, or Weka, to name just three prominent examples - and, more recently, also H2O.

We already developed a cross-platform ensemble model to predict flight delays (another popular challenge). Here, cross-platform means that we trained a model with KNIME, a model with Python, and a model with R. These models from different platforms were then blended together as an ensemble model in a KNIME workflow. Indeed, one of KNIME Analytics Platform’s many qualities consists of its capability to blend data sources, data, models, and, yes, also tools.

For this restaurant demand prediction challenge we decided to raise the bar and develop a solution using the combined power of KNIME Analytics Platform and H2O.


28 May 2018, daria.goldmann

Every data scientist has been there: a new data set and you’re going through nine circles of hell trying to build the best possible model. Which machine learning method will work best this time? What values should be used for the hyperparameters? Which features would best describe the data set? Which combination of all of these would lead to the best model? There is no single right answer to these questions because, as we know, it’s impossible to know a priori which method or features will perform best for any given data set. And that is where parameter optimization comes in.

Parameter optimization is an iterative search for the set of hyperparameters of a machine learning method that leads to the most successful model based on a user-defined optimization function.

Here, we introduce an advanced parameter optimization workflow that uses four common machine learning methods, individually optimizes their hyperparameters, and picks the best combination for the user. In the current implementation, the choice of features and one hyperparameter per method are optimized. However, we encourage you to use this workflow as a starting point or template for completely different data, customizing it by including additional parameters in the optimization loop (and we will show where you could do that).
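The iterative search defined above can be boiled down to a simple loop: for each candidate hyperparameter value, train and score a model, and keep the winner. A minimal sketch in JavaScript, where the `score` function is a stand-in for a real model evaluation (in the workflow, the parameter optimization loop nodes play this role on validation data):

```javascript
// Minimal grid-search sketch over one hyperparameter.
// `score` stands in for "train a model with this value and
// measure its quality"; here it is a made-up function that
// pretends accuracy peaks at depth 5.
function gridSearch(values, score) {
  let best = { value: null, score: -Infinity };
  for (const v of values) {
    const s = score(v);
    if (s > best.score) best = { value: v, score: s };
  }
  return best;
}

const result = gridSearch([3, 5, 7, 9],
  depth => 1 - Math.abs(depth - 5) / 10);
```

Optimizing several methods at once, as the workflow does, just means running such a loop per method and comparing the winners on the same validation metric.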


22 May 2018, admin


Today: The GDPR Force Meets Customer Intelligence – Is It The Dark Side?

Authors: Rosaria Silipo and Phil Winters

The Challenge

The European Union is introducing the General Data Protection Regulation on May 25, 2018. This new law has been called the world’s strongest and most far-reaching law aimed at strengthening citizens' fundamental rights in the digital age. The new law applies to any organization processing personal data about a citizen or resident of the EU, regardless of that organization’s location. It is definitely a force to be reckoned with, as the fines for non-compliance are extremely high.

Many of the law’s Articles deal with how personal data are collected and processed, and include special emphasis, guidelines, and restrictions on what the EU calls “automatic profiling with personal data”. So what has that to do with us data scientists? Many of us use machine learning methods to create new fact-based insight from the customer or personal data that we collect, so that we can make decisions, possibly automatically. So the law applies to us. But the Customer Intelligence we create is fundamental to the successful running of our organization’s business.

Does this mean this new force will put limits on what we do? Or even worse, does it mean that we have to go over to the dark side to help our businesses? Absolutely not! The new law requires that we gain permission, perform tests, and document what we do. In no way does it restrict what honest data scientists can do. If anything, it provides us with opportunities.

Topic. GDPR meets Customer Intelligence

Challenge. Apply machine learning algorithms to create new fact-based insights taking the new GDPR into account


14 May 2018, admin

JavaScript Nuggets on Demand

KNIME Analytics Platform is extremely flexible. It offers not only a number of pre-packaged functionalities for prototyping or routine work, but also a number of integrations for the free coding days. One of these integrations imports the power of JavaScript code into the platform.

This blog post series aims at providing nuggets of JavaScript code to implement more creative drawing and plots than what is already available with the pre-packaged nodes. The nuggets of JavaScript code proposed here implement only one functionality and are explained step by step for all, even the JavaScript beginners, to understand.

Today: Interactive Choropleth World Map using Google GeoChart visualization

Authors: Rosaria Silipo & Paolo Tamagnini

Figure 1. A choropleth map is a geographical map where areas are colored, shaded, or patterned according to a corresponding calculated measure - in this case, the logarithm of the 2013 population - on a world map.

The Plot

Today we want to draw the choropleth map as shown above. So what do we need?

  • A map of the countries of the world and the corresponding population numbers.
  • A short JavaScript code to load the Google Charts library and draw the choropleth map based on the population numbers of each country.
  • A Generic JavaScript View node to execute such code within a KNIME workflow.

Our dataset is the CSV file population2013.csv and it contains a list of 214 world countries with their corresponding population numbers as of 2013.

We also have a Generic JavaScript View node. The smallest workflow would simply include a File Reader node to read the CSV file and a Generic JavaScript View node with the right JavaScript code nugget to draw the choropleth map. So let’s now have a look at this nugget of JavaScript code.
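For the impatient, the core of such a nugget looks roughly like this. This is a hedged sketch rather than the nugget from the workflow itself: the element id `chart_div`, the color scale, and the sample row are assumptions, while `google.charts.load`, `arrayToDataTable`, and `GeoChart` are the Google Charts library's standard API. In the actual node, the rows come from the KNIME input table rather than a hard-coded array:

```javascript
// Shape (country, population) rows into the header + rows array
// that Google Charts expects, using the log of the population as
// the colored measure.
function toDataArray(rows) {
  return [['Country', 'Population (log)']].concat(
    rows.map(r => [r.country, Math.log10(r.population)])
  );
}

// Draw the choropleth; runs only where the Google Charts library
// is loaded (i.e. in the browser / the Generic JavaScript View).
function drawMap(rows) {
  const data = google.visualization.arrayToDataTable(toDataArray(rows));
  const chart = new google.visualization.GeoChart(
    document.getElementById('chart_div'));
  chart.draw(data, { colorAxis: { colors: ['#e5f5e0', '#31a354'] } });
}

if (typeof google !== 'undefined') {
  google.charts.load('current', { packages: ['geochart'] });
  google.charts.setOnLoadCallback(() =>
    drawMap([{ country: 'Germany', population: 80600000 }]));
}
```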


07 May 2018, Vincenzo

Hi! My name is Emil, I am a Teacher Bot, and I can understand what you are saying.

Remember the first post of this series? There I described the many parts that make me, or at least the KNIME workflow behind me (Fig. 1). A part of that workflow was dedicated to understanding. This is obviously a crucial step, because if I cannot understand your question I will likely be unable to answer it.

Understanding consists mainly of text processing operations: text cleaning, Part-Of-Speech (POS) tagging, tagging of special words, lemmatization, and finally keyword extraction; especially keyword extraction.

Figure 1. Emil, the Teacher Bot. Here is what you need to build one: a user interface for question and answer, text processing to parse the question, a machine learning model to find the right resources, and optionally a feedback mechanism.

Keywords are routinely used for many purposes, like retrieving documents during a web search or summarizing documents for indexing. Keywords are the smallest units that can summarize the content of a document and they are often used to pin down the most relevant information in a text.

Automatic keyword extraction methods are widespread in Information Retrieval (IR) systems, Natural Language Processing (NLP) applications, Search Engine Optimization (SEO), and Text Mining. The idea is to reduce the word set representing a text from the full list of words, i.e. the list that comes out of the Bag of Words technique, to a handful of keywords. The advantage is clear: if keywords are chosen carefully, the dimensionality of the text representation is drastically reduced, while the information content is not.
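The reduction step itself can be illustrated with a deliberately simple extractor that keeps the k most frequent non-stop words. Real algorithms - including the ones compared in the keyword extraction post of this series - use smarter scores, but the shape of the computation is the same. The stop word list and the example question below are made up for this sketch:

```javascript
// Naive keyword extraction: count non-stop words and keep the
// top k by raw term frequency. Stop word list is illustrative.
const STOP = new Set(['the', 'a', 'to', 'of', 'and', 'how', 'i', 'in', 'with']);

function topKeywords(text, k) {
  const counts = {};
  for (const w of text.toLowerCase().match(/[a-z]+/g) || []) {
    if (!STOP.has(w)) counts[w] = (counts[w] || 0) + 1;
  }
  return Object.entries(counts)
    .sort((a, b) => b[1] - a[1])   // most frequent first
    .slice(0, k)
    .map(([w]) => w);
}

const kw = topKeywords(
  'How to read a CSV file and join the file with a database table', 2);
```

Replacing the frequency count with a score such as TF-IDF or a graph-based measure turns this toy into one of the extractors discussed in the series.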

