01 Oct 2018 · admin

Authors: Maarit Widmann and Moritz Heine

Ever had your results skewed by outliers in your dataset? Anomalies, or outliers, can be a serious issue when training machine learning algorithms or applying statistical techniques. They are often the result of measurement errors or exceptional system conditions and therefore do not describe the common functioning of the underlying system. For this reason, it is often good practice to include an outlier removal phase before proceeding with further analysis.

But hold on there! In some cases, outliers can give us information about localized anomalies in the whole system, so detecting them is valuable in its own right because of the additional information they provide about your dataset.

There are many techniques to detect and optionally remove outliers from a dataset. In this blog post, we show an implementation in KNIME Analytics Platform of four of the most frequently used outlier detection techniques, both traditional and novel.
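This excerpt does not name the four techniques, so as a point of reference here is a minimal Python sketch of one widely used traditional approach, Tukey's interquartile range (IQR) fences: values below Q1 - k·IQR or above Q3 + k·IQR are flagged as outliers. The function names and the default k = 1.5 are our own illustrative choices, not the implementation from the post or from KNIME.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences for a list of numbers.

    Illustrative sketch only; k = 1.5 is a common convention, not a KNIME default.
    """
    # quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3]
    # (older Python versions require at least two data points).
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values, k=1.5):
    """Detect: return the values that fall outside the fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]


def remove_outliers(values, k=1.5):
    """Optionally remove: keep only the values inside the fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if lower <= v <= upper]


if __name__ == "__main__":
    readings = [10, 12, 11, 13, 12, 11, 95]
    print(iqr_outliers(readings))     # [95]
    print(remove_outliers(readings))  # [10, 12, 11, 13, 12, 11]
```

Because quartiles are barely affected by the extreme values themselves, fences built on them make a reasonable first pass. Note that `statistics.quantiles` defaults to its "exclusive" method, so the cut points can differ slightly from those computed by other tools.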

24 Sep 2018 · daria.goldmann

Author: Daria Goldmann

About a year ago we told a beautiful story about how KNIME Analytics Platform can be used to automate an established modeling process using the KNIME Model Factory. Recently our Life Science team faced an exhausting and frightening exercise of building, validating, and scoring models for more than 1500 data sets.

17 Sep 2018 · admin

Author: Jim Falgout

You’ve built a predictive model using KNIME Analytics Platform. It’s a very good model. Maybe even an excellent model. You want others to take advantage of your hard work by applying their data to your model. Let’s build an API for that!

An API is an Application Programming Interface. It’s a way to interact with a computer program programmatically (i.e., by writing some code). A REST API is a specific kind of API used in the world of web service development. REST APIs typically pass data around in a format known as JSON.

Here are a few reasons for building a REST API for the application of your model:

  • Integrate the application of your model with your company’s web site
  • Integrate the application of your model with business processes in your company
  • Share the application of your model with the outside world (with some controls on top)
  • Sell the application of your model as a service

As you can see from these example usages, APIs are all about sharing and integrating.
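To make the JSON exchange concrete, here is a minimal Python sketch of a client calling such an API using only the standard library. The endpoint URL and the `{"rows": [...]}` payload shape are hypothetical placeholders; the actual URL and JSON structure depend on how your model's REST service is deployed.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL of your deployed model service.
SCORING_URL = "https://example.com/api/score"


def build_scoring_request(rows, url=SCORING_URL):
    """Build (but do not send) a POST request carrying the input rows as JSON.

    The {"rows": [...]} body is an assumed payload shape for illustration.
    """
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def score(rows, url=SCORING_URL, timeout=10):
    """Send the request and decode the JSON response (needs a live endpoint)."""
    request = build_scoring_request(rows, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a live service, a call such as `score([{"age": 42, "income": 50000}])` would return the decoded JSON response (the field names here are again made up). Building the request separately from sending it keeps the payload easy to inspect and test without a network connection.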

10 Sep 2018 · Jeany

Author: Jeanette Prinz

In a previous blog post, I discussed visualizations in KNIME Analytics Platform. Having recently moved to Berlin, I have been paying more attention to street graffiti. So today, we will be learning how to tag.

...just kidding. Sort of.

Our focus will be on tagging, but the text-mining (rather than street art) variety: We will learn how to automatically tag disease names in biomedical literature.


The rapid growth in the amount of biomedical literature becoming available makes it impossible for humans alone to extract and exhaust all of the useful information it contains. There is simply too much there. Despite our best efforts, many things would fall through the cracks, including valuable disease-related information. Hence, automated access to disease information is an important goal of text-mining efforts [1]. This enables, for example, the integration with other data types and the generation of new hypotheses by combining facts that have been extracted from several sources [2].

In this blog post, we will use KNIME Analytics Platform to create a model that learns disease names in a set of documents from the biomedical literature. The model has two inputs: an initial list of disease names and the documents. Our goal is to create a model that can tag disease names that are part of our input as well as novel disease names. Hence, one important aspect of this project is that our model should be able to autonomously detect disease names that were not part of the training.

To do this, we will automatically extract abstracts from PubMed and use these documents (the corpus) to train our model starting with an initial list of disease names (the dictionary). We then evaluate the resulting model using documents that were not part of the training. Additionally, we test whether the model can extract new information by comparing the detected disease names to our initial dictionary.
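The excerpt does not show the model itself, but two of the surrounding steps are easy to sketch. Below, a hypothetical `dictionary_tag` function finds occurrences of the initial dictionary terms in a text (the kind of lookup that can seed training annotations), and a hypothetical `novel_names` function compares the names a model detected against that dictionary. This is a plain-Python illustration under those assumptions, not the KNIME Text Processing nodes used in the post.

```python
import re


def dictionary_tag(text, dictionary):
    """Return (start, end, matched_text) spans for dictionary terms in text.

    Hypothetical helper: case-insensitive, whole-word matching. Longer terms
    are placed first in the alternation, so "breast cancer" is preferred over
    "cancer" when both start at the same position.
    """
    if not dictionary:
        return []
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


def novel_names(detected, dictionary):
    """Hypothetical helper: detected names absent from the initial dictionary."""
    known = {name.lower() for name in dictionary}
    return sorted({name for name in detected if name.lower() not in known})


if __name__ == "__main__":
    dictionary = ["cancer", "breast cancer", "cystic fibrosis"]
    text = "Patients with breast cancer or cystic fibrosis were excluded."
    print([name for _, _, name in dictionary_tag(text, dictionary)])
    # ['breast cancer', 'cystic fibrosis']
    print(novel_names(["Breast Cancer", "Kawasaki disease"], dictionary))
    # ['Kawasaki disease']
```

A dictionary lookup like this can only find names it already knows, which is why the post trains a model: the novelty comparison is only meaningful when applied to the model's output.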

03 Sep 2018 · admin

Author: Vincenzo Tursi

KNIME Analytics Platform is the open source software for creating data science. It allows you to design and implement data science workflows with added leverage from KNIME Integrations, KNIME Extensions, Community Extensions, and Partner Extensions.

To take the next step and put these data science applications into production, a number of requirements need to be taken into account.

30 Jul 2018 · admin

Authors: Christian Dietz, Paolo Tamagnini, Simon Schmid, Michael Berthold

In recent months, a wealth of tools have appeared that claim to automate all or parts of the data science cycle. These tools often automate only a few phases of the cycle, tend to consider just a small subset of the available models, and are limited to relatively simple data formats.

At KNIME we take a different stance: automation should not result in black boxes, hiding the interesting pieces from everyone; the modern data science environment should allow automation and interaction to be combined flexibly. If the data science team works on a well-defined type of analysis scenario, then more automation may make sense. But more often than not, the interesting analysis scenarios are not that easy to control and a certain amount of interaction with the users is actually highly desirable.

23 Jul 2018 · admin

Authors: Maarit Laukkanen, Rosaria Silipo, Heather Fyson

There are all kinds of resources here at KNIME to help you learn more about using data science with our tools, KNIME Analytics Platform and KNIME Server: courses, website articles, Innovation Notes, YouTube videos, noding and development guidelines, our SDK Setup on GitHub, and more. We even have a range of ebooks; the latest one in the series, for example, is about Text Mining. But how can you find out which resource best matches what you need to know?

What if you’re the kind of person interested in more structured learning and want to check out our course schedule? Or you live in a faraway place and there’s no course scheduled near you? Some of our courses are held in a classroom with teachers, some are run online by the same teachers, and some involve videos recorded by these teachers. Some courses cover KNIME Analytics Platform, some KNIME Server; some cover basic functionality, some are more advanced; some cover data analytics, some text processing or big data.

But hold on there… what if you work in a company that produces a lot of data and you want to use data analytics to find out more about your business’s impact, but you’re not a data scientist yourself? Then our Innovation Notes, a small series of use cases, would be a good fit. But where are they? And what if you are a more socially inclined person who fancies some networking opportunities and is looking for our list of meetup events or, even better, the KNIME Summit?

Do you feel overwhelmed? Is it hard to decide which learning option is the best for you? Maybe the flow chart below can help you navigate the different learning options, according to your needs and inclinations.

16 Jul 2018 · Vincenzo

In case you haven’t yet booked your summer holiday (or winter getaway, depending on which hemisphere you live in), not to worry: we’ve got the perfect thing to keep you busy! KNIME Analytics Platform 3.6 and KNIME Server 4.7 have been released, and there are plenty of new things to try out!


Have a go at creating your own fully functional local big data environment from within a KNIME workflow; thanks to this neat little node, this is now possible! Or, if you’re interested in deep learning, check out the newest enhancements to our deep learning integrations. There are also plenty of utility nodes to try out, as well as many new UI improvements.

On the KNIME Server side, there are new features to check out, too: for example, options that make it easier for IT to centrally manage KNIME Analytics Platform client preferences, to display job views, and to run workflows faster on distributed executors.

09 Jul 2018 · admin

Authors: Alexander Fillbrunn, Anna Martin

With the FIFA World Cup in full swing, quite a few people are enjoying betting games to add some additional suspense to the tournament. But to make informed guesses about the outcome of the games, it is helpful to know how the teams fared in previous World Cups and preliminaries.

To give the fans an edge and show them the relevant information, we created an interactive world map that shows the statistics for the different teams. In particular, the application provides the following features:

  • A choropleth map displaying goals, points or wins for each country in a given range of years
  • A slider with two handles to select the year range
  • A pop-up window, shown when hovering over a country, that displays the countries that country’s national team beat most often
  • Finally, a bar chart displaying the yearly distribution of goals, wins, or points a national team scored, which is shown when the user moves the mouse over the corresponding country on the map.
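The code snippets the post walks through are not part of this excerpt. As a hedged illustration of the data logic only, the Python sketch below computes what the features listed above display: per-country totals for the year range chosen with the slider, the yearly series for the bar chart, and the opponents a team beat most often. The record layout (dicts with `country`, `year` and statistic keys, and matches as `(winner, loser)` pairs) is an assumption made for this sketch.

```python
from collections import Counter, defaultdict


def totals_in_range(records, stat, first_year, last_year):
    """Choropleth data: sum one statistic per country over an inclusive year range."""
    totals = defaultdict(int)
    for r in records:
        if first_year <= r["year"] <= last_year:
            totals[r["country"]] += r[stat]
    return dict(totals)


def yearly_series(records, stat, country):
    """Bar chart data: per-year values of one statistic for a single country."""
    series = defaultdict(int)
    for r in records:
        if r["country"] == country:
            series[r["year"]] += r[stat]
    return dict(sorted(series.items()))


def most_beaten(matches, country, top=3):
    """Pop-up data: opponents that `country` beat most often.

    `matches` is an iterable of (winner, loser) pairs; draws are simply omitted.
    """
    return Counter(loser for winner, loser in matches if winner == country).most_common(top)


if __name__ == "__main__":
    # Toy data with placeholder team names, not real World Cup statistics.
    records = [
        {"country": "Team A", "year": 2010, "goals": 5},
        {"country": "Team A", "year": 2014, "goals": 3},
        {"country": "Team B", "year": 2014, "goals": 7},
    ]
    print(totals_in_range(records, "goals", 2014, 2018))  # {'Team A': 3, 'Team B': 7}
    print(yearly_series(records, "goals", "Team A"))      # {2010: 5, 2014: 3}
    print(most_beaten([("Team A", "Team B"), ("Team A", "Team B")], "Team A"))
    # [('Team B', 2)]
```

Keeping the aggregation separate from the drawing code means the same functions can back the map, the pop-up and the bar chart, whichever statistic (goals, points or wins) is selected.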

Not to keep you in suspense any further, Figure 1 shows a screenshot of the final map! At the end of this blog post you will also find the interactive version of the visualization. To try out all the features and trace the paths of champions, you can download the workflow from the EXAMPLES server at 03_Visualization/04_Geolocation/08_FIFA_World_Cup*.

The rest of the blog post takes you through the crucial code snippets that make the visualization come alive.

Figure 1. The final visualization with countries colored by the number of goals scored. Here we are hovering over Russia, which brings up the pop-up window with the pie chart of the countries whose national teams Russia has defeated over the years and the bar chart with the number of goals scored over the years.

03 Jul 2018 · admin

Ever sat next to a friend or colleague at the computer and been awed when you suddenly realised that the way they do certain tasks is much better? We recently asked KNIME users to share their tips and tricks on using KNIME. In this series of posts we’ll be showing you how the experts use KNIME, in the hope that by sharing ideas you’ll discover some handy techniques.

So where do bunny ears come into this?

How to Enable Flow Variable Ports on a Node

By Alexander Franke

Flow variables are used in KNIME Analytics Platform to parameterize workflows when node settings need to be determined dynamically. They are carried along branches in a workflow via data connections (the black edges between nodes) and also via explicit variable connections (the red edges between nodes).

The “bunny ears” are the flow variable ports on a node (Fig. 1). They are hidden by default so you usually cannot see them.

To enable flow variable ports in any node, you can:

  • Right-click the node and select “Show Flow Variable Ports” in the context menu.


  • Start a connection at a flow variable port (red circle) and drop it at the top left corner of the receiving node.

Ta-dah! Bunny ears.

Find out more about flow variables in Chapter 7.1 Workflow Parameterization: Flow Variables of the KNIME e-learning course.

Figure 1. To show the flow variable ports of any node, use the option “Show Flow Variable Ports” in the context menu or start the connection at a flow variable port (red circle) and drop it at the top of the receiving node. This will make the flow variable ports appear.
