28 Apr 2015 | rs

Author: Rohit Agarwal, Fractal Analytics
Reposted from the Fractal Analytics blog post of February 25, 2015


Fortune 100 companies such as Amazon and Google have been working to institutionalize analytics across their business processes, and the results are there for all to see. Achieving scale is a significant challenge in this process. Operationally, the options sit at two ends of a spectrum. At one end is scale through people: the focus is on the analytics delivery team, with the constraint that output can only ever grow in proportion to input. At the other end is scale through automation: a black-box approach in which every decision, business or analytical, is hard-coded into the system, which limits the value of its outputs to the end user. Neither option delivers economies of scale.

15 Apr 2015 | michael.berthold

Advanced analytics obviously starts with an intuitive yet powerful interface that lets data scientists quickly explore different ways to blend and analyze their data. It is even better if those analysis workflows can easily be handed to others as templates for their own analysis needs. However, when the analysis is deployed or its results are used for business-critical purposes, it becomes essential that we can repeat the analytical process and guarantee that the results stay the same. To truly productionize advanced analytics, reproducibility is a key requirement.

30 Mar 2015 | rs

It is actually quite easy to build a market basket analysis or a recommendation engine [1] – if you use KNIME! A typical goal of market basket analysis is to produce a set of association rules of the following form:

IF {pasta, wine, garlic} THEN pasta-sauce

The first part of the rule is called the “antecedent” and the second part the “consequent”. A few measures, such as support, confidence, and lift, define how reliable each rule is. The most famous algorithm for generating these rules is the Apriori algorithm [2].
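
The three measures named above can be computed directly from a list of transactions. The sketch below is a plain-Python illustration, not KNIME's implementation; the five transactions are invented for the example, and it uses the standard textbook definitions: support is the fraction of transactions containing an itemset, confidence is supp(A ∪ C) / supp(A), and lift is confidence divided by supp(C).

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions, invented purely for illustration.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"pasta", "wine", "garlic", "pasta-sauce"}),
    frozenset({"pasta", "wine", "garlic", "pasta-sauce", "bread"}),
    frozenset({"pasta", "wine", "garlic"}),
    frozenset({"pasta", "pasta-sauce"}),
    frozenset({"wine", "cheese"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions) -> float:
    """supp(antecedent ∪ consequent) / supp(antecedent): how often the rule holds when it applies."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions) -> float:
    """Confidence relative to the consequent's baseline support; > 1 means positive association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


if __name__ == "__main__":
    a, c = {"pasta", "wine", "garlic"}, {"pasta-sauce"}
    print(f"support    = {support(a | c, TRANSACTIONS):.2f}")    # 0.40
    print(f"confidence = {confidence(a, c, TRANSACTIONS):.2f}")  # 0.67
    print(f"lift       = {lift(a, c, TRANSACTIONS):.2f}")        # 1.11
```

A lift above 1, as here, means the antecedent makes the consequent more likely than its baseline rate. Apriori itself is not shown; its key idea is to extend only itemsets whose support already meets a minimum threshold, since any superset of an infrequent itemset is also infrequent.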

17 Mar 2015 | tobias.koetter

Today we would like to show you how to work with collection cells in KNIME. You might have already come across these cells, which hold a collection of other cells, e.g. a collection of strings representing a frequent item set or the items in a transaction.
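
Conceptually, a collection cell packs several values into a single cell. The Python sketch below mimics that idea outside KNIME and does not use any KNIME API: it assumes a hypothetical table with one row per purchased item, collapses it into one list-valued or set-valued entry per transaction, and then expands it back into rows. The table and function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical "one row per item" table of (transaction_id, item) pairs.
ROWS: List[Tuple[int, str]] = [
    (1, "pasta"), (1, "wine"), (1, "garlic"),
    (2, "pasta"), (2, "pasta-sauce"),
    (3, "wine"), (3, "wine"),
]


def group_to_lists(rows: Iterable[Tuple[int, str]]) -> Dict[int, List[str]]:
    """One list-valued 'cell' per transaction; keeps row order and duplicates."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for tid, item in rows:
        grouped[tid].append(item)
    return dict(grouped)


def group_to_sets(rows: Iterable[Tuple[int, str]]) -> Dict[int, Set[str]]:
    """One set-valued 'cell' per transaction; duplicates are dropped, order is lost."""
    grouped: Dict[int, Set[str]] = defaultdict(set)
    for tid, item in rows:
        grouped[tid].add(item)
    return dict(grouped)


def ungroup(collections: Dict[int, Iterable[str]]) -> List[Tuple[int, str]]:
    """Inverse step: expand every collection back into one row per element.

    Elements are sorted so the output is deterministic even for (unordered) sets.
    """
    return [(tid, item) for tid, items in collections.items() for item in sorted(items)]


if __name__ == "__main__":
    print(group_to_lists(ROWS)[3])           # ['wine', 'wine']
    print(group_to_sets(ROWS)[3])            # {'wine'}
    print(ungroup(group_to_sets(ROWS))[:3])  # [(1, 'garlic'), (1, 'pasta'), (1, 'wine')]
```

The list versus set distinction matters for market baskets: a set is the natural shape for an item set, while a list preserves quantities and order when those carry meaning.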

The workflow associated with this post is available for download in the attachment section of this post or in the EXAMPLES server under 003_Preprocessing/003004_CollectionCookbook_blog.

09 Feb 2015 | fogathmann
Author: F. Oliver Gathmann, Data Scientist

Many Life Sciences Discovery Informatics applications have to deal with some unpleasant combination of high data volume, high data velocity, and high data variety: the classic "3Vs of Big Data". Applications that combine high values for all three Vs are rare in the Life Sciences; High-Content Screening (HCS) and Next-Generation Sequencing (NGS) come to mind. However, you can always rely on your input data to be variable, in terms of the input formatting, the input data structures, or both. Moreover, in the vast majority of cases the data volume is too large to be handled properly with a collection of Excel files, so a robust IT infrastructure for storing and validating the incoming data is required. In short, the average Life Sciences Discovery Informatics application needs to be very nimble and very robust at the same time.

26 Jan 2015 | rs

Increasingly, people are finding that the amount of raw data collected by a system can grow exponentially, quickly reaching a very large size and a very high number of features, sometimes even qualifying as what is referred to as “big data”. For really large data sets, it can then help to take advantage of the performance of big data platforms, especially for running ETL procedures.


12 Jan 2015 | rs
Author: Cathy Pearl, User Experience Consultant

Did you know there are nearly 4000 online dating sites out there?  If you’re a Sea Captain, an Ayn Rand fan, or love Star Trek, there is a dating site for you.

I’m not an online dater myself, but I’m fascinated by the science of attraction.  Looking at online dating habits is one way to evaluate modern-day courtship.  Nowadays, 33% of couples have met online (not necessarily through dating sites).  That number is projected to go up to 70% by 2040, which is not surprising, given how much of our lives we spend online these days.

30 Dec 2014 | stefferber
Author: Dr. Stefan Ferber, Vice President Engineering Bosch Software Innovations GmbH

Bosch Software Innovations, the Bosch Group’s software and systems house, and KNIME, provider of the only open platform for data-driven innovation, recently announced their cooperation to allow data mining and data analytics in Internet of Things (IoT) applications (https://www.bosch-si.com/newsroom/news/publications/publications-52352.html).

10 Dec 2014 | wiswedel

We are getting close to the holiday season and, like every year, we have a new holiday version of KNIME ready to go under the Christmas tree!

KNIME 2.11 was released on December 6, featuring improvements to both the open source KNIME Analytics Platform and the KNIME Big Data Extension.

As always, we kept an eye on producing a tool for data-driven innovation: the changes in this new version faithfully follow the guidelines of the “Open for Innovation” manifesto.

25 Nov 2014 | kilian.thiel

Sentiment analysis of free-text documents is a common task in the field of text mining. In sentiment analysis, predefined sentiment labels, such as "positive" or "negative", are assigned to texts. Texts (here called documents) can be reviews of products or movies, articles, etc.

In this blog post we show an example of assigning predefined sentiment labels to documents, using the KNIME Text Processing extension in combination with traditional KNIME learner and predictor nodes.

A set of 2,000 documents was sampled from the training set of the Large Movie Review Dataset v1.0, which contains 50,000 English movie reviews along with their associated sentiment labels "positive" and "negative". For details about the data set, see http://ai.stanford.edu/~amaas/data/sentiment/. We sampled 1,000 documents from the positive group and 1,000 documents from the negative group. The goal is to assign the correct sentiment label to each document.
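
For readers who want to see the underlying idea outside KNIME, here is a minimal bag-of-words classifier in plain Python. It is not the workflow from this post: it swaps in multinomial naive Bayes with add-one smoothing as the learner and predictor, and the six training reviews are invented stand-ins for the Large Movie Review Dataset. It only illustrates the mechanics of learning word statistics per label and predicting the most probable label.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and keep runs of letters/apostrophes: a crude bag-of-words tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs: List[Tuple[str, str]]) -> "NaiveBayesSentiment":
        self.word_counts: Dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()
        for text, label in docs:
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_docs = sum(self.doc_counts.values())
        return self

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(self.doc_counts[label] / self.total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data (not taken from the Large Movie Review Dataset).
TRAIN = [
    ("A wonderful, moving film with great acting", "positive"),
    ("Great story and a brilliant cast", "positive"),
    ("I loved every minute, simply wonderful", "positive"),
    ("A boring, predictable plot and terrible acting", "negative"),
    ("Awful film, I hated it", "negative"),
    ("Terrible script and boring characters", "negative"),
]

if __name__ == "__main__":
    model = NaiveBayesSentiment().fit(TRAIN)
    print(model.predict("What a wonderful and brilliant movie"))  # positive
    print(model.predict("Boring and terrible, I hated it"))       # negative
```

Because the post samples 1,000 documents from each class, the class priors are equal, just as in this toy set, so a naive Bayes prediction would be decided entirely by the word likelihoods.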
