What's New in KNIME 2.10

New Nodes and New Features

 - GUI and Infrastructure
 - Database 
 - Social Media 
 - Data Manipulation
 - Data Mining / Statistics
 - PMML
 - Data Generation
 - Textprocessing
 - Image Import

Other Improvements on existing Nodes and Features

 - Data Manipulation
 - Data Mining
 - Open Street Map Integration
 - See full list of changes in changelog file

New Nodes and New Features

GUI and Infrastructure

Important Information about Updating KNIME for Windows Users

KNIME 2.10 uses the latest Java version (Java 7, update 60) on all supported systems: Windows, Mac, and Linux. This Java update addresses instabilities on some flavors of Linux and Mac.

  • On Windows, a re-installation of KNIME is required, due to the underlying Java update. You can still point the newly installed KNIME version to your previous workspace or import the entire workspace into your new installation.
  • On Mac or Linux, you can simply update your KNIME installation via the File menu.

New Intro Page

When you open KNIME, a new intro page greets you. From there you can move directly to useful information (learning hub, example workflows...) or trigger the most common actions (create new workflow, open recent workflows, install extensions).

This page also replaces the previous pop-up window containing tips & tricks and news.

Auto-save

Workflows can now rely on the new auto-save feature. Enable it under File -> Preferences -> KNIME -> KNIME GUI.

If KNIME terminates unexpectedly, your workflow will be recovered in the next KNIME session from the latest auto-save file.

Note that this feature is disabled by default; enable it in the preferences if you find it useful.

Database

Remember the Database Connector node? It used to both establish a connection to a database and select a table from it through an SQL query.
These two tasks have now been separated into dedicated nodes: a Database Connector node establishes a connection to a database, while a Database Table Selector node defines an SQL query, at minimum to select the table to work on. The new red square port represents a database connection with no selected table.
Following this refinement of the node tasks, the old Database Connector node has been renamed Database Table Connector, and the Database Connection Reader / Writer nodes have been renamed Database Connection Table Reader / Writer.
Some of the new Connector and SQL creator nodes are described below.
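This split mirrors how most database APIs separate connecting from querying. As a rough analogy using Python's stdlib sqlite3 module (not KNIME code; the table and data are made up), the connection object carries no table selection, and the table-selecting SQL is a separate, later step:

```python
import sqlite3

# Connecting -- analogous to the Database Connector node and its new
# red square port (a connection with no selected table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [("pen", 1.5), ("book", 12.0)])

# Selecting a table via an SQL query -- analogous to the
# Database Table Selector node.
rows = conn.execute("SELECT name, price FROM items").fetchall()
print(rows)  # [('pen', 1.5), ('book', 12.0)]
```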

Database Connector Nodes

The generic Database Connector node establishes a connection to a database through its JDBC driver; no SQL query definition is possible here. The JDBC drivers for the most commonly used and most recent database versions are pre-loaded in KNIME and available in the node configuration window.
However, dedicated connector nodes are available for specific databases, such as:
  • MySQL Connector
  • SQLite Connector
  • PostgreSQL Connector
  • Hive Connector (commercial license required)
These nodes create a connection to a MySQL, SQLite, PostgreSQL, or Hive server, respectively, using the appropriate JDBC driver. You just need to provide the database address and credentials.

Database Table Selector

This node provides an interactive editor to build the SQL query to apply to a database connection established with a Database Connector node (see item above). The main task of this node is to select the table to work on from the database.

SQL Extract / Inject and Database SQL Executor

The SQL Inject node injects an SQL query contained in a flow variable into a database connection established with a Database Connector node (see item above). It works like the Database Table Selector node, but takes the query from a flow variable rather than an SQL editor.

The SQL Extract node, on the other hand, exports the SQL query from a database port into a flow variable.

The Database SQL Executor node provides the editor to create and then apply an SQL query to a database connection produced by a Database Connector node (see item above).

Hive Loader (Cloudera certified, commercial license required)

This node, together with the Hive Connector node, belongs to the KNIME Big Data Extension, which can be installed via the KNIME Update Manager: go to File → Install KNIME Extensions... and select the appropriate extension from the KNIME Extension Store category.

Note: The KNIME Big Data Extension requires a commercial license, which you can purchase via the KNIME Store.

The Hive Loader node loads data into a Hive database using the File Handling extensions. First, it copies the data onto the Hive server (using SSH, FTP, or any other supported protocol; note that remote access to the Hive server is currently required); then a Hive command is executed to import the data into Hive. The node's output is a database connection operating on the imported data table.

Database Sorter / GroupBy / Joiner

These new nodes - Database Sorter, Database GroupBy, Database Joiner - apply to an existing database connection and SQL query. They refine the SQL query to sort, aggregate, or join data tables from one or more databases, respectively using ORDER BY, GROUP BY, and JOIN.

Their configuration window has the same structure as that of the corresponding Data Manipulation nodes: Sorter, GroupBy, and Joiner.

Notice that the Database Joiner node allows you to join tables from two different database connections.
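Conceptually, these nodes refine the query by wrapping it. A hedged sketch of the idea in plain Python with SQLite (not KNIME internals; the table and column names are made up): the incoming query becomes a subquery, and GROUP BY / ORDER BY clauses are applied on top of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 10.0), ("south", 5.0), ("north", 2.5)])

# Query arriving on the database port.
base_query = "SELECT * FROM sales"

# A GroupBy-style refinement nests the incoming query as a subquery and
# aggregates on top of it; a Sorter-style refinement adds ORDER BY.
refined = (f"SELECT region, SUM(amount) AS total "
           f"FROM ({base_query}) AS t "
           f"GROUP BY region ORDER BY total DESC")

result = conn.execute(refined).fetchall()
print(result)  # [('north', 12.5), ('south', 5.0)]
```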

Social Media

Google API

Under KNIME Labs, there are now three nodes dedicated to connecting to Google APIs.

  • Google API Connector establishes a connection to access various Google APIs

Once a connection to the Google API engine has been established, you can specifically connect to Google Analytics using the other two nodes:

  • Google Analytics Connector creates a connection to the Google Analytics API
  • Google Analytics Query retrieves data from the Google Analytics API through a specific query

Note: You need a Google service account and a key file to access Google APIs, as described at https://developers.google.com/accounts/docs/OAuth2ServiceAccount.

Twitter API

Again under KNIME Labs, there are now a few nodes to connect to the Twitter API.

  • Twitter API Connector creates a connection to access the Twitter API

Once a connection with the Twitter API has been established, you can:

  • post a tweet using the Twitter Post Tweet node
  • search for tweets using the Twitter Search node
  • get timelines for a number of Twitter features (Mentions, User, Home, etc.) with the Twitter Timeline node
  • retrieve user data using the Twitter Users node

Notes:

Twitter's search service and, by extension, the Twitter Search API are not meant to be an exhaustive source of tweets. Not all tweets are indexed or made available via the Twitter search interface/API (see http://apivoice.com/2012/07/12/the-twitter-firehose/).

To access the Twitter API you need a Twitter API key and an access token, as described at https://dev.twitter.com/docs/faq#7447.

Data Manipulation

8 New Distance Nodes

  • Numeric Distances
  • String Distances
  • Bit Vector Distances
  • Byte Vector Distances
  • Mahalanobis Distance
  • Matrix Distance
  • Aggregated Distance
  • Java Distance

The same distance functions are available in all nodes requiring a distance measure. Implemented distance measures are described in the KNIME wiki.

Moving Aggregation

This new node calculates a number of aggregation values for a moving window. The aggregation values are displayed in new columns appended at the end of the table.

When the "Cumulative computation" flag is enabled, the node also calculates a cumulative sum from the beginning to the end of the data set.
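In plain Python (an illustrative sketch, not the node's code), a width-3 moving mean and a cumulative sum over a numeric column look like this:

```python
values = [2, 4, 6, 8, 10]
window = 3

# Moving aggregation: the mean of each window of 3 consecutive values.
moving_mean = [sum(values[i - window + 1:i + 1]) / window
               for i in range(window - 1, len(values))]

# Cumulative computation: a running sum over the whole column.
cumulative = []
total = 0
for v in values:
    total += v
    cumulative.append(total)

print(moving_mean)  # [4.0, 6.0, 8.0]
print(cumulative)   # [2, 6, 12, 20, 30]
```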

Date/Time Shift

This new node shifts a DateTime value backward (negative shift value) or forward (positive shift value) by a number of years, months, weeks, days, hours, minutes, seconds, or milliseconds.

The time shift can be applied to the values in a DateTime column, to a fixed DateTime value, or to the current time.
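In Python stdlib terms (an analogy, not the node's implementation), such a shift is a `timedelta` addition with a positive or negative amount:

```python
from datetime import datetime, timedelta

value = datetime(2014, 7, 1, 12, 0, 0)

shifted_forward = value + timedelta(weeks=2)  # positive shift: 2 weeks later
shifted_back = value + timedelta(hours=-36)   # negative shift: 36 hours earlier

print(shifted_forward)  # 2014-07-15 12:00:00
print(shifted_back)     # 2014-06-30 00:00:00
```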

Time Series Missing Values

The new Time Series Missing Value node handles missing values in time series:

  • Using the value of the previous/next non-missing cell.

And for numerical columns only:

  • Average: assigns the average value of the previous and next non-missing values;
  • Linear: assigns the linear interpolation of the previous and next non-missing values.
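For the numeric strategies, a worked sketch in plain Python (not the node's implementation) of filling missing cells between a previous value of 10 and a next value of 20:

```python
prev_val, next_val = 10.0, 20.0

# Average strategy: midpoint of the two neighboring non-missing values.
average_fill = (prev_val + next_val) / 2

# Linear strategy: interpolate by position. With two missing cells in a
# row (a gap of 3 steps between the non-missing neighbors), the first
# gets 1/3 of the difference, the second 2/3.
gap = 3
linear_fills = [prev_val + (next_val - prev_val) * i / gap
                for i in range(1, gap)]

print(average_fill)   # 15.0
print(linear_fills)
```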

Data Mining/Statistics

New View in Statistics Node

The Statistics node has a new view including skewness, kurtosis, and a histogram.

Time Series Analysis

New metanodes for Time Series Analysis:

  • "Seasonality Correction" metanode removes a seasonality pattern from an existing time series
  • "Time Series Auto-Predictive Training" metanode partitions the time series using the first X% of data rows as training set and the remaining data rows as test set; then it trains a predictive model based on the training set time series only
  • "Time Series Auto-Prediction Predictor" metanode applies the model trained in the "Time Series Auto-Predictive Training" node to the test time series and measures the error using a Numeric Scorer node

PMML

PMML 4.2 and JPMML 1.1.3

PMML support has now been upgraded to the latest PMML 4.2, and the JPMML library to version 1.1.3.

All learning nodes supporting PMML now support PMML 4.2. New addition: the Naive Bayes Learner node now supports PMML 4.2 as well.

XML to PMML

This new node transforms a column of XML values into a column of PMML values.

Data Generation

Random Number Assigner (Apache)

This new Random Number Assigner is based on the random number generators of the Apache Commons Math library. It provides 10 random distributions, from the simple Uniform to Cauchy, Chi-Square, and many more.

Due to its simple configuration panel it is especially helpful for generating data in a loop.
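As a rough analogy using Python's stdlib random module (not the Apache Commons Math generators the node actually wraps), filling a column by drawing from different distributions looks like this:

```python
import random

random.seed(42)  # fix the seed so the draws are reproducible

n = 5
uniform_col = [random.uniform(0.0, 1.0) for _ in range(n)]  # Uniform(0, 1)
gauss_col = [random.gauss(0.0, 1.0) for _ in range(n)]      # Normal(0, 1)

print(len(uniform_col), len(gauss_col))  # 5 5
```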

Random Boolean Assigner

This new node generates a column of boolean values. The boolean values are randomly assigned to each row, either using a fixed number of TRUE values or following a pre-defined probability.
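A sketch of the two assignment modes in plain Python (not the KNIME node's code):

```python
import random

random.seed(0)
n_rows = 10

# Mode 1: exactly a fixed number of TRUE values, at random positions.
fixed_true = 4
col_fixed = [True] * fixed_true + [False] * (n_rows - fixed_true)
random.shuffle(col_fixed)

# Mode 2: each row is TRUE independently with a pre-defined probability.
p_true = 0.3
col_prob = [random.random() < p_true for _ in range(n_rows)]

print(sum(col_fixed))  # 4 -- always exactly fixed_true
```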

Textprocessing

Sentiment Tagsets

A new tag set for sentiment tagging with the following tags:

VERY_POSITIVE, POSITIVE, NEUTRAL, NEGATIVE, VERY_NEGATIVE, UNDERSTATEMENT, EXAGGERATION, IRONY, AMPLIFICATION, NEGATION, ABBREVIATION, UNKNOWN

This tag set can be used by tagger nodes, e.g. the Dictionary or Wildcard Tagger, to assign sentiment tags to terms. Besides positive and negative tags, other tags like irony, amplification, or negation can be assigned as well.

The Topic Extractor (Parallel LDA)

The Topic Extractor (Parallel LDA) node extracts topics and assigns them to documents. Each topic is represented by a set of terms. The number of topics to extract from a list of documents can be specified in the dialog. The output of the node consists of three data tables: the topics as sets of terms, the list of documents with assigned topics (as IDs), and a log-likelihood score of the topic model for each iteration, which represents the convergence of the training. The node is a simple parallel threaded implementation of LDA, following Newman, Asuncion, Smyth and Welling, "Distributed Algorithms for Topic Models", published at KDD (2009).

Image Import

Read Images

This new node reads image files (SVG or PNG) using the URLs specified in an input data column.

String to SVG

This new node converts String values, including XML, from an input data column into SVG images.

Improvements on existing Nodes and Features

Data Manipulation

  • The Column Rename node now handles multiple columns and long column names better, and supports flow variables
  • The Normalizer, One2Many, and Many2One nodes now all have additional options for column filtering
  • The Cross Joiner node now has more options and even faster execution
  • GroupBy, Pivoting, Rule Engine, and Column Aggregator now offer more extensive support for escape characters, such as \n or \t
  • The Auto-Binner node now uses better-defined integer bin boundaries

Data Mining

  • Distance-based nodes, such as Similarity Search, k-Medoids, Hierarchical Clustering, etc., now support many more distance functions (see the new Distance nodes above)
  • The Tree Ensemble Learner now executes with a reduced amount of machine memory
  • The ROC Curve node can now sample data points when building ROC curves for large datasets
  • The Polynomial Regression Learner has been standardized to follow the Linear Regression Learner specifics
  • The SOTA and Weka Predictor nodes have been standardized to conform to all other Predictor nodes

Open Street Map Integration

  • Open Street Map nodes now support tracks and routes
  • The Open Street Map View now supports hiliting on a selection of map markers

Many more small improvements have been made under the hood - please refer to the changelog file.
