What's New in KNIME Analytics Platform 3.1 and KNIME Server 4.2

A major KNIME Software update is available as part of the 2015 winter release (released on 6 December). Below are the highlights of the changes in both the open source KNIME Analytics Platform and the commercial KNIME products.

An upgrade from KNIME Analytics Platform 2.x is not possible. You need to start with one of the installation packages from the download page.

GUI and General Structure
        - New Look and Feel
        - New Eclipse and Java beneath
        - WrappedNodes

Analytics and ETL
        - Simple Regression Tree (New Nodes)
        - Random Forest (New Node) 
        - Active Learning (New Nodes)
        - ARIMA (New Nodes)
        - k-Means
        - Prediction Fusion (New Node)
        - Rank (New Node)
        - Extended PMML Support
        - More In-Database processing nodes (New Nodes)

Streaming
        - Streaming Executor
        - Streamable Text Processing Nodes

KNIME Big Data Extensions
        - KNIME Spark Executor

KNIME Server
        - New Server Features
        - KNIME Cloud Server

KNIME Personal Productivity     
        - WorkflowDiff (New Feature)

Other New Nodes and Features
        - Node Repository - Fuzzy Search

See the full list of changes in the changelog and check out the video on YouTube about all new features in the KNIME software.


GUI and General Structure

New Look & Feel

The most obvious changes in the new look & feel of this release can be seen in the colours and shapes of the nodes. We have taken out the colour shading and redesigned the nodes to produce a flatter, more modern design. The node structure has also been reorganized for easier retrieval of information and nodes.

 

 

 

New Eclipse and Java beneath

Other major aspects of the 3.1 release take us behind the scenes. We have upgraded to be compatible with the latest versions of all the underlying libraries, including updates to Java 8, Eclipse 4, and BIRT 4. The main editor has also undergone extensive refactoring to provide a more efficient environment for your workflows. The extension API for data types is another enhancement, which makes it easier to generalize across data types and port types.

 

 

WrappedNodes

WrappedNodes (previously: SubNodes) are encapsulated and isolated entities in a workflow. They can be seen as an extension of MetaNodes, but they have additional benefits: isolated execution, defined variable scopes, custom documentation, and better automation. Additionally, their content can be run using the Streaming Executor.

Wrapped Nodes Preview

Analytics and ETL

Simple Regression Tree (new nodes)

These new learner and predictor nodes respectively train and apply a simple regression tree. The nodes support ordinary data columns but can also be used with fingerprint columns (bit or byte vectors). Configuration settings include tree depth, node size, and leaf size.

 

 

 

Random Forest (new node)

 

A simplified version of the Tree Ensemble node has been implemented as specified in Breiman, Leo. "Random forests." Machine learning 45, no. 1 (2001): 5-32. The Random Forest node implements a Tree Ensemble node in which the number of sampled columns is the square root of the original number of columns, with bootstrapping (sampling with replacement) over all data rows. The configuration dialog lets you set the number of models, the depth of the trees, and the minimum node size in the trees.
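The two sampling steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the node's implementation: the function name is hypothetical, rounding the square root to the nearest integer is an assumption, and the sketch draws the columns once per tree for simplicity (the text above does not say whether columns are drawn per tree or per split).

```python
import math
import random

def sample_for_one_tree(n_rows, n_columns, rng):
    """Pick the rows and columns used to grow one tree (hypothetical helper)."""
    # Bootstrap: draw n_rows row indices with replacement, so some rows
    # appear several times and others are left out.
    row_indices = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Column sampling: use about sqrt(n_columns) distinct columns
    # (rounding is an assumption made for this sketch).
    n_sampled = max(1, round(math.sqrt(n_columns)))
    column_indices = sorted(rng.sample(range(n_columns), n_sampled))
    return row_indices, column_indices

rng = random.Random(0)
rows, cols = sample_for_one_tree(n_rows=100, n_columns=16, rng=rng)
print(len(rows), len(cols))  # 100 4
```

Repeating this once per model yields the differently sampled training sets on which the individual trees are grown.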

 

 

Active Learning (new nodes)

The KNIME Active Learning Extension comprises a set of KNIME nodes for modular Active Learning and Novelty Detection in KNIME. It's used for semi-supervised learning tasks, whereby the system involves the user by asking the right question to further refine the predictive model. For more details see here.

 

 

 

ARIMA (new nodes)

A new set of nodes in KNIME Labs trains, applies, and stores ARIMA models. For now, the Conditional Likelihood, Maximum Likelihood, and Yule-Walker estimation methods are available for training. Predictions are created as simple future predictions and/or in-sample predictions.

 

 

k-Means

The k-Means node has one additional output port, which now outputs the cluster centers as a data table. No need to build complex PMML processing to extract the k-Means cluster centers anymore!

 

 

 

 

Prediction Fusion (new node)

The Prediction Fusion node enables the fusion of multiple predictions from models that differ in their underlying algorithms as well as in their input features. Fusing the predictions can often result in better accuracy than using a single model. Fusion strategies include: mean, median, maximum, and minimum.
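As a rough illustration of the four strategies, the sketch below fuses per-class confidences from three models and picks the class with the highest fused score. The function, the dictionary layout, and the example numbers are all hypothetical; the node's actual input format and tie handling are not described here.

```python
from statistics import mean, median

# The four strategies named above, mapped to plain Python functions.
FUSION_STRATEGIES = {"mean": mean, "median": median, "maximum": max, "minimum": min}

def fuse(confidences, strategy):
    """Fuse per-model confidences for each class; return the winning class and scores."""
    combine = FUSION_STRATEGIES[strategy]
    fused = {label: combine(scores) for label, scores in confidences.items()}
    winner = max(fused, key=fused.get)
    return winner, fused

# Hypothetical confidences from three models for two classes.
confidences = {"churn": [0.9, 0.2, 0.3], "stay": [0.1, 0.8, 0.7]}
for strategy in FUSION_STRATEGIES:
    print(strategy, fuse(confidences, strategy)[0])
```

With these numbers, mean and median favour "stay" while maximum and minimum favour "churn", which shows that the choice of strategy can change the fused prediction.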

Rank (new node)

The new Rank node produces a ranking on a given column, i.e. assigns a rank to each data row depending on the column value. Three different ranking modes are supported: standard, dense, and ordinal. Ranking can also run on multiple columns - identified by "Ranking Attributes" - as well as separately within data groups - identified by "Grouping Attributes".
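The three modes differ only in how ties are handled. The sketch below uses the common meaning of these names (standard: tied rows share a rank and the next rank is skipped; dense: tied rows share a rank with no gap; ordinal: every row gets a distinct rank). Whether the node defines them in exactly this way, and its sort direction, are assumptions; the function is hypothetical.

```python
def rank(values, mode="standard"):
    """Return the rank of each value (1 = smallest) under the given mode."""
    # Indices of the values in ascending order; sorted() is stable,
    # so tied values keep their input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    previous = None
    current = 0
    for position, index in enumerate(order, start=1):
        value = values[index]
        is_tie = position > 1 and value == previous
        if mode == "ordinal":
            current = position                             # 1, 2, 3, 4
        elif mode == "dense":
            current = current if is_tie else current + 1   # 1, 2, 2, 3
        elif mode == "standard":
            current = current if is_tie else position      # 1, 2, 2, 4
        else:
            raise ValueError(f"unknown mode: {mode}")
        ranks[index] = current
        previous = value
    return ranks

values = [30, 10, 20, 20]
print(rank(values, "standard"))  # [4, 1, 2, 2]
print(rank(values, "dense"))     # [3, 1, 2, 2]
print(rank(values, "ordinal"))   # [4, 1, 2, 3]
```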

 

Extended PMML Support

The node set for modular PMML has reached maturity and has been moved out of the Labs extension. In particular, it has been made more robust with regard to data transformation PMML models, such as column filters. All nodes previously equipped with an optional PMML input port are now deprecated, as the PMML documentation generation is to be done using the modular PMML nodes.

 

 

More In-Database processing nodes

This is a set of new nodes that enable in-database processing without needing to write a single line of SQL, including nodes for sampling, pivoting, and binning (also with PMML output).

 

Streaming

Streaming Executor

Streaming execution is a new run mode that differs from the default "node-by-node" execution. Its benefits are less I/O and faster runtime, at the expense of limited explorability and traceability.
The Streaming Executor is a Labs extension and has improved greatly. Around 100 additional nodes are now natively streamable, including many nodes for preprocessing, prediction, and text processing. Note that streaming is enabled for WrappedNodes only.
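The difference between the two run modes can be pictured with a small Python analogy. It is not how the Streaming Executor is implemented; both functions are hypothetical and only contrast storing every intermediate result with passing rows straight through.

```python
def node_by_node(rows):
    """Each step finishes and stores its full output before the next one starts."""
    filtered = [r for r in rows if r % 2 == 0]  # intermediate table is kept
    scaled = [r * 10 for r in filtered]         # second intermediate table
    return scaled

def streamed(rows):
    """Rows flow through all steps one at a time; no intermediate table is stored."""
    filtered = (r for r in rows if r % 2 == 0)
    scaled = (r * 10 for r in filtered)
    return scaled  # a lazy generator, evaluated only when consumed

print(node_by_node(range(6)))    # [0, 20, 40]
print(list(streamed(range(6))))  # [0, 20, 40]
```

Both variants produce the same result, but in the streamed variant the filtered rows never exist as a table that could be inspected afterwards, which mirrors the trade-off described above.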

 

 

Streamable Text Processing Nodes

New streamable text processing nodes are now available. The old nodes have been wrapped in a new container node that can run on the Streaming Executor. Whether or not the Streaming Executor is used is set in the configuration dialog. The old text processing nodes have been deprecated.

 

 

 

KNIME® Spark Executor

(license required)

KNIME Spark Executor v1.3 extends the KNIME® Big Data Connectors, enabling customers to create and run Spark applications within either KNIME Analytics Platform or KNIME Server. The new nodes offer seamless, easy-to-use data mining, scoring, statistics, data manipulation, and data import/export on Spark within the KNIME software.

 

 

 

 

KNIME® Server

(license required)

New KNIME Server Features

Advanced Job Scheduling

The KNIME Server now allows for more fine-grained scheduled jobs. In addition to the repeat interval (which now respects daylight saving time), the user can now define filters based on day-of-week, day-of-month, and month. A scheduled execution can also be skipped if the previous job is still running.
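The filter logic can be sketched as a single check that is evaluated whenever an execution falls due. This is a hypothetical illustration in Python, not the server's scheduler or its configuration format; all parameter names are invented for the example.

```python
from datetime import datetime

def should_run(now, days_of_week=None, days_of_month=None, months=None,
               skip_if_previous_running=False, previous_job_running=False):
    """Decide whether a scheduled execution that is due at `now` should start."""
    # Optionally skip this execution while the previous job is still running.
    if skip_if_previous_running and previous_job_running:
        return False
    # Each filter is optional; None means "no restriction".
    if days_of_week is not None and now.isoweekday() not in days_of_week:
        return False
    if days_of_month is not None and now.day not in days_of_month:
        return False
    if months is not None and now.month not in months:
        return False
    return True

weekdays = {1, 2, 3, 4, 5}  # ISO weekdays: Monday=1 ... Sunday=7
print(should_run(datetime(2015, 12, 7, 8, 0), days_of_week=weekdays))  # Monday: True
print(should_run(datetime(2015, 12, 6, 8, 0), days_of_week=weekdays))  # Sunday: False
```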

REST interface

Via the REST interface you can execute a workflow with a single call. It automatically creates a new job, executes it with user-defined parameters, returns the results, and discards the job.

It's now also possible to query a list of all active jobs on the server via a REST call.

 

KNIME® Cloud Server

(license required)

We have added a new product to the KNIME family: KNIME Cloud Server. This is a server on demand, which enables you to scale your analytics as and when you need to. Using KNIME Cloud Server you can share data, workflows and metanodes; you can schedule workflows and you can run your workflows remotely on more powerful hardware, freeing up your local resources for other tasks; you can also visualize your results for all end consumers to see on the Cloud WebPortal.

 

KNIME® Personal Productivity

(license required)

WorkflowDiff (new feature)

The WorkflowDiff feature allows you to compare workflows, workflow-templates, snapshots and nodes with each other. Just select two objects, right-click, and select "Compare" in the context menu. The selected objects are then analyzed to identify common nodes as well as insertions, deletions, substitutions and changes in the nodes themselves. For details see the product website here.
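To give a feel for the categories reported (common nodes, insertions, deletions, substitutions), the sketch below compares two workflows reduced to flat lists of node names, using Python's standard difflib module. This is a strong simplification and an assumption: real workflows are graphs with node settings, and the comparison method used by WorkflowDiff is not described here.

```python
from difflib import SequenceMatcher

def diff_node_lists(old, new):
    """Classify the differences between two ordered lists of node names."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    changes = []
    # Each opcode is one of: 'equal' (common nodes), 'insert', 'delete',
    # or 'replace' (a substitution).
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes

# Two hypothetical versions of the same workflow.
old = ["File Reader", "Row Filter", "k-Means", "CSV Writer"]
new = ["File Reader", "Column Filter", "k-Means", "Rank", "CSV Writer"]
for change in diff_node_lists(old, new):
    print(change)
```

For this pair, "Row Filter" is reported as replaced by "Column Filter" and "Rank" as an insertion, while the remaining nodes are reported as equal.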

 

 

Other New Nodes and Features

Node Repository - Fuzzy Search

There is now a new way to search for nodes in the node repository: fuzzy search. To use it, click the button on the left of the search field in the node repository view. The benefit of the fuzzy search is that the right node can still be located easily even if the search query is misspelt or an inaccurate node name is given. The nodes most closely matching the given search query appear at the very top of the resulting list of nodes. The Quick Node Insertion Window (Ctrl+Space) with the same search mechanism is still available.
The node repository can now be filtered for streamable nodes, i.e. the nodes that are allowed to be used in streamed fashion (see the new streaming feature mentioned above). The dropdown menu in the node repository view offers two options: filter for streamable nodes ("Show Streamable Nodes Only") or display whether each node is streamable ("Show Additional Info").
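The idea behind the fuzzy search described above, ranking node names by similarity to a possibly misspelt query, can be illustrated with Python's standard difflib module. The matching algorithm actually used in the node repository is not stated here, so this is only an analogy; the node list, function name, and cutoff value are chosen for the example.

```python
from difflib import get_close_matches

# A small illustrative list of node names.
NODE_NAMES = ["Row Filter", "Column Filter", "Rank", "k-Means",
              "Random Forest Learner", "Prediction Fusion"]

def fuzzy_search(query, names, limit=3):
    """Return up to `limit` node names, most similar to the query first."""
    # Compare case-insensitively, then map back to the original spelling.
    lowered = {name.lower(): name for name in names}
    hits = get_close_matches(query.lower(), list(lowered), n=limit, cutoff=0.5)
    return [lowered[hit] for hit in hits]

# A misspelt query still finds the intended node as the best match.
print(fuzzy_search("colum filtr", NODE_NAMES))
```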

 

 

 

 

Many more small improvements have been made under the hood - please refer to the changelog file.