What's New in KNIME Analytics Platform 3.3, KNIME Server and KNIME Big Data Extensions

This year's Christmas release, on December 6, 2016, involves a major KNIME® Software update. Here, we have highlighted some of the major changes, new features, and usability improvements in both the open source KNIME Analytics Platform and the commercial KNIME.com products.

You can upgrade from your existing KNIME Analytics Platform 3.2 installation by choosing the Update option in the File menu, or download a fresh copy from the download page.

KNIME Analytics Platform 3.3

KNIME Server 4.4

KNIME Big Data Extensions

 

See the full list of changes in the changelog.


KNIME Analytics Platform 3.3

Curved connections in workflow editor

 

The workflow editor now connects nodes in KNIME using curves instead of straight lines. Optionally, the thickness of the connections can be specified. Old workflows will retain straight lines, but you can switch on a per-workflow basis using the toolbar action. The default can be changed via the KNIME preferences.

Switch to curved connections and very likely you won't switch back!

 

Excel Integration Reworked

 

The entire backend of the Excel integration has been reworked. The underlying library (Apache POI) has been upgraded to version 3.14 and streaming support has been added.

One of the major milestones is support for large Excel files. Files spanning hundreds of thousands of rows and several thousand columns can now be read without problems.

 

Upgrade to Eclipse 4.6

KNIME Analytics Platform 3.3 is based on the latest available Eclipse release, Neon (4.6), adding support for more recent operating system versions, particularly on Linux and Mac OS X.

 

 

New Variable Manipulation Nodes

 

The String Manipulation and Math Formula nodes are now also available as variable versions. Their functionality and interfaces remain the same, but the input and output ports are flow variable ports, allowing easy manipulation of flow variables. Moreover, the input port is optional, so these nodes can also be used to define new variables.

 

Improvements to the Text Mining Extensions

 

Apache Tika Integration 

The Tika nodes can read and parse many different file formats and extract not only the textual content but also metadata (author, last modified date, etc.) and attachments. Supported formats include PDF, DOC, PPT, TXT, CSV, ZIP, GZ, and many more. In addition, it is possible to detect the language of textual data.

 

Stanford NER (Named Entity Recognizer) Learner and Stanford NE Tagger nodes 

We now provide a Stanford NE (Named Entity) Learner node which trains an NE model based on a set of documents and a list of named entities as input. After training, the model can be used by the Stanford NE Tagger node for further analysis. The models can also be scored with a scorer node.

 

More Text Processing Nodes 

  • Document Vector Adapter node adapts document vectors based on a reference feature space.
  • Document Vector Hashing node creates document vectors based on term hashing and is streamable.
  • RSS Feed Reader node reads RSS feeds and returns the extracted fields as strings, documents, and XML cells.
  • Stanford Lemmatizer node returns the lemmas of document terms based on their part-of-speech tags.
  • Document Term Entropy node computes the entropy of a term within a document.
  • Diacritics Remover node removes diacritics from terms in documents.
  • Markup Tag Filter node filters HTML tags.
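To give an intuition for the streamable hashing approach, here is a minimal sketch of the hashing trick behind a node like Document Vector Hashing (an illustration of the general technique, not KNIME's implementation): each term is hashed into a bucket of a fixed-length vector, so no global vocabulary has to be built or held in memory.

```python
# Illustrative sketch of term hashing for document vectors (not KNIME's
# actual implementation). Because the vector length is fixed up front,
# documents can be processed one at a time, which makes it streamable.
import hashlib

def hashed_doc_vector(terms, dim=16):
    """Map a list of terms to a fixed-size count vector via hashing."""
    vec = [0] * dim
    for term in terms:
        # Stable hash so the same term always lands in the same bucket.
        bucket = int(hashlib.md5(term.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1
    return vec

print(hashed_doc_vector(["knime", "node", "knime"]))
```

The trade-off is that unrelated terms may collide in the same bucket; a larger `dim` reduces collisions at the cost of sparser vectors.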

 

Improvements to DeepLearning4J Integration

 

The Deep Learning nodes have been refactored, and the Learner and Predictor nodes have been split into task-driven nodes for classification, clustering, and regression. Furthermore, the UI of all Learner nodes has been improved to make parameter tuning and network configuration easier to understand. Under the hood, the integration now uses an improved version of the Deeplearning4j library, which leads to better training performance and enables the use of CUDA 8.0 and compatible Pascal GPUs.

 

New Cloud Connectors

 

The new Amazon S3 and Azure Blob Store Connection nodes allow you to use the existing remote file handling nodes to work with your cloud data. The connectors are accompanied by File Picker nodes, which create pre-signed URLs through which files can be accessed in any KNIME reader node. This is particularly useful if you want to 'stream' your data in rather than downloading the files first.
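To illustrate the pre-signed URL idea, here is a simplified conceptual sketch (real services such as S3 use their own signing schemes, e.g. AWS Signature Version 4; the URL, path, and key below are made up): the URL embeds an expiry time and an HMAC signature, so any plain HTTP reader can fetch the file without holding separate credentials.

```python
# Conceptual sketch of pre-signed URLs (simplified, not a real cloud
# signing scheme): the expiry and signature travel inside the URL itself.
import hashlib
import hmac
from urllib.parse import urlencode

def presign(base_url, path, secret, now, expires_in=3600):
    """Build a URL that is valid until now + expires_in seconds."""
    expires = int(now) + expires_in
    message = f"GET\n{path}\n{expires}".encode()
    signature = hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()
    return base_url + path + "?" + urlencode({"Expires": expires, "Signature": signature})

url = presign("https://storage.example.com", "/bucket/data.csv", "secret-key", now=0)
print(url)
```

The server re-computes the signature on each request and rejects the URL once the expiry has passed, which is why such links are safe to hand to a generic reader node.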

 

KNIME Server 4.4

WebPortal: Interactivity

 

This is an exciting new feature for the WebPortal. It is now possible for different view nodes that appear on the same page to interact with one another. In this first version of interactivity, two types of events, selection and filtering, are available. These are supported by many of the JavaScript-enabled nodes. Select a few data points or rows in one view, see the same selection applied to all other appropriate views, and use range sliders to filter out data points.

Additionally, the interactive options in the views have been cleaned up and unified, and are now presented in a menu that can be activated by clicking the appropriate button. If you need to see more, the browser's fullscreen API can be used on every view, which is very helpful in more complex layouts.

To top this off, there are also two new visualizations, both of which support interactivity: a parallel coordinates plot and an interactive decision tree view.

With all of these features in place, you can create more stunning and engaging WebPortal applications and Guided Analytics workflows than ever before.

 

KNIME WebPortal Interactivity - showing selection, filtering and the new JavaScript parallel coordinates plot

 

JavaScript Decision Tree View

More REST methods and token-based auth

 

KNIME Server can now create time-limited access tokens for a user (based on JWT). Token-based access is often preferred over (encrypted) password-protected access, as it allows different systems to be tied together. Calling "GET /rest/session" creates a token that can be used in subsequent calls via the "Authorization: Bearer" header.

We also added some additional REST calls:

  • Canceling of running jobs (DELETE /rest/v4/jobs/xyz/execution)
  • Setting permissions on repository items (POST /rest/v4/repository/item:permissions)
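The token flow described above can be sketched as follows; the server URL, user name, and job ID are made-up placeholders, and only the header construction is shown here, so you can plug in the HTTP client of your choice to actually send the requests.

```python
# Hedged sketch of KNIME Server token-based auth (placeholder values):
# 1) GET https://server.example.com/rest/session with basic auth
#    -> the response yields a time-limited JWT token
# 2) e.g. DELETE https://server.example.com/rest/v4/jobs/<job-id>/execution
#    with the Bearer header cancels a running job
import base64

def basic_auth_header(user, password):
    """Header for the initial GET /rest/session call."""
    creds = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": "Basic " + creds}

def bearer_header(token):
    """Header for all subsequent calls using the returned token."""
    return {"Authorization": "Bearer " + token}

print(basic_auth_header("alice", "secret"))
print(bearer_header("<token>"))
```

Because the token expires on its own, scripts and third-party systems never need to store the user's password beyond the initial session call.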

 

 

KNIME Big Data Extensions

Spark File Reader and Writer nodes

 

A set of nodes has been added to support reading and writing the most common big data file formats, such as Avro, ORC, and Parquet, from HDFS in Spark.

 

Virtual Data Warehouse

 

Your virtual data warehouse is just a KNIME workflow away: the new Database to Spark and Spark to Database nodes allow you to read and write data from any JDBC-compliant database within Spark.

 

Spark SQL Query

 

If you speak SQL, you can now ask the right questions of your data in Spark with the new Spark SQL Query node, which comes with syntax highlighting and query completion.

 

Many other improvements have been made under the hood – please refer to the changelog.