Just in time for summer, the new version KNIME 2.10 has been released!
This release marks a new era for KNIME. It features a series of commercial products that incorporate big data strategies into company analytics, enhance personal productivity, protect partners' intellectual property, and fit all collaboration and enterprise needs, from simple to challenging requests, for small companies and major multinational enterprises alike.
However, in this blog post, I want to focus on the new features and improvements introduced in the open source KNIME Analytics Platform version 2.10, now available for free.
One prominent set of changes involves the Database category.
Remember the Database Connector node? It used to do two things: establish a connection to a database and select a table from that database through a user-defined SQL query. The SQL query could then be refined further using other database nodes or inside the node's SQL editor itself.
These two tasks, i.e. establishing the database connection and selecting the table to work on through an SQL query, have been separated and now have their own dedicated nodes. A Database Connector node establishes the connection to a database, while an SQL creator node defines the SQL query, which at a minimum selects the table to work on and can do more if needed.
Separating connection and table selection makes it much easier and more transparent to use multiple databases at the same time, and it increases the portability of workflows that rely on database operations. For example, it is possible to prototype the database access on a local SQLite database and then move the same workflow into production against a Hadoop database with just a single node change: the initial database connector node.
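For readers who think in code, the same separation can be sketched outside KNIME. The snippet below is only an analogy in plain Python, not how KNIME implements its nodes: the function names `connect_prototype` and `select_table`, the `sales` table, and its contents are all made up for illustration. The point is that the query step only needs a connection object and does not care which database sits behind it.

```python
import sqlite3

def connect_prototype():
    """Hypothetical 'connector' step: an in-memory SQLite database for prototyping."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("south", 5.5), ("north", 2.5)],
    )
    conn.commit()
    return conn

def select_table(conn, query):
    """Hypothetical 'query' step: only relies on the generic DB-API cursor interface."""
    cur = conn.cursor()
    cur.execute(query)
    return cur.fetchall()

# The query is defined independently of the connection that will run it.
QUERY = "SELECT region, amount FROM sales ORDER BY amount"

conn = connect_prototype()
rows = select_table(conn, QUERY)
print(rows)
conn.close()
```

Moving to another database would, in this sketch, mean replacing `connect_prototype` with a function that returns a connection to that database, while `select_table` and `QUERY` stay untouched (assuming the SQL dialects agree on the query).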
A generic Database JDBC Connector node can be used to establish the database connection. As before, the JDBC drivers of the most common and most recent database versions are pre-loaded in KNIME and available in this node’s configuration window. However, we have also added dedicated nodes to connect to specific databases, like MySQL, PostgreSQL, and SQLite.
Once the connection to a database is available, operations within the database can be modeled as part of the KNIME workflow as well. To get started, we have added nodes to sort, group, and join tables (in addition to a generic SQL query node), and many more will come in future versions of KNIME.
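To give a feel for what "operations within the database" means, here is a small Python sketch using SQLite. It is an illustration only: the `customers` and `orders` tables and their values are invented, and the SQL is simply one plausible statement that combines a join, a grouping, and a sort. It does not claim to be the SQL that the KNIME nodes generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Invented example data: two small tables linked by customer id.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'DE'), (2, 'CH'), (3, 'DE');
INSERT INTO orders VALUES (1, 20.0), (2, 5.0), (3, 10.0), (2, 1.0);
""")

# Join, group, and sort all happen inside the database;
# only the small aggregated result is fetched.
query = """
SELECT c.country, SUM(o.amount) AS total
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
GROUP BY c.country
ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
print(totals)
conn.close()
```

The benefit is the same as in the workflow: the heavy lifting stays in the database, and only the result travels to the client.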
You can, of course, use all of this with Hive, Impala, or another Hadoop-based database using the generic JDBC Connector node. However, our Cloudera-certified, commercial big data connectors come with a dedicated HIVE Connector node, which brings along all required libraries.
Another big innovation is the introduction of a number of distance measures, both as new nodes and as new features in existing distance-based nodes. In particular, there are 7 new nodes for calculating distance matrices using numerical, string, byte vector, and bit vector distances, to name but a few.
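If you have not worked with distance matrices before, the following Python sketch shows the idea. The three measures used here (Euclidean for numbers, Levenshtein for strings, Tanimoto for bit vectors) are my own picks as common examples of each family; they are not taken from the list of KNIME nodes, and the helper `distance_matrix` is hypothetical.

```python
import math

def euclidean(a, b):
    """Numerical distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def levenshtein(s, t):
    """String distance: minimum number of single-character edits."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def tanimoto_distance(a, b):
    """Bit vector distance; the bit vectors are stored as non-negative ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty bit vectors are treated as identical
    return 1.0 - bin(a & b).count("1") / union

def distance_matrix(items, dist):
    """Full pairwise matrix: entry [i][j] is dist(items[i], items[j])."""
    return [[dist(a, b) for b in items] for a in items]

words = ["knime", "kname", "node"]
matrix = distance_matrix(words, levenshtein)
print(matrix)
```

The same `distance_matrix` helper works with any of the three functions, which mirrors the idea of plugging different distance measures into the same distance-based computation.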
Also noteworthy is the new Intro page, which welcomes you when you open KNIME. It serves as a home base from which you can go directly to useful information (learning hub, example workflows, ...) or trigger the most common actions (create a new workflow, open recent workflows, install extensions). This page also replaces the previous pop-up window containing tips & tricks and news.
In the Data Manipulation category, I would like to highlight two new nodes: the Date/Time Shift node and the Moving Aggregation node. The first one shifts DateTime values, either fixed or taken from a data column, by a predefined amount of time, be it minutes, hours, days, or even years. The second node is an extension of the Moving Average node and performs an aggregation over a moving window. A “Cumulative computation” flag uses the entire data set as the window and therefore calculates cumulative aggregations, such as a cumulative sum.
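As a rough illustration of what these two operations compute, here is a minimal Python sketch. It rests on a few assumptions of mine rather than on the nodes' documented behavior: the window looks backward from the current row, rows without a full window yield `None`, and the cumulative mode aggregates everything from the first row up to the current one. The function names `shift` and `moving_aggregation` are hypothetical. Note also that Python's `timedelta` has no notion of months or years, so this sketch only covers shifts expressed in weeks, days, hours, minutes, or seconds.

```python
from datetime import datetime, timedelta

def shift(ts, **delta):
    """Shift a timestamp by a fixed amount, e.g. shift(ts, days=1, hours=2)."""
    return ts + timedelta(**delta)

def moving_aggregation(values, window, agg=sum, cumulative=False):
    """Apply `agg` over a backward-looking window ending at each row.

    With cumulative=True the window always starts at the first row,
    so agg=sum yields a cumulative sum (the window size is then ignored).
    """
    out = []
    for i in range(len(values)):
        if cumulative:
            out.append(agg(values[: i + 1]))
        elif i + 1 < window:
            out.append(None)  # assumption: incomplete windows give no value
        else:
            out.append(agg(values[i - window + 1 : i + 1]))
    return out

data = [1, 2, 3, 4]
moving_sum = moving_aggregation(data, window=2)
moving_max = moving_aggregation(data, window=3, agg=max)
cumulative_sum = moving_aggregation(data, window=2, cumulative=True)
shifted = shift(datetime(2014, 7, 1, 12, 0), days=1, hours=2)
print(moving_sum, moving_max, cumulative_sum, shifted)
```

Passing a different `agg` function (here `max`) is what turns a moving average style computation into a general moving aggregation.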
Finally, I would like to point out the upgrade of all generated PMML models to PMML 4.2 and the new XML to PMML node.
Overall, the changes in this release mainly aim at simplifying database operations, quantifying differences with appropriate distance measures, and jump-starting your KNIMEing with the new Intro page!