We are getting close to the holiday season and, like every year, we have a new holiday version of KNIME ready to go under the Christmas tree!
KNIME 2.11 was released on December 6 featuring improvements in both the open source KNIME Analytics Platform and KNIME Big Data Extension.
Always with an eye to producing a tool for data-driven innovation, changes in this new version have faithfully followed the guidelines of the “Open for Innovation” manifesto.
More Integration: Python, JSON, and Big Data
KNIME Analytics Platform now offers a new category featuring a full range of Python nodes based on a native Python installation, as opposed to “Jython”, the old Java-based Python integration. The new nodes integrate more conveniently into your Python environment and give access to a more extensive set of Python libraries, such as “scikit-learn” for machine-learning algorithms. Much like KNIME’s R integration, the nodes support interactive script development, including result preview, code templates, syntax highlighting, and auto-completion.
A second major integration step in the KNIME Analytics Platform involves a brand new category of nodes for processing JSON. JSON is a lightweight data format that is used in many service-oriented solutions like web services. Integrating those services into KNIME and parsing their results is now easier than ever.
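To give a feel for the kind of work the JSON nodes take over, here is a minimal Python sketch that parses a service response and flattens it into table-like rows. The payload, its field names, and the `to_rows` helper are invented for illustration; they do not describe any particular web service or the way the KNIME nodes are implemented.

```python
import json

# Hypothetical payload, standing in for the body of a web-service response.
response_body = """
{
  "status": "ok",
  "results": [
    {"id": 1, "name": "alpha", "tags": ["a", "b"]},
    {"id": 2, "name": "beta", "tags": []}
  ]
}
"""


def to_rows(body):
    """Parse a JSON document and flatten its "results" array into rows."""
    document = json.loads(body)
    rows = []
    for record in document.get("results", []):
        rows.append({
            "id": record["id"],
            "name": record["name"],
            # Nested arrays are reduced to a single cell value here.
            "tag_count": len(record.get("tags", [])),
        })
    return rows


rows = to_rows(response_body)
for row in rows:
    print(row)
```

Each dictionary plays the role of one table row, with nested structures reduced to plain cell values.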
On the Big Data integration side, we have a few new connector and loader nodes to connect to HP Vertica (KNIME Analytics Platform), Impala and Hadoop (KNIME Big Data Connectors).
Better Collaboration: Modular PMML
To support collaboration across analysts and teams, who may even be using different tools from different vendors, PMML model generation has been enhanced with a number of new nodes for modular PMML model generation. These new nodes efficiently add further layers of transformations and mining models to the PMML model.
Agile Development: Writer Nodes Now Even More Flexible
To get results out even faster, all writer nodes now support a wider range of output location formats, from local file system paths to remote URLs (for instance ftp://somehost/somepath/somefile) and workflow-relative URLs (e.g. knime://knime.workflow/data/file.txt). Both workflow-relative and remote URLs simplify prototyping as well as the move into production.
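The three kinds of output location can be told apart by their URL scheme. The short Python sketch below does this with the standard library. The `describe_location` helper and its category names are hypothetical and only illustrate the distinction; they say nothing about how the writer nodes resolve locations internally.

```python
from urllib.parse import urlparse


def describe_location(location):
    """Classify an output location string by its URL scheme (illustrative only)."""
    parts = urlparse(location)
    # No scheme, or a one-letter "scheme" such as a drive letter: local path.
    if len(parts.scheme) <= 1:
        return "local path"
    if parts.scheme == "knime" and parts.netloc == "knime.workflow":
        return "workflow-relative URL"
    return "remote URL"


for location in (
    "/tmp/results/file.txt",
    "ftp://somehost/somepath/somefile",
    "knime://knime.workflow/data/file.txt",
):
    print(location, "->", describe_location(location))
```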
Tool Transparency: New GUI and Database Features
Use the new magnifying glass button next to the search box above the Node Repository (or alternatively Ctrl-Space) to search, select, and quickly insert multiple nodes within the workflow editor. An added benefit: it is quick and easy to use, and thanks to its fuzzy search it even forgives typos!
The Database GroupBy node now helps users who are not familiar with the many SQL dialects out there. It retrieves all aggregation functions available for a specific database and offers them in its configuration window, including a full description panel.
Powerful: More Data Mining Features and Algorithms
As if the integration of Python, JSON, and big data, the possibility to build modular PMML models, the improvements in file location handling in writer nodes, and the quicker search and insertion of new nodes were not enough to make the KNIME Platforms even more powerful, we have also enhanced data mining functionality with a generous sprinkling of new algorithms and improved features.
- A new node implements the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. The special feature of this clustering algorithm is that, with an appropriate distance function, it is able to cluster oddly shaped groups of data.
- The Target Shuffling node randomly reassigns values inside a selected (target) column. Run many times in a loop, it gives you a good estimate of the average quality of your selected data mining algorithm on (almost) random data. You would expect your model on the real data to be significantly better; if it is not, you should reconsider your approach.
- A new kNN (k Nearest Neighbors) node now supports custom distance functions. You are no longer limited to learning in Euclidean space but can also apply the algorithm to other objects like strings (e.g. addresses), molecules, or images (via fingerprints).
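The ideas behind target shuffling and kNN with a custom distance can be sketched together in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation used by the KNIME nodes: the toy address data, the choice of edit distance as the custom metric, leave-one-out accuracy as the quality measure, and all function names are invented for the example.

```python
import random
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two strings (a non-Euclidean metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def knn_predict(train, query, k, distance):
    """Majority label among the k training items closest to the query."""
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k, distance):
    """Fraction of items predicted correctly from all the other items."""
    hits = 0
    for i, (value, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, value, k, distance) == label
    return hits / len(data)


def shuffled_baseline(data, k, distance, rounds, seed=0):
    """Average accuracy after repeatedly shuffling the target column."""
    rng = random.Random(seed)
    values = [value for value, _ in data]
    labels = [label for _, label in data]
    scores = []
    for _ in range(rounds):
        rng.shuffle(labels)  # break the link between values and targets
        shuffled = list(zip(values, labels))
        scores.append(leave_one_out_accuracy(shuffled, k, distance))
    return sum(scores) / len(scores)


# Toy data: address strings with the street as the target column.
data = [
    ("12 Main Street", "main"), ("14 Main Street", "main"),
    ("9 Main Street", "main"), ("3 Main Street", "main"),
    ("7 Oak Avenue", "oak"), ("21 Oak Avenue", "oak"),
    ("5 Oak Avenue", "oak"), ("18 Oak Avenue", "oak"),
]

real_accuracy = leave_one_out_accuracy(data, k=3, distance=edit_distance)
baseline = shuffled_baseline(data, k=3, distance=edit_distance, rounds=20)
print("accuracy on real targets:    ", real_accuracy)
print("accuracy on shuffled targets:", baseline)
```

If the accuracy on the real targets is not clearly above the shuffled baseline, the model has probably not learned anything meaningful from the data.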
With all these new nodes and features, spread across the five pillars of the open architecture, we are confident that this holiday release can boost your data analytics projects.
However, don’t just take our word for it. Download the new version and see for yourself!
Download the new KNIME Analytics Platform 2.11.
For more details about the new features and nodes in KNIME 2.11, check http://tech.knime.org/whats-new-in-knime-211