New Nodes and New Features
- GUI and Infrastructure
- Database
- Social Media
- Data Manipulation
- Data Mining / Statistics
- PMML
- Data Generation
- Textprocessing
- Image Import
Other Improvements on existing Nodes and Features
- Data Manipulation
- Data Mining
- Open Street Map Integration
- See the full list of changes in the changelog file
New Nodes and New Features
GUI and Infrastructure
Important Information about Updating KNIME for Windows Users
KNIME 2.10 uses the latest Java version (Java 7, update 60) on all supported systems: Windows, Mac, and Linux. This Java update addresses some instabilities on certain flavors of Linux and Mac.
New Intro Page
When you open KNIME, a new intro page greets you. From there you can move directly to useful information (learning hub, example workflows...) or trigger the most common actions (create a new workflow, open recent workflows, install extensions). This page also replaces the previous pop-up window containing tips & tricks and news.
Auto-save
Workflows can now rely on the new auto-save feature. Enable it under File -> Preferences -> KNIME -> KNIME GUI. If KNIME terminates unexpectedly, your workflow will be recovered from the latest auto-save file in the next KNIME session. Note that this feature is disabled by default; go to the preferences and enable it if you find it useful.
Database
Remember the Database Connector node? It used to both establish a connection to a database and select a table from that database through an SQL query. These two tasks have now been separated into dedicated nodes: a Database Connector node establishes a connection to a database, while a Database Table Selector node defines an SQL query, at minimum to select the table to work on. The new red square port represents a database connection with no selected table. Following this refinement of the node tasks, the old Database Connector node has been renamed Database Table Connector, and the Database Connection Reader / Writer nodes have been renamed Database Connection Table Reader / Writer. Some of the new connector and SQL creator nodes are described below.
Database Connector Nodes
The generic Database Connector node establishes a connection to a database through its JDBC driver; no SQL query definition is possible in this node. JDBC drivers for the most commonly used and most recent database versions are pre-loaded in KNIME and available in the node configuration window. In addition, dedicated connector nodes are available for specific databases.
Database Table Selector
This node provides an interactive editor to build the SQL query to apply to a database connection established with a Database Connector node (see item above). The main task of this node is to select the table to work on from the database.
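To relate the new node split to plain JDBC, here is a minimal sketch of the two steps: opening a connection through a JDBC driver (what the Database Connector does) and defining a table-selecting SQL query whose result is only materialized by a downstream reader such as the Database Connection Table Reader. The driver URL, credentials, and table name are placeholders, and this is not KNIME's internal code.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ConnectorSketch {
        public static void main(String[] args) throws Exception {
            // Step 1 - "Database Connector": open a connection via a JDBC driver.
            // URL, user, and password are placeholders.
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://dbhost:5432/mydb", "user", "secret");

            // Step 2 - "Database Table Selector": define the SQL query that selects
            // the table (or view) to work on; nothing is read yet at this point.
            String selectQuery = "SELECT * FROM my_table";

            // Only a downstream reader node actually executes the query and
            // materializes the rows.
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(selectQuery)) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
            conn.close();
        }
    }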
SQL Extract / Inject and Database SQL Executor
The SQL Inject node injects an SQL query contained in a flow variable into a database connection established with a Database Connector node (see item above). It works similarly to the Database Table Selector node, but takes the query from a flow variable rather than from an SQL editor. The SQL Extract node, conversely, exports the SQL query from a database port into a flow variable. The Database SQL Executor node provides an editor to create an SQL query and then executes it on a database connection produced by a Database Connector node (see item above).
Hive Loader (Cloudera certified, commercial license required)
This node, together with the Hive Connector node, belongs to the Big Data Extension, which can be installed via the KNIME Update Manager: go to File → Install KNIME Extensions... and select the appropriate extension from the KNIME Extension Store category. Note: the KNIME Big Data Extension requires a commercial license, which you can purchase via the KNIME Store. The Hive Loader node loads data into a Hive database using the File Handling extensions. First, it copies the data onto the Hive server (using SSH, FTP, or any other supported protocol — note that remote access to the HiveServer node is currently required); then a Hive command is executed to import the data into Hive. The node's output is a database connection operating on the imported data table.
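As a rough sketch of the second step, the snippet below issues a Hive LOAD DATA statement over a HiveServer2 JDBC connection, assuming the data file has already been copied to the server. Host, port, file path, and table name are placeholders, and the actual commands generated by the Hive Loader node may differ.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HiveLoadSketch {
        public static void main(String[] args) throws Exception {
            // Connect to HiveServer2 via the Hive JDBC driver (host and port are placeholders).
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://hive-server:10000/default", "user", "");
            try (Statement stmt = conn.createStatement()) {
                // Import a file that was previously copied to the Hive server
                // (e.g. via SSH or FTP) into a Hive table.
                stmt.execute("LOAD DATA LOCAL INPATH '/tmp/knime_upload.csv' "
                        + "INTO TABLE my_table");
            }
            conn.close();
        }
    }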
Database Sorter / GroupBy / Joiner
These new nodes - Database Sorter, Database GroupBy, and Database Joiner - operate on an existing database connection and SQL query. They refine the SQL query to sort, aggregate, or join data tables from one or more databases, respectively using ORDER BY, GROUP BY, and JOIN. Their configuration windows have the same structure as those of the corresponding data manipulation nodes: Sorter, GroupBy, and Joiner. Notice that the Database Joiner node allows you to join tables from two different database connections.
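Conceptually, each of these nodes wraps the incoming SQL query in a new statement. The sketch below shows one plausible shape of such rewrites; the sub-select aliases, column names, and exact SQL are illustrative assumptions, not the statements KNIME generates.

    public class QueryRefinementSketch {
        public static void main(String[] args) {
            // Query arriving at the node's input port (placeholder).
            String incoming = "SELECT * FROM sales";

            // Database Sorter: wrap the incoming query and add an ORDER BY clause.
            String sorted = "SELECT * FROM (" + incoming + ") t ORDER BY region, revenue DESC";

            // Database GroupBy: wrap the incoming query and aggregate with GROUP BY.
            String grouped = "SELECT region, SUM(revenue) AS total_revenue "
                    + "FROM (" + incoming + ") t GROUP BY region";

            // Database Joiner: join the results of two incoming queries on a key column.
            String other = "SELECT * FROM customers";
            String joined = "SELECT * FROM (" + incoming + ") a "
                    + "JOIN (" + other + ") b ON a.customer_id = b.customer_id";

            System.out.println(sorted);
            System.out.println(grouped);
            System.out.println(joined);
        }
    }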
Social Media
Google API
Under KNIME Labs, there are now three nodes dedicated to connecting to Google APIs.
Once a connection to the Google API engine has been established, you can connect specifically to Google Analytics using the other two nodes.
Note: You need a Google service account and a key file to access Google APIs, as described in https://developers.google.com/accounts/docs/OAuth2ServiceAccount.
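To illustrate the service-account flow mentioned in the note, here is a minimal sketch using the Google APIs Java client: a credential is built from a service-account e-mail address and P12 key file and then used for a simple Core Reporting query. The account, key file, view id, and application name are placeholders, and this is not the code of the KNIME nodes.

    import java.io.File;
    import java.util.Collections;

    import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
    import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
    import com.google.api.client.http.HttpTransport;
    import com.google.api.client.json.JsonFactory;
    import com.google.api.client.json.jackson2.JacksonFactory;
    import com.google.api.services.analytics.Analytics;
    import com.google.api.services.analytics.AnalyticsScopes;
    import com.google.api.services.analytics.model.GaData;

    public class GoogleAnalyticsSketch {
        public static void main(String[] args) throws Exception {
            HttpTransport transport = GoogleNetHttpTransport.newTrustedTransport();
            JsonFactory jsonFactory = JacksonFactory.getDefaultInstance();

            // Service-account credential built from the account e-mail and P12 key file
            // (both are placeholders).
            GoogleCredential credential = new GoogleCredential.Builder()
                    .setTransport(transport)
                    .setJsonFactory(jsonFactory)
                    .setServiceAccountId("my-account@developer.gserviceaccount.com")
                    .setServiceAccountPrivateKeyFromP12File(new File("key.p12"))
                    .setServiceAccountScopes(
                            Collections.singleton(AnalyticsScopes.ANALYTICS_READONLY))
                    .build();

            Analytics analytics = new Analytics.Builder(transport, jsonFactory, credential)
                    .setApplicationName("knime-example")
                    .build();

            // Query sessions for the last week from a hypothetical view (profile) id.
            GaData data = analytics.data().ga()
                    .get("ga:12345678", "7daysAgo", "today", "ga:sessions")
                    .execute();
            System.out.println(data.getTotalsForAllResults());
        }
    }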
Twitter API
Again under KNIME Labs, there are now a few nodes to connect to the Twitter API.
Once a connection with the Twitter API has been established, you can use the remaining nodes to query it.
Notes: Twitter's search service and, by extension, the Twitter Search API are not meant to be an exhaustive source of tweets; not all tweets are indexed or made available via the Twitter Search interface/API (see http://apivoice.com/2012/07/12/the-twitter-firehose/). To access the Twitter API you need a Twitter API key and an access token, as described in https://dev.twitter.com/docs/faq#7447.
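For orientation, a minimal search against the Twitter API could look like the sketch below, written with the Twitter4J library (an assumption; the text does not state what the KNIME nodes use internally). The API key and access token values are placeholders.

    import twitter4j.Query;
    import twitter4j.QueryResult;
    import twitter4j.Status;
    import twitter4j.Twitter;
    import twitter4j.TwitterFactory;
    import twitter4j.conf.ConfigurationBuilder;

    public class TwitterSearchSketch {
        public static void main(String[] args) throws Exception {
            // API key and access token are placeholders (see the FAQ linked above).
            ConfigurationBuilder cb = new ConfigurationBuilder()
                    .setOAuthConsumerKey("API_KEY")
                    .setOAuthConsumerSecret("API_SECRET")
                    .setOAuthAccessToken("ACCESS_TOKEN")
                    .setOAuthAccessTokenSecret("ACCESS_TOKEN_SECRET");
            Twitter twitter = new TwitterFactory(cb.build()).getInstance();

            // Run a search; as noted above, the results are not an exhaustive sample.
            QueryResult result = twitter.search(new Query("knime"));
            for (Status status : result.getTweets()) {
                System.out.println(status.getUser().getScreenName() + ": " + status.getText());
            }
        }
    }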
Data Manipulation
7 new Distance Nodes
The same distance functions are available in all nodes requiring a distance measure. Implemented distance measures are described in the KNIME wiki.
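The individual distance nodes and measures are not listed in this text; as an illustration, two of the most common measures, Euclidean and Manhattan distance, can be computed as in the sketch below.

    public class DistanceSketch {

        // Euclidean distance: square root of the sum of squared coordinate differences.
        static double euclidean(double[] a, double[] b) {
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            return Math.sqrt(sum);
        }

        // Manhattan distance: sum of absolute coordinate differences.
        static double manhattan(double[] a, double[] b) {
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                sum += Math.abs(a[i] - b[i]);
            }
            return sum;
        }

        public static void main(String[] args) {
            double[] p = {1.0, 2.0, 3.0};
            double[] q = {4.0, 6.0, 3.0};
            System.out.println(euclidean(p, q)); // 5.0
            System.out.println(manhattan(p, q)); // 7.0
        }
    }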
Moving Aggregation
This new node calculates a number of aggregation values for a moving window. The aggregation values are displayed in new columns appended at the end of the table. This node also calculates a cumulative sum from the beginning to the end of the data set when the "Cumulative computation" flag is enabled.
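To make the two computations concrete, the sketch below derives a moving mean over a window of three rows and a cumulative sum for a small numeric column; the handling of the first, incomplete windows and the set of available aggregation methods are simplifications here.

    import java.util.Arrays;

    public class MovingAggregationSketch {
        public static void main(String[] args) {
            double[] values = {2, 4, 6, 8, 10, 12};
            int window = 3;

            // Moving mean over a window of 3 values; the first window-1 rows
            // have no complete window yet and are reported as NaN here.
            double[] movingMean = new double[values.length];
            Arrays.fill(movingMean, Double.NaN);
            for (int i = window - 1; i < values.length; i++) {
                double sum = 0;
                for (int j = i - window + 1; j <= i; j++) {
                    sum += values[j];
                }
                movingMean[i] = sum / window;
            }

            // Cumulative sum from the first to the current row.
            double[] cumulativeSum = new double[values.length];
            double running = 0;
            for (int i = 0; i < values.length; i++) {
                running += values[i];
                cumulativeSum[i] = running;
            }

            System.out.println(Arrays.toString(movingMean));    // [NaN, NaN, 4.0, 6.0, 8.0, 10.0]
            System.out.println(Arrays.toString(cumulativeSum)); // [2.0, 6.0, 12.0, 20.0, 30.0, 42.0]
        }
    }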
Date/Time Shift
This new node shifts a DateTime value backwards (negative shift) or forwards (positive shift) by a number of years, months, weeks, days, hours, minutes, seconds, or milliseconds. The shift can be applied to the values in a DateTime column, to a fixed DateTime value, or to the current time.
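The shift semantics are easy to picture with Java's date/time API; the sketch below uses java.time purely for illustration (the node itself operates on KNIME DateTime cells, and the sample values are arbitrary).

    import java.time.LocalDateTime;

    public class DateTimeShiftSketch {
        public static void main(String[] args) {
            LocalDateTime value = LocalDateTime.of(2014, 7, 1, 12, 0);

            // Positive shift: forwards by 2 weeks.
            LocalDateTime forward = value.plusWeeks(2);    // 2014-07-15T12:00

            // Negative shift: backwards by 3 months.
            LocalDateTime backward = value.minusMonths(3); // 2014-04-01T12:00

            // Shift relative to the current time.
            LocalDateTime soon = LocalDateTime.now().plusHours(6);

            System.out.println(forward);
            System.out.println(backward);
            System.out.println(soon);
        }
    }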
Time Series Missing Values
The new Time Series Missing Value node handles missing values in time series:
And for numerical columns only:
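The individual handling strategies are not enumerated in this text. As an example of a strategy that applies to numerical columns, linear interpolation between the nearest known neighbours could look like the sketch below; this is an illustration, not necessarily what the node implements.

    public class LinearInterpolationSketch {

        // Replace missing entries (encoded as Double.NaN) by linearly interpolating
        // between the nearest non-missing neighbours.
        static void interpolate(double[] series) {
            int lastKnown = -1; // index of the most recent non-missing value
            for (int i = 0; i < series.length; i++) {
                if (Double.isNaN(series[i])) {
                    continue;
                }
                if (lastKnown >= 0 && lastKnown < i - 1) {
                    // Fill the gap between lastKnown and i by linear interpolation.
                    double step = (series[i] - series[lastKnown]) / (i - lastKnown);
                    for (int j = lastKnown + 1; j < i; j++) {
                        series[j] = series[lastKnown] + step * (j - lastKnown);
                    }
                }
                lastKnown = i;
            }
            // Values before the first or after the last known entry stay missing.
        }

        public static void main(String[] args) {
            double[] series = {1.0, Double.NaN, Double.NaN, 4.0, 5.0};
            interpolate(series);
            System.out.println(java.util.Arrays.toString(series)); // [1.0, 2.0, 3.0, 4.0, 5.0]
        }
    }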
Data Mining/Statistics
New View in Statistics Node
The Statistics node has a new view including skewness, kurtosis, and a histogram.
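Skewness and kurtosis describe the asymmetry and the tail weight of a column's value distribution. The sketch below computes both with Apache Commons Math's DescriptiveStatistics, purely to illustrate the quantities shown in the new view; the sample values are arbitrary.

    import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;

    public class MomentsSketch {
        public static void main(String[] args) {
            double[] column = {1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 9.0};

            DescriptiveStatistics stats = new DescriptiveStatistics();
            for (double v : column) {
                stats.addValue(v);
            }

            System.out.println("mean     = " + stats.getMean());
            System.out.println("skewness = " + stats.getSkewness()); // asymmetry of the distribution
            System.out.println("kurtosis = " + stats.getKurtosis()); // heaviness of the tails
        }
    }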
Time Series Analysis
New metanodes for Time Series Analysis:
PMML
PMML 4.2 and JPMML 1.1.3
PMML and JPMML support has been upgraded to the latest versions, PMML 4.2 and JPMML 1.1.3. All learning nodes supporting PMML now support PMML 4.2. New addition: the Naive Bayes Learner node now supports PMML 4.2 as well.
XML to PMML
This new node transforms a column of XML values into a column of PMML values.
Data Generation
Random Number Assigner (Apache)
This new Random Number Assigner is based on the random number generation of the Apache Commons Math library. It provides 10 random distributions, from the simple Uniform distribution to Cauchy, Chi-Square, and many more. Due to its simple configuration panel it is especially helpful for generating data in a loop.
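The underlying library can also be used directly; the sketch below draws from a few of the distributions with Apache Commons Math's RandomDataGenerator (the distribution parameters and seed are arbitrary, and the calls shown are only a sample of what the node's configuration panel exposes).

    import org.apache.commons.math3.random.RandomDataGenerator;

    public class RandomNumberSketch {
        public static void main(String[] args) {
            RandomDataGenerator rng = new RandomDataGenerator();
            rng.reSeed(42); // fixed seed for reproducible output

            for (int i = 0; i < 5; i++) {
                double uniform = rng.nextUniform(0.0, 1.0);   // Uniform on [0, 1)
                double gaussian = rng.nextGaussian(0.0, 1.0); // Gaussian with mean 0, sd 1
                double cauchy = rng.nextCauchy(0.0, 1.0);     // Cauchy with median 0, scale 1
                System.out.printf("%.3f  %.3f  %.3f%n", uniform, gaussian, cauchy);
            }
        }
    }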
Random Boolean Assigner
This new node generates a column containing Boolean values. Values are randomly assigned to each row either using a fixed number of TRUE values or following a pre-defined probability.
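Both assignment modes are easy to picture. The sketch below fills a ten-row column either with a fixed number of TRUE values shuffled over the rows or by drawing each row independently with a pre-defined probability; the row count, TRUE count, and probability are arbitrary.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    public class RandomBooleanSketch {
        public static void main(String[] args) {
            int rows = 10;
            Random random = new Random(42);

            // Mode 1: a fixed number of TRUE values (4), shuffled over the rows.
            List<Boolean> fixedCount = new ArrayList<>();
            for (int i = 0; i < rows; i++) {
                fixedCount.add(i < 4);
            }
            Collections.shuffle(fixedCount, random);

            // Mode 2: each row is TRUE with a pre-defined probability (0.3).
            List<Boolean> byProbability = new ArrayList<>();
            for (int i = 0; i < rows; i++) {
                byProbability.add(random.nextDouble() < 0.3);
            }

            System.out.println(fixedCount);
            System.out.println(byProbability);
        }
    }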
Textprocessing
Sentiment Tagsets
A new tag set for sentiment tagging is available, with the following tags: VERY_POSITIVE, POSITIVE, NEUTRAL, NEGATIVE, VERY_NEGATIVE, UNDERSTATEMENT, EXAGGERATION, IRONY, AMPLIFICATION, NEGATION, ABBREVIATION, UNKNOWN. This tag set can be used by tagger nodes, e.g. the Dictionary Tagger or Wildcard Tagger, to assign sentiment tags to terms. Besides positive and negative tags, other tags like irony, amplification, or negation can be assigned as well.
Topic Extractor (Parallel LDA)
The Topic Extractor (Parallel LDA) node extracts topics from a list of documents and assigns these topics to the documents. Each topic is represented by a set of terms; the number of topics to extract can be specified in the dialog. The output of the node consists of three data tables: the topics as sets of terms, the list of documents with their assigned topic ids, and a log-likelihood score of the topic model for each iteration, which reflects the convergence of the training. The node is a simple parallel threaded implementation of LDA, following Newman, Asuncion, Smyth and Welling, "Distributed Algorithms for Topic Models", JMLR (2009).
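One widely used open-source implementation of this parallel sampling scheme is MALLET's ParallelTopicModel; the text does not state whether the KNIME node builds on it, but the sketch below shows how such a model can be trained in plain Java. The tokenization pattern, the two toy documents, and all parameter values are illustrative assumptions.

    import java.util.ArrayList;
    import java.util.regex.Pattern;

    import cc.mallet.pipe.CharSequence2TokenSequence;
    import cc.mallet.pipe.Pipe;
    import cc.mallet.pipe.SerialPipes;
    import cc.mallet.pipe.TokenSequence2FeatureSequence;
    import cc.mallet.pipe.iterator.StringArrayIterator;
    import cc.mallet.topics.ParallelTopicModel;
    import cc.mallet.types.InstanceList;

    public class LdaSketch {
        public static void main(String[] args) throws Exception {
            // Tokenize the raw documents and map tokens to feature indices.
            ArrayList<Pipe> pipes = new ArrayList<>();
            pipes.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}+")));
            pipes.add(new TokenSequence2FeatureSequence());

            InstanceList instances = new InstanceList(new SerialPipes(pipes));
            instances.addThruPipe(new StringArrayIterator(new String[] {
                    "knime workflows process data tables",
                    "topic models describe documents by topics"
            }));

            // Train an LDA model with 2 topics, 2 threads, and a small iteration count.
            ParallelTopicModel model = new ParallelTopicModel(2, 1.0, 0.01);
            model.addInstances(instances);
            model.setNumThreads(2);
            model.setNumIterations(100);
            model.estimate();

            // Print the top words per topic (the node's first output table
            // contains comparable information).
            Object[][] topWords = model.getTopWords(5);
            for (int topic = 0; topic < topWords.length; topic++) {
                System.out.println("topic " + topic + ": " + java.util.Arrays.toString(topWords[topic]));
            }
        }
    }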
Image Import
Read Images
This new node reads image files (SVG or PNG) using the URLs specified in an input data column.
String to SVG
This new node converts String values, including XML, from an input data column into SVG images.
Improvements on existing Nodes and Features
Data Manipulation
Data Mining
Open Street Map Integration
Many more small improvements have been made under the hood - please refer to the changelog file.