KNIME Big Data Connectors

KNIME Big Data Connectors allow easy access to Apache Hadoop data from within KNIME Analytics Platform and KNIME Server. This extension offers a set of KNIME nodes for accessing Hadoop/HDFS via Hive or Impala and ships with all required libraries.

Nodes Included: HDFS Connection, webHDFS Connection, HttpFS Connection, HDFS File Permission, Hive Connector, Hive Loader, Impala Connector, Impala Loader

  • Move data between KNIME Analytics Platform or KNIME Server and Hive/Impala
  • Write Hive/Impala SQL queries using the standard KNIME Database Query node
  • Process SQL queries directly in Hive and Impala using standard KNIME database nodes

Compatibility

KNIME Big Data Connectors are certified by Cloudera for CDH 5.x, by Hortonworks for HDP 2.1 and 2.2, and by MapR for MapR 4.1. They have also been tested against Hadoop 2.4.0 and Hive 0.13.

Usage example: Querying data in Hive

The Hive Connector node establishes a connection to a Hive database. Once executed, the node returns a database connection that can be used with almost any of KNIME Analytics Platform's standard database nodes.
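How the Hive Connector node builds its connection internally is not described here. As a rough illustration only, the sketch below shows the standard HiveServer2 JDBC URL shape and how a downstream filter can be expressed as a nested query so that it runs inside Hive rather than locally. The names `HiveConnectionSettings` and `push_down`, and the example host, are hypothetical and are not part of KNIME.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HiveConnectionSettings:
    """Hypothetical holder for the settings a Hive connection needs."""
    host: str
    port: int = 10000           # HiveServer2's usual default port
    database: str = "default"

    def jdbc_url(self) -> str:
        # Standard HiveServer2 JDBC URL form: jdbc:hive2://<host>:<port>/<database>
        return f"jdbc:hive2://{self.host}:{self.port}/{self.database}"


def push_down(base_query: str, where: str) -> str:
    """Wrap an existing query so that the filter is evaluated by Hive.

    Hive requires an alias for a subquery in the FROM clause, hence "t".
    No escaping is done; this is an illustration, not production code.
    """
    return f"SELECT * FROM ({base_query}) t WHERE {where}"


if __name__ == "__main__":
    settings = HiveConnectionSettings(host="hive.example.com")  # assumed host
    print(settings.jdbc_url())
    print(push_down("SELECT * FROM sales", "amount > 100"))
```

The nesting step mirrors the idea of processing queries directly in the database: each further operation refines the SQL, and only the final result needs to leave Hive.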

Usage example: Loading data into Hive

Hive itself does not allow existing data to be modified; only inserting and appending data are supported. Since data import does not follow standard SQL/JDBC procedures, KNIME Big Data Connectors provide a special Hive Loader node. The node makes use of the File Handling extensions and first copies the data onto the Hive server, using HDFS, webHDFS, HttpFS, SSH, FTP or any other supported protocol. A Hive command is then executed to import the data into Hive. The node's output is a database connection operating on the imported table. The Hive Loader node lets you specify the directory in the remote file system into which the data are uploaded, the table's name, whether an existing table should be dropped or data appended to it, and the columns by which the data should be partitioned.
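The exact commands the Hive Loader node issues are not given here. As a hedged sketch of the general copy-then-import pattern, the function below assembles plain HiveQL statements (`DROP TABLE`, `CREATE TABLE ... PARTITIONED BY`, `LOAD DATA INPATH`) for a file that is assumed to be already uploaded to the remote file system. The function name, its parameters, the comma-delimited row format and the `STRING` type for partition columns are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def hive_load_statements(
    table: str,
    remote_path: str,
    columns: Dict[str, str],
    partition: Optional[Dict[str, str]] = None,
    drop_existing: bool = False,
) -> List[str]:
    """Return HiveQL statements that import an uploaded file into a table.

    columns:   column name -> Hive type, e.g. {"id": "INT"}
    partition: optional partition column -> static value (typed STRING here)
    Identifiers and values are not escaped; illustration only.
    """
    statements: List[str] = []
    if drop_existing:
        statements.append(f"DROP TABLE IF EXISTS {table}")

    column_sql = ", ".join(f"{name} {hive_type}" for name, hive_type in columns.items())
    create = f"CREATE TABLE IF NOT EXISTS {table} ({column_sql})"
    if partition:
        part_cols = ", ".join(f"{name} STRING" for name in partition)
        create += f" PARTITIONED BY ({part_cols})"
    create += " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"  # assumed CSV upload
    statements.append(create)

    # Without OVERWRITE, LOAD DATA appends to whatever the table already holds.
    load = f"LOAD DATA INPATH '{remote_path}' INTO TABLE {table}"
    if partition:
        spec = ", ".join(f"{name}='{value}'" for name, value in partition.items())
        load += f" PARTITION ({spec})"
    statements.append(load)
    return statements


if __name__ == "__main__":
    for stmt in hive_load_statements(
        "sales", "/tmp/knime/sales.csv",
        {"id": "INT", "amount": "DOUBLE"},
        partition={"year": "2015"},
        drop_existing=True,
    ):
        print(stmt + ";")
```

Leaving `drop_existing` at `False` corresponds to the append case: the table is created only if it is missing, and the loaded file is added to the existing data.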