KNIME Spark Executor

KNIME® Spark Executor is a set of nodes for creating and executing Apache Spark applications from within the familiar KNIME Analytics Platform. Visual programming allows code-free big-data science, while scripting nodes allow detailed control when desired. The KNIME Spark Executor library of nodes offers:

 
  • Data and tool blending.
  • Importing, exporting, and accessing data with Hive, HDFS, KNIME Analytics Platform, or KNIME Server.
  • Predictive analytics and scoring on Spark using PMML models built with KNIME workflows.
  • Embedding of Java Spark applications into KNIME workflows.
  • Mixing & matching of local and Hadoop execution within the same workflow.

Spark Node Categories

This library includes nodes to perform the following functions on Spark:

  • I/O
  • Manipulation
  • Dimensionality Reduction
  • Machine Learning (Extensive List)
  • Statistics
  • Scoring


MLlib Integration

Integrate Spark’s scalable machine learning library into your workflows to perform:

  • Classification
  • Regression
  • Clustering
  • Collaborative Filtering
  • Dimensionality Reduction

Functionality

KNIME Spark Executor provides a variety of new KNIME nodes that allow you to create and execute Spark applications without any programming. The new nodes offer seamless, easy-to-use data mining, scoring, statistics, data manipulation and data import/export on Spark from within KNIME Analytics Platform:

  • Integration with Spark MLlib brings complex statistics and powerful machine learning in Spark directly into KNIME Analytics Platform (or KNIME Server), with nodes for the most popular algorithms for
    • classification (decision tree, naïve Bayes, …)
    • regression (logistic regression, linear regression, …)
    • clustering (k-means)
    • collaborative filtering (ALS)
    • dimensionality reduction (SVD, PCA)
  • Use PMML models built with KNIME Analytics Platform (or KNIME Server) for prediction in Spark.
  • Preprocess and manipulate data in Spark.
  • Import, export, and access data in Hive, HDFS, and KNIME Analytics Platform (or KNIME Server) from within Spark.
  • Embed existing Java Spark applications into your KNIME workflow.

Usage example: K-Means clustering on Spark with data from Hive

The Hive to Spark node imports the results of a Hive query into a Spark DataFrame, keeping the column schema information. A Spark DataFrame is a dataset that is stored in a distributed fashion on your Hadoop cluster. In this example, the Spark Partitioning node first splits the DataFrame into training and test data. The training set flows into the Spark k-Means node, which trains a clustering model (using Spark's MLlib) on the data and hands it to the Spark Cluster Assigner node. That node uses the model to label the previously unseen test data. Finally, the Spark to Hive node stores the labeled data back into a Hive table, and the Spark to Table node imports the labeled test data into KNIME Analytics Platform.
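For readers who prefer to see the data flow as code, the following is a minimal plain-Python sketch of the same three stages: partitioning, k-means training, and cluster assignment. It does not use Spark, MLlib, or any KNIME API; every function name, the sample data, and the parameter values are hypothetical and chosen only for illustration. The in-memory list stands in for the result of a Hive query.

```python
import random


def partition(rows, train_fraction, seed):
    """Randomly split rows into (training, test), like a partitioning step."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def train_kmeans(rows, k, iterations=20, seed=0):
    """Lloyd's algorithm; returns k cluster centers, which act as the model."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for row in rows:
            groups[nearest(row, centers)].append(row)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(group) for col in zip(*group)) if group else centers[i]
            for i, group in enumerate(groups)
        ]
    return centers


def assign_clusters(rows, centers):
    """Label each row with the index of its nearest center."""
    return [(row, nearest(row, centers)) for row in rows]


# Hypothetical stand-in for a Hive query result: two well-separated groups of points.
rows = [(float(x), float(y)) for x in range(5) for y in range(5)]
rows += [(float(x), float(y)) for x in range(100, 105) for y in range(100, 105)]

train, test = partition(rows, train_fraction=0.7, seed=42)  # partitioning stage
model = train_kmeans(train, k=2)                            # training stage
labeled = assign_clusters(test, model)                      # assignment stage
```

In the KNIME workflow each of these stages is a node running on the cluster, and the labeled rows would then be written back to Hive or pulled into KNIME Analytics Platform rather than kept in a local list.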

See the KNIME Spark Executor extension in action!

We highly recommend watching this video to get a feel for what you can do with KNIME Spark Executor.

 

Compatibility

KNIME Spark Executor supports the following Hadoop distributions:

  • Hortonworks HDP 2.2 with Spark 1.2
  • Hortonworks HDP 2.3.0 with Spark 1.3
  • Hortonworks HDP 2.3.4 with Spark 1.5
  • Hortonworks HDP 2.4.x with Spark 1.6
  • Hortonworks HDP 2.5.x with Spark 1.6 and Spark 2.0
  • Cloudera CDH 5.3 with Spark 1.2
  • Cloudera CDH 5.4 with Spark 1.3
  • Cloudera CDH 5.5 with Spark 1.5
  • Cloudera CDH 5.6 with Spark 1.5
  • Cloudera CDH 5.7 with Spark 1.6 and Spark 2.0
  • Cloudera CDH 5.8 with Spark 1.6 and Spark 2.0
  • Cloudera CDH 5.9 with Spark 1.6 and Spark 2.0
  • Cloudera CDH 5.10 with Spark 1.6 and Spark 2.0
  • Cloudera CDH 5.11 with Spark 1.6 and Spark 2.0

 

Installation steps

The same three steps apply to each of the following release pairs:

  • KNIME Analytics Platform 3.4 and KNIME Server 4.5
  • KNIME Analytics Platform 3.3 and KNIME Server 4.4
  • KNIME Analytics Platform 3.2 and KNIME Server 4.3
  • KNIME Analytics Platform 3.1 and KNIME Server 4.2
  • KNIME Analytics Platform 2.12 and KNIME Server 4.1

Step 1: Obtain License
KNIME Spark Executor requires a license, which you can purchase via the KNIME Store. To give it a try, please request a free 30-day trial license.

Step 2: Install Software
Install (i) the client-side extension for KNIME Analytics Platform and (ii) the server-side Spark Jobserver, following the installation guide for your release.

Step 3: Download License
Finally, to use the installed extension, download your previously obtained license into KNIME Analytics Platform as described here.

Availability

KNIME Spark Executor is available in the KNIME Store. You can also request a free 30-day trial license here. For more information, see the KNIME Big Data Extensions product sheet or contact us at bigdata@knime.com.