Spark Executor

SparkSQL meets HiveQL

This workflow builds a line plot of the age distribution of men and women in Maine (US) over the last 5 years. The women's data is processed via Hive SQL and the men's data via Spark SQL. Will they blend? The whole data set is initially read from a Hadoop Hive installation. And yes, Spark SQL and Hive SQL do blend!

Parameter Optimization in Spark

This workflow mixes standard KNIME nodes with Spark nodes to find the optimal parameters for a k-means clustering using a hill-climbing approach.
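
As a rough illustration of the hill-climbing idea, the sketch below searches over a single integer parameter such as k. It is not the workflow's implementation: `hill_climb` and `toy_score` are hypothetical names, and `toy_score` is only a stand-in for a clustering quality measure that a real setup would compute from a k-means run.

```python
def hill_climb(score, start, lower, upper, step=1):
    """Greedy hill climbing over one integer parameter.

    Moves to the better-scoring neighbour until no neighbour improves the score.
    """
    current = start
    current_score = score(current)
    while True:
        # Candidate moves: one step down and one step up, kept inside the bounds.
        neighbours = [n for n in (current - step, current + step) if lower <= n <= upper]
        if not neighbours:
            return current, current_score
        best = max(neighbours, key=score)
        best_score = score(best)
        if best_score <= current_score:
            # No neighbour is strictly better: a (local) optimum has been reached.
            return current, current_score
        current, current_score = best, best_score


def toy_score(k):
    """Hypothetical quality score that peaks at k = 4 (assumption for the demo)."""
    return -(k - 4) ** 2


best_k, best_score = hill_climb(toy_score, start=2, lower=2, upper=10)
print(best_k, best_score)
```

Hill climbing only finds a local optimum, so the result can depend on the starting value when the score has several peaks.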

Recommendation Engine w Spark Collaborative Filtering

1. Create a local Spark context.
2. Read ratings.csv and movies.csv from the MovieLens dataset into Spark (https://grouplens.org/datasets/movielens/).
3. Ask the user to rate 20 random movies to build a user profile, and include it in the training set.
4. Train the Spark Collaborative Filtering Learner (Alternating Least Squares algorithm): https://www.infofarm.be/articles/alternating-least-squares-algorithm-re…
5. …
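
To give a feel for what alternating least squares does in step 4, here is a deliberately small sketch in plain Python. It assumes a single latent factor per user and per movie (rank 1) and a plain L2 penalty, which is a simplification of what Spark's learner does; the function name `als_rank1` and the toy ratings are made up for illustration.

```python
def als_rank1(ratings, iterations=20, reg=0.01):
    """Rank-1 alternating least squares on a dict {(user, item): rating}."""
    users = {u for u, _ in ratings}
    items = {i for _, i in ratings}
    user_f = {u: 1.0 for u in users}
    item_f = {i: 1.0 for i in items}
    for _ in range(iterations):
        # Fix the item factors and solve each user's regularised least-squares problem.
        for u in users:
            rated = [(item_f[i], r) for (uu, i), r in ratings.items() if uu == u]
            user_f[u] = sum(v * r for v, r in rated) / (reg + sum(v * v for v, _ in rated))
        # Fix the user factors and solve for each item in the same way.
        for i in items:
            rated = [(user_f[u], r) for (u, ii), r in ratings.items() if ii == i]
            item_f[i] = sum(w * r for w, r in rated) / (reg + sum(w * w for w, _ in rated))
    return user_f, item_f


# Toy ratings (assumption): bob has not rated movie m3 yet.
ratings = {
    ("ann", "m1"): 1.0, ("ann", "m2"): 2.0, ("ann", "m3"): 2.0,
    ("bob", "m1"): 2.0, ("bob", "m2"): 4.0,
    ("cy", "m1"): 2.5, ("cy", "m2"): 5.0, ("cy", "m3"): 5.0,
}

user_f, item_f = als_rank1(ratings)
# Predicted rating for the missing (bob, m3) pair is the product of the two factors.
prediction = user_f["bob"] * item_f["m3"]
print(round(prediction, 2))
```

Each half-step is an ordinary least-squares solve with the other side held fixed, which is why the method alternates and why the per-user and per-item updates can run in parallel.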

Spark MLlib Decision Tree

This workflow demonstrates the usage of the Spark MLlib Decision Tree Learner and Spark Predictor. It also demonstrates the conversion of categorical columns into numerical columns, which is necessary because the MLlib algorithms support only numerical features and labels.
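
The conversion of categorical columns can be pictured as building an index per column. The sketch below is a plain-Python illustration with made-up data; it assumes indices are assigned in order of first appearance, which may differ from how the actual nodes assign them, and `index_categories` is a hypothetical helper.

```python
def index_categories(values):
    """Map each distinct category to an integer index, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


# Toy rows (assumption): (color, size, label), with color and label categorical.
rows = [
    ("red", 1.5, "yes"),
    ("blue", 0.7, "no"),
    ("red", 2.1, "no"),
    ("green", 1.0, "yes"),
]

color_index = index_categories(r[0] for r in rows)
label_index = index_categories(r[2] for r in rows)

# Purely numerical features and labels, as a numeric-only learner would require.
features = [[float(color_index[color]), size] for color, size, _ in rows]
labels = [label_index[label] for _, _, label in rows]
print(features, labels)
```

Keeping the mappings around matters: the same index must be applied to new data at prediction time, and the label mapping is needed to translate predictions back into category names.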

Hive to Spark to Hive

This workflow demonstrates the usage of the Hive to Spark and Spark to Hive nodes that allow you to transfer data between Apache Spark and Apache Hive.

PMML to Spark Comprehensive Mode Learning Mass Prediction

This workflow demonstrates the usage of the Spark Compiled Model Predictor node which converts a given PMML model into machine code and uses the compiled model to predict vast amounts of data in parallel within Apache Spark.

Big Data Irish Meter on Spark only

This workflow uses a portion of the Irish Energy Meter dataset, and presents a simple analysis based on the whitepaper "Big Data, Smart Energy, and Predictive Analytics". It is intended to highlight KNIME's Big Data and Spark functionality in the 3.6 release. The workflow creates a Local Big Data Environment, loads the meter dataset to Hive, and then transfers it into Spark. It uses a series of Spark SQL nodes to create datetime fields, and then uses Spark nodes to aggregate energy usage over these datetime fields.
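
The last two steps, deriving datetime fields and aggregating usage over them, can be sketched in plain Python as follows. The readings, the timestamp format, and the helper name `usage_by` are assumptions for illustration; the workflow itself performs these steps with Spark SQL and Spark nodes.

```python
from collections import defaultdict
from datetime import datetime

# Toy readings (assumption): (meter id, timestamp, energy used in kWh).
readings = [
    ("1001", "2010-01-04 08:30:00", 0.42),
    ("1001", "2010-01-04 09:00:00", 0.58),
    ("1002", "2010-01-04 08:30:00", 0.25),
    ("1001", "2010-01-05 08:30:00", 0.50),
]


def usage_by(readings, key):
    """Sum energy usage grouped by a field derived from each reading's timestamp."""
    totals = defaultdict(float)
    for _meter_id, timestamp, kwh in readings:
        moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        totals[key(moment)] += kwh
    return dict(totals)


by_hour = usage_by(readings, lambda m: m.hour)
by_weekday = usage_by(readings, lambda m: m.weekday())  # Monday is 0
print(by_hour, by_weekday)
```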

Mass Learning Event Prediction MLlib to PMML

This workflow demonstrates the usage of the Spark MLlib to PMML node. Together with the Compiled Model Predictor and the JSON Input/Output nodes, it can be used to model a so-called lambda architecture, which learns a machine learning model at scale on historical data offline and predicts events online using the learned model.

Learning Association Rule for Next Restaurant Prediction

In this workflow we demonstrate how to use the KNIME Spark nodes to give locality recommendations. For this we use the Yelp reviews provided by the Kaggle challenge (https://www.kaggle.com/yelp-dataset/yelp-dataset). We want to find good next localities (e.g., restaurants) for everyone who has given only one review to date.
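
The idea behind rule-based recommendations can be sketched in a few lines of plain Python. This is not the workflow's implementation: it only considers rules between pairs of places, the place names and visit sets are made up, and `pair_rules` and `recommend` are hypothetical helpers.

```python
from collections import Counter
from itertools import permutations

# Toy data (assumption): the set of places each reviewer has reviewed.
baskets = [
    {"Cafe A", "Diner B"},
    {"Cafe A", "Diner B", "Grill C"},
    {"Cafe A", "Grill C"},
    {"Diner B", "Grill C"},
    {"Cafe A", "Diner B"},
]


def pair_rules(baskets, min_confidence=0.6):
    """Return {(antecedent, consequent): confidence} for single-item rules."""
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in permutations(sorted(basket), 2)
    )
    rules = {}
    for (antecedent, consequent), count in pair_counts.items():
        # Confidence: share of reviewers of the antecedent who also reviewed the consequent.
        confidence = count / item_counts[antecedent]
        if confidence >= min_confidence:
            rules[(antecedent, consequent)] = confidence
    return rules


def recommend(rules, visited):
    """Pick the consequent with the highest confidence for a single visited place."""
    candidates = {c: conf for (a, c), conf in rules.items() if a == visited}
    return max(candidates, key=candidates.get) if candidates else None


rules = pair_rules(baskets)
print(recommend(rules, "Cafe A"))
```

For a reviewer with exactly one review, the single reviewed place serves as the antecedent, so a lookup over single-item rules is enough to produce a next suggestion.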

Modularized Spark Scripting

This workflow demonstrates the usage of the different Spark Java Snippet nodes to read a text file from HDFS, parse it, filter it, and write the result back to HDFS.
You might also want to have a look at the snippet templates that each of the nodes provides. To do so, open the configuration dialog of a Spark Java Snippet node and go to the Templates tab.
