SparkSQL meets HiveQL

This workflow builds a line plot of the age distribution for men and women in Maine (US) over the last 5 years. In particular, the data for women is processed with Hive SQL and the data for men with Spark SQL. Will they blend? The whole data set is initially read from a Hadoop Hive installation... and yes, Spark SQL and Hive SQL do blend!
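To make the "blend" concrete: each branch runs its own SQL query over the same table, and the two results are joined on age before plotting. The plain-Python sketch below uses the standard-library `sqlite3` module as a stand-in for both Hive and Spark SQL. The table name `population`, its columns and its rows are hypothetical, not the workflow's actual schema or data.

```python
import sqlite3

# Hypothetical table and rows; the real workflow reads its data from Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (year INTEGER, sex TEXT, age INTEGER, people INTEGER)")
conn.executemany(
    "INSERT INTO population VALUES (?, ?, ?, ?)",
    [
        (2014, "F", 30, 120), (2015, "F", 30, 130), (2015, "F", 40, 90),
        (2014, "M", 30, 110), (2015, "M", 30, 125), (2015, "M", 40, 95),
    ],
)

# Branch 1 (stand-in for the Hive SQL side): age distribution for women.
women = dict(conn.execute(
    "SELECT age, SUM(people) FROM population WHERE sex = 'F' GROUP BY age"
).fetchall())

# Branch 2 (stand-in for the Spark SQL side): age distribution for men.
men = dict(conn.execute(
    "SELECT age, SUM(people) FROM population WHERE sex = 'M' GROUP BY age"
).fetchall())

# Blend: join both distributions on age, giving one row per age for a two-series line plot.
blended = [(age, women.get(age, 0), men.get(age, 0)) for age in sorted(set(women) | set(men))]
for age, w, m in blended:
    print(age, w, m)
```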

Recommendation Engine w Spark Collaborative Filtering

1. Create a local Spark Context; 2. Read ratings.csv and movies.csv from the MovieLens dataset into Spark (; 3. Ask the user to rate 20 random movies to build a user profile and include it in the training set; 4. Train the Spark Collaborative Filtering Learner (Alternating Least Squares) algorithm…; 5.
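Alternating Least Squares factorizes the sparse user-by-movie rating matrix into user and movie factor vectors: it fixes one side, solves a small regularized least-squares problem for the other, and then swaps. The sketch below is a plain-Python illustration of that idea, not the Spark learner's API. The ratings, user and movie IDs, rank `k` and regularization `lam` are all hypothetical, and Spark's implementation differs in details such as how it scales the regularization term. The user profile from step 3 would simply be one more entry in `ratings` before training.

```python
import random

def solve(a, b):
    """Solve a x = b for a small dense system (Gauss-Jordan elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def half_step(fixed, observed, k, lam):
    """For every row, solve (F^T F + lam*I) x = F^T r using only that row's observed ratings."""
    solved = {}
    for key, obs in observed.items():
        a = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for other, r in obs.items():
            f = fixed[other]
            for i in range(k):
                b[i] += f[i] * r
                for j in range(k):
                    a[i][j] += f[i] * f[j]
        solved[key] = solve(a, b)
    return solved

# Hypothetical ratings: user -> {movie: rating}.
ratings = {
    "u1": {"m1": 5, "m2": 4, "m4": 1},
    "u2": {"m1": 4, "m3": 1, "m4": 2},
    "u3": {"m2": 1, "m3": 5, "m5": 4},
    "u4": {"m1": 1, "m3": 4, "m5": 5},
}
# The same ratings indexed by movie, needed for the movie half-step.
by_movie = {}
for user, obs in ratings.items():
    for movie, r in obs.items():
        by_movie.setdefault(movie, {})[user] = r

k, lam = 2, 0.1  # hypothetical rank and regularization strength
rng = random.Random(0)
users = {u: [rng.uniform(0.1, 1.0) for _ in range(k)] for u in ratings}
movies = {m: [rng.uniform(0.1, 1.0) for _ in range(k)] for m in by_movie}

def predict(user, movie):
    """Predicted rating = dot product of the user and movie factor vectors."""
    return sum(x * y for x, y in zip(users[user], movies[movie]))

def rmse():
    """Root-mean-square error over the observed ratings."""
    errs = [(r - predict(u, m)) ** 2 for u, obs in ratings.items() for m, r in obs.items()]
    return (sum(errs) / len(errs)) ** 0.5

before = rmse()
for _ in range(20):
    users = half_step(movies, ratings, k, lam)   # fix movie factors, solve for users
    movies = half_step(users, by_movie, k, lam)  # fix user factors, solve for movies
after = rmse()
print(f"RMSE {before:.2f} -> {after:.2f}; u1 on unseen m3: {predict('u1', 'm3'):.2f}")
```

The last line shows the point of the exercise: once trained, the factors give a score for movies a user has not rated yet, which is what the recommendation step ranks.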

Spark MLlib Decision Tree

This workflow demonstrates the usage of the Spark MLlib Decision Tree Learner and Spark Predictor. It also demonstrates the conversion of categorical columns into numerical columns which is necessary since the MLlib algorithms only support numerical features and labels.
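The categorical-to-numerical conversion amounts to giving each distinct category an index. This sketch mirrors the descending-frequency ordering that Spark ML's `StringIndexer` uses by default; whether the conversion node in this workflow orders categories the same way is an assumption, and the `outlook`/`play` columns are hypothetical.

```python
from collections import Counter

def index_categories(values):
    """Map each distinct category to a float index, most frequent first; ties sorted alphabetically."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {v: float(i) for i, v in enumerate(ordered)}

# Hypothetical categorical feature column and label column.
outlook = ["sunny", "rain", "sunny", "overcast", "rain", "sunny"]
play = ["no", "yes", "no", "yes", "yes", "yes"]

outlook_index = index_categories(outlook)
play_index = index_categories(play)
features = [[outlook_index[v]] for v in outlook]  # numeric feature vectors
labels = [play_index[v] for v in play]            # numeric labels
print(outlook_index, play_index)
```

Keeping the mappings around lets predicted indices be translated back into the original label values after prediction.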

Big Data Irish Meter on Spark only

This workflow uses a portion of the Irish Energy Meter dataset and presents a simple analysis based on the whitepaper "Big Data, Smart Energy, and Predictive Analytics". It is intended to highlight KNIME's Big Data and Spark functionality in the 3.6 release. The workflow creates a Local Big Data Environment, loads the meter dataset into Hive, and then transfers it into Spark. It uses a series of Spark SQL nodes to create datetime fields, and then uses Spark nodes to aggregate energy usage over these datetime fields.
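The two-stage pattern (derive datetime fields, then aggregate over them) can be sketched in plain Python. The readings below are hypothetical, and timestamps are assumed to be ISO-8601 strings, which is not necessarily how the Irish meter dataset encodes time; in the workflow the same two stages run as Spark SQL and Spark aggregation nodes.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical readings: (meter_id, ISO-8601 timestamp, kWh).
readings = [
    ("1001", "2009-07-14T08:00:00", 0.5),
    ("1001", "2009-07-14T08:30:00", 0.7),
    ("1002", "2009-07-14T08:00:00", 1.0),
    ("1001", "2009-07-15T19:00:00", 1.2),
    ("1002", "2009-07-18T19:30:00", 0.8),
]

usage_by_hour = defaultdict(float)     # hour of day (0-23) -> total kWh
usage_by_weekday = defaultdict(float)  # weekday (0 = Monday) -> total kWh
for meter_id, ts, kwh in readings:
    dt = datetime.fromisoformat(ts)        # stage 1: derive datetime fields
    usage_by_hour[dt.hour] += kwh          # stage 2: aggregate per hour of day
    usage_by_weekday[dt.weekday()] += kwh  # ...and per day of week

print(dict(usage_by_hour))
print(dict(usage_by_weekday))
```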

Modularized Spark Scripting

This workflow demonstrates how to use the different Spark Java Snippet nodes to read a text file from HDFS, parse it, filter it, and write the result back to HDFS.
You might also want to look at the snippet templates that each node provides. To do so, open the configuration dialog of a Spark Java Snippet node and go to the Templates tab.
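A plain-Python sketch of the same read, parse, filter, write chain, with each stage as its own function to echo the one-snippet-node-per-step structure. A local temporary file stands in for HDFS, and the `name,age` line format and the `age >= 18` filter are hypothetical; this is not the Spark Java Snippet API.

```python
import os
import tempfile

def read_lines(path):
    """Stage 1: read the raw text file line by line."""
    with open(path) as src:
        return src.read().splitlines()

def parse(lines):
    """Stage 2: turn 'name,age' lines into (name, age) tuples, skipping malformed lines."""
    records = []
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) == 2 and parts[1].isdigit():
            records.append((parts[0], int(parts[1])))
    return records

def keep_adults(records):
    """Stage 3: filter with a hypothetical predicate (age >= 18)."""
    return [(name, age) for name, age in records if age >= 18]

def write_lines(path, records):
    """Stage 4: write the result back as 'name,age' lines."""
    with open(path, "w") as out:
        out.writelines(f"{name},{age}\n" for name, age in records)

# A local temp directory stands in for HDFS in this sketch.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "input.txt")
out_path = os.path.join(workdir, "output.txt")
with open(in_path, "w") as f:
    f.write("alice,34\nbob,17\n# comment\ncarol,52\n")

write_lines(out_path, keep_adults(parse(read_lines(in_path))))
```

Because each stage only takes and returns plain data, any one of them can be swapped out without touching the others, which is the idea behind splitting the job across separate snippet nodes.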
