Spark MLlib Decision Tree

This workflow demonstrates how to use the Spark MLlib Decision Tree Learner and the Spark Predictor. It also shows how to convert categorical columns into numerical columns, which is necessary because the MLlib algorithms support only numerical features and labels.
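
To make the categorical-to-numerical conversion concrete, here is a minimal sketch in plain Python (standard library only, not the Spark or KNIME API). It assumes one common scheme in which each distinct category receives an integer index, with the most frequent category getting index 0. The names `build_index` and `encode_column`, and the sample `colors` column, are hypothetical and exist only for illustration.

```python
from collections import Counter


def build_index(values):
    """Map each distinct category to an integer index, most frequent first.

    Assumption: ties in frequency are broken alphabetically so the
    mapping is deterministic.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {category: idx for idx, category in enumerate(ordered)}


def encode_column(values, index):
    """Replace every category with its numeric index as a float."""
    return [float(index[v]) for v in values]


# Hypothetical categorical column.
colors = ["red", "green", "red", "blue", "red", "green"]
color_index = build_index(colors)
encoded = encode_column(colors, color_index)
```

The mapping has to be kept alongside the model so that the numeric predictions can be translated back into the original category labels.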

Modularized Spark Scripting

This workflow demonstrates the usage of the different Spark Java Snippet nodes to read a text file from HDFS, parse it, filter it and write the result back to HDFS.
You might also want to have a look at the snippet templates that each of the nodes provides. To do so, simply open the configuration dialog of a Spark Java Snippet node and go to the Templates tab.
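
The read, parse, filter, and write steps can be sketched as separate small functions, mirroring the one-step-per-node structure of the workflow. This is a plain Python illustration under stated assumptions, not the Spark Java Snippet API: it uses local files instead of HDFS, assumes a hypothetical `name,value` line format, and filters on a hypothetical threshold of 10. All function names are made up for this example.

```python
import os
import tempfile


def read_lines(path):
    """Read a text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def parse_line(line):
    """Parse one 'name,value' line (assumed format) into a (str, int) tuple."""
    name, value = line.split(",")
    return name.strip(), int(value)


def keep_record(record, threshold=10):
    """Keep records whose value is at least the (hypothetical) threshold."""
    return record[1] >= threshold


def write_records(path, records):
    """Write the records back out, one 'name,value' line per record."""
    with open(path, "w", encoding="utf-8") as handle:
        for name, value in records:
            handle.write(f"{name},{value}\n")


def run_pipeline(in_path, out_path):
    """Chain the four steps: read, parse, filter, write."""
    parsed = [parse_line(line) for line in read_lines(in_path) if line]
    kept = [record for record in parsed if keep_record(record)]
    write_records(out_path, kept)
    return kept
```

Keeping each step in its own function makes it possible to test or swap out a single stage, for example a different filter, without touching the rest of the pipeline.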

SparkSQL meets HiveQL

This workflow builds a line plot of the age distribution for men and women in Maine (US) over the last 5 years. The data for women is processed via Hive SQL and the data for men via Spark SQL. Will they blend? The whole data set is initially read from a Hadoop Hive installation. And yes, Spark SQL and Hive SQL do blend!
