We built a workflow to train a model. It works fast enough on our local, maybe not so powerful, machine. So far.
The data set is growing. Each month a considerable number of new records is added, and each month the training workflow becomes slower. Should we start thinking about scalability? Should we consider big data platforms? Could our neat and elegant KNIME workflow be replicated on a big data platform? Indeed it can.
The KNIME Big Data Extensions offers nodes to build and configure workflows that run on the big data platform of your choice. Its most convenient feature is the node GUI: the configuration window of each Big Data node has been built to be as similar as possible to the configuration window of the corresponding KNIME node. For example, the configuration window of a Spark Joiner node looks just like that of a Joiner node.
Thus, replicating your original workflow on a Big Data Platform is not only possible but also extremely easy, since you do not need to learn new scripting languages or tool-specific instructions. The KNIME Big Data Extensions brings the ease of use of KNIME to the scalability of Big Data.
This video shows how we replicated an existing classical analytics workflow on a Big Data Platform.
The workflows used in the video can be found on the KNIME EXAMPLES server under 50_Applications/28_Predicting_Departure_Delays/02_Scaling_Analytics_w_BigData.knwf*
* The link opens the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform must be installed with the Installer, version 3.2.0 or higher).