Authors: Andisa Dewi and Tobias Koetter
Today we show how to explore and visualize a large dataset using the KNIME Big Data Extensions, and how to make the whole process interactive via the KNIME WebPortal. The data we use is the hugely popular NYC taxi dataset.
The idea of this workflow is to explore the taxi dataset step by step. We start with a general overview of the entire dataset. In the next step, we filter directly in the interactive view, e.g. by selecting the specific years we want information on or by choosing a particular taxi type, and so zoom in on the subset of data we are most interested in. The selected subset is then visualized. The last step shows visualizations of the trips of a certain taxi type in a specific NYC borough during certain years. All visualizations are accessible via the KNIME WebPortal, and the computation is done on a Hadoop cluster using the KNIME Big Data Extensions.
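The workflow itself is built from KNIME nodes rather than written as code, but the drill-down logic described above (overview, then selection, then visualization of the subset) can be sketched in a few lines of plain Python. This is only an illustration: the sample records, the field names `year`, `taxi_type` and `borough`, and the three helper functions are hypothetical and are not taken from the workflow or from the real dataset schema.

```python
from collections import Counter

# Hypothetical sample records, made up for illustration only;
# the real NYC taxi dataset is far larger and has a different schema.
trips = [
    {"year": 2014, "taxi_type": "yellow", "borough": "Manhattan"},
    {"year": 2014, "taxi_type": "green", "borough": "Brooklyn"},
    {"year": 2015, "taxi_type": "yellow", "borough": "Manhattan"},
    {"year": 2015, "taxi_type": "yellow", "borough": "Queens"},
    {"year": 2016, "taxi_type": "green", "borough": "Brooklyn"},
]


def overview(records):
    """Step 1: count trips per (year, taxi type) over the whole dataset."""
    return Counter((r["year"], r["taxi_type"]) for r in records)


def select_subset(records, years, taxi_type):
    """Step 2: keep only the trips that match the user's selection."""
    return [
        r for r in records
        if r["year"] in years and r["taxi_type"] == taxi_type
    ]


def trips_per_borough(records):
    """Step 3: aggregate the selected subset by borough, ready for plotting."""
    return Counter(r["borough"] for r in records)


# Simulate a user picking the years 2014 and 2015 and the yellow taxi type.
subset = select_subset(trips, years={2014, 2015}, taxi_type="yellow")
print(overview(trips))
print(trips_per_borough(subset))
```

In the actual workflow the filtering and aggregation run on the Hadoop cluster and only the small aggregated results are sent to the WebPortal views; the in-memory lists here merely stand in for that.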