Exploratory Data Analysis with KNIME: Univariate and Bivariate Visual Exploration

Mon, 11/12/2018 - 10:00 paolotamag

Author: Paolo Tamagnini

The first step in a data science project is usually data exploration, where we try to understand individual attributes and how they relate to each other. Exploratory analysis can be of two kinds: univariate and multivariate. Here we limit the multivariate exploration to the bivariate case.

The univariate case considers data columns individually, while the bivariate case takes into account one pair of columns at a time.


Interactive univariate visual exploration

In univariate exploratory data analysis it is common practice to read through simple statistics of the dataset one column at a time. Inspecting descriptive statistics, such as the mean and variance of numerical columns or the frequencies of unique values in nominal columns, is often the simplest way to investigate the values within an attribute.
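As a minimal sketch of the statistics described above, the Python snippet below computes the mean and sample variance of a numerical column and the value frequencies of a nominal column. The `describe_column` helper, the type check that decides whether a column is numerical, and the treatment of `None` as a missing value are illustrative assumptions, not a description of how any KNIME node is implemented.

```python
from collections import Counter
from statistics import mean, variance


def describe_column(values):
    """Return simple univariate statistics for one column.

    Numerical columns get mean and sample variance; any other column is
    treated as nominal and summarized by the frequency of each unique value.
    Missing values (None) are skipped and counted separately.
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    is_numeric = present and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        return {
            "type": "numerical",
            "mean": mean(present),
            # Sample variance needs at least two values.
            "variance": variance(present) if len(present) > 1 else 0.0,
            "missing": missing,
        }
    return {
        "type": "nominal",
        # Most frequent values first.
        "frequencies": dict(Counter(present).most_common()),
        "missing": missing,
    }


# A tiny hypothetical dataset, one list per column.
table = {
    "age": [23, 35, 31, None, 47],
    "color": ["red", "blue", "red", "green", None],
}

for name, column in table.items():
    print(name, describe_column(column))
```

Reading the output one column at a time mirrors the column-by-column inspection the paragraph describes.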

KNIME Analytics Platform has a dedicated node for a preliminary, generic visual exploration of the data at hand: the Data Explorer node. It provides an interactive view for the univariate exploration of numerical and nominal features.
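Beyond summary numbers, a histogram is a common way to look at the distribution of a single numerical column. The sketch below bins a column into equal-width intervals between its minimum and maximum; the `equal_width_histogram` name and the default of five bins are hypothetical choices for illustration, and the binning strategy is not necessarily the one any KNIME view uses.

```python
def equal_width_histogram(values, n_bins=5):
    """Count values into n_bins equal-width bins between min and max.

    Expects at least one non-missing value. Returns a list of
    (lower_edge, upper_edge, count) tuples. The last bin is closed on the
    right so that the maximum value is counted.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    # Fall back to a width of 1.0 for constant columns to avoid dividing by zero.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in present:
        # Clamp to the last bin so that v == hi does not overflow.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


for lower, upper, count in equal_width_histogram([1, 2, 2, 3, 9, 10], n_bins=3):
    print(f"[{lower:.1f}, {upper:.1f}): {'#' * count}")
```

The text bars give a rough picture of skew and gaps in the column, which is the kind of pattern an interactive histogram makes visible at a glance.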

Have a look at this video about interactive univariate data exploration using the Data Explorer node.

 

Interactive bivariate visual exploration with a scatter plot

In bivariate exploratory data analysis we look for interesting relationships between two columns at a time.

One of the most intuitive ways to find interesting relationships among columns is to visualize the data pair by pair. Several charts and techniques can do this; the most popular is the scatter plot, which plots the values of one column against those of another.
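A scatter plot shows the shape of a relationship, while a single number such as the Pearson correlation coefficient can help decide which pairs deserve a closer visual look. The sketch below, an illustrative addition rather than something the KNIME nodes are claimed to do, visits every pair of numerical columns exactly once (n(n-1)/2 pairs for n columns) and computes r for each. Keep in mind that r only captures linear relationships, which is one reason to still inspect the plots themselves.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-empty numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when a column is constant
    return cov / (sx * sy)


def pairwise_correlations(table):
    """Visit every unordered pair of columns once, as in a pair-by-pair review."""
    return {(a, b): pearson(table[a], table[b]) for a, b in combinations(table, 2)}


# Hypothetical numerical columns.
table = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],  # perfectly linear in x
    "z": [5, 3, 4, 1, 2],
}

for pair, r in pairwise_correlations(table).items():
    print(pair, round(r, 3))
```

Pairs with a large absolute r are natural first candidates for a scatter plot, but pairs with r near zero can still hide non-linear patterns.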

In KNIME Analytics Platform you can use the Scatter Plot (JavaScript) node to interactively visualize the relationship between any two columns of a dataset.

You can also display the values of a third column by encoding them as colors with the Color Manager node.
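To make the idea of encoding a third column as color concrete, here is a small sketch that assigns one color per distinct nominal value and attaches it to each (x, y) point. The hex palette, the `assign_colors` and `colored_points` helpers, and the choice to reuse colors when categories outnumber the palette are all hypothetical; this only illustrates the concept and does not reproduce the Color Manager node's behavior.

```python
from itertools import cycle

# Hypothetical palette chosen for illustration only.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def assign_colors(nominal_values, palette=PALETTE):
    """Map each distinct nominal value to a color, in order of first appearance.

    Colors repeat if there are more distinct values than palette entries.
    """
    mapping = {}
    colors = cycle(palette)
    for v in nominal_values:
        if v not in mapping:
            mapping[v] = next(colors)
    return mapping


def colored_points(xs, ys, third):
    """Return (x, y, color) triples: two columns on the axes, the third as color."""
    mapping = assign_colors(third)
    return [(x, y, mapping[c]) for x, y, c in zip(xs, ys, third)]


points = colored_points([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], ["a", "b", "a"])
print(points)
```

Points that share a category share a color, so clusters belonging to the same class stand out in the scatter plot. A numerical third column would instead call for a continuous color gradient.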

Check out this video about interactive bivariate visual exploration with a scatter plot.