Scatter Plot

An intuitive way to find interesting relationships among columns is to visualize the data pair by pair. Many charts and visualization techniques are available for this; the most popular is the scatter plot.

In KNIME Analytics Platform you can use the Scatter Plot (JavaScript) node to interactively visualize the relationship between two columns in a dataset. Furthermore, you can display values from a third column by assigning colors to them with the Color Manager node.

The workflow used in this video is available on the public KNIME EXAMPLES server under
03_Visualization/02_JavaScript/12_Bivariate_Visual_Exploration_with_Scatter_Plot

Exercise

  • Read the wine.csv dataset.
  • Assign colors to the values in the “color” column with the Color Manager node.
  • Draw a scatter plot of the “alcohol” column against the “density” column.
  • Do you observe any particular relationship between these two columns?
  • Switch to mouse mode “Select” in the interactive view and select the outlier point(s) in the plot, that is, the data point(s) farthest from the main cloud of points.
  • Are the selected data points red wines or white wines?

Solution

The following relationships can be observed:

  • The higher the percentage of alcohol, the lower the density of the wine.
  • White wines tend to have a lower density than red wines.
  • There is one remarkable outlier with a high density value. It is a white wine.
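
The same kinds of observations can also be checked numerically outside KNIME. The following is a minimal Python sketch, not part of the KNIME workflow. The rows in SAMPLE_CSV are made up for illustration and are not taken from wine.csv; only the column names “alcohol”, “density”, and “color” come from the exercise. The function names are hypothetical, and flagging the outlier as the point farthest from the centroid after z-scoring both columns is an assumption of this sketch, chosen as a rough stand-in for selecting the most distant point by eye.

```python
import csv
import io
from math import hypot
from statistics import mean, pstdev

# Made-up rows for illustration only; these values are NOT taken from wine.csv.
SAMPLE_CSV = """alcohol,density,color
9.0,0.9980,red
9.5,0.9975,red
10.5,0.9965,red
12.0,0.9950,red
9.2,0.9960,white
10.0,0.9945,white
11.5,0.9920,white
12.8,0.9900,white
11.0,1.0400,white
"""


def load_rows(text):
    """Parse CSV text into dicts with numeric alcohol and density values."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "alcohol": float(rec["alcohol"]),
            "density": float(rec["density"]),
            "color": rec["color"],
        })
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def farthest_from_center(rows):
    """Return the row farthest from the centroid after z-scoring both columns."""
    xs = [r["alcohol"] for r in rows]
    ys = [r["density"] for r in rows]
    mx, my, sx, sy = mean(xs), mean(ys), pstdev(xs), pstdev(ys)
    return max(
        rows,
        key=lambda r: hypot((r["alcohol"] - mx) / sx, (r["density"] - my) / sy),
    )


rows = load_rows(SAMPLE_CSV)
outlier = farthest_from_center(rows)

# Exclude the outlier so it does not distort the remaining statistics.
rest = [r for r in rows if r is not outlier]
r_value = pearson([r["alcohol"] for r in rest], [r["density"] for r in rest])
mean_density = {
    c: mean(r["density"] for r in rest if r["color"] == c)
    for c in ("red", "white")
}

print("outlier:", outlier["color"], outlier["density"])
print("correlation (alcohol vs. density):", round(r_value, 2))
print("mean density by color:", mean_density)
```

To try this on the real data, replace SAMPLE_CSV with the contents of wine.csv, assuming the file uses these three column names. A negative correlation corresponds to the downward trend in the scatter plot, and the color of the returned outlier row answers the last exercise question.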