The most intuitive way to find interesting relationships among columns is to visualize the data, pair by pair. Many charts and visualization techniques are available for this purpose; the most popular one is the scatter plot.
The workflow used in this video is available on the public KNIME EXAMPLES server under
- Read the wine.csv dataset.
- Assign colors to the values in the “color” column with the Color Manager node.
- Draw a scatter plot of the “alcohol” column vs. the “density” column.
- Do you observe any particular relationship between these two columns?
- Switch to the “Select” mouse mode in the interactive view and select the outlier point(s) in the plot, that is, the data point(s) most distant from the main cloud of points.
Are the selected data points red wines or white wines?
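For readers who want to cross-check the manual selection outside KNIME, the outlier step can be approximated in a few lines of Python. This is only a sketch under assumptions: it treats wine.csv as comma-separated with a header row containing the “alcohol”, “density” and “color” columns, and it defines "most distant from the main cloud" as the largest distance from the centroid after standardizing both axes. The function names `load_rows` and `most_distant_point` are hypothetical helpers, not part of KNIME.

```python
import csv
import math
from statistics import mean, pstdev


def load_rows(path):
    """Read a CSV file with a header row into a list of dicts (assumed layout)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def most_distant_point(rows, x_col="alcohol", y_col="density"):
    """Return the row farthest from the centroid of the (x, y) cloud.

    Both axes are standardized first, because alcohol varies by whole
    units while density varies in the third decimal place.
    """
    xs = [float(r[x_col]) for r in rows]
    ys = [float(r[y_col]) for r in rows]
    mx, my = mean(xs), mean(ys)
    # Fall back to 1.0 if a column is constant, to avoid dividing by zero.
    sx = pstdev(xs) or 1.0
    sy = pstdev(ys) or 1.0

    def distance(i):
        return math.hypot((xs[i] - mx) / sx, (ys[i] - my) / sy)

    return rows[max(range(len(rows)), key=distance)]
```

With the dataset at hand, `most_distant_point(load_rows("wine.csv"))["color"]` would report the color of the flagged point. A centroid distance is a simple proxy for what the eye does in the scatter plot; it can disagree with a visual judgment when the cloud is strongly elongated.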
The following relationships can be observed:
- The higher the percentage of alcohol, the lower the density of the wine.
- White wines tend to have a lower density than red wines.
- There is one remarkable outlier with a high density value. It is a white wine.
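The first two observations can also be checked numerically. The sketch below is one possible approach, not part of the KNIME workflow: a Pearson correlation coefficient close to -1 between “alcohol” and “density” would support the inverse relationship, and a per-color mean of “density” would support the red vs. white comparison. The helper names `pearson` and `mean_by_group` are hypothetical, and the rows are assumed to be dicts keyed by column name with numeric values stored as strings.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


def mean_by_group(rows, value_col="density", group_col="color"):
    """Average value_col separately for each distinct value of group_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(float(row[value_col]))
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}
```

Because a single extreme point can distort both the correlation and the group means, it is worth computing them with and without the outlier before drawing conclusions.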