The Data Explorer Node

Before diving even deeper into the analysis, there is already a lot you can learn from your data by simply exploring the statistical properties of the different input columns.

To get more familiar with data visualization in KNIME, we start with the Data Explorer (JavaScript) node. Its view is suited to a univariate visual exploration of the input data table.

By interacting with the Data Explorer view, the user can gain insight into the columns' statistical properties and apply domain expertise to remove irrelevant columns. The output data table of the Data Explorer node contains only the columns that remain after the selection made in the node view.
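To make the idea of a univariate summary more concrete, here is a minimal Python sketch of the kind of per-column statistics such an exploration shows, followed by the column exclusion that determines the output table. It is an illustration under stated assumptions, not how the Data Explorer node is implemented: the table is a hypothetical dict mapping column names to values, `None` stands for a missing cell, and the statistics computed (missing count, min, max, mean, sample standard deviation) are just a few common univariate measures.

```python
import statistics
from typing import Optional

# Hypothetical in-memory table: column name -> cell values, None = missing.
Table = dict[str, list[Optional[float]]]


def profile_numeric(values: list[Optional[float]]) -> dict[str, float]:
    """Summarize one numeric column on its own, ignoring missing cells.

    Assumes at least two non-missing values (stdev needs two).
    """
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.fmean(present),
        "std_dev": statistics.stdev(present),  # sample standard deviation
    }


def exclude_columns(table: Table, excluded: set[str]) -> Table:
    """Return a new table without the excluded columns."""
    return {name: col for name, col in table.items() if name not in excluded}


if __name__ == "__main__":
    table: Table = {
        "temp_max": [31.0, 33.0, None, 35.0],
        "temp_min": [20.0, 21.0, 22.0, 23.0],
        "humidity": [None, 60.0, None, None],
    }
    print(profile_numeric(table["temp_max"]))
    # Suppose the user judges "humidity" too sparse and excludes it.
    print(list(exclude_columns(table, {"humidity"})))  # ['temp_max', 'temp_min']
```

The two functions mirror the two roles described above: inspecting each column in isolation, then keeping only the columns the user did not exclude.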

 

The workflow used in this video is available on the public KNIME EXAMPLES server under
03_Visualization/02_JavaScript/11_Univariate_Visual_Exploration_with_Data_Explorer

The Data Explorer (JavaScript) node is part of the KNIME JavaScript Views (Labs) extension.

To install a KNIME extension, follow the instructions in this video: https://youtu.be/8HMx3mjJXiw

The KNIME workflow in this video uses data from WeatherUnderground.com for the Austin KATT station; this data is released under GPLv2.

Data source: https://www.wunderground.com/history/airport/KATT/
Data license: https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

Exercise

  • Read the adult.csv dataset.
  • Inspect the properties of the data with the Data Explorer (JavaScript) node.
  • How many different education levels are represented in the data?
  • In the interactive view, exclude the columns containing missing values.
  • Which of the columns contain missing values? How many missing values does each contain?

Solution

In the data, there are 16 unique values in the “education” column. The “workclass” column contains 1,836 missing values, the “occupation” column 1,843, and the “native-country” column 583. Exclude these columns by selecting them in the interactive view and then clicking “Apply” and “Close”. Finally, check that the output table no longer contains these columns.
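As an optional cross-check outside KNIME, the counting behind this solution can be sketched with the Python standard library. This is not part of the KNIME workflow, and several things here are assumptions: `SAMPLE` is a hypothetical four-row excerpt in the shape of adult.csv (not the real data), `explore` is an illustrative helper, and treating empty cells and `?` as missing is a guess about how the file marks missing values. You could pass the contents of the full file instead of `SAMPLE`; whether the counts match the ones above depends on how that file actually encodes missing values.

```python
import csv
import io

# Assumption: these cell contents (after stripping spaces) mark a missing
# value. Adjust if your copy of adult.csv encodes missing values differently.
MISSING_MARKERS = {"", "?"}

# Hypothetical excerpt in the shape of adult.csv (not the real data).
SAMPLE = """age,workclass,education,native-country
39,State-gov,Bachelors,United-States
50,?,HS-grad,United-States
38,Private,HS-grad,
53,,Masters,Cuba
"""


def explore(csv_text: str) -> tuple[dict[str, int], dict[str, int]]:
    """Return (distinct non-missing values, missing cells) for each column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    rows = list(reader)

    def cell(row: dict, col: str) -> str:
        # Short rows yield None for absent trailing fields; treat as empty.
        return (row.get(col) or "").strip()

    distinct = {
        c: len({cell(r, c) for r in rows if cell(r, c) not in MISSING_MARKERS})
        for c in columns
    }
    missing = {
        c: sum(1 for r in rows if cell(r, c) in MISSING_MARKERS) for c in columns
    }
    return distinct, missing


if __name__ == "__main__":
    distinct, missing = explore(SAMPLE)
    print("education levels:", distinct["education"])  # 3 for SAMPLE
    # Columns that the last exercise step would exclude, with their counts.
    print({c: n for c, n in missing.items() if n > 0})
```

Counting distinct values and missing cells in one pass over the parsed rows keeps the sketch short; for a large file you would stream rows instead of materializing them in a list.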