Basic Row Filter

One of the first tasks in data analytics is to extract only some of the records available in the original data set. For example, it is a common operation to extract data only for a given month or a given country for sale reports; to remove outliers in survey data; to get rid of missing records in sensor derived time series; etc.

The following videos show how to implement a number of simple row filter rules using pattern matching, numerical range comparison, and missing value identification as filtering criteria.

 

 

 

 

Reference workflow is on the EXAMPLES server under:
02_ETL_Data_Manipulation/01_Filtering/03_Row_Filtering02_ETL_Data_Manipulation/01_Filtering/03_Row_Filtering*

Exercise

Read adult.csv data set. Then:

  • exclude rows where marital-status is missing.
  • On the remaining rows:
    • extract rows where marital-status = "Divorced"
    • extract rows where marital-status = "Never-married" AND 20 <= age <= 40 AND workclass starts with "S"

 

Solution
  • Resulting table, excluding rows where marital-status is missing, contains 32561 rows
  • On the remaining rows:
    • Table with only rows where marital-status = “Divorced” has 4443 rows
    • Rows where marital-status = "Never-married" AND 20 <= age <= 40 AND workclass starts with "S" are 674.

A possible solution can be found inside the workflow on EXAMPLES Server:
02_ETL_Data_Manipulation/01_Filtering/05_More_Row_Filter_Examples02_ETL_Data_Manipulation/01_Filtering/05_More_Row_Filter_Examples*

 

 


* The link will open the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform must be installed with the Installer version 3.2.0 or higher)