We use a sample of the airline data to detect outlier airports based on their average arrival delay. The techniques we apply are numeric outlier, z-score, DBSCAN, and isolation forest. The outliers detected by each technique are visualized on a map of the US using the KNIME OSM integration.
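As a rough illustration of the first two techniques outside KNIME, the sketch below flags outliers in a list of average delays. It assumes that "numeric outlier" means the interquartile-range rule with a multiplier of 1.5. The airport codes and delay values are made up for the example and do not come from the airline data set.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed meaning of 'numeric outlier')."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]

def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [False] * len(values)
    return [abs(v - mean) / std > threshold for v in values]

# Hypothetical average arrival delays in minutes, keyed by made-up airport codes.
delays = {"AAA": 5.0, "BBB": 7.5, "CCC": 6.0, "DDD": 8.0, "EEE": 6.5, "FFF": 55.0}
codes = list(delays)
values = list(delays.values())

iqr_flagged = [c for c, flag in zip(codes, iqr_outliers(values)) if flag]
# A lower threshold is used here because z-scores are bounded in very small samples.
z_flagged = [c for c, flag in zip(codes, zscore_outliers(values, threshold=2.0)) if flag]
```

With only six values, no point can reach a z-score of 3, which is why the usage lowers the threshold to 2.0. On a realistically sized sample the usual threshold of 3 is a more common starting point.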
3 filtering modes: manually, by type, by name. - manually: you decide which columns to keep and which to drop, using the Add and Remove buttons. - by type: you decide which columns to keep based on their type, like all Strings or all Integers. - by name: you decide which columns to keep based on their name, using wildcards or RegEx.
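The by-type and by-name modes can be sketched in plain Python as follows. This is only an analogy to the node's behavior, not its implementation. The column names, their types, and the helper function names are hypothetical.

```python
import fnmatch
import re

# Hypothetical table schema: column name -> Python type standing in for the column type.
columns = {"age": int, "workclass": str, "marital-status": str, "hours-per-week": int}

def filter_by_type(columns, keep_type):
    """Keep the columns whose type is exactly keep_type."""
    return [name for name, col_type in columns.items() if col_type is keep_type]

def filter_by_wildcard(columns, pattern):
    """Keep the columns whose name matches a wildcard pattern (* and ?)."""
    return [name for name in columns if fnmatch.fnmatchcase(name, pattern)]

def filter_by_regex(columns, pattern):
    """Keep the columns whose full name matches a regular expression."""
    return [name for name in columns if re.fullmatch(pattern, name)]

string_columns = filter_by_type(columns, str)         # all String-like columns
hyphenated = filter_by_wildcard(columns, "*-*")       # names containing a hyphen
letters_only = filter_by_regex(columns, r"[a-z]+")    # names made only of lowercase letters
```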
Not only simple filtering with the Row Filter node, but also: filtering according to more complex rules with the Nominal Value Row Filter, Rule-based Row Filter, Java Snippet Row Filter, and Reference Row Filter nodes; filtering on geographical coordinates with the Geo-coordinate Row Filter node; filtering on a time window with the Extract Time Window node; in-database row filtering with the Database Row Filter node.
On the adult.csv data set: exclude rows where marital-status is missing. On the remaining rows: a. extract rows where marital-status = "Divorced"; b. extract rows where marital-status = "Divorced" OR "Separated" using a Nominal Value Row Filter node; c. extract rows where marital-status = "Divorced" OR "Separated" using a Reference Row Filter node; d. extract rows where marital-status = "Never-married" AND 20
On the adult.csv data set: 1. remove column "marital-status"; 2. keep only column "marital-status"; 3. keep only String columns using a Column Filter node and then only column "marital-status" using a Reference Column Filter node
This workflow demonstrates how a certain table structure can be ensured with the help of the Table Validator node(s).
3 matching criteria on data columns: on Strings by full or partial pattern matching, on numbers by range, and on missing values; all of them also work on collection columns. 1 matching criterion on row numbers: from row number to row number. 1 matching criterion on RowID: full and partial pattern matching. Partial pattern matching is obtained through wildcards and RegEx. All matching criteria can be used in Include or Exclude mode. Include keeps the matching rows. Exclude discards them.
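The three column-based criteria and the Include/Exclude switch can be mimicked with a few predicates. The sketch below is a plain-Python analogy under stated assumptions, not the node's actual logic: the sample rows and function names are hypothetical, and it assumes that a missing value never matches a pattern or a range.

```python
import fnmatch

# Hypothetical sample rows; None stands for a missing value.
rows = [
    {"marital-status": "Divorced", "age": 45},
    {"marital-status": "Never-married", "age": 23},
    {"marital-status": None, "age": 37},
    {"marital-status": "Separated", "age": 52},
    {"marital-status": "Married-civ-spouse", "age": 31},
]

def matches_pattern(column, pattern):
    """String criterion: wildcard match; missing values never match (assumption)."""
    return lambda row: row[column] is not None and fnmatch.fnmatchcase(row[column], pattern)

def in_range(column, low, high):
    """Numeric criterion: low <= value <= high; missing values never match (assumption)."""
    return lambda row: row[column] is not None and low <= row[column] <= high

def is_missing(column):
    """Missing-value criterion."""
    return lambda row: row[column] is None

def row_filter(rows, predicate, include=True):
    """Include keeps the matching rows; Exclude keeps the non-matching rows."""
    return [row for row in rows if bool(predicate(row)) == include]

partial_match = row_filter(rows, matches_pattern("marital-status", "*arr*"))
not_matched = row_filter(rows, matches_pattern("marital-status", "*arr*"), include=False)
mid_age = row_filter(rows, in_range("age", 30, 50))
no_missing = row_filter(rows, is_missing("marital-status"), include=False)
```

Running the same predicate in both modes splits the table into two complementary parts, which is the point of the Include/Exclude switch.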