Classic Aggregations with GroupBy node

A classic aggregation operation consists of two steps: identifying data groups and calculating the aggregation method on the selected groups.

Groups are identified based on the values in one or more selected columns.

Possible aggregation methods can be many: from more or less complex statistical measures to range; from value listing to value counting; from first to last value in a group; and more.

Check the following videos to know more about grouping and aggregation methods.

 

 

 

Reference workflows are on the EXAMPLES server under:
02_ETL_Data_Manipulation/02_Aggregations/06_Basic_Examples_for_Using_the GroupBy_node02_ETL_Data_Manipulation/02_Aggregations/06_Basic_Examples_for_Using_the GroupBy_node*
and
02_ETL_Data_Manipulation/02_Aggregations/07_Advanced_Usage_of_the GroupBy_node02_ETL_Data_Manipulation/02_Aggregations/07_Advanced_Usage_of_the GroupBy_node*

Exercise

Read adult.csv data set. Then:

  • calculate total number of rows and average age for all Female with income >50K per year
  • on each one of the 4 groups defined by sex and income values, calculate the average of all numerical columns
  • on full input table count:
    • rows with missing values in column “occupation”
    • all rows in column “occupation”
    • rows with no missing value in column “occupation”
    • all rows in another column (i.e. marital-status). Notice that this number should be the same as the number for all rows in column “occupation”.

 

Solution
  • Rows with Female earning > 50K /year are 1179 with average age 42.126 years old

     


    (click on the image to see it in full size)
  • On full input table:
    • 1843 number of rows with missing value in column “occupation”
    • 32561 total number of rows in column “occupation” (or any other column)
    • 30718 number of rows with no missing value in column “occupation”
    • 32561 total number of rows in column “marital-status” (or any other column)

A possible solution can be found inside the workflow on EXAMPLES Server:
02_ETL_Data_Manipulation/02_Aggregations/08_More_GroupBy_Examples02_ETL_Data_Manipulation/02_Aggregations/08_More_GroupBy_Examples*

 

 


* The link will open the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform must be installed with the Installer version 3.2.0 or higher)