Pivoting

The pivoting function requires one or more grouping columns to define the rows, and one or more pivoting columns to define the columns of the pivot table.

The rows and columns define unique sub-groups of the data. These sub-groups can then be summarized by aggregated measures. The possible aggregations range from listing and counting values, to calculations on date & time, and to statistical measures.

The videos below introduce the pivoting function together with basic and advanced pivoting examples using the Pivoting node.

 

 

 

Reference workflows are on the EXAMPLES server under:

02_ETL_Data_Manipulation/02_Aggregations/09_Examples_for_Using_the_Pivoting_Node

Exercise

Read the adult.csv dataset.

1. Using the “age-bin” column as the grouping column and “workclass” column as the pivoting column, calculate the number of people in groups according to their work class and age bin.

     1a. What is the most common combination of age bin and work class?

     1b. How many people belong to this group?

2. Add “income” column to the grouping columns, and take a look at the income class of the people in the biggest group you calculated in the previous step.

     2a. Do most of them earn more or less than 50K per year?

3. Remove “income” from the grouping columns again, and change the aggregation method:

     3a. What is the most widespread education level according to age bin and work class?

     3b. In which work classes is the most widespread education level of the youngest bin higher than the education of the oldest bin?

Solution

1 a & b: The biggest group (10936) are the people who are younger than 35 years old and work in the private sector.

2 a. The majority (9842) of these people earn less than 50K.

3 a & b: The youngest age bin has a higher most widespread education level than the oldest age bin in the work classes “local-gov”, “state-gov” and “federal-gov”.

You can download the solution workflow here.