Joining and Concatenating


We have two datasets containing sales records. The first holds sales from 2008 to the beginning of 2011; the second holds additional sales for 2011 only. Each record is one sale, with a date, a price, and so on. We want to stack the two datasets into a single table with the same structure, so that the 2008-2011 rows and the 2011-only rows sit one under the other. The Concatenate node does this. Note that the two datasets do not have exactly the same structure: the first has a column "card" that the second lacks.
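For readers who think in code, the same stacking operation can be sketched outside KNIME with pandas. This is only an analogue, not what the Concatenate node runs internally. The table contents and every column name except "card" are made up for illustration. It shows the two obvious ways to handle a column that exists in only one table: keep it and leave gaps, or drop it.

```python
import pandas as pd

# Hypothetical stand-ins for the two sales datasets (values are invented).
sales_2008_2011 = pd.DataFrame({
    "date": ["2008-03-01", "2010-11-15", "2011-01-05"],
    "price": [120.0, 75.5, 210.0],
    "card": ["visa", "amex", "visa"],
})
sales_2011 = pd.DataFrame({
    "date": ["2011-02-10", "2011-06-30"],
    "price": [99.0, 45.0],
})

# Union of columns: rows from the second table get a missing value in "card".
union = pd.concat([sales_2008_2011, sales_2011], join="outer", ignore_index=True)

# Intersection of columns: "card" is dropped because only one table has it.
intersection = pd.concat([sales_2008_2011, sales_2011], join="inner", ignore_index=True)

print(union)
print(intersection)
```

Either way, all five rows survive; the choice only affects whether "card" is kept with missing values or removed.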


A Joiner node joins two tables on one or more common key values. The available join modes are inner join, left outer join, right outer join, and full outer join. The configuration dialog has two tabs, "Joiner Settings" and "Column Selection". "Joiner Settings" defines the parameters of the join operation: the join mode and the key columns. "Column Selection" sets which columns to keep or drop and how to handle duplicate columns.
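The four join modes are easiest to compare on a small example. The sketch below uses pandas `merge` as a stand-in for the Joiner node. The tables, the key name `customer_id`, and the other column names are hypothetical. The `suffixes` argument is pandas' way of handling duplicate column names and is shown only as an analogy to the duplicate-column strategies on the "Column Selection" tab.

```python
import pandas as pd

# Hypothetical tables sharing the key column "customer_id" and a
# duplicate non-key column "region".
left = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
    "region": ["north", "south", "east"],
})
right = pd.DataFrame({
    "customer_id": [2, 3, 4],
    "total": [20.0, 35.0, 50.0],
    "region": ["south", "east", "west"],
})

# One merge per join mode; pandas calls the full outer join "outer".
joined = {
    mode: pd.merge(left, right, on="customer_id", how=mode,
                   suffixes=("_left", "_right"))
    for mode in ("inner", "left", "right", "outer")
}

# Number of rows produced by each join mode.
row_counts = {mode: len(table) for mode, table in joined.items()}
print(row_counts)  # {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}
```

Only keys 2 and 3 appear in both tables, so the inner join keeps two rows; the left and right outer joins each add the one unmatched row from their own side, and the full outer join adds both.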

JoinConcatenate Examples

On the adult.csv data set:
1. Calculate the average age and the number of rows for each of the 4 groups defined by (sex, income), and join these two values to every row of the corresponding group.
2. Extract the people aged between 20 and 40 whose workclass starts with "S", and the people aged between 40 and 60 who work in the Private sector (workclass starts with "P"). Put both groups into a single data table.
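As a cross-check for the exercise, here is a minimal pandas sketch of the same two tasks. It is not the KNIME solution, which would be built from nodes. It runs on a tiny invented sample rather than the real adult.csv, and the column names `age`, `workclass`, `sex`, and `income` are assumed. Age ranges are treated as inclusive on both ends, which is also an assumption.

```python
import pandas as pd

# Tiny hypothetical sample standing in for adult.csv (values are invented).
adult = pd.DataFrame({
    "age": [25, 38, 45, 52, 30, 61],
    "workclass": ["State-gov", "Private", "Private",
                  "Self-emp-inc", "Self-emp-not-inc", "Private"],
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "income": ["<=50K", "<=50K", ">50K", ">50K", "<=50K", ">50K"],
})

# Task 1: aggregate per (sex, income) group, then join the two values
# back onto every row of the group.
stats = (
    adult.groupby(["sex", "income"], as_index=False)
    .agg(mean_age=("age", "mean"), row_count=("age", "size"))
)
with_stats = adult.merge(stats, on=["sex", "income"], how="left")

# Task 2: filter the two subgroups, then stack them into one table.
young_s = adult[adult["age"].between(20, 40)
                & adult["workclass"].str.startswith("S")]
older_p = adult[adult["age"].between(40, 60)
                & adult["workclass"].str.startswith("P")]
combined = pd.concat([young_s, older_p], ignore_index=True)

print(with_stats)
print(combined)
```

Task 1 is a group-by followed by a join on the grouping columns; task 2 is two row filters followed by a concatenation, which mirrors the Joiner and Concatenate nodes described above.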
