Test Your Knowledge: Assembling an ETL Workflow

At this point in the course, you should be able to perform most of the common ETL operations. So let's test your knowledge!

One of the most common tasks in data preparation is to move from a series of sparse records to a more compact summary. A classic case is purchase contracts. In a CRM system we have, for each of our customers, a number of contracts with different amounts, dates, and countries. If we want to summarize the information for each customer, we need to move from a long sequence of contracts to a single summary row containing, for example: the number of days the customer has been with us (loyalty), the total amount of money the customer has paid to the company so far, and so on.
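To make the idea concrete: a few contract rows for one customer collapse into a single summary row with, for instance, the loyalty span in days and the total amount paid. The minimal pandas sketch below, with invented data and column names, illustrates exactly that collapse; the exercise itself is solved with KNIME nodes, not code.

```python
import pandas as pd

# Three contracts for one hypothetical customer (invented data and column names).
contracts = pd.DataFrame({
    "Customer ID": ["C-001", "C-001", "C-001"],
    "Date": pd.to_datetime(["2009-03-01", "2010-07-15", "2011-01-20"]),
    "Amount": [1200.0, 800.0, 450.0],
})

# Collapse the contract rows into one summary row per customer.
summary = contracts.groupby("Customer ID").agg(
    first_contract=("Date", "min"),
    last_contract=("Date", "max"),
    total_amount=("Amount", "sum"),
)
summary["loyalty_days"] = (summary["last_contract"] - summary["first_contract"]).dt.days
print(summary)
```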

In this exercise you need to complete the following steps (a rough pandas sketch of the equivalent logic is given after the list):

  1. Read the file sales2008-2011.csv, which contains a number of contracts with different customers. Each customer is identified by a unique Customer ID, and each contract by a unique RowID.
  2. Convert the contract dates from String to DateTime type using the String to Date/Time node with format dd.MM.yyyy (if you open the file, you will see that this is the format used for the date column).
  3. For each country, extract the active customers; that is, find out which customers are active in which countries.
  4. Then, for each customer, calculate:
    1. Date of the first contract
    2. Total amount of money paid across all contracts
    3. Countries where the customer has signed contracts
    4. Number of days between the first and the last purchase
  5. Extract only those customers active in the USA who paid more than $5,000 in total, and those customers active in Germany who paid between $1,000 and $2,000 in total. Collect them together in a single table.
  6. For the selected customers, extract all their contracts from the original dataset and join the customer summary information to each contract data row.
  7. Write the resulting table to a CSV file in a "data" folder located in the workflow folder.
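The exercise is meant to be solved with KNIME nodes, but if it helps to see the whole pipeline as code, here is a rough pandas sketch of the equivalent logic. It is only an illustration under assumptions: the column names Date, Amount, and Country, the country labels "USA" and "Germany", and the output file name are guesses about the file layout; only Customer ID, RowID, the dd.MM.yyyy date format, and the "data" output folder come from the exercise text.

```python
from pathlib import Path

import pandas as pd

# Step 1: read the contracts. "Customer ID" and "RowID" are given in the
# exercise; "Date", "Amount", and "Country" are assumed column names.
contracts = pd.read_csv("sales2008-2011.csv")

# Step 2: parse the date column, which uses the dd.MM.yyyy format.
contracts["Date"] = pd.to_datetime(contracts["Date"], format="%d.%m.%Y")

# Step 4: one summary row per customer.
summary = contracts.groupby("Customer ID").agg(
    first_contract=("Date", "min"),
    last_contract=("Date", "max"),
    total_amount=("Amount", "sum"),
    countries=("Country", lambda s: sorted(s.unique())),
)
summary["days_first_to_last"] = (
    summary["last_contract"] - summary["first_contract"]
).dt.days


# Steps 3 and 5: customers active in a given country (country labels assumed).
def active_in(country):
    return set(contracts.loc[contracts["Country"] == country, "Customer ID"])


usa = summary[summary.index.isin(active_in("USA")) & (summary["total_amount"] > 5000)]
germany = summary[
    summary.index.isin(active_in("Germany"))
    & summary["total_amount"].between(1000, 2000)
]
# Collect both groups in a single table; keep a customer only once if it matches both.
selected = pd.concat([usa, germany])
selected = selected[~selected.index.duplicated()]

# Step 6: join the summary information onto every contract of the selected customers.
result = contracts.merge(selected.reset_index(), on="Customer ID", how="inner")

# Step 7: write the result into a "data" folder (the output file name is made up here).
Path("data").mkdir(exist_ok=True)
result.to_csv("data/selected_customers_contracts.csv", index=False)
```

In KNIME the same steps map onto nodes rather than function calls; the only node named by the exercise is String to Date/Time for step 2.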

 

Solution

A possible solution can be found in the following workflow on the EXAMPLES Server:
02_ETL_Data_Manipulation/00_Basic_Examples/02_ETL_Basics*

The workflow is displayed in the following figure.

 




* The link will open the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform version 3.2.0 or higher, installed via the installer)