Lesson 1. Date&Time & Databases

KNIME-Data-Wranglers-L1-Lesson1

The yellow ETL and data manipulation nodes include some special cases. Date and time values have a dedicated data type called “Date&Time”, with its own data manipulation nodes. In the same way, data that resides on a database can be manipulated with dedicated DB nodes. These DB nodes pass SQL queries from one node to another, which are executed directly on the database when needed. Here we have some videos that cover these special ETL cases.


This lesson includes exercises. The corresponding data files, solution workflows, and prebuilt, empty exercise workflows with instructions are available in the L2-DS KNIME Analytics Platform for Data Scientists - Advanced folder in the E-Learning repository on the KNIME Hub.

Date&Time Manipulation

Date and time values have their dedicated data type Date&Time. Date&Time values can be filtered, aggregated, and manipulated with the specific Date&Time manipulation nodes. In the videos below we show you a few examples of them: how to change the data type from string to the dedicated data type Date&Time, how to extract granularities from a timestamp into multiple columns, and how to modify timestamps to show the relevant digits.

 

 

 

A reference workflow Manipulating Date&Time Values is available on the KNIME Hub.

 

Exercise: Date&Time Manipulation

1) Read the daily_sales.csv file available in the data folder on the KNIME Hub

2) Convert the order times from string to Date&Time. Replace the string column.

3) Add the date “2019-09-03” to all time values. Create a new column called “order_time_date”.

4) Extract the hours of the orders. Are there especially many orders occurring in a certain hour of the day?

Empty exercise workflow 01_DateTime_Manipulation in the KNIME Hub course repository

 

Solution: Date&Time Manipulation

1-2) Download the daily_sales.csv file from the data folder on the KNIME Hub. Read the file with the File Reader node. Use the String to Date&Time node to convert the “order_time” column to Date&Time. You can save the individual configuration steps by pressing the “Guess data type and format” button in its configuration dialog. 

3) Use next the Modify Date node, and select “2019-09-03” in the field that activates, when you select the “Append date” option. Write “_time” as a suffix of appended columns. 

4) Continue the workflow with an Extract Date&Time Fields node, and check the “Hour” field. When you check the output table, you can find many orders between 2 p.m. and 3 p.m.

Solution workflow 01_DateTime_Manipulation - Solution in the KNIME Hub course repository

 

Database Data Manipulation

When your data resides on a database, you can use the DB nodes to perform ETL operations directly on the database. These nodes come with a visual user interface, and many of them look exactly the same as their KNIME native counterparts.

 

A reference workflow Databases - Advanced Usage is available on the KNIME Hub.

 

A reference workflow Database Structure Manipulation is available on the KNIME Hub.

 

A reference workflow Advanced Query Building is available on the KNIME Hub.

 

Exercise: Manipulating Data on a Database

1) Connect to the WebActivity.sqlite database available in the data folder on the KNIME Hub

2) Select the web_activity table

3) Calculate the number of customers in each web activity category

4) Calculate the maximum number of customers out of all web activity categories

5) Filter the aggregated table to the most popular web activity category

6) Filter the original table according to the most popular web activity category

7) Read the filtered data into a KNIME data table

Empty exercise workflow 02_Database_Data_Manipulation in the KNIME Hub course repository

 

Solution: Manipulating Data on a Database

1-2) Download the WebActivity.sqlite file from the data folder on the KNIME Hub. Connect to the database with the SQLite Connector node, and select the web_activity table with the DB Table Selector node.

3) Use the DB GroupBy node, and select WebActivity as the grouping column. Check “Add COUNT(*)” in the manual aggregation tab, or apply the aggregation method count to any column.

4-5) Continue the workflow branch with another DB GroupBy node. Apply the aggregation method MAX to the count column. Filter the first aggregated table by joining it with this table: use the DB Joiner node, select inner join, and set the count and max count columns as the joining columns.

6-7) Join the original web_activity table with the filtered aggregated table, i.e. the output of the DB Joiner node. Select inner join, and set WebActivity as the joining column in both tables. Read the data into KNIME with the DB Reader node.

Solution workflow 02_Database_Data_Manipulation - Solution in the KNIME Hub course repository

 

LinkedInTwitterShare