Lesson 5. Tips and Tricks

KNIME-Data-Wranglers-L1-Lesson5

You have learned how to read, manipulate, aggregate, join, and concatenate data. This last lesson gives you some extra material to think about. Firstly, how to execute the same data wrangling operations on a database rather than on your local machine. Secondly, a little lesson on how to keep your workflows clean and tidy.

This lesson includes exercises, and the data files, solution workflows, and prebuilt, empty exercise workflows with instructions are available in the L1-DW KNIME Analytics Platform for Data Wranglers - Basics folder in the E-Learning repository on the KNIME Hub.

Jump to the following main sections:

In-Database Data Manipulation

Cleaning Up Workflows with Metanodes

In-Database Data Manipulation

When your data resides on a database, you can use the DB nodes to execute ETL operations directly on the database. These nodes come with a visual user interface, and many of them look exactly the same as their KNIME native counterparts.

 

A reference workflow Databases - Advanced Usage is available on the KNIME Hub.

 

Exercise: Manipulating Data on a Database

1) Connect to the WebActivity.sqlite database available in the data folder on the KNIME Hub

2) Select the web_activity table

3) Calculate the number of customers in each web activity category

4) Calculate the maximum number of customers out of all web activity categories

5) Filter the aggregated table to the most popular web activity category

6) Filter the original table according to the most popular web activity category

7) Read the filtered data into a KNIME data table

Empty exercise workflow 15_Database_Data_Manipulation in the KNIME Hub course repository.

 

Solution: Manipulating Data on a Database

1-2) Download the WebActivity.sqlite file from the data folder on the KNIME Hub. Connect to the database with the SQLite Connector node, and select the web_activity table with the DB Table Selector node.

3) Use the DB GroupBy node, and select WebActivity as the grouping column. Check “Add COUNT(*)” in the manual aggregation tab, or apply the aggregation method count to any column.

4-5) Continue the workflow branch with another DB GroupBy node. Apply the aggregation method MAX to the count column. Filter the first aggregated table by joining it with this table: use the DB Joiner node, select inner join, and set the count and max count columns as the joining columns.

6-7) Join the original web_activity table with the filtered aggregated table, i.e. the output of the DB Joiner node. Select inner join, and set WebActivity as the joining column in both tables. Read the data into KNIME with the DB Reader node.

Solution workflow 15_Database_Data_Manipulation - Solution in the KNIME Hub course repository.

Cleaning Up Workflows with Metanodes

When you want to clean your workflow by collecting individual nodes into a subworkflow, you create a metanode of them. At any point, you can add or remove nodes inside the metanode, add inputs and outputs, and expand the metanode back to individual nodes. In the video below we show how you can perform these and other useful operations on metanodes.

 

This is the end of the [L1-DW] KNIME Analytics Platform for Data Wranglers: Basics course. Well done! You’re ready to take the next course [L2-DW] KNIME Analytics Platform for Data Wranglers: Advanced.

LinkedInTwitterShare