Lesson 2. Data Access

KNIME-Data-Wranglers-L1-Lesson2

Data wrangling always starts by accessing data. From KNIME Analytics Platform, you can connect to any data source, for example, CSV files and other formatted text files, Excel workbooks, web services, databases and big data platforms, and proprietary file formats of other software tools.

In this lesson we’ll show how you can access data locally from a file and remotely from a database - both from within a KNIME workflow. For a more exhaustive list and description of all KNIME nodes for data access, download the free e-book “Will they blend?”, a collection of blog posts centered around data access and data blending.

This lesson includes exercises, and the data files, solution workflows, and prebuilt, empty exercise workflows with instructions are available in the L1-DW KNIME Analytics Platform for Data Wranglers - Basics folder in the E-Learning repository on the KNIME Hub.

Jump to the following main sections:

Read Data from File

Accessing Databases

Read Data from File

Text formatted files, Excel files, files in the KNIME native .table format, and many more file types have their own reader nodes. The configuration option they all have in common is the file path.

Reader nodes

The following videos introduce a few common reader nodes.

 

A reference workflow Use the File Reader is available on the KNIME Hub.

 

A reference workflow Table Reader is available on the KNIME Hub.

 

A reference workflow Read an XLS file is available on the KNIME Hub.

 

Exercise: Reading Text Files

Read the adult_men.csv file available in the data folder on the KNIME Hub. This is a subset of the adult.csv dataset from the UCI Machine Learning Repository.

Empty exercise workflow 01_Read_Data_from_Text_File in the KNIME Hub course repository.

 

Solution: Reading Text Files

Download the adult_men.csv file from the data folder on the KNIME Hub, use the File Reader node, and configure the file path according to the location of the file on your machine. 

Solution workflow 01_Read_Data_from_Text_File - Solution in the KNIME Hub course repository.

 

Exercise: Reading .table Files

Read the adult_women.table file available in the data folder on the KNIME Hub. This is a subset of the adult.csv dataset from the UCI Machine Learning Repository.

Empty exercise workflow 02_Read_Data_from_Table_File in the KNIME Hub course repository.

 

Solution: Reading .table Files

Download the adult_women.table file from the data folder on the KNIME Hub, use the Table Reader node, and configure the file path according to the location of the file on your machine.

Solution workflow 02_Read_Data_from_Table_File - Solution in the KNIME Hub course repository.

 

Exercise: Reading Excel Files

Read the auto-mpg.xls file available in the data folder on the KNIME Hub 

Empty exercise workflow 03_Read_Data_from_Excel_File in the KNIME Hub course repository.

 

Solution: Reading Excel Files

Download the auto-mpg.xls file from the data folder on the KNIME Hub, use the Excel Reader (XLS) node, and configure the file path according to the location of the file on your machine. 

Solution workflow 03_Read_Data_from_Excel_File - Solution in the KNIME Hub course repository.

 

Absolute and Relative Paths: the knime:// Protocol

The file path is the most important setting in all these reader nodes. You express it as an absolute path or as a relative path with respect to a certain mountpoint.

An example of a workflow-relative path can be found in the Table Reader workflow on the KNIME Hub.

 

Exercise: Relative and Absolute Paths

Read data adult_men.csv, adult_women.table, and auto-mpg.xls (the files accessed in the previous exercises) available in the data folder on the KNIME Hub using the workflow-relative path

Empty exercise workflow 04_Read_Data_Using_Workflow_Relative_Path in the KNIME Hub course repository.

 

Solution: Relative and Absolute Paths

Download the data files from the data folder on the KNIME Hub, import them to the KNIME Explorer, and access the files using a file path that starts with "knime://" and then shows the path from the currently active workflow to the data file, for example, "knime.workflow/../../data/adult_men.csv". The two dots in the file path indicate a movement to an upper folder level in the KNIME Explorer starting from the position of the currently active workflow.

Solution workflow 04_Read_Data_Using_Workflow_Relative_Path - Solution in the KNIME Hub course repository.

 

Accessing Databases

In a KNIME workflow, you can connect to any JDBC compliant database and access and manipulate data directly on the database. At any point, you can read the data into a local KNIME data table, and vice versa.

A reference workflow Database - Simple IO is available on the KNIME Hub.

 

Exercise: Accessing a Database

1) Connect to the WebActivity.sqlite database available in the data folder on the KNIME Hub

2) Select the web_activity table

3) Read the database table into a local KNIME data table

Empty exercise workflow 05_Read_Data_from_Database in the KNIME Hub course repository.

 

Solution: Accessing a Database

Download the WebActivity.sqlite file from the data folder on the KNIME Hub. Use the SQLite Connector node, and provide the path to the sqlite file in its configuration dialog. Alternatively, if the sqlite file appears in the KNIME Explorer, you can just drag and drop it to the workflow editor. Use the DB Table Selector node to select the web_activity table on the database, and the DB Reader node to read it into KNIME. 

Solution workflow 05_Read_Data_from_Database - Solution in the KNIME Hub course repository.

 

LinkedInTwitterShare