Lesson 1. Overview of KNIME Analytics Platform & Data Access

In this lesson we’ll guide you through the first steps, such as installing KNIME Analytics Platform, navigating the KNIME workbench, importing and exporting workflows, getting familiar with the basic concepts of visual programming, creating nodes and workflows, and starting a data science project by accessing data.

In KNIME Analytics Platform you can connect to practically any data source: file formats, web services, databases and big data platforms. For a more exhaustive list and description of all KNIME nodes for data access, download the free e-book “Will they blend?”, a collection of blog posts centered around data access and data blending.

This lesson includes exercises. The data files, solution workflows, and prebuilt empty exercise workflows with instructions are available in the L1-DS KNIME Analytics Platform for Data Scientists - Basics folder in the E-Learning repository on the KNIME Hub.

Jump to the following main sections:

Installation Guide

Getting Started with KNIME Analytics Platform

Nodes, Data, and Workflows

Read Data from File

Accessing Databases

Accessing REST Services

Installation Guide

KNIME Analytics Platform is open source. All you need to do is download it from the KNIME website.

Different operating systems (Windows, Mac, and Linux) are supported, and you can install your first version with or without the free KNIME extensions. Extensions add capabilities for specific analytics tasks.

KNIME Analytics Platform Installation Guide

Here we show you how to install KNIME Analytics Platform, from visiting the download page to launching the application on your machine.

 

Install KNIME Extensions

By installing extensions and integrations, you can add nodes to your installation of KNIME Analytics Platform for special analytics purposes, such as text mining, big data analytics, parameter optimization, and many more.

 

Getting Started with KNIME Analytics Platform

With KNIME Analytics Platform installed, it's time to launch it: click the icon on the desktop, go to “KNIME” in the Start menu, or run the application/executable directly from the folder where KNIME is installed.

The KNIME Workbench

In the following videos, we’ll take you on a tour of the KNIME workbench, with more details about the Node Repository and the Workflow Coach.

 

 

 

 

 

Where to Find Example Workflows

The KNIME Hub is the public repository where the KNIME community uploads and shares resources such as workflows, extensions, custom nodes, and documentation. Everything is ready to use: just download it or drag and drop it into your KNIME Workbench. Some of these workflows (the educational ones) are also available in the KNIME Explorer panel of the KNIME Workbench.

 

Nodes, Data, and Workflows

Now that everything is set up, let’s focus on the middle part of the workbench, the workflow editor, where you’ll build your workflows.

In these videos you’ll find out how to create an empty workflow, use your first nodes to perform some tasks, check the results, and document the workflow.

Nodes, Workflows, and Workflow Groups

Two basic concepts of visual programming in KNIME are nodes and workflows. We’ll create a new workflow and organize it inside a workflow group in the local workspace. Then we’ll show how to find and use a node, how to define the settings for its task, and how to enrich your workflow with annotations and comments.

 

 

 

The Data Structure

Let’s inspect the data at the output of a node in more detail. The so-called “data table” is the structure that KNIME uses to organize data.
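To make the idea of a data table more concrete, here is a minimal conceptual sketch in Python. It assumes a simplified model in which a table has named columns, each with a single type, and rows identified by a unique row ID, with None standing in for a missing value. The class and method names are hypothetical and do not reflect how KNIME implements its tables internally.

```python
from dataclasses import dataclass, field


@dataclass
class DataTable:
    """Hypothetical, simplified model: typed columns plus rows keyed by a unique row ID."""
    column_names: list[str]
    column_types: list[type]
    rows: dict[str, list] = field(default_factory=dict)

    def add_row(self, row_id: str, values: list) -> None:
        # Every row must provide exactly one value per column.
        if len(values) != len(self.column_names):
            raise ValueError(f"expected {len(self.column_names)} values, got {len(values)}")
        # Row IDs must be unique within the table.
        if row_id in self.rows:
            raise ValueError(f"duplicate row ID: {row_id}")
        # All cells in a column share the column's type; None marks a missing value.
        for name, col_type, value in zip(self.column_names, self.column_types, values):
            if value is not None and not isinstance(value, col_type):
                raise TypeError(f"column {name!r} expects {col_type.__name__}")
        self.rows[row_id] = list(values)


# Illustrative values only.
table = DataTable(["age", "workclass"], [int, str])
table.add_row("Row0", [39, "State-gov"])
table.add_row("Row1", [50, None])  # missing value in the string column
```

The point of the type check is the same idea KNIME's table view makes visible: a column has one type, so a text value cannot slip into a numeric column.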

 

Import and Export Workflows

Portable workflows, but how? Here we show how to import and export workflows and workflow groups to and from your LOCAL workspace as .knwf (workflows) and .knar (workflow groups) files.
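Exported .knwf and .knar files are ZIP archives, so you can check what an export contains without opening KNIME. The sketch below relies on that format and uses Python's standard zipfile module; the function name and the file name my_workflow.knwf are hypothetical.

```python
import zipfile


def list_export_contents(export_path: str) -> list[str]:
    """Return the names of all entries in an exported .knwf or .knar file."""
    # Assumption: the export is a standard ZIP archive, so zipfile can open it directly.
    with zipfile.ZipFile(export_path) as archive:
        return archive.namelist()


# Hypothetical usage:
# for name in list_export_contents("my_workflow.knwf"):
#     print(name)
```

This only reads the archive; to actually use an exported workflow, import it through the KNIME Explorer as shown in the video.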

 

Read Data from File

From KNIME Analytics Platform, you can access data in different file formats, for example, CSV files and other formatted text files, Excel workbooks, and proprietary file formats of other software tools.

Reader Nodes

Formatted text files, Excel files, files in the KNIME native .table format, and many more file types have their own reader nodes. The configuration option they all have in common is the file path.

 

A reference workflow, “Use the File Reader”, is available on the KNIME Hub.

 

A reference workflow, “Table Reader”, is available on the KNIME Hub.

 

A reference workflow, “Read an XLS file”, is available on the KNIME Hub.

 

Exercise: Reading Text Files

Read the adult_men.csv file available in the data folder on the KNIME Hub. This is a subset of the adult.csv dataset from the UCI Machine Learning Repository.

Empty exercise workflow 01_Read_Data_from_Text_File in the KNIME Hub course repository.

 

Solution: Reading Text Files

Download the adult_men.csv file from the data folder on the KNIME Hub, use the File Reader node, and configure the file path according to the location of the file on your machine.
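For comparison, the same step outside KNIME can be sketched with Python's standard csv module. As with the reader nodes, the essential setting is the file path, plus format options such as the delimiter. This sketch assumes the file has a header row; the function name read_text_file is hypothetical.

```python
import csv


def read_text_file(path: str, delimiter: str = ",") -> tuple[list[str], list[list[str]]]:
    """Read a delimited text file into a header and a list of rows (values kept as strings)."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)  # assumes the first line holds the column names
        rows = [row for row in reader]
    return header, rows


# Hypothetical usage; adjust the path to where you saved the file:
# header, rows = read_text_file("/path/to/adult_men.csv")
```

Unlike the File Reader node, which detects column types, this sketch keeps every value as a string.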

Solution workflow 01_Read_Data_from_Text_File - Solution in the KNIME Hub course repository.

 

Exercise: Reading .table Files

Read the adult_women.table file available in the data folder on the KNIME Hub. This is a subset of the adult.csv dataset from the UCI Machine Learning Repository.

Empty exercise workflow 02_Read_Data_from_Table_File in the KNIME Hub course repository.

 

Solution: Reading .table Files

Download the adult_women.table file from the data folder on the KNIME Hub, use the Table Reader node, and configure the file path according to the location of the file on your machine.

Solution workflow 02_Read_Data_from_Table_File - Solution in the KNIME Hub course repository.

 

Exercise: Reading Excel Files

Read the auto-mpg.xls file available in the data folder on the KNIME Hub.

Empty exercise workflow 03_Read_Data_from_Excel_File in the KNIME Hub course repository.

 

Solution: Reading Excel Files

Download the auto-mpg.xls file from the data folder on the KNIME Hub, use the Excel Reader (XLS) node, and configure the file path according to the location of the file on your machine. 

Solution workflow 03_Read_Data_from_Excel_File - Solution in the KNIME Hub course repository.

 

Absolute and Relative Paths: the knime:// Protocol

The file path is the most important setting in all these reader nodes. You can express it as an absolute path or as a relative path, for example relative to a mountpoint or to the current workflow, using the knime:// protocol.

 

An example of a workflow-relative path can be found in the Table Reader workflow on the KNIME Hub.

 

Exercise: Relative and Absolute Paths

Read the data files adult_men.csv, adult_women.table, and auto-mpg.xls (the files accessed in the previous exercises), available in the data folder on the KNIME Hub, using workflow-relative paths.

Empty exercise workflow 04_Read_Data_Using_Workflow_Relative_Path in the KNIME Hub course repository.

 

Solution: Relative and Absolute Paths

Download the data files from the data folder on the KNIME Hub, import them into the KNIME Explorer, and access each file with a path that starts with “knime://” followed by the path from the currently active workflow to the data file, for example, “knime://knime.workflow/../../data/adult_men.csv”. Each pair of dots in the file path moves one folder level up in the KNIME Explorer, starting from the position of the currently active workflow.
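To see how the “..” segments are interpreted, here is a small Python sketch that maps a knime://knime.workflow/ URL onto a local folder. KNIME resolves these URLs itself; this only illustrates the path arithmetic, assuming the workflow is stored in a local folder. The function name and the workspace layout are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def resolve_workflow_relative(url: str, workflow_dir: str) -> str:
    """Illustrate how a knime://knime.workflow/... URL maps to a local path."""
    parsed = urlparse(url)
    if parsed.scheme != "knime" or parsed.netloc != "knime.workflow":
        raise ValueError("expected a knime://knime.workflow/... URL")
    # parsed.path looks like "/../../data/adult_men.csv"; drop the leading slash
    # so the remainder is joined onto the workflow folder instead of replacing it.
    relative = parsed.path.lstrip("/")
    # normpath collapses each ".." by moving one folder up from the workflow folder.
    return posixpath.normpath(posixpath.join(workflow_dir, relative))


# Hypothetical layout: the workflow sits two levels below the course folder.
print(resolve_workflow_relative(
    "knime://knime.workflow/../../data/adult_men.csv",
    "/workspace/L1-DS/Exercises/04_Read_Data_Using_Workflow_Relative_Path",
))
# prints /workspace/L1-DS/data/adult_men.csv
```

Because the path is anchored at the workflow rather than at a drive letter, it keeps working when the whole course folder is moved or shared.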

Solution workflow 04_Read_Data_Using_Workflow_Relative_Path - Solution in the KNIME Hub course repository.

Accessing Databases

In a KNIME workflow, you can connect to any JDBC-compliant database and access and manipulate data directly on the database. At any point, you can read the data into a local KNIME data table, or write a local KNIME data table to the database.

 

A reference workflow, “Database - Simple IO”, is available on the KNIME Hub.

 

Exercise: Accessing a Database

1) Connect to the WebActivity.sqlite database available in the data folder on the KNIME Hub

2) Select the web_activity table

3) Read the database table into a local KNIME data table

Empty exercise workflow 05_Read_Data_from_Database in the KNIME Hub course repository.

 

Solution: Accessing a Database

Download the WebActivity.sqlite file from the data folder on the KNIME Hub. Use the SQLite Connector node, and provide the path to the sqlite file in its configuration dialog. Alternatively, if the sqlite file appears in the KNIME Explorer, you can just drag and drop it to the workflow editor. Use the DB Table Selector node to select the web_activity table on the database, and the DB Reader node to read it into KNIME. 
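The three steps of the solution can be mirrored in a short Python sketch using the standard sqlite3 module. This is an analogy for illustration only, not how the KNIME nodes are implemented, and the function names are hypothetical. Note that sqlite3.connect silently creates an empty database if the path is wrong.

```python
import sqlite3
from contextlib import closing


def select_table(table: str) -> str:
    """Like DB Table Selector: build the SQL for a table without moving any data."""
    quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier safely
    return f"SELECT * FROM {quoted}"


def read_query(db_path: str, query: str) -> tuple[list[str], list[tuple]]:
    """Like SQLite Connector + DB Reader: open the file and fetch the result locally."""
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.execute(query)
        columns = [desc[0] for desc in cursor.description]
        rows = cursor.fetchall()
    return columns, rows


# Hypothetical usage; adjust the path to where you saved the file:
# columns, rows = read_query("/path/to/WebActivity.sqlite", select_table("web_activity"))
```

Keeping the selection separate from the read reflects the node design: the query stays on the database until you explicitly pull the data into a local table.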

Solution workflow 05_Read_Data_from_Database - Solution in the KNIME Hub course repository.

Accessing REST Services

Much of the data on the web is available via REST services. As a quick intro, the video below explains what a REST service is, how it operates, what the different request methods (such as GET and POST) do, and how to provide authentication when a service requires it.
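As a rough illustration of the ingredients of a REST call (a URL, a request method, an optional body, and authentication), here is a Python sketch using the standard urllib.request module. It assumes a JSON API protected by HTTP Basic authentication, which is only one of several schemes services use; the function name, endpoint, and credentials are hypothetical. In KNIME, nodes such as GET Request and POST Request cover this step.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_request(url: str, method: str = "GET", payload: Optional[dict] = None,
                  user: Optional[str] = None,
                  password: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for a REST service."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        # POST/PUT requests typically carry a JSON body.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if user is not None and password is not None:
        # HTTP Basic authentication: base64-encoded "user:password" in the Authorization header.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Hypothetical endpoint; sending the request requires network access:
# req = build_request("https://api.example.com/items/42", user="user", password="secret")
# with urllib.request.urlopen(req) as response:
#     print(json.loads(response.read()))
```

Building the request separately from sending it makes it easy to check the method and headers before any data leaves your machine.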

 

Reference workflows are available in the Examples/01_Data_Access/05_REST_Web_Services repository on the KNIME Hub.
