In the following, we’ll take you through the KNIME Workbench and show you how to build your first workflow. Most of your questions will probably arise once you start a real project. When they do, you’ll find many answers in the KNIME Workbench Guide and in the E-Learning Course on our website. But don’t get stuck in the guides: you can also reach us and the large community of KNIME Analytics Platform users at the KNIME Forum. We are happy to help you there!
Start KNIME Analytics Platform
Info: If you haven’t yet downloaded and installed KNIME Analytics Platform, please follow the download instructions.
Start KNIME Analytics Platform. When the KNIME Analytics Platform Launcher window appears, define the KNIME workspace as shown in Figure 1.
The KNIME workspace is a folder on your local computer to store your KNIME workflows, node settings, and data produced by the workflow. The workflows and data stored in your workspace are available through the KNIME Explorer in the upper left corner of the KNIME Workbench.
After selecting a folder as the KNIME workspace for your project, click "Launch". When in use, the KNIME Analytics Platform user interface - the KNIME Workbench - looks like the screenshot shown in Figure 2.
The KNIME Workbench is made up of the following components:
KNIME Explorer: Overview of the available workflows and workflow groups in the active KNIME workspaces, i.e. your local workspace as well as KNIME Servers.
Workflow Coach: Lists node recommendations based on the workflows built by the wide community of KNIME users. It is inactive if you don’t allow KNIME to collect your usage statistics.
Node Repository: Lists all nodes available in core KNIME Analytics Platform and in the extensions you have installed. The nodes are organized by category, but you can also use the search box at the top of the node repository to find nodes.
Workflow Editor: Canvas for editing the currently active workflow.
Node Description: Description of a selected node (in the Workflow Editor or Node Repository).
Outline: Overview of the currently active workflow.
Console: Shows execution messages indicating what is going on under the hood.
Nodes and Workflows
In KNIME Analytics Platform, individual tasks are represented by nodes. Each node is displayed as a colored box with input and output ports, as well as a status, as shown in Figure 3. The input(s) are the data that the node processes, and the output(s) are the resulting datasets. Each node has specific settings, which we can adjust in a configuration dialog. When we do, the node status changes, shown by a traffic light below each node. Nodes can perform all sorts of tasks, including reading/writing files, transforming data, training models, creating visualizations, and so on.
A collection of interconnected nodes constitutes a workflow, and usually represents some part - or perhaps all - of a particular data analysis project.
Let’s now start building an example workflow, where we analyze some sales data. When we’re finished it will look like the workflow shown in Figure 4.
The example workflow in Figure 4 reads data from a CSV file, filters a subset of the columns, filters out some rows, and visualizes the data in two graphs, which you can see in Figure 5: a stacked area chart showing the development of sales over time, and a pie chart showing the share of different countries in total sales.
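To make the two charts more concrete, the aggregations behind them can be sketched in plain Python. This is only an analogy, not KNIME code or KNIME's implementation. The rows are made up for illustration; only the column names country, date, and amount are taken from this guide.

```python
from collections import defaultdict

# Hypothetical sample rows; only the column names mirror the sales data.
rows = [
    {"country": "Germany", "date": "2020-01", "amount": 100.0},
    {"country": "USA", "date": "2020-01", "amount": 50.0},
    {"country": "Germany", "date": "2020-02", "amount": 30.0},
    {"country": "USA", "date": "2020-02", "amount": 20.0},
]


def sales_over_time(rows):
    """Sum amount per date and country (the series a stacked area chart draws)."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        totals[row["date"]][row["country"]] += row["amount"]
    return {date: dict(per_country) for date, per_country in totals.items()}


def country_shares(rows):
    """Return each country's fraction of total sales (the slices of a pie chart)."""
    per_country = defaultdict(float)
    for row in rows:
        per_country[row["country"]] += row["amount"]
    grand_total = sum(per_country.values())
    if grand_total == 0:
        return {}  # no sales, so no shares to report
    return {country: amount / grand_total for country, amount in per_country.items()}


series = sales_over_time(rows)
shares = country_shares(rows)
```

In this sketch, `series` holds one total per date and country, and `shares` holds fractions that sum to 1 across all countries.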
Build your first workflow
In the following, we show you how to build the workflow from Figure 4 yourself. Don’t worry if you get stuck: the example workflow is also available on the public EXAMPLES Server under 02_ETL_Data_Manipulation/00_Basic_Examples/00_Visual_Analysis_of_Sales_Data.
To get started, first download the CSV file that contains the data that we are going to use in the workflow. You can download the data here. Next, create a new, empty workflow by:
Clicking "New" in the toolbar panel at the top of the KNIME Workbench
Or by right clicking a folder of your local workspace in the KNIME Explorer, as shown in Figure 6
The first node you need is the File Reader node, which you’ll find in the node repository. You can either navigate to IO → Read → File Reader, or type a part of the name in the search box in the node repository panel.
To use the node in your workflow you can either:
Drag and drop it from the node repository to the workflow editor
Or double click the node in the node repository. It automatically appears in the workflow editor.
Let’s now define the settings for this node:
Open the configuration dialog either by double clicking the File Reader node, or by right clicking it and selecting "Configure…" as shown in Figure 7.
In the configuration dialog, define the file path by clicking the "Browse" button, then check other available settings, and preview the data as shown in Figure 8.
You may now want to inspect the output table to see if the data file was read as you intended. To inspect the output table:
Execute the File Reader node by right clicking the node and selecting "Execute"
Open the output table by right clicking the executed node and selecting the last option in the menu: "File Table"
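Conceptually, reading the file and then inspecting the output table amounts to parsing CSV text into rows and looking at the first few of them. The following plain-Python sketch illustrates that idea; it is not how the File Reader node is implemented. The inline CSV text and the extra column named card are made up so that the example runs without a file, and the real file's layout may differ.

```python
import csv
import io

# Made-up stand-in for the downloaded CSV file (hypothetical content).
CSV_TEXT = """country,date,amount,card
Germany,2020-01-05,100.0,Visa
unknown,2020-01-06,20.0,Visa
USA,2020-01-07,50.5,MasterCard
"""


def read_table(text):
    """Parse CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def preview(table, n=2):
    """Return the first n rows, similar to glancing at an output table."""
    return table[:n]


table = read_table(CSV_TEXT)
for row in preview(table):
    print(row)
```

Note that `csv.DictReader` returns every value as a string, so numeric columns such as amount would need an explicit conversion (for example with `float`) before any arithmetic.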
If the data was read in correctly, add the Column Filter node to the workflow editor and connect it to the File Reader node:
Click the output port of the File Reader node, hold the mouse button, and release it at the input port of the Column Filter node
Alternatively, select the File Reader node by clicking it once in your workflow, and then double click the Column Filter node in the node repository. This method automatically connects the Column Filter node to the File Reader node.
Before you proceed, you must configure the Column Filter node:
Move the columns "country", "date", and "amount" into the green-framed Include field either by double clicking them, or using the buttons between the Exclude and Include fields in the configuration dialog shown in Figure 9
Finish the configuration by clicking "OK"
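The effect of this configuration can be pictured as keeping only the included columns of every row. Here is a minimal plain-Python sketch of that idea, assuming a table stored as a list of dictionaries. The function name `column_filter`, the sample rows, and the extra column card are hypothetical and only serve the illustration.

```python
def column_filter(table, include):
    """Keep only the columns named in include, in the order given by include."""
    return [{name: row[name] for name in include} for row in table]


# Made-up input table with one extra column (hypothetical name "card").
table = [
    {"country": "Germany", "date": "2020-01-05", "amount": 100.0, "card": "Visa"},
    {"country": "USA", "date": "2020-01-07", "amount": 50.5, "card": "MasterCard"},
]

filtered = column_filter(table, ["country", "date", "amount"])
```

The number of rows is unchanged; only the columns that were not moved into the include list disappear.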
Continue building the workflow:
Add the Row Filter node to the workflow editor and connect it to the Column Filter node
Open the configuration dialog of the Row Filter node and exclude rows from the input table where the column "country" has the value "unknown" as shown in Figure 10
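The row filtering described above corresponds to dropping every row whose country entry equals "unknown". A small plain-Python sketch of that logic follows. It is an analogy rather than the Row Filter node's implementation, and the function name `row_filter_exclude` and the sample rows are made up.

```python
def row_filter_exclude(table, column, value):
    """Drop every row whose entry in column equals value."""
    return [row for row in table if row[column] != value]


# Made-up input table after column filtering.
table = [
    {"country": "Germany", "date": "2020-01-05", "amount": 100.0},
    {"country": "unknown", "date": "2020-01-06", "amount": 20.0},
    {"country": "USA", "date": "2020-01-07", "amount": 50.5},
]

kept = row_filter_exclude(table, "country", "unknown")
```

The comparison in this sketch is exact and case-sensitive, so a value such as "Unknown" would not be excluded.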
Now that the data has been filtered, let’s move on to data visualization:
The workflow is finished, and the next step is to execute it and view the output. You do this either by clicking the "Execute all executable nodes" button in the toolbar shown in Figure 13, or by selecting the last nodes of the different branches of the workflow, right clicking the selection, and then clicking "Execute" in the menu.
- Choose the "Execute and Open Views" option for an unexecuted node as shown in Figure 14
- Or, once a node is executed, right click the node and select "Interactive View: …" instead, as shown in Figure 15
Currently, the pie chart uses default colors for the different countries in the data. With the Color Manager node, you can assign the countries colors other than the defaults you see in Figure 5. The colors have to be assigned before the graph is built, so you’ll have to add the Color Manager node in the middle of the workflow.
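The reason the colors must be assigned before the chart is built can be sketched as follows: the color mapping is attached to the data, and the chart only looks it up. This plain-Python sketch is an illustration under that assumption, not the Color Manager node's actual behavior. The function name `assign_colors`, the palette values, and the sample rows are all hypothetical.

```python
def assign_colors(table, column, palette):
    """Map each distinct value in column to a palette color, in order of first appearance."""
    colors = {}
    for row in table:
        value = row[column]
        if value not in colors:
            # Reuse palette entries cyclically if there are more values than colors.
            colors[value] = palette[len(colors) % len(palette)]
    return colors


# Made-up rows and palette for illustration.
table = [
    {"country": "Germany", "amount": 100.0},
    {"country": "USA", "amount": 50.5},
    {"country": "Germany", "amount": 30.0},
]
palette = ["#1f77b4", "#ff7f0e"]

country_colors = assign_colors(table, "country", palette)
```

A chart step that runs afterwards would read `country_colors` to pick the color of each slice, which is why the assignment has to come earlier in the pipeline.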
Add the Color Manager node:
By dragging the node from the node repository and releasing it between the Row Filter node and the Pie Donut Chart node in the workflow once the connection has turned red, as shown in Figure 16. The red connection means that it is ready to accept the new node when you release the mouse.