Lesson 3. Workflow Abstraction


Sometimes a node’s configuration depends on a parameter that is derived from the data or specific to the task, for example, when you want to filter rows by the most popular product, rename a column according to the current month, or prompt a user to provide an input. Flow variables are such parameters. They make your workflows parametric, more flexible, and more automated.

Switches steer the data flow into different paths, and components encapsulate pieces of workflows into reusable KNIME nodes.

This lesson includes exercises. The corresponding data files, solution workflows, and prebuilt, empty exercise workflows with instructions are available in the L2-DW KNIME Analytics Platform for Data Wranglers - Advanced folder in the E-Learning repository on the KNIME Hub.

Flow Variables

Let’s suppose you need your workflow to work with a different parameter at every run, for example, a different country. Or, a node’s exact configuration depends on an intermediate result, for example, the most popular product in the current month. In order to make your node configurations change dynamically, you need to parameterize them. That is, you need Flow Variables.
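As a rough analogy in plain Python (this is not KNIME code or KNIME's API), a flow variable behaves like a named parameter that a filtering step reads instead of a hard-coded value. The sample rows, the "country" column, and the helper filter_rows below are made up for illustration.

```python
# Illustrative analogy only: plain Python, not KNIME code.
# The rows and the "country" column are hypothetical sample data.
rows = [
    {"order_id": 1, "country": "Germany"},
    {"order_id": 2, "country": "France"},
    {"order_id": 3, "country": "Germany"},
]


def filter_rows(rows, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row[column] == value]


# This dictionary plays the role of the flow variables available to a node.
flow_variables = {"country": "Germany"}

german_orders = filter_rows(rows, "country", flow_variables["country"])
print([row["order_id"] for row in german_orders])

# Changing the variable changes the result without touching the filter step.
flow_variables["country"] = "France"
french_orders = filter_rows(rows, "country", flow_variables["country"])
print([row["order_id"] for row in french_orders])
```

The filter step itself never changes. Only the value it reads does, which is what makes the workflow parametric.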

Creating and Using Flow Variables

Here we show you two ways of creating a flow variable: exporting a node’s configuration, and using a configuration node that prompts a user to provide a value.

A reference workflow Creating and Using Flow Variables is available on the KNIME Hub.

Transforming a Data Cell into a Flow Variable

Yet another way of creating a flow variable is to convert cells in a data row into flow variables in order to use intermediate results as a node’s configuration in the next steps.

A reference workflow Using Table Output as Flow Variable is available on the KNIME Hub.
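The idea of turning a data cell into a flow variable can be sketched in plain Python as follows. This is only an analogy under assumed names: the sample orders, the "product" column, and the flow_variables dictionary are hypothetical and do not come from the reference workflow.

```python
from collections import Counter

# Hypothetical sample data; the "product" column is an assumption.
orders = [
    {"order_id": 1, "product": "laptop"},
    {"order_id": 2, "product": "phone"},
    {"order_id": 3, "product": "laptop"},
    {"order_id": 4, "product": "tablet"},
]

# Intermediate result: a one-row summary holding the most frequent product.
counts = Counter(order["product"] for order in orders)
top_product, top_count = counts.most_common(1)[0]
summary_row = {"product": top_product, "count": top_count}

# Convert the row into variables: each cell becomes a named value.
flow_variables = dict(summary_row)

# A later step uses the variable as its filter setting.
top_orders = [o for o in orders if o["product"] == flow_variables["product"]]
print(flow_variables["product"], [o["order_id"] for o in top_orders])
```

Because the filter value is computed from the data, rerunning the same steps on next month's orders would pick up the new most popular product automatically.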

 

Exercise: Flow Variables

1) Read the sales_data.csv file available in the data folder on the KNIME Hub

2) Filter the orders to those coming from Germany

3) Export the current setting “Germany” as a flow variable called “country”

4) Read the orders_daily.csv file available in the data folder on the KNIME Hub. Use the “country” flow variable to filter the records.

5) Use the Integer Configuration node to define a lower limit for the number of items per order. Set the default value to 6.

6) Use this flow variable to exclude daily orders with 6 or fewer items.

The empty exercise workflow 05_Flow_Variables is available in the KNIME Hub course repository.

 

Solution: Flow Variables

1-3) Download the sales_data.csv file from the data folder on the KNIME Hub, and read the file with the File Reader node. Use the Row Filter node to filter the records for Germany. Click the flow variable button next to the string pattern field, select “Create Variable”, and write “country” in the text field.

4) Download the orders_daily.csv file from the data folder on the KNIME Hub, and read the file with the File Reader node. Continue the workflow branch with another Row Filter node. Drag a flow variable connection from the Row Filter node used in the previous step to this Row Filter node. Open the configuration dialog, click the flow variable button next to the string pattern field, select “Use Variable”, and select “country” in the menu.

5-6) Start a new workflow branch with the Integer Configuration node. Write, for example, “amount” as the name of the flow variable, and 6 as its default value. Connect a third Row Filter node to the File Reader node that reads the orders_daily.csv file, and connect the flow variable output of the Integer Configuration node to this Row Filter node. In the configuration dialog of the Row Filter node, open the Flow Variables tab, navigate to the “lowerBound” item, expand it, and select the “amount” flow variable in the menu next to the “IntCell” item.

The solution workflow 05_Flow_Variables - Solution is available in the KNIME Hub course repository.
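The logic of steps 4 to 6 can be mimicked in plain Python, as a minimal sketch rather than a replacement for the KNIME solution. The inline CSV text is a hypothetical stand-in for orders_daily.csv (the real file's column names may differ), and integer_configuration is an invented helper that imitates a configuration node with a default value.

```python
import csv
import io

# Hypothetical stand-in for orders_daily.csv; the real columns may differ.
ORDERS_DAILY = """country,items
Germany,4
Germany,9
France,12
Germany,6
"""


def integer_configuration(default, user_value=None):
    """Return the user's value if one was supplied, otherwise the default."""
    return default if user_value is None else int(user_value)


flow_variables = {
    "country": "Germany",                # exported from the first filter
    "amount": integer_configuration(6),  # default limit from the exercise
}

reader = csv.DictReader(io.StringIO(ORDERS_DAILY))
kept = [
    row
    for row in reader
    if row["country"] == flow_variables["country"]
    and int(row["items"]) > flow_variables["amount"]  # drops 6 or fewer items
]
print(kept)
```

The sketch uses a strict comparison so that orders with exactly 6 items are excluded, as the exercise text asks. Whether a range-based filter treats its lower bound as inclusive is worth checking in the node's own dialog.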

 

Components

Components encapsulate functionalities that can be used in the same way as any KNIME node. The functionality is built by a workflow inside the component, and you can customize it in the component’s configuration dialog. Components can be reused locally and shared via a KNIME Server and the KNIME Hub.

 

 

 

A reference workflow Components and Configuration Nodes is available on the KNIME Hub.
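If you think in code, a component is loosely comparable to a function: several steps are bundled behind one interface, and the configuration dialog corresponds to the function's parameters. The sketch below is an analogy in plain Python only. The function country_value_component, the sample rows, and the placeholder numbers are hypothetical and are not taken from any shared component.

```python
def country_value_component(table, country_column, value_column):
    """Bundle two steps (select columns, drop missing values) in one unit."""
    selected = [(row[country_column], row[value_column]) for row in table]
    return {country: value for country, value in selected if value is not None}


# Hypothetical rows; the numbers are placeholders, not real population data.
table = [
    {"Country Name": "A-land", "2013": 100},
    {"Country Name": "B-land", "2013": None},
    {"Country Name": "C-land", "2013": 250},
]

# The column choices correspond to settings in a configuration dialog.
result = country_value_component(table, "Country Name", "2013")
print(result)
```

The caller only chooses the two columns and does not need to know which steps run inside, which is the same separation a component gives you in a workflow.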

Exercise: Components

1) Read the population2013.csv file available in the data folder on the KNIME Hub

2) Access the Choropleth World Map shared component on the KNIME Hub. Drag and drop the component into your workflow editor.

3) Use the component to show the countries and their populations on a Google Chart

4) Disconnect the component, and add the following sentence into the description of the Column Selection Configuration node (the one to select the country column): “If a country existing in the input table is not shown on the map, try changing its name.”

5) Share the component in your local workspace

The empty exercise workflow 06_Components is available in the KNIME Hub course repository.

 

Solution: Components

1-3) Download the population2013.csv file from the data folder on the KNIME Hub. Read the data with the File Reader node. Drag and drop the Choropleth World Map component from the KNIME Hub into your workflow editor, and connect it to the File Reader node. Select “Country Name” and “2013” as the country column and numeric column in the component’s configuration dialog. In the component’s interactive view you can see that, for example, Russia is not assigned any value. This might be because it is called “Russian Federation” in the data.

4-5) Right-click the component, and select “Component” > “Disconnect Link”. Add the sentence to the description field of the Column Selection Configuration node in the top branch. Back in the exercise workflow, right-click the component, and select “Component” > “Share...”. Select a folder in your local workspace.

The solution workflow 06_Components - Solution is available in the KNIME Hub course repository.

Switches

You’d like to write data in different file formats, but not all at the same time? You’d like to show your data in a scatter plot or a bar chart, but not always both? Or you’d like to take some actions only if a certain threshold is met? In situations like these, you can build the alternative tasks in parallel branches, and control their execution with a switch node.

Reference workflows are available in the Examples/06_Control_Structures/05_Switches repository on the KNIME Hub.
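The behavior of a switch can be approximated in plain Python by selecting one of several functions at run time. This is a sketch under assumptions, not KNIME code: the "format" variable, the two writer functions, and the sample rows are invented for illustration.

```python
def write_csv(rows):
    """Branch 1: format the rows as comma-separated text."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)


def write_tsv(rows):
    """Branch 2: format the rows as tab-separated text."""
    return "\n".join("\t".join(str(value) for value in row) for row in rows)


def switch(flow_variables, rows):
    """Run exactly one branch, chosen by the hypothetical "format" variable."""
    branches = {"csv": write_csv, "tsv": write_tsv}
    active_branch = branches[flow_variables["format"]]
    return active_branch(rows)


rows = [(1, "Germany"), (2, "France")]
print(switch({"format": "csv"}, rows))
print(switch({"format": "tsv"}, rows))
```

Both branches exist side by side, but only the selected one runs. In the sketch the selection comes from a variable, so the same structure also covers the case where the choice depends on a threshold computed from the data.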
