Seven things to do after installing KNIME Analytics Platform

Fri, 04/22/2016 - 11:49 rs

Note. In KNIME Analytics Platform 3.3 or higher, a number of standard examples are in the folder called Example Workflows in the workspace. You’ll find a topic detector for social media, a recommendation engine to be used in retail, some classic examples for customer intelligence (churn prediction, credit scoring, and customer segmentation), and a few additional basic examples including data blending, reporting, and simple predictive model training.

Assuming you have already downloaded the KNIME® Analytics Platform, here are 7 steps to make your learning phase more practical, more application-oriented, and ultimately faster.

  1. Read the Welcome Page & install Extensions
  2. Explore the “Example Workflow”
  3. Get familiar with the workbench of the KNIME Analytics Platform
  4. Download another example workflow from the EXAMPLES server
  5. Change the example workflow to run on your data
  6. Optionally change the analytics and execute
  7. Read the use case associated with the example workflow

Go to the folder where KNIME Analytics Platform is installed and start the KNIME application (knime.exe if you are on a Windows OS).

1. Read the Welcome Page & install Extensions

When you start the KNIME Analytics Platform for the first time, you are asked whether or not to allow KNIME to collect anonymous usage data. This data will be used exclusively to improve the usability of the platform. After that, the KNIME workbench opens on the “Welcome to KNIME” page, which provides a little initial help through useful links.

 

The basic KNIME Analytics Platform does not include many of the nodes that you see in the fanciest applications. Those nodes belong to “Extensions” packages and usually must be installed separately. Unless you selected the package containing all free extensions at installation time, you will need to install the KNIME Extensions now. You can do so by following the link “Get additional Nodes” in the “Welcome to KNIME” page. The bare minimum in terms of needed extensions is “KNIME & Extensions”, “KNIME Labs Extensions”, and “KNIME Community Contributions – Other”. To continue with the 7 steps, you need to install at least the “KNIME Labs Extensions/KNIME TextProcessing extension”.

 

Data analytics or, as it is called now, data science is a big jar that contains many different types of cookies: predictive analytics, text mining, network analysis, time series prediction, data visualization, and much more. To keep your training time as focused as possible, you can start by learning more about the tools offered for your particular problem and application field. If you belong to the group of people who first learn and then use, you can follow the link “Learning Hub”. It will take you to a list of resources (books, web sites, videos, etc.) that fit your particular problem.

However, if you belong to the group of people who first use and then learn, you can follow the link “Browse example workflows”. This link will open the EXAMPLES server, a public server with a large number of example workflows for small and big data related problems and across a variety of disciplines.

2. Explore the “Example Workflow”

The KNIME Analytics Platform is shipped with an example workflow, named “Example Workflow”. This is a small, classic data analytics workflow that involves data reading, statistics, machine learning (decision tree) learner and predictor, scoring, visualization through scatter plot, and interactive brushing. If you were intrigued by the title of this post, i.e. by a list of 7 things to do, you probably belong to the group of people who first use then learn. Very well then. Let’s explore this workflow and what to do with it.

In the top left corner, there is a panel named “KNIME Explorer” that will contain all your future work. Under LOCAL (Local Workspace), there is an “Example Workflow”. Double-click it to open it in the workflow editor at the center of the KNIME Analytics Platform. Here you can see that a workflow is a sequence of nodes (the colored blocks). Each node implements a particular task: read data, visualize data, train a machine learning algorithm, change the data structure, and so on.

 

To understand the task implemented by each of the nodes in this workflow, read the workflow description in the yellow framed annotation.

Right-clicking a node in the workflow brings up its context menu with a number of useful commands. Let’s focus for now on “Execute” and “Reset”. “Execute” runs the node, i.e. executes the task the node implements. Conversely, “Reset” returns the node to its pre-execution state. Did you notice the change in the traffic light below the node? The traffic light describes the node state: green for successful execution, yellow for ready but not executed, and red for not ready or unsuccessful execution.
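
The traffic-light behavior can be pictured as a small state machine. The Python sketch below is purely illustrative: the names NodeState, Node, configure, execute, and reset are hypothetical and are not part of any KNIME API. It only mirrors the three states described above.

```python
from enum import Enum


class NodeState(Enum):
    """Traffic-light colors as described in the text (names are illustrative)."""
    RED = "not ready or failed"
    YELLOW = "ready, not executed"
    GREEN = "executed successfully"


class Node:
    """Toy model of a workflow node; this is not a KNIME class."""

    def __init__(self, name, task):
        self.name = name
        self.task = task  # a plain callable standing in for the node's work
        self.state = NodeState.RED
        self.output = None

    def configure(self):
        # Once the settings are valid, the node is ready to run.
        self.state = NodeState.YELLOW

    def execute(self):
        if self.state is NodeState.RED:
            raise RuntimeError(f"{self.name} is not configured")
        try:
            self.output = self.task()
            self.state = NodeState.GREEN
        except Exception:
            # A failed run leaves no output and turns the light red.
            self.output = None
            self.state = NodeState.RED
            raise

    def reset(self):
        # Back to the pre-execution state: output discarded, light turns yellow.
        if self.state is NodeState.GREEN:
            self.output = None
            self.state = NodeState.YELLOW


reader = Node("File Reader", lambda: [[5.1, 3.5, "setosa"]])
reader.configure()
reader.execute()
state_after_execute = reader.state
reader.reset()
state_after_reset = reader.state
```

In this toy model, executing the configured node turns its light green, and resetting it discards the output and turns the light back to yellow.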

The next commands to check are “Configure” and the last context menu item(s). “Configure” opens the node configuration window with the settings required for the node task. The last menu item(s) lead(s) to the output data table(s) produced by the node execution. In the File Reader node, for example, the last menu item is “File Table” and shows the data table read from the input file. Try to change the configuration settings in some of the nodes, for example the Color Manager node, and see how the output data table changes.

The menu items “Delete”, “Cut”, “Copy”, and “Paste” do what their names suggest: delete, cut, copy, and paste the selected node.

3. Get familiar with the workbench of KNIME Analytics Platform

Now let’s explore what’s around the workflow editor. This is KNIME Analytics Platform. All you need to analyze your data is probably here. It is important to discover what is where.

In the top left corner, the KNIME Explorer displays the list of workflows for the selected local workspace (LOCAL) and the list of available KNIME Servers to connect to. The only server available at the first start is the EXAMPLES server, a public server with many example workflows. It is the same server you reach via the “Browse example workflows” link in the “Welcome to KNIME” page.

Below the KNIME Explorer, the Node Repository panel displays all nodes available for this KNIME Analytics Platform installation. Nodes are organized by categories, from IO to Analytics, from Scripting to Views and Workflow Control. The category KNIME Labs deserves a few additional words. This category contains all frontier nodes, i.e. nodes that were developed recently and are fully functioning, but that may still be revised in the near future.

On the right side, the Node Description shows information about the task and settings required by a selected node in the workflow or in the Node Repository. So, if you encounter a mysterious node, do not despair! The Node Description panel explains what the node does, the settings required in the configuration window, the data specs in input and output, and the scientific reference for the algorithm implemented (if any).

If you have been following my instructions so far, there is a good chance that you have been clicking around randomly and have involuntarily closed a panel or two. No worries! In the View option of the Main Menu at the very top of the workbench, you can find the missing view and reinstate it into the workbench. While you are there, explore all the other commands of the Main Menu. In particular, under File notice “Import Workflow” and “Export Workflow” to import workflows created by other users and export your workflows for further usage.

Right below the Main Menu is the Tool Bar. Here take note of the buttons for creating a new workflow, saving an existing one, executing selected nodes, executing all nodes, resetting selected nodes, and resetting all nodes. Also worthy of notice is the grid button (penultimate button): it’s responsible for the grid and its properties in the workflow editor.

Finally, the Console panel hosts all warnings and errors related to your workflow execution and configuration, and the “Outline” panel shows a full view of your workflow.

4. Download another example workflow from the EXAMPLES server

The default example workflow that is included with the KNIME Analytics Platform installation is a useful example to learn from, but a very limited one. It uses the iris data set, which is hardly useful in customer intelligence, time series prediction, or sentiment analysis. It implements a decision tree, but many other prediction, clustering, and pre-processing techniques are possible within KNIME Analytics Platform. Now it’s time to move to a more useful workflow!

We could of course generate the new workflow ourselves, step by step, node after node. But why? The KNIME EXAMPLES server is a public server with hundreds of example workflows ready to use. Let’s use them!

A link to the EXAMPLES server can be found in the KNIME Explorer panel in the top left corner. Right-click it and select “Login”. You will then be able to see the content of the server as an anonymous guest with no password required. Example workflows are grouped in a number of categories. Now you need to find the category that interests you and, inside that, the workflow that comes closest to the solution you need.

If your problem involves churn prediction, navigate to “50_Applications/_18_ChurnPrediction”. Or if you want to build a graph to visualize a social network, then navigate to “08_Other_Analytics_Types/05_NetworkMining/07_Pubmed_Author_Network_Analysis”. If you’re interested in Market Basket Analysis, navigate to “50_Applications/_16_MarketBasketAnalysis”.

Or if the problem you are trying to solve concerns sentiment analysis, navigate to “08_Other_Analytics_Types/01_Text_Processing/03_SentimentClassification”, and drag & drop (or copy & paste) the example workflow into your LOCAL workspace in the KNIME Explorer panel. Double-click the newly created copy in the LOCAL workspace to open it and change it.

 

5. Change the example workflow to run on your data

The sentiment analysis workflow uses 2000 pre-labelled comments to train a decision tree that tags similar comments as positive or negative.

Now, in order to reuse this example workflow you will need to adapt it to your own data, your own label system, and your preferred machine learning algorithm.

Change the file path in the configuration window of the File Reader node to point to your own data and adjust other parameters (such as header comments, presence of short lines, locale, etc.) if needed.

The gray nodes after the File Reader node are metanodes. A metanode is a container of other nodes that can be created to hide the complexity of the analysis. The metanodes named “Document Creation” and “Preprocessing” contain all needed text cleaning procedures. Double-click them to see their content. If your data need less or more cleaning, just remove or add the corresponding Text Processing nodes.
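
To give a feeling for what such text cleaning typically involves, here is a minimal Python sketch. The exact steps inside the two metanodes are not listed in this post, so the steps chosen here (lowercasing, removing punctuation and digits, dropping stop words and very short tokens, counting terms) are an assumption, and the function names and the stop-word list are made up for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "this", "to", "was"}


def clean(text):
    """Lowercase, keep alphabetic tokens only, drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def bag_of_words(tokens):
    """Count term occurrences, a common numeric representation of a document."""
    return Counter(tokens)


comment = "This movie was GREAT, and the ending is great too!"
tokens = clean(comment)
vector = bag_of_words(tokens)
```

Removing or adding a cleaning step in the workflow corresponds to removing or adding a line in a function like clean: each step takes the documents as they are and passes on a slightly tidier version.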

To create a new node, drag & drop or double-click the node from the Node Repository panel. To connect the newly created node to existing nodes, click the output port of the preceding node, drag, and release the mouse button at the input port of the following node.
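
Conceptually, connecting ports builds a directed graph in which a node can only run after the nodes feeding it. The Python sketch below illustrates that idea with the standard library; it is not how KNIME is implemented, the helper names are hypothetical, and the node names and connections are illustrative rather than an exact copy of the example workflow.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a node; its value is the set of nodes whose output it consumes.
# Node names and connections are illustrative only.
connections = {
    "File Reader": set(),
    "Document Creation": {"File Reader"},
    "Preprocessing": {"Document Creation"},
    "Decision Tree Learner": {"Preprocessing"},
    "Decision Tree Predictor": {"Preprocessing", "Decision Tree Learner"},
    "Scorer": {"Decision Tree Predictor"},
}


def execution_order(graph):
    """Return one valid order in which every node runs after its inputs."""
    return list(TopologicalSorter(graph).static_order())


def add_node(graph, new_node, upstream):
    """Add a new node and connect its inputs to existing upstream nodes."""
    missing = [n for n in upstream if n not in graph]
    if missing:
        raise KeyError(f"unknown upstream node(s): {missing}")
    graph[new_node] = set(upstream)


# Create a new node and connect it to the output of the File Reader.
add_node(connections, "Row Filter", ["File Reader"])
order = execution_order(connections)
```

Whatever order comes out, a node always appears after the nodes connected to its input ports, which is why a freshly added node stays unexecuted until everything upstream of it has run.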

6. Optionally change the analytics and execute

This example workflow uses a decision tree because this algorithm offers a nice way to visualize the decision path in a tree (see option “View: Decision Tree View” in the node context menu). If you wish, you can replace it with a more powerful machine learning algorithm, such as a random forest or a neural network. All statistics and machine learning related algorithms can be found in the Node Repository panel under Analytics/Statistics and Analytics/Mining.

 

This is the time to become familiar with all the tasks implemented in the example workflow you have downloaded.

After changing the workflow to adapt it to your needs, just execute it fully (green double arrow in the tool bar) and inspect the final results.

7. Read the use case associated with the example workflow

Great job! You have built your first real application with KNIME Analytics Platform!

Now relax and learn about the associated use case. The link to the specific use case can be found in the description above the workflow. In the case of the “Sentiment Classification” workflow, the use case is available at: https://www.knime.org/blog/sentiment-analysis

Let’s take the time to read and learn!

I hope this short list of actions will be useful while learning how to use KNIME Analytics Platform with a practical problem in mind. If not, or if you have additional questions, you can always refer to the Learning Hub or check the KNIME Forum.

Enjoy your further KNIMEing!