What is KNIME Analytics Platform?
KNIME Analytics Platform is free, open-source software for data science. It helps you discover the potential hidden in your data, mine it for fresh insights, or predict new futures.
Build your First Workflow
Let’s say that you have some data that you want to process, analyze and visualize. With the following example workflow, you will read, transform and visualize some sales data.
To get started, first download the CSV file that contains the data you are going to use in the workflow. Open KNIME Analytics Platform and create a new, empty workflow by clicking “New” in the toolbar.
From the download folder, drag and drop the CSV file into the workflow editor. A File Reader node will appear in the workflow editor and its configuration dialog will pop up.
Here you can see the path to the file you dropped into the workflow editor and a preview of the data table. Click OK and execute the File Reader node by right-clicking the node and selecting Execute from the context menu. Now the input data are available at the output port of the File Reader node. To view the output table, right-click the executed node and select the last option in the menu: File Table.
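If you think in code, the following Python sketch gives a rough analogy for what the File Reader node does: it parses a CSV file into a table of rows. It is not how KNIME is implemented. The sample data is invented for illustration, and the columns "card" and "product" are assumptions; only "country", "date", and "amount" are named in this guide.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded sales CSV file.
# Only "country", "date", and "amount" come from the guide; the rest is invented.
SAMPLE_CSV = """country,date,amount,card,product
Germany,2020-01-01,120.5,visa,A
unknown,2020-01-01,15.0,visa,B
USA,2020-01-02,80.0,master,A
Germany,2020-01-02,40.0,visa,C
"""


def read_table(text):
    """Parse CSV text into a list of row dictionaries, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


table = read_table(SAMPLE_CSV)
print(len(table), "rows; columns:", list(table[0].keys()))
```

The list returned by `read_table` plays the role of the table at the output port of the node: later steps take it as input and produce a new table.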
To filter some of the columns out, use the Column Filter node.
In the node repository panel on the left:
- type “Column Filter” in the search field
- drag and drop the Column Filter node into your workflow editor
- connect its input port with the output port of the File Reader node
To open the configuration dialog, right-click the node and choose Configure.
Here, move the columns “country”, “date”, and “amount” into the Include field on the right side of the dialog, then click OK. After executing the node, the filtered data table is available at the output port of the Column Filter node.
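As a rough analogy in Python, the Column Filter step keeps the included columns and drops everything else. The function name `column_filter` and the sample rows are hypothetical, chosen only to mirror the dialog's Include list.

```python
def column_filter(rows, include):
    """Return new rows that keep only the columns listed in `include`."""
    return [{name: row[name] for name in include} for row in rows]


# Invented sample rows; "card" is a made-up extra column.
rows = [
    {"country": "Germany", "date": "2020-01-01", "amount": "120.5", "card": "visa"},
    {"country": "unknown", "date": "2020-01-01", "amount": "15.0", "card": "visa"},
]

filtered = column_filter(rows, ["country", "date", "amount"])
print(filtered[0])
```

The number of rows is unchanged; only the set of columns shrinks.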
To clean up your data by filtering out rows that contain specific values in one column, use the Row Filter node. Search for the Row Filter node in the node repository on the left, add it to the workflow, and connect it to the Column Filter node.
Open the Row Filter node configuration dialog and exclude rows from the input table where the column “country” has the value “unknown”. Click OK and execute the node. Now, the filtered data table is available at the output port of the Row Filter node.
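In code terms, this Row Filter configuration amounts to a simple exclusion test on one column. The sketch below is an illustration with invented sample rows and a hypothetical helper name, not KNIME's implementation.

```python
def row_filter_exclude(rows, column, value):
    """Return the rows whose `column` does not equal `value`."""
    return [row for row in rows if row[column] != value]


# Invented sample rows using the three columns kept by the Column Filter.
rows = [
    {"country": "Germany", "date": "2020-01-01", "amount": "120.5"},
    {"country": "unknown", "date": "2020-01-01", "amount": "15.0"},
    {"country": "USA", "date": "2020-01-02", "amount": "80.0"},
]

cleaned = row_filter_exclude(rows, "country", "unknown")
print([row["country"] for row in cleaned])
```

Here the columns stay the same and the row count drops, the opposite of the Column Filter step.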
To visualize the data, for example with a stacked area chart and a pie chart, use the Stacked Area Chart and Pie/Donut Chart nodes. Search for them and add them to the workflow, connecting both to the Row Filter node.
Configure the column “date” as the x-axis column in the configuration dialog of the Stacked Area Chart node.
Configure the column “country” as category column, “Sum” as aggregation method, and “amount” as frequency column in the Pie/Donut Chart node.
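The Pie/Donut Chart settings above describe an aggregation: group the rows by "country" and sum "amount" within each group. A minimal Python sketch of that calculation, with invented sample values and a hypothetical function name, could look like this. It covers only the numbers behind the chart, not the drawing.

```python
from collections import defaultdict


def sum_by_category(rows, category, value):
    """Sum the numeric `value` column for each distinct `category` value."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[category]] += float(row[value])
    return dict(totals)


# Invented sample rows; the amounts are not from the real sales file.
rows = [
    {"country": "Germany", "date": "2020-01-01", "amount": "120.5"},
    {"country": "USA", "date": "2020-01-02", "amount": "80.0"},
    {"country": "Germany", "date": "2020-01-02", "amount": "40.0"},
]

totals = sum_by_category(rows, "country", "amount")
grand_total = sum(totals.values())
for country, total in totals.items():
    # Each slice of the pie is the country's share of the grand total.
    print(country, total, f"{total / grand_total:.1%}")
```

Each entry in `totals` corresponds to one slice of the pie, sized by its share of the overall sum.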
Now you can execute these last two nodes and view their output. Right-click each node and choose Execute and Open View from the context menu. A new window will open showing the chart you built with the sales data.
Nodes and Workflows
In KNIME Analytics Platform, individual tasks are represented by nodes. Each node is displayed as a colored box with input and output ports. Nodes can perform all sorts of tasks, including reading/writing files, transforming data, training models, creating visualizations, and so on.
The input(s) are the data that the node processes via the node ports, and the output(s) are the resulting data. Each node has specific settings, which you can adjust in a configuration dialog.
Different data types are represented by different node ports. Only ports of the same type, indicated by the same color, can be connected. Here are some examples of ports for frequently used data types.
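The connection rule can be pictured as a type check between an output port and an input port. The sketch below is a simplified model for illustration only: the `Port` class and the type labels "data_table" and "model" are assumptions, not KNIME's actual port classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    node: str       # name of the node the port belongs to
    port_type: str  # hypothetical type label, e.g. "data_table" or "model"


def can_connect(output_port, input_port):
    """A connection is allowed only when both ports have the same type."""
    return output_port.port_type == input_port.port_type


reader_out = Port("File Reader", "data_table")
filter_in = Port("Column Filter", "data_table")
model_in = Port("Some Predictor", "model")  # invented node name

print(can_connect(reader_out, filter_in))
print(can_connect(reader_out, model_in))
```

In the workflow editor this check is made for you: a connection between mismatched port types simply cannot be drawn.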
A node can be in four different states. The node status is shown by a traffic light below each node.
A series of interconnected nodes defines a workflow: nodes are connected to one another via their input and output ports. When a workflow is executed, the data flows from left to right through the connections. You can execute a workflow step by step, one node at a time, or entirely in one go.
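To make the idea of data flowing through connected nodes concrete, here is a small Python sketch in which each node is a function and a workflow is an ordered list of nodes. It assumes a purely linear chain, whereas real workflows can branch, and all names and sample rows are hypothetical.

```python
def run_step_by_step(data, nodes):
    """Execute nodes in connection order, yielding (name, output) after each one."""
    for name, node in nodes:
        data = node(data)
        yield name, data


def run_all(data, nodes):
    """Execute the whole chain and return only the final output."""
    result = data
    for _, result in run_step_by_step(data, nodes):
        pass
    return result


def drop_unknown_country(rows):
    # Stand-in for a Row Filter node.
    return [row for row in rows if row["country"] != "unknown"]


def keep_country_and_amount(rows):
    # Stand-in for a Column Filter node.
    return [{"country": r["country"], "amount": r["amount"]} for r in rows]


workflow = [
    ("Row Filter", drop_unknown_country),
    ("Column Filter", keep_country_and_amount),
]

# Invented input rows.
data = [
    {"country": "Germany", "date": "2020-01-01", "amount": "120.5"},
    {"country": "unknown", "date": "2020-01-01", "amount": "15.0"},
]

for name, output in run_step_by_step(data, workflow):
    print(name, "->", len(output), "rows")

final = run_all(data, workflow)
```

The generator version lets you inspect the intermediate table after every node, much like opening the output port of each executed node, while `run_all` corresponds to executing the entire workflow at once.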
Components are KNIME nodes that you create yourself from a KNIME workflow. They encapsulate and abstract functionality, can have their own dialog, and can have their own sophisticated, interactive views. Components can be reused in your own workflows and also shared with others via KNIME Server or KNIME Hub. They can also represent web pages in an analytical application deployed to others via KNIME Server.
Metanodes allow you to organize your workflows better: you can take part of a larger workflow and collapse it into a gray box that hides that part of the workflow’s functionality. This also makes it easier for others to understand what your workflow does, because you can structure it more hierarchically.
Extensions and Integrations
Extensions provide additional functionalities such as access to and processing of complex data types, as well as the addition of advanced machine learning algorithms.
Integrations provide seamless access to some very cool open source projects such as Keras for deep learning, H2O for high performance machine learning, Apache Spark for big data processing, Python and R for scripting, and more.
Learn how to install extensions and integrations here.
Start here and learn more about data science, data wrangling, text processing, big data, and collaboration and deployment at your own pace and on your own schedule! Courses are organized by level: L1 basic, L2 advanced, L3 deployment, L4 specialized.
In our documentation you’ll find a collection of well-curated guides for KNIME Analytics Platform and KNIME Server. For instance, you can read a guide about using flow variables to control the flow of a workflow and loops to iterate over a certain part of the workflow. The components guide explains in detail how to wrap up nodes and reuse them.
Visit KNIME Documentation.
Join the community and find solutions, support, and other inspirational thoughts from other KNIME users. The KNIME Forum is available for all types of questions, comments and conversations.
Visit KNIME Forum.