Ten Hidden Gems from the KNIME Community in 2020

March 24, 2021 — by Rosaria Silipo
Surfing the web for blog posts and journal articles about KNIME software and data science

We have put together a list of 10 blog posts about KNIME software, published in 2020 by the KNIME community. We compiled the list with two criteria in mind: how much there is to learn from a post, and how interesting its topic and results are. For each article, we explain why we recommend reading it.

As you all know, KNIME Analytics Platform is an open source platform for data science needs ranging from data access to machine learning, from data visualization to deep learning, and more. Let's now hear what the open source community has to say about it!

# 10 — dkyto — “KNIME — The undermined tool for reporting productivity” — Medium — Mar 29, 2020

“Start your learning journey today to walk away from reporting nightmare to a self-driven insight analytics.”

This blog post is an introduction to KNIME Analytics Platform. It does not describe how to use the software in detail, and you will not find a step-by-step guide to building your first workflow here. However, it explains how KNIME Analytics Platform can help you at every stage of a data science task, whether you are applying machine learning algorithms or preparing data for reporting, all without writing a single line of code.

# 9 — Nattapat Juthaprachakul, Rui Wang, Siyu Wu, Yihan Lan — “Want to do Data Analysis without coding? Use KNIME!” — Students’ blog at Simon Fraser University — Feb 3, 2020

However, there is some GOOD NEWS! With great development in GUI-based applications, the introduction of KNIME is a major game changer for common people who generally do not identify themselves as a programmer.

More than a blog post, this is a full tutorial on what KNIME Analytics Platform is, how it works, why to use it, and what it can do. For the last part, it shows a number of solutions for common data science tasks, such as topic detection, simple classification, churn prediction, and credit scoring, all of which can be found on the KNIME Hub. Using the Titanic dataset, it demonstrates how to read, clean, and visualize the data, and how to train and evaluate a machine learning model.

If you want a quick tutorial that is still detailed and thorough, we definitely recommend reading this blog post.

# 8 — Fabio Rebecchi — “Codeless Data Science with KNIME” — LinkedIn — Dec 28, 2020

In this article I create a step by step data science pipeline using a visual and codeless workflow with KNIME.

This is another great tutorial on how to build a full data science pipeline with KNIME Analytics Platform. It includes all the steps: from data access to the training of a decision tree, from data preparation to model evaluation, from data exploration to model visualization. The task is to predict employee attrition using the “IBM HR Analytics Employee Attrition & Performance” dataset.

This article is a must-read if you want to learn the basics of implementing a full data science or data wrangling pipeline.

# 7 — Jitendra Kumar Singh — “Knime: Accessing a REST API with dynamic query param” — Knoldus blog — Jul 2, 2020

“In this post, we will learn how to generate dynamic URLs by adding query parameters and get data. Knime platform supports Rest interface with Get-Request and Post-Request Node.”

A blog post by Knoldus could not be missing from this list. Indeed, their blog is a large repository of posts for learning more about KNIME Analytics Platform, data science, data wrangling, and data blending. We chose this post because Jitendra Kumar Singh explains some quite important concepts very clearly while describing a simple yet necessary task: accessing external REST services through multiple queries.
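For readers who think in code, the idea of generating one request URL per set of query parameters can be sketched in a few lines of Python. This is only an illustration of the concept, not the workflow from the post: the endpoint `api.example.com`, the parameter names, and the helper `build_urls` are all hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.com/v1/weather"


def build_urls(base_url, rows):
    """Return one URL per row, encoding the row's fields as query parameters."""
    return [f"{base_url}?{urlencode(row)}" for row in rows]


# Each dict plays the role of one table row feeding the request (made-up values).
rows = [
    {"city": "Zurich", "units": "metric"},
    {"city": "New York", "units": "imperial"},
]

urls = build_urls(BASE_URL, rows)
for url in urls:
    print(url)
```

Note that `urlencode` takes care of escaping, so a value such as `New York` is safely encoded in the resulting URL.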

This blog contains many more similar posts, which makes it a useful resource for people who are new to KNIME.

# 6 — Ulrich Johannes — “It will go away with the heat — or it won’t — Comparing infection rates and temperature” — Medium — May 5

The data is in a slightly unpleasant format, so we need to perform some preprocessing, …

For the mid-position of this list, we chose this article, which shows how easy it can be to perform data blending (the author uses three data sources) and data pre-processing. With a loop and a few joining and aggregation nodes, the final structure of the data is easily achieved. Note the usage of the node for moving aggregation: the pre-processing here is not limited to classic operations on static observations in the dataset, but operates on time series as well.
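To give a feeling for what a moving aggregation does on a time series, here is a minimal Python sketch of a moving average. It is an assumption-laden illustration, not the configuration used in the article: the function `moving_average`, the window length of 3, and the numbers are all made up.

```python
from collections import deque


def moving_average(values, window):
    """Return the mean of the last `window` values at each position.

    Positions with fewer than `window` observations so far yield None.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    buffer = deque(maxlen=window)  # keeps only the most recent `window` values
    result = []
    for value in values:
        buffer.append(value)
        if len(buffer) == window:
            result.append(sum(buffer) / window)
        else:
            result.append(None)
    return result


daily_cases = [10, 20, 30, 40, 50]  # made-up numbers
smoothed = moving_average(daily_cases, window=3)
print(smoothed)
```

The first two positions have no complete window yet, which is why they are reported as `None` rather than as a partial average.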

# 5 — Tate Lowry — “An in-depth guide for cleaning Server Log Data in KNIME” — Medium — Mar 23, 2020

KNIME excels at allowing users to visually create data workflows without code.

With this blog post, we leave the realm of the generic usage of KNIME Analytics Platform and get into specific solutions for specific tasks. The task at hand is the extraction of data from a log file, after accessing, reading, parsing, and cleaning it. Beyond that, however, everybody can benefit from a tip or two about data cleaning and data extraction.

This blog post offers a really useful description of data cleaning operations for anybody working with data at any level, especially if dealing with String data.
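As a rough idea of what parsing and cleaning a log line involves, the Python sketch below splits one line into named fields with a regular expression. It assumes the widely used Common Log Format; the log file in the post may have a different layout, and the sample line, the pattern, and the helper `parse_line` are hypothetical.

```python
import re

# Pattern for the Common Log Format (an assumption about the file layout).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Clean the String fields: convert numbers and replace the "-" placeholder.
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Made-up sample line.
sample = '127.0.0.1 - - [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
record = parse_line(sample)
print(record)
```

Returning `None` for lines that do not match makes it easy to filter out malformed rows before any further processing.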

# 4 — Abhishek Kumar — “Eliciting important features impacting COVID-19 cases through ML algorithms” — Medium — Aug 24

I thought to investigate and decipher the features/variables which are impacting the total number of COVID cases.

We are now entering the top part of the list. Here the authors focus more on the machine learning part of the data science cycle. Bearing in mind the date, it is inevitable that we start talking more and more about COVID-19. This blog post focuses on the five European countries most impacted by COVID-19 at the beginning of the pandemic: Italy, France, Spain, Germany, and the UK. The statistics of candidate and split attributes from a trained random forest are investigated to understand the key factors in predicting the spread of the virus.

# 3 — Israel Fernandez Pina — “UMAP dimension reduction and DBSCAN for clustering MNIST database within KNIME” — Towards Data Science — Nov 13

This is a great blog post! It really is. It combines dimensionality reduction and visualization, the UMAP algorithm and the DBSCAN algorithm, and finally KNIME Analytics Platform and Python. The goal is to visualize clusters of data from the MNIST dataset, which contains images of handwritten digits. Visualization is performed via 2-D or 3-D scatter plots available from the KNIME Plotly integration; clustering is performed via the DBSCAN algorithm through native KNIME nodes; and finally, the dimensionality reduction to just two or three attributes is performed via the UMAP algorithm from Python libraries. Indeed, the Python code is written in the Python Source node (available from the KNIME Python Integration) and becomes just one new node inside the KNIME workflow.

If you are interested in data visualization via Plotly, in integrating your Python script within a KNIME workflow, or just in dimensionality reduction and clustering, this is a must read.
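To make the clustering step a little more concrete, here is a small, self-contained Python sketch of the DBSCAN idea on 2-D points. It is not the code from the post, which relies on native KNIME nodes for DBSCAN and on Python libraries for UMAP. The points below merely stand in for a 2-D embedding, and the values of `eps` and `min_pts` are arbitrary assumptions.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Made-up 2-D points: two dense groups and one isolated point.
points = [
    (0.0, 0.0), (0.0, 0.5), (0.5, 0.0),
    (5.0, 5.0), (5.0, 5.5), (5.5, 5.0),
    (10.0, 0.0),
]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)
```

Unlike k-means, DBSCAN does not need the number of clusters up front, and it can mark isolated points as noise, which is convenient when looking at a low-dimensional embedding.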

# 2 — Angus Veitch — “TweetKollidR — A Knime workflow for creating text-rich visualisations of Twitter data” — seenanotherway blog — Oct 5

Since writing that post, I have revised and tidied up the workflow so that anyone can use it, and I have made it available on the Knime Hub.

This was an easy placement in the list. Thanks to this blog post, Angus Veitch was named KNIME Contributor of the Month for November 2020. The post is a full, detailed description of Angus’ application, TweetKollidR, which analyzes and visualizes tweets. The application connects to Twitter, performs the required text processing operations, and visualizes user communities and activities. It is an interesting and powerful application. Even if you do not need to use the application itself, by reading the blog post you might learn a thing or two about connecting to Twitter, text processing, and network visualization.

Curious about the blog post at position #1? Let’s see …

# 1 — Dennis Ganzaroli — “Covid 19-Projections with Knime, Jupyter and Tableau” — The Startup — Nov 19

“Make projections for covid 19 for the next 30 days by combining KNIME for data integration, Jupyter to fit models, and Tableau to create visualizations.”

Another great story about projections of COVID-19 obtained via a logistic model and visualized on a dashboard. It is a great story of technical integration as well, since the data was on a Google Drive, the data preparation was implemented with KNIME Analytics Platform, the logistic model with Jupyter, and the dashboard with Tableau. The art director of the whole movie, controlling the data pipeline, is a KNIME workflow. It is a great read, also for learning more about how far we are from the end of the pandemic. Dennis also allowed us to republish this blog article on the KNIME Blog. Thanks, Dennis!
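For readers wondering what a logistic model of cumulative cases looks like, here is a minimal Python sketch of the logistic curve and a 30-day projection. It does not reproduce the model or the fitting procedure from the article: the parameter values below are hypothetical and are not fitted to any real data, and the helper names are our own.

```python
from math import exp


def logistic(t, capacity, growth_rate, midpoint):
    """Logistic curve: capacity / (1 + exp(-growth_rate * (t - midpoint)))."""
    return capacity / (1.0 + exp(-growth_rate * (t - midpoint)))


# Hypothetical parameters, not fitted to any real data.
CAPACITY = 100_000.0  # plateau of cumulative cases
GROWTH_RATE = 0.15    # steepness of the curve
MIDPOINT = 60.0       # day on which half of the plateau is reached


def project(start_day, horizon):
    """Evaluate the curve for `horizon` consecutive days from `start_day`."""
    return [
        logistic(day, CAPACITY, GROWTH_RATE, MIDPOINT)
        for day in range(start_day, start_day + horizon)
    ]


projection = project(start_day=60, horizon=30)
print(round(projection[0]), round(projection[-1]))
```

In practice, the three parameters would be estimated from the observed case counts, which is the part the article delegates to Jupyter.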

— —

This is a selection of the ten most interesting blog posts using KNIME Analytics Platform published in 2020 by the KNIME open source community. The list was compiled with two criteria in mind: how much there is to learn and how interesting the topic and results are. Send an email to blog@knime.com to point out other important articles or blog posts that we might have missed.

As first published on Medium.
