Guided Labeling KNIME Blog Series

In this series, Paolo Tamagnini and Adrian Nembach discuss and compare different techniques used in guided labeling, such as active learning, label density, model uncertainty, and weak supervision.

Guided Labeling on KNIME Hub

Episode 1: An Introduction to Active Learning

When is labeling needed? In the field of machine learning, most algorithms and models require huge amounts of data, and large masses of that data need to be labeled to make them usable. This article is an introduction to the active learning technique.

Read blog

Episode 2: Label Density

There are several different active learning sampling strategies. In this article, we look at the label density technique. The concept behind the label density strategy is that, when labeling a dataset, you want to prioritize regions of the feature space with dense clusters of data points.
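The label density idea can be sketched with a toy neighbour count. Everything below (the radius, the synthetic clusters, the scoring function) is an illustrative assumption, not the blog's actual implementation:

```python
import numpy as np

def density_scores(X, radius=1.0):
    """Score each point by how many neighbours fall within `radius`
    (a simple stand-in for a label density heuristic)."""
    # pairwise Euclidean distances between all points
    diff = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # count neighbours within the radius, excluding the point itself
    return (dists < radius).sum(axis=1) - 1

rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 0.3, size=(20, 2))    # a dense cluster of points
outliers = rng.normal(10.0, 1.0, size=(5, 2))   # sparse, far-away points

X = np.vstack([cluster, outliers])
scores = density_scores(X)
next_to_label = int(np.argmax(scores))  # a point inside the dense cluster
```

Querying the densest unlabeled point first means each human label covers as many nearby, similar points as possible.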

Read blog

Episode 3: Model Uncertainty

Model uncertainty is a rapid way of moving our decision boundary to the correct position using as few labels as possible, taking up as little time as possible of our expensive human-in-the-loop expert. Find out more about model uncertainty in this episode.
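As a minimal sketch of uncertainty sampling, assuming a binary classifier and made-up predicted probabilities: the point whose predicted probability sits closest to 0.5 is nearest the decision boundary, so it is the most informative one to send to the human expert:

```python
import numpy as np

# hypothetical predicted class probabilities from the current model
probs = np.array([0.95, 0.51, 0.10, 0.80, 0.47])

# uncertainty = distance from the decision boundary (p = 0.5);
# the smaller the margin, the more informative the label
margin = np.abs(probs - 0.5)
query_idx = int(np.argmin(margin))  # the example the expert labels next
```

Here `probs[1] = 0.51` is the least certain prediction, so index 1 would be queried first.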

Read blog

Episode 4: From Exploration to Exploitation

To enhance our active learning sampling, in this article we look at how to use both label density, where we explore the feature space to find new things the model has not seen before, and uncertainty sampling, where we exploit key areas of the feature space near the decision boundary.
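One common way to blend the two, sketched here with hypothetical scores and a simple linear schedule (an assumption for illustration, not necessarily the blog's exact weighting), is to shift weight from density to uncertainty as labeling progresses:

```python
import numpy as np

def combined_score(density, uncertainty, iteration, total_iterations):
    """Blend exploration (density) and exploitation (uncertainty):
    early iterations favour dense unexplored regions, later ones
    favour points near the decision boundary."""
    w = iteration / total_iterations  # grows from 0 toward 1
    return (1 - w) * density + w * uncertainty

# hypothetical normalized scores for three unlabeled points
density = np.array([0.9, 0.2, 0.5])
uncertainty = np.array([0.1, 0.8, 0.5])

first_pick = int(np.argmax(combined_score(density, uncertainty, 0, 10)))
late_pick = int(np.argmax(combined_score(density, uncertainty, 9, 10)))
```

Early on the densest point (index 0) wins; near the end the most uncertain point (index 1) wins, moving the sampling from exploration to exploitation.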

Read blog

Episode 5: Blending Knowledge with Weak Supervision

When few or no labels are available, there are techniques in addition to active learning that you can use to train a model. In this episode we focus on a technique called weak supervision, which leverages an enormous sample of labels of doubtful quality gathered from several totally different sources.
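A minimal sketch of the idea, combining hypothetical noisy sources with a simple majority vote (real weak supervision systems typically learn a model of each source's accuracy instead of voting naively):

```python
import numpy as np

# hypothetical label matrix: rows = examples, columns = weak sources
# (-1 = the source abstains, 0/1 = a noisy class vote)
L = np.array([
    [1, 1, -1],
    [0, -1, 0],
    [1, 0, 1],
    [-1, -1, -1],
])

def majority_vote(L):
    """Combine noisy votes per example; abstain (-1) where no source voted."""
    out = []
    for row in L:
        votes = row[row != -1]
        if votes.size == 0:
            out.append(-1)           # no source voted on this example
        else:
            out.append(int(np.round(votes.mean())))
    return np.array(out)

labels = majority_vote(L)  # one (noisy) consensus label per example
```

Each individual source may be unreliable, but blending many of them yields training labels good enough to fit a model.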

Read blog

Episode 6: Comparing Active Learning with Weak Supervision

Weak supervision instead of active learning? The key feature that differentiates active learning from weak supervision is the source of the labels we are using to train a generic classification model from an unlabeled dataset. Find out more about the flexibility offered by weak supervision when blending knowledge from different generic sources.

Read blog

Episode 7: Weak Supervision Deployed via Guided Analytics

Let’s assume you want to train a document classifier, a supervised machine learning model that predicts precise categories for each of your unlabeled documents. Such a model is required, for example, when dealing with large collections of unlabeled medical records, legal documents, or spam emails, a recurring problem across several industries. In this post we build an application that can digest any kind of document, transform the documents into bags of words, and train a weak supervision model using a labeling function provided by the user.
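As a rough sketch of those ingredients (the keyword rule and the class name below are purely illustrative assumptions, not the application's actual logic), a labeling function maps a document's bag of words to a noisy label, or abstains:

```python
from collections import Counter

def bag_of_words(doc):
    """Turn a document into a word-count mapping (a plain Counter here)."""
    return Counter(doc.lower().split())

def labeling_function(doc):
    """A hypothetical user-provided rule: tag documents mentioning
    'refund' or 'invoice' as 'finance', otherwise abstain (None)."""
    words = bag_of_words(doc)
    if words["refund"] or words["invoice"]:
        return "finance"
    return None

docs = [
    "Please process my refund as soon as possible",
    "The patient shows no symptoms",
]
weak_labels = [labeling_function(d) for d in docs]
```

Many such cheap, imperfect rules can be combined by a weak supervision model into training labels for the final classifier.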

Read blog

Episode 8: Combining Active Learning with Weak Supervision

Checking feasibility before applying either active learning or weak supervision is great. But why do we even need to select one over the other? Why not use both techniques? This final post in the series explores two Guided Labeling examples demonstrating typical scenarios when active learning and weak supervision can be combined.

Read blog

Explore KNIME

Download KNIME

Download KNIME Analytics Platform and try out Guided Labeling for yourself.

Download now

Visit KNIME Hub

Explore and download example Guided Labeling workflows on the KNIME Hub.

Learn more

Contact us

For information on KNIME Software and what it can do for you.

Contact us