Guided Labeling KNIME Blog Series
In this series, Paolo Tamagnini and Adrian Nembach discuss and compare different techniques used in guided labeling, such as active learning, label density, model uncertainty, and weak supervision.
Guided Labeling on KNIME Hub
Episode 1: An Introduction to Active Learning
When is labeling needed? In the field of machine learning, most algorithms and models require huge amounts of data with very few other specific requirements, and these large masses of data need to be labeled to make them usable. This article is an introduction to the active learning technique.
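As a taste of what the episode covers, here is a minimal sketch of a pool-based active learning loop in Python (scikit-learn stands in for the KNIME workflow; the dataset, seed size, and naive random query strategy are illustrative assumptions, not the setup from the post):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Unlabeled pool; y_true plays the role of the human expert's answers.
X, y_true = make_classification(n_samples=500, random_state=0)

rng = np.random.default_rng(0)
labeled = list(rng.choice(len(X), size=10, replace=False))  # tiny seed set
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(20):                      # 20 human-in-the-loop rounds
    model.fit(X[labeled], y_true[labeled])
    query = int(rng.choice(pool))        # naive strategy: pick at random
    labeled.append(query)                # the expert labels the query...
    pool.remove(query)                   # ...and it leaves the pool
```

The later episodes are about replacing that random query step with smarter sampling strategies.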
Read blog
Episode 2: Label Density
There are several different active learning sampling strategies. In this article, we look at the label density technique. The concept behind the label density strategy is that, when labeling a dataset, you should prioritize the regions of the feature space where data points cluster densely.
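To make the idea concrete, here is one possible density score sketched in Python: points whose nearest neighbours are close by sit in dense regions and get labeled first. The choice of k and the inverse-distance formula are assumptions for illustration, not necessarily the heuristic used in the post:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

k = 10                                             # assumed neighbourhood size
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)    # +1: each point finds itself
dist, _ = nn.kneighbors(X)
density = 1.0 / dist[:, 1:].mean(axis=1)           # drop self-distance (col 0)

# Label the densest points first.
to_label_first = np.argsort(density)[::-1][:20]
```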
Read blog
Episode 3: Model Uncertainty
Model uncertainty offers a rapid way of moving our decision boundary to the correct position using as few labels as possible - and therefore as little of our expensive human-in-the-loop expert's time as possible. Find out more about model uncertainty in this episode.
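A rough sketch of uncertainty sampling, assuming a scikit-learn setup rather than the KNIME workflow: the points whose predicted class probabilities have the highest entropy are the ones the model is least sure about, so they are queried first. The model, data, and batch size are illustrative:

```python
import numpy as np
from scipy.stats import entropy
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_classes=3,
                           n_informative=5, random_state=1)
labeled = np.arange(30)                      # pretend the first 30 are labeled
pool = np.arange(30, len(X))

model = RandomForestClassifier(random_state=1).fit(X[labeled], y[labeled])
proba = model.predict_proba(X[pool])         # shape: (len(pool), n_classes)
scores = entropy(proba.T)                    # per-point entropy; high = unsure
query = pool[np.argsort(scores)[::-1][:10]]  # 10 most ambiguous points
```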
Read blog
Episode 4: From Exploration to Exploitation
To enhance our active learning sampling, in this article we look at how to combine label density - exploring the feature space to find examples the model has not yet seen - with uncertainty sampling - exploiting key areas of the feature space close to the decision boundary.
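One common way to blend the two is sketched below, assuming precomputed, normalized density and uncertainty scores for the pool; the linear schedule that shifts weight from exploration to exploitation over the rounds is an illustrative choice, not the exact scheme from the post:

```python
import numpy as np

def blended_rank(density, uncertainty, iteration, total_iterations):
    """Return pool indices ordered best query first."""
    w = iteration / total_iterations           # 0 = pure explore, 1 = exploit
    score = (1 - w) * density + w * uncertainty
    return np.argsort(score)[::-1]

# Stand-ins for real normalized scores over a pool of 100 points.
density = np.random.rand(100)
uncertainty = np.random.rand(100)

early = blended_rank(density, uncertainty, 1, 10)[:5]   # leans on density
late = blended_rank(density, uncertainty, 9, 10)[:5]    # leans on uncertainty
```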
Read blog
Episode 5: Blending Knowledge with Weak Supervision
When few or no labels are available, there are techniques besides active learning that you can use to train a model. In this episode we focus on a technique called weak supervision, which gathers an enormous sample of labels of doubtful quality from several totally different sources.
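In miniature, weak supervision might look like the Python sketch below: several noisy "labeling functions" vote on each example, and their votes are aggregated (here by simple majority; real weak supervision systems learn per-source accuracies instead). All function names and heuristics are invented for illustration:

```python
ABSTAIN, HAM, SPAM = -1, 0, 1

def lf_keyword(text):          # heuristic written by a domain expert
    return SPAM if "free money" in text.lower() else ABSTAIN

def lf_shouting(text):         # crude formatting rule
    return SPAM if text.isupper() else ABSTAIN

def lf_known_thread(text):     # stand-in for an external knowledge source
    return HAM if text.startswith("Re:") else ABSTAIN

def majority_vote(text, lfs):
    votes = [v for lf in lfs if (v := lf(text)) != ABSTAIN]
    return max(set(votes), key=votes.count) if votes else ABSTAIN

lfs = [lf_keyword, lf_shouting, lf_known_thread]
print(majority_vote("FREE MONEY CLICK NOW", lfs))   # -> 1 (SPAM)
```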
Read blog
Episode 6: Comparing Active Learning with Weak Supervision
Weak supervision instead of active learning? The key feature that differentiates active learning from weak supervision is the source of the labels we are using to train a generic classification model from an unlabeled dataset. Find out more about the flexibility offered by weak supervision when blending knowledge from different generic sources.
Read blog
Episode 7: Weak Supervision Deployed via Guided Analytics
Let’s assume you want to train a document classifier: a supervised machine learning model that predicts a precise category for each of your unlabeled documents. Such a model is required, for example, when dealing with large collections of unlabeled medical records, legal documents, or spam emails - a recurring problem across several industries. In this post we build an application that is able to digest any kind of document, transform the documents into bags of words, and train a weak supervision model using a labeling function provided by the user.
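The rough shape of that pipeline, sketched with scikit-learn on assumed toy documents and an assumed keyword heuristic standing in for the user-provided labeling function (the actual application builds this with KNIME workflow nodes):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = [
    "Patient shows elevated blood pressure",
    "Contract terminated pursuant to clause 4",
    "Dosage adjusted after follow-up visit",
    "The parties agree to binding arbitration",
]

def labeling_function(doc):                 # user-provided heuristic (assumed)
    medical = {"patient", "dosage", "blood"}
    return 0 if medical & set(doc.lower().split()) else 1  # 0=medical, 1=legal

X = CountVectorizer().fit_transform(docs)   # documents -> bags of words
y_weak = np.array([labeling_function(d) for d in docs])

clf = MultinomialNB().fit(X, y_weak)        # trained on the weak labels only
```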
Read blog
Episode 8: Combining Active Learning with Weak Supervision
Checking feasibility before applying either active learning or weak supervision is great. But why do we even need to select one over the other? Why not use both techniques? This final post in the series explores two Guided Labeling examples demonstrating typical scenarios when active learning and weak supervision can be combined.
Read blog
Explore KNIME
Download KNIME
Download KNIME Analytics Platform and try out Guided Labeling for yourself.