Guided Labeling Blog Series - Episode 6: Comparing Active Learning with Weak Supervision

Mon, 07/27/2020 - 10:00

Welcome to the sixth episode of the Guided Labeling KNIME Blog Series by Paolo Tamagnini and Adrian Nembach (KNIME).

In the last episode we used an analogy: a number of “friends” labeling “movies” with three possible outcomes: “good movie” (👍), “not seen movie” ( - ), and “bad movie” (👎). We saw how we can train a machine learning model that also predicts labels for movies no friend has watched, by adding feature data about those movies to the model. Let’s pick up where we left off.

Guided Labeling Model Uncertainty

You can blend friends’ opinions on movies into a single model, but how is this useful if you don’t have any labels to train a generic supervised model? How can weak supervision become an alternative to active learning in a generic classification task? How can this analogy of many “friends” labeling “movies” work better than a single human expert, as in active learning?

Weak Supervision instead of Active Learning

The key feature that differentiates active learning from weak supervision is the source of the labels we are using to train a generic classification model from an unlabeled dataset:

Unique vs Flexible

In active learning the source of labels - referred to in the literature as the “oracle” - is usually unique, making it expensive and hard to find. It can be an expensive experiment, but more often than not it is a subject matter expert (SME), that is, a human with domain expertise. In weak supervision the weak source can be a human with less expertise who makes mistakes, but it can also be something else, such as a heuristic that applies only to a subset of the dataset, for example:

IF “movie budget category” is “low”
AND “actor popularity” is “none” :
MOVIE LABEL = “👎”
ELSE :
MOVIE LABEL = “-”

Of course this rule (or heuristic) is not very accurate and only applies to some movies, but it can be treated as a weak source in weak supervision and considered a labeling function. In most cases you will still need an expensive human expert to design such heuristics, but this is far less time consuming than manual labeling work. Once you have a set of heuristics, you can apply them to millions of data points within seconds.
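As a sketch, the heuristic above can be written as a small Python labeling function. The column names (`budget_category`, `actor_popularity`) and the function name are illustrative, not part of any specific library:

```python
# A minimal sketch of the movie heuristic above as a labeling function.
# Field names are hypothetical; "-" marks an abstention (missing label).

BAD = "👎"
ABSTAIN = "-"

def lf_low_budget_no_stars(movie: dict) -> str:
    """Label a movie 👎 if it has a low budget and no popular actors,
    otherwise abstain with a missing label."""
    if movie.get("budget_category") == "low" and movie.get("actor_popularity") == "none":
        return BAD
    return ABSTAIN

movies = [
    {"title": "A", "budget_category": "low", "actor_popularity": "none"},
    {"title": "B", "budget_category": "high", "actor_popularity": "famous"},
]
labels = [lf_low_budget_no_stars(m) for m in movies]
# labels == ["👎", "-"]
```

Once written, a function like this can be mapped over millions of rows in seconds, which is exactly what makes heuristics cheaper than row-by-row manual labeling.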

Solid vs Weak

While in active learning the label source is, in theory, always 100% accurate, in weak supervision we can have weak sources that cannot label all samples and may be less accurate.

Single vs Multiple

Active learning is usually described as a system counting on a single and expensive source of labels. Weak supervision counts on many not so accurate sources.

Human-in-the-Loop vs Prior Model Training

In active learning the labels are provided incrementally as the model improves within the human-in-the-loop process. In weak supervision, by contrast, the noisy labels are provided by all weak sources before the model is trained.

From Movie Opinions to Any Classification Task

Our example of blending movie opinions from friends was helpful to explain the weak supervision framework intuitively. However, for movie recommendation use cases there are better-suited algorithms than weak supervision (e.g. collaborative filtering). Weak supervision is powerful because it can be used anywhere that:

  • There is a classification task to be solved
  • You want to use supervised machine learning
  • The dataset to train your model is unlabeled
  • You can use weak label sources

These requirements are quite flexible, making weak supervision versatile for a number of use cases where active learning would have been far more time consuming in terms of manual labeling.

Your unlabeled dataset of documents, images, or customer data can have weak label sources just like you had “opinions from friends” on “movies”. These “friends” can be considered labeling functions which can label only a subset of your rows (in the example that would be only those “movies” they have watched) with accuracy better than random. The “opinions” we had (“👍” or “👎”) are the output labels of the labeling functions.

We can then extend this solution to any machine learning classification problem with missing labels. Those output labels can be just two for binary classification, as in our example, or more for multi-class problems. If a labeling function is not able to label a sample, it can output a missing value (“-”).
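Putting this together, applying several labeling functions to an unlabeled dataset yields a matrix of noisy labels, one column per function, with “-” marking abstentions. A minimal sketch, where both “friend” rules and their field names are hypothetical:

```python
# Sketch: several labeling functions applied to unlabeled rows produce
# a label matrix; "-" marks rows a function cannot label.

GOOD, BAD, ABSTAIN = "👍", "👎", "-"

def lf_friend_alice(movie: dict) -> str:
    # Hypothetical rule: Alice only vouches for comedies.
    return GOOD if movie.get("genre") == "comedy" else ABSTAIN

def lf_friend_bob(movie: dict) -> str:
    # Hypothetical rule: Bob dislikes very long movies.
    return BAD if movie.get("runtime_min", 0) > 180 else ABSTAIN

labeling_functions = [lf_friend_alice, lf_friend_bob]

movies = [
    {"title": "A", "genre": "comedy", "runtime_min": 95},
    {"title": "B", "genre": "drama", "runtime_min": 200},
]

# One row per movie, one column per labeling function.
label_matrix = [[lf(m) for lf in labeling_functions] for m in movies]
# label_matrix == [["👍", "-"], ["-", "👎"]]
```

This label matrix, not the raw data, is what a weak supervision system later aggregates into a single (probabilistic) label per row.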

While in active learning the expensive expert provides labels row by row, in weak supervision we can simply ask the expert to provide a number of labeling functions. By labeling function we mean any heuristic that, in the expert’s opinion, can correctly label a subset of rows. The expert should provide as many labeling functions as possible, covering as many rows as possible, with accuracy as high as possible (Fig. 1).
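The two qualities just mentioned - coverage and accuracy - can be estimated per labeling function if a small hand-labeled validation set is available. A sketch, with illustrative function names:

```python
# Sketch: estimating how good a labeling function is.
# Coverage = fraction of rows where it does not abstain;
# accuracy is measured only on covered rows, against a small gold set.

ABSTAIN = "-"

def coverage(lf_labels: list) -> float:
    """Fraction of rows where the labeling function did not abstain."""
    return sum(label != ABSTAIN for label in lf_labels) / len(lf_labels)

def empirical_accuracy(lf_labels: list, true_labels: list) -> float:
    """Accuracy on the covered rows only, against known true labels."""
    pairs = [(l, t) for l, t in zip(lf_labels, true_labels) if l != ABSTAIN]
    if not pairs:
        return 0.0
    return sum(l == t for l, t in pairs) / len(pairs)

lf_out = ["👍", "-", "👎", "-"]   # what one labeling function produced
gold   = ["👍", "👎", "👍", "👍"]  # small hand-labeled validation subset

cov = coverage(lf_out)                    # 0.5 (labels 2 of 4 rows)
acc = empirical_accuracy(lf_out, gold)    # 0.5 (1 of 2 covered rows correct)
```

A useful labeling function should beat random accuracy on the rows it covers; low-coverage functions are still fine as long as there are many of them.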

Guided Labeling Comparing Weak Supervision with Active Learning

Figure 1 : A possible weak supervision framework: A Domain Expert provides Labeling Functions to the system. The produced weak label sources are fed to the Label Model which outputs the Probabilistic Labels to train the final Discriminative Model.

Labeling functions are only one example of weak label sources, though. You can, for example, use the predictions of an old model that only worked for older data points in the training set; blend in a public dataset or information crawled from the internet; or ask cheaper non-experts to label your data and treat them as weak label sources. Any strategy that can label a subset of your rows with better-than-random accuracy can be added to your weak supervision input. The theory behind the Label Model algorithm (Fig. 1) requires all label sources to be independent; however, recent research shows the approach remains robust even with a wide variety of weak label sources.

When dealing with tons of data and no labels at all, weak supervision’s flexibility in blending knowledge from different generic sources can be a solution for training an accurate model without asking an expensive expert to label thousands of samples.

In the next Guided Labeling Blog Post episode we will look at how to train a document classifier in this way, using movie reviews: one more movie example via interactive views!

Stay tuned! And join us to take part in discussions around the Guided Labeling topic on this KNIME Forum thread!

The Guided Labeling Blog Series

By Paolo Tamagnini and Adrian Nembach (KNIME)