Logistic Regression

In statistics, logistic regression (also called logit regression or the logit model) is a regression model used to predict a categorical (nominal) class. Classic logistic regression handles binary class problems. The algorithm extends to multinomial logistic regression when there are more than two outcome classes. If the outcome classes are also ordered, it is called ordinal logistic regression.
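As a rough illustration of the binary and multinomial cases, here is a minimal Python sketch. It is not the KNIME implementation; the function names and the coefficient values are hypothetical and chosen only to show how a linear score is turned into a class probability.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)

def predict_proba(weights, bias, features):
    """P(class = 1 | features) for a binary logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def softmax(scores):
    """Multinomial case: one linear score per class, normalized to probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients and input, for illustration only.
weights, bias = [2.0, -1.0], 0.5
p = predict_proba(weights, bias, [1.0, 2.5])
label = 1 if p >= 0.5 else 0  # threshold the probability to get a class
class_probs = softmax([1.0, 2.0, 3.0])
```

In the binary case the predicted class follows from thresholding the probability, commonly at 0.5; in the multinomial case the class with the highest probability is chosen.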

Below are three videos. The first describes the algorithm behind logistic regression. The second shows the basic settings needed to train and apply a logistic regression model within a KNIME workflow. The third shows additional settings available when training a logistic regression model.

The workflow shown in this video can be found on the EXAMPLES server under 04_Analytics/04_Classification_and_Predictive_Modelling/06_Logistic_Regression*.

Exercise

Read the wine.csv dataset.

Train a Logistic Regression Model to predict whether a wine is red or white.

  • Use the Normalizer (PMML) node to z-score normalize all numerical columns.
  • Partition the dataset into a training set (80%) and a test set (20%) using the Partitioning node with the stratified sampling option on the target column (the column that indicates whether a wine is red or white).
  • Use the Logistic Regression Learner node to train the model on the training set and the Logistic Regression Predictor node to apply the model to the test set.
  • Use the Scorer node to evaluate the accuracy of the model.
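For readers who want to see what these steps compute, the following Python sketch mirrors them outside KNIME: z-score normalization, a stratified 80/20 split on the class label, training by plain gradient descent, and accuracy on the test set. It is only a sketch under stated assumptions: the data is synthetic (it is not wine.csv), all function names are hypothetical, and gradient descent is a stand-in rather than the solver used by the KNIME nodes.

```python
import math
import random
import statistics

random.seed(0)

def make_rows(n, center, label):
    """Synthetic stand-in data (NOT the wine dataset): two numeric features."""
    return [([random.gauss(center[0], 1.0), random.gauss(center[1], 1.0)], label)
            for _ in range(n)]

def z_normalize(rows):
    """Rescale every numeric column to mean 0 and standard deviation 1."""
    columns = list(zip(*(features for features, _ in rows)))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [([(x - m) / s for x, m, s in zip(features, means, stds)], label)
            for features, label in rows]

def stratified_split(rows, train_fraction=0.8):
    """Split each class separately so both parts keep the class proportions."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    train, test = [], []
    for group in by_label.values():
        random.shuffle(group)
        cut = round(len(group) * train_fraction)
        train += group[:cut]
        test += group[cut:]
    return train, test

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, b, features):
    """Predicted probability of the positive class."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, features)))

def train(rows, positive, epochs=200, lr=0.5):
    """Fit weights by batch gradient descent on the logistic loss."""
    w, b = [0.0] * len(rows[0][0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for features, label in rows:
            error = score(w, b, features) - (1.0 if label == positive else 0.0)
            grad_b += error
            for j, xj in enumerate(features):
                grad_w[j] += error * xj
        b -= lr * grad_b / len(rows)
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
    return w, b

def accuracy(rows, w, b, positive, negative):
    """Fraction of rows whose thresholded prediction matches the true label."""
    hits = sum((positive if score(w, b, f) >= 0.5 else negative) == label
               for f, label in rows)
    return hits / len(rows)

rows = make_rows(150, (2.0, 8.0), "white") + make_rows(50, (8.0, 2.0), "red")
rows = z_normalize(rows)                      # normalize all numeric columns
train_rows, test_rows = stratified_split(rows)  # 80% training, 20% test
w, b = train(train_rows, positive="red")
test_accuracy = accuracy(test_rows, w, b, positive="red", negative="white")
```

The sketch normalizes before partitioning, in the same order as the steps above. Stratifying on the class label keeps the share of each class the same in the training and test sets, which matters when one class is much rarer than the other.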

Solution

With the Logistic Regression Model we can reach an accuracy of 99% on the test set.

A possible solution can be found on the EXAMPLES server:
04_Analytics/04_Classification_and_Predictive_Modelling/06_Logistic_Regression*


* The link will open the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform must be installed with the Installer version 3.2.0 or higher)