ROC Curve

An ROC curve shows the performance of a classification model by plotting the true positive rate against the false positive rate at different classification thresholds. By looking at the shape of the ROC curve, you can compare the performance of different models and find the optimal threshold for classifying the data based on the predicted class probabilities. The “area under the curve” statistic complements the visual presentation of the model performance.
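As a rough illustration of the idea outside of KNIME, the sketch below builds the points of an ROC curve by sweeping the threshold over the predicted probabilities and then computes the area under the curve with the trapezoidal rule. The function names `roc_points` and `auc` and the toy data are made up for this example; this is not how the ROC Curve (JavaScript) node is implemented.

```python
def roc_points(labels, scores, positive):
    """Return (false positive rate, true positive rate) pairs,
    sweeping the threshold from the highest score to the lowest."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, y in pairs if y == positive)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all rows tied at this score so that ties yield a single point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up toy data, not taken from predicted_gender.csv.
labels = ["Female", "Female", "Male", "Female", "Male", "Male"]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # predicted P(Female)

points = roc_points(labels, scores, positive="Female")
area = auc(points)
print(points)
print(round(area, 3))
```

Each point corresponds to one threshold: rows with a score at or above the threshold are classified as the positive class. A curve that rises quickly toward the upper left corner gives a larger area.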

You can find the plot and statistics in the interactive output view of the ROC Curve (JavaScript) node.

The workflow shown in the "ROC Curve of a Classification Model" video can be found on the EXAMPLES server under 04_Analytics/10_Scoring/01_Evaluating_Classification_Model_Performance

Exercise:

Read the predicted_gender.csv dataset.

The “sex” column contains people’s actual gender: “Female” or “Male”. The “Prediction (sex) ...” columns contain the gender values predicted by two different classification models: a decision tree (DT) and a logistic regression model (LR). The “P(sex=Female)...” columns contain the predicted probabilities of being female produced by the two models.

1. Evaluate the performance of the decision tree model using the ROC Curve (JavaScript) node.

  • Set the Class column, Positive class value, and Columns containing the positive class probabilities in the configuration dialog
  • Execute the node and open the interactive view
  • What is the area under the curve for the decision tree model?

2. Compare the performance of the decision tree and logistic regression models by plotting their ROC curves in the same graph.

  • Open the configuration dialog of the ROC Curve (JavaScript) node
  • Add the relevant columns to the Columns containing the positive class probabilities
  • Execute the node and open the interactive view
  • Which of the models performs better?
  • What is the area under the curve for the logistic regression model?

3. Open the interactive view again and change the title of the view to “Performance of Decision Tree and Logistic Regression Models in Predicting Gender”.

Solution

Connect the output port of the File Reader node to the top input port of the ROC Curve (JavaScript) node. Open the configuration dialog of the ROC Curve (JavaScript) node and select “sex” as the Class column and “Female” as the Positive class value. Add the “P(sex=Female)DT” column to the Columns containing the positive class probabilities.

The area under the curve is 0.849 for the decision tree.

Open the configuration dialog, and add the “P(sex=Female)LR” column to the Columns containing the positive class probabilities.

The area under the curve is 0.931 for the logistic regression model. Therefore, the logistic regression model performs better than the decision tree model in predicting the gender value of the people in the dataset.
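The same comparison can be sketched in plain Python. The example below assumes the probability columns are named “P(sex=Female)DT” and “P(sex=Female)LR” as in the text above (the actual header in predicted_gender.csv may differ), and it uses a small made-up table instead of the real file, so the numbers it prints are not the ones reported by the node. The helper names `pairwise_auc` and `compare_models` are hypothetical. Here the area under the curve is computed as the fraction of (positive, negative) pairs in which the positive example gets the higher probability, with ties counted as one half.

```python
import csv
import io


def pairwise_auc(labels, scores, positive):
    """AUC as the share of (positive, negative) pairs ranked correctly.
    Ties count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def compare_models(rows, class_col, positive, prob_cols):
    """Return {probability column: AUC} for each model's probability column."""
    labels = [row[class_col] for row in rows]
    return {
        col: pairwise_auc(labels, [float(row[col]) for row in rows], positive)
        for col in prob_cols
    }


# Made-up stand-in for predicted_gender.csv; column names are assumed.
SAMPLE = """sex,P(sex=Female)DT,P(sex=Female)LR
Female,0.9,0.8
Male,0.9,0.3
Female,0.6,0.7
Male,0.2,0.4
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = compare_models(
    rows, "sex", "Female", ["P(sex=Female)DT", "P(sex=Female)LR"]
)
for column, value in result.items():
    print(column, value)
```

To try it on the real data, replace the `io.StringIO(SAMPLE)` argument with an opened file handle for predicted_gender.csv. The pairwise loop is quadratic in the number of rows, which is fine for a small exercise dataset.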

Open the menu in the upper right corner of the interactive view, and select “Chart Subtitle”. Write “Performance of Decision Tree and Logistic Regression Models in Predicting Gender” in the field.

You can download the solution workflow here.