Just KNIME It!

Prove your KNIME knowledge and practice your workflow building skills by solving our weekly challenges.

Here is how the challenges work:

     1. We post a challenge on Wednesday
     2. You create a solution with KNIME
     3. Upload it to your public KNIME Hub Space
     4. Post it in the KNIME Forum

Our solution to the challenge comes out on the following Tuesday.

Challenge 23: Modeling Churn Predictions - Part 1

Level: Easy

Description: A telecom company wants you to predict which customers are going to churn (that is, cancel their contracts) based on attributes of their accounts. To this end, you are expected to use a decision tree classifier. The company gives you two datasets (training and test), both with many attributes and the class ‘Churn’ to be predicted (value 0 corresponds to customers who do not churn, and 1 corresponds to those who do). You should train the decision tree classifier on the training data and assess its quality on the test data, for example by calculating the accuracy, precision, recall, and confusion matrix.

Note 1: This challenge is a simple introduction to predictive problems, focusing on classification. You are expected to just apply a decision tree classifier (and get an accuracy of about 92%). A simple solution should consist of 5 nodes.

Note 2: In this challenge, do not change the statistical distribution of any attribute or class in the datasets, and use all available attributes.

Note 3: Need more help to understand the problem? Check this blog post out.

Author: Aline Bessa

Dataset: Training and Test Data in the KNIME Hub

Solution Summary: Using the learner-predictor paradigm, we trained a decision tree classifier on the training data and assessed its performance on the test data. When training the decision tree, we used the Gini index as the quality measure for splits, pruned the tree using the MDL method, and required at least 6 records per node. By doing this, we achieved an accuracy of about 94%.
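To make the Gini index and the minimum-records setting a bit more concrete, here is a minimal Python sketch of how a candidate split could be scored. It is an illustration only, not how the KNIME node is implemented: the function names (`gini_impurity`, `split_gini`, `split_allowed`) are hypothetical, the toy labels are made up, reading "at least 6 records per node" as a check on both children of a split is our assumption, and MDL pruning is not shown.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(left, right):
    """Size-weighted Gini impurity of the two children of a split."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def split_allowed(left, right, min_records=6):
    """Assumption: a split is kept only if both children hold enough records."""
    return len(left) >= min_records and len(right) >= min_records


if __name__ == "__main__":
    # Made-up 'Churn' labels (0 = stays, 1 = churns) on each side of a split.
    left = [0, 0, 0, 0, 0, 1]
    right = [1, 1, 1, 1, 0, 1]
    print("parent impurity:", gini_impurity(left + right))
    print("split impurity: ", split_gini(left, right))
    print("split allowed:  ", split_allowed(left, right))
```

A lower weighted impurity after the split than before it means the split separates churners from non-churners better than the parent node does.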

Solution Details: After reading the training and test datasets with two instances of the CSV Reader node, we used the Decision Tree Learner node to train a decision tree classifier, and the Decision Tree Predictor node to apply it over the test data in order to assess its performance. Finally, we used the Scorer node to check how well the model classified the test instances. Note: Decision tree models have a number of parameters that can be tuned in order to generate better models. We'll be discussing parameter tuning and model selection later in this series of challenges.
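If you want to double-check the numbers reported for your own solution, the quality measures mentioned above can be recomputed by hand. The following Python sketch assumes 'Churn' = 1 is the positive class and uses made-up toy labels; the function names (`confusion_counts`, `score`) are hypothetical and this is not the Scorer node's implementation.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def score(actual, predicted, positive=1):
    """Accuracy, precision, and recall derived from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Made-up labels for eight test customers (1 = churn, 0 = no churn).
    actual = [1, 0, 0, 1, 0, 1, 0, 0]
    predicted = [1, 0, 0, 0, 0, 1, 1, 0]
    print(confusion_counts(actual, predicted))
    print(score(actual, predicted))
```

Because churners are usually the minority class, precision and recall for class 1 often tell you more than accuracy alone.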

See our solution in the KNIME Hub

Never miss a challenge! Sign up for weekly reminder e-mails.

10 Challenge Club

Congratulations to the KNinjas who have aced 10 “Just KNIME It!” challenges!

The 10 Challenge Club celebrates "Just KNIME It!" participants who have completed at least 10 challenges. How many challenges have you solved?
