The Random Forest model evolved from the simple Decision Tree model to meet the need for more robust classification performance.
A Random Forest is a supervised classification algorithm that builds N slightly differently trained Decision Trees and merges them together to get more accurate and more robust predictions.
The advantage of such a strategy is clear. While the predictions from a single tree are highly sensitive to noise in the training set, the majority vote of many trees is not, provided the trees are not correlated. Bootstrap sampling decorrelates the trees by training each one on a different training set.
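The two ideas in the paragraph above, bootstrap sampling and majority voting, can be sketched in a few lines of plain Python. This is only an illustration, not how the KNIME nodes are implemented: the helper names `bootstrap_sample` and `majority_vote`, the ten stand-in rows, and the five tree predictions are all hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the most frequent label among the per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed, only for reproducibility
rows = list(range(10))  # stand-in for 10 training rows (hypothetical)

# Each tree would be trained on its own bootstrap sample, so every tree
# sees a slightly different version of the training set.
samples = [bootstrap_sample(rows, rng) for _ in range(5)]

# Hypothetical predictions of 5 trees for a single image.
tree_predictions = ["A", "A", "H", "A", "R"]
forest_prediction = majority_vote(tree_predictions)
```

Because sampling is done with replacement, a bootstrap sample usually contains some rows more than once and omits others, which is what makes the trees differ from one another.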
In the first video we briefly explain the theory behind the Random Forest model.
In this second video we briefly explain how to configure the Random Forest Learner and Predictor nodes according to the parameters of a Random Forest model.
The workflow shown in the two videos in this section can be found on the KNIME Hub under this link.
Read the letter-recognition.csv dataset. This dataset was downloaded from the UC Irvine Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Letter+Recognition
Here, we have an image recognition problem. Each image contains a letter of the alphabet that is described by various measures. Col0 contains the target class (the letter). All other input features are measures of the image.
Train a Random Forest model to predict the alphabet letter in column Col0.
- Partition the dataset into a training set (80%) and a test set (20%). Perform stratified sampling on the target column.
- Train a Random Forest model on the training set to predict values in the target column. Train 5 trees with minimum node size 2.
- Apply the trained model to the test set.
- Evaluate the accuracy of the model by scoring metrics for a classification model.
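Outside KNIME, the partitioning and scoring steps above can be sketched in plain Python to show what "stratified sampling" and "overall accuracy" mean. This is a minimal sketch under stated assumptions: the function names `stratified_split` and `accuracy` and the toy label column are hypothetical, and the model training step itself is not shown.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, rng):
    """Return (train_idx, test_idx) with per-class proportions preserved."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random rows within each class
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return train_idx, test_idx


def accuracy(actual, predicted):
    """Fraction of rows whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Toy target column (hypothetical), standing in for Col0.
labels = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
train_idx, test_idx = stratified_split(labels, 0.8, random.Random(0))
```

With an 80/20 split, each class contributes 80% of its rows to the training set and 20% to the test set, so both sets keep the class distribution of the full dataset.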
- Train one Random Forest with 100 trees and one with 5 trees and compare their performance.
- Use the same training and test set as for the previous model.
- In step 3, set the “Number of models” setting to 100.
- Complete steps 4 to 6 as for the previous model. The overall accuracy of the model is 83.6%.
- The solution workflow is shown below and available for download at https://kni.me/w/siQVYhmV0okIE8Gb
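To build intuition for why more trees can help, the following sketch computes the accuracy of a majority vote in a deliberately simplified setting. It assumes a two-class problem and trees that err independently, each with the same probability of being right; neither assumption holds exactly for the 26-letter dataset or for real trees, and the function name `majority_vote_accuracy` and the value 0.6 are hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_trees, p_correct):
    """P(majority vote is right) for n independent binary voters.

    Ties (possible only for an even number of trees) count as a coin flip.
    """
    total = 0.0
    for k in range(n_trees + 1):
        # Probability that exactly k of the n trees are correct.
        prob_k = comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        if 2 * k > n_trees:
            total += prob_k
        elif 2 * k == n_trees:
            total += 0.5 * prob_k
    return total


acc_5 = majority_vote_accuracy(5, 0.6)      # about 0.683
acc_100 = majority_vote_accuracy(100, 0.6)  # noticeably higher than acc_5
```

Under these assumptions the vote of 100 trees is correct far more often than the vote of 5 trees, even though every individual tree is right only 60% of the time. Correlated trees reduce this gain, which is why decorrelating them matters.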
Performance comparison in overall accuracy: