7.6 Composite Models

7.6.1 Random Forests

One simple yet effective composite model averages over decision trees; such a model is known as a random forest. The idea is to train a number of decision trees, each of which makes a prediction on each example, and to aggregate the trees' predictions into a single prediction of the forest for each example.

In order to make this work effectively, the trees that make up the forest need to make diverse predictions. There are a number of ways to encourage this diversity:

  • A subset of the features could be used for each tree. Rather than using all of the features, each tree could use a random subset of, say, one third of them.

  • Instead of splitting on the best feature, the tree could choose the best from a smaller set of candidate features at each split. The set of features to choose from could change for each tree or even for each node.

  • Each tree could use a different subset of the examples to train on. Suppose there are m training examples. If there are many examples, each tree could use just a few of them. In bagging, a random sample of m examples is drawn with replacement for each tree to train on (a small sketch follows this list). Within each such sample, some examples are not chosen and some are duplicated. On average, a sample contains about 63% of the distinct original examples, because the probability that a particular example is never drawn in m tries is (1 - 1/m)^m ≈ 1/e ≈ 0.37.
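As a concrete illustration of bagging, the following Python sketch draws a bootstrap sample of the training examples and checks empirically what fraction of distinct examples a sample contains. The function and variable names are illustrative rather than taken from any particular library.

    import random

    def bootstrap_sample(examples):
        # Draw len(examples) examples uniformly at random, with replacement.
        m = len(examples)
        return [random.choice(examples) for _ in range(m)]

    # Empirically check the ~63% figure on a hypothetical pool of 1000 examples.
    examples = list(range(1000))
    fractions = []
    for _ in range(100):
        sample = bootstrap_sample(examples)
        fractions.append(len(set(sample)) / len(examples))
    print(sum(fractions) / len(fractions))   # typically close to 0.632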

Once the trees have been trained, a probabilistic prediction can be made by averaging the predictions of the trees. Alternatively, each tree can vote with its most likely classification, and the classification with the most votes can be used.
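To make the two aggregation schemes concrete, here is a minimal random-forest sketch. It assumes scikit-learn's DecisionTreeClassifier as the base tree learner, integer class labels 0, 1, ..., and that every bootstrap sample contains all classes; the parameter choices (50 trees, a square-root feature subset at each split) are illustrative, not prescribed by the text.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def train_forest(X, y, num_trees=50, seed=0):
        # Train num_trees trees, each on a bootstrap sample of the examples,
        # each considering a random subset of features at every split.
        rng = np.random.default_rng(seed)
        m = len(y)
        trees = []
        for _ in range(num_trees):
            idx = rng.integers(0, m, size=m)   # bagging: sample m indices with replacement
            tree = DecisionTreeClassifier(max_features="sqrt",
                                          random_state=int(rng.integers(1 << 30)))
            tree.fit(X[idx], y[idx])
            trees.append(tree)
        return trees

    def predict_average(trees, X):
        # Probabilistic prediction: average the trees' class probabilities.
        return np.mean([t.predict_proba(X) for t in trees], axis=0)

    def predict_vote(trees, X):
        # Voting: each tree predicts its most likely class; take the majority.
        votes = np.stack([t.predict(X) for t in trees])   # shape (num_trees, num_examples)
        return np.array([np.bincount(col).argmax() for col in votes.T])

Calling predict_average(trees, X_test) returns averaged class probabilities for each test example, while predict_vote(trees, X_test) returns the majority-vote classification.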