Machine Learning Series Day 6 (Decision Tree Regressor)

I promise it’s not just another “ML Article.”

Alex Guanga

Published in

Becoming Human: Artificial Intelligence Magazine

4 min read · Apr 10, 2019

Photo by Javier Allegue Barros on Unsplash

Terminology:

Variance: It measures how far a data set is spread out.
Mean: A calculated “central” value of a set of numbers.
Standard Deviation: Similar to variance, but it is the square root of the variance, so it is expressed in the same units as the data.
Coefficient of Variation (CV): A measure of relative variability, defined as the ratio of the standard deviation to the mean. The coefficient of variation is useful when comparing two datasets. The larger the CV, the more spread out the data is relative to its mean.
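
To make these four terms concrete, here is a minimal sketch using Python's standard `statistics` module. The savings figures are made up for illustration, and I'm assuming the population versions of variance and standard deviation (dividing by n rather than n - 1).

```python
import statistics

# Hypothetical savings figures (in dollars), made up for illustration.
savings = [1000, 2000, 3000, 4000, 5000]

mean = statistics.mean(savings)            # the "central" value
variance = statistics.pvariance(savings)   # population variance (assumption)
std_dev = statistics.pstdev(savings)       # square root of the variance
cv = std_dev / mean                        # coefficient of variation

print(f"mean={mean}, variance={variance}, std_dev={std_dev:.2f}, cv={cv:.2%}")
```

Because the CV divides by the mean, it stays comparable between datasets measured on very different scales.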

Concept:

Being completely honest, I do not use Decision Tree Regressors often. That said, a Decision Tree Regressor lays an excellent foundation for more popular models like the Random Forest and the Decision Tree Classifier (next post). So if you'd rather wait for the articles I'll post on Random Forests and Decision Tree Classifiers, no hard feelings.

The difference between a Decision Tree Classifier and a Decision Tree Regressor is the type of problem they attempt to solve.

Decision Tree Classifier: It's used to solve classification problems. For example, predicting whether a person will have their loan approved.
Decision Tree Regressor: It's used to solve regression problems. For example, predicting how many people will die because of an opiate overdose.

Let me provide an example to illustrate a Decision Tree Regressor. You're in a room with many people. For each person, we look at age, education, and gender. The objective is to predict their savings. A quick and straightforward approach is to group similar people and average their savings.

One group could be women with master's degrees who are between 18 and 35 years old.

Another group could be male college students.

You calculate each group's average savings. Then, whenever you have to predict for a new person, you figure out which group that person falls into and use that group's average savings as the prediction.

Simple enough? I hope so.
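
If you prefer code, the grouping idea can be sketched in a few lines of Python. This is only an illustration: the people, the savings amounts, and the `predict` helper are all hypothetical, and a real tree picks its groups automatically instead of having them hard-coded.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical people: (gender, education, savings in dollars).
people = [
    ("female", "masters", 40000),
    ("female", "masters", 50000),
    ("male", "college", 2000),
    ("male", "college", 4000),
]

# Collect the savings of everyone who shares the same traits.
groups = defaultdict(list)
for gender, education, savings in people:
    groups[(gender, education)].append(savings)

# Each group's prediction is simply its average savings.
group_average = {key: mean(values) for key, values in groups.items()}

def predict(gender, education):
    """Return the average savings of the group the new person falls into."""
    return group_average[(gender, education)]

print(predict("female", "masters"))
```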

Details:

Supervised/Unsupervised: Supervised

Regression/Classification: Regression

Background:

How do we correctly group these people?

First, you must understand that the objective is to reduce the variance of the dependent variable (here, savings).

The variance describes the spread of all the possible values someone could save. People could save as little as $100 or as much as $100,000. Our goal is to reduce the variance within each group by grouping similar people together.

If the variance is large, predicting someone's savings is more challenging, because a single average is a poor stand-in for values that are widely spread out.

Well, how do we reduce the variation?

We use the independent variables to create splits, or branches. Your objective is to look for the categorization (branches) that will reduce the total variance. Think about it: if you're grouping people, wouldn't you want the people in each group to be similar? If so, you understand the Decision Tree Regressor.

Visual:

The data gets broken down by categories; continuous variables such as age are first bucketed into ranges.

Mathematics/Statistics:

We must calculate the standard deviation (the square root of the variance) of the target variable. Then, we must find the split that best reduces the total variation.

We have ten people who provided the amount they saved:

Overall metrics (aggregate):

We compute the same metrics as above, but this time separately for each category of a discrete variable. For example, we compute them for gender (male and female) below:

Female:

Male:

Weighted Average/Reduction:

We must then calculate the weighted average of their standard deviations:
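
In symbols, the weighted average and the resulting reduction can be written as follows. The notation is my own shorthand, not something taken from a library: S(T) is the standard deviation of the target over all n people, c runs over the categories of the variable X (for gender, female and male), n_c is the number of people in category c, and S(c) is the standard deviation of the target within that category.

```latex
S(T, X) = \sum_{c \in X} \frac{n_c}{n} \, S(c)
\qquad
\mathrm{SDR}(T, X) = S(T) - S(T, X)
```

The larger the reduction, the more similar the people inside each branch are after the split.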

You repeat the steps above for all the independent variables. The variable with the greatest reduction in standard deviation gives the best split (branch).
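
Here is a rough sketch of that comparison in Python. The six people below are invented for illustration (they are not the ten from the example above), `sd_reduction` is a hypothetical helper name, and I'm assuming the population standard deviation.

```python
from statistics import pstdev

# Hypothetical data, made up for illustration: each row is one person.
people = [
    {"gender": "female", "education": "masters", "savings": 50000},
    {"gender": "female", "education": "college", "savings": 40000},
    {"gender": "female", "education": "masters", "savings": 60000},
    {"gender": "male", "education": "college", "savings": 10000},
    {"gender": "male", "education": "masters", "savings": 20000},
    {"gender": "male", "education": "college", "savings": 30000},
]

def sd_reduction(rows, feature, target="savings"):
    """Overall SD of the target minus the weighted SD after splitting on feature."""
    overall = pstdev([row[target] for row in rows])
    # Bucket the target values by the category each row has for this feature.
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    # Weight each category's SD by the share of rows that fall into it.
    weighted = sum(len(vals) / len(rows) * pstdev(vals) for vals in groups.values())
    return overall - weighted

# Score every candidate variable and keep the one with the largest reduction.
reductions = {f: sd_reduction(people, f) for f in ("gender", "education")}
best_feature = max(reductions, key=reductions.get)
print(reductions, best_feature)
```

In this made-up data, gender separates high savers from low savers far more cleanly than education does, so it wins the first split.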

To terminate the splitting process, the CV could be used as a ‘stop’ signal:
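
One way such a stop rule could look in code is sketched below. The 10% threshold and the `should_stop` name are my own assumptions for illustration, so tune the cutoff to your data. The sketch also assumes the branch's mean is not zero.

```python
from statistics import mean, pstdev

def should_stop(values, cv_threshold=0.10):
    """Return True when a branch is homogeneous enough to stop splitting.

    cv_threshold is an assumed cutoff for illustration, not a standard value.
    """
    cv = pstdev(values) / mean(values)  # spread relative to the mean
    return cv < cv_threshold

# Tightly clustered savings: the branch can become a leaf.
print(should_stop([48000, 50000, 52000]))
# Widely spread savings: keep splitting.
print(should_stop([10000, 50000, 90000]))
```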