

DECISION TREES
OVERVIEW
Decision Trees (DTs) are a supervised learning algorithm used for both classification and regression tasks. At their core, DTs model decisions and their possible consequences, including chance event outcomes, resource costs, and utility. They are a non-linear method of data analysis, able to capture complex relationships in the data. DTs are widely appreciated for their simplicity and interpretability, since they mimic human decision-making more closely than many other algorithms.
DTs work by repeatedly splitting the data into smaller subsets based on certain criteria, which results in a tree-like model of decisions. The final model resembles an inverted tree, with the root at the top and the leaves representing the decision outcomes or predictions.
Applications of Decision Trees:
Classification: Determining the category of an object based on its attributes. For example, deciding if an email is spam or not based on its content.
Regression: Predicting a continuous quantity. For example, estimating the price of a house based on features like size and location.
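Both applications can be sketched with scikit-learn (assumed available here); the tiny spam and house-price datasets below are made up purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: spam (1) vs. not spam (0) from two made-up features,
# e.g. [number of links, number of ALL-CAPS words].
X_cls = [[0, 1], [1, 0], [8, 9], [7, 6]]
y_cls = [0, 0, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X_cls, y_cls)
print(clf.predict([[6, 7]]))  # predicts spam (1)

# Regression: house price from made-up [size in m^2, distance to center in km].
X_reg = [[50, 10], [80, 5], [120, 2], [200, 1]]
y_reg = [100_000, 180_000, 300_000, 500_000]
reg = DecisionTreeRegressor(random_state=0).fit(X_reg, y_reg)
print(reg.predict([[100, 3]]))
```

Note that a fully grown regression tree stores one training target per leaf, so the prediction for a new house is the target of whichever leaf it falls into.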
How and Why Gini, Entropy, and Information Gain Are Used
Gini Impurity and Entropy are measures used to quantify the "purity" of a node in a decision tree. A node is considered "pure" if all its samples belong to the same class. These metrics help in choosing the best attribute for splitting the data at each step, aiming to increase the homogeneity of nodes.
Gini Impurity: Measures the probability of incorrectly classifying a randomly chosen element if it was randomly labeled according to the distribution of labels in the subset. A Gini score of 0 indicates perfect purity.
Entropy: Represents the amount of information disorder or uncertainty. An entropy of 0 means that all samples in a node belong to a single class, indicating complete purity.
Information Gain: Calculated as the difference in entropy or Gini impurity before and after a dataset is split on an attribute. It measures the change in information entropy, guiding the selection of the attribute that achieves the most homogeneous sub-nodes.
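These three definitions translate directly into a few lines of code. A minimal sketch (the function names are my own, not from any library), using entropy as the impurity measure for the gain:

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions. 0 means pure."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy: -sum(p_k * log2(p_k)). 0 means pure."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

print(gini(["+", "+", "-", "-"]))     # 0.5: maximally impure two-class node
print(entropy(["+", "+", "-", "-"]))  # 1.0: maximal disorder for two classes
```

A splitter would call `information_gain` once per candidate attribute and keep the attribute with the largest value.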
An example of classification of birds using Decision Trees
Example Using Entropy and Information Gain
Consider a dataset with 10 instances, 6 of which are positive (+) and 4 are negative (-). We want to evaluate a split based on an attribute, X, that divides the dataset into two groups:
Group 1: 2 positive, 2 negative
Group 2: 4 positive, 2 negative
Step 1: Calculate Overall Entropy Before the Split
Entropy(S) = -(6/10) log2(6/10) - (4/10) log2(4/10) ≈ 0.971
Step 2: Calculate the Entropy of Each Group After the Split
Entropy(Group 1) = -(2/4) log2(2/4) - (2/4) log2(2/4) = 1.000
Entropy(Group 2) = -(4/6) log2(4/6) - (2/6) log2(2/6) ≈ 0.918
Step 3: Calculate the Weighted Average Entropy of the Split
Entropy_split = (4/10)(1.000) + (6/10)(0.918) ≈ 0.951
Step 4: Calculate Information Gain
Gain(S, X) = Entropy(S) - Entropy_split ≈ 0.971 - 0.951 = 0.020
The attribute resulting in the highest Information Gain would be chosen for the split at that node.
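The four steps above can be checked numerically with a short script (the helper `H` is my own shorthand for the binary entropy of a node):

```python
from math import log2

def H(pos, neg):
    """Binary entropy of a node with `pos` positive and `neg` negative samples."""
    total = pos + neg
    return -sum(p * log2(p) for p in (pos / total, neg / total) if p > 0)

parent = H(6, 4)                          # Step 1: entropy before the split
g1, g2 = H(2, 2), H(4, 2)                 # Step 2: entropy of each group
weighted = (4 / 10) * g1 + (6 / 10) * g2  # Step 3: weighted average entropy
gain = parent - weighted                  # Step 4: information gain
print(f"{parent:.3f} {weighted:.3f} {gain:.3f}")  # 0.971 0.951 0.020
```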
Infinite Possibility of Trees
Theoretically, it's possible to create an infinite number of decision trees from a given dataset by changing parameters such as depth limits, minimum samples per leaf, and the criteria for splitting (Gini vs. Entropy). Without constraints, a decision tree could keep splitting until each leaf node represents only one sample, leading to a highly complex model that perfectly fits the training data but likely overfits, performing poorly on unseen data. This flexibility allows for the creation of a vast number of trees, but in practice, we seek to balance the model's complexity with its predictive power to avoid overfitting.
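The overfitting risk is easy to demonstrate. In this sketch (scikit-learn assumed, with labels that are pure noise), an unconstrained tree memorizes the training data perfectly, while depth and leaf-size constraints force a much smaller tree:

```python
import random
from sklearn.tree import DecisionTreeClassifier

random.seed(0)
X = [[random.random()] for _ in range(100)]
y = [random.randint(0, 1) for _ in range(100)]  # pure noise labels

# Unconstrained: splits until every leaf holds a single sample.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Constrained: depth and minimum leaf size limit the model's complexity.
pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                random_state=0).fit(X, y)

print(full.get_depth(), full.score(X, y))      # deep tree, perfect training fit
print(pruned.get_depth(), pruned.score(X, y))  # shallow tree, imperfect fit
```

Since the labels are random, the unconstrained tree's perfect training accuracy is pure memorization and would not transfer to new data, which is exactly the overfitting the constraints guard against.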



