ID3 (Iterative Dichotomiser 3)
ID3 is a classic algorithm for building decision trees, a popular machine learning model for classification. Developed by Ross Quinlan, ID3 operates in a top-down, recursive manner: at each node it selects the attribute with the highest information gain, aiming to maximize the homogeneity of the classes within the resulting child nodes. The algorithm recursively partitions the dataset into subsets on the chosen attributes until the examples are perfectly classified or no attributes remain. ID3 is foundational in decision tree learning and is the direct predecessor of Quinlan's later C4.5 algorithm; the independently developed CART follows a similar greedy, top-down approach.
Algorithm
The ID3 algorithm begins with the original set S as the root node. On each iteration, it computes the entropy of every unused attribute with respect to the current set and selects the attribute with the largest information gain (equivalently, the smallest remaining entropy after the split). The set is then partitioned by the selected attribute's values to produce subsets, and the algorithm recurses on each subset, considering only attributes that have not been selected before.
Recursion on a subset may stop in one of these cases:
- every element in the subset belongs to the same class, in which case the node is turned into a leaf node labelled with that class.
- there are no more attributes to be selected, but the examples still do not belong to the same class. In this case, the node is made a leaf node and labelled with the most common class of the examples in the subset.
- there are no examples in the subset, which happens when no example in the parent set matched a specific value of the selected attribute (for example, if no person in the parent set had an age over 100 years). Then a leaf node is created and labelled with the most common class of the examples in the parent node's set.
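The recursion above can be sketched in Python. This is a minimal illustration, not Quinlan's original implementation: examples are dictionaries of categorical attribute values, the tree is a nested dict, and the helper names (`entropy`, `information_gain`, `id3`) are chosen for this sketch. The third stopping case (an empty subset) does not arise here because branches are only created for attribute values actually present in the current subset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(examples, labels, attr):
    """Reduction in entropy from partitioning the examples on `attr`."""
    total = len(examples)
    remainder = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [lab for ex, lab in zip(examples, labels) if ex[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, labels, attributes):
    """Build a decision tree as nested dicts; leaves are class labels."""
    if len(set(labels)) == 1:
        # Case 1: every example has the same class -> leaf with that class.
        return labels[0]
    if not attributes:
        # Case 2: no attributes left -> leaf with the majority class.
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: split on the attribute with the largest information gain.
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        tree[best][value] = id3([examples[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return tree
```

On a toy dataset such as `id3([{"wind": "weak"}, {"wind": "strong"}, {"wind": "weak"}], ["yes", "no", "yes"], ["wind"])`, this returns the tree `{"wind": {"weak": "yes", "strong": "no"}}`.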
Properties
ID3 does not guarantee an optimal solution; it can converge on a local optimum, since it uses a greedy strategy that selects the locally best attribute to split on at each step. Its results can be improved by backtracking during the search for a better decision tree, at the cost of a possibly longer running time.
ID3 can overfit the training data. To avoid overfitting, smaller decision trees should be preferred over larger ones. This algorithm usually produces small trees, but it does not always produce the smallest possible decision tree.
ID3 is harder to use on continuous data than on factored data (factored data has a discrete number of possible values, thus reducing the possible branch points). If the values of any given attribute are continuous, then there are many more places to split the data on this attribute, and searching for the best value to split by can be time-consuming.