Sowing a decision tree
This is my first post - wish me luck!
OK! Let me speak plainly first - it's gotten so much easier to do data science (duh! tell me about all the .fit() and .predict() calls that have made our lives easy!) that we have forgotten, or simply don't care, about the math that drives some of these predictions. I can vouch for how many times my brain has frozen on even the simplest of concepts (err... what is supervised learning? just kidding... but can you tell me what entropy is?). This is one such post! I decided to write about decision trees because I use them so often, and frankly it's embarrassing to know how to drive but not know the primary difference between electric and gas cars! So, getting back to decision trees - I will try to explain the concept and give an overview of the technicalities in layman's terms, but if the following terms are unfamiliar, I suggest you google them before reading further: predictive modeling, classification in ML, and basic probability concepts.
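And since I just teased you about entropy, here is a quick refresher as a minimal sketch in plain NumPy (the little entropy helper below is my own, not from any library): entropy measures how "mixed" a node is, and a good split is one that reduces it.

```python
import numpy as np

def entropy(class_probabilities):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over non-zero class probabilities."""
    p = np.asarray(class_probabilities, dtype=float)
    p = p[p > 0]  # log2(0) is undefined; zero-probability classes contribute nothing
    return float(-np.sum(p * np.log2(p)))

# A perfectly pure node (one class) has entropy 0;
# a 50/50 split is maximally impure with entropy 1 bit.
print(entropy([1.0, 0.0]))   # 0.0
print(entropy([0.5, 0.5]))   # 1.0
```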
Tree-based models are common and have been widely used for some time now. They have evolved over the years and have only gotten better with each iteration. Like any supervised learning algorithm, a decision tree makes certain assumptions about the data, splits it accordingly, and then uses those splits to make predictions. In general, these models are not hard to understand and can be explained to a non-technical audience. Decision trees come in two flavors - classification and regression - popularly known as CART (Classification and Regression Trees) models.
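To make the .fit()/.predict() joke concrete, here is roughly what fitting a classification tree looks like in scikit-learn. This is a minimal sketch on the built-in iris toy dataset (the specific parameter values are arbitrary choices of mine, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a toy dataset and hold out a test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# A CART-style classification tree; the split criterion can be "gini" or "entropy"
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

print("Test accuracy:", clf.score(X_test, y_test))
# For continuous targets, DecisionTreeRegressor is used the same way.
```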
Limitations
Trees can be very non-robust. A small change in the training data can result in a large change in the tree and consequently the final predictions.[22]
The problem of learning an optimal decision tree is known to be NP-complete under several aspects of optimality and even for simple concepts.[26][27] Consequently, practical decision-tree learning algorithms are based on heuristics such as the greedy algorithm where locally optimal decisions are made at each node. Such algorithms cannot guarantee to return the globally optimal decision tree. To reduce the greedy effect of local optimality, some methods such as the dual information distance (DID) tree were proposed.[28]
Decision-tree learners can create over-complex trees that do not generalize well beyond the training data (this is known as overfitting[29]). Mechanisms such as pruning are necessary to avoid this problem (with the exception of some algorithms, such as the Conditional Inference approach, which does not require pruning).[15][16] A short pruning sketch follows these limitations.
The average depth of the tree, defined by the number of nodes or tests until classification, is not guaranteed to be minimal or small under various splitting criteria.[30]
For data including categorical variables with different numbers of levels, information gain in decision trees is biased in favor of attributes with more levels.[31] However, the issue of biased predictor selection is avoided by the Conditional Inference approach,[15] a two-stage approach,[32] or adaptive leave-one-out feature selection.[33]
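To make the overfitting and pruning point above concrete, here is a minimal sketch of cost-complexity pruning with scikit-learn's ccp_alpha parameter. This is a toy example of mine on the built-in breast-cancer dataset, and the ccp_alpha value of 0.01 is arbitrary; in practice you would tune it (for example with cross-validation):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree typically memorizes the training data (overfits)
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Cost-complexity pruning: a larger ccp_alpha prunes more aggressively
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

for name, model in [("unpruned", full_tree), ("pruned", pruned_tree)]:
    print(name,
          "train acc:", round(model.score(X_train, y_train), 3),
          "test acc:", round(model.score(X_test, y_test), 3),
          "depth:", model.get_depth())
```

Typically the unpruned tree scores near-perfectly on the training split but worse on the test split, while the pruned tree is shallower and closes much of that gap.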
Construction of a decision tree
Decision Tree Pruning
Decision Tree Based Techniques
Bagging
Random Forest
Boosted Trees
Gradient Boosting
Extreme Gradient Boosting (or XGBoost)
Decision Tree Parameters
Helpful Links
1. How to calculate ideal Decision Tree depth without overfitting?
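That question essentially boils down to treating max_depth as a hyperparameter and letting cross-validation choose it. Here is a minimal sketch of one common way to do that with GridSearchCV (my own illustration on a built-in dataset, not the accepted answer from that thread; the depth range 1-10 is arbitrary):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Cross-validate over a range of depths instead of guessing one
param_grid = {"max_depth": list(range(1, 11))}
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best depth:", search.best_params_["max_depth"])
print("Cross-validated accuracy:", round(search.best_score_, 3))
```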