General Concepts

Regression

  1. Initial prediction: avg. of all target values

  2. Build trees iteratively to predict the residuals (actual - predicted) of the current model's predictions

    1. The residual is the negative gradient of the loss function w.r.t. the prediction. This is what puts the "gradient" in gradient boosting: the algorithm performs gradient descent in function space

    2. Remember the gradient descent update for regression; we are doing something similar here:

      New Prediction = Old Prediction - Alpha x Gradient w.r.t. prediction

  3. Combine the outputs of the trees, scaling each tree's contribution by the learning rate (see the sketch after this list)
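A minimal sketch of this loop, assuming scikit-learn's DecisionTreeRegressor as the base learner; names like n_trees and learning_rate are illustrative, not any library's API:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_regression(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    # Step 1: initial prediction is the average of the target values
    initial = np.mean(y)
    pred = np.full(len(y), initial)
    trees = []

    for _ in range(n_trees):
        # Step 2: residuals = negative gradient of the squared loss
        # w.r.t. the current prediction
        residuals = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        # Step 3: combine outputs, scaled by the learning rate
        pred += learning_rate * tree.predict(X)
        trees.append(tree)

    return initial, trees

def predict(X, initial, trees, learning_rate=0.1):
    pred = np.full(len(X), initial)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred
```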

$$ Loss = \sum \frac{1}{2}(Actual - Predicted)^2 $$

The loss function is minimized when the initial value is set to the average of all observed values (see the derivation below)

This is also why the value in a leaf node is the average of the residuals that fall into that leaf
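A one-line check using the squared-error loss above: differentiate w.r.t. a constant prediction $p$ and set the derivative to zero.

$$ \frac{\partial}{\partial p} \sum \frac{1}{2}(Actual - p)^2 = -\sum (Actual - p) = 0 \implies p = \frac{\sum Actual}{n} $$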

Classification

$$ Odds = \frac{Count \ of \ Ones}{Count \ of \ Zeros} $$

$$ Prob = \frac{e^{\log(odds)}}{1+e^{\log(odds)}} $$
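For example, with 6 ones and 4 zeros: odds = 6/4 = 1.5, log(odds) ≈ 0.405, and prob = 1.5/2.5 = 0.6 (the formula simplifies to odds/(1 + odds)).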

  1. Initial prediction: log(odds) of the positive class
  2. Build trees to predict the residuals (actual - predicted probability). We fit the trees to the negative gradient of the log-loss, which works out to exactly this residual (see the sketch below)
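A simplified sketch under the same assumptions as the regression example (scikit-learn base learner, illustrative names). Note the simplification flagged in the comments: full implementations recompute each leaf value as sum(residuals) / sum(p * (1 - p)) over the samples in that leaf, rather than using the tree's raw output.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(log_odds):
    return 1.0 / (1.0 + np.exp(-log_odds))

def gradient_boost_classification(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    # Step 1: initial prediction is the log(odds) of the positive class
    p = np.mean(y)
    initial = np.log(p / (1 - p))
    log_odds = np.full(len(y), initial)
    trees = []

    for _ in range(n_trees):
        # Step 2: pseudo-residuals = negative gradient of log-loss
        # w.r.t. the log-odds = actual - predicted probability
        residuals = y - sigmoid(log_odds)
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        # Simplification: add the tree's raw output in log-odds space;
        # real implementations replace each leaf value with
        # sum(residuals) / sum(p * (1 - p)) over the samples in that leaf
        log_odds += learning_rate * tree.predict(X)
        trees.append(tree)

    return initial, trees
```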