Regression

  1. Initial prediction: the average of the target values

  2. Build trees iteratively, each one fitted to the residuals (actual - predicted) left by the current model

    1. The residual is the negative gradient of the squared-error loss w.r.t. the prediction. This is where the algorithm gets its name: gradient boosting performs gradient descent in function space

    2. Recall the gradient descent update rule for regression; each boosting round does something very similar:

      $$ New \ Prediction = Old \ Prediction - \alpha \times \frac{\partial Loss}{\partial Prediction} $$

  3. Combine the outputs of all trees, scaling each tree's contribution by a learning rate (see the sketch after this list)
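A minimal sketch of these three steps in Python, assuming scikit-learn's `DecisionTreeRegressor` as the base learner; the function names `fit_gb_regressor` and `predict` are illustrative, not from any library:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gb_regressor(X, y, n_trees=100, lr=0.1, max_depth=2):
    """Gradient boosting for regression with squared-error loss.
    Assumes X, y are NumPy arrays."""
    base = y.mean()                          # step 1: initial prediction = average
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - pred                  # step 2: residual = negative gradient
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred += lr * tree.predict(X)         # step 3: shrink each tree by the learning rate
        trees.append(tree)
    return base, trees

def predict(X, base, trees, lr=0.1):
    return base + lr * sum(tree.predict(X) for tree in trees)
```

A smaller learning rate typically needs more trees but tends to generalize better.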

$$ Loss = \sum \frac{1}{2}(Actual - Predicted)^2 $$

The loss function is minimized when the initial prediction is the average of the target values
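Setting the derivative of the loss with respect to a single shared prediction $\hat{y}$ to zero confirms this (writing $y_i$ for the actual values):

$$ \frac{\partial}{\partial \hat{y}} \sum_i \frac{1}{2}(y_i - \hat{y})^2 = -\sum_i (y_i - \hat{y}) = 0 \ \Rightarrow \ \hat{y} = \frac{1}{n}\sum_i y_i $$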

This is also why the value in each leaf node is the average of the residuals that fall into it

Classification

$$ Odds = \frac{Count \ of \ Ones}{Count \ of \ Zeros} $$

$$ Prob = \frac{e^{\log(odds)}}{1 + e^{\log(odds)}} $$
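For example, with 4 ones and 2 zeros:

$$ Odds = \frac{4}{2} = 2, \quad \log(odds) \approx 0.69, \quad Prob = \frac{2}{1 + 2} \approx 0.67 $$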

  1. Initial prediction: log(odds)
  2. Build trees to predict the residuals (actual - predicted probability)
  3. Leaf nodes output log(odds) values. Since the residuals live in probability space, each leaf's value is $\frac{\sum Residuals}{\sum Prob \times (1 - Prob)}$, which converts it back to log(odds) space (see the sketch below)
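A minimal sketch of the classification recipe under the same assumptions (scikit-learn base learner; `fit_gb_classifier` and `predict_proba` are illustrative names). Overwriting leaf values through the private `tree_.value` array is a shortcut for setting leaf outputs, not a public API:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gb_classifier(X, y, n_trees=50, lr=0.1, max_depth=2):
    """Gradient boosting for binary classification with log loss.
    Assumes X, y are NumPy arrays; y is 0/1 with both classes present."""
    base = np.log(y.sum() / (len(y) - y.sum()))   # step 1: initial log(odds)
    F = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        p = 1.0 / (1.0 + np.exp(-F))              # convert log(odds) to probability
        residual = y - p                          # step 2: residuals in probability space
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        # step 3: overwrite each leaf with sum(residual) / sum(p * (1 - p)),
        # converting the leaf output back to log(odds) space
        leaf_of = tree.apply(X)
        for leaf in np.unique(leaf_of):
            m = leaf_of == leaf
            tree.tree_.value[leaf, 0, 0] = residual[m].sum() / (p[m] * (1 - p[m])).sum()
        F += lr * tree.predict(X)
        trees.append(tree)
    return base, trees

def predict_proba(X, base, trees, lr=0.1):
    F = np.full(len(X), base)
    for tree in trees:
        F += lr * tree.predict(X)
    return 1.0 / (1.0 + np.exp(-F))               # final log(odds) mapped to probability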