Initial prediction: Avg. value
Build trees iteratively to predict the residuals (actual-predicted) from the previous tree
Residual is the gradient of loss function w.r.t. predicted. This give the name gradient descent to this algo
Remember the gradient descent formula for regression, we are doing kind of the similar thing:
New Prediction = Old Prediction - Alpha x Gradient w.r.t prediction
Combine the output from trees using a learning rate
$$ Loss Function = \sum \frac{1}{2}(Actual - Predicted)^2 $$
Loss function is minimized for initial value = avg. value
This is the same reason for value in the leaf node being average of all values
$$ Odds = \frac{Count \ of \ Ones}{ Count \ of \ Zeros} \\ $$
$$ Prob = \frac{e^{log(odds)}}{1+e^{log(odds)}} $$
log(odds)
prediction