The logistic regression model is expressed as

$$ P(y=1|X) = \frac{1}{1+e^{-(\text{linear regression equation})}} $$

$$ P(y=1|X) = \frac{1}{1+e^{-(\beta_{0}+\beta_{1}X_{1}+\dots+\beta_{n}X_{n})}} $$
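As a minimal sketch (function and variable names here are illustrative, not from any library), the predicted probability is just the sigmoid of the linear combination:

```python
import math

def predict_proba(beta0, betas, x):
    """P(y=1|X) = 1 / (1 + exp(-(beta0 + sum(beta_j * x_j))))."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))

# Example: beta0 = -1, beta1 = 2, x1 = 0.5 gives z = 0, so P = 0.5
print(predict_proba(-1.0, [2.0], [0.5]))  # 0.5
```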

Sigmoid function (the inverse of the logit function)

$$ p = \frac{1}{1+e^{-y}} = \frac{e^y}{1+e^{y}} $$

or, inverting, it can be written as the log of the odds (the logit)

$$ y = \log\left(\frac{p}{1-p}\right) $$
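The two forms above are inverses of each other, which is easy to check numerically (a minimal sketch):

```python
import math

def sigmoid(y):
    """Map a real number y to a probability p in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

def logit(p):
    """Map a probability p back to the log-odds scale."""
    return math.log(p / (1.0 - p))

p = sigmoid(2.0)
print(round(logit(p), 6))  # 2.0 — the logit undoes the sigmoid
```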

Estimating the parameters of a logistic regression model

  1. We do this by maximum likelihood. First, let's write the equation for the likelihood

$$ L(\beta_{0}, \beta_{1},\dots \mid y_{i}, X_{i}) = \prod_{i=1}^{N}P(y_{i}|X_{i})^{y_{i}}(1-P(y_{i}|X_{i}))^{1-y_{i}} $$

where $P(y_{i}|X_{i})$ is the model's predicted probability for observation $i$ and $y_{i} \in \{0, 1\}$ is the observed label

  2. Our objective is to maximize this likelihood, i.e. find the model parameters that make the observed data most probable under the model
  3. We maximize the log-likelihood instead of the likelihood, since both are maximized at the same parameters

$$ \log L(\beta_{0}, \beta_{1},\dots \mid y_{i}, X_{i}) = \sum_{i=1}^{N}y_{i}\log P(y_{i}|X_{i})+(1-y_{i})\log(1-P(y_{i}|X_{i})) $$
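The sum above can be computed for a toy dataset as follows (a sketch; the probabilities here are made-up stand-ins for model outputs):

```python
import math

def log_likelihood(y, p):
    """Sum of y_i*log(p_i) + (1 - y_i)*log(1 - p_i)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

y = [1, 0, 1]
p = [0.9, 0.2, 0.8]          # predicted P(y=1|X) for each observation
print(log_likelihood(y, p))  # negative; approaches 0 as the fit improves
```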

  4. There is no closed-form solution for this, so we use gradient descent
  5. The cost function for logistic regression is the negative of the log-likelihood, a.k.a. cross-entropy loss
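Since there is no closed-form solution, a plain gradient-descent loop on the negative log-likelihood looks like this (a minimal sketch on a tiny made-up dataset; the learning rate and iteration count are arbitrary choices):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny one-feature dataset: y tends to 1 as x grows
X = [0.5, 1.5, 2.0, 3.0, 3.5, 4.0]
y = [0, 0, 0, 1, 1, 1]

b0, b1 = 0.0, 0.0  # start at zero
lr = 0.1           # learning rate (arbitrary)
for _ in range(2000):
    # Gradient of the negative log-likelihood: sum of (p_i - y_i) * x_ij
    grad0 = sum(sigmoid(b0 + b1 * xi) - yi for xi, yi in zip(X, y))
    grad1 = sum((sigmoid(b0 + b1 * xi) - yi) * xi for xi, yi in zip(X, y))
    b0 -= lr * grad0
    b1 -= lr * grad1

print(b0, b1)  # decision boundary sits near x = -b0/b1, between 2.0 and 3.0
```

Note that each update moves the parameters against the gradient of the cost, which is the same as moving up the log-likelihood surface.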

<aside> ⚡

The maximum value of the likelihood is 1, and of the log-likelihood is 0. This can only happen if we find a model that fits the observed data perfectly.

In reality, the likelihood lies between 0 and 1, while the log-likelihood lies between $-\infty$ and 0.

</aside>