Videos
$$
\begin{align*}
P(Hypothesis \mid Evidence) &= \frac{P(Evidence \mid Hypothesis)\,P(Hypothesis)}{P(Evidence)} \\[6pt]
&= \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \text{not } H)\,P(\text{not } H)}
\end{align*}
$$
- P(H) is the prior probability
- P(H|E) is the posterior probability
- P(E|H) is the likelihood
- P(E) is the evidence, the normalizing denominator expanded above
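As a quick sanity check, here is a minimal Python sketch of the expanded form above. The prior and the two likelihoods are made-up numbers, chosen only for illustration:

```python
# Hypothetical numbers: P(H) = 0.01, P(E|H) = 0.9, P(E|not H) = 0.05
p_h = 0.01             # prior P(H)
p_e_given_h = 0.90     # likelihood P(E|H)
p_e_given_not_h = 0.05 # likelihood P(E|not H)

# Evidence: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Posterior via Bayes' theorem
p_h_given_e = p_e_given_h * p_h / p_e
print(f"P(H|E) = {p_h_given_e:.3f}")  # 0.154
```

Even with a strong likelihood ratio (0.9 vs 0.05), the small prior keeps the posterior modest: the classic base-rate effect.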
Naive Bayes
Video: Naive Bayes, Clearly Explained!!!
For the spam filter example:
- Find P(Spam | Text) and P(Normal | Text), and label the message as spam or normal based on which of the two is larger
- If Text = “Hello World”, then P(Spam | Text) is proportional to P(Hello | Spam) x P(World | Spam) x P(Spam)
- Similarly for P(Normal | Text)
- Note that we ignore the denominator P(Text), since it is the same for both
- If a word never appears in the spam (or normal) training messages, then P(Word | Spam) would be zero, which would make P(Spam | Text) = 0 for any message containing that word
- To overcome this, we add at least one count for each word in both classes, so that every word effectively occurs at least once in both the spam and normal training data (Laplace smoothing); see the sketch after this list
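Below is a minimal, self-contained Python sketch of this classifier. The toy training words, the 50/50 priors, and the test message are all hypothetical; it uses the add-one (Laplace) smoothing described above:

```python
from collections import Counter

# Hypothetical toy training data: words seen in spam and normal messages
spam_words = ["money", "money", "offer", "hello"]
normal_words = ["hello", "hello", "world", "meeting"]
vocab = set(spam_words) | set(normal_words)

def word_probs(words, alpha=1):
    """P(word | class) with add-one (Laplace) smoothing over the vocabulary."""
    counts = Counter(words)
    total = len(words) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

p_word_spam = word_probs(spam_words)
p_word_normal = word_probs(normal_words)
p_spam, p_normal = 0.5, 0.5  # hypothetical class priors

def score(message, p_word, prior):
    """Unnormalized P(class | message): prior times product of word likelihoods.
    Words outside the training vocabulary are simply skipped."""
    p = prior
    for w in message:
        p *= p_word.get(w, 1.0)
    return p

msg = ["hello", "world"]
s_spam = score(msg, p_word_spam, p_spam)
s_normal = score(msg, p_word_normal, p_normal)
print("spam" if s_spam > s_normal else "normal")  # normal
```

In practice, multiplying many small probabilities underflows, so real implementations sum log-probabilities instead; the comparison is unchanged because log is monotonic.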
Why is Naive Bayes naive?
Because it doesn’t take the order of words into account: it assumes each word is independent of the others. That allows us to write:
P(Hello, World | Spam) = P(Hello | Spam) P(World | Spam)
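More generally, for a message consisting of words w1, ..., wn, the independence assumption factorizes the likelihood into a product of per-word terms:
$$
P(w_1, \dots, w_n \mid Spam) = \prod_{i=1}^{n} P(w_i \mid Spam)
$$
This is exactly the product used in the spam score above.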