Survival Analysis

Survival analysis is a family of statistical methods for estimating the time until an event of interest occurs (e.g., customer churn, machine failure, patient death).

Censoring

Censoring occurs when we do not observe the exact time of the event of interest (e.g., death, churn, failure) for some subjects. This means we have incomplete data on their event times, but we still know something about them (e.g., they survived at least until a certain time).

Right Censoring (Most Common): The event hasn't happened yet by the end of the study or observation period.
Left Censoring: The event already happened before the subject entered the study, but we don’t know when.
Interval Censoring: The event happened in a time range, but we don’t know the exact time.
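
To make right censoring concrete, here is a minimal sketch of how such data is commonly encoded as a (duration, event flag) pair. The `Observation` class, the `encode_right_censored` helper, and the subject times are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    duration: float       # time from entry until the event or until observation stopped
    event_observed: bool  # True = event seen, False = right-censored


def encode_right_censored(entry: float, event_time: Optional[float],
                          study_end: float) -> Observation:
    """Turn raw times into a (duration, event_observed) pair."""
    if event_time is not None and event_time <= study_end:
        # The event happened inside the observation window.
        return Observation(event_time - entry, True)
    # No event seen by the end of the study: we only know the subject
    # survived at least (study_end - entry) time units.
    return Observation(study_end - entry, False)


# Hypothetical subjects: (entry time, event time or None, study end)
subjects = [
    (0.0, 5.0, 12.0),   # event observed after 5 time units
    (2.0, None, 12.0),  # no event by study end: censored at duration 10
    (4.0, 15.0, 12.0),  # event falls after study end: censored at duration 8
]
observations = [encode_right_censored(*s) for s in subjects]
for obs in observations:
    print(obs)
```

The censored subjects are kept rather than dropped, because "survived at least this long" is still information the estimator can use.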

Kaplan-Meier Estimator (KM Estimator)

A non-parametric method
Estimates the probability that an event has not occurred by a given time, i.e., the probability of survival
- Without censoring, this is simply the fraction of individuals still surviving at time $t_i$; censoring complicates this because censored individuals leave the at-risk group without having had the event

$$ S(t)= \prod_{t_i \leq t} \left( 1 - \frac{d_i}{n_i} \right) $$

where…
- $S(t)$ = Estimated survival probability at time $t$
- $t_i$ = Time points where events occur
- $d_i$ = Number of events (failures) at time $t_i$
- $n_i$ = Number of individuals at risk just before $t_i$
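
The product formula above can be sketched in a few lines of plain Python. This is an illustrative implementation under two assumptions: the data is right-censored only, and subjects censored at the same time as an event still count as at risk for that event. The function name `kaplan_meier` and the sample data are made up for the example.

```python
from collections import Counter


def kaplan_meier(durations, event_observed):
    """Return [(t_i, S(t_i))] for each distinct time with at least one event."""
    # d_i: number of events at each time
    events = Counter(t for t, e in zip(durations, event_observed) if e)
    # Everyone (event or censored) who leaves the at-risk set at each time
    leaving = Counter(durations)

    n_at_risk = len(durations)  # n_i just before the earliest time
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d > 0:
            survival *= 1.0 - d / n_at_risk  # factor (1 - d_i / n_i)
            curve.append((t, survival))
        n_at_risk -= leaving[t]  # events and censorings both leave the risk set
    return curve


# Hypothetical data: True = event observed, False = right-censored
durations = [2, 3, 3, 5, 6, 8]
event_observed = [True, True, False, True, False, True]

curve = kaplan_meier(durations, event_observed)
# S(2) = 5/6, S(3) = 5/6 * 4/5 = 2/3, S(5) = 2/3 * 2/3 = 4/9, S(8) = 0
for t, s in curve:
    print(f"t={t}: S(t)={s:.4f}")
```

In practice a library such as lifelines (its `KaplanMeierFitter` class) would normally be used. The sketch is only meant to show how $d_i$ and $n_i$ enter the product.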

In this image, m is the number of failure events, q is the number of censored observations, and n is the number of individuals at risk just before time t (so m and n play the roles of $d_i$ and $n_i$ in the formula above).
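
As a rough stand-in for such a table, the sketch below tabulates n, m and q at each distinct observed time. The function name `risk_table` and the data are hypothetical, and it assumes right-censored data only.

```python
from collections import Counter


def risk_table(durations, event_observed):
    """Return rows (t, n, m, q): at-risk, failures, censored at each distinct time."""
    failures = Counter(t for t, e in zip(durations, event_observed) if e)
    censored = Counter(t for t, e in zip(durations, event_observed) if not e)

    n = len(durations)  # everyone is at risk before the first time
    rows = []
    for t in sorted(set(durations)):
        m, q = failures.get(t, 0), censored.get(t, 0)
        rows.append((t, n, m, q))
        n -= m + q  # both failures and censored subjects leave the risk set
    return rows


# Hypothetical data: True = failure observed, False = censored
rows = risk_table([2, 3, 3, 5, 6, 8], [True, True, False, True, False, True])
for t, n, m, q in rows:
    print(f"t={t}  n={n}  m={m}  q={q}")
```

Each row's n equals the previous row's n minus its m and q, which is how censored subjects shrink the denominator without counting as failures.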

Kaplan-Meier Curve

Y-axis (Survival Probability S(t)): Probability of survival beyond time t.
X-axis (Time to Event): Duration (e.g., time to death, churn, failure).
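Steps in the Curve: The curve is a step function that drops at each event time and stays flat in between. Censored observations do not cause a drop, but they reduce the number at risk for later steps.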
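
Because the curve is a step function, reading off the survival probability at an arbitrary time means finding the last event time at or before it. A minimal sketch, with a hypothetical helper `survival_at` and made-up curve values:

```python
from bisect import bisect_right


def survival_at(t, event_times, survival_values):
    """Evaluate the step function S(t).

    event_times must be sorted ascending; survival_values[i] is the
    estimate right after the drop at event_times[i].
    """
    idx = bisect_right(event_times, t)  # number of event times <= t
    if idx == 0:
        return 1.0  # before the first event, nobody has failed yet
    return survival_values[idx - 1]


# Hypothetical curve: drops at t = 2, 3 and 5
event_times = [2, 3, 5]
survival_values = [0.8, 0.6, 0.3]

print(survival_at(1, event_times, survival_values))   # before the first drop
print(survival_at(2, event_times, survival_values))   # exactly at a drop
print(survival_at(4, event_times, survival_values))   # flat stretch between drops
print(survival_at(10, event_times, survival_values))  # after the last drop
```

When plotting, the same shape is usually obtained with a step plot that holds each value until the next event time (for example, matplotlib's `step` with `where="post"`).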