Survival analysis is a family of statistical methods for estimating the time until an event of interest occurs (e.g., customer churn, machine failure, patient death).

Censoring

Censoring occurs when we do not observe the event of interest (e.g., death, churn, failure) for some subjects during the study period. This means we have incomplete data on their event times, but we still know something about them (e.g., they survived at least until a certain time).

  1. Right Censoring (Most Common): The event has not happened by the end of the study or observation period, so we only know the event time is later than the last observed time.
  2. Left Censoring: The event already happened before the subject entered the study, but we don’t know when.
  3. Interval Censoring: The event happened within a known time range, but we don’t know the exact time.
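As a minimal sketch of how right-censored data is commonly recorded, each subject can be stored as a pair: the observed duration and a flag saying whether the event was actually seen. The names `observe`, `STUDY_END`, and the 12-month cutoff are hypothetical and chosen only for illustration.

```python
# Hypothetical study length (e.g., months); an assumption for this example.
STUDY_END = 12


def observe(true_event_time, study_end=STUDY_END):
    """Return (duration, event_observed) for one subject under right censoring.

    If the event happens after the study ends, we only know the subject
    survived at least until `study_end`, so the record is censored.
    """
    duration = min(true_event_time, study_end)
    event_observed = true_event_time <= study_end
    return duration, event_observed


# A subject whose event occurs at month 5 is fully observed.
early = observe(5)
# A subject whose event would occur at month 20 is right-censored at month 12.
late = observe(20)
```

In real data the true event time of a censored subject is unknown; the function above only simulates what the analyst ends up recording.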

Kaplan-Meier Estimator (KM Estimator)

$$ S(t)= \prod_{t_i \leq t} \left( 1 - \frac{d_i}{n_i} \right) $$

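In this formula, $t_i$ are the distinct times at which at least one event was observed, $d_i$ is the number of failure events at $t_i$, and $n_i$ is the number of individuals at risk just before $t_i$. Censored individuals are not counted as events; they leave the risk set after their censoring time, which lowers later values of $n_i$.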
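The product formula can be sketched in plain Python as follows. This is an illustrative implementation, not a reference one: the function name `kaplan_meier` and the toy data are assumptions, and it follows the usual convention that a subject censored at time $t_i$ is still counted as at risk at $t_i$.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t_i, S(t_i))] for each distinct time with at least one event.

    durations: observed time for each subject (event or censoring time)
    events:    1/True if the event was observed, 0/False if censored
    """
    # d_i: number of observed events at each time
    deaths = Counter(t for t, e in zip(durations, events) if e)
    # Every subject (event or censored) leaves the risk set at its duration
    exits = Counter(durations)

    n_at_risk = len(durations)  # n_i just before the earliest time
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / n_at_risk  # multiply in the factor (1 - d_i / n_i)
            curve.append((t, survival))
        n_at_risk -= exits[t]  # remove events and censored subjects at t
    return curve


# Toy data (made up): five subjects, two of them censored (event flag 0).
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
```

With this toy data the estimate drops at times 2, 3, and 5 to roughly 0.8, 0.6, and 0.3. The censored subject at time 8 produces no drop, but the subject censored at time 3 shrinks the risk set to 2 before the event at time 5.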

Kaplan-Meier Curve