Skip to content

Survival Analysis

ngboost-lightning supports time-to-event modeling with right-censored data via LightningBoostSurvival. This estimator uses a censored log-likelihood scoring rule that properly handles observations where the event was not observed (censored).

Setup

Survival data has two target arrays:

  • T — observed time (time-to-event or time-to-censoring)
  • E — event indicator (1 = event observed, 0 = censored)
from ngboost_lightning import LightningBoostSurvival, Weibull

surv = LightningBoostSurvival(
    dist=Weibull,
    n_estimators=200,
    learning_rate=0.05,
)
surv.fit(X_train, T_train, E_train)

Predictions

dist = surv.pred_dist(X_test)

# Predicted survival time
dist.mean()

# Survival function: P(T > t)
import numpy as np
t = np.array([1.0, 2.0, 5.0])
survival_probs = np.exp(dist.logsf(t))

# Hazard-related quantities
dist.logpdf(t)  # log-density
dist.cdf(t)     # CDF = 1 - survival function

Compatible Distributions

Any distribution that implements logsf (log survival function) can be used with the survival estimator:

Distribution Parameters Notes
Weibull log_scale, log_concentration Primary choice for survival
LogNormal mu, log_sigma Log-normal survival times
Exponential log_rate Constant hazard rate

Weibull is the most common choice because it can model both increasing and decreasing hazard rates depending on the learned concentration parameter.

How Censored Training Works

The CensoredLogScore scoring rule modifies the standard log-likelihood:

  • Uncensored observations (E=1): uses logpdf(t) — the event was observed, so we maximize the density at the event time.
  • Censored observations (E=0): uses logsf(t) — we only know the event didn't happen before time t, so we maximize the survival probability.

Gradients flow through both paths, so the model learns from censored observations without imputing event times.

Early Stopping

Early stopping works the same as for regression — pass validation data with event indicators:

surv.fit(
    X_train, T_train, E_train,
    X_val=X_val, T_val=T_val, E_val=E_val,
    early_stopping_rounds=10,
)

Evaluation

The evaluation module provides survival-specific metrics:

from ngboost_lightning.evaluation import concordance_index, calibration_survival

# Concordance index (discrimination)
c_index = concordance_index(dist, T_test, E_test)

# Survival calibration curve
obs, exp = calibration_survival(dist, T_test, E_test)

See Evaluation for more details.

Scoring

The score() method returns the negative mean censored log-likelihood (higher is better, following the sklearn convention):

score = surv.score(X_test, T_test, E_test)