Predictive Survival Analysis
The tidymodels group just released new versions of the core packages that enable (among other things) models for outcomes with censoring.
Censoring is a situation, usually seen in time-to-event data, where we have only partial information about the outcome. For example, suppose we order something online that is expected to be delivered in 2 days. If the package has not arrived after a day, we don’t know the actual delivery time, but we know that it is at least a day. This data point is right censored at a value of 1 day.
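To make the delivery example concrete, here is a minimal sketch of how such data are usually encoded in R: a time plus an indicator for whether the event was actually observed. The `orders` data frame and its column names are made up for illustration; `Surv()` comes from the survival package.

```r
library(survival)

# Hypothetical orders: days observed so far and whether delivery has happened.
orders <- data.frame(
  days      = c(1, 2, 3),
  delivered = c(0, 1, 1)  # 0 = still waiting (right censored), 1 = delivered
)

# Surv() pairs each observed time with its event indicator.
delivery_time <- Surv(time = orders$days, event = orders$delivered)

# Censored observations are printed with a trailing "+".
print(delivery_time)
```

The first order is the one from the example: all we know is that its delivery time is at least 1 day.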
tidymodels.org has a few articles on modeling these data with the new functionality.
The main distinction for these models is how you quantify model performance. Most modern survival models don’t focus on the predicted event time but emphasize predicting the probability of the event not occurring (e.g., surviving) up to time point \(t_0\). Because of this, we need to use dynamic performance metrics: these are metrics that judge performance at different points in time. Here’s a plot from an analysis where the Brier statistic is computed over a relevant time range:
In this case, the large Brier score at the first time point indicates mediocre performance. As the evaluation time progresses, the score becomes smaller (which is good) and the model does very well.
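For reference, one common way to define the Brier score at an evaluation time \(t_0\) with right-censored data uses inverse probability of censoring weights. This is a sketch of that general form, not necessarily the exact computation in the tidymodels packages; the symbols \(T_i\), \(\delta_i\), \(\hat{S}\), and \(\hat{G}\) are notation introduced here.

\[
\mathrm{BS}(t_0) = \frac{1}{N}\sum_{i=1}^{N}\left[
\frac{\hat{S}(t_0 \mid x_i)^2 \, I(T_i \le t_0,\ \delta_i = 1)}{\hat{G}(T_i)}
+ \frac{\bigl(1-\hat{S}(t_0 \mid x_i)\bigr)^2 \, I(T_i > t_0)}{\hat{G}(t_0)}
\right]
\]

Here \(T_i\) is the observed time, \(\delta_i = 1\) if the event was observed, \(\hat{S}(t_0 \mid x_i)\) is the predicted probability of surviving past \(t_0\), and \(\hat{G}\) is an estimate of the probability of remaining uncensored. Observations censored before \(t_0\) contribute nothing directly, since their status at \(t_0\) is unknown. Values near zero are best; always predicting a probability of 0.5 gives a score of about 0.25.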
Supporting this type of model did not require many syntax changes:
- Many functions now have an `eval_time` argument to take a vector of time points to evaluate performance measures.
- There are some new performance statistics.
- Before modeling, you should probably create a `Surv` object.
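The points in the list above can be sketched in a few lines. This is an illustrative example under stated assumptions, not code from the release: the `lung` data from the survival package stands in for real data, the predictors and evaluation times are arbitrary, and a parametric model from `survival_reg()` is used only because it is quick to fit. It assumes recent versions of tidymodels and the censored package are installed.

```r
library(tidymodels)
library(survival)
library(censored)  # censored regression engines for parsnip

# Create the Surv object before modeling and drop the raw columns.
lung_surv <- lung %>%
  mutate(surv = Surv(time, status), .keep = "unused")

set.seed(1)
lung_split <- initial_split(lung_surv)
lung_train <- training(lung_split)
lung_test  <- testing(lung_split)

# Fit a parametric survival model (arbitrary choice for this sketch).
sr_fit <- survival_reg() %>%
  fit(surv ~ age + sex, data = lung_train)

# Time points (in days) at which to predict and evaluate; arbitrary values.
eval_times <- c(100, 300, 500)

# augment() adds survival probability predictions at each evaluation time.
lung_pred <- augment(sr_fit, new_data = lung_test, eval_time = eval_times)

# One of the new dynamic metrics: the Brier score at each evaluation time.
brier_scores <- brier_survival(lung_pred, truth = surv, .pred)
brier_scores
```

The result has one row per evaluation time, which is what makes a plot of the score over time possible.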
Hopefully, we will soon be doing specific tidymodels tutorials on this subject (perhaps at useR). We also have two talks accepted at Posit Conf later this year.