This document describes how to use the coda_coxnet() function to identify microbial signatures in survival studies. A detailed description of the methodology can be found in Pujolassos, Susín and Calle (2024), Microbiome compositional data analysis for survival studies, NAR Genomics and Bioinformatics. https://doi.org/10.1093/nargab/lqae038

1 coda4microbiome for survival studies main functions

2 Survival data description

We illustrate the coda_coxnet() algorithm with data_survival, a dataset included in the coda4microbiome package. This survival dataset was obtained through simulation from the Crohn dataset, also included in the package.

The Crohn dataset includes a matrix of microbiome compositions at genus level (48 taxa) from a Crohn’s disease study (Gevers et al., 2014). We randomly selected 75 Crohn’s patients and 75 controls and simulated time-to-event data. Event is set to 1 for those diagnosed with Crohn’s disease and 0 for those without the disease. Event_time is the time at which a patient developed Crohn’s disease or, for controls, the censoring time.

3 coda_coxnet model

We first load the survival data contained in the package. Then, we call the function coda_coxnet() with the default parameters to identify a microbial signature associated with the risk of developing Crohn’s disease:

library(coda4microbiome)
data(data_survival) # load data
set.seed(12345)
crohn_cox <- coda_coxnet(x,
                         time = Event_time,
                         status = Event,
                         covar = NULL,
                         lambda = "lambda.1se", nvar = NULL,
                         alpha = 0.9, nfolds = 10, showPlots = FALSE,
                         coef_threshold = 0)

3.1 coda_coxnet() output

The first plot shows the cross-validation performance curve from cv.glmnet(). For the default value of lambda, i.e. "lambda.1se", the algorithm selects 6 pairwise log-ratios that correspond to 9 different taxa (see below).
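To make the notion of pairwise log-ratios concrete, the following minimal sketch builds them for a toy composition in base R. It assumes that the candidate predictors are the log-ratios of all pairs of taxa and that all abundances are strictly positive; the object names (toy_x, pairs, log_ratios) are hypothetical, and the handling of zeros inside coda_coxnet() is not shown.

```r
# Toy composition: 4 samples x 3 taxa (hypothetical counts, all positive).
toy_x <- matrix(c(10, 20, 70,
                  30, 30, 40,
                   5, 15, 80,
                  25, 50, 25),
                nrow = 4, byrow = TRUE,
                dimnames = list(NULL, c("taxonA", "taxonB", "taxonC")))

# All pairs (j, k) with j < k; each column of `pairs` is one pair.
pairs <- combn(ncol(toy_x), 2)

# One column per pairwise log-ratio log(x_j / x_k).
log_ratios <- apply(pairs, 2, function(p) log(toy_x[, p[1]] / toy_x[, p[2]]))
colnames(log_ratios) <- apply(pairs, 2, function(p)
  paste(colnames(toy_x)[p], collapse = "/"))

dim(log_ratios)   # 4 samples, choose(3, 2) = 3 log-ratios
choose(48, 2)     # candidate log-ratios for 48 taxa: 48 * 47 / 2 = 1128
```

With 48 taxa the number of candidate log-ratios is much larger than the number of taxa, which is why a penalized model is used to select a small subset.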

The output of coda_coxnet() provides the column numbers, the names and the log-contrast coefficients of the selected taxa:

crohn_cox$taxa.num
#> [1]  5  8 19 24 31 32 33 39 42

crohn_cox$taxa.name
#> [1] "f__Peptostreptococcaceae_g__" "g__Bacteroides"              
#> [3] "g__Dialister"                 "f__Gemellaceae_g__"          
#> [5] "g__Prevotella"                "g__Roseburia"                
#> [7] "g__Lachnospira"               "g__Streptococcus"            
#> [9] "g__Conchiformibius"

crohn_cox$`log-contrast coefficients`
#> [1] -0.13376011 -0.17842674  0.29181907  0.02155809 -0.22935776 -0.29181907
#> [7] -0.16663633  0.50819611  0.17842674

The algorithm identified a microbiome signature composed of 6 log-ratios that correspond to 9 different taxa.
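A quick sanity check on this output is that the log-contrast coefficients sum to zero. The sketch below copies the nine printed values by hand (so they are rounded to 8 decimals) and checks the sum in base R; the name coefs is hypothetical, and with the fitted object one would use crohn_cox$`log-contrast coefficients` directly.

```r
# Coefficients copied from the printed output above (rounded to 8 decimals).
coefs <- c(-0.13376011, -0.17842674,  0.29181907,  0.02155809, -0.22935776,
           -0.29181907, -0.16663633,  0.50819611,  0.17842674)

# A log-contrast has coefficients that sum to zero (up to rounding).
sum(coefs)
abs(sum(coefs)) < 1e-6

# Number of taxa with negative (-1) and positive (1) coefficients.
table(sign(coefs))

# Total weight on each side of the contrast.
sum(coefs[coefs > 0])
sum(coefs[coefs < 0])
```

Taxa with positive coefficients increase the score as their relative abundance grows, and taxa with negative coefficients decrease it.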

A graphical representation of the signature can be obtained with the signature plot that represents the selected taxa and their estimated regression coefficients.

crohn_cox$`signature plot`

We summarize each subject's risk of developing the event of interest as a microbial risk score. It is defined by the taxa in the identified microbial signature and their abundances in that subject.
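As an illustration of how a score of this kind can be computed, the sketch below evaluates a log-contrast on toy data in base R. It is a simplified sketch, not the package's implementation: the abundances and coefficients are hypothetical, all abundances are assumed strictly positive, and any zero handling or centering applied inside coda_coxnet() is omitted.

```r
# Hypothetical relative abundances: 3 subjects x 3 selected taxa (all positive).
toy_abund <- matrix(c(0.2, 0.3, 0.5,
                      0.6, 0.3, 0.1,
                      0.1, 0.1, 0.8),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(paste0("subject", 1:3),
                                    c("taxonA", "taxonB", "taxonC")))

# Hypothetical log-contrast coefficients; they must sum to zero.
toy_coef <- c(taxonA = 0.5, taxonB = 0.5, taxonC = -1.0)
stopifnot(abs(sum(toy_coef)) < 1e-12)

# Score for each subject: weighted sum of log-abundances.
toy_score <- as.vector(log(toy_abund) %*% toy_coef)
names(toy_score) <- rownames(toy_abund)
toy_score

# Because the coefficients sum to zero, rescaling the abundances
# (e.g. counts instead of proportions) leaves the score unchanged.
rescaled_score <- as.vector(log(1000 * toy_abund) %*% toy_coef)
all.equal(toy_score, rescaled_score, check.attributes = FALSE)
```

The zero-sum constraint is what makes the score depend only on ratios between taxa, so it does not change with the sequencing depth of a sample.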

A graphical representation of the microbial risk score can be obtained either by calling the object crohn_cox$`risk score plot` or by running the plot_riskscore() function as below:

plot_riskscore(risk.score = crohn_cox$risk.score, 
               x,
               time = Event_time,
               status = Event,
               showPlots = FALSE)

The algorithm reports three measures of predictive accuracy based on Harrell’s C-index:

  • apparent Cindex: C-index of the signature applied to the same data that was used to generate the model

crohn_cox$`apparent Cindex`
#> [1] 0.7345628

  • mean cv-Cindex and sd cv-Cindex: mean and standard deviation of cross-validation C-indexes

crohn_cox$`mean cv-Cindex` ; crohn_cox$`sd cv-Cindex`
#> [1] 0.6798188
#> [1] 0.02599014
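For readers unfamiliar with Harrell’s C-index, the following base R sketch computes it on toy data. It is an illustrative O(n^2) version under simplifying assumptions (pairs with tied times are ignored, ties in the score count as one half); the function harrell_c() and the toy vectors are hypothetical and are not how the package necessarily computes the values above.

```r
# Harrell's C-index for right-censored data (illustrative O(n^2) version).
harrell_c <- function(time, status, score) {
  concordant <- 0
  comparable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # A pair is comparable when subject i has an observed event
      # strictly before subject j's event or censoring time.
      if (status[i] == 1 && time[i] < time[j]) {
        comparable <- comparable + 1
        if (score[i] > score[j]) {
          concordant <- concordant + 1      # higher risk, earlier event
        } else if (score[i] == score[j]) {
          concordant <- concordant + 0.5    # tied scores count as one half
        }
      }
    }
  }
  concordant / comparable
}

# Toy data: 5 subjects, status 1 = event observed, 0 = censored.
c_time   <- c(2, 4, 6, 8, 10)
c_status <- c(1, 1, 0, 1, 0)
c_score  <- c(1.5, 0.2, 0.9, -0.3, -1.0)

harrell_c(c_time, c_status, c_score)   # 7 of 8 comparable pairs concordant: 0.875
```

A value of 0.5 corresponds to a score that ranks subjects no better than chance, and 1 to a score that orders all comparable pairs correctly.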

Finally, we can obtain the survival curves after stratifying samples by their microbial risk score. This provides a graphical assessment of how well the microbiome signature separates subjects at higher and lower risk. The stratifying threshold is given as a quantile of the microbial risk score; the default is 0.5 (the median), but it can be adjusted by the user.

survcurve <- plot_survcurves(risk.score = crohn_cox$risk.score,
                time = Event_time,
                status = Event,
                strata.quantile = 0.5)
#> Loading required package: ggplot2
#> Loading required package: ggpubr
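The stratification step can be sketched in base R as follows. This is a minimal illustration on hypothetical scores; whether plot_survcurves() assigns subjects exactly at the threshold to the high or the low group is an assumption here, and the names toy_risk, toy_quantile, threshold and risk_group are not part of the package.

```r
# Hypothetical risk scores for 8 subjects.
toy_risk <- c(-1.2, 0.4, 0.9, -0.3, 1.7, -0.8, 0.1, 2.3)

# Threshold at the chosen quantile (0.5 = median, as in strata.quantile = 0.5).
toy_quantile <- 0.5
threshold <- quantile(toy_risk, probs = toy_quantile, names = FALSE)
threshold   # 0.25 for these toy scores

# Subjects above the threshold form the high-risk stratum (assumed convention).
risk_group <- factor(ifelse(toy_risk > threshold, "high", "low"),
                     levels = c("low", "high"))
table(risk_group)
```

Choosing a higher quantile, for example 0.75, would compare the quarter of subjects with the highest scores against the rest.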

References

Gevers D, Kugathasan S, Denson LA et al. The treatment-naïve microbiome in new-onset Crohn’s disease. Cell Host Microbe. 2014;15:382–92.