In this document we describe the use of the coda_coxnet() function for the identification of microbial signatures in survival studies. A detailed description of the methodology can be found in Pujolassos, Susín and Calle (2024), "Microbiome compositional data analysis for survival studies", NARGAB, https://doi.org/10.1093/nargab/lqae038
coda_coxnet(): performs variable selection through an elastic-net penalized Cox regression conveniently adapted to compositional data analysis (CoDA). The result is expressed as the (weighted) balance between two groups of taxa. A microbial risk score is provided for each sample, based on the abundances of the taxa included in the model. Non-compositional covariates can also be included.
plot_riskscore(): plots the time of occurrence of the event (or the censoring time) for the different samples, ordered according to their microbial risk scores obtained with the coda_coxnet() function.
plot_survcurves(): plots survival curves stratifying samples into two groups according to their microbial risk scores obtained with the coda_coxnet() function.
We illustrate the coda_coxnet() algorithm with data_survival, a dataset included in the coda4microbiome package. This survival dataset was obtained through simulation from the Crohn dataset, also included in the package.
The Crohn dataset includes a matrix of microbiome compositions at the genus level (48 taxa) from a Crohn's disease study (Gevers et al., 2014). We randomly selected 75 Crohn's disease patients and 75 controls and simulated time-to-event data. Event is set to 1 for subjects diagnosed with Crohn's disease and to 0 for those who did not develop the disease. Event_time is the time at which patients developed Crohn's disease, or the censoring time for controls.
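The Event / Event_time pair is a standard right-censored survival outcome. As an illustration only, the sketch below builds such an outcome with the survival package (a recommended package shipped with R) from hypothetical toy values, not from data_survival; the names toy_event and toy_time are illustrative.

```r
library(survival)

# Hypothetical toy records (not data_survival) using the same encoding:
# event 1 = disease diagnosed at that time, event 0 = censored at that time
toy_event <- c(1, 0, 1, 0)
toy_time  <- c(2.5, 10, 4.0, 7.5)

# Right-censored survival outcome
y <- Surv(time = toy_time, event = toy_event)
print(y)   # censored observations are marked with a trailing "+"
```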
We first load the survival data included in the package. Then we call coda_coxnet() with the default parameters to identify a microbial signature associated with the risk of developing Crohn's disease:
library(coda4microbiome)
data(data_survival) # load data (provides x, Event and Event_time used below)
set.seed(12345)     # the cross-validation folds are random
crohn_cox <- coda_coxnet(x,
                         time = Event_time,
                         status = Event,
                         covar = NULL,
                         lambda = "lambda.1se", nvar = NULL,
                         alpha = 0.9, nfolds = 10, showPlots = TRUE,
                         coef_threshold = 0)
The first plot shows the cross-validation accuracy curve (Harrell's C-index) obtained with cv.glmnet(). For the default value of lambda, "lambda.1se", the algorithm selects 6 pairwise log-ratios that correspond to 9 different taxa (see below).
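How a handful of pairwise log-ratios turns into one coefficient per taxon can be sketched in base R. The sketch assumes, as the text indicates, that the penalized Cox model is fitted on all pairwise log-ratios log(x_i/x_j) (with 48 taxa there are choose(48, 2) = 1128 of them); the elastic-net fit itself is not reproduced. The helpers pairwise_logratios() and logratio_to_taxa() and the toy matrix are hypothetical, not part of coda4microbiome. Because each log-ratio contributes +β to its numerator taxon and -β to its denominator taxon, the resulting per-taxon coefficients always add up to zero.

```r
# Hypothetical helper: all pairwise log-ratios log(x_i / x_j), i < j,
# from a matrix of strictly positive abundances (rows = samples, cols = taxa)
pairwise_logratios <- function(x) {
  pairs <- utils::combn(ncol(x), 2)   # 2 x K(K-1)/2 matrix of index pairs
  lx <- log(x)
  lr <- lx[, pairs[1, ], drop = FALSE] - lx[, pairs[2, ], drop = FALSE]
  colnames(lr) <- paste(pairs[1, ], pairs[2, ], sep = "/")
  list(lr = lr, pairs = pairs)
}

# Hypothetical helper: map log-ratio coefficients to one coefficient per taxon,
# using beta * log(x_i / x_j) = beta * log(x_i) - beta * log(x_j)
logratio_to_taxa <- function(beta, pairs, K) {
  theta <- numeric(K)
  for (k in seq_along(beta)) {
    theta[pairs[1, k]] <- theta[pairs[1, k]] + beta[k]
    theta[pairs[2, k]] <- theta[pairs[2, k]] - beta[k]
  }
  theta
}

# Toy example with K = 4 taxa: only log(x1/x2) and log(x3/x4) are selected
toy_x <- matrix(c(10, 20, 30, 40,
                   5,  1,  2,  8,
                   3,  3,  9,  1), nrow = 3, byrow = TRUE)
lr <- pairwise_logratios(toy_x)
beta <- c(0.5, 0, 0, 0, 0, -0.2)   # one coefficient per column of lr$lr
theta <- logratio_to_taxa(beta, lr$pairs, ncol(toy_x))
theta        # 0.5 -0.5 -0.2 0.2
sum(theta)   # zero-sum constraint of a log-contrast

# Same linear predictor written on log-ratios or on log-abundances
all.equal(as.vector(lr$lr %*% beta), as.vector(log(toy_x) %*% theta))
```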
The output of coda_coxnet() provides the column numbers (in x), the names and the coefficients of the selected taxa:
crohn_cox$taxa.num
#> [1] 5 8 19 24 31 32 33 39 42
crohn_cox$taxa.name
#> [1] "f__Peptostreptococcaceae_g__" "g__Bacteroides"
#> [3] "g__Dialister" "f__Gemellaceae_g__"
#> [5] "g__Prevotella" "g__Roseburia"
#> [7] "g__Lachnospira" "g__Streptococcus"
#> [9] "g__Conchiformibius"
crohn_cox$`log-contrast coefficients`
#> [1] -0.13376011 -0.17842674 0.29181907 0.02155809 -0.22935776 -0.29181907
#> [7] -0.16663633 0.50819611 0.17842674
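The printed coefficients can be checked against the log-contrast interpretation with a few lines of base R. The values below are copied from the output above; the sketch only verifies that they add up to zero (up to print rounding) and splits the taxa into the two groups of the balance. Variable names are illustrative.

```r
# Values copied from crohn_cox$taxa.name and crohn_cox$`log-contrast coefficients`
taxa <- c("f__Peptostreptococcaceae_g__", "g__Bacteroides", "g__Dialister",
          "f__Gemellaceae_g__", "g__Prevotella", "g__Roseburia",
          "g__Lachnospira", "g__Streptococcus", "g__Conchiformibius")
coefs <- c(-0.13376011, -0.17842674, 0.29181907, 0.02155809, -0.22935776,
           -0.29181907, -0.16663633, 0.50819611, 0.17842674)
names(coefs) <- taxa

# Log-contrast constraint: coefficients add up to zero
total <- sum(coefs)

# The two groups of the balance, ordered by absolute weight
positive <- sort(coefs[coefs > 0], decreasing = TRUE)
negative <- sort(coefs[coefs < 0])

positive; sum(positive)   # weights of the positive group
negative; sum(negative)   # weights of the negative group
```

In this output the positive weights add up to 1 and the negative weights to -1, so each group enters the balance as a weighted average of log-abundances.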
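In summary, the microbiome signature is built from 6 log-ratios involving 9 taxa, and it can be read as a weighted balance between the 4 taxa with positive coefficients and the 5 taxa with negative coefficients.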
A graphical representation of the signature is given by the signature plot, which shows the selected taxa and their estimated regression coefficients.
crohn_cox$`signature plot`
The individual risk of developing the event of interest is summarized by the microbial risk score, which is computed from the identified microbial signature and the abundances of the selected taxa in each subject.
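Assuming the standard log-contrast form of a Cox model and no additional covariates, the risk score of subject i can be sketched as follows, where x_{ij} is the abundance of taxon j in subject i, K is the number of taxa and β_j are the log-contrast coefficients (zero for taxa outside the signature). The package may additionally center the score or handle zeros before taking logs.

$$
h_i(t) = h_0(t)\,\exp(M_i), \qquad M_i = \sum_{j=1}^{K} \beta_j \log x_{ij}, \qquad \sum_{j=1}^{K} \beta_j = 0 .
$$

The zero-sum constraint makes M_i invariant to the total of each sample: for any c > 0,

$$
\sum_{j=1}^{K} \beta_j \log(c\,x_{ij}) = \log c \sum_{j=1}^{K} \beta_j + \sum_{j=1}^{K} \beta_j \log x_{ij} = M_i .
$$

Writing P and N for the taxa with positive and negative coefficients, the score is the weighted balance

$$
M_i = \sum_{j \in P} \beta_j \log x_{ij} - \sum_{j \in N} |\beta_j| \log x_{ij} ,
$$

which is a difference of weighted averages of log-abundances when, as in the coefficients printed above, \sum_{j \in P} \beta_j = 1 = \sum_{j \in N} |\beta_j|.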
A graphical representation of the microbial risk score can be obtained either by calling the object crohn_cox$`risk score plot` or by running the plot_riskscore() function as below:
plot_riskscore(risk.score = crohn_cox$risk.score,
               x,
               time = Event_time,
               status = Event,
               showPlots = FALSE)
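The score itself can be sketched in base R, assuming it is the plain log-contrast M_i = Σ β_j log x_ij; the package's own risk.score may differ by centering, zero handling or covariate terms. The compositions below are hypothetical toy values, risk_score() is an illustrative helper rather than a coda4microbiome function, and the direction of the ordering used by plot_riskscore() may be the opposite of the one shown.

```r
# Hypothetical toy compositions: 3 samples x 9 taxa, all strictly positive
# (log() needs positive values; the package's own zero handling is not shown)
set.seed(1)
toy_x <- matrix(runif(27, min = 1, max = 100), nrow = 3)

# Log-contrast coefficients copied from crohn_cox$`log-contrast coefficients`
beta <- c(-0.13376011, -0.17842674, 0.29181907, 0.02155809, -0.22935776,
          -0.29181907, -0.16663633, 0.50819611, 0.17842674)

# Hypothetical helper: risk score M_i = sum_j beta_j * log(x_ij)
risk_score <- function(x, beta) as.vector(log(x) %*% beta)

score_raw   <- risk_score(toy_x, beta)
score_props <- risk_score(toy_x / rowSums(toy_x), beta)  # same samples as proportions

# Zero-sum coefficients: rescaling a sample does not change its score
all.equal(score_raw, score_props, tolerance = 1e-6)

# Samples ordered from lowest to highest risk score
order(score_raw)
```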
The algorithm provides three accuracy measures based on Harrell's C-index:
apparent Cindex: C-index of the signature applied to the same data that were used to fit the model.
crohn_cox$`apparent Cindex`
#> [1] 0.7345628
mean cv-Cindex and sd cv-Cindex: mean and standard deviation of the cross-validation C-indexes.
crohn_cox$`mean cv-Cindex` ; crohn_cox$`sd cv-Cindex`
#> [1] 0.6798188
#> [1] 0.02599014
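The apparent and the cross-validated measures are the same statistic computed on different data: the apparent C-index on the data used to fit the model, and the cross-validation C-indexes on the held-out folds. A minimal base-R sketch of Harrell's C-index on hypothetical toy data follows; harrell_c() is an illustrative helper, and the package's own computation may treat tied times or scores differently.

```r
# Hypothetical helper: Harrell's C-index. A pair (i, j) is comparable when
# subject i had the event and a shorter time than subject j; it is concordant
# when i also has the higher risk score. Tied scores count 1/2; tied times
# are skipped for simplicity.
harrell_c <- function(time, status, risk) {
  concordant <- 0
  comparable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    if (status[i] != 1) next   # censored subjects cannot anchor a pair
    for (j in seq_len(n)) {
      if (time[i] < time[j]) { # j was still event-free when i had the event
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5
        }
      }
    }
  }
  concordant / comparable
}

# Hypothetical toy data
toy_time   <- c(2, 4, 6, 8)
toy_status <- c(1, 1, 0, 1)
toy_risk   <- c(3, 1, 2, 0)
harrell_c(toy_time, toy_status, toy_risk)   # 4 concordant of 5 comparable pairs: 0.8
```

A value of 0.5 corresponds to a random ordering of the samples and 1 to a perfect one.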
Finally, we can plot survival curves for the samples stratified by their microbial risk score. This provides a graphical assessment of the classification accuracy of the microbial signature. By default, samples are split at the 0.5 quantile (the median) of the microbial risk score; the user can change this threshold with the strata.quantile argument.
survcurve <- plot_survcurves(risk.score = crohn_cox$risk.score,
                             time = Event_time,
                             status = Event,
                             strata.quantile = 0.5)
#> Loading required package: ggplot2
#> Loading required package: ggpubr
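The stratification step can be sketched with the survival package on hypothetical simulated data: samples above the chosen quantile of the risk score form the high-risk group, and a Kaplan-Meier curve is estimated for each group with survfit(). Whether plot_survcurves() uses > or >= at the threshold is an assumption here, the log-rank test is added only for illustration, and judging by the messages above the package draws its curves with ggplot2/ggpubr rather than base graphics.

```r
library(survival)

# Hypothetical simulated data: higher risk score -> shorter event times
set.seed(2)
n      <- 40
risk   <- rnorm(n)
time   <- rexp(n, rate = exp(risk))
status <- rbinom(n, 1, 0.8)   # roughly 20% censored

# Split at the chosen quantile of the risk score (0.5 = median)
threshold <- quantile(risk, probs = 0.5)
group <- factor(ifelse(risk > threshold, "high risk", "low risk"))

# Kaplan-Meier curve for each group
fit <- survfit(Surv(time, status) ~ group)
plot(fit, lty = 1:2, xlab = "Time", ylab = "Survival probability")
legend("topright", legend = levels(group), lty = 1:2)

# Log-rank test comparing the two curves
survdiff(Surv(time, status) ~ group)
```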
Gevers D, Kugathasan S, Denson LA, et al. The treatment-naïve microbiome in new-onset Crohn's disease. Cell Host Microbe. 2014;15:382–92.