Gianluca Bontempi*, Benjamin Haibe-Kains*, Christine Desmedt, Christos Sotiriou, John Quackenbush
* These authors contributed equally to this work.
This page describes the R code that implements the causal ranking approach and reproduces the experiments described in Section 5.
The code consists of a script, causalrank.R, a set of functions in metafs.R, and an R workspace, data.RData.
To run the script causalrank.R, put all three files in the same working directory and type source("causalrank.R") at the R prompt.
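The R sketch below wraps these instructions in a small helper. It checks that the three files are present in a given directory, switches to that directory, sources causalrank.R, and restores the previous working directory afterwards. The helper name run_causalrank is hypothetical and is not part of the distributed code. The sketch also assumes that causalrank.R locates metafs.R, data.RData and its output file all.lambda.Rdata through paths relative to the working directory.

```r
# Hypothetical helper (not part of the distributed code): run causalrank.R
# from the directory that holds causalrank.R, metafs.R and data.RData.
run_causalrank <- function(dir = ".") {
  required <- c("causalrank.R", "metafs.R", "data.RData")
  missing <- required[!file.exists(file.path(dir, required))]
  if (length(missing) > 0) {
    stop("missing file(s) in ", dir, ": ", paste(missing, collapse = ", "))
  }
  # Switch to 'dir' so relative paths inside the script resolve, and restore
  # the previous working directory on exit, even if the script fails.
  old_wd <- setwd(dir)
  on.exit(setwd(old_wd), add = TRUE)
  # source() evaluates in the global environment by default, as if the
  # command had been typed at the R prompt.
  source("causalrank.R")
  # ASSUMPTION: the script writes all.lambda.Rdata to the working directory.
  invisible(file.path(dir, "all.lambda.Rdata"))
}
```

For example, run_causalrank("path/to/causalrank") followed by load("path/to/causalrank/all.lambda.Rdata"). The path is a placeholder. Restoring the working directory with on.exit() keeps the caller's session unchanged whether the script succeeds or fails.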
- causalrank.R: this script performs the causal ranking for all the datasets contained in data.RData for different values of the causalisation parameter lambda.
It first loads the R workspace and then, for values of lambda ranging over [0,2], saves the associated rankings in the output file all.lambda.Rdata.
- metafs.R: this file contains the functions needed to implement the causal ranking. In particular, the function mmultirank
implements the causal ranking for a given value of lambda when a list of input/output datasets is provided.
- data.RData: this R workspace contains a single object, 'datas.m', a list of the 6 datasets described in the paper (Table 1), composed of three items:
- datas: matrix of fRMA-normalized gene expression values with patients in rows and probes in columns.
- annots: data frame of annotations for the microarray platform (Affymetrix GeneChip HG-U133A), with probes in rows and annotations (probe identifier, gene symbol, EntrezGene ID, ...) in columns.
- demos: data frame of clinical information with patients in rows and clinical parameters in columns. The clinical parameters are:
- 'er': Estrogen receptor status
- 'pgr': Progesterone receptor status
- 'her2': Human epidermal growth factor receptor 2 status
- 'size': Tumor size (cm)
- 'node': Nodal status
- 'age': Age at diagnosis (years)
- 'grade': Histological grade
- 'surv.time': Time to event for disease-free survival (distant metastasis-free or recurrence-free survival)
- 'surv.event': Event indicator for disease-free survival (distant metastasis-free or recurrence-free survival)
- 'surv.bin': Binarized survival data distinguishing high-risk patients (those who relapsed within 5 years of diagnosis) from low-risk patients (those who remained relapse-free for at least 5 years)
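To illustrate the data layout and the surv.bin definition above, the R sketch below defines two hypothetical helpers. Neither is part of data.RData or metafs.R. check_dataset verifies the orientation described for one dataset: patients are in the rows of datas and demos, and probes are in the columns of datas and the rows of annots. binarize_surv recomputes a high/low-risk label from surv.time and surv.event. It makes several assumptions that the text does not confirm: surv.time is in years, surv.event is 1 for relapse and 0 for censoring, high risk is coded 1 and low risk 0, and patients censored before 5 years are left as NA because they fit neither group. The coding in demos may differ.

```r
# Hypothetical helpers; not part of the distributed code.

# Check the orientation described for one dataset: patients are rows of
# both 'datas' and 'demos'; probes are columns of 'datas' and rows of 'annots'.
check_dataset <- function(datas, annots, demos) {
  stopifnot(is.matrix(datas), is.data.frame(annots), is.data.frame(demos))
  c(patients_match = nrow(datas) == nrow(demos),
    probes_match   = ncol(datas) == nrow(annots))
}

# Recompute a binary risk label from survival data.
# ASSUMPTIONS: 'time' is in years; 'event' is 1 = relapse, 0 = censored;
# output is 1 = high risk, 0 = low risk, NA = not classifiable
# (censored before 'cutoff' years, or missing time/event).
binarize_surv <- function(time, event, cutoff = 5) {
  ok <- !is.na(time) & !is.na(event)
  high <- ok & event == 1 & time < cutoff  # relapsed within 'cutoff' years
  low <- ok & time >= cutoff               # relapse-free for at least 'cutoff' years
  out <- rep(NA_integer_, length(time))
  out[high] <- 1L
  out[low] <- 0L
  out
}
```

The text does not say whether datas.m holds six lists of three items or three lists of six items, so inspect it first with str(datas.m, max.level = 2). Then call check_dataset on each dataset, and, for instance, compare binarize_surv(demos$surv.time, demos$surv.event) with demos$surv.bin.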
The Gene Set Enrichment Analysis (GSEA) presented in Figure 4 was performed by running the R script "gsea_mimo.R" followed by "gsea_mimo2.R". The R code and its companion files are included in gsea_mimo.zip.
gsea_mimo_res.zip contains the detailed results of all the preranked GSEA analyses.