Computational Biology and Functional Genomics Laboratory

Gianluca Bontempi *, Benjamin Haibe-kains *, Christine Desmedt, Christos Sotiriou, John Quackenbush

* These authors contributed equally to the work.

This page describes the R code that implements the causal ranking approach and
allows the experiments described in Section 5 to be reproduced.
The code consists of a script, causalrank.R, a set of functions contained in metafs.R, and an R workspace, data.RData.
To run the script causalrank.R, make sure all the files are in the same working directory and type source("causalrank.R") in the R console.
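
As a minimal sketch of the step above, assuming the three files have been saved in a folder referred to here by the hypothetical variable code_dir, the script can be launched as follows. The check for missing files is an added convenience and is not part of the original code.

```r
# Folder holding causalrank.R, metafs.R and data.RData
# (hypothetical location; replace "." with your own path).
code_dir <- "."

required <- c("causalrank.R", "metafs.R", "data.RData")
missing_files <- required[!file.exists(file.path(code_dir, required))]

if (length(missing_files) > 0) {
  # Report what is missing instead of failing inside the script.
  message("Missing files: ", paste(missing_files, collapse = ", "))
} else {
  old_dir <- setwd(code_dir)  # setwd() returns the previous directory
  # Run the script, then restore the previous working directory
  # even if the script stops with an error.
  tryCatch(source("causalrank.R"), finally = setwd(old_dir))
}
```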

causalrank.R: this script performs the causal ranking for all the datasets contained in data.RData, for different values of the causalisation parameter lambda.
It first loads the R workspace and then, for values of lambda ranging over [0,2], writes the associated rankings to the output file all.lambda.Rdata.
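
The names of the objects stored in the output file are not documented on this page, so the sketch below does not assume any. It only loads all.lambda.Rdata into a separate environment, if the file exists, and prints a summary of whatever it contains. The variable names out_file, res_env and loaded are illustrative.

```r
out_file <- "all.lambda.Rdata"  # output file name stated above
loaded <- character(0)          # will hold the names of the stored objects

if (file.exists(out_file)) {
  res_env <- new.env()
  # load() returns the names of the objects it restored.
  loaded <- load(out_file, envir = res_env)
  for (nm in loaded) {
    cat("Object:", nm, "\n")
    str(get(nm, envir = res_env), max.level = 1)  # top-level structure only
  }
} else {
  message(out_file, " not found; run causalrank.R first.")
}
```

Loading into a separate environment avoids overwriting objects that already exist in the current R session.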

metafs.R: this file contains the set of functions needed to implement the causal ranking. In particular, the function mmultirank
implements the causal ranking for a given value of the parameter lambda when a list of input/output datasets is provided.

data.RData: this R workspace contains a single object, 'datas.m', which is a list of the 6 datasets described in the paper (Table 1). Each dataset is composed of three items:

datas: matrix of fRMA-normalized gene expression values, with patients in rows and probes in columns.
annots: data frame of annotations of the microarray platform (Affymetrix GeneChip HGU133A), with probes in rows and annotations (probe identifier, gene symbol, EntrezGene ID, ...) in columns.
demos: data frame of clinical information, with patients in rows and clinical parameters in columns. The clinical parameters are the following:

'er': Estrogen receptor status
'pgr': Progesterone receptor status
'her2': Human epidermal growth factor 2 status
'size': Tumor size (cm)
'node': Nodal status
'age': Age at diagnosis (years)
'grade': Histological grade
'surv.time': Time for disease free survival (distant metastasis or recurrence free survival)
'surv.event': Event for disease free survival (distant metastasis or recurrence free survival)
'surv.bin': Binarized survival data distinguishing high-risk patients (patients who relapsed within 5 years of diagnosis) from low-risk patients (patients who remained free of relapse for at least 5 years)
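
To illustrate the definition of 'surv.bin', the sketch below derives such a label from 'surv.time' and 'surv.event' on a made-up table. Several points are assumptions and are not stated on this page: survival time is taken to be in years, events are coded 1 (relapse) and 0 (censored), high risk is coded 1 and low risk 0, and patients censored before 5 years are left as NA. The function binarize_survival and the table demos_toy are hypothetical and are not part of metafs.R.

```r
# Toy clinical table (made-up values, not taken from the datasets).
demos_toy <- data.frame(
  surv.time  = c(2.1, 7.4, 3.0, 6.2, 1.5),  # assumed unit: years
  surv.event = c(1,   0,   0,   1,   1)     # assumed coding: 1 = relapse, 0 = censored
)

# Hypothetical helper: 1 = high risk, 0 = low risk, NA = undetermined.
binarize_survival <- function(time, event, horizon = 5) {
  out <- rep(NA_integer_, length(time))
  out[which(time >= horizon)] <- 0L              # relapse-free for at least `horizon`
  out[which(time < horizon & event == 1)] <- 1L  # relapsed before `horizon`
  out
}

demos_toy$surv.bin <- binarize_survival(demos_toy$surv.time,
                                        demos_toy$surv.event)
print(demos_toy)
```

In this toy table the third patient is censored at 3 years without relapse, so neither definition applies and the label stays NA.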

The Gene Set Enrichment Analysis (GSEA) presented in Figure 4 was performed by running the R script "gsea_mimo.R", followed by "gsea_mimo2.R". The R code and the companion files are included in gsea_mimo.zip.

gsea_mimo_res.zip contains all the detailed results from the preranked GSEA analyses.

Multiple outputs and causal ranking strategies for gene selection

Gianluca Bontempi *, Benjamin Haibe-kains *, Christine Desmedt, Christos Sotiriou, John Quackenbush
