2023 |
Tubella, Andrea Aler; Mollo, Dimitri Coelho; Lindström, Adam Dahlgren; Devinney, Hannah; Dignum, Virginia; Ericson, Petter; Jonsson, Ana; Kampik, Timotheus; Lenaerts, Tom; Mendez, Julian Alfredo; Nieves, Juan Carlos ACROCPoLis: A Descriptive Framework for Making Sense of Fairness Proceedings Article In: Proceedings of the 6th ACM Conference on Fairness, Accountability, and Transparency, FAccT 2023, pp. 1014-1025, Association for Computing Machinery, 2023, (Conference: 6th ACM Conference on Fairness, Accountability, and Transparency(6: 12/06/2023-15/06/2023: Chicago)). @inproceedings{info:hdl:2013/366626, Fairness is central to the ethical and responsible development and use of AI systems, and a large number of frameworks and formal notions of algorithmic fairness are available. However, many of the proposed fairness solutions revolve around technical considerations rather than the needs of, and consequences for, the most impacted communities. We therefore want to shift the focus away from definitions and allow for the inclusion of societal and relational aspects, in order to represent how the effects of AI systems impact and are experienced by individuals and social groups. In this paper, we do this by proposing the ACROCPoLis framework to represent allocation processes with a modeling emphasis on fairness aspects. The framework provides a shared vocabulary in which the factors relevant to fairness assessments for different situations and procedures, as well as their interrelationships, are made explicit. This enables us to compare analogous situations, to highlight the differences between dissimilar situations, and to capture differing interpretations of the same situation by different stakeholders. |
Nachtegael, Charlotte; Stefani, Jacopo De; Lenaerts, Tom ALAMBIC: Active Learning Automation with Methods to Battle Inefficient Curation Proceedings Article In: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pp. 117–127, Association for Computational Linguistics, 2023, (Conference: European Chapter of the Association for Computational Linguistics(17: 2 May 2023 to 4 May 2023: Dubrovnik, Croatia)). @inproceedings{info:hdl:2013/359290, In this paper, we present ALAMBIC, an open-source, dockerized, web-based platform for annotating text data through active learning for classification tasks. Active learning is known to reduce the need for labelling, a time-consuming task, by selecting the most informative instances among the unlabelled ones, reaching optimal accuracy faster than labelling randomly chosen data. ALAMBIC integrates all the steps from data import to customization of the (active) learning process and annotation of the data, with indications of the progress of the trained model, which can be downloaded and used in downstream tasks. Its architecture also allows the easy integration of other types of models, features and active learning strategies. The code is available at https://github.com/Trusted-AI-Labs/ALAMBIC and a video demonstration is available at https://youtu.be/4oh8UADfEmY. |
Abels, Axel; Lenaerts, Tom; Trianni, Vito; Nowé, Ann Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making Proceedings Article In: Proceedings of the 40th International Conference on Machine Learning: ICML’23, pp. 79-90, PMLR, 2023, (Conference: 40th International Conference on Machine Learning(Honolulu, Hawaii, USA)). @inproceedings{info:hdl:2013/364331, Experts advising decision-makers are likely to display expertise that varies as a function of the problem instance. In practice, this may lead to sub-optimal or discriminatory decisions against minority cases. In this work, we model such changes in depth and breadth of knowledge as a partitioning of the problem space into regions of differing expertise. We provide new algorithms that explicitly consider and adapt to the relationship between problem instances and experts’ knowledge. We first propose a naive approach based on nearest neighbor queries and highlight its drawbacks. To address these drawbacks, we then introduce a novel algorithm, expertise trees, that constructs decision trees enabling the learner to select appropriate models. We provide theoretical insights and empirically validate the improved performance of our novel approach on a range of problems for which existing methods proved to be inadequate. |
Claeskens, G.; Jansen, Maarten Comments on: Statistical inference and large-scale multiple testing for high-dimensional regression models Journal Article In: Test, vol. 32, no. 4, pp. 1177-1179, 2023, (DOI: 10.1007/s11749-023-00896-5). @article{info:hdl:2013/371479b, |
Claeskens, G.; Jansen, Maarten; Zhou, Jing Discussion on: “A scale-free approach for false discovery rate control in generalized linear models” by Dai, Lin, Xing, Liu. Journal Article In: Journal of the American Statistical Association, vol. 118, no. 543, pp. 1573-1577, 2023, (Language of publication: en). @article{info:hdl:2013/359639b, |
Bhattacharya, Shreya; Lefèvre, Laure; Hayakawa, Hisashi; Jansen, Maarten; Clette, Frédéric L. Scale Transfer in 1849: Heinrich Schwabe to Rudolf Wolf Journal Article In: Solar physics, vol. 298, no. 1, pp. 1-12, 2023, (Language of publication: en). @article{info:hdl:2013/359132b, |
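The ALAMBIC entry above describes active learning as selecting "the most informative instances among the unlabelled instances". One common way to score informativeness is least-confidence uncertainty sampling; the sketch below assumes that strategy and a classifier that outputs class probabilities. It is an illustration only: the function names `least_confidence` and `select_least_confident` are hypothetical and are not part of ALAMBIC's code, which supports several strategies.

```python
from typing import Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def select_least_confident(
    pool_probs: Sequence[Sequence[float]], batch_size: int
) -> list[int]:
    """Indices of the `batch_size` unlabelled instances whose predicted class
    distribution is least confident (hypothetical helper, not ALAMBIC's API)."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: least_confidence(pool_probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:batch_size]


# Predicted class probabilities for four unlabelled texts (illustrative values).
pool = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]
print(select_least_confident(pool, batch_size=2))  # [3, 1]
```

In a full loop, the selected texts would be sent to the annotator, the model retrained on the enlarged labelled set, and the pool rescored; labelling randomly chosen texts is the baseline the abstract compares against.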
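The Expertise Trees entry above first considers a naive approach based on nearest neighbor queries before introducing expertise trees. The abstract gives no details of that baseline, so the sketch below is only an assumption of what it could look like: for a new problem instance, look up the k most similar past instances and follow the expert who was most often correct on them. The names, the Euclidean metric and the choice of k are all hypothetical; this is neither the authors' baseline nor an implementation of expertise trees.

```python
import math
from typing import Sequence

Point = Sequence[float]
# (problem instance, per-expert flags: was expert e's advice correct?)
Record = tuple[Point, Sequence[bool]]


def knn_select_expert(query: Point, history: Sequence[Record], k: int) -> int:
    """Index of the expert most often correct on the k past instances nearest
    to `query` (Euclidean distance; ties go to the lowest expert index)."""
    nearest = sorted(history, key=lambda rec: math.dist(query, rec[0]))[:k]
    n_experts = len(history[0][1])
    hits = [sum(flags[e] for _, flags in nearest) for e in range(n_experts)]
    return max(range(n_experts), key=lambda e: hits[e])


# Toy history: expert 0 is reliable for negative x, expert 1 for positive x.
history = [
    ((-2.0,), (True, False)),
    ((-1.5,), (True, False)),
    ((-1.0,), (True, True)),
    ((1.0,), (False, True)),
    ((1.5,), (False, True)),
    ((2.0,), (True, True)),
]
print(knn_select_expert((-1.2,), history, k=3))  # 0
print(knn_select_expert((1.4,), history, k=3))   # 1
```

The toy history mimics the abstract's picture of a problem space partitioned into regions of differing expertise; how well such a lookup works depends heavily on k and on the distance metric, both of which are assumptions here.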
2022 |
Piron, Anthony; Szymczak, Florian; Alvelos, Maria De Oliveira; Defrance, Matthieu; Lenaerts, Tom; Eizirik, Decio L.; Cnop, Miriam RedRibbon: A new rank-rank hypergeometric overlap pipeline to compare gene and transcript expression signatures Journal Article In: BioRxiv, 2022, (DOI: 10.1101/2022.08.31.505818). @article{info:hdl:2013/353212d, Motivation. High-throughput omics technologies have generated a wealth of large protein, gene and transcript datasets, increasing the need for new methods to analyse and compare big datasets. Rank-rank hypergeometric overlap is an important threshold-free method to combine and visualize two ranked lists of P-values or fold-changes, usually from differential gene expression analyses. Here, we introduce a new rank-rank hypergeometric overlap-based method aimed at both gene-level analyses and alternative splicing analyses at the transcript or exon level, which were hitherto out of reach because transcripts outnumber genes by an order of magnitude. Results. We tested the tool on synthetic and real datasets at the gene and transcript levels to detect correlation and anti-correlation patterns and found it to be fast and accurate, even on very large datasets, thanks to a minimal P-value search based on an evolutionary algorithm. The tool comes with a ready-to-use permutation scheme that allows adjusted P-values to be computed at low time cost. Additionally, the package is a drop-in replacement for previous packages, as it includes a compatibility mode that allows older studies to be re-run with almost no change to existing pipelines. RedRibbon holds the promise of accurately extracting detailed information from large analyses. Availability. RNA-sequencing datasets are available through the Gene Expression Omnibus (GEO) portal with accession numbers GSE159984, GSE133218, GSE137136, GSE98485, GSE148058 and GSE108413.
The C libraries and R package code are open to the community under an open-source licence (GPL3) and available for download from GitHub: https://github.com/antpiron/ale, https://github.com/antpiron/cRedRibbon and https://github.com/antpiron/RedRibbon. |
Grolaux, Robin; Hardy, Alexis; Olsen, Catharina; Dooren, Sonia Van; Smits, Guillaume; Defrance, Matthieu Identification of differentially methylated regions in rare diseases from a single-patient perspective Journal Article In: Clinical Epigenetics, vol. 14, no. 1, 2022, (DOI: 10.1186/s13148-022-01403-7). @article{info:hdl:2013/353081, Background. DNA methylation (5-mC) is increasingly recognized as an alternative to the detection of sequence variants in the diagnosis of some rare neurodevelopmental and imprinting disorders. Identifying alterations in DNA methylation plays an important role in diagnosing these disorders and understanding their etiology. Canonical pipelines for the detection of differentially methylated regions (DMRs) usually rely on inter-group (e.g., case versus control) comparisons. However, these tools may perform suboptimally in the context of rare diseases and multilocus imprinting disturbances because of small cohort sizes and inter-patient heterogeneity. There is therefore a need for a simple but statistically robust pipeline that lets scientists and clinicians perform differential methylation analyses at the single-patient level, and for an evaluation of how parameter fine-tuning affects DMR detection. Results. We implemented an improved statistical method to detect DMRs in correlated datasets from a single-patient perspective, based on Z-scores and the empirical Brown aggregation method. To assess the predictive power of our method accurately, we generated semi-simulated data using a public control population of 521 samples and investigated how the size of the control population, the methylation difference and the region size affect DMR detection.
In addition, we validated the detection of methylation events in patients with rare multilocus imprinting disturbances and evaluated how this method could complement existing tools in the context of clinical diagnosis. Conclusion. In this study, we present a robust statistical method to perform differential methylation analysis at the single-patient level and describe the optimal parameters for improving DMR identification performance. Finally, we show its diagnostic utility when applied to rare disorders. |
Bizet, Martin; Defrance, Matthieu; Calonne, Emilie; Bontempi, Gianluca; Sotiriou, Christos; Fuks, François; Jeschke, Jana In: Epigenetics, vol. 17, no. 13, pp. 2434-2454, 2022, (DOI: 10.1080/15592294.2022.2135201). @article{info:hdl:2013/353467b, Illumina Infinium DNA methylation (5mC) arrays are a popular technology for low-cost, high-throughput, genome-scale measurement of 5mC distribution, especially in cancer and other complex diseases. After the success of its HumanMethylation450 array (450k), Illumina released the MethylationEPIC array (850k), which features increased coverage of enhancers. Despite the widespread use of the 850k array, analysis of the corresponding data remains suboptimal: it still relies mostly on Illumina’s default annotation, which underestimates enhancers and long noncoding RNAs. Results: We therefore developed an approach, based on the ENCODE and LNCipedia databases, that greatly improves upon Illumina’s default annotation of enhancers and long noncoding transcripts. We compared the re-annotated 850k with both 450k and reduced-representation bisulphite sequencing (RRBS), another high-throughput 5mC profiling technology. We found that 850k covers at least three times as many enhancers and long noncoding RNAs as either 450k or RRBS. We further investigated the reproducibility of the three technologies, applying various normalization methods to the 850k data. Most of these methods reduced variability to a level below that of RRBS data. We then used 850k with our new annotation and normalization to profile 5mC changes in breast cancer biopsies. 850k highlighted aberrant enhancer methylation as the predominant feature, in agreement with previous reports. Our study provides an updated processing approach for 850k data, based on refined probe annotation and normalization, allowing improved analysis of methylation at enhancers and long noncoding RNA genes. Our findings will help to further advance understanding of the DNA methylome in health and disease. |
Rivière, Quentin; Corso, Massimiliano; Ciortan, Madalina; Noël, Grégoire; Verbruggen, Nathalie; Defrance, Matthieu Exploiting Genomic Features to Improve the Prediction of Transcription Factor-Binding Sites in Plants Journal Article In: Plant and Cell Physiology, vol. 63, no. 10, pp. 1457-1473, 2022, (DOI: 10.1093/pcp/pcac095). @article{info:hdl:2013/352290, The identification of transcription factor (TF) target genes is central in biology. A popular approach relies on locating potential cis-regulatory elements (CREs) by pattern matching. During the last few years, tools integrating next-generation sequencing data have been developed to improve the performance of pattern matching. However, such tools have not yet been comprehensively evaluated in plants. Hence, we developed a new streamlined method for predicting CREs and target genes of plant TFs in specific organs or conditions. Our approach implements a supervised machine learning strategy in which decision rule models are learnt from TF ChIP-chip/seq experimental data. Different layers of genomic features were integrated into the predictive models: the position on the gene, DNA sequence conservation, chromatin state and various CRE footprints. Among the tested features, the chromatin features were crucial for improving the accuracy of the method. Furthermore, we evaluated the transferability of predictive models across TFs, organs and species. Finally, we validated our method by correctly inferring the target genes of key TFs controlling metabolite biosynthesis at the organ level in Arabidopsis. We developed a tool, Wimtrap, to reproduce our approach in plant species and conditions/organs for which ChIP-chip/seq data are available.
Wimtrap is a user-friendly R package that supports an R Shiny web interface and is provided with pre-built models that can be used to quickly get predictions of CREs and TF gene targets in different organs or conditions in Arabidopsis thaliana, Solanum lycopersicum, Oryza sativa and Zea mays. |
Ciortan, Madalina; Defrance, Matthieu GNN-based embedding for clustering scRNA-seq data Journal Article In: Bioinformatics, vol. 38, no. 4, pp. 1037-1044, 2022, (DOI: 10.1093/bioinformatics/btab787). @article{info:hdl:2013/343811b, Motivation. Single-cell RNA sequencing (scRNA-seq) provides transcriptomic profiling for individual cells, allowing researchers to study the heterogeneity of tissues, recognize rare cell identities and discover new cellular subtypes. Clustering analysis is usually used to predict cell class assignments and infer cell identities. However, the high sparsity of scRNA-seq data, accentuated by dropout events, creates challenges that have motivated the development of numerous dedicated clustering methods. Nevertheless, there is still no consensus on the best-performing method. Results. graph-sc is a new method that leverages a graph autoencoder network to create embeddings for scRNA-seq cell data. While this work analyzes the performance of clustering the embeddings with various clustering algorithms, the embeddings can also be used for other downstream tasks. A broad experimental study has been performed on both simulated and real scRNA-seq datasets. The results indicate that although no method is consistently best across all the analyzed datasets, graph-sc compares favorably to competing techniques across all types of datasets. Furthermore, the proposed method is stable across consecutive runs, robust to input down-sampling, generally insensitive to changes in the network architecture or training parameters, and more computationally efficient than competing methods based on neural networks. Modeling the data as a graph provides increased flexibility to define custom features characterizing the genes, the cells and their interactions. Moreover, external data (e.g. a gene network) can easily be integrated into the graph and used seamlessly under the same optimization task. Availability and implementation. https://github.com/ciortanmadalina/graph-sc.
Supplementary information. Supplementary data are available at Bioinformatics online. |
Renaux, Alexandre; Terwagne, Chloé CT; Cochez, Michael; Tiddi, Ilaria; Nowé, Ann; Lenaerts, Tom A knowledge graph approach for interpretable prediction of pathogenic genetic interactions Miscellaneous 2022, (Conference: European Conference on Computational Biology (ECCB) 2022 (2022-07: Sitges, Spain)). @misc{info:hdl:2013/352608, An increasing number of clinical studies are reporting patterns of oligogenic inheritance in genetic diseases. Despite the advent of methods able to predict the pathogenicity of variant combinations, the underlying biological mechanisms remain unknown, since these models offer limited interpretability. To advance towards a better understanding of oligogenic disease aetiology, we developed a new interpretable predictive method based on a knowledge graph. This heterogeneous network integrates curated oligogenic combinations with multiple biological networks and biomedical ontologies. Our approach successfully captures association rules based solely on multi-hop relationships between genes and combines them into a decision set model that can predict the pathogenicity of new gene pairs. These predictions come with explanations, obtained by querying the knowledge graph, that highlight relevant paths. Benchmarked in a cross-validation setting, the model achieves high accuracy and recalls independent gene pairs from recently published digenic combinations. The analysis of the rule-based paths highlights relevant contributors to the disease and shows the ability of this approach to generate knowledge-based hypotheses for investigating new disease mechanisms. |
Abels, Axel; Lenaerts, Tom; Trianni, Vito; Nowé, Ann A New Approach to Handle Non-Stationarity in Collective Decision-Making Miscellaneous 2022, (Conference: ACM Collective Intelligence conference (CI)(Virtual)). @misc{info:hdl:2013/366666, |
Montero-Porras, Eladio; Grujić, Jelena; Domingos, Elias Fernandez; Lenaerts, Tom Inferring Strategies from Observations in Long Iterated Prisoner’s Dilemma Experiments Miscellaneous 2022, (Conference: Complex Systems Conference 2022(17-21/10/2022: Palma de Mallorca, Spain)). @misc{info:hdl:2013/366678, |
Versbraegen, Nassim; Gravel, Barbara; Nachtegael, Charlotte; Renaux, Alexandre; Verkinderen, Emma; Nowé, Ann; Lenaerts, Tom; Papadimitriou, Sofia Taking the prediction of pathogenic variant-combinations to the next level with VarCoPP2.0 Miscellaneous 2022, (Conference: European Conference on Computational Biology (21: 12-21 September 2022: Sitges, Barcelona)). @misc{info:hdl:2013/352566, |
Montero-Porras, Eladio; Grujić, Jelena; Domingos, Elias Fernandez; Lenaerts, Tom Inferring Strategies from Observations in Long Iterated Prisoner’s Dilemma Experiments Miscellaneous 2022, (Conference: International Conference on Social Dilemmas(19-22/07/2022: Copenhagen, Denmark)). @misc{info:hdl:2013/366679, |
Papadimitriou, Sofia; Gravel, Barbara; Nachtegael, Charlotte; Baere, Elfride De; Loeys, Bart; Vikkula, Miikka; Smits, Guillaume; Lenaerts, Tom 2022, (Conference: Rare Med Symposium(8-12-2022: Gent)). @misc{info:hdl:2013/366742, Background/Aims. Reports of oligogenic cases (i.e. individuals whose disease phenotype can only be explained by the co-occurrence of multiple variants in several genes) have been rapidly increasing, in an effort to close the gap of missing genetic diagnoses. Nevertheless, the quality of these data has never been properly assessed, especially as standards and guidelines for such cases are currently missing. This work aimed to collect all reported oligogenic cases in one database, OLIDA, to assess the quality of the reported information and to provide, for the first time, recommendations for their proper reporting. Methods. 318 research articles reporting oligogenic cases were extracted from PubMed. Independent curators collected the relevant oligogenic information (i) from the articles and (ii) from relevant public databases. With these data, a transparent curation protocol was developed that assigns a confidence score to each oligogenic case based on the amount of pathogenic evidence at the genetic and functional levels. The collection and assessment of these data led to the creation of OLIDA, the Oligogenic Diseases Database. Results. OLIDA contains information on oligogenic cases linked to 177 different genetic diseases. Each instance is linked with a confidence score reflecting the quality of the associated genetic and functional pathogenic evidence. The data revealed that the majority of papers do not provide proper genetic evidence excluding a monogenic model, and that this evidence is rarely coupled with functional experiments for confirmation. Our recommendations stress the necessity of fulfilling both conditions.
The use of multiple extended pedigrees showing a clear segregation of the reported variants, control cohorts of a suitable size, and functional experiments showing the synergistic effect of the involved variants are essential for this purpose. Conclusion. With our work, we reveal recurrent issues in the reporting of oligogenic cases and stress the need for the development of standards in the field. As the number of papers identifying oligogenic causes of disease is increasing rapidly, initiating this discussion is imperative. |
Abels, Axel; Domingos, Elias Fernandez; Lenaerts, Tom; Trianni, Vito; Nowé, Ann Bias Mitigation in Decision-Making with Expert Advice Miscellaneous 2022, (Conference: Benelux AI Conference (BNAIC) and Benelux machine learning conference (Benelearn)(7-9/11/2022: Antwerpen, Belgium)). @misc{info:hdl:2013/366668b, |
Abels, Axel; Lenaerts, Tom; Trianni, Vito; Nowé, Ann A Novel Approach to Handle Non-stationarity in Collective Decision-Making with Experts Miscellaneous 2022, (Conference: ACM Collective Intelligence Conference 2022(20-21 October 2022: Online)). @misc{info:hdl:2013/352851b, |
Piron, Anthony; Colli, Maikel Luis; Defrance, Matthieu; Eizirik, Decio L.; Mercader, Josep Maria; Cnop, Miriam Identification of novel type 1 and type 2 diabetes genes by colocalisation of human islet eQTL and GWAS variants Miscellaneous 2022, (Conference: EASD Annual Meeting of the European Association for the Study of Diabetes(58th: 19 – 23 September 2022: Stockholm, Sweden)). @misc{info:hdl:2013/353214, |
Montero-Porras, Eladio; Grujić, Jelena; Domingos, Elias Fernandez; Lenaerts, Tom Inferring Strategies from Observations in Long Iterated Prisoner’s Dilemma Experiments Miscellaneous 2022, (Conference: Complex Systems Conference 2022(17-21/10/2022: Palma de Mallorca, Spain)). @misc{info:hdl:2013/366678b, |
Versbraegen, Nassim; Gravel, Barbara; Nachtegael, Charlotte; Renaux, Alexandre; Verkinderen, Emma; Nowé, Ann; Lenaerts, Tom; Papadimitriou, Sofia Taking the prediction of pathogenic variant-combinations to the next level with VarCoPP2.0 Miscellaneous 2022, (Conference: European Conference on Computational Biology (21: 12-21 September 2022: Sitges, Barcelona)). @misc{info:hdl:2013/352566b, |
Montero-Porras, Eladio; Grujić, Jelena; Domingos, Elias Fernandez; Lenaerts, Tom Inferring Strategies from Observations in Long Iterated Prisoner’s Dilemma Experiments Miscellaneous 2022, (Conference: International Conference on Social Dilemmas(19-22/07/2022: Copenhagen, Denmark)). @misc{info:hdl:2013/366679b, |
Terrucha, Ines; Domingos, Elias Fernandez; Santos, Francisco C; Simoens, Pieter; Lenaerts, Tom The art of compensation: how hybrid teams solve collective risk dilemmas Miscellaneous 2022, (Conference: Adaptive and Learning Agents (ALA) Workshop(9-10/5/2022: Auckland, NZ)). @misc{info:hdl:2013/366661, |
Nachtegael, Charlotte; Gravel, Barbara; Dillen, Arnau; Smits, Guillaume; Nowé, Ann; Papadimitriou, Sofia; Lenaerts, Tom Scaling up the oligogenic diseases research with OLIDA: the Oligogenic Diseases Database Miscellaneous 2022, (Conference: Genomics of Rare Disease 2022). @misc{info:hdl:2013/352609b, The study of genetic variation associated with disease has shown the inadequacy of the “one gene – one disease phenotype” paradigm for many cases, leading to the notion of a conceptual continuum ranging from monogenic disorders to oligogenic and polygenic diseases. An important step towards understanding non-Mendelian disorders was the creation of the Digenic Diseases Database (DIDA), collecting curated scientific information on digenic variant combinations involved in digenic diseases. Different machine learning methods aiming to tackle the cause of digenic diseases have successfully used DIDA as a benchmark dataset and have in turn been used in scientific studies analysing novel oligogenic cases. While this marked a new age of predictive tools and underlined the importance of DIDA, these advances also demonstrated the need to expand further along the genetic disease continuum, beyond digenic diseases, in a continuous and more careful manner. Moreover, a structured re-evaluation of the inclusion of oligogenic combinations in such a database and of their pathogenic link to diseases has become essential, in order to help researchers use high-quality and properly curated information when assessing their medical cases. We present OLIDA (https://olida.ibsquare.be/), the Oligogenic Diseases Database, which reinvents DIDA, containing newly and fully re-curated data and freely accessible information on oligogenic variant combinations, i.e. combinations of variants in multiple genes involved in an oligogenic disease, published in the scientific literature until February 2020. 
The database includes 916 oligogenic variant combinations, 192 of them involving more than two genes, linked to 159 genetic diseases. OLIDA provides, for the first time in the field, a structured protocol for the evaluation of the pathogenicity of each oligogenic combination, based on the genetic and functional evidence supporting it, paying special attention to their joint variant effect. The evidence is derived from a combination of the results presented in the scientific papers and information from knowledge databases, and is depicted with a confidence score. OLIDA further follows the FAIR principles on data management. To conclude, OLIDA is the first database containing oligogenic variant combinations and, for each, a confidence score of its pathogenic involvement in the associated disease. With this work, we are initiating the important discussion on how the evidence of pathogenicity related to oligogenic diseases should be reported and evaluated in the scientific literature, a concept that becomes increasingly important with the growing amount of data in the field. |
Piron, Anthony; Szymczak, Florian; Alvelos, Maria De Oliveira; Defrance, Matthieu; Lenaerts, Tom; Eizirik, Decio L.; Cnop, Miriam RedRibbon: A new rank-rank hypergeometric overlap pipeline to compare gene and transcript expression signatures Journal Article In: BioRxiv, 2022, (DOI: https://doi.org/10.1101/2022.08.31.505818). @article{info:hdl:2013/353212c, Motivation. High-throughput omics technologies have generated a wealth of large protein, gene and transcript datasets that have exacerbated the need for new methods to analyse and compare big datasets. Rank-rank hypergeometric overlap is an important threshold-free method to combine and visualize two ranked lists of P-values or fold-changes, usually from differential gene expression analyses. Here, we introduce a new rank-rank hypergeometric overlap-based method aimed at both gene-level and alternative splicing analyses at transcript or exon level, hitherto unreachable because transcript numbers are an order of magnitude larger than gene numbers. Results. We tested the tool on synthetic and real datasets at gene and transcript levels to detect correlation and anti-correlation patterns and found it to be fast and accurate, even on very large datasets, thanks to a minimal P-value search based on an evolutionary algorithm. The tool comes with a ready-to-use permutation scheme allowing the computation of adjusted P-values at low time cost. Additionally, the package is a drop-in replacement for previous packages, as a compatibility mode is included, allowing older studies to be re-run with close to no change to existing pipelines. RedRibbon holds the promise to accurately extricate detailed information from large analyses. Availability. RNA-sequencing datasets are available through the Gene Expression Omnibus (GEO) portal with accession numbers GSE159984, GSE133218, GSE137136, GSE98485, GSE148058 and GSE108413. 
The C libraries and R package code are open to the community under an open-source licence (GPL3) and available for download from GitHub: https://github.com/antpiron/ale, https://github.com/antpiron/cRedRibbon and https://github.com/antpiron/RedRibbon. |
Montero-Porras, Eladio; Grujić, Jelena; Domingos, Elias Fernandez; Lenaerts, Tom Inferring strategies from observations in long iterated Prisoner’s dilemma experiments Journal Article In: Scientific reports, vol. 12, no. 1, 2022, (DOI: 10.1038/s41598-022-11654-2). @article{info:hdl:2013/344327b, While many theoretical studies have revealed the strategies that could lead to and maintain cooperation in the Iterated Prisoner’s dilemma, less is known about what human participants actually do in this game and how strategies change when they are confronted with anonymous partners in each round. Previous attempts used short experiments, made different assumptions about possible strategies, and led to very different conclusions. We present here two long treatments that differ in the partner-matching strategy used, i.e. fixed or shuffled partners. We use unsupervised methods to cluster the players based on their actions and then a Hidden Markov Model to infer the memory-one strategies in each cluster. Analysis of the inferred strategies reveals that fixed-partner interaction leads to behavioral self-organization. Shuffled partners generate subgroups of memory-one strategies that remain entangled, apparently blocking the self-selection process that leads to fully cooperating participants in the fixed-partner treatment. Analyzing the latter in more detail shows that AllC, AllD, TFT- and WSLS-like behavior can be observed. This study also reveals that long treatments are needed, as experiments with fewer than 25 rounds mostly capture the learning phase that participants go through in these kinds of experiments. |
Montero-Porras, Eladio; Lenaerts, Tom; Gallotti, Riccardo; Grujić, Jelena Fast deliberation is related to unconditional behaviour in iterated Prisoners’ Dilemma experiments Journal Article In: Scientific Reports, vol. 12, no. 1, 2022, (DOI: 10.1038/s41598-022-24849-4). @article{info:hdl:2013/366631, People have different preferences for what they allocate to themselves and what they allocate to others in social dilemmas. These differences result from contextual reasons, intrinsic values, and social expectations. What is still an area of debate is whether these differences can be estimated from differences in each individual’s deliberation process. In this work, we analyse the participants’ reaction times in three different experiments of the Iterated Prisoner’s Dilemma with the Drift Diffusion Model, which links response times to the perceived difficulty of the decision task, the rate of accumulation of information (deliberation), and the intuitive attitudes towards the choices. The correlation between these results and the attitude of the participants towards the allocation of resources is then determined. We observe that individuals who allocated resources equally show more deliberation than highly cooperative or highly defective participants, who accumulate evidence more quickly to reach a decision. Evidence collection is also faster in fixed-neighbour settings than in shuffled ones. Consequently, fast decisions do not distinguish cooperators from defectors in these experiments, but appear to separate those who are more reactive to the behaviour of others from those who act categorically. |
Nachtegael, Charlotte; Gravel, Barbara; Dillen, Arnau; Smits, Guillaume; Nowé, Ann; Papadimitriou, Sofia; Lenaerts, Tom Scaling up oligogenic diseases research with OLIDA: The Oligogenic Diseases Database Journal Article In: Database, vol. 2022, 2022, (DOI: 10.1093/database/baac023). @article{info:hdl:2013/342417b, Improving the understanding of the oligogenic nature of diseases requires access to high-quality, well-curated Findable, Accessible, Interoperable, Reusable (FAIR) data. Although first steps were taken with the development of the Digenic Diseases Database, leading to novel computational advancements to assist the field, these were also linked with a number of limitations, for instance, the ad hoc curation protocol and the inclusion of only digenic cases. The OLIgogenic diseases DAtabase (OLIDA) presents a novel, transparent and rigorous curation protocol, introducing a confidence scoring mechanism for the published oligogenic literature. The application of this protocol on the oligogenic literature generated a new repository containing 916 oligogenic variant combinations linked to 159 distinct diseases. Information extracted from the scientific literature is supplemented with current knowledge support obtained from public databases. Each entry is an oligogenic combination linked to a disease, labelled with a confidence score based on the level of genetic and functional evidence that supports its involvement in this disease. These scores allow users to assess the relevance and proof of pathogenicity of each oligogenic combination in the database, constituting markers for reporting improvements on disease-causing oligogenic variant combinations. OLIDA follows the FAIR principles, providing detailed documentation, easy data access through its application programming interface and website, use of unique identifiers and links to existing ontologies. Database URL: https://olida.ibsquare.be |
Domingos, Elias Fernandez; Terrucha, Ines; Suchon, Remi; Grujić, Jelena; Burguillo, Juan J. C.; Santos, Francisco C.; Lenaerts, Tom Delegation to artificial agents fosters prosocial behaviors in the collective risk dilemma Journal Article In: Scientific reports, vol. 12, no. 1, 2022, (DOI: 10.1038/s41598-022-11518-9). @article{info:hdl:2013/349554b, Home assistant chat-bots, self-driving cars, drones and automated negotiation systems are some of the many examples of autonomous (artificial) agents that have pervaded our society. These agents enable the automation of multiple tasks, saving time and (human) effort. However, their presence in social settings raises the need for a better understanding of their effect on social interactions and of how they may be used to enhance cooperation towards the public good, instead of hindering it. To this end, we present an experimental study of human delegation to autonomous agents and hybrid human-agent interactions centered on a non-linear public goods dilemma with uncertain returns in which participants face a collective risk. Our aim is to understand experimentally whether the presence of autonomous agents has a positive or negative impact on social behaviour, equality and cooperation in such a dilemma. Our results show that cooperation and group success increase when participants delegate their actions to an artificial agent that plays on their behalf. Yet, this positive effect is less pronounced when humans interact in hybrid human-agent groups, where we mostly observe that humans in successful hybrid groups make higher contributions earlier in the game. We also show that participants wrongly believe that artificial agents will contribute less to the collective effort. 
In general, our results suggest that autonomous agents, when delegated to, have the potential to work as commitment devices, which both prevent the temptation to deviate to an alternative (less collectively good) course of action and limit responses based on betrayal aversion. |
Han, The Anh T. A. H.; Lenaerts, Tom; Santos, Francisco C.; Pereira, Luís Moniz Voluntary safety commitments provide an escape from over-regulation in AI development Journal Article In: Technology in society, vol. 68, 2022, (DOI: 10.1016/j.techsoc.2021.101843). @article{info:hdl:2013/339040, With the introduction of Artificial Intelligence (AI) and related technologies into our daily lives, fear and anxiety about their misuse, as well as about the biases incorporated into them during their creation, have led to a demand for governance and associated regulation. Yet regulating an innovation process that is not well understood may stifle this process and reduce the benefits that society may gain from the generated technology, even with the best intentions. Instruments to shed light on such processes are thus needed, as they can help ensure that imposed policies achieve the ambitions for which they were designed. Starting from a game-theoretical model that captures the fundamental dynamics of a race for domain supremacy using AI technology, we show how socially unwanted outcomes may be produced when sanctioning is applied unconditionally to risk-taking, i.e. potentially unsafe, behaviours. We demonstrate here the potential of a regulatory approach that combines a voluntary commitment approach reminiscent of soft law, wherein technologists have the freedom to choose between independently pursuing their course of action or establishing binding agreements to act safely, with either a peer or a governmental system sanctioning those that do not abide by what they pledged. As commitments are binding and sanctioned, they go beyond the classic view of soft law and are more closely akin to actual law-enforced regulation. Overall, this work reveals how voluntary but sanctionable commitments generate socially beneficial outcomes in all conceivable scenarios of a short-term race towards domain supremacy through AI technology. 
These results provide an original dynamic systems perspective on the governance potential of enforceable soft law techniques or co-regulatory mechanisms, showing how they may impact the ambitions of developers in the context of AI-based applications. |
Bizet, Martin; Defrance, Matthieu; Calonne, Emilie; Bontempi, Gianluca; Sotiriou, Christos; Fuks, François; Jeschke, Jana In: Epigenetics, vol. 17, no. 13, pp. 2434-2454, 2022, (DOI: 10.1080/15592294.2022.2135201). @article{info:hdl:2013/353467, Illumina Infinium DNA Methylation (5mC) arrays are a popular technology for low-cost, high-throughput, genome-scale measurement of 5mC distribution, especially in cancer and other complex diseases. After the success of its HumanMethylation450 array (450k), Illumina released the MethylationEPIC array (850k) featuring increased coverage of enhancers. Despite the widespread use of 850k, analysis of the corresponding data remains suboptimal: it still relies mostly on Illumina’s default annotation, which underestimates enhancers and long noncoding RNAs. Results: We have thus developed an approach, based on the ENCODE and LNCipedia databases, which greatly improves upon Illumina’s default annotation of enhancers and long noncoding transcripts. We compared the re-annotated 850k with both 450k and reduced-representation bisulphite sequencing (RRBS), another high-throughput 5mC profiling technology. We found 850k to cover at least three times as many enhancers and long noncoding RNAs as either 450k or RRBS. We further investigated the reproducibility of the three technologies, applying various normalization methods to the 850k data. Most of these methods reduced variability to a level below that of RRBS data. We then used 850k with our new annotation and normalization to profile 5mC changes in breast cancer biopsies. 850k highlighted aberrant enhancer methylation as the predominant feature, in agreement with previous reports. Our study provides an updated processing approach for 850k data, based on refined probe annotation and normalization, allowing for improved analysis of methylation at enhancers and long noncoding RNA genes. Our findings will help to further advance understanding of the DNA methylome in health and disease. |
Paldino, Gian Marco; Caro, Fabrizio De; Stefani, Jacopo De; Vaccaro, Alfredo A.; Villacci, Domenico D.; Bontempi, Gianluca A Digital Twin Approach for Improving Estimation Accuracy in Dynamic Thermal Rating of Transmission Lines Journal Article In: Energies, vol. 15, no. 6, 2022, (DOI: 10.3390/en15062254). @article{info:hdl:2013/342471b, The thermal capacity limit of transmission lines plays a crucial role in the safety and reliability of power systems. Dynamic thermal line rating approaches aim to estimate the transmission line’s temperature and assess its compliance with this limit. Existing physics-based standards estimate the temperature based on environmental and line conditions measured by several sensors. This manuscript shows that estimation accuracy can be improved by adopting a data-driven Digital Twin approach. The proposed method uses machine learning to learn the input–output relation between the physical sensor data and the actual conductor temperature, serving as a digital equivalent to physics-based standards. An experimental assessment on real data, comparing the proposed approach with the IEEE 738 standard, shows a 60% reduction in the Root Mean Squared Error and a decrease in the maximum estimation error from above 10 °C to below 7 °C. These preliminary results suggest that the Digital Twin provides more accurate and robust estimations, serving as a complement, or a potential alternative, to traditional methods. |
Cimpeanu, Theodor; Santos, Francisco C.; Pereira, Luís Marcelo; Lenaerts, Tom; Han, The Anh T. A. H. Artificial intelligence development races in heterogeneous settings Journal Article In: Scientific reports, vol. 12, no. 1, 2022, (DOI: 10.1038/s41598-022-05729-3). @article{info:hdl:2013/341515, |
Marquis, Bastien; Jansen, Maarten Information criteria bias correction for group selection Journal Article In: Statistical papers, vol. 63, no. 5, pp. 1387-1414, 2022, (Language of publication: fr). @article{info:hdl:2013/335472b, |
Simar, Cédric; Petit, Robin; Bozga, Nichita; Leroy, Axelle; Alvarez, Ana Maria Cebolla; Petieau, Mathieu; Bontempi, Gianluca; Chéron, Guy Riemannian classification of single-trial surface EEG and sources during checkerboard and navigational images in humans. Journal Article In: PloS one, vol. 17, no. 1, pp. e0262417, 2022, (DOI: 10.1371/journal.pone.0262417). @article{info:hdl:2013/366038b, Different visual stimuli are classically used to trigger visual evoked potentials comprising well-defined components linked to the content of the displayed image. These evoked components result from the averaging of ongoing EEG signals, in which additive and oscillatory mechanisms contribute to the component morphology. The evoked related potentials often result from a mixed situation (power variation and phase-locking), making basic and clinical interpretations difficult. Besides, the grand-average methodology produces artificial constructs that do not reflect individual peculiarities. This motivated new approaches based on single-trial analysis, as recently used in the brain-computer interface field. |
Jansen, Maarten Wavelets from a Statistical Perspective Book CRC Press, 2022, (Language of publication: fr). @book{info:hdl:2013/333285, |
Journal and Conference Publications
2023 |
ACROCPoLis: A Descriptive Framework for Making Sense of Fairness Proceedings Article In: Proceedings of the 6th ACM Conference on Fairness, Accountability, and Transparency, FAccT 2023, pp. 1014-1025, Association for Computing Machinery, 2023, (Conference: 6th ACM Conference on Fairness, Accountability, and Transparency(6: 12/6/2023-15/06/2023: Chicago)). |
ALAMBIC: Active Learning Automation with Methods to Battle Inefficient Curation Proceedings Article In: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pp. 117–127, Association for Computational Linguistics, 2023, (Conference: European Chapter of the Association for Computational Linguistics(17: 2 May 2023 to 4 May 2023: Dubrovnik, Croatia)). |
Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making Proceedings Article In: Proceedings of the 40th International Conference on Machine Learning: ICML’23, pp. 79-90, PMLR, 2023, (Conference: 40th International Conference on Machine Learning(Honolulu, Hawaii, USA)). |
Comments on: Statistical inference and large-scale multiple testing for high-dimensional regression models Journal Article In: Test, vol. 32, no. 4, pp. 1177-1179, 2023, (DOI: 10.1007/s11749-023-00896-5). |
Discussion on: “A scale-free approach for false discovery rate control in generalized linear models” by Dai, Lin, Xing, Liu. Journal Article In: Journal of the American Statistical Association, vol. 118, no. 543, pp. 1573-1577, 2023, (Language of publication: fr). |
Scale Transfer in 1849: Heinrich Schwabe to Rudolf Wolf Journal Article In: Solar physics, vol. 298, no. 1, pp. 1-12, 2023, (Language of publication: fr). |
2022 |
RedRibbon: A new rank-rank hypergeometric overlap pipeline to compare gene and transcript expression signatures Journal Article In: BioRxiv, 2022, (DOI: https://doi.org/10.1101/2022.08.31.505818). |
Identification of differentially methylated regions in rare diseases from a single-patient perspective Journal Article In: Clinical Epigenetics, vol. 14, no. 1, 2022, (DOI: 10.1186/s13148-022-01403-7). |
In: Epigenetics, vol. 17, no. 13, pp. 2434-2454, 2022, (DOI: 10.1080/15592294.2022.2135201). |
Exploiting Genomic Features to Improve the Prediction of Transcription Factor-Binding Sites in Plants. Journal Article In: Plant and Cell Physiology, vol. 63, no. 10, pp. 1457-1473, 2022, (DOI: 10.1093/pcp/pcac095). |
GNN-based embedding for clustering scRNA-seq data Journal Article In: Bioinformatics, vol. 38, no. 4, pp. 1037-1044, 2022, (DOI: 10.1093/bioinformatics/btab787). |
A knowledge graph approach for interpretable prediction of pathogenic genetic interactions Miscellaneous 2022, (Conference: European Conference on Computational Biology (ECCB) 2022 (2022-07: Sitges, Spain)). |
A New Approach to Handle Non-Stationarity in Collective Decision-Making Miscellaneous 2022, (Conference: ACM Collective Intelligence conference (CI)(Virtual)). |
Inferring Strategies from Observations in Long Iterated Prisoner’s Dilemma Experiments Miscellaneous 2022, (Conference: Complex Systems Conference 2022(17-21/10/2022: Palma de Mallorca, Spain)). |
Taking the prediction of pathogenic variant-combinations to the next level with VarCoPP2.0 Miscellaneous 2022, (Conference: European Conference on Computational Biology (21: 12-21 September 2022: Sitges, Barcelona)). |
Inferring Strategies from Observations in Long Iterated Prisoner’s Dilemma Experiments Miscellaneous 2022, (Conference: International Conference on Social Dilemmas(19-22/07/2022: Copenhagen, Denmark)). |
2022, (Conference: Rare Med Symposium(8-12-2022: Gent)). |
Bias Mitigation in Decision-Making with Expert Advice Miscellaneous 2022, (Conference: Benelux AI Conference (BNAIC) and Benelux machine learning conference (Benelearn)(7-9/11/2022: Antwerpen, Belgium)). |
A Novel Approach to Handle Non-stationarity in Collective Decision-Making with Experts Miscellaneous 2022, (Conference: ACM Collective Intelligence Conference 2022(20-21 October 2022: Online)). |
Identification of novel type 1 and type 2 diabetes genes by colocalisation of human islet eQTL and GWAS variants Miscellaneous 2022, (Conference: EASD Annual Meeting of the European Association for the Study of Diabetes(58th: 19 – 23 September 2022: Stockholm, Sweden)). |
The art of compensation: how hybrid teams solve collective risk dilemmas Miscellaneous 2022, (Conference: Adaptive and Learning Agents (ALA) Workshop(9-10/5/2022: Auckland, NZ)). |
Scaling up the oligogenic diseases research with OLIDA: the Oligogenic Diseases Database Miscellaneous 2022, (Conference: Genomics of Rare Disease 2022). |
Inferring strategies from observations in long iterated Prisoner’s dilemma experiments Journal Article In: Scientific Reports, vol. 12, no. 1, 2022, (DOI: 10.1038/s41598-022-11654-2). |
Fast deliberation is related to unconditional behaviour in iterated Prisoners’ Dilemma experiments Journal Article In: Scientific Reports, vol. 12, no. 1, 2022, (DOI: 10.1038/s41598-022-24849-4). |
Scaling up oligogenic diseases research with OLIDA: The Oligogenic Diseases Database Journal Article In: Database, vol. 2022, 2022, (DOI: 10.1093/database/baac023). |
Delegation to artificial agents fosters prosocial behaviors in the collective risk dilemma Journal Article In: Scientific Reports, vol. 12, no. 1, 2022, (DOI: 10.1038/s41598-022-11518-9). |
Voluntary safety commitments provide an escape from over-regulation in AI development Journal Article In: Technology in Society, vol. 68, 2022, (DOI: 10.1016/j.techsoc.2021.101843). |
A Digital Twin Approach for Improving Estimation Accuracy in Dynamic Thermal Rating of Transmission Lines Journal Article In: Energies, vol. 15, no. 6, 2022, (DOI: 10.3390/en15062254). |
Artificial intelligence development races in heterogeneous settings Journal Article In: Scientific Reports, vol. 12, no. 1, 2022, (DOI: 10.1038/s41598-022-05729-3). |
Information criteria bias correction for group selection Journal Article In: Statistical Papers, vol. 63, no. 5, pp. 1387-1414, 2022, (Language of publication: fr). |
Riemannian classification of single-trial surface EEG and sources during checkerboard and navigational images in humans Journal Article In: PLoS ONE, vol. 17, no. 1, pp. e0262417, 2022, (DOI: 10.1371/journal.pone.0262417). |
Wavelets from a Statistical Perspective Book CRC Press, 2022, (Language of publication: fr). |