Complete List of Works – Machine Learning Group

Lunghi, Daniele

Toward Robust Credit Card Fraud Detection: A Domain-Specific Study of Adversarial Machine Learning PhD Thesis

2025.

Njoku, Uchechukwu U. F.; Abelló, Alberto; Bilalli, Besim; Bontempi, Gianluca

On many-objective feature selection and the need for interpretability Journal Article

In: Expert systems with applications, vol. 267, 2025, (DOI: 10.1016/j.eswa.2024.126191).

Nachtegael, Charlotte

Active learning for biomedical relation extraction, the oligogenic use case PhD Thesis

2024.

@phdthesis{nachtegael2024active,

title = {Active learning for biomedical relation extraction, the oligogenic use case},

author = {Charlotte Nachtegael},

url = {https://difusion.ulb.ac.be/vufind/Record/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/375304/TOC},

year  = {2024},

date = {2024-06-28},

urldate = {2024-06-28},

abstract = {In a context where technological advancements have enabled increased availability of genetic data through high-throughput sequencing technologies, the complexity of genetic diseases has become increasingly apparent. Oligogenic diseases, characterised by a combination of genetic variants in two or more genes, have emerged as a crucial research area, challenging the traditional model of "one genotype, one phenotype". Thus, understanding the underlying mechanisms and genetic interactions of oligogenic diseases has become a major priority in biomedical research. This context underlines the importance of developing dedicated tools to study these complex diseases. Our first major contribution, OLIDA, is an innovative database designed to collect data on variant combinations responsible for these diseases, filling significant gaps in the current knowledge, which has focused up until now on digenic diseases. This resource, accessible via a web platform, adheres to FAIR principles and represents a significant advancement over its predecessor, DIDA, in terms of data curation and quality assessment. Furthermore, to support the biocuration of oligogenic diseases, we used active learning to construct DUVEL, a biomedical corpus focused on digenic variant combinations. To achieve this, we first investigated how to optimise these methods across numerous biomedical relation extraction datasets and developed a web-based platform, ALAMBIC, for text annotation using active learning. Our results and the quality of the corpus obtained demonstrate the effectiveness of active learning methods in biomedical relation annotation tasks. By establishing a curation pipeline for oligogenic diseases, as well as standards for integrating active learning methods into biocuration, our work represents a significant advancement in the field of biomedical natural language processing and the understanding of oligogenic diseases.

},

keywords = {},

pubstate = {published},

tppubtype = {phdthesis}

}

Verhelst, Theo

Causal and predictive modeling of customer churn – Lessons learned from empirical and theoretical research PhD Thesis

2024.

@phdthesis{nokey,

title = {Causal and predictive modeling of customer churn - Lessons learned from empirical and theoretical research},

author = {Theo Verhelst},

url = {https://difusion.ulb.ac.be/vufind/Record/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/368384/Holdings},

year  = {2024},

date = {2024-01-29},

urldate = {2024-01-29},

abstract = {Customer churn is an important concern for large companies, especially in the telecommunications sector. Customer retention campaigns are often used to mitigate churn, but targeting the right customers based on their historical profiles presents an important challenge. Companies usually have recourse to two data-driven approaches: churn prediction and uplift modeling. In churn prediction, customers are selected on the basis of their propensity to churn in the near future. In uplift modeling, only customers who react positively to the campaign are considered. Uplift modeling is used in various other domains, such as marketing, healthcare, and finance. Despite the theoretical appeal of uplift modeling, its added value with respect to conventional machine learning approaches has rarely been quantified in the literature.

This doctoral thesis is the result of a collaborative research project between the Machine Learning Group (ULB) and Orange Belgium, funded by Innoviris. This collaboration offers a unique research opportunity to assess the added value of causal-oriented strategies to address customer churn in the telecommunication sector. Following the introduction, we give the necessary background in probability theory, causality theory, and machine learning, and we describe the state of the art in uplift modeling and counterfactual identification. Then, we present the contributions of this thesis:

• An empirical comparison of various predictive and causal models for selecting customers in churn prevention campaigns. We perform several benchmarks of different state-of-the-art approaches on real-world datasets and in live campaigns with our industrial partner; we propose a new approach that exploits domain knowledge to improve predictions; and we make available the first public churn dataset for uplift modeling, whose unique characteristics make it more challenging than the few other public uplift datasets.

• Counterfactual identification allows one to classify the different behaviors of customers in response to a marketing incentive. This can be used to establish profiles of customers sensitive to the campaign, and subsequently improve marketing operations. We derive novel bounds and point estimators on the probability of counterfactual statements based on uplift models.

• A comprehensive comparison of predictive and uplift modeling, starting from firm theoretical foundations and highlighting the parameters that influence the performance of both approaches. In particular, we provide a new formulation of the measure of profit, a formal proof of the convergence of the uplift curve to the measure of profit, and an illustration, through simulations, of the conditions under which predictive approaches still outperform uplift modeling.

Our theoretical and empirical assessments of uplift modeling suggest that it often fails to deliver the anticipated advantages over predictive modeling, especially in scenarios such as customer churn within the telecom sector, characterized by class imbalance, limited separability, and cost-benefit considerations. These results are broadly aligned with the practical experience of our industrial partner and with the existing scientific literature. Our counterfactual probability estimators allow us to characterize customers at a level inaccessible to conventional predictive modeling, revealing new insights on the behavior and preferences of customers.},

keywords = {},

pubstate = {published},

tppubtype = {phdthesis}

}


Molinghen, Yannick; Avalos, Raphaël; Achter, Mark Van; Nowé, Ann; Lenaerts, Tom

Laser Learning Environment: A new environment for coordination-critical multi-agent tasks Proceedings Article

In: Oliehoek, Frans F. A.; Kok, Manon; Verwer, Sicco (Ed.): Artificial Intelligence and Machine Learning: Revised Selected Papers, Springer Science and Business Media Deutschland GmbH, 2024, (Conference: Benelux Conference on Artificial Intelligence, BNAIC (35: 8-10/11/2023: TU Delft)).

@inproceedings{info:hdl:2013/370546b,

title = {Laser Learning Environment: A new environment for coordination-critical multi-agent tasks},

author = {Yannick Molinghen and Raphaël Avalos and Mark Van Achter and Ann Nowé and Tom Lenaerts},

editor = {Frans F. A. Oliehoek and Manon Kok and Sicco Verwer},

url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/370546/4/2404.03596v1.pdf},

year  = {2024},

date = {2024-01-01},

booktitle = {Artificial Intelligence and Machine Learning: Revised Selected Papers},

publisher = {Springer Science and Business Media Deutschland GmbH},

series = {Communications in Computer and Information Science},

abstract = {We introduce the Laser Learning Environment (LLE), a collaborative multi-agent reinforcement learning environment where coordination is key. In LLE, agents depend on each other to make progress (interdependence), must jointly take specific sequences of actions to succeed (perfect coordination), and accomplishing those joint actions does not yield any intermediate reward (zero-incentive dynamics). The challenge of such problems lies in the difficulty of escaping state space bottlenecks caused by interdependence steps since escaping those bottlenecks is not rewarded. We test multiple state-of-the-art value-based MARL algorithms against LLE and show that they consistently fail at the collaborative task because of their inability to escape state space bottlenecks, even though they successfully achieve perfect coordination. We show that Q-learning extensions such as prioritised experience replay and n-steps return hinder exploration in environments with zero-incentive dynamics, and find that intrinsic curiosity with random network distillation is not sufficient to escape those bottlenecks. We demonstrate the need for novel methods to solve this problem and the relevance of LLE as cooperative MARL benchmark.},

note = {Conference: Benelux Conference on Artificial Intelligence, BNAIC (35: 8-10/11/2023: TU Delft)},

keywords = {},

pubstate = {published},

tppubtype = {inproceedings}

}

Leung, Chin Wing; Lenaerts, Tom; Turrini, Paolo

To Promote Full Cooperation in Social Dilemmas, Agents Need to Unlearn Loyalty Proceedings Article

In: Larson, Kate (Ed.): Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, pp. 111-119, International Joint Conferences on Artificial Intelligence (IJCAI) Organization, 2024, (Conference: International Joint Conference on Artificial Intelligence(33: 3/8-9/8/2024: Jeju, Korea)).

@inproceedings{info:hdl:2013/385907b,

title = {To Promote Full Cooperation in Social Dilemmas, Agents Need to Unlearn Loyalty},

author = {Chin Wing Leung and Tom Lenaerts and Paolo Turrini},

editor = {Kate Larson},

url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/385907/3/0013.pdf},

year  = {2024},

date = {2024-01-01},

booktitle = {Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence},

pages = {111-119},

publisher = {International Joint Conferences on Artificial Intelligence (IJCAI) Organization},

abstract = {If given the choice, what strategy should agents use to switch partners in strategic social interactions? While many analyses have been performed on specific switching heuristics, showing how and when these lead to more cooperation, no insights have been provided into which rule will actually be learnt by agents when given the freedom to do so. Starting from a baseline model that has demonstrated the potential of rewiring for cooperation, we provide answers to this question over the full spectrum of social dilemmas. Multi-agent Q-learning with Boltzmann exploration is used to learn when to sever or maintain an association. In both the Prisoner's Dilemma and the Stag Hunt games we observe that the Out-for-Tat rewiring rule, breaking ties with other agents choosing socially undesirable actions, becomes dominant, confirming at the same time that cooperation flourishes when rewiring is fast enough relative to imitation. Nonetheless, in the transitory region before full cooperation, a Stay strategy, keeping a connection at all costs, remains present, which shows that loyalty needs to be overcome for full cooperation to emerge. In conclusion, individuals learn cooperation-promoting rewiring rules but need to overcome a kind of loyalty to achieve full cooperation in the full spectrum of social dilemmas.},

note = {Conference: International Joint Conference on Artificial Intelligence(33: 3/8-9/8/2024: Jeju, Korea)},

keywords = {},

pubstate = {published},

tppubtype = {inproceedings}

}

Abels, Axel; Lenaerts, Tom; Trianni, Vito; Nowé, Ann

Dealing with Expert Bias in Collective Decision-making Miscellaneous

2024, (Conference: European Conference on Artificial Intelligence(27: 19/10-24/10/2024: Santiago de Compostela)).

Gravel, Barbara; Renaux, Alexandre; Papadimitriou, Sofia; Smits, Guillaume; Nowé, Ann; Lenaerts, Tom

Prioritization of variant combinations in whole exomes Miscellaneous

2024, (Conference: European Conference on Computational Biology(23: 16/09-20/09/2024: Turku, Finland)).

Bosch, Inas; Gravel, Barbara; Lenaerts, Tom

Knowledge graph embeddings for the prediction of pathogenic gene pairs Miscellaneous

2024, (Conference: European Conference on Computational Biology(23: 16/09-20/09/2024: Turku, Finland)).

Kirchsteiger, Georg; Lenaerts, Tom; Suchon, Remi

Growing cooperation Miscellaneous

2024, (Conference: Conference of the French Experimental Economics Association(14: Grenoble, France)).

Terrucha, Ines; Domingos, Elias Fernández; Suchon, Remi; Santos, Francisco C; Simoens, Pieter; Lenaerts, Tom

Humans program artificial delegates to accurately solve collective-risk dilemmas, but lack precision Miscellaneous

2024, (Conference: Machine+behavior Conference(Berlin, Germany)).

Rivière, Quentin; Raskin, Virginie; Melo, Romário; Boutet, Stéphanie; Corso, Massimiliano; Defrance, Matthieu; Webb, Alex A. R.; Verbruggen, Nathalie; Anoman, Djoro Armand

Effects of light regimes on circadian gene co‐expression networks in Arabidopsis thaliana Journal Article

In: Plant Direct, vol. 8, no. 8, 2024, (DOI: 10.1002/pld3.70001).

@article{info:hdl:2013/384388,

title = {Effects of light regimes on circadian gene co‐expression networks in Arabidopsis thaliana},

author = {Quentin Rivière and Virginie Raskin and Romário Melo and Stéphanie Boutet and Massimiliano Corso and Matthieu Defrance and Alex A. R. Webb and Nathalie Verbruggen and Djoro Armand Anoman},

url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/384388/3/Plant-Direct-2024.pdf},

year  = {2024},

date = {2024-01-01},

journal = {Plant Direct},

volume = {8},

number = {8},

abstract = {Light/dark (LD) cycles are responsible for oscillations in gene expression, which modulate several aspects of plant physiology. Those oscillations can persist under constant conditions due to regulation by the circadian oscillator. The response of the transcriptome to light regimes is dynamic and allows plants to adapt rapidly to changing environmental conditions. We compared the transcriptome of Arabidopsis under LD and constant light (LL) for 3 days and identified different gene co‐expression networks in the two light regimes. Our studies yielded unforeseen insights into circadian regulation. Intuitively, we anticipated that gene clusters regulated by the circadian oscillator would display oscillations under LD cycles. However, we found transcripts encoding components of the flavonoid metabolism pathway that were rhythmic in LL but not in LD. We also discovered that the expressions of many stress‐related genes were significantly increased during the dark period in LD relative to the subjective night in LL, whereas the expression of these genes in the light period was similar. The nocturnal pattern of these stress‐related gene expressions suggested a form of “skotoprotection.” The transcriptomics data were made available in a web application named Cyclath, which we believe will be a useful tool to contribute to a better understanding of the impact of light regimes on plants.},

note = {DOI: 10.1002/pld3.70001},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Terrucha, Ines; Domingos, Elias Fernández; Simoens, Pieter; Lenaerts, Tom

Committing to the wrong artificial delegate in a collective-risk dilemma is better than directly committing mistakes Journal Article

In: Scientific reports, vol. 14, no. 1, 2024, (DOI: 10.1038/s41598-024-61153-9).

Kirchsteiger, Georg; Lenaerts, Tom; Suchon, Remi

Voluntary versus mandatory information disclosure in the sequential prisoner’s dilemma Journal Article

In: Economic theory, 2024, (DOI: 10.1007/s00199-024-01563-y).

Nachtegael, Charlotte; Stefani, Jacopo De; Cnudde, Anthony; Lenaerts, Tom

DUVEL: an active-learning annotated biomedical corpus for the recognition of oligogenic combinations Journal Article

In: Database, vol. 2024, no. 2024, 2024, (DOI: 10.1093/database/baae039).

@article{info:hdl:2013/374632,

title = {DUVEL: an active-learning annotated biomedical corpus for the recognition of oligogenic combinations},

author = {Charlotte Nachtegael and Jacopo De Stefani and Anthony Cnudde and Tom Lenaerts},

url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/374632/1/doi_358276.pdf},

year  = {2024},

date = {2024-01-01},

journal = {Database},

volume = {2024},

number = {2024},

abstract = {While biomedical relation extraction (bioRE) datasets have been instrumental in the development of methods to support biocuration of single variants from texts, no datasets are currently available for the extraction of digenic or even oligogenic variant relations, despite the reports in literature that epistatic effects between combinations of variants in different loci (or genes) are important to understand disease etiologies. This work presents the creation of a unique dataset of oligogenic variant combinations, geared to train tools to help in the curation of scientific literature. To overcome the hurdles associated with the number of unlabelled instances and the cost of expertise, active learning (AL) was used to optimize the annotation, thus getting assistance in finding the most informative subset of samples to label. By pre-annotating 85 full-text articles containing the relevant relations from the Oligogenic Diseases Database (OLIDA) with PubTator, text fragments featuring potential digenic variant combinations, i.e. gene–variant–gene–variant, were extracted. The resulting fragments of texts were annotated with ALAMBIC, an AL-based annotation platform. The resulting dataset, called DUVEL, is used to fine-tune four state-of-the-art biomedical language models: BiomedBERT, BiomedBERT-large, BioLinkBERT and BioM-BERT. More than 500 000 text fragments were considered for annotation, finally resulting in a dataset with 8442 fragments, 794 of them being positive instances, covering 95% of the original annotated articles. When applied to gene–variant pair detection, BiomedBERT-large achieves the highest F1 score (0.84) after fine-tuning, demonstrating significant improvement compared to the non-fine-tuned model, underlining the relevance of the DUVEL dataset. This study shows how AL may play an important role in the creation of bioRE datasets relevant for biomedical curation applications.
DUVEL provides a unique biomedical corpus focusing on 4-ary relations between two genes and two variants. It is made freely available for research on GitHub and Hugging Face. Database URL: https://huggingface.co/datasets/cnachteg/duvel or https://doi.org/10.57967/hf/1571},

note = {DOI: 10.1093/database/baae039},

keywords = {},

pubstate = {published},

tppubtype = {article}

}


Lillepea, Kristiina; Juchnewitsch, Anna Grete; Kasak, Laura; Valkna, Anu; Dutta, Avirup; Pomm, Kristjan; Poolamets, Olev; Nagirnaja, Liina; Tamp, Erik; Mahyari, Eisa; Vihljajev, Vladimir; Tjagur, Stanislav; Papadimitriou, Sofia; Riera-Escamilla, Antoni; Versbraegen, Nassim; Farnetani, Ginevra; Castillo-Madeen, Helen; Sütt, Mailis; Kübarsepp, Viljo; Tennisberg, Sven; Korrovits, Paul; Krausz, Csilla; Aston, Kenneth Ivan; Lenaerts, Tom; Conrad, Donald D. F.; Punab, Margus; Laan, Maris

Toward clinical exomes in diagnostics and management of male infertility Journal Article

In: American journal of human genetics, vol. 111, no. 5, pp. 877-895, 2024, (DOI: 10.1016/j.ajhg.2024.03.013).

@article{info:hdl:2013/374767,

title = {Toward clinical exomes in diagnostics and management of male infertility},

author = {Kristiina Lillepea and Anna Grete Juchnewitsch and Laura Kasak and Anu Valkna and Avirup Dutta and Kristjan Pomm and Olev Poolamets and Liina Nagirnaja and Erik Tamp and Eisa Mahyari and Vladimir Vihljajev and Stanislav Tjagur and Sofia Papadimitriou and Antoni Riera-Escamilla and Nassim Versbraegen and Ginevra Farnetani and Helen Castillo-Madeen and Mailis Sütt and Viljo Kübarsepp and Sven Tennisberg and Paul Korrovits and Csilla Krausz and Kenneth Ivan Aston and Tom Lenaerts and Donald D. F. Conrad and Margus Punab and Maris Laan},

url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/374767/3/Lillepeaetal.pdf},

year  = {2024},

date = {2024-01-01},

journal = {American journal of human genetics},

volume = {111},

number = {5},

pages = {877-895},

abstract = {Infertility, affecting ∼10% of men, is predominantly caused by primary spermatogenic failure (SPGF). We screened likely pathogenic and pathogenic (LP/P) variants in 638 candidate genes for male infertility in 521 individuals presenting idiopathic SPGF and 323 normozoospermic men in the ESTAND cohort. Molecular diagnosis was reached for 64 men with SPGF (12%), with findings in 39 genes (6%). The yield did not differ significantly between the subgroups with azoospermia (20/185, 11%), oligozoospermia (18/181, 10%), and primary cryptorchidism with SPGF (26/155, 17%). Notably, 19 of 64 LP/P variants (30%) identified in 28 subjects represented recurrent findings in this study and/or with other male infertility cohorts. NR5A1 was the most frequently affected gene, with seven LP/P variants in six SPGF-affected men and two normozoospermic men. The link to SPGF was validated for recently proposed candidate genes ACTRT1, ASZ1, GLUD2, GREB1L, LEO1, RBM5, ROS1, and TGIF2LY. Heterozygous truncating variants in BNC1, reported in female infertility, emerged as plausible causes of severe oligozoospermia. Data suggested that several infertile men may present congenital conditions with less pronounced or pleiotropic phenotypes affecting the development and function of the reproductive system. Genes regulating the hypothalamic-pituitary-gonadal axis were affected in >30% of subjects with LP/P variants. Six individuals had more than one LP/P variant, including five with two findings from the gene panel. A 4-fold increased prevalence of cancer was observed in men with genetic infertility compared to the general male population (8% vs. 2%; p = 4.4 × 10−3). Expanding genetic testing in andrology will contribute to the multidisciplinary management of SPGF.},

note = {DOI: 10.1016/j.ajhg.2024.03.013},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Gravel, Barbara; Renaux, Alexandre; Papadimitriou, Sofia; Smits, Guillaume; Nowé, Ann; Lenaerts, Tom

Prioritization of oligogenic variant combinations in whole exomes Journal Article

In: Bioinformatics, vol. 40, no. 4, 2024, (DOI: 10.1093/bioinformatics/btae184).

@article{info:hdl:2013/374647,

title = {Prioritization of oligogenic variant combinations in whole exomes},

author = {Barbara Gravel and Alexandre Renaux and Sofia Papadimitriou and Guillaume Smits and Ann Nowé and Tom Lenaerts},

url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/374647/1/doi_358291.pdf},

year  = {2024},

date = {2024-01-01},

journal = {Bioinformatics},

volume = {40},

number = {4},

abstract = {Motivation: Whole exome sequencing (WES) has emerged as a powerful tool for genetic research, enabling the collection of a tremendous amount of data about human genetic variation. However, properly identifying which variants are causative of a genetic disease remains an important challenge, often due to the number of variants that need to be screened. Expanding the screening to combinations of variants in two or more genes, as would be required under the oligogenic inheritance model, simply blows this problem out of proportion. Results: We present here the High-throughput oligogenic prioritizer (Hop), a novel prioritization method that uses direct oligogenic information at the variant, gene and gene pair level to detect digenic variant combinations in WES data. This method leverages information from a knowledge graph, together with specialized pathogenicity predictions in order to effectively rank variant combinations based on how likely they are to explain the patient’s phenotype. The performance of Hop is evaluated in cross-validation on 36 120 synthetic exomes for training and 14 280 additional synthetic exomes for independent testing. Whereas the known pathogenic variant combinations are found in the top 20 in approximately 60% of the cross-validation exomes, 71% are found in the same ranking range when considering the independent set. These results provide a significant improvement over alternative approaches that depend simply on a monogenic assessment of pathogenicity, including early attempts for digenic ranking using monogenic pathogenicity scores.},

note = {DOI: 10.1093/bioinformatics/btae184},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Terrucha, Ines; Domingos, Elias Fernández; Santos, Francisco C.; Simoens, Pieter; Lenaerts, Tom

The art of compensation: How hybrid teams solve collective-risk dilemmas Journal Article

In: PloS one, vol. 19, no. 2, 2024, (DOI: 10.1371/journal.pone.0297213).

@article{info:hdl:2013/371876,

title = {The art of compensation: How hybrid teams solve collective-risk dilemmas},

author = {Ines Terrucha and Elias Fernández Domingos and Francisco C. Santos and Pieter Simoens and Tom Lenaerts},

url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/371876/1/doi_355520.pdf},

year  = {2024},

date = {2024-01-01},

journal = {PloS one},

volume = {19},

number = {2},

abstract = {It is widely known how the human ability to cooperate has influenced the thriving of our species. However, as we move towards a hybrid human-machine future, it is still unclear how the introduction of artificial agents in our social interactions affects this cooperative capacity. In a one-shot collective risk dilemma, where enough members of a group must cooperate in order to avoid a collective disaster, we study the evolutionary dynamics of cooperation in a hybrid population. In our model, we consider a hybrid population composed of both adaptive and fixed behavior agents. The latter serve as proxies for the machine-like behavior of artificially intelligent agents who implement stochastic strategies previously learned offline. We observe that the adaptive individuals adjust their behavior as a function of the presence of artificial agents in their groups to compensate for their cooperative efforts (or lack thereof). We also find that risk plays a determinant role when assessing whether or not we should form hybrid teams to tackle a collective risk dilemma. When the risk of collective disaster is high, cooperation in the adaptive population falls dramatically in the presence of cooperative artificial agents. A story of compensation, rather than cooperation, where adaptive agents have to secure group success when the artificial agents are not cooperative enough, but will rather not cooperate if the others do so. On the contrary, when risk of collective disaster is low, success is highly improved while cooperation levels within the adaptive population remain the same. Artificial agents can improve the collective success of hybrid teams. However, their application requires a true risk assessment of the situation in order to actually benefit the adaptive population (i.e. the humans) in the long-term.},

note = {DOI: 10.1371/journal.pone.0297213},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Lenaerts, Tom; Saponara, Marco; Pacheco, Jorge J. M.; Santos, Francisco C.

Evolution of a theory of mind Journal Article

In: iScience, vol. 27, no. 2, 2024, (DOI: 10.1016/j.isci.2024.108862).

Stefanija, Ana Pop; Buelens, Bart; Goesaert, Elfi; Lenaerts, Tom; Pierson, Jean René; Bussche, Jan Van

Toward a Solid Acceptance of the Decentralized Web of Personal Data: Societal and Technological Convergence Journal Article

In: Communications of the ACM, vol. 67, no. 1, pp. 43-46, 2024, (DOI: 10.1145/3624555).

Juchnewitsch, Anna Grete; Pomm, Kristjan; Dutta, Avirup; Tamp, Erik; Valkna, Anu; Lillepea, Kristiina; Mahyari, Eisa; Tjagur, Stanislav; Belova, Galina; Kübarsepp, Viljo; Castillo-Madeen, Helen; Riera-Escamilla, Antoni; Põlluaas, Lisanna; Nagirnaja, Liina; Poolamets, Olev; Vihljajev, Vladimir; Sütt, Mailis; Versbraegen, Nassim; Papadimitriou, Sofia; McLachlan, Robert Ian; Jarvi, Keith Allen; Schlegel, Peter P. N.; Tennisberg, Sven; Korrovits, Paul; Vigh-Conrad, Katinka; O’Bryan, Moira M. K.; Aston, Kenneth Ivan; Lenaerts, Tom; Conrad, Donald D. F.; Kasak, Laura; Punab, Margus; Laan, Maris

Undiagnosed RASopathies in infertile men Journal Article

In: Frontiers in endocrinology, vol. 15, 2024, (DOI: 10.3389/fendo.2024.1312357).

@article{info:hdl:2013/374860,

title = {Undiagnosed RASopathies in infertile men},

author = {Anna Grete Juchnewitsch and Kristjan Pomm and Avirup Dutta and Erik Tamp and Anu Valkna and Kristiina Lillepea and Eisa Mahyari and Stanislav Tjagur and Galina Belova and Viljo Kübarsepp and Helen Castillo-Madeen and Antoni Riera-Escamilla and Lisanna Põlluaas and Liina Nagirnaja and Olev Poolamets and Vladimir Vihljajev and Mailis Sütt and Nassim Versbraegen and Sofia Papadimitriou and Robert Ian McLachlan and Keith Allen Jarvi and Peter P. N. Schlegel and Sven Tennisberg and Paul Korrovits and Katinka Vigh-Conrad and Moira M. K. O’Bryan and Kenneth Ivan Aston and Tom Lenaerts and Donald D. F. Conrad and Laura Kasak and Margus Punab and Maris Laan},

url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/374860/1/doi_358504.pdf},

year  = {2024},

date = {2024-01-01},

journal = {Frontiers in endocrinology},

volume = {15},

abstract = {RASopathies are syndromes caused by congenital defects in the Ras/mitogen-activated protein kinase (MAPK) pathway genes, with a population prevalence of 1 in 1,000. Patients are typically identified in childhood based on diverse characteristic features, including cryptorchidism (CR) in >50% of affected men. As CR predisposes to spermatogenic failure (SPGF; total sperm count per ejaculate 0–39 million), we hypothesized that men seeking infertility management include cases with undiagnosed RASopathies. Likely pathogenic or pathogenic (LP/P) variants in 22 RASopathy-linked genes were screened in 521 idiopathic SPGF patients (including 155 CR cases) and 323 normozoospermic controls using exome sequencing. All 844 men were recruited to the ESTonian ANDrology (ESTAND) cohort and underwent identical andrological phenotyping. RASopathy-specific variant interpretation guidelines were used for pathogenicity assessment. LP/P variants were identified in PTPN11 (two), SOS1 (three), SOS2 (one), LZTR1 (one), SPRED1 (one), NF1 (one), and MAP2K1 (one). The findings affected six of 155 cases with CR and SPGF, three of 366 men with SPGF only, and one (of 323) normozoospermic subfertile man. The subgroup “CR and SPGF” had over 13-fold enrichment of findings compared to controls (3.9% vs. 0.3%; Fisher’s exact test},

note = {DOI: 10.3389/fendo.2024.1312357},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Attafi, Omar Abdelghani; Clementel, Damiano; Kyritsis, Konstantinos; Capriotti, Emidio; Farrell, Gavin; Fragkouli, Styliani-Christina; Castro, Leyla Jael; Hatos, András; Lenaerts, Tom; Mazurenko, Stanislav; Mozaffari, Soroush; Pradelli, Franco; Ruch, Patrick; Savojardo, Castrense; Turina, Maria Paola; Zambelli, Federico; Piovesan, Damiano; Monzon, Alexander Miguel; Psomopoulos, Fotis F. E.; Tosatto, Silvio S. C. E.

DOME Registry: implementing community-wide recommendations for reporting supervised machine learning in biology Journal Article

In: GigaScience, vol. 13, pp. 8, 2024, (DOI: 10.1093/gigascience/giae094).

@article{info:hdl:2013/385906,

title = {DOME Registry: implementing community-wide recommendations for reporting supervised machine learning in biology},

author = {Omar Abdelghani Attafi and Damiano Clementel and Konstantinos Kyritsis and Emidio Capriotti and Gavin Farrell and Styliani-Christina Fragkouli and Leyla Jael Castro and András Hatos and Tom Lenaerts and Stanislav Mazurenko and Soroush Mozaffari and Franco Pradelli and Patrick Ruch and Castrense Savojardo and Maria Paola Turina and Federico Zambelli and Damiano Piovesan and Alexander Miguel Monzon and Fotis F. E. Psomopoulos and Silvio S. C. E. Tosatto},

url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/385906/3/giae094-2.pdf},

year  = {2024},

date = {2024-01-01},

journal = {GigaScience},

volume = {13},

pages = {8},

abstract = {Supervised machine learning (ML) is used extensively in biology and deserves closer scrutiny. The Data Optimization Model Evaluation (DOME) recommendations aim to enhance the validation and reproducibility of ML research by establishing standards for key aspects such as data handling and processing, optimization, evaluation, and model interpretability. The recommendations help to ensure that key details are reported transparently by providing a structured set of questions. Here, we introduce the DOME registry (URL: registry.dome-ml.org), a database that allows scientists to manage and access comprehensive DOME-related information on published ML studies. The registry uses external resources like ORCID, APICURON, and the Data Stewardship Wizard to streamline the annotation process and ensure comprehensive documentation. By assigning unique identifiers and DOME scores to publications, the registry fosters a standardized evaluation of ML methods. Future plans include continuing to grow the registry through community curation, improving the DOME score definition and encouraging publishers to adopt DOME standards, and promoting transparency and reproducibility of ML in the life sciences.},

note = {DOI: 10.1093/gigascience/giae094},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Colot, Martin; Simar, Cédric; Petieau, Mathieu; Alvarez, Ana Maria Cebolla; Chéron, Guy; Bontempi, Gianluca

EMG subspace alignment and visualization for cross-subject hand gesture classification Miscellaneous

2024, (Conference: ECML-PKDD 2023 Workshop – Adapting to change: Reliable Learning Across Domains (2023-09-18: Turin)).

Cerqueira, Vitor; Torgo, Luis; Bontempi, Gianluca

Instance-based meta-learning for conditionally dependent univariate multi-step forecasting Journal Article

In: International journal of forecasting, 2024, (DOI: 10.1016/j.ijforecast.2023.12.010).

Simar, Cédric; Colot, Martin; Alvarez, Ana Maria Cebolla; Petieau, Mathieu; Chéron, Guy; Bontempi, Gianluca

Machine learning for hand pose classification from phasic and tonic EMG signals during bimanual activities in virtual reality Journal Article

In: Frontiers in Neuroscience, vol. 18, 2024, (DOI: 10.3389/fnins.2024.1329411).

@article{info:hdl:2013/373455,

title = {Machine learning for hand pose classification from phasic and tonic EMG signals during bimanual activities in virtual reality},

author = {Cédric Simar and Martin Colot and Ana Maria Cebolla Alvarez and Mathieu Petieau and Guy Chéron and Gianluca Bontempi},

url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/373455/1/doi_357099.pdf},

year  = {2024},

date = {2024-01-01},

journal = {Frontiers in Neuroscience},

volume = {18},

abstract = {Myoelectric prostheses have recently shown significant promise for restoring hand function in individuals with upper limb loss or deficiencies, driven by advances in machine learning and increasingly accessible bioelectrical signal acquisition devices. Here, we first introduce and validate a novel experimental paradigm using a virtual reality headset equipped with hand-tracking capabilities to facilitate the recordings of synchronized EMG signals and hand pose estimation. Using both the phasic and tonic EMG components of data acquired through the proposed paradigm, we compare hand gesture classification pipelines based on standard signal processing features, convolutional neural networks, and covariance matrices with Riemannian geometry computed from raw or xDAWN-filtered EMG signals. We demonstrate the performance of the latter for gesture classification using EMG signals. We further hypothesize that introducing physiological knowledge in machine learning models will enhance their performances, leading to better myoelectric prosthesis control. We demonstrate the potential of this approach by using the neurophysiological integration of the “move command” to better separate the phasic and tonic components of the EMG signals, significantly improving the performance of sustained posture recognition. These results pave the way for the development of new cutting-edge machine learning techniques, likely refined by neurophysiology, that will further improve the decoding of real-time natural gestures and, ultimately, the control of myoelectric prostheses.},

note = {DOI: 10.3389/fnins.2024.1329411},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Paldino, Gian Marco; Lebichot, Bertrand; Borgne, Yann-Aël Le; Siblini, Wissam; Oblé, Frédéric; Boracchi, Giacomo; Bontempi, Gianluca

The role of diversity and ensemble learning in credit card fraud detection Journal Article

In: Advances in Data Analysis and Classification, vol. 18, no. 1, pp. 193-217, 2024, (DOI: 10.1007/s11634-022-00515-5).

Lebichot, Bertrand; Siblini, Wissam; Paldino, Gian Marco; Borgne, Yann-Aël Le; Oblé, Frédéric; Bontempi, Gianluca

Assessment of catastrophic forgetting in continual credit card fraud detection Journal Article

In: Expert systems with applications, vol. 249, 2024, (DOI: 10.1016/j.eswa.2024.123445).

Jansen, Maarten; Claeskens, G.

Nonparametric estimation Book Chapter

In: 2024, (Language of publication: fr).

Jansen, Maarten; Claeskens, Gerda

The Cramér-Rao Lower Bound Book Chapter

In: 2024, (Language of publication: fr).

Bhattacharya, Shreya; Lefèvre, Laure; Chatzistergos, T; Hayakawa, Hisashi; Jansen, Maarten

Rudolf Wolf to Alfred Wolfer: The Transfer of the Reference Observer in the International Sunspot Number Series (1876–1893) Journal Article

In: Solar physics, vol. 299, 2024, (Language of publication: fr).

Jansen, Maarten

Information criteria for structured parameter selection in high dimensional tree and graph models Journal Article

In: Digital signal processing, vol. 148, 2024, (Language of publication: fr).

Tubella, Andrea Aler; Mollo, Dimitri Coelho; Lindström, Adam Dahlgren; Devinney, Hannah; Dignum, Virginia; Ericson, Petter; Jonsson, Ana; Kampik, Timotheus; Lenaerts, Tom; Mendez, Julian Alfredo; Nieves, Juan Carlos

ACROCPoLis: A Descriptive Framework for Making Sense of Fairness Proceedings Article

In: Proceedings of the 6th ACM Conference on Fairness, Accountability, and Transparency, FAccT 2023, pp. 1014-1025, Association for Computing Machinery, 2023, (Conference: 6th ACM Conference on Fairness, Accountability, and Transparency (6: 12/6/2023-15/06/2023: Chicago)).

Abels, Axel; Lenaerts, Tom; Trianni, Vito; Nowe, Ann

Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making Proceedings Article

In: Proceedings of the 40th International Conference on Machine Learning: ICML’23, pp. 79-90, PMLR, 2023, (Conference: 40th International Conference on Machine Learning (Honolulu Hawaii USA)).

Versbraegen, Nassim; Gravel, Barbara; Nachtegael, Charlotte; Renaux, Alexandre; Verkinderen, Emma; Nowé, Ann; Lenaerts, Tom; Papadimitriou, Sofia

Faster and more accurate pathogenic combination predictions with VarCoPP2.0 Miscellaneous

2023, (Conference: Benelux AI Conference (BNAIC) / Benelux Machine Learning Conference (Benelearn) (8-10/11/2023: Delft, the Netherlands)).

Cheron, Julian; Beccari, Leonardo; Hague, Perrine; Icick, Romain; Despontin, Chloé; Carusone, Teresa; Defrance, Matthieu; Bhogaraju, Sagar; Martin-Garcia, Elena; Capellan, Roberto; Maldonado, Rafael; Vorspan, Florence; Bonnefont, Jérôme; d’Exaerde, Alban

USP7/Maged1-mediated H2A monoubiquitination in the paraventricular thalamus: an epigenetic mechanism involved in cocaine use disorder Journal Article

In: Nature communications, vol. 14, no. 1, 2023, (DOI: 10.1038/s41467-023-44120-2).

Piron, Anthony; Szymczak, Florian; Papadopoulou, Theodora; Alvelos, Maria Inês; Defrance, Matthieu; Lenaerts, Tom; Eizirik, Décio L; Cnop, Miriam

RedRibbon: A new rank–rank hypergeometric overlap for gene and transcript expression signatures Journal Article

In: Life science alliance, vol. 7, no. 2, pp. e202302203, 2023, (DOI: 10.26508/lsa.202302203).

Terrucha, Ines; Domingos, Elias Fernandez; Simoens, Pieter; Lenaerts, Tom

To avoid collective disasters, it is better to commit to a flawed AI than to commit the errors ourselves Miscellaneous

2023, (Conference: Evolutionary Dynamics in social, cooperative and hybrid AI workshop (Krakow)).

Giuili, Edoardo; Grolaux, Robin; Macedo, Catarina Z N M CZNM; Desmyter, Laurence; Pichon, Bruno; Neuens, Sebastian; Vilain, Catheline; Olsen, Catharina; Dooren, Sonia Van; Smits, Guillaume; Defrance, Matthieu

Comprehensive evaluation of the implementation of episignatures for diagnosis of neurodevelopmental disorders (NDDs). Journal Article

In: Human genetics, 2023, (DOI: 10.1007/s00439-023-02609-2).

@article{info:hdl:2013/364749,

title = {Comprehensive evaluation of the implementation of episignatures for diagnosis of neurodevelopmental disorders (NDDs).},

author = {Edoardo Giuili and Robin Grolaux and Catarina Z N M CZNM Macedo and Laurence Desmyter and Bruno Pichon and Sebastian Neuens and Catheline Vilain and Catharina Olsen and Sonia Van Dooren and Guillaume Smits and Matthieu Defrance},

url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/364749},

year  = {2023},

date = {2023-01-01},

journal = {Human genetics},

abstract = {Episignatures are popular tools for the diagnosis of rare neurodevelopmental disorders. They are commonly based on a set of differentially methylated CpGs used in combination with a support vector machine model. DNA methylation (DNAm) data often include missing values due to changes in data generation technology and batch effects. While many normalization methods exist for DNAm data, their impact on episignature performance have never been assessed. In addition, technologies to quantify DNAm evolve quickly and this may lead to poor transposition of existing episignatures generated on deprecated array versions to new ones. Indeed, probe removal between array versions, technologies or during preprocessing leads to missing values. Thus, the effect of missing data on episignature performance must also be carefully evaluated and addressed through imputation or an innovative approach to episignatures design. In this paper, we used data from patients suffering from Kabuki and Sotos syndrome to evaluate the influence of normalization methods, classification models and missing data on the prediction performances of two existing episignatures. We compare how six popular normalization methods for methylarray data affect episignature classification performances in Kabuki and Sotos syndromes and provide best practice suggestions when building new episignatures. In this setting, we show that Illumina, Noob or Funnorm normalization methods achieved higher classification performances on the testing sets compared to Quantile, Raw and Swan normalization methods. We further show that penalized logistic regression and support vector machines perform best in the classification of Kabuki and Sotos syndrome patients. 
Then, we describe a new paradigm to build episignatures based on the detection of differentially methylated regions (DMRs) and evaluate their performance compared to classical differentially methylated cytosines (DMCs)-based episignatures in the presence of missing data. We show that the performance of classical DMC-based episignatures suffers from the presence of missing data more than the DMR-based approach. We present a comprehensive evaluation of how the normalization of DNA methylation data affects episignature performance, using three popular classification models. We further evaluate how missing data affect those models' predictions. Finally, we propose a novel methodology to develop episignatures based on differentially methylated regions identification and show how this method slightly outperforms classical episignatures in the presence of missing data.},

note = {DOI: 10.1007/s00439-023-02609-2},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Terrucha, Ines; Domingos, Elias Fernandez; Suchon, Remi; Santos, Francisco C; Simoens, Pieter; Lenaerts, Tom

Delegation to autonomous agents: a key to overcome past failure and focus on the collective target ahead Miscellaneous

2023, (Conference: the 9th International Conference on Computational Social Science (IC2S2) (17-19/07/2023: Copenhagen)).

Araujo, Natalia Souza; Perez, Rémy; Willot, Quentin; Defrance, Matthieu; Aron, Serge

Facing lethal temperatures: Heat-shock response in desert and temperate ants. Journal Article

In: Ecology and evolution, vol. 13, no. 9, pp. e10438, 2023, (DOI: 10.1002/ece3.10438).

@article{info:hdl:2013/366010,

title = {Facing lethal temperatures: Heat-shock response in desert and temperate ants.},

author = {Natalia Souza Araujo and Rémy Perez and Quentin Willot and Matthieu Defrance and Serge Aron},

url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/366010/3/Araujo-2023.pdf},

year  = {2023},

date = {2023-01-01},

journal = {Ecology and evolution},

volume = {13},

number = {9},

pages = {e10438},

abstract = {Global climate changes may cause profound effects on species adaptation, particularly in ectotherms for whom even moderate warmer temperatures can lead to disproportionate heat failure. Still, several organisms evolved to endure high desert temperatures. Here, we describe the thermal tolerance survival and the transcriptomic heat stress response of three genera of desert (Cataglyphis, Melophorus, and Ocymyrmex) and two of temperate ants (Formica and Myrmica) and explore convergent and specific adaptations. We found heat stress led to either a reactive or a constitutive response in desert ants: Cataglyphis holgerseni and Melophorus bagoti differentially regulated very few transcripts in response to heat (0.12% and 0.14%, respectively), while Cataglyphis bombycina and Ocymyrmex robustior responded with greater expression alterations (respectively affecting 0.6% and 1.53% of their transcriptomes). These two responsive mechanisms-reactive and constitutive-were related to individual thermal tolerance survival and convergently evolved in distinct desert ant genera. Moreover, in comparison with desert species, the two temperate ants differentially expressed thousands of transcripts more in response to heat stress (affecting 8% and 12.71% of F. fusca and Myr. sabuleti transcriptomes). In summary, we show that heat adaptation in thermophilic ants involved changes in the expression response. Overall, desert ants show reduced transcriptional alterations even when under high thermal stress, and their expression response may be either constitutive or reactive to temperature increase.},

note = {DOI: 10.1002/ece3.10438},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Hardy, Alexis; Duharcourt, Sandra; Defrance, Matthieu

DNA Modification Patterns Filtering and Analysis Using DNAModAnnot. Journal Article

In: Methods in molecular biology, vol. 2624, pp. 87-114, 2023, (DOI: 10.1007/978-1-0716-2962-8_7).

Nachtegael, Charlotte; Stefani, Jacopo De; Lenaerts, Tom

ALAMBIC: Active Learning Automation Methods to Battle Inefficient Curation Miscellaneous

2023, (Conference: European Chapter of the Association for Computational Linguistics: System Demonstrations (17: Dubrovnik)).

Gravel, Barbara; Papadimitriou, Sofia; Nachtegael, Charlotte; Baere, Elfride De; Loeys, Bart; Vikkula, Miikka; Smits, Guillaume; Lenaerts, Tom

The importance of good data quality and proper pathogenicity reporting in the medical genetics field: the case of oligogenic diseases Miscellaneous

2023, (Conference: Genomics of Rare Disease (17: 24-26/04/2023: Wellcome Genome Campus, UK)).

Bosch, Inas; Renaux, Alexandre; Gravel, Barbara; Lenaerts, Tom

Knowledge graph embeddings for the prediction of pathogenic gene pairs Miscellaneous

2023, (Conference: Benelux AI Conference (BNAIC) (8-10/11/2023: Delft, the Netherlands)).

Abels, Axel; Lenaerts, Tom; Nowé, Ann

Mitigating Biases and Reward Uncertainty in Collective Decision-Making Miscellaneous

2023, (Conference: 7th Annual Center for Human-Compatible AI Workshop (7: 16-18/06/2023: Pacific Grove, California, USA)).

Nachtegael, Charlotte; Stefani, Jacopo De; Lenaerts, Tom

A study of deep active learning methods to reduce labelling efforts in biomedical relation extraction Journal Article

In: PloS one, vol. 18, no. 12, pp. e0292356, 2023, (DOI: 10.1371/journal.pone.0292356).

@article{info:hdl:2013/366625,

title = {A study of deep active learning methods to reduce labelling efforts in biomedical relation extraction},

author = {Charlotte Nachtegael and Jacopo De Stefani and Tom Lenaerts},

url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/366625/3/journal.pone.0292356.pdf},

year  = {2023},

date = {2023-01-01},

journal = {PloS one},

volume = {18},

number = {12},

pages = {e0292356},

abstract = {Automatic biomedical relation extraction (bioRE) is an essential task in biomedical research in order to generate high-quality labelled data that can be used for the development of innovative predictive methods. However, building such fully labelled, high quality bioRE data sets of adequate size for the training of state-of-the-art relation extraction models is hindered by an annotation bottleneck due to limitations on time and expertise of researchers and curators. We show here how Active Learning (AL) plays an important role in resolving this issue and positively improve bioRE tasks, effectively overcoming the labelling limits inherent to a data set. Six different AL strategies are benchmarked on seven bioRE data sets, using PubMedBERT as the base model, evaluating their area under the learning curve (AULC) as well as intermediate results measurements. The results demonstrate that uncertainty-based strategies, such as Least-Confident or Margin Sampling, are statistically performing better in terms of F1-score, accuracy and precision, than other types of AL strategies. However, in terms of recall, a diversity-based strategy, called Core-set, outperforms all strategies. AL strategies are shown to reduce the annotation need (in order to reach a performance at par with training on all data), from 6% to 38%, depending on the data set; with Margin Sampling and Least-Confident Sampling strategies moreover obtaining the best AULCs compared to the Random Sampling baseline. We show through the experiments the importance of using AL methods to reduce the amount of labelling needed to construct high-quality data sets leading to optimal performance of deep learning models. The code and data sets to reproduce all the results presented in the article are available at https://github.com/oligogenic/Deep_active_learning_bioRE .},

note = {DOI: 10.1371/journal.pone.0292356},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Renaux, Alexandre; Terwagne, Chloé CT; Cochez, Michael; Tiddi, Ilaria; Nowe, Ann; Lenaerts, Tom

A knowledge graph approach to predict and interpret disease-causing gene interactions Journal Article

In: BMC bioinformatics, vol. 24, no. 1, 2023, (DOI: 10.1186/s12859-023-05451-5).

@article{info:hdl:2013/363454,

title = {A knowledge graph approach to predict and interpret disease-causing gene interactions},

author = {Alexandre Renaux and Chloé CT Terwagne and Michael Cochez and Ilaria Tiddi and Ann Nowe and Tom Lenaerts},

url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/363454/1/doi_347098.pdf},

year  = {2023},

date = {2023-01-01},

journal = {BMC bioinformatics},

volume = {24},

number = {1},

abstract = {Background: Understanding the impact of gene interactions on disease phenotypes is increasingly recognised as a crucial aspect of genetic disease research. This trend is reflected by the growing amount of clinical research on oligogenic diseases, where disease manifestations are influenced by combinations of variants on a few specific genes. Although statistical machine-learning methods have been developed to identify relevant genetic variant or gene combinations associated with oligogenic diseases, they rely on abstract features and black-box models, posing challenges to interpretability for medical experts and impeding their ability to comprehend and validate predictions. In this work, we present a novel, interpretable predictive approach based on a knowledge graph that not only provides accurate predictions of disease-causing gene interactions but also offers explanations for these results. Results: We introduce BOCK, a knowledge graph constructed to explore disease-causing genetic interactions, integrating curated information on oligogenic diseases from clinical cases with relevant biomedical networks and ontologies. Using this graph, we developed a novel predictive framework based on heterogenous paths connecting gene pairs. This method trains an interpretable decision set model that not only accurately predicts pathogenic gene interactions, but also unveils the patterns associated with these diseases. A unique aspect of our approach is its ability to offer, along with each positive prediction, explanations in the form of subgraphs, revealing the specific entities and relationships that led to each pathogenic prediction. Conclusion: Our method, built with interpretability in mind, leverages heterogenous path information in knowledge graphs to predict pathogenic gene interactions and generate meaningful explanations. 
This not only broadens our understanding of the molecular mechanisms underlying oligogenic diseases, but also presents a novel application of knowledge graphs in creating more transparent and insightful predictors for genetic research.},

note = {DOI: 10.1186/s12859-023-05451-5},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Abels, Axel; Lenaerts, Tom; Trianni, Vito; Nowé, Ann

Dealing with expert bias in collective decision-making Journal Article

In: Artificial intelligence, vol. 320, pp. 103921, 2023, (DOI: 10.1016/j.artint.2023.103921).

Domingos, Elias Fernandez; Santos, Francisco C; Lenaerts, Tom

EGTtools: Evolutionary game dynamics in Python Journal Article

In: iScience, vol. 26, no. 4, pp. 106419, 2023, (DOI: 10.1016/j.isci.2023.106419).

Jacquemin, Valérie; Versbraegen, Nassim; Duerinckx, Sarah; Massart, Annick; Soblet, Julie; Perazzolo, Camille; Deconinck, Nicolas; Brischoux-Boucher, Elise; Leener, Anne De; Revencu, Nicole; Janssens, Sandra; Moorgat, Stèphanie; Blaumeiser, Bettina; Avela, Kristiina; Touraine, Renaud; Jaoude, Imad Abou; Keymolen, Kathelijn; Saugier-Veber, P.; Lenaerts, Tom; Abramowicz, Marc; Pirson, Isabelle

Congenital hydrocephalus: new Mendelian mutations and evidence for oligogenic inheritance. Journal Article

In: Human genomics, vol. 17, no. 1, pp. 16, 2023, (DOI: 10.1186/s40246-023-00464-w).
