Recent rapid advances in machine learning, coupled with the development of novel tools for molecular profiling of cancer tissues, provide new opportunities to better understand the mechanisms of tumorigenesis. In particular, progress is achieved by learning from so-called "omics" data (genomic, transcriptomic, proteomic, metabolomic, etc.) to extract biomarkers of survival and response to treatment. Various techniques, from classical statistical methods to more advanced deep learning tools, are becoming increasingly effective at extracting these biomarkers and thus improving clinical decisions. Our group centers its research on multi-omics data integration and develops methods for omics data analysis, with a specific focus on biological questions that have the potential to influence current cancer treatment practices. The success of our mission rests on a deep understanding of biological data, data generation procedures, and the underlying molecular processes in cancer cells.

Aims and Objectives
Driven by biological questions, we aim to understand the heterogeneity and plasticity of cancer cells, which can lead to treatment resistance and cancer relapse. In particular, we seek to discover how cancer cells hijack mechanisms of transcriptional control and thus gain oncogenic and treatment resistance properties encoded in the human genome. The ability to correctly formalize these biological questions in computational and mathematical terms and, together with biologists and clinicians, to design experiments and trials is paramount to achieving our goal.

1. Methods for modeling effects of genetic changes in cancer. Many studies, including ours, have highlighted the effects of coding mutations on cancer cell phenotypes. However, the contribution of noncoding mutations, which constitute more than 98% of somatic mutations in cancer DNA, remains much less studied. We have now developed a method to predict 3D interactions between active DNA elements based on their nucleotide sequence [1]. In this work, we take the epigenetic context into account to make cell type-specific predictions relevant to studying the effects of mutations in a particular cancer type or tissue.



2. Methods for molecular signal deconvolution. "Omics" data are generally obtained from a mixed population of malignant and non-malignant cells constituting tumors. In addition, malignant cells of different patients (and often even within the same tumor) present a certain degree of signal variation (this is what we refer to as heterogeneity). Signal deconvolution, which allows us to extract molecular properties specific to cancer cells, is one of the major challenges of working with bulk tumor data (production of bulk data is labor- and cost-efficient compared to single-cell data and can be implemented for cancer patients in clinics). In our group, we have successfully developed methods to detect molecular (omics) characteristics of cancer cells from tumor samples largely infiltrated with non-malignant cells, e.g., cells from blood vessels and immune and stromal cells found in human tumors [2, 3, 4]. Taking a step further and analyzing the cancer-specific deconvolved signals then allowed us to gain insights into certain oncogenic mechanisms, for instance, to address the causes and consequences of DNA hypermethylation in 19 human cancer types [4].
Currently, our group is developing methods for the signal deconvolution of bulk genomic data. The first project aims to build a time- and memory-efficient solution for accurately estimating absolute gene copy numbers and genotypes from whole-genome or whole-exome DNA sequencing experiments of bulk human tumors. Our second ongoing project on signal deconvolution aims to characterize shared intratumor transcriptional heterogeneity from bulk RNA data without any reference profiles used as additional input (project supported by an SNF project grant).
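To illustrate the basic idea behind signal deconvolution, here is a minimal sketch in Python. It is not one of the methods cited above: it assumes a simple two-component linear mixture (bulk = purity x tumor + (1 - purity) x normal), a known tumor purity, and an available normal reference profile, all of which are simplifying assumptions made for illustration. The function name deconvolve_tumor_signal and its parameters are hypothetical.

```python
import numpy as np

def deconvolve_tumor_signal(bulk, normal_ref, purity):
    """Recover a tumor-specific signal from a bulk measurement.

    Assumes (for illustration only) a two-component linear mixture:
        bulk = purity * tumor + (1 - purity) * normal_ref
    with signals on a [0, 1] scale (e.g., methylation beta values).
    """
    bulk = np.asarray(bulk, dtype=float)
    normal_ref = np.asarray(normal_ref, dtype=float)
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Subtract the expected contribution of non-malignant cells,
    # then rescale by the fraction of malignant cells.
    tumor = (bulk - (1.0 - purity) * normal_ref) / purity
    # Measurement noise can push estimates outside the valid range.
    return np.clip(tumor, 0.0, 1.0)

# Toy example: simulate a bulk sample with 60% tumor purity.
tumor_true = np.array([0.9, 0.1, 0.5])
normal = np.array([0.2, 0.2, 0.5])
bulk = 0.6 * tumor_true + 0.4 * normal
tumor_est = deconvolve_tumor_signal(bulk, normal, purity=0.6)
```

In practice, purity is usually unknown and must be estimated jointly with the tumor signal, and reference-free approaches such as the one mentioned above avoid the normal reference altogether; this sketch only shows why infiltrating cells distort the bulk signal and how a mixture model can correct for them.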



3. Methods for data integration and survival models for prediction of clinical outcome. Due to high data generation costs, many cancer research projects use only one or two data modalities (genomic, epigenetic, transcriptomic, or imaging). Yet, the multi-layer integration of omics data has the potential to provide a bigger picture of the molecular processes driving cancer progression. Therefore, we are developing a methodology to build interpretable survival and treatment response models based on multi-level omics data (project partially supported by the SNF Sinergia'2022). We have been exploring multi-task learning and group/network regularization [5], and we have proposed a knowledge distillation approach in the context of survival analysis [6]. Importantly, another mission of our team is to provide general standards for single- and multi-omics survival analysis models [7, 8].
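As a rough illustration of what group regularization in a survival model looks like, the sketch below evaluates a Cox negative log partial likelihood (Breslow form) with a group lasso penalty over predefined feature groups, such as pathways. It is a generic textbook formulation written for this page, not the implementation from [5] or the package in [6]; all function names, the sqrt(group size) weighting, and the toy data are assumptions.

```python
import numpy as np

def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Negative log partial likelihood of a Cox model (Breslow form)."""
    eta = X @ beta  # linear predictor (log relative hazard) per patient
    # at_risk[i, j] is True if patient j is still at risk at time[i].
    at_risk = time[None, :] >= time[:, None]
    # Numerically stable log of the summed hazards over each risk set.
    m = eta.max()
    log_risk = m + np.log((at_risk * np.exp(eta - m)[None, :]).sum(axis=1))
    # Only patients with an observed event contribute a term.
    return -np.sum(event * (eta - log_risk))

def group_lasso_penalty(beta, groups, lam):
    """Sum of Euclidean norms per group, weighted by sqrt(group size)."""
    return lam * sum(np.sqrt(len(idx)) * np.linalg.norm(beta[idx]) for idx in groups)

def objective(beta, X, time, event, groups, lam):
    """Penalized objective that a solver would minimize over beta."""
    beta = np.asarray(beta, dtype=float)
    return (cox_neg_log_partial_likelihood(beta, X, time, event)
            + group_lasso_penalty(beta, groups, lam))

# Toy data: 3 patients, 3 features, two hypothetical feature groups.
X = np.array([[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 2.0, 1.0]])
time = np.array([1.0, 2.0, 3.0])
event = np.array([1, 1, 0])          # 1 = event observed, 0 = censored
groups = [[0, 1], [2]]               # e.g., features grouped by pathway
value_at_zero = objective(np.zeros(3), X, time, event, groups, lam=1.0)
```

Because the penalty acts on the norm of each group rather than on individual coefficients, minimizing this objective tends to set entire groups to zero, which is what makes the resulting models interpretable at the pathway level.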



To sum up, we lead truly interdisciplinary projects at the forefront of computational cancer research. While choosing state-of-the-art computational algorithms and developing new methods, we pay great attention to the underlying biological question and go into the details of the modeled molecular processes. We investigate the causes of heterogeneity and plasticity of malignant cells, linking this phenomenon to genetic events, the spatial composition of the tumor microenvironment, and downstream consequences for patient survival and treatment response. The long-term goal of our research group is to provide functional insights into cancer development and progression that can be translated into predictive models and then used to guide the treatment of patients based on their molecular profiles. Currently, we are interested in understanding the oncogenic processes related to several cancer types: esophageal adenocarcinoma, mesothelioma, lung cancer, neuroblastoma, adrenocortical carcinoma, Ewing sarcoma, and lymphoma. We are also open to collaborations with research groups studying other types of cancer.




References

  1. UniversalEPI: harnessing attention mechanisms to decode chromatin interactions in rare and unexplored cell types. A. Grover, L. Zhang, T. Muser, S. Haefliger, M. Wang, F.J. Theis, I.L. Ibarra, E. Krymova, V. Boeva. bioRxiv. doi: https://doi.org/10.1101/2024.11.22.624813. Link to the paper
  2. QuantumClone: Clonal assessment of functional mutations in cancer based on a genotype-aware method for clonal reconstruction. P. Deveau, L. Colmet Daage, D. Oldridge, V. Bernard, A. Bellini, M. Chicard, N. Clement, E. Lapouble, V. Combaret, A. Boland, V. Meyer, J.-F. Deleuze, I. Janoueix-Lerosey, E. Barillot, O. Delattre, J. Maris, G. Schleiermacher, V. Boeva. Bioinformatics, 2018 Jan 12 [Epub ahead of print]. doi: 10.1093/bioinformatics/bty016. PMID: 29342233. Link to the paper
  3. SV-Bay: structural variant detection in cancer genomes using a Bayesian approach with correction for GC-content and read mappability. D. Iakovishina, I. Janoueix-Lerosey, E. Barillot, M. Regnier, V. Boeva. Bioinformatics, 2016, 32(7):984-992. PMID: 26740523. Link to the paper
  4. Deciphering the etiology and role in oncogenic transformation of the CpG island methylator phenotype (CIMP): a pan-cancer analysis. J. Yates, V. Boeva. Briefings in Bioinformatics, 2022, 23(2):bbab610. Link to the paper
  5. Exploring pathway-based group lasso for cancer survival analysis: a special case of multi-task learning. G. Malenova, D. Rowson, V. Boeva. Frontiers in Genetics, 2021, 12:771301. doi: 10.3389/fgene.2021.771301. Link to the paper
  6. Sparsesurv: a Python package for fitting sparse survival models via knowledge distillation. D. Wissel, N. Janakarajan, J. Schulte, D. Rowson, X. Yuan, V. Boeva. Bioinformatics, 2024, 40(9):btae521. Link to the paper
  7. Systematic comparison of multi-omics survival models reveals a widespread lack of noise resistance. D. Wissel, D. Rowson, V. Boeva. Cell Reports Methods, 2023. doi: https://doi.org/10.1016/j.crmeth.2023.100461. Link to the paper
  8. SurvBoard: standardised benchmarking for multi-omics cancer survival models. D. Wissel, N. Janakarajan, A. Grover, E. Toniato, M. Rodriguez-Martinez, V. Boeva. bioRxiv. Link to the paper, Link to the leaderboard

Complete list of publications: Link

Complete list of developed software: Link

We offer several internship projects to bachelor's and master's students. Please check this page for more information!