One of our aims is to develop or improve methods for analyzing cancer molecular data. This includes methods for high-throughput sequencing datasets (WGS, WES, amplicon-seq, ChIP-seq, RNA-seq) and approaches for working with motifs in DNA sequences (motif discovery for TFBSs, analysis of repetitive elements).

Software developed by the members of the lab

Data integration and foundation models

  • CancerFoundation - a single-cell transcriptomics cancer foundation model

    Barkmann, et al. Proceedings of the NeurIPS workshop "New Frontiers of AI for Drug Discovery and Development", 2024. Link to the paper.

  • scTree - discovering cellular hierarchies in the presence of batch effects in scRNA-seq data

    M. Vandenhirtz, et al. Proceedings of the ICML 2024 Workshop on Structured Probabilistic Inference and Generative Modeling (SPIGM) & the ICML 2024 Workshop on Accessible and Efficient Foundation Models for Biological Discovery (AccMLBio), 2024. Link to the paper.


Spatial data analysis

  • DeepCMorph - histopathological image classification with cell morphology aware deep neural networks

    A. Ignatov, et al. Proceedings of The 9th IEEE Workshop on Computer Vision for Microscopy Image Analysis (CVMI) of CVPR 2024, 2024. Link to the paper.


Survival analysis for cancer patients

  • Sparsesurv - a Python package for fitting sparse survival models via knowledge distillation

    D. Wissel, et al. Bioinformatics, 2024, 40(9):btae521. Link to the paper.

  • SurvBoard - leaderboard to compare and validate multi-omics survival methods across 28 datasets

    D. Wissel, et al. bioRxiv. Link to the preprint.


Analysis of ChIP-seq data

  • CHIPIN - normalization of the ChIP-seq signal without spike-in data, when matched RNA-seq data are available

    L. Polit, et al. BMC Bioinformatics, 2021, 22: 407. Link to the paper

  • LILY - detection of super-enhancers in cancer samples

    V. Boeva, et al. Nature Genetics, 2017, 49(9):1408-1413. PMID: 28740262

  • HMCan - detection of chromatin modifications in ChIP-seq data (specifically in cancer genomes)

    Ashoor et al., Bioinformatics, 2013, 29 (23): 2979-2986. PMID: 24021381.

  • HMCan-diff - detection of differential chromatin modifications in ChIP-seq data (by correcting for copy number alterations, HMCan-diff allows comparison of different cancer cell lines or of cancer cells vs normal cells)

    Ashoor et al., Nucleic Acids Research, 2017, 45(8):e58. PMID: 28053124.

  • MICSA - detection of transcription factor binding sites in ChIP-seq data using information about de novo identified binding motifs

    Boeva et al., Nucleic Acids Research, 2010, 38(11):e126. PMID: 20375099.

  • Nebula - a web-server for advanced ChIP-seq data analysis

    Boeva et al., Bioinformatics, 2012, 28(19):2517-9. PMID: 22829625.


Analysis of DNA sequencing data (WGS, WES, ultra-deep targeted sequencing data)

  • FREEC & Control-FREEC - detection of copy number alterations (specifically in cancer genomes) using whole genome or whole exome sequencing data

    Boeva, et al., Bioinformatics, 2012, 28(3):423-5. PMID: 22155870.
    Boeva, et al., Bioinformatics, 2011, 27(2):268-9. PMID: 21081509.

  • QuantumClone - clonal reconstruction method for whole genome or whole exome sequencing data

    Deveau et al., Bioinformatics, 2018, 34(11): 1808-1816. Link to the paper

  • SV-Bay - structural variant detection in cancer genomes using a Bayesian approach with correction for GC-content and read mappability

    Iakovishina et al., Bioinformatics, 2016, 32(7):984-92. PMID: 26740523.

  • ONCOCNV - detection of copy number changes in high-depth/amplicon sequencing data

    Boeva et al., Bioinformatics, 2014, 30(24):3443-3450. PMID: 25016581.

  • SVDetect - detection of genomic structural variations from paired-end and mate-pair sequencing data (in collaboration with B. Zeitouni)

    Zeitouni, et al., Bioinformatics, 2010, 26(15):1895-6. PMID: 20639544.


Sequence analysis

  • ChIPmunk - de novo motif discovery (in collaboration with I. Kulakovskiy)

    Kulakovskiy, et al., Bioinformatics, 2010, 26(20):2622-3. PMID: 20736340.

  • AhoPro - evaluation of over-representation of one or more given motifs in DNA sequences

    Boeva et al., Algorithms for Molecular Biology, 2007, 2:13. PMID: 17927813.

  • TandemSwan - detection of fuzzy tandem repeats in DNA sequences

    Boeva et al., Bioinformatics, 2006, 22(6):676-84.


Data visualization

  • Feature Clock - visualization of high-dimensional effects in two-dimensional plots like UMAP and t-SNE

    Ovcharenko, et al., Proceedings of IEEE VIS 2024, 2024. Link to the paper

  • SegAnnDB - interactive genomic data segmentation framework (in collaboration with T.D. Hocking)

    Hocking, et al., Bioinformatics, 2014, 30(11):1539-1546. PMID: 24493034.


Simulation and mapping accuracy assessment of sequencing reads