Control-FREEC: Control-FREE Copy Number and Genotype Caller
Prediction of copy numbers and allelic content using deep-sequencing data
Introduction
Control-FREEC is a tool for detecting copy-number changes and allelic imbalances (including LOH) in deep-sequencing data. It was originally developed by the
Bioinformatics Laboratory of Institut Curie (Paris); nowadays, Control-FREEC is supported by the team of
Valentina Boeva at Institut Cochin, Inserm (Paris).
Control-FREEC automatically computes, normalizes and segments copy number and beta allele frequency (BAF) profiles, then calls copy number alterations and LOH.
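To make the two profiles concrete, here is a minimal Python sketch, assuming per-SNP allele counts and per-window read counts are already available. It computes the BAF at one SNP as the fraction of reads carrying the alternative (B) allele, and a sample/control ratio per window rescaled so that its median is 1. This is not Control-FREEC's implementation: the program additionally corrects for GC content (and optionally mappability) and segments the profiles, all of which is omitted here. The function names are hypothetical.

```python
from statistics import median
from typing import List, Optional, Sequence


def beta_allele_frequency(ref_count: int, alt_count: int) -> Optional[float]:
    """Fraction of reads supporting the B (alternative) allele at one SNP."""
    total = ref_count + alt_count
    if total == 0:
        return None  # no informative reads: BAF is undefined
    return alt_count / total


def normalized_ratios(
    sample_counts: Sequence[int], control_counts: Sequence[int]
) -> List[Optional[float]]:
    """Sample/control read-count ratio per window, rescaled so the median ratio is 1.

    Windows with zero control reads get None instead of a ratio.
    """
    raw = [s / c if c > 0 else None for s, c in zip(sample_counts, control_counts)]
    defined = [r for r in raw if r is not None]
    if not defined:
        raise ValueError("no window has control reads")
    scale = median(defined)
    if scale == 0:
        raise ValueError("median ratio is zero; cannot normalize")
    return [r / scale if r is not None else None for r in raw]
```

In a diploid sample, a BAF near 0.5 at a heterozygous SNP together with a ratio near 1 is what a normal region looks like, while BAF values pulled toward 0 and 1 point to allelic imbalance or LOH.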
The control (matched normal) sample is optional for whole genome sequencing data but mandatory for whole exome or targeted sequencing data.
For whole genome sequencing data analysis, the program can also use mappability data (files created by GEM).
Starting from v8.0, Control-FREEC can detect subclonal gains and losses and estimate the most likely average ploidy of the sample. The procedure for estimating tumor purity has also been improved.
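A common way to see why purity, ploidy and subclonality have to be estimated together is a two-component mixture model: a fraction p of the cells (the tumor) carries copy number c at a locus, and the remaining 1 - p normal cells carry 2 copies. The Python sketch below computes the expected read-depth ratio under that model, normalized by the average copy number of the whole sample. It is a textbook model given for illustration, not necessarily the exact model Control-FREEC fits; the function name is hypothetical.

```python
def expected_ratio(tumor_copy_number: float, purity: float, tumor_ploidy: float = 2.0) -> float:
    """Expected normalized read-depth ratio at a locus in a tumor/normal cell mixture.

    purity: fraction of tumor cells in the sample (0..1).
    tumor_copy_number: copy number of the locus in tumor cells.
    tumor_ploidy: average copy number of the tumor genome.
    Normal cells are assumed to be diploid.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    normal_fraction = 1.0 - purity
    # Average number of copies of this locus per cell in the mixture.
    local_copies = purity * tumor_copy_number + normal_fraction * 2.0
    # Average number of copies per cell over the whole genome of the mixture.
    genome_average = purity * tumor_ploidy + normal_fraction * 2.0
    return local_copies / genome_average
```

For example, a single-copy gain (c = 3) in a diploid tumor at 50% purity gives `expected_ratio(3, 0.5) == 1.25`, exactly the ratio of a fully pure sample in which only half of the tumor cells carry the gain (average copy number 2.5, `expected_ratio(2.5, 1.0)`). This ambiguity is one reason subclonal calls depend on a good purity estimate.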
Input for CNA detection: aligned single-end, paired-end or mate-pair data in SAM, BAM or SAMtools pileup format.
Control-FREEC also accepts gzipped (.gz) files. Support for Eland, BED, SOAP, arachne, psl (BLAT) and Bowtie formats was discontinued in v8.0.
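If you script many runs, it can help to derive the input format from file names. The Python sketch below, assuming conventional extensions (.sam, .bam, .pileup, optionally followed by .gz), returns the value you would put in the config's inputFormat option. The option name and the values SAM, BAM and pileup are assumed here to follow the Control-FREEC manual; verify them against the documentation for your version. The function name is hypothetical.

```python
from pathlib import PurePath

# Extension -> inputFormat value (assumed; verify against the Control-FREEC manual).
FORMAT_BY_EXTENSION = {".sam": "SAM", ".bam": "BAM", ".pileup": "pileup"}


def guess_input_format(filename: str) -> str:
    """Map a read file name to a Control-FREEC input format, ignoring a trailing .gz."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]  # compression does not change the underlying format
    if not suffixes or suffixes[-1] not in FORMAT_BY_EXTENSION:
        raise ValueError(f"cannot infer a supported input format from {filename!r}")
    return FORMAT_BY_EXTENSION[suffixes[-1]]
```

Files with any other extension, including the discontinued formats (for example, .bed), raise an error instead of being silently misread.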
Input for CNA+LOH detection: there are two options: (a) provide aligned reads in SAMtools pileup format (files can be gzipped); or (b) provide BAM files together with the options "makePileup" and "fastaFile" (see How to create a config file?).
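For option (b), the config file pairs BAM inputs with the [BAF] options makePileup and fastaFile. The Python sketch below writes such a file with the standard configparser module. Apart from makePileup, fastaFile and window, which appear on this page, the section and option names ([general], [sample], [control], [BAF], chrLenFile, ploidy, outputDir, maxThreads, mateFile, inputFormat, mateOrientation, SNPfile) are assumed from the Control-FREEC manual, and all paths and values are placeholders; see How to create a config file? for the authoritative list.

```python
import configparser
import io


def build_freec_config(
    sample_bam: str,
    control_bam: str,
    snp_positions_bed: str,
    snp_file: str,
    genome_fasta: str,
    chr_len_file: str,
    output_dir: str,
    threads: int = 6,
) -> str:
    """Return the text of a Control-FREEC config for BAM input with BAF/LOH analysis."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep camelCase option names (configparser lowercases by default)
    config["general"] = {
        "chrLenFile": chr_len_file,
        "ploidy": "2",
        "window": "50000",  # illustrative WGS window size; exome data uses window = 0
        "outputDir": output_dir,
        "maxThreads": str(threads),
    }
    for section, bam in (("sample", sample_bam), ("control", control_bam)):
        config[section] = {
            "mateFile": bam,
            "inputFormat": "BAM",
            "mateOrientation": "FR",  # assumed: Illumina paired-end reads
        }
    config["BAF"] = {
        "makePileup": snp_positions_bed,  # SNP positions at which pileups are built from the BAMs
        "fastaFile": genome_fasta,
        "SNPfile": snp_file,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(build_freec_config(
    "tumor.bam", "normal.bam", "snp_positions.bed",
    "hg19_snp142.SingleDiNucl.1based.txt", "hg19.fa", "hg19.len", "freec_out",
))
```

Setting `optionxform = str` matters here: without it, configparser would write `chrlenfile` instead of `chrLenFile`.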
Output: Regions of gain, loss and LOH, normalized copy number and BAF profiles.
Starting from v5.0, Control-FREEC can be used on exome-sequencing data. Starting from v8.0, read counts for exome data are calculated per exon rather than per window (set "window=0").
Starting from v6.0, Control-FREEC can run on multiple threads. For example, 30x-coverage WGS data with a control (i.e., two pileup.gz files) is fully processed (CNA and LOH) in one hour using 6 threads.
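Per-exon counting can be pictured as follows. The Python sketch assigns each read to the exon containing its start position, assuming sorted, non-overlapping, 0-based half-open exon intervals on one chromosome (for example, taken from the capture-region BED file). How Control-FREEC itself handles reads that straddle exon boundaries is not described here, so treat this as an illustration; the function name is hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def count_reads_per_exon(
    exons: Sequence[Tuple[int, int]], read_starts: Sequence[int]
) -> List[int]:
    """Count reads whose start falls inside each exon.

    exons: sorted, non-overlapping (start, end) half-open intervals on one chromosome.
    """
    exon_starts = [start for start, _ in exons]
    counts = [0] * len(exons)
    for pos in read_starts:
        i = bisect_right(exon_starts, pos) - 1  # last exon starting at or before pos
        if i >= 0 and pos < exons[i][1]:        # inside that exon, not past its end
            counts[i] += 1
    return counts
```

Binary search keeps the lookup at O(log n) per read, which matters for exome designs with hundreds of thousands of targets.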
Control-FREEC publications
- Control-FREEC: a tool for assessing copy number and allelic content using next generation sequencing data. V. Boeva, T. Popova, K. Bleakley, P. Chiche, I. Janoueix-Lerosey, O. Delattre and E. Barillot.
Bioinformatics, 2012, 28(3):423-5. PMID: 22155870.
CNA detection part of Control-FREEC (simply FREEC)
- Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. V. Boeva, A. Zinovyev, K. Bleakley, J.-P. Vert, I. Janoueix-Lerosey, O. Delattre and E. Barillot.
Bioinformatics, 2011, 27(2):268-9. PMID: 21081509.
LOH detection part of Control-FREEC
Downloads
Starting from v5.7, Windows is no longer supported. However, you can still download Control-FREEC v5.6 for Windows 32-bit (archive with a binary version (Win32)) or contact me for support. Download the latest release of Control-FREEC for Linux from its GitHub page:
- Linux 64-bit: Download and unpack the archive (Linux 64bit). Contains a binary version of FREEC.
Download test datasets:
- Data for HCC1143 and HCC1143-BL (from Chiang et al., 2009) to test CNA predictions: test.zip (143M)
- Dataset (cancer, unpublished) to test LOH predictions: testChr19.zip (1334M)
Download mappability tracks if you want to include mappability information:
- hg38, up to 2 mismatches, read length 100bp
- hg19, up to 2 mismatches, read length 35-76bp
- hg19, up to 2 mismatches, read length 100bp
- hg17, up to 2 mismatches
- hg18, up to 2 mismatches
- mm9, up to 2 mismatches
- mm10, up to 4 mismatches, read length 100bp (created by Kim Wong)
Do not forget to extract files from the archive! You can also generate a mappability track for other genomes using GEM.
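One simple way to use a mappability track is to compute the mappable fraction of each window and keep only well-mappable windows. The Python sketch below assumes the track has already been converted to sorted, non-overlapping (start, end) intervals of uniquely mappable positions. The 0.85 threshold and the function names are illustrative assumptions, and whether Control-FREEC drops or down-weights poorly mappable windows is not stated on this page.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def mappable_fraction(start: int, end: int, mappable: Sequence[Interval]) -> float:
    """Fraction of the window [start, end) covered by non-overlapping mappable intervals."""
    covered = 0
    for m_start, m_end in mappable:
        overlap = min(m_end, end) - max(m_start, start)
        if overlap > 0:
            covered += overlap
    return covered / (end - start)


def well_mappable_windows(
    chrom_length: int, window: int, mappable: Sequence[Interval], min_fraction: float = 0.85
) -> List[Interval]:
    """Windows along a chromosome whose mappable fraction reaches min_fraction."""
    kept = []
    for start in range(0, chrom_length, window):
        end = min(start + window, chrom_length)  # last window may be shorter
        if mappable_fraction(start, end, mappable) >= min_fraction:
            kept.append((start, end))
    return kept
```

For whole-genome tracks you would advance through the intervals with a moving pointer instead of rescanning all of them for every window.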
Download files with SNPs (only if you have high-coverage data and want to detect allelic status; in that case, you must also convert read files into pileup format):
- hg19_snp131.SingleDiNucl.1based.txt
- hg19_snp137.SingleDiNucl.1based.txt.gz (created by Niklas Malmqvist; unzip before use)
- hg19_snp142.SingleDiNucl.1based.txt.gz (unzip before use)
- hg18_snp130.SNP.1based.txt
- mm10_snp137 (created by Kim Wong)
Starting from v9.3, SNP files in .txt.gz, .vcf and .vcf.gz format are also accepted. To understand how the .txt SNP files are generated, see FREEC FAQ Q19.
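Because a VCF can now supply the SNP positions, it may be useful to check which sites a file actually contains. The Python sketch below, assuming a standard tab-separated VCF, extracts biallelic single-nucleotide variants, the sites where a B-allele frequency is well defined. Which VCF records Control-FREEC itself keeps (for example, multi-allelic or filtered sites) is not described here; the function names are hypothetical.

```python
import gzip
from typing import Iterable, List, Tuple

BASES = {"A", "C", "G", "T"}


def parse_vcf_snps(lines: Iterable[str]) -> List[Tuple[str, int, str, str]]:
    """Return (chrom, 1-based pos, ref, alt) for biallelic SNVs in VCF text lines."""
    snps = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header line
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        ref, alt = fields[3].upper(), fields[4].upper()
        if ref in BASES and alt in BASES:  # one base on each side and exactly one ALT allele
            snps.append((chrom, pos, ref, alt))
    return snps


def read_vcf_snps(path: str) -> List[Tuple[str, int, str, str]]:
    """Read a .vcf or .vcf.gz file and return its biallelic SNVs."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_vcf_snps(handle)
```

Indels (`AT` -> `A`), multi-allelic sites (`T,G`) and missing ALT values (`.`) are all skipped by the single-base check.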
Links to Documentation
People who contributed to the Control-FREEC idea and code:
- Valentina Boeva
- Andrei Zinovyev
- Tatiana Popova
- Carino Gurjao
- Kevin Bleakley
- Pierre Chiche
- Joern Toedling
- Jean-Philippe Vert
- Isabelle Janoueix
- Emmanuel Barillot
- Olivier Delattre
Contacts
I will be happy to answer any question or concern about the Control-FREEC software:
- Mail to Valentina Boeva
- If you CC Carino Gurjao, you are likely to get a quicker answer.
IMPORTANT: If Control-FREEC reports an error, please share your config file and the program's command-line output (log file).
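One way to capture that output is to run the program through a small wrapper that prints everything to the terminal and saves a copy to a log file. The Python sketch below does this with the standard subprocess module. The usage line assumes the binary is called freec and takes the config file via -conf; adjust it to your installation. The function name is hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def run_with_log(command: Sequence[str], log_path: str) -> int:
    """Run `command`, echoing its combined stdout/stderr to the terminal and to log_path.

    Returns the process exit code.
    """
    with open(log_path, "w") as log, subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as process:
        for line in process.stdout:
            sys.stdout.write(line)  # still visible in the terminal
            log.write(line)         # and kept for the bug report
    # Leaving the Popen block waits for the process, so returncode is set here.
    return process.returncode
```

Usage: `exit_code = run_with_log(["freec", "-conf", "config.txt"], "freec.log")`, then attach both config.txt and freec.log to your message. Merging stderr into stdout keeps error messages in the same order as the rest of the output.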