cn.FARMS

cn.FARMS is a latent variable model for detecting copy number variations (CNVs) in microarray data. Previous CNV detection methods for microarrays overestimate both the number and the size of CNV regions and consequently suffer from a high false discovery rate (FDR). A high FDR means that many detected CNVs are false and therefore not associated with the disease; correction for multiple testing must nevertheless account for them, which decreases the study's discovery power. To control the FDR, we propose a probabilistic latent variable model, cn.FARMS, which is optimized by a Bayesian maximum a posteriori approach. cn.FARMS controls the FDR through the information gain of the posterior over the prior. The prior represents the null hypothesis of copy number 2 for all samples, from which the posterior can deviate only through strong and consistent signals in the data. On HapMap data, cn.FARMS clearly outperformed the two most prevalent methods with respect to sensitivity and FDR. Our FARMS algorithm for summarizing gene expression array data can be found here.
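
To make "information gain of the posterior over the prior" concrete, one standard measure is the Kullback-Leibler divergence. The sketch below is illustrative only: it assumes a standard normal prior on a scalar latent variable z (with z = 0 standing for copy number 2) and a Gaussian posterior with mean \mu and variance \sigma^2. These symbols and the Gaussian form are assumptions made here for illustration; the exact measure used by cn.FARMS is defined in the paper cited below.

```latex
% Assumed for illustration: prior p(z) = N(0, 1), posterior p(z | x) = N(\mu, \sigma^2)
\[
D_{\mathrm{KL}}\bigl(p(z \mid x) \,\|\, p(z)\bigr)
  = \frac{1}{2}\left(\mu^{2} + \sigma^{2} - 1 - \ln \sigma^{2}\right)
\]
```

Under these assumptions the divergence is zero exactly when \mu = 0 and \sigma^2 = 1, that is, when the data leave the prior unchanged. It grows as the posterior mean moves away from the null value or as the posterior variance shrinks below 1, which matches the idea that only strong and consistent signals pull the posterior away from copy number 2.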

Please cite:

Djork-Arné Clevert, Andreas Mitterecker, Andreas Mayr, Marianne Tuefferd, An De Bondt, Willem Talloen, Hinrich W.H. Göhlmann, and Sepp Hochreiter. cn.FARMS: a latent variable model to detect copy number variations in microarray data with a low false discovery rate, Nucleic Acids Research 2011, doi:10.1093/nar/gkr197

Install the R package directly from Bioconductor:

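# biocLite() has been retired by Bioconductor; current releases install via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("cn.farms")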

Paper, supplement and manual:

Data used in our experiments:

Publication: D. F. Conrad, D. Pinto, R. Redon, L. Feuk, O. Gokcumen, Y. Zhang, et al. Origins and functional impact of copy number variation in the human genome (2010), Nature, 464(7289), 704-712

Additional annotation files: