Academic Paper

jahmm: A tool for discretizing multiple ChIP-seq profiles
Document Type: Working Paper
Subject: Statistics - Applications
Abstract
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the de facto standard method for mapping chromatin features on genomes. The output of ChIP-seq is quantitative within a single genome-wide profile, but there is no natural way to compare signal levels across experiments, which is why the data are often discretized into present/absent calls. Many tools perform this task efficiently; however, they process one input at a time, which produces discretization conflicts among replicates. Here we present the implementation of a Hidden Markov Model (HMM) with emissions drawn from a mixture of negative multinomial distributions to discretize ChIP-seq profiles. The method gives meaningful discretizations for a wide range of features and allows datasets from different origins to be merged into a single discretized profile, which resolves discretization conflicts. A quality control step performed after the discretization accepts or rejects the discretization as a whole. The implementation of the model is called jahmm, and it is available as an R package. The source code can be downloaded from http://github.com/gui11aume/jahmm.
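The abstract describes the model only at a high level. As a rough illustration, and under simplifying assumptions, the Python sketch below discretizes two toy replicate profiles jointly with a two-state HMM ("absent"/"present"). It uses a single negative multinomial per state rather than the mixture used by jahmm, fixes the parameters by hand instead of fitting them, and decodes with the Viterbi algorithm. The function names (`nm_logpmf`, `viterbi`), the parameter values, and the counts are all hypothetical and do not reflect jahmm's R interface.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch, NOT jahmm's implementation: one negative multinomial
# per hidden state (jahmm uses a mixture), parameters fixed by hand.

def nm_logpmf(x: Sequence[int], a: float, p0: float, p: Sequence[float]) -> float:
    """Log-probability of a count vector x (one count per profile) under a
    negative multinomial with shape a > 0, 'failure' probability p0 and
    per-profile probabilities p, where p0 + sum(p) == 1."""
    n = sum(x)
    out = math.lgamma(a + n) - math.lgamma(a) + a * math.log(p0)
    for xi, pi in zip(x, p):
        out += xi * math.log(pi) - math.lgamma(xi + 1)
    return out

State = Tuple[float, float, Tuple[float, ...]]  # (a, p0, p) for one hidden state

def viterbi(counts: Sequence[Sequence[int]], states: List[State],
            log_init: List[float], log_trans: List[List[float]]) -> List[int]:
    """Most likely hidden-state path; each bin emits ONE vector of counts
    across all profiles, so all profiles share the same state path."""
    n_states = len(states)
    emit = [[nm_logpmf(x, *s) for s in states] for x in counts]
    delta = [log_init[s] + emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best previous state when in state s at bin t
    for t in range(1, len(counts)):
        new_delta, ptr = [], []
        for s in range(n_states):
            scores = [delta[k] + log_trans[k][s] for k in range(n_states)]
            best_k = max(range(n_states), key=scores.__getitem__)
            ptr.append(best_k)
            new_delta.append(scores[best_k] + emit[t][s])
        delta = new_delta
        back.append(ptr)
    # Backtrack from the best final state.
    path = [max(range(n_states), key=delta.__getitem__)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# Hypothetical parameters. The mean count of profile i is a * p_i / p0:
# state 0 ("absent") -> about 1 read per bin, state 1 ("present") -> about 9.
STATES: List[State] = [(2.0, 0.5, (0.25, 0.25)), (2.0, 0.1, (0.45, 0.45))]
LOG_INIT = [math.log(0.5), math.log(0.5)]
LOG_TRANS = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]

# Toy read counts for two replicate profiles over 8 consecutive bins.
counts = [(0, 1), (1, 0), (2, 1), (9, 12), (10, 8), (11, 9), (1, 2), (0, 0)]
print(viterbi(counts, STATES, LOG_INIT, LOG_TRANS))  # [0, 0, 0, 1, 1, 1, 0, 0]
```

In this sketch, every bin emits one vector of counts covering all profiles, so the replicates are decoded against a single hidden path and cannot receive conflicting present/absent calls. This is one way to read the abstract's claim that merging datasets into a single discretized profile resolves discretization conflicts.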
Comment: 25 pages, 3 figures