Posted by: shrikantmantri | October 11, 2009

Genomic DNA k-mer spectra: models and modalities.

via Genome Biology on 10/10/09

Publication Date: 2009 Oct 8 PMID: 19814784
Authors: Chor, B. – Horn, D. – Levy, Y. – Goldman, N. – Massingham, T.
Journal: Genome Biol

ABSTRACT:

BACKGROUND: The empirical frequencies of DNA k-mers in whole genome sequences provide an interesting perspective on genomic complexity, and the availability of large segments of genomic sequence from many organisms means that analysis of k-mers with non-trivial lengths is now possible.

RESULTS: We study the k-mer spectra of more than 100 species from archaea, bacteria, and eukaryota, particularly looking at the modalities of the distributions. As expected, most species have a unimodal k-mer spectrum. However, a few species, including all mammals, have multimodal spectra. These species coincide with the tetrapods. Genomic sequences are clearly very complex, and cannot be fully explained by any simple probabilistic model. Yet we sought such an explanation for the observed modalities, and discovered that low-order Markov models capture this property (and some others) fairly well.

CONCLUSIONS: Multimodal spectra are characterized by specific ranges of values of C+G content and of CpG dinucleotide suppression, a range that encompasses all tetrapods analysed. Other genomes, like that of the protozoan Entamoeba histolytica, which also exhibit CpG suppression, do not have multimodal k-mer spectra. Groupings of functional elements of the human genome have either unimodal or multimodal behaviour.
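For readers who have not met these quantities before, here is a minimal Python sketch of the three measurements the abstract mentions: a k-mer spectrum, C+G content, and CpG suppression. It is an illustration, not the authors' method. The function names are made up for this post, the spectrum is taken to be "how many distinct k-mers occur exactly x times", and CpG suppression is approximated by the common observed/expected ratio, which may differ from the exact definition used in the paper.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_spectrum(seq, k):
    """Map each occurrence count x to the number of distinct k-mers seen x times.

    Assumed definition for illustration; the paper may normalise differently.
    """
    return Counter(kmer_counts(seq, k).values())


def gc_content(seq):
    """Fraction of C and G among the unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return 0.0
    return (seq.count("C") + seq.count("G")) / total


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: count(CG) * N / (count(C) * count(G)).

    Values well below 1 are usually read as CpG suppression.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


if __name__ == "__main__":
    toy = "ACGTACGT"  # toy sequence, not real genomic data
    print(kmer_spectrum(toy, 2))
    print(gc_content(toy))
    print(cpg_obs_exp(toy))
```

On a real genome the spectrum would be plotted as a histogram and inspected for one peak (unimodal) or several (multimodal). For large k and whole genomes, a dedicated k-mer counter would be far more memory-efficient than a plain Counter.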
