The Hidden Layer of Noncoding RNA in the Evolution and Genetic Programming
of Complex Organisms
John S. Mattick
Institute for Molecular Bioscience, University of Queensland, Brisbane
4072 Australia
Recent evidence suggests that at least half of the genes in the mammalian genome
do not encode proteins. Most of the mammalian genome is transcribed, and
the vast majority (~98%) of this transcriptional output is non-protein-coding
RNA, comprising introns of protein-coding transcripts and the introns and
exons of non-protein-coding transcripts (ncRNAs). These transcripts include complex clusters of
overlapping and antisense transcripts, "intergenic" transcripts and
pseudogene transcripts that appear to participate in both local and
long-distance regulatory networks. Many transcripts (including intronic
RNAs) are processed to smaller RNAs, including snoRNAs that guide the modification of other
RNAs, and microRNAs that control many aspects of development, including
embryogenic patterning, adipocyte formation, hematopoietic differentiation,
apoptosis and insulin secretion, and are perturbed in a range of cancers.
RNA signaling is also involved in chromosome dynamics and chromatin
modification, epigenetic processes which, like alternative splicing
(also likely to be controlled by trans-acting RNAs), are essential to
differentiation and development. Interestingly, many ncRNAs with conserved
functions, such as XIST and H19, are not highly conserved at the sequence
level, suggesting that (like language) their sequences can drift easily
and yet retain the same function. This in turn suggests that there may
be many more microRNAs and other regulatory RNAs to be discovered. Many
putative ncRNAs identified in the RIKEN mouse cDNA project exhibit tissue-specific
expression patterns and are dynamically induced by physiological stimuli.
There are also many nucleic-acid binding proteins that appear to interact
with complexes containing RNA, but whose exact specificity is unknown.
In addition, a significant proportion of the human genome appears to
be under evolutionary selection, including thousands of ultra-conserved
noncoding sequences and transposon-free regions that have remained unchanged
throughout mammalian evolution. These features suggest extended regions
of complex regulatory information that operate via unknown mechanisms,
and they are hard to reconcile with current models of gene regulation. Many
noncoding regions are conserved between species in complex patterns
that are not evident from pairwise comparisons alone, suggesting that
many sequences are under negative or positive selection in different
lineages, presumably related to their common ontogeny and phenotypic
differences driven by adaptive radiation, respectively.
These observations, and the increasing number of complex genetic phenomena
shown to be directed by regulatory RNAs, suggest that the majority of
the human genome, and those of other complex organisms, is in fact functional
(not junk), and devoted to an advanced genetic regulatory system that
is primarily transacted by RNA. This conclusion is also supported by
information-theoretic analysis and empirical data showing that
regulatory networks are accelerating networks (the number of regulatory
connections grows faster than the number of genes) and that bacteria have
likely been limited in their complexity by a regulatory system based
simply on analogue controls (i.e., proteins), implying that multicellular
organisms must have breached this limit by evolving a new regulatory
system, based on sequence-specific RNA signaling. If this is correct,
our current conceptions of the information content of the mammalian
genome and the genetic programming of mammalian development and variation
will have to be completely reassessed, with enormous implications for
biology and medicine.