Scientists report most comprehensive view of mammalian transcriptome

17 Jul 2008

Scientists from the University of Queensland, Australia and Applied Biosystems Inc. have teamed up to conduct the most comprehensive analysis to date of a mammalian transcriptome, the vast collection of RNAs transcribed from a mouse genome. RNA expression analysis data from this study represents the highest-resolution view of mammalian transcriptomes derived from both differentiated cells and stem cells. Results of this study are expected to help researchers understand more fully the complexity of the genomic landscape of mammals. The study, published in the July 2008 issue of the journal Nature Methods, was also discussed at the International Congress of Genetics (ICG) meeting, which took place on 12-17 July.

According to the authors, the results are significant because they will help researchers to identify distinguishing features in the genetic makeup of stem cells, and to better understand how breakdowns in molecular pathways can lead to complex diseases such as cancer. For example, having a reliable method for detecting RNA splice variants will be essential for understanding gene fusion events, which are molecular characteristics of cancers such as leukaemia. Screening human cell samples for these kinds of RNA signatures has the potential to be developed into a diagnostic approach for identifying cancer at the molecular level.

Almost all of the DNA in the mammalian genome is transcribed, either into RNA molecules from genes that encode proteins or into non-coding RNAs that regulate the activity of genes. By profiling the totality of RNA transcripts generated from the genomes of mouse embryoid body (EB) cells and embryonic stem cells (ESC), researchers in this study generated more than 10 billion bases of sequence from all RNA transcripts. This in-depth level of coverage of mouse cell line transcriptomes revealed thousands of previously unknown RNA transcripts, and allowed researchers to distinguish RNAs transcribed from the coding, or sense, strand from non-coding RNAs that reside on the antisense strand of double-stranded DNA.
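
The strand distinction described above can be illustrated with a minimal Python sketch. It assumes each sequence tag has already been mapped to the genome and carries a strand label; the function names and the example data are hypothetical and are not taken from the study.

```python
def classify_tag(tag_strand, gene_strand):
    """Label a mapped tag relative to the annotated gene it overlaps."""
    if tag_strand not in ("+", "-") or gene_strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    return "sense" if tag_strand == gene_strand else "antisense"


def strand_summary(tag_strands, gene_strand):
    """Count sense and antisense tags overlapping one annotated gene."""
    summary = {"sense": 0, "antisense": 0}
    for tag_strand in tag_strands:
        summary[classify_tag(tag_strand, gene_strand)] += 1
    return summary


# Hypothetical example: four tags overlapping a gene annotated on the '+' strand.
example = strand_summary(["+", "+", "-", "+"], gene_strand="+")
```

A tag whose strand differs from the gene annotation is counted as antisense, which is the kind of signal a strand-aware library preserves.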

By also identifying an unexpectedly large number of variant transcripts derived from genomic loci of stem cells, researchers shed light on the complexity of biological pathways involved in regulating the pluripotency of stem cells, a key to understanding how stem cells differentiate into specific cell types.

Applied Biosystems’ SOLiD™ System was an essential technology used by scientists to profile the mammalian transcriptomes with an unprecedented depth of coverage. Researchers used the SOLiD System to perform a sequencing-based transcriptome profiling technique, using methodology developed at the University of Queensland to construct short quantitative random RNA libraries (SQRL).

Using the SQRL method, researchers created random cDNA libraries that yielded sequence tags 25-35 base pairs long, each tag representing a particular RNA transcript generated from the mouse genome. The SOLiD System can accurately detect even minute quantities of RNA transcripts and generate up to 240 million sequence tags per run. This enabled the researchers to rapidly perform a digital RNA expression analysis and obtain an exact count of the number of RNA sequence tags generated from the genome of each cell line.
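
The idea of digital expression analysis, counting tags rather than measuring hybridisation intensity, can be sketched in a few lines of Python. This is an illustration only: it assumes tags have already been assigned to transcripts, and the transcript labels, helper names and tags-per-million scaling are assumptions, not details reported by the authors.

```python
from collections import Counter


def count_tags(mapped_tags):
    """Count how many sequence tags were assigned to each transcript."""
    return Counter(mapped_tags)


def tags_per_million(counts):
    """Scale raw counts so that libraries of different sizes can be compared."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n * 1_000_000 / total for name, n in counts.items()}


# Hypothetical mapping result: one transcript label per sequenced tag.
mapped = ["transcript_A", "transcript_A", "transcript_B",
          "transcript_A", "transcript_C"]
counts = count_tags(mapped)
scaled = tags_per_million(counts)
```

Because each tag is counted individually, a transcript seen only once still appears in the result, which is why deeper sequencing translates directly into sensitivity for rare RNAs.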

“For the first time we are starting to accumulate data sets that allow us to look at that entire complexity of all of the RNA present in a mammalian cell,” said Dr Sean Grimmond, an associate professor at the Institute of Molecular Bioscience, University of Queensland, and senior author of the study. “This finding demonstrates that a digital gene expression methodology performed with the SOLiD System is far superior to array profiling approaches in terms of having a higher sensitivity and being able to see more RNAs in a transcriptome.”

By counting the number of sequence tags, and finding tags that map to previously discovered genes in archived databases, the researchers were able to calculate the number of variant RNA transcripts that originate from specific regions, or loci, of the genome. From these short sequence tags, they were able to characterise RNAs as splice variants, that is, multiple RNA transcripts that result from transcription of a single region of the genome; identify single base changes (SNPs) within transcripts; and detect other kinds of variants.

According to the authors of this study, the SQRL technique, which benefits from the tag throughput levels of the SOLiD System, effectively profiled transcriptomes by detecting RNA expression events that occur below the level of detection of traditional transcriptome analysis technologies such as microarrays.

Current array hybridisation technologies are insufficient to address the complexities of the mammalian transcriptome, as they do not have the sensitivity to detect RNAs expressed at very low levels. Moreover, array profiling requires hybridisation of transcripts to a known complementary sequence that has been fixed on a slide or chip, so it can only detect sequences chosen in advance. By contrast, the SQRL method allows researchers to use a hypothesis-neutral approach to RNA expression analysis that identifies novel non-coding RNAs expressed at low levels, including splice variants, antisense transcripts and repeat genetic elements, which make up a large portion of the transcriptome.

“Using the SQRL approach allowed us to discover RNA molecules that could not have been discovered using alternative methods such as array profiling,” said Kevin McKernan, Applied Biosystems’ senior director of scientific operations, and one of the co-authors of the study. “For example, this method allowed us to discover thousands of new splice variants. Also, being able to capture information about which DNA strand – sense or antisense – contains specific RNA transcripts provides us with an important detail for gaining a better understanding of antisense regulation and how non-coding RNAs function.”

Researchers also used the SOLiD System to detect SNPs in both coding and non-coding RNAs. This made it possible for them to explore mutation status and RNA editing events on a genome-wide scale, furthering their understanding of how variant non-coding RNA transcripts influence the regulation of gene expression.
