Research Consortium Announces Significant Progress to Close Gaps and Uncover Novel Genes in the Human Reference Genome Sequence

25 Feb 2013

Roche announced today that a consortium of researchers from Penn State University, the National Center for Biotechnology Information, Children’s Hospital Oakland Research Institute and Roche 454 Life Sciences is working on a new comprehensive de novo assembly of a human genome to augment and supplement the current human reference genome sequence. The team presented its latest results today at the Advances in Genome Biology and Technology (AGBT) congress in Marco Island, Florida.

Under the leadership of Stephan Schuster, Ph.D., Professor at Penn State University, the consortium is analyzing and assembling the RP11 human reference genome as part of new efforts to close gaps in the human reference assembly using Roche’s 454 GS FLX+ Sequencer. To date, the draft assembly covers a significant number of the remaining human reference sequence gaps and has revealed 36 million bases of novel sequence, including novel genes with potential biological relevance.

“We are very proud to have been able to contribute to a project of such importance and potential impact on future genomic research with our unique long-read sequencing technology,” said Dan Zabrowski, Head of Roche Applied Science. “This project also shows the power of combining different innovative sequencing analysis and assembly technologies.”

This new de novo assembly is quickly becoming the most complete available assembly of the human reference genome produced using next-generation sequencing technology. The size and contiguity of the new assembly match those of previous Sanger-based assemblies, including the J. Craig Venter genome (HuRef) published in 2007. In total, the latest draft assembly fully spans 76 remaining gaps, extends into 13 additional repeat regions, and reveals 36 million bases of novel genomic sequence.

“I am pleased with the overall progress of the project and the high quality of the assembly even at this early stage,” said Stephan Schuster. “The 454 Sequencing technology has proven to sequence entire human genomes with even coverage and the long reads enable Sanger-like sequencing of reference genomes.”

The current draft assembly was generated using a hybrid of 18X 454 GS FLX+ long-read sequence data and 7.5X short-read sequence data from the Illumina MiSeq and HiSeq platforms. De novo genome assembly was performed using the Roche 454 GS De Novo Assembler software (Newbler). Significant ongoing efforts to add sequence data and apply different bioinformatics strategies are expected to further improve the contiguity of the assembly and the quality of the results, which will be made publicly available to the research community.
