Shotgun sequence assembly and recent segmental duplications within the human genome

Nature. 2004 Oct 21;431(7011):927-30. doi: 10.1038/nature03062.

Abstract

Complex eukaryotic genomes are now being sequenced at an accelerated pace primarily using whole-genome shotgun (WGS) sequence assembly approaches. WGS assembly was initially criticized because of its perceived inability to resolve repeat structures within genomes. Here, we quantify the effect of WGS sequence assembly on large, highly similar repeats by comparison of the segmental duplication content of two different human genome assemblies. Our analysis shows that large (> 15 kilobases) and highly identical (> 97%) duplications are not adequately resolved by WGS assembly. This leads to significant reduction in genome length and the loss of genes embedded within duplications. Comparable analyses of mouse genome assemblies confirm that strict WGS sequence assembly will oversimplify our understanding of mammalian genome structure and evolution; a hybrid strategy using a targeted clone-by-clone approach to resolve duplications is proposed.
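The size and identity thresholds quoted above can be read as a simple filter over a list of duplications. The following is a minimal sketch of that reading only, not the authors' analysis pipeline: the `Duplication` record, its field names, and the function names are hypothetical, and treating both thresholds as strict cut-offs that must hold together is an assumption.

```python
from dataclasses import dataclass

# Thresholds taken from the abstract (> 15 kilobases, > 97% identity).
# Applying them jointly as strict cut-offs is an assumption of this sketch.
MIN_LENGTH_BP = 15_000
MIN_IDENTITY = 0.97


@dataclass(frozen=True)
class Duplication:
    """Hypothetical record for one pairwise segmental duplication."""
    name: str
    length_bp: int    # aligned length in base pairs
    identity: float   # sequence identity as a fraction between 0 and 1


def at_risk_of_collapse(dup: Duplication) -> bool:
    """Return True if the duplication is both large and highly identical."""
    return dup.length_bp > MIN_LENGTH_BP and dup.identity > MIN_IDENTITY


def split_by_risk(dups):
    """Partition duplications into (at risk, not at risk) lists."""
    at_risk = [d for d in dups if at_risk_of_collapse(d)]
    other = [d for d in dups if not at_risk_of_collapse(d)]
    return at_risk, other
```

In a hybrid strategy of the kind the abstract proposes, the first list would mark candidate regions for targeted clone-by-clone sequencing; how such regions are actually detected is not specified here.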

Publication types

  • Comparative Study

MeSH terms

  • Animals
  • Chromosomes, Human / genetics
  • Computational Biology / methods
  • Gene Duplication*
  • Genes, Duplicate / genetics
  • Genome, Human*
  • Genomics / methods*
  • Humans
  • Mice
  • Physical Chromosome Mapping / methods*
  • Sensitivity and Specificity
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*