Tracking the Evolution of the SARS Coronavirus Using High-throughput, High-density Resequencing Arrays

Christopher W. Wong, Thomas J. Albert, Vinsensius B. Vega, Jason E. Norton, David J. Cutler, Todd A. Richmond, Lawrence W. Stanton, Edison T. Liu and Lance D. Miller

Expression Genomics Laboratory, Genome Institute of Singapore, SINGAPORE; NimbleGen Systems, Inc., Madison, WI; Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD.


This page contains supplementary information for the paper of the same name published in Genome Research (March, 2004).
  • Figures from the paper
  • Supplementary Figures
  • Tables from the paper
  • Supplementary Tables
  • Protocols

Figures from the paper

Figure 1. SARS Resequencing Array. (A) Diagram of the different sequence variants that can be detected by the array. Specific probes were designed to screen for previously published insertion and deletion sequences. (B) Resequencing array hybridized with Cy-3-labeled SARS-CoV cDNA. (C) Close-up view of oligonucleotide probes synthesized on the array. Probes carrying each of the 4 possible nucleotides at a given position are synthesized adjacent to each other. SARS cDNA bound to perfect-match (PM) probes (in red) fluoresces with higher intensity than cDNA bound to mismatch (MM) probes (in black). [JPEG]
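
The probe layout described in panel (C) can be illustrated with a minimal sketch, given here only as an aid to reading the figure. The probe length of 25, the function names and the toy sequence are assumptions made for illustration and are not taken from the paper. The sketch also ignores strand and complementarity, which a real array design would have to handle.

```python
BASES = "ACGT"


def tile_probes(reference: str, center: int, probe_len: int = 25) -> dict:
    """Return the four probes that interrogate reference[center] (0-based).

    Each probe is the reference window around `center` with the central
    base replaced by A, C, G or T. probe_len is a hypothetical value.
    """
    if probe_len % 2 == 0:
        raise ValueError("probe_len must be odd so the probe has a central base")
    half = probe_len // 2
    if center - half < 0 or center + half >= len(reference):
        raise ValueError("position is too close to a sequence end for a full probe")
    left = reference[center - half:center]
    right = reference[center + 1:center + half + 1]
    return {base: left + base + right for base in BASES}


def perfect_match(reference: str, center: int, probe_len: int = 25) -> str:
    """Return the PM probe, i.e. the one whose central base equals the reference."""
    return tile_probes(reference, center, probe_len)[reference[center]]


if __name__ == "__main__":
    toy_reference = "ACGTACGTACG"  # made-up sequence, not SARS-CoV
    for base, probe in tile_probes(toy_reference, 5, probe_len=5).items():
        label = "PM" if base == toy_reference[5] else "MM"
        print(base, probe, label)
```

Under these assumptions, one probe in each set of four is the PM probe and the other three are MM probes for the reference base at that position.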


Figure 2. Distribution and frequency of ambiguous calls across the SARS-CoV genome. We observed N calls at a total of 1148 bases in this study, of which 580 occurred in more than 1 sample. [JPEG]
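
The two counts quoted in this caption (positions with at least one N call, and positions with N in more than 1 sample) could be tallied along the following lines. This is a sketch only: the input format (one aligned string of base calls per sample) and all names are assumptions, and the sample data are made up.

```python
from collections import Counter


def ambiguous_call_counts(calls_by_sample: dict) -> Counter:
    """Count, for each 1-based position, how many samples have an N call there."""
    counts = Counter()
    for calls in calls_by_sample.values():
        for position, base in enumerate(calls, start=1):
            if base == "N":
                counts[position] += 1
    return counts


def summarize(counts: Counter) -> tuple:
    """Return (positions with any N call, positions with N in more than 1 sample)."""
    total = len(counts)
    recurrent = sum(1 for n in counts.values() if n > 1)
    return total, recurrent


if __name__ == "__main__":
    toy_calls = {  # hypothetical samples, not data from the study
        "sample_1": "ACNNT",
        "sample_2": "ANGNT",
        "sample_3": "ACGTT",
    }
    print(summarize(ambiguous_call_counts(toy_calls)))
```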


Figure 3. Stratification of probes according to %G/C and assessment of probe performance. All PM probes were binned according to %G/C, and average PM/MM ratios, call rates and average feature intensities were calculated for each bin. G/C content <20% or >50% leads to the lowest PM/MM ratios, resulting in an increased rate of ambiguous calls. [JPEG]
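
The binning described in this caption might look roughly like the sketch below. The bin width of 10 percentage points, the input tuple layout and the function names are assumptions, and the intensities are invented; the paper's actual bins and its definition of the MM intensity may differ.

```python
from collections import defaultdict


def gc_percent(probe: str) -> float:
    """Percentage of G and C bases in a probe sequence."""
    return 100.0 * sum(probe.count(base) for base in "GC") / len(probe)


def mean_ratio_by_gc_bin(probes, bin_width: int = 10) -> dict:
    """Average PM/MM ratio per %G/C bin.

    probes: iterable of (sequence, pm_intensity, mm_intensity) tuples, where
    mm_intensity is assumed to be positive (for example, the mean of the
    three MM features). Keys of the result are the lower edge of each bin.
    """
    bins = defaultdict(list)
    for sequence, pm, mm in probes:
        lower = int(gc_percent(sequence) // bin_width) * bin_width
        lower = min(lower, 100 - bin_width)  # place 100% G/C in the top bin
        bins[lower].append(pm / mm)
    return {lower: sum(r) / len(r) for lower, r in sorted(bins.items())}


if __name__ == "__main__":
    toy_probes = [  # hypothetical sequences and intensities
        ("AAAAAAAAAA", 100, 50),
        ("GCAAAAAAAA", 300, 100),
        ("GCGCGAAAAA", 800, 100),
        ("GGGGGGGGGG", 90, 90),
    ]
    print(mean_ratio_by_gc_bin(toy_probes))
```

The same grouping could be reused to average call rates or feature intensities per bin by swapping the value appended to each bin.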


Figure 4. Effects of secondary structure on probe annealing and ambiguous calls. The most stable structure, as predicted using GeneRunner software, is illustrated for 2 sequences with recurrent Ns: (A) Bases 25953-9, (B) Bases 22781-96. In both cases, the frequency of ambiguous calls peaks at the bases within the predicted loop structure. [JPEG]


Supplementary Figures

Suppl. Fig. 1. Deletions detected by Resequencing Array. According to published reports, there is a 6 nt deletion in SIN2748 and a 5 nt deletion in SIN2677 (indicated by dashes in the blue boxes above) when compared against the GIS SARS-CoV consensus and TOR2 sequences. SIN2748 and SIN2677 were resequenced using the Array, and their corresponding sequences are shown in purple. As expected, the deletions were accurately mapped, and we obtained clean sequence reads in the region flanking the deletions. [PDF]


Tables from the paper

Table 1. Call rate and accuracy of SARS Resequencing Array. Discordant calls differ between array and ACS. Ambiguous calls refer to bases that lack sufficient information for high confidence base assignment. Call rate is the percentage of genome sequence with high confidence base calls. Accuracy is the percentage of correctly called bases (as determined by ACS) over the total number of bases called (excluding ambiguous calls). These results are based on duplicate hybridizations. Tissue 1 was hybridized on 3 pairs of arrays. The data for tissues 3 and 4 are not shown as ACS sequence is not available. [PDF]
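
The definitions of call rate and accuracy given in this caption can be written out as a short sketch. It assumes the array calls and the ACS sequence are supplied as aligned strings of equal length, with N marking an ambiguous array call; the function name and the example strings are illustrative and are not data from the study.

```python
def call_rate_and_accuracy(array_calls: str, acs_calls: str) -> tuple:
    """Return (call rate %, accuracy %, number of discordant calls).

    Call rate  = high-confidence (non-N) calls / all positions.
    Accuracy   = calls agreeing with ACS / non-N calls.
    """
    if len(array_calls) != len(acs_calls):
        raise ValueError("sequences must be aligned to the same length")
    if not array_calls:
        raise ValueError("sequences must not be empty")
    # Keep only positions where the array made a high-confidence call.
    called = [(a, r) for a, r in zip(array_calls, acs_calls) if a != "N"]
    discordant = sum(1 for a, r in called if a != r)
    call_rate = 100.0 * len(called) / len(array_calls)
    if called:
        accuracy = 100.0 * (len(called) - discordant) / len(called)
    else:
        accuracy = float("nan")  # accuracy is undefined with no called bases
    return call_rate, accuracy, discordant


if __name__ == "__main__":
    # Toy 10-base example: one ambiguous call and one discordant call.
    print(call_rate_and_accuracy("ACGTNACGTA", "ACGTTACGTT"))
```

Note that ambiguous calls lower the call rate but do not count against accuracy, which matches the exclusion stated in the caption.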


Table 2. Selected polymorphisms identified by resequencing array. Vero isolates 1-4 and Tissue 1 are new SARS-CoV samples. Previously reported markers used to distinguish between the T:T:T:T and C:G:C:C strains of SARS-CoV are shaded. Novel variants are shown in bold. *The position of each nucleotide is based on SARS-CoV isolate SIN2500 (gb: AY283794). [PDF]


Supplementary Tables

Suppl. Table 1. PCR primer sequences. [HTML] [EXCEL]


Suppl. Table 2. SARS-CoV sequences at NCBI as of August 7, 2003. [HTML] [EXCEL]


Protocols

Amplification of SARS Coronavirus [PDF]


Hybridization on Resequencing Chip [PDF]


Contact me at wongc@gis.a-star.edu.sg

Last updated 2.12.2004