CPSC 536A -- Lecture 20 on March 27, 2001

Lecture by Anne Condon.

MASSIVELY PARALLEL SIGNATURE SEQUENCING (MPSS)

Source paper: Brenner et al. "Gene Expression Analysis ...", Nature Biotech, Vol. 18, June 2000

Technologies:

Two other important technologies are used for gene expression analysis:

1. Microarrays

Use of microarrays for gene expression analysis assumes that one knows a superset of the genes to be expressed.

An advantage is that one can detect a wide range of expression levels, which means that rare mRNAs can be detected.

Limitations of microarray technologies (particularly cDNA microarrays) include the variability in measurements, due to probe hybridization differences and cross-reactivity, element-to-element differences within microarrays, and microarray-to-microarray differences.

2. SAGE

SAGE is considered to be statistically more robust than microarrays (because counting statistics are well modelled by the Poisson distribution) but is less scalable. It is based on counting short sequence tags.
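To make the remark about counting statistics concrete: for a Poisson-distributed count the variance equals the mean, so a transcript observed k times has a relative standard error of roughly 1/sqrt(k). The sketch below only illustrates that relationship; the function name and the example counts are hypothetical and do not come from the lecture.

```python
import math

def poisson_relative_error(count: int) -> float:
    """Relative standard error of a Poisson count: sqrt(k) / k = 1 / sqrt(k)."""
    if count <= 0:
        raise ValueError("count must be positive")
    return 1.0 / math.sqrt(count)

# Hypothetical counts, chosen only for illustration.
for k in (4, 100, 10_000):
    print(k, round(poisson_relative_error(k), 3))
```

Rare transcripts (small k) therefore carry large relative uncertainty, but the size of that uncertainty is known in advance from the count itself.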

New technology: MPSS

Brenner et al. proposed a method for gene expression analysis, which they call massively parallel signature sequencing, that addresses the limitations of the above two technologies. The method is as follows:

a) Copies of cDNA templates are attached to glass microbeads, with a unique template on each bead. (If the template library has multiple identical strands, each gets its own bead.)

b) The beads are arrayed in a flow cell.

c) Bases of the templates are read off, one by one, using fluorescence-based methods.

Attaching templates to beads

In the reported experiment, the template library contained 3-4 x 10⁴ mRNA strands. Strands are converted to cDNA strands using tailed poly-T primers, and the ends of the cDNA strands are restricted. The cDNA strands are then inserted into a cloning vector. The cloning vector contains a set of 1.67 x 10⁷ 32-mer oligo address tags, thus forming a set of 5-7 x 10¹¹ conjugates.
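The size of the conjugate set follows from simple multiplication, since every template can in principle be paired with every address tag. A quick check of the arithmetic, using only the figures quoted above (the variable and function names are illustrative):

```python
# Figures quoted in the notes.
TEMPLATES_LOW, TEMPLATES_HIGH = 3e4, 4e4   # mRNA templates in the library
NUM_TAGS = 1.67e7                          # 32-mer address tags

def conjugate_range(t_low: float, t_high: float, tags: float):
    """Each template can pair with each tag, so the counts multiply."""
    return t_low * tags, t_high * tags

low, high = conjugate_range(TEMPLATES_LOW, TEMPLATES_HIGH, NUM_TAGS)
print(f"{low:.2e} to {high:.2e} conjugates")
```

The result lies between roughly 5 x 10^11 and 6.7 x 10^11, in line with the range stated in the notes.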

Select a small random sample of about 1.5 x 10⁵ conjugates. With high probability, the sample has the following properties:

- each cDNA is represented at least once in the sample

- each tag is represented at most once
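The tag-uniqueness property can be checked with a birthday-problem style estimate. The sketch below assumes that the tags in the sample are drawn uniformly and independently from all 8^8 tags, which is a simplification not stated in the notes; the function names are illustrative. It counts any two conjugates with the same tag, whether or not they carry the same cDNA, so it gives an upper bound on problematic collisions.

```python
import math

def shared_tag_fraction(sample_size: int, num_tags: int) -> float:
    """Probability that a given sampled conjugate shares its tag with at
    least one other conjugate in the sample (uniform, independent tags)."""
    return 1.0 - (1.0 - 1.0 / num_tags) ** (sample_size - 1)

def expected_colliding_pairs(sample_size: int, num_tags: int) -> float:
    """Expected number of pairs of sampled conjugates with the same tag."""
    return math.comb(sample_size, 2) / num_tags

SAMPLE = 150_000   # about 1.5 x 10^5 conjugates, as in the notes
TAGS = 8 ** 8      # number of distinct address tags

print(f"fraction sharing a tag:  {shared_tag_fraction(SAMPLE, TAGS):.4f}")
print(f"expected same-tag pairs: {expected_colliding_pairs(SAMPLE, TAGS):.0f}")
```

Under this model a little under 1% of the sampled conjugates share a tag with another one, so the "at most once" property holds for the large majority of tags rather than strictly for all of them.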

Amplify this sample using PCR, render single-stranded, and add a fluorescent tag.

Prepare a population of microbeads, each with about 10⁶ copies of the complement of an address tag (anti-tag) attached to its surface. The beads were prepared in 8 rounds, with one of eight 4-mer subunits attached to each bead at each round. (Result: 8⁸ distinct tags = 16.7 million tags.)
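The combinatorial synthesis can be sketched as follows: eight rounds, each appending one of eight 4-mer subunits, give 8^8 distinct 32-mers. The eight subunit sequences below are hypothetical placeholders, since the notes do not list the actual words.

```python
from itertools import islice, product

# Hypothetical 4-mer subunits; the real eight words are not given in the notes.
SUBUNITS = ["AAAT", "AATA", "ATAA", "TAAA", "TTTA", "TTAT", "TATT", "ATTT"]
ROUNDS = 8

def tag_count(num_subunits: int, rounds: int) -> int:
    """Number of distinct tags after `rounds` rounds of synthesis."""
    return num_subunits ** rounds

def iter_tags(subunits, rounds):
    """Lazily enumerate every tag as a concatenation of `rounds` subunits."""
    for combo in product(subunits, repeat=rounds):
        yield "".join(combo)

# Only look at the first few tags; the full set has millions of members.
first_tags = list(islice(iter_tags(SUBUNITS, ROUNDS), 3))
print(tag_count(len(SUBUNITS), ROUNDS))
print(len(first_tags[0]))
```

The count comes to 16,777,216 tags, each 8 x 4 = 32 bases long, which matches the 32-mer address tags mentioned earlier.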

Combine beads and conjugates (under stringent hybridization conditions). Conjugates attach to beads, with 1% of the beads getting about 10⁴-10⁵ copies of a single kind of template molecule. Using a fluorescence-activated cell sorter, obtain the loaded beads.

Reading off bases

Bases are read off from the free end in cycles, 4 bases per cycle. Initially, three bases at the 3' end of the molecule are exposed (made single-stranded) using the enzyme DpnII, so the first cycle actually reads off 3 rather than 4 bases. In what follows, we describe the more general cycle, where 4 bases are read off.

a) 16 classes of adaptor molecules are used:
         o 4 represent possible bases at position 1
         o 4 represent possible bases at position 2
             ...
         o 4 represent possible bases at position 4
There are 4³ possible adaptors in each class, all of which have a common decoder at one end. The 4³ adaptors correspond to the possibilities for the three free bases among the four bases at the free end.

Picture for class 2A: the *'s indicate the double stranded molecule that is to be read off, attached to the bead. Four bases at the free end are being read off in this cycle. The dashed part is the same in all adaptors. Decoders will be explained later.

The dashed part is:

      ACGAGCTGCCAGTC
      TGCTCGACGGTCAG

                   positions to be read off:
                                                                  4 3 2 1
      (bead)* * * * * * * * * * * * * * * * * * 3'----------------decoder----3'
                   * * * * * * * * * * * * * * N N A N-----------------anti-decoder-5'

There are 16 times 4³ = 2¹⁰ adaptors in total.
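The adaptor count can be verified by enumeration. In this sketch each class is a (position, base) pair and each adaptor is represented only by its 4-base overhang; indexing positions from the left of a Python string is an arbitrary convention chosen for the example, not the orientation used in the picture.

```python
from itertools import product

BASES = "ACGT"
POSITIONS = (1, 2, 3, 4)

def adaptor_classes():
    """One class per (position, base) pair: 4 x 4 = 16 classes."""
    return [(pos, base) for pos in POSITIONS for base in BASES]

def adaptors_in_class(pos: int, base: str):
    """All 4-base overhangs with `base` fixed at `pos` (1-indexed)
    and the other three positions free: 4^3 = 64 overhangs."""
    overhangs = []
    for free in product(BASES, repeat=3):
        letters = list(free)
        letters.insert(pos - 1, base)
        overhangs.append("".join(letters))
    return overhangs

all_adaptors = [(cls, oh)
                for cls in adaptor_classes()
                for oh in adaptors_in_class(*cls)]
print(len(adaptor_classes()), len(adaptors_in_class(2, "A")), len(all_adaptors))
```

Any particular 4-base overhang appears in exactly four classes, one for each position.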

b) Wash all 2¹⁰ adaptors over the beads and ligate. Four kinds of adaptor, one for each position being read, will ligate to the templates on each bead.

c) In 16 separate steps, wash one fluorescently labeled decoder over the beads. This decoder hybridizes to the beads with matching anti-decoder. There is a unique anti-decoder per class. Read off which beads light up, using fluorescence imaging, augmented by software to track any movement of the beads in the flow cell.

d) Add type IIs restriction endonuclease BbvI. This restriction enzyme recognizes the sequence
- GCTGC -
- CGACG -
in the fixed part of the adaptor. This endonuclease chews off the end of the strand attached to the bead, so that the next four bases of the cDNA template are exposed!

Repeat from step a) until the desired number of bases are read off.
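The read-off loop in steps a) to d) can be simulated for a single bead. This is an idealized sketch: it treats the exposed overhang directly as the bases to be called (ignoring that adaptors carry the complementary bases), skips the special 3-base first cycle, and assumes error-free ligation and cleavage. All names are illustrative.

```python
BASES = "ACGT"
WORD = 4  # bases read per cycle

def decode_cycle(exposed: str) -> str:
    """Simulate the 16 decoder washes for one exposed 4-base overhang.

    The decoder for class (position, base) lights the bead exactly when
    the overhang has `base` at `position`.
    """
    called = [None] * len(exposed)
    for pos in range(len(exposed)):
        for base in BASES:
            lights_up = exposed[pos] == base  # adaptor of this class ligated
            if lights_up:
                called[pos] = base
    return "".join(called)

def read_signature(template: str, cycles: int) -> str:
    """Read up to `cycles` successive 4-base words from the free end."""
    signature = []
    remaining = template
    for _ in range(cycles):
        if len(remaining) < WORD:
            break
        signature.append(decode_cycle(remaining[:WORD]))
        remaining = remaining[WORD:]  # stands in for the BbvI cleavage step
    return "".join(signature)

# Hypothetical template sequence, used only as an example.
print(read_signature("GATCAATCGGACTTCGAC", 4))
```

Four cycles yield a 16-base signature; in each cycle exactly 4 of the 16 decoders give a signal for a given bead.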

MODULE 7: BIOMOLECULAR COMPUTING

Content:

Adleman's experiment; "classical" models for DNA computing
Self-assembly models for DNA computing; Winfree's work
Combinatorial and algorithmic problems arising in biomolecular computation: inverse RNA folding, theory of self-assembly and resource-bounded tiling

Idea: encode information in DNA strands

Adleman's Hamiltonian Path Experiment

Input: generate random paths

associate DNA strands with nodes and edges
join edge strands in test tube to form double-stranded paths (hybridization, ligation)
wash to form single-stranded paths

Process:

select path from S to T
select path with k (k=7) nodes
select path entering all nodes at least once (see slide 1)

- attach strand associated with node 2 to beads and introduce it into the test tube

- the paths that enter node 2 hybridize to strands on the beads

- remove beads

- wash and detach desired paths

Output: "YES" if a path remains
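The generate-and-filter structure of the experiment can be mimicked in software. The sketch below is only an analogy: random walks stand in for hybridization and ligation of edge strands, and set filters stand in for the laboratory selection steps. The example graph, the stopping probability and the sample size are all hypothetical choices, not taken from the lecture.

```python
import random

def random_path(edges, max_len, rng):
    """Grow a path by chaining edges that share a node, loosely mimicking
    edge strands joining end to end in the test tube."""
    u, v = rng.choice(edges)
    path = [u, v]
    while len(path) < max_len:
        successors = [b for (a, b) in edges if a == path[-1]]
        if not successors or rng.random() < 0.2:  # arbitrary stop chance
            break
        path.append(rng.choice(successors))
    return tuple(path)

def hamiltonian_path_exists(edges, nodes, start, end, samples=20000, seed=0):
    rng = random.Random(seed)
    k = len(nodes)
    # Input: generate random paths.
    tube = {random_path(edges, k, rng) for _ in range(samples)}
    # Select paths from S to T.
    tube = {p for p in tube if p[0] == start and p[-1] == end}
    # Select paths with k nodes.
    tube = {p for p in tube if len(p) == k}
    # Select paths entering every node (one affinity step per node).
    for node in nodes:
        tube = {p for p in tube if node in p}
    # Output: "YES" if a path remains.
    return len(tube) > 0, tube

# Hypothetical 7-node directed graph.
NODES = list(range(7))
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
         (0, 3), (1, 4), (2, 5), (3, 1), (4, 2)]
found, paths = hamiltonian_path_exists(EDGES, NODES, start=0, end=6)
print("YES" if found else "NO")
```

As in the wet-lab version, a "YES" answer is reliable whenever a path survives, while a "NO" answer is only probabilistic, because a Hamiltonian path may simply not have been generated.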

Biomolecular Computation Research

Classical DNA/RNA computation (e.g. search and prune)
O(n) - biostep computation (e.g. self-assembly of 3D DNA molecules)
DNA Computing on Surfaces

Advantages (over "solution phase" chemistry):

- facile purification steps

- reduced interference between strands

- easily automated

Disadvantages:

- loss of information density (2D)

- lower surface hybridization efficiency

- slower surface enzyme kinetics

DNA Surface Model

Input: DNA strands representing {0,1}^n

- Encoding of binary information in DNA strands: a strand is composed of words. Each word is a short DNA strand (16-mer) representing one or more bits.

Process: has 3 steps

- MARK: strands in which bit j = 0 (or 1) hybridize with Watson-Crick complements of the word containing bit j, followed by polymerization

- DESTROY

- UNMARK

E.g. Satisfiability problem

Output: exactly the strands that remain on the surface
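One way the three operations could be combined to solve satisfiability is sketched below. The notes do not spell out DESTROY and UNMARK, so this sketch assumes that DESTROY removes all unmarked strands and that UNMARK removes the marks from the survivors; the clause-by-clause loop and all names are likewise assumptions made for illustration.

```python
from itertools import product

def solve_sat_on_surface(num_vars, clauses):
    """Each clause is a list of (var_index, value) literals and is satisfied
    when at least one listed variable has the listed value."""
    # Input: one strand per assignment in {0,1}^n, all initially unmarked.
    surface = {bits: False for bits in product((0, 1), repeat=num_vars)}
    for clause in clauses:
        # MARK: strands with the required bit value for some literal.
        for strand in surface:
            if any(strand[j] == value for j, value in clause):
                surface[strand] = True
        # DESTROY (assumed): unmarked strands are removed from the surface.
        surface = {s: marked for s, marked in surface.items() if marked}
        # UNMARK (assumed): surviving strands lose their marks.
        surface = {s: False for s in surface}
    # Output: exactly the strands that remain on the surface.
    return sorted(surface)

# Hypothetical formula: (x0 or x1) and (not x0 or x2) and (not x1 or not x2)
CLAUSES = [[(0, 1), (1, 1)], [(0, 0), (2, 1)], [(1, 0), (2, 0)]]
solutions = solve_sat_on_surface(3, CLAUSES)
print(solutions)
```

Each clause costs one MARK, DESTROY, UNMARK round, so the number of laboratory steps grows with the number of clauses while the number of strands grows as 2^n.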