Collections, Iterators and Generics - Part 2

Objectives

In this lab, you will...
  • Discuss how to use Collections to solve a few small problems.
  • Use a List to find recognition sites for an enzyme in a DNA sequence.
  • Use a Set to help implement a Polynomial class

Pre-lab

Virtually all interesting programs must remember collections of items. As we have discussed in lecture, the problems we are solving need different kinds of collections. Every time we need a collection of items, we must determine how we will use the collection and any properties the collection must support. For example, we discussed in lecture how the software in a cash register that prints a summarized grocery bill needs to remember a collection of items, each item need only appear once in the collection, but the software needs to remember how many items a given type have been bought. A Set provides the desired properties as the collection it remembers does not contain duplicates. As another example, a program that remembers the order of phone callers to a call center (e.g., Revenue Canada, Telus, etc.), needs to keep the callers in the order that they started to wait which is supported by a FIFO queue.

Before coming to the lab, review your notes on the Java Collections Framework so that you have an understanding of the properties of different data structures.


In-lab Exercises

Part 1. Finding Recognition Sites in a DNA Sequence

Biologists do all sorts of stuff with DNA sequences. Another task they need to perform is to find all of the recognition sites in a DNA sequence for a specified restriction endonuclease. A restriction endonuclease recognition site is comprised of between 4 to 6 nucleotides (remember these are denoted by the letters A, C, G and T). To find the recognition sites, software must scan the DNA sequence and report all positions in the sequence which match the recognition site.

For example, imagine that the sequence consists (in part) of:

 ACTCGGGTCCACTCGGAAT

and the restriction endonuclease recognition site is

 TCGG

Then, assuming the first position in the sequence is numbered 0, the recognition sites are at 2 and 12. e.g.,

 ACTCGGGTCCACTCGGAAT
   TCGG      TCGG

In this part of the lab, you will add a method to your DNASequence class from last week's lab to find the recognition sites for a given restriction endonuclease. This is not quite as simple as it first sounds, so give the problem some thought before starting. Here are the files supplied in last week's lab in case you need them again: dnalab.zip

Test your method with these two restriction endonucleases against the SARS sequence:

  1. GAATTC (this is the EcoR1 restriction enzyme)
  2. GATCC (this is the BamH1 restriction enzyme)

Your test driver must report the locations of each recognition site for each of these restriction endonucleases. (Hint: A list iterator may be handy here!) (Hint 2: The first location of the EcoR1 restriction enzyme is 15809.

 

Part 2. Implementing a Polynomial Class (time permitting)

In this section you are to implement a class that represents a polynomial. The polynomial 5x^3 - 2x^2 + 3x - 4, for example, is to be represented using the terms: (5, 3), (-2, 2), (3, 1), (-4, 0). The terms of the polynomial are to be stored in a set so that each term in the set has a unique power. If a term is added to the set with the same power as an existing term, the coefficients of the two terms are to be added together. So, for example, if we add the term (4, 2) to the set of terms above, we end up with the set: (5, 3), (2, 2), (3, 1), (-4, 0). Note that the terms (-2, 2) and (4, 2) have been combined together into the single term (2, 2) by adding the coefficients.

You will find this exercise easier if you use a TreeSet (i.e., an ordered set). Note that a TreeSet orders its elements in increasing order. We want the terms in the polynomial to be stored in order of decreasing powers. Create a class that implements Comparator<PTerm> so that terms are ordered accordingly. Do not define a natural ordering for PTerm objects - be sure to use a Comparator instead.

Your Polynomial class must support the following methods:

public Polynomial( Set<PTerm> terms );
// constructor creates a new polynomial from the given terms

public double evaluate( double x );
// evaluates the polynomial at the given value of x

public String toString();
// returns a string representation of the polynomial of the form: + 4x^3 - 2x^1 + 7x^0 Note that powers are ordered from
// highest to lowest

public void add( Polynomial other );
// adds other polynomial to this polynomial. Be sure to combine together terms that have the same power and remove any terms
// that have a resulting zero coefficient

In addition to implementing a Polynomial class, you will need to implement a PTerm class that represents a single term in the polynomial. Recall that the application gets to define what it means for two objects to be equal. In our case, it makes sense to consider to PTerm objects to be equal if they have the same power - this way we can get the Set to ensure that there's only one term with a given power in the set at any given time. You should therefore override equals() for PTerm so that two terms are considered equal if they have the same power (regardless of coefficient).

After you've implemented your Polynomial class, test it carefully. Does it order terms from highest power to lowest when toString() is called? What if there are no terms in the polynomial when toString() or evaluate() is called? When you add another polynomial to this polynomial, do terms get combined together properly? Are terms with a zero coefficient removed?

 

Did you know? In addition to a combined program in Computer Science with Microbiology & Immunology mentioned in an earlier lab, there's also a combined program in Computer Science with Mathematics, and another in Computer Science with Statistics. If you are interested in knowing more about these options, see our website or make an appointment to see one of our advisors by e-mailing undergrad-info@cs.ubc.ca

 

When you are finished call your TA over and have your solution checked!