Concerning Multimaps
I sort of came to my conclusion about this already, but I'll spam it here anyway for posterity.
When faced with multimaps, there are three modes of resolution: randomly select one, report all, or report none.
Currently, it seems that by default I find all possible mappings, and only during the output phase do I filter down to one of the above three cases (in reality, only the latter two). This isn't very computationally efficient, so I suspect we'll have to adopt something like the report variable found in readaligner.
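To make the three modes concrete, here is a minimal C++ sketch of how a mode chosen up front could drive both the final filtering and an early cut-off in the search. All names (MultimapMode, Mapping, maxHitsNeeded, resolve) are hypothetical and made up for illustration; they are not the library's existing API, and this does not reproduce how readaligner's report variable actually behaves.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical names for the three resolution modes discussed above.
enum class MultimapMode { RandomOne, ReportAll, ReportNone };

// Hypothetical stand-in for one hit returned by a mapper.
struct Mapping {
    std::size_t position;  // 0-based offset into the reference
    int mismatches;        // number of mismatches at this position
};

// How many hits the search must collect before it may stop early.
// ReportNone discards any read with more than one hit, so finding a
// second hit is enough. The other two modes (in this simple sketch)
// need the full list.
std::size_t maxHitsNeeded(MultimapMode mode) {
    return mode == MultimapMode::ReportNone
               ? 2
               : std::numeric_limits<std::size_t>::max();
}

// Pick one hit uniformly at random. Precondition: hits is not empty.
Mapping pickRandom(const std::vector<Mapping>& hits, std::mt19937& rng) {
    assert(!hits.empty());
    std::uniform_int_distribution<std::size_t> pick(0, hits.size() - 1);
    return hits[pick(rng)];
}

// Apply the chosen mode to the hits found for a single read.
std::vector<Mapping> resolve(const std::vector<Mapping>& hits,
                             MultimapMode mode, std::mt19937& rng) {
    if (hits.size() <= 1) return hits;  // unmapped or unique: nothing to do
    switch (mode) {
        case MultimapMode::ReportAll:  return hits;
        case MultimapMode::ReportNone: return {};
        case MultimapMode::RandomOne:  return {pickRandom(hits, rng)};
    }
    return {};  // unreachable; keeps compilers quiet
}
```

A mapper could check maxHitsNeeded(mode) inside its search loop and stop as soon as that many hits have been collected, rather than enumerating everything and filtering at output time.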
-- Main.jujubix - 21 May 2010
Concerning the Class Hierarchy
As the library starts to take shape, we have to decide upon a class hierarchy which the project will be built upon. I imagine that changing the hierarchy down the road will be difficult, so in hopes of avoiding that, let's commit ourselves to a single hierarchy.
Some history about the existing hierarchy directories:
- Originally, there were only IO, alignment, and index
- IO would read in the reference and reads
- The index (Kmer) would return positions in the reference that matched the first k bases of a read
- The aligner would align the entire read to the reference at the specified position
- Then the index was swapped... the aligner was completely replaced when searching for exact reads
- The index would "locate" the position in the reference where the entire read was found
- Inexact reads were supported, leading to the need for Mapper classes
- These would "map" reads to the reference, but allowed some form of variation (e.g. mismatches, gaps)
- Some required aligner classes, bringing back the need for them
- To reduce the code seen in /tools/, Drivers were created
- Essentially, these took in mapper, input, and output classes, and ran through every read in the given file
- Pairend classes were introduced to handle the post-processing to make reads paired...
- These were fed into some specific Drivers, and work independently from the index and mappers
As you can see, the entire hierarchy wasn't carefully planned, but rather extended when the need arose... so I wouldn't be surprised if there was room for improvement... or a complete restructuring.
Some personal concerns:
- Some classes in IO are actually Types... these could be pulled out
- The creation of every Mapper class requires the addition of a new Locate function in the Index class
- Should the index simply be a container? And the mapper classes take care of the actual "locating", using the index?
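To illustrate the last question, here is a rough C++ sketch of what "index as a container" might look like: the index exposes only one generic exact-match query, and each mapper builds its own locating logic on top of it, so adding a mapper no longer means adding a function to the index. Every name here (Index::findExact, reference, NaiveIndex, ExactMapper, MismatchMapper) is invented for the example and does not correspond to the existing classes; the naive string search merely stands in for a real index structure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container-style index: one generic query, no per-mapper code.
class Index {
public:
    virtual ~Index() = default;
    // All 0-based positions where `pattern` occurs exactly in the reference.
    virtual std::vector<std::size_t> findExact(const std::string& pattern) const = 0;
    virtual const std::string& reference() const = 0;
};

// Placeholder implementation using a linear scan instead of a real index.
class NaiveIndex : public Index {
    std::string ref_;
public:
    explicit NaiveIndex(std::string ref) : ref_(std::move(ref)) {}
    std::vector<std::size_t> findExact(const std::string& pattern) const override {
        std::vector<std::size_t> out;
        if (pattern.empty()) return out;
        for (std::size_t pos = ref_.find(pattern); pos != std::string::npos;
             pos = ref_.find(pattern, pos + 1))
            out.push_back(pos);
        return out;
    }
    const std::string& reference() const override { return ref_; }
};

// Mappers own the "locating" strategy and only borrow the index.
class Mapper {
public:
    virtual ~Mapper() = default;
    virtual std::vector<std::size_t> map(const Index& index,
                                         const std::string& read) const = 0;
};

class ExactMapper : public Mapper {
public:
    std::vector<std::size_t> map(const Index& index,
                                 const std::string& read) const override {
        return index.findExact(read);
    }
};

// Seeds with the first seedLen bases, then counts mismatches in the rest.
// Limitation of this sketch: mismatches inside the seed are never found.
class MismatchMapper : public Mapper {
    std::size_t seedLen_;
    int maxMismatches_;
public:
    MismatchMapper(std::size_t seedLen, int maxMismatches)
        : seedLen_(seedLen), maxMismatches_(maxMismatches) {
        assert(seedLen_ > 0);
    }
    std::vector<std::size_t> map(const Index& index,
                                 const std::string& read) const override {
        std::vector<std::size_t> out;
        if (read.size() < seedLen_) return out;
        const std::string& ref = index.reference();
        for (std::size_t pos : index.findExact(read.substr(0, seedLen_))) {
            if (pos + read.size() > ref.size()) continue;  // runs off the end
            int mismatches = 0;
            for (std::size_t i = seedLen_; i < read.size(); ++i)
                if (ref[pos + i] != read[i]) ++mismatches;
            if (mismatches <= maxMismatches_) out.push_back(pos);
        }
        return out;
    }
};
```

The trade-off is that the index interface stays small and stable, while a mapper that needs something the generic query cannot express efficiently would still force a change to Index.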
-- Main.jujubix - 26 May 2010
Topic revision: r2 - 2010-05-26 - jujubix