Difference: JaysJournal (18 vs. 19)

Revision 19, 2010-05-27 - jayzhang

Line: 1 to 1
 

05/06/10

I'm starting this journal 3 days late (whoops). Here's a brief overview of what happened during my first three days:
  • Read through a bunch of Daniel's journal entries to get myself a little more familiar with the project so far.
Line: 172 to 172
 
  • Optimize!
  • Read up on policy classes (Chris mentioned this briefly but said it was a low priority, so I just wanted to jot it down before I forgot about it).
Added:

05/26/10

Spent a lot of time just thinking. Daniel and I were revisiting the class design for the aligner, and we ran into some design issues with the Mapper and Index classes. Specifically, the Mapper classes currently feel too dependent on the specific Index implementation, a problem that arises from using an external library (readaligner). For example, I wanted to move the NGS_FMIndex::LocateWithMismatches and NGS_FMIndex::LocateWithGaps methods into their respective Mapper classes, but that turned out to be troublesome, since the LocateWithX methods wrap the Query::align() method. Query is a readaligner class and needs a TextCollection object to be instantiated. Those objects are only produced by the FM index used in readaligner, so if the Mappers were to instantiate their own Query objects, they could essentially only be used with the FM index implementation.
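The trade-off above can be sketched as follows. This is only an illustration, not the project's actual design: the Index interface, its Locate() signature, NaiveIndex and ExactMapper are all hypothetical names, and a plain string scan stands in for readaligner's Query and TextCollection so that the sketch compiles on its own. The point is that while the library-specific objects stay behind the Index interface, a Mapper can work with any Index.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface: the only thing a Mapper is allowed to know about.
class Index {
public:
    virtual ~Index() {}
    // Returns the start positions of `read` in the indexed text.
    virtual std::vector<std::size_t> Locate(const std::string& read) const = 0;
};

// Stand-in for an FM-index-backed implementation. In the real code this is
// where readaligner's Query/TextCollection would be hidden; a plain string
// scan is used here instead (assumption made for illustration only).
class NaiveIndex : public Index {
public:
    explicit NaiveIndex(const std::string& text) : text_(text) {}

    std::vector<std::size_t> Locate(const std::string& read) const override {
        std::vector<std::size_t> hits;
        if (read.empty()) return hits;  // nothing sensible to locate
        for (std::size_t pos = text_.find(read); pos != std::string::npos;
             pos = text_.find(read, pos + 1)) {
            hits.push_back(pos);  // overlapping matches are reported too
        }
        return hits;
    }

private:
    std::string text_;
};

// The Mapper sees only the Index interface, never the library types.
class ExactMapper {
public:
    explicit ExactMapper(const Index& index) : index_(index) {}

    std::vector<std::size_t> Map(const std::string& read) const {
        return index_.Locate(read);
    }

private:
    const Index& index_;  // the Index must outlive the Mapper
};
```

Moving Query instantiation into the Mappers would break exactly this separation, because the Mapper would then need the TextCollection that only the FM index can supply.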

Daniel and I have also decided to add an option to the Mapper classes that lets the user choose the maximum number of results they want. This is a minor optimization that would allow us to terminate the alignment function once a set number of results has been reached, rather than going through the whole alignment. It might also allow us to use fixed-length arrays instead of vectors (or at least reserve the space in the vector beforehand to prevent reallocs).
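A minimal sketch of the result cap, assuming exact matching on a plain string rather than the real alignment code. FindUpTo and its parameters are hypothetical names. The search stops as soon as max_results hits have been collected, and the vector is reserved once up front so it never has to regrow.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects the start positions of `pattern` in `text`,
// stopping as soon as `max_results` hits have been found.
// Assumes `max_results` is a modest bound, since that much space is reserved.
std::vector<std::size_t> FindUpTo(const std::string& text,
                                  const std::string& pattern,
                                  std::size_t max_results) {
    std::vector<std::size_t> hits;
    if (pattern.empty() || max_results == 0) return hits;

    hits.reserve(max_results);  // single allocation, no reallocs in the loop

    std::size_t pos = text.find(pattern);
    while (pos != std::string::npos && hits.size() < max_results) {
        hits.push_back(pos);
        pos = text.find(pattern, pos + 1);  // early exit skips the rest of the text
    }
    return hits;
}
```

Because the size is bounded by max_results, the same loop would also work with a fixed-length array plus a count in place of the vector.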

Another area I investigated today was the cost of abstraction in our Index classes. I tried unabstracting NGS_FMIndex and comparing runtimes. On 125k reads, I get about 9.8-10.1 seconds for both implementations, so the abstraction cost isn't really an issue there. I just realized that Query objects are also abstracted, and their methods are called much more often. This might also be a good area to investigate.
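For reference, here is one way an "unabstracted" variant could look, assuming the abstraction cost in question is virtual dispatch (the entry does not say so explicitly). The index type becomes a template parameter, in the style of a policy class, so the Locate() call is resolved at compile time. NaiveIndexPolicy and this Mapper template are hypothetical, and the string scan again stands in for the real index.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical index policy: any type with a matching Locate() will do.
// No base class and no virtual functions are involved.
struct NaiveIndexPolicy {
    std::string text;

    std::vector<std::size_t> Locate(const std::string& read) const {
        std::vector<std::size_t> hits;
        if (read.empty()) return hits;
        for (std::size_t pos = text.find(read); pos != std::string::npos;
             pos = text.find(read, pos + 1)) {
            hits.push_back(pos);
        }
        return hits;
    }
};

// The index type is fixed at compile time, so the call below is a direct
// (and inlinable) call rather than a lookup through a vtable.
template <typename IndexPolicy>
class Mapper {
public:
    explicit Mapper(const IndexPolicy& index) : index_(index) {}

    std::vector<std::size_t> Map(const std::string& read) const {
        return index_.Locate(read);
    }

private:
    const IndexPolicy& index_;  // the index must outlive the Mapper
};
```

Whether this pays off depends on how often the call happens per read, which is why the measurement matters more for the frequently called Query methods than for the Index.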

Finally, I just realized that strings can reserve space, too. I think this might be a good area to improve on, as I'm seeing a lot of calls to malloc from both string and vector. Since both of these can get quite large in some cases, reserving memory beforehand may be a good idea so we don't get frequent reallocs.
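A small sketch of the idea, assuming the final size can be computed (or bounded) before the string is built. JoinReads is a hypothetical helper, not something from the project. It sums the lengths first, reserves once, and then appends without any further growth.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: concatenates reads into one buffer, each followed
// by the separator `sep`.
std::string JoinReads(const std::vector<std::string>& reads, char sep) {
    // First pass: work out the exact final length.
    std::size_t total = 0;
    for (const std::string& r : reads) total += r.size() + 1;

    std::string out;
    out.reserve(total);  // one allocation instead of repeated regrowth

    // Second pass: append; capacity is already sufficient.
    for (const std::string& r : reads) {
        out += r;
        out += sep;
    }
    assert(out.size() == total);  // sanity check on the size computation
    return out;
}
```

The same pattern applies to std::vector, which has had reserve() for as long as std::string has.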

To do:

  • Discuss class design with Daniel and come up with a finalized design we're both happy with. This may or may not mean changing the current design (which is good, except for the bit with Index and Mapper).
  • Look into unabstracting the Query class and see how that fares. If it's a big improvement, look into policy classes.
  • Add an option to limit number of results.
  • Move instantiation of Query objects into the Mappers (if Daniel agrees).
 