Difference: JaysJournal (18 vs. 19)

Revision 19 2010-05-27 - jayzhang

Line: 1 to 1
	05/06/10 I'm starting this journal 3 days late (whoops). Here's a brief overview of what happened during my first three days: Read through a bunch of Daniel's journal entries to get myself a little more familiar with the project so far.
Line: 172 to 172
	Optimize! Read up on policy classes (Chris mentioned this briefly but said it was a low priority, so I just wanted to jot it down before I forgot about it).
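As a note on what "policy classes" would mean here: the idea is to pass the index in as a template parameter instead of going through a virtual base class, so the compiler can resolve (and possibly inline) the calls. Below is only a minimal sketch under that assumption; `NaiveIndex`, `BasicMapper`, `Count` and `CountHits` are made-up names for illustration and are not part of our code or of readaligner.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical index policy: a naive substring scan standing in for a real index.
class NaiveIndex {
public:
    explicit NaiveIndex(std::string text) : text_(std::move(text)) {}

    // Counts (possibly overlapping) occurrences of pattern in the text.
    std::size_t Count(const std::string& pattern) const {
        if (pattern.empty()) return 0;
        std::size_t n = 0;
        for (std::size_t pos = text_.find(pattern);
             pos != std::string::npos;
             pos = text_.find(pattern, pos + 1)) {
            ++n;
        }
        return n;
    }

private:
    std::string text_;
};

// Hypothetical mapper parameterized on the index type. Any type that
// provides Count(const std::string&) can be plugged in; no virtual calls.
template <typename IndexPolicy>
class BasicMapper {
public:
    explicit BasicMapper(IndexPolicy index) : index_(std::move(index)) {}

    std::size_t CountHits(const std::string& read) const {
        return index_.Count(read);  // resolved at compile time
    }

private:
    IndexPolicy index_;
};
```

Usage would look like `BasicMapper<NaiveIndex> mapper(NaiveIndex("ACGTACGT"));` followed by `mapper.CountHits("ACG")`. The trade-off is that the index type becomes part of the mapper's type, so it can no longer be swapped at runtime.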
Added:
> >	05/26/10 Spent a lot of time just thinking... Daniel and I were revisiting the class design for the aligner, and we ran into some design issues with the `Mapper` and `Index` classes. Specifically, the `Mapper` classes currently feel too dependent on the specific `Index` implementation, a problem that arises from using an external library (readaligner). For example, I wanted to move the `NGS_FMIndex::LocateWithMismatches` and `NGS_FMIndex::LocateWithGaps` methods into their respective `Mapper` classes, but that turned out to be troublesome, since the `LocateWithX` methods wrap the `Query::align()` method. The `Query` class is a readaligner class and needs `TextCollection` objects to be instantiated. Those objects are produced only by the FM Index used in readaligner, so if the `Mapper` classes were to instantiate their own `Query` objects, they could essentially be used only with the FM Index implementation.

Daniel and I have also decided to add an option to the `Mapper` classes that lets the user choose the maximum number of results they want. This is a minor optimization that would allow us to terminate the alignment function once a set number of results has been reached, rather than going through with the whole alignment. It might also allow us to use fixed-length arrays instead of vectors (or at least reserve the space in the vector beforehand to prevent `realloc`s).

Another area I investigated today was the cost of abstraction in our `Index` classes. I tried unabstracting the `NGS_FMIndex` and comparing runtimes. On 125k reads, I get about 9.8-10.1 seconds for both implementations, so the abstraction cost isn't really an issue there. I just realized that `Query` objects are also abstracted, and their methods are called much more often, so this might also be a good area to investigate.

Finally, I just realized that `string`s can reserve space, too. I think this might be a good area to improve on, as I'm seeing a lot of calls to `malloc` from both `string` and `vector`. Considering both of these could get quite large in some cases, reserving memory beforehand may be a good idea so we don't get frequent `realloc`s.

To do: Discuss class design with Daniel and come up with a finalized design we're both happy with. This may or may not mean changing the current design (which is good, except for the bit with `Index` and `Mapper`). Look into unabstracting the `Query` class and see how that fares. If it's a big improvement, look into policy classes. Add an option to limit the number of results. Move instantiation of `Query` objects into the `Mapper`s (if Daniel agrees).
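To make the max-results and `reserve` ideas above concrete, here is a rough sketch. It assumes a plain substring scan in place of the real alignment, and `Hit`, `FindHits` and `JoinReads` are hypothetical names, not functions from our code or from readaligner. It only shows the shape of the optimization: reserve once, stop as soon as the cap is reached.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one alignment result.
struct Hit {
    std::size_t position;
    int mismatches;
};

// Collects exact matches of read in reference, stopping once max_results
// hits have been found. max_results == 0 is taken to mean "no limit".
std::vector<Hit> FindHits(const std::string& reference,
                          const std::string& read,
                          std::size_t max_results) {
    std::vector<Hit> hits;
    if (max_results > 0) {
        hits.reserve(max_results);  // one allocation up front, no regrowth
    }
    if (read.empty()) return hits;

    std::size_t pos = reference.find(read);
    while (pos != std::string::npos) {
        hits.push_back(Hit{pos, 0});
        if (max_results > 0 && hits.size() >= max_results) {
            break;  // terminate early instead of scanning the rest
        }
        pos = reference.find(read, pos + 1);
    }
    return hits;
}

// Same idea for strings: compute the final size first, then reserve.
std::string JoinReads(const std::vector<std::string>& reads, char separator) {
    std::size_t total = reads.size();  // room for one separator per read
    for (const std::string& r : reads) total += r.size();

    std::string out;
    out.reserve(total);
    for (const std::string& r : reads) {
        out += r;
        out += separator;
    }
    return out;
}
```

With a cap, the vector never grows past its reserved capacity, so the only allocation happens before the search loop. Whether this shows up in the overall runtime would still need to be measured.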

