Difference: JaysJournal (16 vs. 17)

Revision 17 - 2010-05-22 - jayzhang

Line: 1 to 1
 

05/06/10

I'm starting this journal 3 days late (whoops). Here's a brief overview of what happened during my first three days:
  • Read through a bunch of Daniel's journal entries to get myself a little more familiar with the project so far.
Line: 149 to 149
 
  • More profiling to figure out what's wrong; maybe play with report, k, and iteration limit values.
  • Try making CMake compile different implementations of the local aligner based on CPU features.
Added:

05/21/10

Okay, I've been staring at the computer screen for way too long. I tried to figure out what differs between our baligner and readaligner that makes readaligner so much faster, but I just can't seem to find it. Here are some of my findings:
  • On 125,000 reads, readaligner runs in about 9-10 seconds, while baligner runs in about 15-16 seconds.
  • If I comment out the lines that actually do the alignment (the call to Query::align()), readaligner takes less than 1 second, while baligner takes about 3 seconds. This means baligner has a lot more overhead when starting and finishing (i.e. when not actually "aligning"), but that gap of roughly 2 seconds still doesn't account for the full 6-7 second difference.
  • On 1000 reads, baligner and readaligner call Query::align() the same number of times (if I use the Query class's own reverse functionality and comment out Daniel's). However, the call counts start to diverge after that, though only slightly. For example, the function called right after Query::align(), MismatchQuery::firstStep(), gets called 3634 times in baligner, but only 3630 times in readaligner. This small difference seems to propagate down through the call tree and ends up as about 1000 extra calls to a low-level function in baligner. This small difference near the top might be the source of the problem, since the number of extra function calls could get huge on 125,000 reads. However, I haven't profiled that yet, since it would take so long. Perhaps this is a good next step?
  • baligner is much more complicated than readaligner in terms of class structure, so this might be giving us some overhead. Also, I've noticed that vectors are used a lot for passing information around; this might be a source of overhead, especially since the vectors themselves seem to be returned by value, rather than through a pointer to a vector. This could be a future point of optimization, but I doubt it would save very much time.
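To check the vector question in isolation, a small microbenchmark along these lines might help. This is only a sketch: `positionsByValue`, `positionsInto`, `Timing`, and `compareVectorPassing` are hypothetical names, not anything from baligner or readaligner, and it assumes a C++11 compiler. It compares returning a fresh vector on every call against refilling a caller-owned buffer. Compilers usually elide or move the returned vector rather than deep-copying it, so any gap here would mostly reflect the per-call heap allocation.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a function that hands back a list of results.
// A fresh vector is built on every call (one heap allocation per call).
std::vector<int> positionsByValue(std::size_t n) {
    std::vector<int> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) out.push_back(static_cast<int>(i));
    return out;  // usually elided or moved, not deep-copied
}

// Same work, but the caller owns the buffer, so its capacity is reused.
void positionsInto(std::size_t n, std::vector<int>& out) {
    out.clear();  // drops the elements but keeps the allocated capacity
    for (std::size_t i = 0; i < n; ++i) out.push_back(static_cast<int>(i));
}

struct Timing {
    long long byValueUs;  // microseconds spent in the return-by-value loop
    long long reuseUs;    // microseconds spent in the reused-buffer loop
    long long checksum;   // total elements produced; keeps the loops from being optimized out
};

// Runs each variant `calls` times with vectors of length `n`.
Timing compareVectorPassing(std::size_t calls, std::size_t n) {
    using Clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::microseconds;

    long long checksum = 0;

    auto t0 = Clock::now();
    for (std::size_t c = 0; c < calls; ++c) {
        std::vector<int> v = positionsByValue(n);
        checksum += static_cast<long long>(v.size());
    }
    auto t1 = Clock::now();

    std::vector<int> buf;
    for (std::size_t c = 0; c < calls; ++c) {
        positionsInto(n, buf);
        checksum += static_cast<long long>(buf.size());
    }
    auto t2 = Clock::now();

    return {duration_cast<microseconds>(t1 - t0).count(),
            duration_cast<microseconds>(t2 - t1).count(),
            checksum};
}
```

Calling something like `compareVectorPassing(1000000, 16)` with optimizations on, and with sizes similar to what the aligner really passes around, would give a rough idea of whether this is worth chasing at all.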

I think the most promising area to look into right now is still Query::align(), and why it ends up calling MismatchQuery::firstStep() slightly more often in baligner than in readaligner. I'm going to run a profile comparing both aligners on 125,000 reads, just to confirm whether that portion of the code is what's giving us the problem. It could also just be a bunch of small optimization problems and overhead from using vectors, etc. that add up to the performance loss, although I kind of doubt it.
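If a full profile on 125,000 reads turns out to be too slow, a lighter option might be to count calls by hand in both builds and diff the two reports. The sketch below is an assumption about how that could look, not code from either aligner: `CallCounter` and `COUNT_CALL` are made-up names, and `align` and `firstStep` are empty stand-ins for the real functions. It assumes a C++11 compiler and is not thread-safe.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Minimal call counter: one global table mapping function name to hit count.
class CallCounter {
public:
    static CallCounter& instance() {
        static CallCounter counter;
        return counter;
    }

    void hit(const std::string& name) { ++counts_[name]; }

    long count(const std::string& name) const {
        auto it = counts_.find(name);
        return it == counts_.end() ? 0 : it->second;
    }

    void reset() { counts_.clear(); }

    // Writes "name<TAB>count" lines, sorted by name, so two runs can be diffed.
    void report(std::FILE* out) const {
        for (const auto& kv : counts_) {
            std::fprintf(out, "%s\t%ld\n", kv.first.c_str(), kv.second);
        }
    }

private:
    std::map<std::string, long> counts_;
};

// Put this on the first line of any function whose calls should be counted.
// __func__ expands to the unqualified name of the enclosing function.
#define COUNT_CALL() CallCounter::instance().hit(__func__)

// Stand-ins for the real functions, only here to show where the macro goes.
void firstStep() { COUNT_CALL(); }

void align(int branches) {
    COUNT_CALL();
    for (int i = 0; i < branches; ++i) firstStep();
}
```

Calling `CallCounter::instance().report(stderr)` at the end of each aligner's run and diffing the two outputs would show the first function where the counts differ, such as the 3634 vs. 3630 gap noted above.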

Todo:

  • More profiling!
 