(r11) JaysJournal < BETA

05/06/10

I'm starting this journal 3 days late (whoops). Here's a brief overview of what happened during my first three days:

Read through a bunch of Daniel's journal entries to get myself a little more familiar with the project so far.
Had a meeting with Daniel and Chris where we discussed my first task:
- First, I should make or beef up the existing test suite.
- Then, I'm to make a vectorized version of the Smith-Waterman aligner that Daniel has working.
- I should also familiarize myself with existing tools (e.g. Bowtie, BWA, Mosaik, and SamTools).
I ended up spending a lot of time reading through the various guides Daniel posted, as well as the Git guide and the Google C++ style guide to get myself more familiarized with what I should be doing and how I should be doing it.
I also spent some time reading up on the Smith-Waterman algorithm and reading up on sequence alignment in general to get myself a little more familiar with the project.
Checked out Daniel's existing code, especially local_aligner.cc bit (which I'm assuming my vectorized SW will be based a lot off of). I can understand the majority of the code, although I'm a little iffy on some of the gap penalty calculations.

So after all that, I think I'm finally ready to do a little coding. Today, I'll mostly be working on the test suite and trying to get that out of the way.

End-of-the-day post:

I've looked through the local aligner source code in a bit more detail and I think I know my way around it pretty well now. I'm still a little unclear on the free gap policies (I understand the premise and a little of how it works, but I still need to wrap my head around some of the cases in the algorithm), but Chris told me not to worry too much about it, as it's not a priority for now.

I've managed to modify Daniel's test/correctness_test.cc so that it works on the existing code, and all the test cases pass. The test cases Daniel wrote were pretty complete but only tested the ScoreOnlyAlignment method in the local aligner. So next up will be writing some test cases for the FullAlignment, which may be a little harder since it doesn't just output numbers. I've written a few tests so far, and I think I may have found one or two bugs (but I'm going to have to look into that some more to see if it was my test case or the code that was failing). So tomorrow, I will mostly be doing...

Finishing up the test cases for the FullAlignment method.
Fixing any bugs that might come up along the way.
If I get a chance...maybe even start looking into the vectorized methods!

05/07/10

Finished writing the test cases for the local aligner and corrected a very minor bug I found along the way. I mostly used Daniel's test cases for the ScoreOnlyAlignment method, but added the FullAlignment method to the tests. However, I didn't manage to get middle-of-sequence mismatch and gap tests working, since there are so many different possibilities for the alignment to turn out, and it depends very heavily on the scoring method (and the values of the different penalties) as well.

Chris also gave me the source code for a vectorized Smith-Waterman function (from the SHRiMP library) and I spent the rest of the afternoon looking at that and trying to understand it. I found the documentation for the SSE2 functions on the Intel website (link to come later, not on the right now). From my understanding (and Chris' explanation), the function goes through the matrix using a diagonal vector ("positive slope") operating on 8*16bit values, and calculates the highest score from that vector. It uses the affine gap penalty method to calculate scores. From my discussion with Chris, I should now be working on:

Integrating the vectorized Smith-Waterman function into the NGSA library (I can either try using the source code given or make my own).
Configuring CMake so that it detects support for SSE2 (and maybe SSE3) so that it will compile either the vectorized Smith-Waterman or non-vectorized Smith-Waterman, depending on the machine.
Take a look at the banded Smith-Waterman function and try integrating that, too, perhaps. This function only checks a "strip" of cells going down the diagonal of the score matrix, since in practice, alignments aren't as useful when there are large variations from a full match.
Finally, in the distant future, start thinking about the free gap algorithms and vectorizing them.

05/10/10

I spent the morning trying to integrate the SHRiMP vectorized Smith-Waterman code into a new local aligner class, but I couldn't get it to pass the test cases and kept getting segmentation faults, etc. So, I decided it was more trouble than it was worth and to try building my own version of the vectorized Smith-Waterman directly. I also had a lot of trouble getting CMake to do the right thing, but I later figured out it was because I forgot to use the -msse2 -msse3 compiler flags.

I'm about a quarter complete on a score-only vectorized version for the local aligner, and I've got a basic game plan laid out for completing it; I've also found that the best way to understand some code is to try to code it yourself! I now have a much firmer grasp on the whole SHRiMP aligner and on why it does some of the things it does (for example, one question I had was why it stored values for b_gap, but no a_gap array). I think by tomorrow, I can:

Get the score-only vectorized Smith-Waterman finished and hopefully tested out.
Time permitting, get started on the full alignment with traceback.
Read up on CMake a bit more; it's still giving me a strange amount of trouble...

05/11/10

Today felt pretty pointless. I managed to finish my score-only Smith-Waterman alignment, but it ended up failing a bunch of tests. So I took a look at the SHRiMP code again and made a few changes based on that code to iron out the bugs. In the end, my code ended up looking almost exactly like the SHRiMP code, and yet was still buggy, so I ended up just copying and pasting the code into my version. I decided to cut and paste the SHRiMP code into my own class for a couple of reasons:

I'll have to use that code again, but modified when I do the full alignment (with traceback).
The SHRiMP code required a call to initialization, which was troublesome and not really required given the way the NGSA LocalAligner works.
Since SHRiMP was in C, I couldn't wrap their functions in namespaces.

I'll still have to make sure if my decision was correct with Chris, though.

The good news is, I got all the tests to pass on the score-only alignment. Now all I have to do is get a full alignment working. I also want to try to optimize for the linear gap penalty method, since that case is much simpler than the affine gap penalty. Although the linear case is just affine with a gap open penalty set to 0, I think I can do away with some of the vectors in the calculation, and so make it a little faster. That'll be a low priority, though. So for tomorrow, I think I'll be able to finish:

The full Smith-Waterman alignment (vectorized)
Hopefully get that to pass all the tests (i.e. make the behaviour the same as the non-vectorized version)
Take a look at optimizing for linear gap penalty if I have time.

05/12/10

Chris told me that the full alignment was a lower priority, since the score-only alignment is used much more often. So, I've tried optimizing for linear gap penalty instead. I managed to improve performance by not doing a few calculations required in the affine and also improve the memory usage slightly (by not having to use one genome-length-size array).

I did a few preliminary tests, using a randomly generated genome/read sequences of length 1000 each. Over 1000 iterations, my results were:

38.836576365 seconds using affine gap penalty
28.907083597 seconds using linear gap penalty

So I get a roughly 25% decrease in calculation time. Tests using more/fewer iterations and using different genome/read lengths show more or less the same amount.

I also finished a very preliminary version of the banded algorithm (non-vectorized), though I haven't tested or even compiled it yet. It's basically just a recoding of the original Smith-Waterman aligner in that it goes through the matrix left-to-right and top-to-bottom, but instead of starting from x=0 and going to x=read length, it starts and ends based on boundaries defined by two diagonals going through the matrix. I'm not too happy about this arrangement right now, since if I were to do a full alignment, I would have to keep the whole matrix in memory, which is a waste since I'm only calculating a small subset of it. Another method I've been thinking of was to "flip" the matrix, so it's a diamond shape and start calculating in vertical lines. I think this would be the way to go for the vectorized version, but I have no idea how to do that right now!

I do remember doing a "flip" in CS320, but I think it was with a square, whereas this matrix could be rectangular (shouldn't be that much different). I should look up my old notes later on, too.

For tomorrow, I will:

Finish up my banded non-vectorized algorithm and test it
Try coming up with a vectorized version (may involve having to "flip" the matrix).

05/13/10

Had a discussion with Chris about implementing the banded local aligner. He told me that I have actually been doing my non-vectorized version right and I should go left-to-right, top-to-bottom through the array with artificial bounds instead of from 0 to max x. He also told me that I can pretty much assume the matrix will be square, since we will be passing output from the index into the aligner, and the index will find a reference that is roughly the same size as the read. So, that solves the slope issue. He also told me a general way to traverse the matrix with a vector: instead of going left-to-right, then starting at the beginning, the vector will follow a right-down-right-down pattern (so it will "bounce").

I came in today and realized my method of finding the bounding lines on the banded non-vectorized version was pretty complicated; instead of finding the equations for the two bounding lines given some k distance of separation from the diagonal (which involves finding the slope and y-intercept and has square roots in it), I instead take the diagonal line and calculate all cells that are a horizontal distance k from it. This is much easier, and works out to pretty much the same thing. It has the added advantage of letting me take in a horizontal k value for my band radius, which will let me guarantee a correct score given the number of gaps <= k.

I managed to finish up the non-vectorized version as well as make up a (I think) pretty comprehensive test suite for it. In the end, I passed all my own tests, though I did miss a "-1" in one line of code that had me banging my head for a while. I also managed to improve the memory efficiency of the score-only local alignment function in general based on a few tricks I learned while combing through the vectorized Smith-Waterman code, though I've only implemented them in my banded aligner for now. Improvements include:

Keeping only one read-sized row of scores instead of 2.
Keeping only one read-sized row to keep track of gaps in the x-axis instead of 2 rows.
Having just one int to keep track of the last match, instead of 2 read-sized rows.
Having just one int to keep track of the last gap in y, instead of 2 read-sized rows.

To do:

Start tackling the vectorized banded algorithm! That'll be pretty challenging and probably keep me busy all day.
(Much later) Add in my improvements to the original local aligner.
(Still much later) Polish up the different aligners and have CMake compile either the vectorized or non-vectorized versions of the aligners based on CPU support.

05/14/10

I finished the vectorized banded implementation with all test cases passing! I really thought it would take more time and I was surprised when all the test cases passed. The implementation works only with square matrices, and uses 16x 8-bit vectors, so it should be quite a speed-up from the regular banded implementation. The strange thing with 8-bit vectors is that the instruction set is missing a few things compared to 16-bit vectors. For example, there's no insert or extract function, so I ended up casting the __m128i 's into an int8_t array and inserting/extracting manually; I'm not really sure if that's a good idea, though. Also, there was no max function for signed integers, so I had to write my own for that as well.

For Monday:

Edit the CMakeList.txt files so that it compiles vector/non-vector versions based whether SSE2 is available or not.
Polish up all the code I've written and merge.
Add in my improvements to the original local aligner.

05/17/10

I've integrated all the different aligners I've made together into one package. So now, there are two different local aligners, banded and un-banded, each of which have two different implementations, vectorized and non-vectorized. Which version of those two aligners are compiled depends on whether the target machine supports at least SSE2. I haven't really figured out how to get CMake to automatically detect whether SSE2 is supported or not, but instead I've added a variable to the CMakeLists.txt to be set manually.

I also cleaned up my implementations and optimized the existing local aligner implementation (only the ScoreOnlyAlignment method) to be more memory-efficient and faster. All public interfaces for both vectorized/non-vectorized versions of the aligners now match in both input and output and should be more-or-less equivalent in usage (with the exception of the free gap bits in the regular local aligner).

I'm not really sure what else to do for now, so I'll probably have to ask Chris/Daniel tomorrow.

Topic revision: r11 - 2010-05-18 - jayzhang