> > | 05/13/10
Had a discussion with Chris about implementing the banded local aligner. He told me that I have actually been doing my non-vectorized version right and I should go left-to-right, top-to-bottom through the array with artificial bounds instead of from 0 to max x. He also told me that I can pretty much assume the matrix will be square, since we will be passing output from the index into the aligner, and the index will find a reference that is roughly the same size as the read. So, that solves the slope issue. He also told me a general way to traverse the matrix with a vector: instead of going left-to-right, then starting at the beginning, the vector will follow a right-down-right-down pattern (so it will "bounce").
I came in today and realized my method of finding the bounding lines on the banded non-vectorized version was pretty complicated; instead of finding the equations for the two bounding lines given some k distance of separation from the diagonal (which involves finding the slope and y-intercept and has square roots in it), I instead take the diagonal line and calculate all cells that are a horizontal distance k from it. This is much easier, and works out to pretty much the same thing. It has the added advantage of letting me take in a horizontal k value for my band radius, which will let me guarantee a correct score given the number of gaps <= k .
I managed to finish up the non-vectorized version as well as make up a (I think) pretty comprehensive test suite for it. In the end, I passed all my own tests, though I did miss a "-1" in one line of code that had me banging my head for a while. I also managed to improve the memory efficiency of the score-only local alignment function in general based on a few tricks I learned while combing through the vectorized Smith-Waterman code, though I've only implemented them in my banded aligner for now. Improvements include:
- Keeping only one read-sized row of scores instead of 2.
- Keeping only one read-sized row to keep track of gaps in the x-axis instead of 2 rows.
- Having just one
int to keep track of the last match, instead of 2 read-sized rows.
- Having just one
int to keep track of the last gap in y, instead of 2 read-sized rows.
To do:
- Start tackling the vectorized banded algorithm! That'll be pretty challenging and probably keep me busy all day.
- (Much later) Add in my improvements to the original local aligner.
- (Still much later) Polish up the different aligners and have CMake compile either the vectorized or non-vectorized versions of the aligners based on CPU support.
|