In Section 4.2, we presented a published software pipelining example and assert that it is erroneous. We were wrong; the example is actually correct. The problem is that our tool is for the C62x family of fixed-point DSPs, where multiplies take 2 cycles, whereas the MPYSP instruction used in the code is actually a floating-point multiply for the C67x family, which takes 4 cycles. With a 4 cycle latency for the multiply, the code is correct (and our tool agrees if we specify 4 cycle latency for the MPYSP instruction). As a feeble excuse for our mistake, we note that the article by Oshana from which we took the example starts by saying the code examples are for the C62x family, so the article still isn't totally perfect, but the error is a trivial nomenclature nitpick, not a serious code-scheduling bug. Plus, we should have noticed the use of floating-point instructions in code ostensibly for a fixed-point DSP. Many thanks to Brett Huber of Texas Instruments for not only spotting our error, but also correctly figuring out exactly how we went wrong.