On 07/26/2009 04:45 PM, Aurelien Jarno wrote:
> Knowing that $31 could be used for prefetch, I have modified the
> assembly code from memchr.S to use it. It passes all the testsuite.
>

This isn't intended to be a prefetch instruction, it's meant to be
fetching the data for the next word.  I.e. we've unrolled the loop and
there's at least 8 bytes left in the search.  Note the

	# At least two quads remain to be accessed.

comment.  At that point we're supposed to be more than 16 bytes away
from the end of the input buffer.

Actually, the confusion I see is farther upthread:

> > >>>>> The problem is that the memchr() function on alpha uses prefetch, which
> > >>>>> can cause a page boundary to be crossed, while the standards (POSIX and
> > >>>>> C99) says it should stop when a match is found.

I didn't realize this when I wrote the function.

The entire function should be rewritten, since there's little point in
using a prefetch instruction that close to the load.  Prefetch
instructions should be used to move data into the L1 cache, not hide
the 3 cycle load delay.  Thus a prefetch, if used, should be several
cache lines ahead, not just a single word.

Perhaps a better solution would be to read words until we get cacheline
aligned, then read the entire line into 8 registers, prefetch the next
line, then process each register one by one.

Try this.

r~
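
For illustration only (this is not the patch referred to above): a
minimal C sketch of the cache-line idea described in the message.  It
assumes 64-byte cache lines, a prefetch distance of four lines, and the
GCC/Clang __builtin_prefetch builtin.  The names sketch_memchr,
has_byte, LINE_BYTES and PREFETCH_LINES are hypothetical.  Real Alpha
code would be written in assembly and tuned per CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES     64u  /* assumed cache-line size */
#define PREFETCH_LINES 4u   /* assumed prefetch distance, in lines */

/* Nonzero iff at least one byte of w equals the byte replicated in pat. */
static uint64_t has_byte(uint64_t w, uint64_t pat)
{
    uint64_t x = w ^ pat;   /* bytes that match become 0x00 */
    return (x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL;
}

void *sketch_memchr(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    unsigned char ch = (unsigned char)c;
    uint64_t pat = 0x0101010101010101ULL * ch;

    /* Head: single-byte loads until p is cache-line aligned. */
    while (n > 0 && ((uintptr_t)p % LINE_BYTES) != 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }

    /* Body: one aligned line per iteration.  An aligned line lies
       entirely inside one page, so the loads cannot cross a page
       boundary. */
    while (n >= LINE_BYTES) {
        uint64_t w[LINE_BYTES / 8];

        /* Hint only, several lines ahead; a prefetch does not fault. */
        __builtin_prefetch((const void *)((uintptr_t)p +
                           PREFETCH_LINES * LINE_BYTES));

        memcpy(w, p, LINE_BYTES);   /* read the whole line */

        /* Process each of the 8 words one by one. */
        for (size_t i = 0; i < LINE_BYTES / 8; i++) {
            if (has_byte(w[i], pat)) {
                const unsigned char *q = p + 8 * i;
                for (size_t j = 0; j < 8; j++)
                    if (q[j] == ch)
                        return (void *)(q + j);
            }
        }
        p += LINE_BYTES;
        n -= LINE_BYTES;
    }

    /* Tail: fewer than LINE_BYTES bytes remain. */
    while (n > 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }
    return NULL;
}
```

The only access that runs ahead of the search is the prefetch hint; all
real loads stay inside the current aligned line or inside the n bytes
the caller passed.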