On 03/15/2013 09:50 AM, Peter Lieven wrote: > this patch adds the usage of buffer_find_nonzero_offset() > to skip large areas of zeroes. > > compared to loop unrolling presented in an earlier > patch this adds another 50% performance benefit for > skipping large areas of zeroes. loop unrolling alone > added close to 100% speedup. > > Signed-off-by: Peter Lieven > --- > util/bitops.c | 26 +++++++++++++++++++++++--- > 1 file changed, 23 insertions(+), 3 deletions(-) > + while (size >= BITS_PER_LONG) { > + if ((tmp = *p)) { > + goto found_middle; > + } > + if (((uintptr_t) p) % sizeof(VECTYPE) == 0 > + && size >= BITS_PER_BYTE * sizeof(VECTYPE) > + * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) { Another instance where a helper function to check for alignment would be nice. Except this time you have a BITS_PER_BYTE factor, so you would be calling something like buffer_can_use_vectors(buf, size / BITS_PER_BYTE) > + unsigned long tmp2 = > + buffer_find_nonzero_offset(p, ((size / BITS_PER_BYTE) & > + ~(BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * > + sizeof(VECTYPE) - 1))); Type mismatch - buffer_find_nonzero_offset returns size_t, which isn't necessarily the same size as unsigned long. I'm not sure if it can bite you. > + result += tmp2 * BITS_PER_BYTE; > + size -= tmp2 * BITS_PER_BYTE; > + p += tmp2 / sizeof(unsigned long); > + if (!size) { > + return result; > + } > + if (tmp2) { Do you really need this condition, or would it suffice to just 'continue;' the loop? Once buffer_find_nonzero_offset returns anything that leaves size as non-zero, we are guaranteed that the loop will goto found_middle without any further calls to buffer_find_nonzero_offset. > + if ((tmp = *p)) { > + goto found_middle; > + } > + } > } > + p++; > result += BITS_PER_LONG; > size -= BITS_PER_LONG; > } > -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org