[Qemu-devel] [RFC][PATCH 7/9] bitops: use vector algorithm to optimize find_next_bit()

* [Qemu-devel] [RFC][PATCH 7/9] bitops: use vector algorithm to optimize find_next_bit()
@ 2013-03-12 15:52 Peter Lieven
  2013-03-12 16:04 ` Eric Blake
  0 siblings, 1 reply; 2+ messages in thread
From: Peter Lieven @ 2013-03-12 15:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: Kevin Wolf, Paolo Bonzini, Orit Wasserman, Stefan Hajnoczi

This patch uses buffer_find_nonzero_offset()
to skip large areas of zeroes.

Compared to loop unrolling, this gives a further 50% performance
benefit when skipping large areas of zeroes.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
  util/bitops.c |   23 ++++++++++++++++++++---
  1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/util/bitops.c b/util/bitops.c
index e72237a..0a056ff 100644
--- a/util/bitops.c
+++ b/util/bitops.c
@@ -42,10 +42,27 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
          size -= BITS_PER_LONG;
          result += BITS_PER_LONG;
      }
-    while (size & ~(BITS_PER_LONG-1)) {
-        if ((tmp = *(p++))) {
-            goto found_middle;
+    while (size >= BITS_PER_LONG) {
+        if ((tmp = *p)) {
+            goto found_middle;
+        }
+        if (((uintptr_t) p) % sizeof(VECTYPE) == 0
+              && size >= BITS_PER_BYTE*8*sizeof(VECTYPE)) {
+            unsigned long tmp2 =
+                buffer_find_nonzero_offset(p, ((size/BITS_PER_BYTE) & ~(8*sizeof(VECTYPE)-1)));
+            result += tmp2 * BITS_PER_BYTE;
+            size -= tmp2 * BITS_PER_BYTE;
+            p += tmp2 / sizeof(unsigned long);
+            if (!size) {
+                return result;
+            }
+            if (tmp2) {
+                if ((tmp = *p)) {
+                    goto found_middle;
+                }
+            }
          }
+        p++;
          result += BITS_PER_LONG;
          size -= BITS_PER_LONG;
      }
-- 
1.7.9.5
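
The zero-skipping idea in the patch above can be sketched in isolation.
This is a minimal illustration under stated assumptions, not the code
from the series: skip_zero_chunks() is a hypothetical scalar stand-in
for buffer_find_nonzero_offset(), VEC_BYTES stands in for
sizeof(VECTYPE), the search always starts at bit 0, and
demo_first_set_bit() is only a small driver. After a successful skip the
sketch re-enters the loop, so a remainder shorter than one word is
handled by the masked tail step.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define VEC_BYTES     16                /* assumption: stands in for sizeof(VECTYPE) */
#define CHUNK_BYTES   (8 * VEC_BYTES)   /* bytes examined per skip step */
#define DEMO_BITS     8192              /* capacity of the demo bitmap */

/* Hypothetical scalar stand-in for buffer_find_nonzero_offset(): returns the
 * byte offset of the first CHUNK_BYTES-sized chunk that holds a nonzero byte,
 * or len if every chunk is zero. len must be a multiple of CHUNK_BYTES. */
static size_t skip_zero_chunks(const void *buf, size_t len)
{
    static const unsigned char zeroes[CHUNK_BYTES];
    const unsigned char *b = buf;
    size_t off;

    for (off = 0; off < len; off += CHUNK_BYTES) {
        if (memcmp(b + off, zeroes, CHUNK_BYTES) != 0) {
            break;
        }
    }
    return off;
}

/* Count trailing zero bits; w must be nonzero. */
static unsigned long ctz_ulong(unsigned long w)
{
    unsigned long n = 0;

    while (!(w & 1UL)) {
        w >>= 1;
        n++;
    }
    return n;
}

/* Index of the first set bit in [0, size), or size if no bit is set. */
static unsigned long first_set_bit(const unsigned long *addr, unsigned long size)
{
    const unsigned long *p = addr;
    unsigned long result = 0;
    unsigned long tmp;

    while (size >= BITS_PER_LONG) {
        tmp = *p;
        if (tmp) {
            return result + ctz_ulong(tmp);
        }
        if ((uintptr_t)p % VEC_BYTES == 0 && size / CHAR_BIT >= CHUNK_BYTES) {
            /* Largest chunk-aligned byte length that fits in the remaining bits. */
            size_t len = (size / CHAR_BIT) & ~(size_t)(CHUNK_BYTES - 1);
            size_t skipped = skip_zero_chunks(p, len);

            if (skipped) {
                result += skipped * CHAR_BIT;
                size -= skipped * CHAR_BIT;
                p += skipped / sizeof(unsigned long);
                continue; /* re-check how many bits are left */
            }
        }
        p++;
        result += BITS_PER_LONG;
        size -= BITS_PER_LONG;
    }
    if (size) {
        /* Partial last word: ignore bits at or beyond the end of the bitmap. */
        tmp = *p & (~0UL >> (BITS_PER_LONG - size));
        if (tmp) {
            return result + ctz_ulong(tmp);
        }
    }
    return result + size;
}

/* Demo driver: zeroed bitmap of nbits (at most DEMO_BITS) bits with one bit
 * set at set_at; pass set_at >= nbits to leave the bitmap empty. */
static unsigned long demo_first_set_bit(unsigned long nbits, unsigned long set_at)
{
    static _Alignas(VEC_BYTES) unsigned long bitmap[DEMO_BITS / BITS_PER_LONG];

    memset(bitmap, 0, sizeof(bitmap));
    if (set_at < nbits) {
        bitmap[set_at / BITS_PER_LONG] |= 1UL << (set_at % BITS_PER_LONG);
    }
    return first_set_bit(bitmap, nbits);
}
```

The result does not depend on whether the fast path is taken; alignment
and remaining length only decide how many words are skipped per step.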

* Re: [Qemu-devel] [RFC][PATCH 7/9] bitops: use vector algorithm to optimize find_next_bit()
  2013-03-12 15:52 [Qemu-devel] [RFC][PATCH 7/9] bitops: use vector algorithm to optimize find_next_bit() Peter Lieven
@ 2013-03-12 16:04 ` Eric Blake
  0 siblings, 0 replies; 2+ messages in thread
From: Eric Blake @ 2013-03-12 16:04 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi, qemu-devel@nongnu.org,
	Orit Wasserman

On 03/12/2013 09:52 AM, Peter Lieven wrote:
> this patch adds the usage of buffer_find_nonzero_offset()
> to skip large areas of zeroes.
> 
> compared to loop unrolling this adds another 50% performance
> benefit for skipping large areas of zeroes.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  util/bitops.c |   23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)

> +        if (((uintptr_t) p) % sizeof(VECTYPE) == 0
> +              && size >= BITS_PER_BYTE*8*sizeof(VECTYPE)) {

Spaces around binary operators.  CHAR_BIT instead of magic 8.

> +          unsigned long tmp2 =
> +              buffer_find_nonzero_offset(p, ((size/BITS_PER_BYTE) &
> ~(8*sizeof(VECTYPE)-1)));

Spaces around binary operators.
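
As an illustration of the two style points, the quoted condition and
length computation could be written roughly as follows. This is only a
sketch: VEC_BYTES stands in for sizeof(VECTYPE), and UNROLL_FACTOR is a
hypothetical name for the literal 8, on the assumption that it counts
vectors handled per call rather than bits per byte. If the 8 is meant as
bits per byte, CHAR_BIT would be the constant to use instead.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES     16   /* assumption: stands in for sizeof(VECTYPE) */
#define UNROLL_FACTOR 8    /* hypothetical name for the literal 8 */

/* Nonzero if p is vector-aligned and at least one full chunk of
 * UNROLL_FACTOR vectors remains in size_bits bits. */
static int can_skip_zeroes(const void *p, unsigned long size_bits)
{
    return (uintptr_t)p % VEC_BYTES == 0
           && size_bits >= (unsigned long)CHAR_BIT * UNROLL_FACTOR * VEC_BYTES;
}

/* Byte length to scan: the remaining bytes rounded down to a whole
 * number of chunks. */
static size_t skip_length(unsigned long size_bits)
{
    return (size_bits / CHAR_BIT) & ~(size_t)(UNROLL_FACTOR * VEC_BYTES - 1);
}
```

Naming the constants keeps the bit-to-byte conversion and the chunk size
visibly separate in both expressions.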

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

