[Qemu-devel] [RFC][PATCH 4/9] buffer_is_zero: use vector optimizations if possible


* [Qemu-devel] [RFC][PATCH 4/9] buffer_is_zero: use vector optimizations if possible
@ 2013-03-12 15:50 Peter Lieven
  2013-03-12 16:01 ` Eric Blake
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Lieven @ 2013-03-12 15:50 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: Kevin Wolf, Paolo Bonzini, Orit Wasserman, Stefan Hajnoczi

Performance gain on SSE2 is approx. 20-25%. AltiVec
is not tested. Performance for unsigned long arithmetic
is unchanged.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
  util/cutils.c |    5 +++++
  1 file changed, 5 insertions(+)

diff --git a/util/cutils.c b/util/cutils.c
index a09d8e8..23f0cd6 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -186,6 +186,11 @@ bool buffer_is_zero(const void *buf, size_t len)
       * latency.
       */

+    if (((uintptr_t) buf) % sizeof(VECTYPE) == 0
+          && len % 8*sizeof(VECTYPE) == 0) {
+        return buffer_find_nonzero_offset(buf, len)==len;
+    }
+
      size_t i;
      long d0, d1, d2, d3;
      const long * const data = buf;
-- 
1.7.9.5

* Re: [Qemu-devel] [RFC][PATCH 4/9] buffer_is_zero: use vector optimizations if possible
  2013-03-12 15:50 [Qemu-devel] [RFC][PATCH 4/9] buffer_is_zero: use vector optimizations if possible Peter Lieven
@ 2013-03-12 16:01 ` Eric Blake
  2013-03-12 16:03   ` Peter Lieven
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Blake @ 2013-03-12 16:01 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi, qemu-devel@nongnu.org,
	Orit Wasserman

On 03/12/2013 09:50 AM, Peter Lieven wrote:
> Performance gain on SSE2 is approx. 20-25%. AltiVec
> is not tested. Performance for unsigned long arithmetic
> is unchanged.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  util/cutils.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/util/cutils.c b/util/cutils.c
> index a09d8e8..23f0cd6 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -186,6 +186,11 @@ bool buffer_is_zero(const void *buf, size_t len)
>       * latency.
>       */
> 
> +    if (((uintptr_t) buf) % sizeof(VECTYPE) == 0
> +          && len % 8*sizeof(VECTYPE) == 0) {

Put spaces around binary operators.  Use CHAR_BIT instead of the magic
number 8.  Also, did you mean:

len % (CHAR_BIT * sizeof(VECTYPE))

instead of what you wrote, which parses as '(len % 8) * sizeof(VECTYPE)'?
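
The precedence point can be checked in isolation. In C, % and * share one
precedence level and associate left to right, so the condition in the patch
groups as (len % 8) * sizeof(VECTYPE). The following is a minimal sketch,
assuming a 16-byte vector; VEC_SIZE and both function names are hypothetical
stand-ins and are not part of the patch:

```c
#include <assert.h>   /* only needed for the usage checks described below */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for sizeof(VECTYPE); 16 bytes is assumed (SSE2). */
#define VEC_SIZE 16

/* As written in the patch: groups as (len % 8) * VEC_SIZE == 0. */
static bool check_as_written(size_t len)
{
    return len % 8*VEC_SIZE == 0;
}

/* With explicit parentheses: len must be a multiple of 8 * VEC_SIZE. */
static bool check_parenthesized(size_t len)
{
    return len % (8 * VEC_SIZE) == 0;
}
```

With len = 8, check_as_written() returns true while check_parenthesized()
returns false, so the unparenthesized form accepts lengths that are not a
multiple of the full 8 * VEC_SIZE block.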

> +        return buffer_find_nonzero_offset(buf, len)==len;
> +    }
> +
>      size_t i;
>      long d0, d1, d2, d3;
>      const long * const data = buf;

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [RFC][PATCH 4/9] buffer_is_zero: use vector optimizations if possible
  2013-03-12 16:01 ` Eric Blake
@ 2013-03-12 16:03   ` Peter Lieven
  2013-03-12 16:09     ` Eric Blake
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Lieven @ 2013-03-12 16:03 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi, qemu-devel@nongnu.org,
	Orit Wasserman


On 12.03.2013 at 17:01, Eric Blake <eblake@redhat.com> wrote:

> On 03/12/2013 09:50 AM, Peter Lieven wrote:
>> Performance gain on SSE2 is approx. 20-25%. AltiVec
>> is not tested. Performance for unsigned long arithmetic
>> is unchanged.
>> 
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> util/cutils.c |    5 +++++
>> 1 file changed, 5 insertions(+)
>> 
>> diff --git a/util/cutils.c b/util/cutils.c
>> index a09d8e8..23f0cd6 100644
>> --- a/util/cutils.c
>> +++ b/util/cutils.c
>> @@ -186,6 +186,11 @@ bool buffer_is_zero(const void *buf, size_t len)
>>      * latency.
>>      */
>> 
>> +    if (((uintptr_t) buf) % sizeof(VECTYPE) == 0
>> +          && len % 8*sizeof(VECTYPE) == 0) {
> 
> Put spaces around binary operators.  Use CHAR_BIT instead of the magic
> number 8.  Also, did you mean:
> 
> len % (CHAR_BIT * sizeof(VECTYPE))
> 
> instead of what you wrote, which parses as '(len % 8) * sizeof(VECTYPE)'?

The 8 is not BITS_PER_BYTE or CHAR_BIT; it's the number of
vectors processed per loop iteration in buffer_find_nonzero_offset().
I will add a constant for this to make it clearer.
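
A named constant for the unroll factor might look like the sketch below. The
names VECS_PER_ITERATION, VEC_SIZE and can_use_vector_path() are hypothetical
(the thread does not say what the constant will be called), and a 16-byte
vector is assumed:

```c
#include <assert.h>   /* only needed for the usage checks described below */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; 16 bytes is an assumed sizeof(VECTYPE) for SSE2. */
#define VEC_SIZE 16
/* Number of vectors examined per loop iteration (the unroll factor). */
#define VECS_PER_ITERATION 8

/*
 * True if buf is vector-aligned and len is a whole number of unrolled
 * loop iterations. Only the address is inspected; buf is never read.
 */
static bool can_use_vector_path(const void *buf, size_t len)
{
    return (uintptr_t)buf % VEC_SIZE == 0
        && len % (VECS_PER_ITERATION * VEC_SIZE) == 0;
}
```

Under these assumptions a 16-byte-aligned buffer of 128 or 256 bytes
qualifies, while a length of 64 or an address that is off by one byte
does not.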

Peter

> 
>> +        return buffer_find_nonzero_offset(buf, len)==len;
>> +    }
>> +
>>     size_t i;
>>     long d0, d1, d2, d3;
>>     const long * const data = buf;
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


* Re: [Qemu-devel] [RFC][PATCH 4/9] buffer_is_zero: use vector optimizations if possible
  2013-03-12 16:03   ` Peter Lieven
@ 2013-03-12 16:09     ` Eric Blake
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Blake @ 2013-03-12 16:09 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi, qemu-devel@nongnu.org,
	Orit Wasserman

On 03/12/2013 10:03 AM, Peter Lieven wrote:

>>> +    if (((uintptr_t) buf) % sizeof(VECTYPE) == 0
>>> +          && len % 8*sizeof(VECTYPE) == 0) {
>>
>> Put spaces around binary operators.  Use CHAR_BIT instead of the magic
>> number 8.

> The 8 is not BITS_PER_BYTE or CHAR_BIT; it's the number of
> vectors processed per loop iteration in buffer_find_nonzero_offset().
> I will add a constant for this to make it clearer.

Indeed, now I see it: 8 is the unroll factor.  Well, that is all the
more evidence that a named constant makes the code easier to read, given
that I misinterpreted the magic number.
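
To illustrate why an unroll factor of 8 constrains len, here is a portable
sketch of an 8-way unrolled zero scan. It is not the code from the series:
vec_t is a hypothetical stand-in for VECTYPE (plain unsigned long rather
than an SSE2 or AltiVec type), and UNROLL_FACTOR and the function name are
made up for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long vec_t;   /* hypothetical stand-in for VECTYPE */
#define UNROLL_FACTOR 8        /* vectors examined per loop iteration */

/*
 * Returns the byte offset of the first 8-vector chunk that contains a
 * nonzero value, or len if the whole buffer is zero. Assumes buf is
 * aligned for vec_t and len is a multiple of
 * UNROLL_FACTOR * sizeof(vec_t).
 */
static size_t find_nonzero_offset_sketch(const void *buf, size_t len)
{
    const vec_t *p = buf;
    size_t n = len / sizeof(vec_t);
    size_t i;

    assert(len % (UNROLL_FACTOR * sizeof(vec_t)) == 0);

    for (i = 0; i < n; i += UNROLL_FACTOR) {
        /* OR eight vectors together and test once per iteration. */
        vec_t acc = p[i] | p[i + 1] | p[i + 2] | p[i + 3]
                  | p[i + 4] | p[i + 5] | p[i + 6] | p[i + 7];
        if (acc) {
            return i * sizeof(vec_t);
        }
    }
    return len;
}
```

Because each iteration reads eight vectors unconditionally, a length that is
not a multiple of UNROLL_FACTOR * sizeof(vec_t) would make the last
iteration read past the end of the buffer, which is why the caller has to
check that condition first. A caller would treat a return value equal to len
as "buffer is all zero".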

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

