* [Qemu-devel] [RFC][PATCH 4/9] buffer_is_zero: use vector optimizations if possible
From: Peter Lieven @ 2013-03-12 15:50 UTC
To: qemu-devel@nongnu.org
Cc: Kevin Wolf, Paolo Bonzini, Orit Wasserman, Stefan Hajnoczi
Performance gain on SSE2 is approx. 20-25%. AltiVec
is not tested. Performance for unsigned long arithmetic
is unchanged.
Signed-off-by: Peter Lieven <pl@kamp.de>
---
util/cutils.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/util/cutils.c b/util/cutils.c
index a09d8e8..23f0cd6 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -186,6 +186,11 @@ bool buffer_is_zero(const void *buf, size_t len)
* latency.
*/
+ if (((uintptr_t) buf) % sizeof(VECTYPE) == 0
+ && len % 8*sizeof(VECTYPE) == 0) {
+ return buffer_find_nonzero_offset(buf, len)==len;
+ }
+
size_t i;
long d0, d1, d2, d3;
const long * const data = buf;
--
1.7.9.5
* Re: [Qemu-devel] [RFC][PATCH 4/9] buffer_is_zero: use vector optimizations if possible
From: Eric Blake @ 2013-03-12 16:01 UTC
To: Peter Lieven
Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi, qemu-devel@nongnu.org,
Orit Wasserman
On 03/12/2013 09:50 AM, Peter Lieven wrote:
> performance gain on SSE2 is approx. 20-25%. altivec
> is not tested. performance for unsigned long arithmetic
> is unchanged.
>
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
> util/cutils.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/util/cutils.c b/util/cutils.c
> index a09d8e8..23f0cd6 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -186,6 +186,11 @@ bool buffer_is_zero(const void *buf, size_t len)
> * latency.
> */
>
> + if (((uintptr_t) buf) % sizeof(VECTYPE) == 0
> + && len % 8*sizeof(VECTYPE) == 0) {
Put spaces around binary operators. Use CHAR_BIT instead of the magic
number 8. Also, did you mean:
len % (CHAR_BIT * sizeof(VECTYPE))
instead of what you wrote, which parses as '(len % 8) * sizeof(VECTYPE)'?
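To illustrate the precedence point: in C, % and * have the same precedence
and associate left to right, so the unparenthesized form only tests whether
len is a multiple of 8. The following is a minimal sketch, not code from the
patch. VEC_SIZE is a hypothetical stand-in for sizeof(VECTYPE), assumed here
to be 16 bytes as for an SSE2 register, and both function names are invented
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for sizeof(VECTYPE); 16 bytes is an assumption. */
#define VEC_SIZE ((size_t)16)

/* As written in the patch: parses as (len % 8) * VEC_SIZE == 0. */
static bool check_as_written(size_t len)
{
    return len % 8 * VEC_SIZE == 0;
}

/* With explicit parentheses: len is a multiple of 8 whole vectors. */
static bool check_parenthesized(size_t len)
{
    return len % (8 * VEC_SIZE) == 0;
}

/* A length that is a multiple of 8 but not of 8 * VEC_SIZE tells them apart. */
static void precedence_demo(void)
{
    assert(check_as_written(8));
    assert(!check_parenthesized(8));
}
```

Any length that is a multiple of 8 bytes passes the first check, even when
it is far from a multiple of eight vectors.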
> + return buffer_find_nonzero_offset(buf, len)==len;
> + }
> +
> size_t i;
> long d0, d1, d2, d3;
> const long * const data = buf;
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] [RFC][PATCH 4/9] buffer_is_zero: use vector optimizations if possible
From: Peter Lieven @ 2013-03-12 16:03 UTC
To: Eric Blake
Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi, qemu-devel@nongnu.org,
Orit Wasserman
On 12.03.2013 at 17:01, Eric Blake <eblake@redhat.com> wrote:
> On 03/12/2013 09:50 AM, Peter Lieven wrote:
>> performance gain on SSE2 is approx. 20-25%. altivec
>> is not tested. performance for unsigned long arithmetic
>> is unchanged.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> util/cutils.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/util/cutils.c b/util/cutils.c
>> index a09d8e8..23f0cd6 100644
>> --- a/util/cutils.c
>> +++ b/util/cutils.c
>> @@ -186,6 +186,11 @@ bool buffer_is_zero(const void *buf, size_t len)
>> * latency.
>> */
>>
>> + if (((uintptr_t) buf) % sizeof(VECTYPE) == 0
>> + && len % 8*sizeof(VECTYPE) == 0) {
>
> Space around binary operators. Use CHAR_BITS instead of a magic number
> 8. Also, did you mean:
>
> len % (CHAR_BITS * sizeof(VECTYPE))
>
> instead of what you wrote as '(len % 8) * sizeof(VECTYPE)'?
The 8 is not BITS_PER_BYTE or CHAR_BITS; it is the number of
vectors processed per loop iteration in buffer_find_nonzero_offset().
I will add a constant for this to make it clearer.
Peter
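For readers unfamiliar with the idea, here is a rough sketch of what "the
number of vectors in one loop" means. It is not the actual
buffer_find_nonzero_offset() from this series. vec_t is a hypothetical scalar
stand-in for VECTYPE, and VECS_PER_ITERATION and find_nonzero_chunk are
names invented for illustration. Because the loop always consumes eight
vectors at a time, the length must be a multiple of eight vectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VECTYPE (a real build would use a SIMD type). */
typedef uint64_t vec_t;

/* Assumed name for the unroll factor: vectors examined per iteration. */
#define VECS_PER_ITERATION 8

/*
 * Return the byte offset of the first 8-vector chunk that contains a
 * nonzero byte, or len if the whole buffer is zero. The buffer must be
 * aligned to vec_t and len must be a multiple of one chunk.
 */
static size_t find_nonzero_chunk(const void *buf, size_t len)
{
    const vec_t *p = buf;
    size_t nvec = len / sizeof(vec_t);
    size_t i;

    assert((uintptr_t)buf % sizeof(vec_t) == 0);
    assert(len % (VECS_PER_ITERATION * sizeof(vec_t)) == 0);

    for (i = 0; i < nvec; i += VECS_PER_ITERATION) {
        /* OR eight vectors together so there is one test per chunk. */
        vec_t acc = p[i + 0] | p[i + 1] | p[i + 2] | p[i + 3]
                  | p[i + 4] | p[i + 5] | p[i + 6] | p[i + 7];
        if (acc != 0) {
            return i * sizeof(vec_t);
        }
    }
    return len;
}
```

A caller would treat a return value equal to len as "buffer is all zero",
which matches how the patch compares the result against len.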
>
>> + return buffer_find_nonzero_offset(buf, len)==len;
>> + }
>> +
>> size_t i;
>> long d0, d1, d2, d3;
>> const long * const data = buf;
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
* Re: [Qemu-devel] [RFC][PATCH 4/9] buffer_is_zero: use vector optimizations if possible
From: Eric Blake @ 2013-03-12 16:09 UTC
To: Peter Lieven
Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi, qemu-devel@nongnu.org,
Orit Wasserman
On 03/12/2013 10:03 AM, Peter Lieven wrote:
>>> + if (((uintptr_t) buf) % sizeof(VECTYPE) == 0
>>> + && len % 8*sizeof(VECTYPE) == 0) {
>>
>> Space around binary operators. Use CHAR_BITS instead of a magic number
>> 8.
> the 8 is not BITS_PER_BYTE or CHAR_BITS its the number of
> vectors in one loop in buffer_find_nonzero_offset(). I will add
> a constant for this to make it clearer.
Indeed, now I see it: 8 is the unroll factor. That is all the more
evidence that a named constant makes the code easier to read, given
that I misinterpreted the magic number.
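One possible shape for such a guard, with the unroll factor named and the
modulus parenthesized, is sketched below. This is not the follow-up patch.
UNROLL_FACTOR, vec_t and can_use_vector_path are hypothetical names, and
vec_t is a scalar stand-in for VECTYPE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VECTYPE. */
typedef uint64_t vec_t;

/* Hypothetical name for the number of vectors handled per loop iteration. */
#define UNROLL_FACTOR 8

/*
 * True when buf is aligned to the vector size and len is a whole number
 * of unrolled iterations, so the vectorized path may be taken.
 */
static bool can_use_vector_path(const void *buf, size_t len)
{
    return (uintptr_t)buf % sizeof(vec_t) == 0
        && len % (UNROLL_FACTOR * sizeof(vec_t)) == 0;
}
```

With the constant named, a reviewer no longer has to guess whether 8 means
bits per byte or vectors per iteration.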
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org