From: Paolo Bonzini <pbonzini@redhat.com>
To: Peter Lieven <pl@dlhnet.de>
Cc: Orit Wasserman <owasserm@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Corentin Chary <corentin.chary@gmail.com>
Subject: Re: [Qemu-devel] [RFC] find_next_bit optimizations
Date: Mon, 11 Mar 2013 16:58:43 +0100
Message-ID: <513DFF33.5080400@redhat.com>
In-Reply-To: <32589117-17AC-4A82-820F-364CA5EEC23E@dlhnet.de>
On 11/03/2013 16:37, Peter Lieven wrote:
>
> On 11.03.2013 at 16:29, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> On 11/03/2013 16:24, Peter Lieven wrote:
>>>
>>>> How would that be different in your patch? But you can solve it by
>>>> making two >= loops, one checking for 4*BITS_PER_LONG and one checking
>>>> BITS_PER_LONG.
>>>
>>> This is what I have now:
>>>
>>> diff --git a/util/bitops.c b/util/bitops.c
>>> index e72237a..b0dc93f 100644
>>> --- a/util/bitops.c
>>> +++ b/util/bitops.c
>>> @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>> const unsigned long *p = addr + BITOP_WORD(offset);
>>> unsigned long result = offset & ~(BITS_PER_LONG-1);
>>> unsigned long tmp;
>>> + unsigned long d0,d1,d2,d3;
>>>
>>> if (offset >= size) {
>>> return size;
>>> }
>>> size -= result;
>>> - offset %= BITS_PER_LONG;
>>> + offset &= (BITS_PER_LONG-1);
>>> if (offset) {
>>> tmp = *(p++);
>>> tmp &= (~0UL << offset);
>>> @@ -42,7 +43,19 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>> size -= BITS_PER_LONG;
>>> result += BITS_PER_LONG;
>>> }
>>> - while (size & ~(BITS_PER_LONG-1)) {
>>> + while (size >= 4*BITS_PER_LONG) {
>>> + d0 = *p;
>>> + d1 = *(p+1);
>>> + d2 = *(p+2);
>>> + d3 = *(p+3);
>>> + if (d0 || d1 || d2 || d3) {
>>> + break;
>>> + }
>>> + p+=4;
>>> + result += 4*BITS_PER_LONG;
>>> + size -= 4*BITS_PER_LONG;
>>> + }
>>> + while (size >= BITS_PER_LONG) {
>>> if ((tmp = *(p++))) {
>>> goto found_middle;
>>> }
>>>
>>
>> Minus the %= vs. &=,
>>
>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>> Perhaps:
>>
>> tmp = *p;
>> d1 = *(p+1);
>> d2 = *(p+2);
>> d3 = *(p+3);
>> if (tmp) {
>> goto found_middle;
>> }
>> if (d1 || d2 || d3) {
>> break;
>> }
>
> I do not know what gcc internally makes of the d0 || d1 || d2 || d3?
It depends on the target and how expensive branches are.
> I would guess it's something like one addition w/ carry and 1 test?
It could be either 4 compare-and-jump sequences, or 3 bitwise ORs
followed by a compare-and-jump.
That is, either:
    test %r8, %r8
    jnz  second_loop
    test %r9, %r9
    jnz  second_loop
    test %r10, %r10
    jnz  second_loop
    test %r11, %r11
    jnz  second_loop
or
    or   %r9, %r8
    or   %r11, %r10
    or   %r8, %r10
    jnz  second_loop
Don't let the length of the code fool you. The processor knows how to
optimize all of these, and GCC knows too.
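To make the two shapes concrete at the C level, here is a minimal
sketch. The helper names any_set_logical and any_set_bitwise are made up
for illustration and are not part of util/bitops.c. Both return the same
answer for every input; which instruction sequence the compiler picks
for either one depends on the target and the optimization level.

```c
#include <stdbool.h>

/* Hypothetical helper: short-circuit form. Read literally, this is
 * four compare-and-jump sequences. */
static bool any_set_logical(unsigned long d0, unsigned long d1,
                            unsigned long d2, unsigned long d3)
{
    return d0 || d1 || d2 || d3;
}

/* Hypothetical helper: bitwise form. Read literally, this is three ORs
 * followed by a single comparison against zero. */
static bool any_set_bitwise(unsigned long d0, unsigned long d1,
                            unsigned long d2, unsigned long d3)
{
    return ((d0 | d1) | (d2 | d3)) != 0;
}
```

An optimizing compiler is free to turn either function into the other,
which is why the source-level spelling matters less than it seems.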
> your proposed change would introduce 2 tests (maybe)?
Yes, but I expect them to be fairly well predicted.
> what about this to be sure?
>
> tmp = *p;
> d1 = *(p+1);
> d2 = *(p+2);
> d3 = *(p+3);
> if (tmp || d1 || d2 || d3) {
> if (tmp) {
> goto found_middle;
I suspect that GCC would rewrite it to my version (definitely if it
produces 4 compare-and-jumps; but possibly it does it even if it goes
for bitwise ORs, I haven't checked).
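For reference, the two control-flow shapes under discussion decide
exactly the same thing. A small sketch, with the loop body pulled out
into hypothetical functions (step_flat, step_nested and the enum are
invented here, only to make the outcome of each form comparable):

```c
/* What the unrolled loop body decides for one group of four words. */
enum step { STEP_CONTINUE, STEP_FOUND_MIDDLE, STEP_BREAK };

/* Flat form: test tmp first, then the remaining three words. */
static enum step step_flat(unsigned long tmp, unsigned long d1,
                           unsigned long d2, unsigned long d3)
{
    if (tmp) {
        return STEP_FOUND_MIDDLE;
    }
    if (d1 || d2 || d3) {
        return STEP_BREAK;
    }
    return STEP_CONTINUE;
}

/* Nested form: one combined test, then a second test on tmp inside. */
static enum step step_nested(unsigned long tmp, unsigned long d1,
                             unsigned long d2, unsigned long d3)
{
    if (tmp || d1 || d2 || d3) {
        if (tmp) {
            return STEP_FOUND_MIDDLE;
        }
        return STEP_BREAK;
    }
    return STEP_CONTINUE;
}
```

Since the two are equivalent for all inputs, the compiler is allowed to
emit the same code for both.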
Regarding your other question ("one last thought. would it make sense to
update only `size` in the while loops and compute the `result` at the end
as `orgsize` - `size`?"), again the compiler knows better and might even
do this for you. It will likely drop the p increments and use p[result],
so if you make that change you may even get the same code, only this time
p is incremented and you get an extra subtraction at the end. :)
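A sketch of the two bookkeeping styles side by side, to show they
compute the same value. To keep it short it assumes the scan starts at
offset 0 and that size is a multiple of BITS_PER_LONG, so the partial
first and last words handled by the real find_next_bit() are left out.
The function names and lowest_set_bit() are invented for this example.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Index of the lowest set bit; w must be nonzero (illustrative helper). */
static unsigned long lowest_set_bit(unsigned long w)
{
    unsigned long n = 0;
    while (!(w & 1UL)) {
        w >>= 1;
        n++;
    }
    return n;
}

/* Style A: update both result and size, as the patch does. */
static unsigned long first_bit_two_counters(const unsigned long *p,
                                            unsigned long size)
{
    unsigned long result = 0;
    unsigned long tmp;

    while (size >= 4 * BITS_PER_LONG) {
        if (p[0] || p[1] || p[2] || p[3]) {
            break;
        }
        p += 4;
        result += 4 * BITS_PER_LONG;
        size -= 4 * BITS_PER_LONG;
    }
    while (size >= BITS_PER_LONG) {
        if ((tmp = *(p++))) {
            return result + lowest_set_bit(tmp);
        }
        result += BITS_PER_LONG;
        size -= BITS_PER_LONG;
    }
    return result; /* equals the original size: no bit found */
}

/* Style B: update only size, derive the position as orgsize - size. */
static unsigned long first_bit_one_counter(const unsigned long *p,
                                           unsigned long size)
{
    const unsigned long orgsize = size;
    unsigned long tmp;

    while (size >= 4 * BITS_PER_LONG) {
        if (p[0] || p[1] || p[2] || p[3]) {
            break;
        }
        p += 4;
        size -= 4 * BITS_PER_LONG;
    }
    while (size >= BITS_PER_LONG) {
        if ((tmp = *(p++))) {
            return (orgsize - size) + lowest_set_bit(tmp);
        }
        size -= BITS_PER_LONG;
    }
    return orgsize - size; /* size is 0 here: no bit found */
}
```

The invariant result + size == orgsize holds on every iteration of
style A, which is what makes style B a pure rewrite. Whether either one
is faster is a question for a benchmark, not for reading the source.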
Bottom line: don't try to outsmart an optimizing C compiler on
micro-optimization, unless you have benchmarked it and it shows there is
a problem.
Paolo
> }
> break;
> }
>
> Peter
>