From: Paolo Bonzini <pbonzini@redhat.com>
To: Peter Lieven <pl@dlhnet.de>
Cc: Orit Wasserman <owasserm@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Corentin Chary <corentin.chary@gmail.com>
Subject: Re: [Qemu-devel] [RFC] find_next_bit optimizations
Date: Mon, 11 Mar 2013 16:58:43 +0100
Message-ID: <513DFF33.5080400@redhat.com>
In-Reply-To: <32589117-17AC-4A82-820F-364CA5EEC23E@dlhnet.de>
On 11/03/2013 16:37, Peter Lieven wrote:
>
> On 11.03.2013 at 16:29, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> On 11/03/2013 16:24, Peter Lieven wrote:
>>>
>>>> How would that be different in your patch? But you can solve it by
>>>> making two >= loops, one checking for 4*BITS_PER_LONG and one checking
>>>> BITS_PER_LONG.
>>>
>>> This is what I have now:
>>>
>>> diff --git a/util/bitops.c b/util/bitops.c
>>> index e72237a..b0dc93f 100644
>>> --- a/util/bitops.c
>>> +++ b/util/bitops.c
>>> @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>> const unsigned long *p = addr + BITOP_WORD(offset);
>>> unsigned long result = offset & ~(BITS_PER_LONG-1);
>>> unsigned long tmp;
>>> + unsigned long d0,d1,d2,d3;
>>>
>>> if (offset >= size) {
>>> return size;
>>> }
>>> size -= result;
>>> - offset %= BITS_PER_LONG;
>>> + offset &= (BITS_PER_LONG-1);
>>> if (offset) {
>>> tmp = *(p++);
>>> tmp &= (~0UL << offset);
>>> @@ -42,7 +43,19 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>> size -= BITS_PER_LONG;
>>> result += BITS_PER_LONG;
>>> }
>>> - while (size & ~(BITS_PER_LONG-1)) {
>>> + while (size >= 4*BITS_PER_LONG) {
>>> + d0 = *p;
>>> + d1 = *(p+1);
>>> + d2 = *(p+2);
>>> + d3 = *(p+3);
>>> + if (d0 || d1 || d2 || d3) {
>>> + break;
>>> + }
>>> + p+=4;
>>> + result += 4*BITS_PER_LONG;
>>> + size -= 4*BITS_PER_LONG;
>>> + }
>>> + while (size >= BITS_PER_LONG) {
>>> if ((tmp = *(p++))) {
>>> goto found_middle;
>>> }
>>>
>>
>> Minus the %= vs. &=,
>>
>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>> Perhaps:
>>
>> tmp = *p;
>> d1 = *(p+1);
>> d2 = *(p+2);
>> d3 = *(p+3);
>> if (tmp) {
>> goto found_middle;
>> }
>> if (d1 || d2 || d3) {
>> break;
>> }
>
> I do not know what gcc internally makes of the d0 || d1 || d2 || d3?
It depends on the target and how expensive branches are.
> I would guess it's something like one addition with carry and one test?
It could be either 4 compare-and-jump sequences, or 3 bitwise ORs
followed by a compare-and-jump.
That is, either:

    test %r8, %r8
    jnz second_loop
    test %r9, %r9
    jnz second_loop
    test %r10, %r10
    jnz second_loop
    test %r11, %r11
    jnz second_loop

or

    or %r9, %r8
    or %r11, %r10
    or %r8, %r10
    jnz second_loop

Don't let the length of the code fool you. The processor knows how to
optimize all of these, and GCC knows too.
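
As a rough illustration of the two shapes, here is a minimal C sketch.
The function names (any_set_short_circuit, any_set_bitwise, check_agree)
are made up for this example and are not part of QEMU. Both functions
return the same answer for every input; which instruction sequence the
compiler actually picks for either one depends on the target and the
optimization level, so treat the comments as possibilities, not promises.

```c
#include <assert.h>
#include <stdbool.h>

/* Short-circuit form: the compiler may emit up to four
 * compare-and-jump sequences, one per operand. */
static bool any_set_short_circuit(unsigned long d0, unsigned long d1,
                                  unsigned long d2, unsigned long d3)
{
    return d0 || d1 || d2 || d3;
}

/* Bitwise form: three ORs followed by a single test of the result. */
static bool any_set_bitwise(unsigned long d0, unsigned long d1,
                            unsigned long d2, unsigned long d3)
{
    return (d0 | d1 | d2 | d3) != 0;
}

/* Hypothetical helper: both forms must agree for any four words. */
static void check_agree(unsigned long d0, unsigned long d1,
                        unsigned long d2, unsigned long d3)
{
    assert(any_set_short_circuit(d0, d1, d2, d3) ==
           any_set_bitwise(d0, d1, d2, d3));
}
```

Since the two are semantically identical, the compiler is free to turn
one into the other, which is why the source-level spelling matters less
than it seems.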
> your proposed change would introduce 2 tests (maybe)?
Yes, but I expect them to be fairly well predicted.
> what about this to be sure?
>
> tmp = *p;
> d1 = *(p+1);
> d2 = *(p+2);
> d3 = *(p+3);
> if (tmp || d1 || d2 || d3) {
> if (tmp) {
> goto found_middle;
I suspect that GCC would rewrite it into my version (definitely if it
produces 4 compare-and-jumps; possibly even if it goes for bitwise ORs,
but I haven't checked).
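
For reference, here is a self-contained sketch of how the whole function
might look with the tmp/d1/d2/d3 variant of the unrolled loop. It is
modeled on the patch quoted earlier in this mail, not copied from
util/bitops.c: the sketch_ names, the SKETCH_BITS_PER_LONG macro and the
simple bit-by-bit sketch_lowest_bit() helper are assumptions made for
illustration, and the helper only stands in for whatever the real
found_middle code does.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Index of the lowest set bit in a nonzero word (portable, unoptimized). */
static unsigned long sketch_lowest_bit(unsigned long w)
{
    unsigned long n = 0;

    assert(w != 0);
    while (!(w & 1UL)) {
        w >>= 1;
        n++;
    }
    return n;
}

/* Find the first set bit at or after 'offset' in a bitmap of 'size' bits.
 * Returns 'size' if there is no such bit. */
static unsigned long sketch_find_next_bit(const unsigned long *addr,
                                          unsigned long size,
                                          unsigned long offset)
{
    const unsigned long bpl = SKETCH_BITS_PER_LONG;
    const unsigned long *p;
    unsigned long result, tmp, d1, d2, d3;

    if (offset >= size) {
        return size;
    }
    p = addr + offset / bpl;
    result = offset - offset % bpl;
    size -= result;
    offset %= bpl;

    if (offset) {
        /* Partial first word: mask off the bits below 'offset'. */
        tmp = *p++ & (~0UL << offset);
        if (size < bpl) {
            goto found_first;
        }
        if (tmp) {
            goto found_middle;
        }
        size -= bpl;
        result += bpl;
    }
    /* Unrolled loop: skip four all-zero words per iteration. */
    while (size >= 4 * bpl) {
        tmp = p[0];
        d1 = p[1];
        d2 = p[2];
        d3 = p[3];
        if (tmp) {
            goto found_middle;
        }
        if (d1 || d2 || d3) {
            break; /* let the word-at-a-time loop locate the bit */
        }
        p += 4;
        result += 4 * bpl;
        size -= 4 * bpl;
    }
    while (size >= bpl) {
        tmp = *p++;
        if (tmp) {
            goto found_middle;
        }
        result += bpl;
        size -= bpl;
    }
    if (!size) {
        return result;
    }
    tmp = *p;

found_first:
    /* Partial last word: mask off the bits at or above 'size'. */
    tmp &= ~0UL >> (bpl - size);
    if (tmp == 0UL) {
        return result + size;
    }
found_middle:
    return result + sketch_lowest_bit(tmp);
}
```

Note that when the unrolled loop breaks out, the first word is known to
be zero, so the second loop rereads one word it has already seen. That
is cheap, and it keeps the two loops independent of each other.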
Regarding your other question ("one last thought. would it make sense to
update only `size` in the while loops and compute the `result` at the end
as `orgsize` - `size`?"), again the compiler knows better and might even
do this for you. It will likely drop the p increments and use p[result],
so if you make that change you may even get the same code, only this time
p is incremented and you get an extra subtraction at the end. :)
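
To make the two bookkeeping styles concrete, here is a minimal sketch
that assumes a simplified loop which only skips all-zero words and
reports how many bits it skipped. The names (skip_zero_words_tracking,
skip_zero_words_derived, DEMO_BITS_PER_LONG) are hypothetical and exist
only for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Variant 1: keep 'result' and 'size' in step inside the loop. */
static unsigned long skip_zero_words_tracking(const unsigned long *p,
                                              unsigned long size)
{
    unsigned long result = 0;

    assert(p != NULL);
    while (size >= DEMO_BITS_PER_LONG && *p == 0) {
        p++;
        result += DEMO_BITS_PER_LONG;
        size -= DEMO_BITS_PER_LONG;
    }
    return result;
}

/* Variant 2: update only 'size' and derive the result at the end. */
static unsigned long skip_zero_words_derived(const unsigned long *p,
                                             unsigned long size)
{
    const unsigned long orgsize = size;

    assert(p != NULL);
    while (size >= DEMO_BITS_PER_LONG && *p == 0) {
        p++;
        size -= DEMO_BITS_PER_LONG;
    }
    return orgsize - size;
}
```

Both return the same value for the same input. Whether they also compile
to the same machine code is up to the compiler, and comparing the
generated assembly is the only way to find out.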
Bottom line: don't try to outsmart an optimizing C compiler on
micro-optimizations unless you have benchmarked the code and the
benchmark shows there is a problem.
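
If you do want numbers, a crude timing harness along these lines is one
place to start. This is only a sketch: count_leading_zero_words() and
time_scan() are hypothetical names, clock() measures CPU time at coarse
resolution, and an optimizing compiler may hoist or fold the repeated
call, so check the generated code before trusting the result.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Code under test: count the all-zero words at the start of a buffer. */
static size_t count_leading_zero_words(const unsigned long *p, size_t nwords)
{
    size_t i = 0;

    assert(p != NULL);
    while (i < nwords && p[i] == 0) {
        i++;
    }
    return i;
}

/* Run the scan 'iterations' times and return the CPU seconds used.
 * The accumulated count is stored through 'sink' so the caller can
 * check that the work produced the expected answer. */
static double time_scan(const unsigned long *p, size_t nwords,
                        unsigned int iterations, size_t *sink)
{
    size_t total = 0;
    unsigned int i;
    clock_t start, end;

    start = clock();
    for (i = 0; i < iterations; i++) {
        total += count_leading_zero_words(p, nwords);
    }
    end = clock();

    *sink = total;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Time the old and the new loop on the same buffer, with a realistic
bitmap size and enough iterations to get well above the clock
resolution, and compare the two figures.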
Paolo
> }
> break;
> }
>
> Peter
>