From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:34199)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pl@dlhnet.de>) id 1UF7Ki-0006eb-85
	for qemu-devel@nongnu.org; Mon, 11 Mar 2013 14:20:27 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pl@dlhnet.de>) id 1UF7Kd-0004sm-Oe
	for qemu-devel@nongnu.org; Mon, 11 Mar 2013 14:20:24 -0400
Received: from ssl.dlhnet.de ([91.198.192.8]:38099 helo=ssl.dlh.net)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <pl@dlhnet.de>)
	id 1UF7Kd-0004sb-Ie
	for qemu-devel@nongnu.org; Mon, 11 Mar 2013 14:20:19 -0400
Message-ID: <513E2062.6030102@dlhnet.de>
Date: Mon, 11 Mar 2013 19:20:18 +0100
From: Peter Lieven <pl@dlhnet.de>
MIME-Version: 1.0
References: <513DDFA3.1020308@dlhnet.de> <513DE6D6.9000105@redhat.com>
	<FE8F3AB7-45F9-4B76-B2D4-46179AAD08FE@dlhnet.de>
	<513DEBCF.9050407@redhat.com>
	<E594D4E5-EDA7-4277-9910-D498E9E1E66A@dlhnet.de>
	<513DF854.80003@redhat.com>
	<32589117-17AC-4A82-820F-364CA5EEC23E@dlhnet.de>
	<513DFF33.5080400@redhat.com>
	<CAN05THTp7aPeuN7EJc+E03PKmhYZVvWH9j3qgFu5_c9Yz0nYnQ@mail.gmail.com>
	<513E0F4C.8080202@redhat.com>
In-Reply-To: <513E0F4C.8080202@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC] find_next_bit optimizations
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Orit Wasserman <owasserm@redhat.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, ronnie sahlberg <ronniesahlberg@gmail.com>, Corentin Chary <corentin.chary@gmail.com>

Am 11.03.2013 18:07, schrieb Paolo Bonzini:
> Il 11/03/2013 18:06, ronnie sahlberg ha scritto:
>> Even more efficient might be to do bitwise instead of logical or
>>
>>>>        if (tmp | d1 | d2 | d3) {
>> that should remove 3 of the 4 conditional jumps
>> and should become 3 bitwise ors and one conditional jump
> 
> Without any serious profiling, please let the compiler do that.

Paolo is right, i ran some tests with gcc 4.6.3 on x86_64 (with -O3) and tried the
various ideas. They all made no significant difference. Even unrolling to 8 unsigned
longs didn't change anything.

What I tried is running 1^20 interations of find_next_bit(bitfield,4194304,0);
I choosed the bitfield to be 4MByte which equals a 16GB VM. The bitfield was
complete zeroed so find_next_bit had to run completely through the bitfield.

The original version took 1 minute and 10 seconds whereas all other took
approx. 37-38 seconds which is almost a 100% boost ;-)

So I think this here is the final version:

    while (size >= 4*BITS_PER_LONG) {
        unsigned long d1, d2, d3;
        tmp = *p;
        d1 = *(p+1);
        d2 = *(p+2);
        d3 = *(p+3);
        if (tmp) {
            goto found_middle;
        }        
        if (d1 || d2 || d3) {
            break;
        }
        p += 4;
        result += 4*BITS_PER_LONG;
        size -= 4*BITS_PER_LONG;
    }

Peter