From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:34199) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UF7Ki-0006eb-85 for qemu-devel@nongnu.org; Mon, 11 Mar 2013 14:20:27 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1UF7Kd-0004sm-Oe for qemu-devel@nongnu.org; Mon, 11 Mar 2013 14:20:24 -0400 Received: from ssl.dlhnet.de ([91.198.192.8]:38099 helo=ssl.dlh.net) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UF7Kd-0004sb-Ie for qemu-devel@nongnu.org; Mon, 11 Mar 2013 14:20:19 -0400 Message-ID: <513E2062.6030102@dlhnet.de> Date: Mon, 11 Mar 2013 19:20:18 +0100 From: Peter Lieven MIME-Version: 1.0 References: <513DDFA3.1020308@dlhnet.de> <513DE6D6.9000105@redhat.com> <513DEBCF.9050407@redhat.com> <513DF854.80003@redhat.com> <32589117-17AC-4A82-820F-364CA5EEC23E@dlhnet.de> <513DFF33.5080400@redhat.com> <513E0F4C.8080202@redhat.com> In-Reply-To: <513E0F4C.8080202@redhat.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [RFC] find_next_bit optimizations List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini Cc: Orit Wasserman , "qemu-devel@nongnu.org" , ronnie sahlberg , Corentin Chary Am 11.03.2013 18:07, schrieb Paolo Bonzini: > Il 11/03/2013 18:06, ronnie sahlberg ha scritto: >> Even more efficient might be to do bitwise instead of logical or >> >>>> if (tmp | d1 | d2 | d3) { >> that should remove 3 of the 4 conditional jumps >> and should become 3 bitwise ors and one conditional jump > > Without any serious profiling, please let the compiler do that. Paolo is right, i ran some tests with gcc 4.6.3 on x86_64 (with -O3) and tried the various ideas. They all made no significant difference. Even unrolling to 8 unsigned longs didn't change anything. What I tried is running 1^20 interations of find_next_bit(bitfield,4194304,0); I choosed the bitfield to be 4MByte which equals a 16GB VM. The bitfield was complete zeroed so find_next_bit had to run completely through the bitfield. The original version took 1 minute and 10 seconds whereas all other took approx. 37-38 seconds which is almost a 100% boost ;-) So I think this here is the final version: while (size >= 4*BITS_PER_LONG) { unsigned long d1, d2, d3; tmp = *p; d1 = *(p+1); d2 = *(p+2); d3 = *(p+3); if (tmp) { goto found_middle; } if (d1 || d2 || d3) { break; } p += 4; result += 4*BITS_PER_LONG; size -= 4*BITS_PER_LONG; } Peter