Message-ID: <513DDFA3.1020308@dlhnet.de>
Date: Mon, 11 Mar 2013 14:44:03 +0100
From: Peter Lieven
Subject: [Qemu-devel] [RFC] find_next_bit optimizations
To: "qemu-devel@nongnu.org"
Cc: Orit Wasserman, Corentin Chary, Paolo Bonzini

Hi,

I have long had a few VMs that are very hard to migrate because of heavy
memory I/O. I found that finding the next dirty bit seems to be one of the
culprits (apart from the locking, whose removal Paolo is working on).

I have the following proposal, which seems to help a lot in my case. I just
wanted to get some feedback. I applied the same unrolling idea as in
buffer_is_zero().

Peter

--- a/util/bitops.c
+++ b/util/bitops.c
@@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
     const unsigned long *p = addr + BITOP_WORD(offset);
     unsigned long result = offset & ~(BITS_PER_LONG-1);
     unsigned long tmp;
+    unsigned long d0,d1,d2,d3;
 
     if (offset >= size) {
         return size;
     }
     size -= result;
-    offset %= BITS_PER_LONG;
+    offset &= (BITS_PER_LONG-1);
     if (offset) {
         tmp = *(p++);
         tmp &= (~0UL << offset);
@@ -43,6 +44,18 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
         result += BITS_PER_LONG;
     }
     while (size & ~(BITS_PER_LONG-1)) {
+        while (!(size & (4*BITS_PER_LONG-1))) {
+            d0 = *p;
+            d1 = *(p+1);
+            d2 = *(p+2);
+            d3 = *(p+3);
+            if (d0 || d1 || d2 || d3) {
+                break;
+            }
+            p+=4;
+            result += 4*BITS_PER_LONG;
+            size -= 4*BITS_PER_LONG;
+        }
         if ((tmp = *(p++))) {
             goto found_middle;
         }
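
For anyone who wants to try the unrolling idea outside the QEMU tree, here is a minimal standalone sketch of a bit search that skips four all-zero words per iteration. It is not the code from util/bitops.c: the names find_next_bit_sketch and lowest_set_bit are hypothetical, BITS_PER_LONG is defined locally, and the lowest set bit is found with a plain loop. The unrolled loop is guarded with size >= 4*BITS_PER_LONG, which is an assumption that differs from the condition in the diff above; it is chosen so the loop stops as soon as fewer than four full words remain.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical helper: index of the lowest set bit of a nonzero word. */
static unsigned long lowest_set_bit(unsigned long w)
{
    unsigned long n = 0;

    assert(w != 0);
    while (!(w & 1UL)) {
        w >>= 1;
        n++;
    }
    return n;
}

/*
 * Hypothetical sketch: return the index of the first set bit at or after
 * 'offset' in a bitmap of 'size' bits, or 'size' if there is none.
 */
static unsigned long find_next_bit_sketch(const unsigned long *addr,
                                          unsigned long size,
                                          unsigned long offset)
{
    const unsigned long *p;
    unsigned long result, tmp;

    if (offset >= size) {
        return size;
    }
    p = addr + offset / BITS_PER_LONG;
    result = offset & ~(BITS_PER_LONG - 1);
    size -= result;
    offset &= BITS_PER_LONG - 1;

    if (offset) {
        /* Partial first word: mask off the bits below 'offset'. */
        tmp = *p++ & (~0UL << offset);
        if (size < BITS_PER_LONG) {
            goto found_first;
        }
        if (tmp) {
            goto found_middle;
        }
        size -= BITS_PER_LONG;
        result += BITS_PER_LONG;
    }
    /* Unrolled scan: skip four all-zero words per iteration. */
    while (size >= 4 * BITS_PER_LONG) {
        if (p[0] | p[1] | p[2] | p[3]) {
            break;
        }
        p += 4;
        result += 4 * BITS_PER_LONG;
        size -= 4 * BITS_PER_LONG;
    }
    /* Word-at-a-time scan over the remaining full words. */
    while (size >= BITS_PER_LONG) {
        tmp = *p++;
        if (tmp) {
            goto found_middle;
        }
        result += BITS_PER_LONG;
        size -= BITS_PER_LONG;
    }
    if (!size) {
        return result;
    }
    tmp = *p;
found_first:
    /* Ignore bits past the end of the bitmap (0 < size < BITS_PER_LONG). */
    tmp &= ~0UL >> (BITS_PER_LONG - size);
    if (!tmp) {
        return result + size;
    }
found_middle:
    return result + lowest_set_bit(tmp);
}
```

A caller would pass the bitmap, its length in bits, and the bit index to start from; a return value equal to the length means no set bit was found. When the unrolled loop breaks on a nonzero group, the word-at-a-time loop that follows locates the exact word.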