From mboxrd@z Thu Jan 1 00:00:00 1970
From: OHMURA Kei
Subject: Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
Date: Fri, 12 Feb 2010 11:03:54 +0900
Message-ID: <4B74B70A.4030805@lab.ntt.co.jp>
References: <4B728FF9.6010707@lab.ntt.co.jp> <4B72B28E.6010801@redhat.com> <4B72D706.3070602@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Cc: drepper@redhat.com, mtosatti@redhat.com, Yoshiaki Tamura , ohmura.kei@lab.ntt.co.jp
To: Anthony Liguori , Avi Kivity , "qemu-devel@nongnu.org" , "kvm@vger.kernel.org"
Return-path:
Received: from tama500.ecl.ntt.co.jp ([129.60.39.148]:49248 "EHLO tama500.ecl.ntt.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753212Ab0BLCEP (ORCPT ); Thu, 11 Feb 2010 21:04:15 -0500
In-Reply-To: <4B72D706.3070602@codemonkey.ws>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 02/11/2010 Anthony Liguori wrote:
> Oh, I see what's happening here. Yes, I think a leul_to_cpu() makes more
> sense.

Maybe I'm missing something here.
I couldn't find leul_to_cpu(), so have defined it in bswap.h.
Correct?

--- a/bswap.h
+++ b/bswap.h
@@ -205,8 +205,10 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 #ifdef HOST_WORDS_BIGENDIAN
 #define cpu_to_32wu cpu_to_be32wu
+#define leul_to_cpu(v) le ## HOST_LONG_BITS ## _to_cpu(v)
 #else
 #define cpu_to_32wu cpu_to_le32wu
+#define leul_to_cpu(v) (v)
 #endif

On 02/10/2010 Ulrich Drepper wrote:
> If you're optimizing this code you might want to do it all. The
> compiler might not see through the bswap call and create unnecessary
> data dependencies. Especially problematic if the bitmap is really
> sparse. Also, the outer test is != while the inner test is >. Be
> consistent. I suggest to replace the inner loop with
>
>   do {
>     ...
>   } while (c != 0);
>
> Depending on how sparse the bitmap is populated this might reduce the
> number of data dependencies quite a bit.

Combining all comments, the code would be like this.

    if (bitmap_ul[i] != 0) {
        c = leul_to_cpu(bitmap_ul[i]);
        do {
            j = ffsl(c) - 1;
            c &= ~(1ul << j);
            page_number = i * HOST_LONG_BITS + j;
            addr1 = page_number * TARGET_PAGE_SIZE;
            addr = offset + addr1;
            ram_addr = cpu_get_physical_page_desc(addr);
            cpu_physical_memory_set_dirty(ram_addr);
        } while (c != 0);
    }
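
For anyone who wants to try the loop shape outside of QEMU, here is a minimal standalone sketch. It is only an illustration under a few assumptions: collect_set_bits() is a hypothetical helper and not part of the patch, the GCC/Clang builtin __builtin_ffsl() stands in for ffsl(), the bitmap words are taken to be in host byte order already (so the leul_to_cpu() step is left out), and the set-bit index is stored in an array instead of being passed on to cpu_physical_memory_set_dirty().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of QEMU): walk a bitmap word by word and
 * record the index of every set bit. Writes at most out_cap indexes to
 * out[] and returns the total number of set bits found.
 */
static size_t collect_set_bits(const unsigned long *bitmap, size_t nwords,
                               unsigned long *out, size_t out_cap)
{
    const size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
    size_t n = 0;

    assert(bitmap != NULL && out != NULL);

    for (size_t i = 0; i < nwords; i++) {
        unsigned long c = bitmap[i];    /* assumed host-endian already */

        if (c == 0) {
            continue;                   /* skip clean words entirely */
        }
        do {
            /* ffs-style builtins are 1-based, so subtract 1. */
            int j = __builtin_ffsl((long)c) - 1;

            c &= ~(1ul << j);           /* clear the bit just handled */
            if (n < out_cap) {
                out[n] = i * bits_per_long + (unsigned long)j;
            }
            n++;
        } while (c != 0);
    }
    return n;
}
```

The outer test and the inner test both compare against zero, and the inner loop runs once per set bit rather than once per bit position, so a sparse word costs only as many iterations as it has dirty pages.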