From mboxrd@z Thu Jan  1 00:00:00 1970
From: OHMURA Kei
Subject: Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
Date: Thu, 18 Feb 2010 14:57:47 +0900
Message-ID: <4B7CD6DB.60908@lab.ntt.co.jp>
References: <4B728FF9.6010707@lab.ntt.co.jp> <4B72B28E.6010801@redhat.com>
 <4B72D706.3070602@codemonkey.ws> <4B74B70A.4030805@lab.ntt.co.jp>
 <4B77EDC2.7000401@redhat.com> <4B78E5C5.80802@lab.ntt.co.jp>
 <247526C9-7810-4F4D-AE3D-C1A774FF6FFB@suse.de> <4B7A7E72.6060305@lab.ntt.co.jp>
 <2AB041C1-C6BC-41DD-B574-308B994C2B2B@suse.de> <4B7BBA1D.2060703@lab.ntt.co.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Avi Kivity, "qemu-devel@nongnu.org", "kvm@vger.kernel.org",
 mtosatti@redhat.com, drepper@redhat.com, Yoshiaki Tamura,
 ohmura.kei@lab.ntt.co.jp
To: Alexander Graf
Return-path:
Received: from tama500.ecl.ntt.co.jp ([129.60.39.148]:54007 "EHLO
 tama500.ecl.ntt.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with
 ESMTP id S1751371Ab0BRF6J (ORCPT); Thu, 18 Feb 2010 00:58:09 -0500
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

>>>>> "We think"?  I mean - yes, I think so too.  But have you actually
>>>>> measured it?  How much improvement are we talking here?
>>>>> Is it still faster when a bswap is involved?
>>>>
>>>> Thanks for pointing that out.
>>>> I will post the data for x86 later.
>>>> However, I don't have a test environment to check the impact of bswap.
>>>> Would you please measure the run time of the following section if possible?
>>>
>>> It'd make more sense to have a real stand-alone test program, no?
>>> I can try to write one today, but I have some really nasty important
>>> bugs to fix first.
>>
>> OK.  I will prepare a test program with sample data.  Since I found a ppc
>> machine around, I will run the code and post the results for x86 and ppc.
>>
>>
>> By the way, the following data is a result for x86 measured in QEMU/KVM.
>> This data shows how many times the function is called (#called), the
>> runtime of the original function (orig.), the runtime with this patch
>> (patch), and the speedup ratio (ratio).
>
> That does indeed look promising!
>
> Thanks for doing this micro-benchmark.  I just want to be 100% sure that
> it doesn't affect performance for big endian badly.

I measured the runtime of the test code with sample data.  My test
environment and results are described below.

x86 test environment:
CPU: 4x Intel Xeon Quad Core 2.66GHz
Mem size: 6GB

ppc test environment:
CPU: 2x Dual Core PPC970MP
Mem size: 2GB

The sample data of the dirty bitmap was produced by QEMU/KVM while the
guest OS was live migrating.  To measure the runtime I copied
cpu_get_real_ticks() from QEMU to my test program.

Experimental results:

Test1: Guest OS read a 3GB file, which is bigger than memory.
     orig.(msec)  patch(msec)  ratio
x86          0.3          0.1    6.4
ppc          7.9          2.7    3.0

Test2: Guest OS read/wrote a 3GB file, which is bigger than memory.
     orig.(msec)  patch(msec)  ratio
x86         12.0          3.2    3.7
ppc        251.1        123.0    2.0

I also measured the runtime of bswap itself on ppc, and found it was only
0.3% ~ 0.7% of the runtimes described above.