Date: Thu, 18 Feb 2010 14:57:47 +0900
From: OHMURA Kei
To: Alexander Graf
Cc: ohmura.kei@lab.ntt.co.jp, kvm@vger.kernel.org, mtosatti@redhat.com,
    Yoshiaki Tamura, qemu-devel@nongnu.org, Avi Kivity, drepper@redhat.com
Subject: Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the
    dirty-bitmap-traveling
Message-ID: <4B7CD6DB.60908@lab.ntt.co.jp>

>>>>> "We think"? I mean - yes, I think so too. But have you actually
>>>>> measured it? How much improvement are we talking here?
>>>>> Is it still faster when a bswap is involved?
>>>>
>>>> Thanks for pointing that out. I will post the data for x86 later.
>>>> However, I don't have a test environment to check the impact of
>>>> bswap. Would you please measure the runtime of the following
>>>> section, if possible?
>>>
>>> It'd make more sense to have a real stand-alone test program, no?
>>> I can try to write one today, but I have some really nasty
>>> important bugs to fix first.
>>
>> OK. I will prepare test code with sample data. Since I found a ppc
>> machine nearby, I will run the code there and post the results for
>> both x86 and ppc.
>>
>> By the way, the following data is a result on x86 measured in
>> QEMU/KVM. It shows how many times the function is called (#called),
>> the runtime of the original function (orig.), the runtime with this
>> patch (patch), and the speedup ratio (ratio).
>
> That does indeed look promising!
>
> Thanks for doing this micro-benchmark. I just want to be 100% sure
> that it doesn't hurt performance badly on big endian.

I measured the runtime of the test code with the sample data. My test
environments and results are described below.

x86 test environment:
  CPU: 4x Intel Xeon Quad Core 2.66GHz
  Mem size: 6GB

ppc test environment:
  CPU: 2x Dual Core PPC970MP
  Mem size: 2GB

The sample dirty-bitmap data was produced by QEMU/KVM while the guest
OS was live-migrating. To measure the runtime, I copied
cpu_get_real_ticks() from QEMU into my test program.
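Roughly, the harness looks like the following simplified sketch (x86
flavor; the real test program copies QEMU's cpu_get_real_ticks(), and
the rdtsc details here are an illustration of that approach):

#include <stdint.h>
#include <stdio.h>

/* Cycle-counter read in the spirit of QEMU's cpu_get_real_ticks()
 * on x86: read the time-stamp counter via rdtsc. */
static inline int64_t cpu_get_real_ticks(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((int64_t)hi << 32) | lo;
}

int main(void)
{
    int64_t t0, t1;

    t0 = cpu_get_real_ticks();
    /* ... travel the sample dirty bitmap here ... */
    t1 = cpu_get_real_ticks();

    printf("elapsed ticks: %lld\n", (long long)(t1 - t0));
    return 0;
}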
Experimental results:

Test1: Guest OS reads a 3GB file, which is bigger than memory.

       orig. (msec)  patch (msec)  ratio
  x86           0.3           0.1    6.4
  ppc           7.9           2.7    3.0

Test2: Guest OS reads/writes a 3GB file, which is bigger than memory.

       orig. (msec)  patch (msec)  ratio
  x86          12.0           3.2    3.7
  ppc         251.1         123      2.0

I also measured the runtime of bswap itself on ppc and found it was
only 0.3%-0.7% of the runtimes above.
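For readers following along, the technique being measured works
roughly like the sketch below: the bitmap is traveled one unsigned
long at a time, clean words are skipped wholesale, and on big-endian
hosts each word is byte-swapped first (the bswap cost quoted above).
This is a simplified illustration, not the patch itself;
process_dirty_page() is a hypothetical stand-in for the
migration-side work, and the swap assumes a 64-bit unsigned long.

#include <stdint.h>
#include <stdio.h>

#define TARGET_PAGE_BITS 12
#define BITS_PER_LONG (sizeof(unsigned long) * 8)

/* Hypothetical per-page hook; stands in for whatever the
 * migration code does with each dirty page. */
static void process_dirty_page(uint64_t addr)
{
    printf("dirty page at 0x%llx\n", (unsigned long long)addr);
}

/* Bitmap words from KVM are little-endian, so big-endian hosts
 * must swap each word before scanning its bits. */
static inline unsigned long leul_to_cpu(unsigned long v)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return __builtin_bswap64(v);   /* assumes 64-bit long */
#else
    return v;
#endif
}

static void travel_bitmap(const unsigned long *bitmap,
                          unsigned long nr_words, uint64_t base)
{
    unsigned long i, word;

    for (i = 0; i < nr_words; i++) {
        if (bitmap[i] == 0) {
            continue;                 /* skip a clean word wholesale */
        }
        word = leul_to_cpu(bitmap[i]);
        while (word != 0) {
            int bit = __builtin_ctzl(word);       /* lowest set bit */
            process_dirty_page(base +
                (((uint64_t)i * BITS_PER_LONG + bit)
                 << TARGET_PAGE_BITS));
            word &= word - 1;                     /* clear that bit */
        }
    }
}

int main(void)
{
    /* pages 0 and 2 dirty in the first word; second word clean */
    unsigned long sample[2] = { 0x5UL, 0x0UL };

    travel_bitmap(sample, 2, 0);
    return 0;
}

Skipping zero words is where most of the win comes from: when the
guest dirties few pages, the inner per-bit loop almost never runs,
and the per-word bswap on big-endian hosts is paid only once per
word rather than per bit.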