Date: Thu, 18 Feb 2010 14:57:47 +0900
From: OHMURA Kei
To: Alexander Graf
Cc: ohmura.kei@lab.ntt.co.jp, kvm@vger.kernel.org, mtosatti@redhat.com,
    Yoshiaki Tamura, qemu-devel@nongnu.org, Avi Kivity, drepper@redhat.com
Subject: Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the
    dirty-bitmap-traveling
Message-ID: <4B7CD6DB.60908@lab.ntt.co.jp>

>>>>> "We think"? I mean - yes, I think so too. But have you actually
>>>>> measured it? How much improvement are we talking here?
>>>>> Is it still faster when a bswap is involved?
>>>>
>>>> Thanks for pointing that out. I will post the data for x86 later.
>>>> However, I don't have a test environment to check the impact of
>>>> bswap. Would you please measure the runtime of the following
>>>> section, if possible?
>>>
>>> It'd make more sense to have a real stand-alone test program, no?
>>> I can try to write one today, but I have some really nasty
>>> important bugs to fix first.
>>
>> OK. I will prepare test code with sample data. Since I found a ppc
>> machine nearby, I will run the code there and post the results for
>> both x86 and ppc.
>>
>> By the way, the following data is a result on x86 measured in
>> QEMU/KVM. It shows how many times the function is called (#called),
>> the runtime of the original function (orig.), the runtime with this
>> patch (patch), and the speedup ratio (ratio).
>
> That does indeed look promising!
>
> Thanks for doing this micro-benchmark. I just want to be 100% sure
> that it doesn't hurt performance badly on big endian.

I measured the runtime of the test code with the sample data. My test
environments and results are described below.

x86 test environment:
  CPU: 4x Intel Xeon Quad Core 2.66GHz
  Mem size: 6GB

ppc test environment:
  CPU: 2x Dual Core PPC970MP
  Mem size: 2GB

The sample dirty-bitmap data was produced by QEMU/KVM while the guest
OS was live-migrating. To measure the runtime, I copied
cpu_get_real_ticks() from QEMU into my test program.
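Roughly, the harness looks like the following simplified sketch (x86
flavor; the real test program copies QEMU's cpu_get_real_ticks(), and
the rdtsc details here are an illustration of that approach):

#include <stdint.h>
#include <stdio.h>

/* Cycle-counter read in the spirit of QEMU's cpu_get_real_ticks()
 * on x86: read the time-stamp counter via rdtsc. */
static inline int64_t cpu_get_real_ticks(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((int64_t)hi << 32) | lo;
}

int main(void)
{
    int64_t t0, t1;

    t0 = cpu_get_real_ticks();
    /* ... travel the sample dirty bitmap here ... */
    t1 = cpu_get_real_ticks();

    printf("elapsed ticks: %lld\n", (long long)(t1 - t0));
    return 0;
}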
Experimental results:

Test1: Guest OS reads a 3GB file, which is bigger than memory.

       orig. (msec)  patch (msec)  ratio
  x86           0.3           0.1    6.4
  ppc           7.9           2.7    3.0

Test2: Guest OS reads/writes a 3GB file, which is bigger than memory.

       orig. (msec)  patch (msec)  ratio
  x86          12.0           3.2    3.7
  ppc         251.1         123      2.0

I also measured the runtime of bswap itself on ppc and found it was
only 0.3%-0.7% of the runtimes above.
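For readers following along, the technique being measured works
roughly like the sketch below: the bitmap is traveled one unsigned
long at a time, clean words are skipped wholesale, and on big-endian
hosts each word is byte-swapped first (the bswap cost quoted above).
This is a simplified illustration, not the patch itself;
process_dirty_page() is a hypothetical stand-in for the
migration-side work, and the swap assumes a 64-bit unsigned long.

#include <stdint.h>
#include <stdio.h>

#define TARGET_PAGE_BITS 12
#define BITS_PER_LONG (sizeof(unsigned long) * 8)

/* Hypothetical per-page hook; stands in for whatever the
 * migration code does with each dirty page. */
static void process_dirty_page(uint64_t addr)
{
    printf("dirty page at 0x%llx\n", (unsigned long long)addr);
}

/* Bitmap words from KVM are little-endian, so big-endian hosts
 * must swap each word before scanning its bits. */
static inline unsigned long leul_to_cpu(unsigned long v)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return __builtin_bswap64(v);   /* assumes 64-bit long */
#else
    return v;
#endif
}

static void travel_bitmap(const unsigned long *bitmap,
                          unsigned long nr_words, uint64_t base)
{
    unsigned long i, word;

    for (i = 0; i < nr_words; i++) {
        if (bitmap[i] == 0) {
            continue;                 /* skip a clean word wholesale */
        }
        word = leul_to_cpu(bitmap[i]);
        while (word != 0) {
            int bit = __builtin_ctzl(word);       /* lowest set bit */
            process_dirty_page(base +
                (((uint64_t)i * BITS_PER_LONG + bit)
                 << TARGET_PAGE_BITS));
            word &= word - 1;                     /* clear that bit */
        }
    }
}

int main(void)
{
    /* pages 0 and 2 dirty in the first word; second word clean */
    unsigned long sample[2] = { 0x5UL, 0x0UL };

    travel_bitmap(sample, 2, 0);
    return 0;
}

Skipping zero words is where most of the win comes from: when the
guest dirties few pages, the inner per-bit loop almost never runs,
and the per-word bswap on big-endian hosts is paid only once per
word rather than per bit.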