From: Yoshiaki Tamura
Date: Tue, 4 May 2010 00:36:14 +0900
Subject: [Qemu-devel] Re: [RFC PATCH 05/20] Introduce put_vector() and get_vector to QEMUFile and qemu_fopen_ops().
To: Anthony Liguori
Cc: ohmura.kei@lab.ntt.co.jp, kvm@vger.kernel.org, mtosatti@redhat.com, Anthony Liguori, qemu-devel@nongnu.org, yoshikawa.takuya@oss.ntt.co.jp, Avi Kivity
In-Reply-To: <4BDEBC09.5020501@linux.vnet.ibm.com>
List-Id: qemu-devel.nongnu.org

2010/5/3 Anthony Liguori:
> On 05/03/2010 04:32 AM, Yoshiaki Tamura wrote:
>>
>> 2010/4/23 Avi Kivity:
>>>
>>> On 04/23/2010 04:22 PM, Anthony Liguori wrote:
>>>
>>>>> I currently don't have data, but I'll prepare it.
>>>>> There were two things I wanted to avoid.
>>>>>
>>>>> 1. Pages being copied to the QEMUFile buf through qemu_put_buffer.
>>>>> 2. Calling write() every time, even when we want to send multiple
>>>>> pages at once.
>>>>>
>>>>> I think 2 may be negligible.
>>>>> But 1 seems to be problematic if we want to make the latency as
>>>>> small as possible, no?
>>>>
>>>> Copying often has strange CPU characteristics depending on whether
>>>> the data is already in cache.  It's better to drive these sorts of
>>>> optimizations through performance measurement, because changes are
>>>> not always obvious.
>>>
>>> Copying always introduces more cache pollution, so even if the data
>>> is in the cache, it is worthwhile (not disagreeing with the need to
>>> measure).
>>
>> Anthony,
>>
>> I measured how long it takes to send all guest pages during migration,
>> and I would like to share the results in this message.  For
>> convenience, I modified the code to do a plain migration rather than a
>> "live migration", which means the buffered file is not used here.
>>
>> In summary, the performance improvement from using writev instead of
>> write/send over GbE seems to be negligible; however, when the
>> underlying network was fast (InfiniBand with IPoIB in this case),
>> writev performed 17% faster than write/send, so it may be worthwhile
>> to introduce vectors.
>>
>> Since QEMU compresses pages, I copied a junk file to tmpfs to dirty
>> pages, so that QEMU would transfer a fair number of pages.  After
>> setting up the guest, I used cpu_get_real_ticks() to measure the time
>> spent in the while loop that calls ram_save_block() in ram_save_live().
>> I removed the qemu_file_rate_limit() call to disable the buffered-file
>> rate limiting, so all of the pages are transferred in the first round.
>>
>> I measured 10 times for each case and took the average and standard
>> deviation.  Considering the results, I think the number of trials was
>> enough.  In addition to the elapsed time, the number of writev/write
>> calls and the number of pages that were compressed (dup) / not
>> compressed (nodup) are shown.
>>
>> Test Environment:
>> CPU: 2x Intel Xeon Dual Core 3GHz
>> Mem size: 6GB
>> Network: GbE, InfiniBand (IPoIB)
>>
>> Host OS: Fedora 11 (kernel 2.6.34-rc1)
>> Guest OS: Fedora 11 (kernel 2.6.33)
>> Guest Mem size: 512MB
>>
>> * GbE writev
>> time (sec): 35.732 (std 0.002)
>> write count: 4 (std 0)
>> writev count: 8269 (std 1)
>> dup count: 36157 (std 124)
>> nodup count: 1016808 (std 147)
>>
>> * GbE write
>> time (sec): 35.780 (std 0.164)
>> write count: 127367 (std 21)
>> writev count: 0 (std 0)
>> dup count: 36134 (std 108)
>> nodup count: 1016853 (std 165)
>>
>> * IPoIB writev
>> time (sec): 13.889 (std 0.155)
>> write count: 4 (std 0)
>> writev count: 8267 (std 1)
>> dup count: 36147 (std 105)
>> nodup count: 1016838 (std 111)
>>
>> * IPoIB write
>> time (sec): 16.777 (std 0.239)
>> write count: 127364 (std 24)
>> writev count: 0 (std 0)
>> dup count: 36173 (std 169)
>> nodup count: 1016840 (std 190)
>>
>> Although the improvement wasn't obvious when the network was GbE,
>> introducing writev may be worthwhile when we focus on faster networks
>> like InfiniBand/10GE.
>>
>> I agree with separating this optimization from the main logic of
>> Kemari, since this modification must be made widely and carefully at
>> the same time.
>
> Okay.  It looks like it's clear that it's a win, so let's split it out
> of the main series and we'll treat it separately.  I imagine we'll see
> even more positive results on 10 gbit, particularly if we move
> migration out into a separate thread.

Great!  I also wanted to test with 10GE, but I'm physically away from my
office now and can't set up the test environment.  I'll measure the
numbers with 10GE next week.

BTW, I was thinking of writing a patch to use separate threads for both
the sender and receiver sides of migration.  Kemari especially needs a
separate receiver thread, so that the monitor can accept commands from
other HA tools.  Is someone already working on this?  If not, I'll add
it to my task list :-)

Thanks,

Yoshi

>
> Regards,
>
> Anthony Liguori
>
>> Thanks,
>>
>> Yoshi
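The optimization measured above replaces the per-page copy into the
QEMUFile buffer and the per-buffer write() with a gather list that is
flushed by a single writev().  Below is a minimal sketch of that
gather-and-flush pattern; the names (VecWriter, vec_put_page, vec_flush)
and the 64-entry vector size are illustrative assumptions, not the
put_vector()/get_vector() API from the actual patch.

```c
#include <stddef.h>
#include <sys/uio.h>
#include <unistd.h>

#define VEC_MAX 64                  /* illustrative: flush after 64 pages */

typedef struct {
    int fd;                         /* migration socket */
    struct iovec iov[VEC_MAX];      /* pointers into guest RAM, no copies */
    int cnt;
} VecWriter;

/* Flush every queued page with a single writev(), handling short writes. */
static int vec_flush(VecWriter *w)
{
    struct iovec *iov = w->iov;
    int cnt = w->cnt;

    while (cnt > 0) {
        ssize_t n = writev(w->fd, iov, cnt);
        if (n < 0) {
            return -1;              /* real code would retry on EINTR/EAGAIN */
        }
        /* Skip iovecs that were fully written, trim a partially written one. */
        while (cnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            cnt--;
        }
        if (cnt > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    w->cnt = 0;
    return 0;
}

/* Queue one guest page without copying it into an intermediate buffer. */
static int vec_put_page(VecWriter *w, void *page, size_t len)
{
    w->iov[w->cnt].iov_base = page;
    w->iov[w->cnt].iov_len  = len;
    if (++w->cnt == VEC_MAX) {
        return vec_flush(w);
    }
    return 0;
}
```

For scale, the counts in the measurements above work out to about 123
pages per writev call versus about 8 pages per write call on the
non-vectored path, the latter presumably corresponding to one QEMUFile
buffer flush per write().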
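The timing method in the message brackets the page-sending loop with
cpu_get_real_ticks() reads.  The self-contained sketch below reproduces
that shape, but swaps in clock_gettime(CLOCK_MONOTONIC) and a stub in
place of ram_save_block(), since the real function lives inside QEMU;
both substitutions are assumptions for illustration only.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Stand-in for QEMU's ram_save_block(): returns nonzero while there are
 * still pages to send.  Here it is only a stub so the sketch compiles. */
static int ram_save_block_stub(void)
{
    static int remaining = 3;
    return remaining-- > 0;
}

/* Monotonic nanoseconds, used instead of cpu_get_real_ticks(). */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
    int64_t start = now_ns();

    /* With qemu_file_rate_limit() removed, the loop runs until every page
     * has been sent in the first round, as described in the message. */
    while (ram_save_block_stub()) {
        ;
    }

    printf("elapsed: %.6f sec\n", (now_ns() - start) / 1e9);
    return 0;
}
```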