From: Lei Li
Date: Thu, 21 Nov 2013 16:45:56 +0800
Subject: Re: [Qemu-devel] [PATCH 0/17 v2] Localhost migration with side channel for ram
Message-ID: <528DC844.7020100@linux.vnet.ibm.com>
In-Reply-To: <526A62E7.9080201@linux.vnet.ibm.com>
To: Paolo Bonzini
Cc: aarcange@redhat.com, quintela@redhat.com, mdroth@linux.vnet.ibm.com, mrhines@linux.vnet.ibm.com, qemu-devel@nongnu.org, Anthony Liguori, lagarcia@br.ibm.com, rcj@linux.vnet.ibm.com

On 10/25/2013 08:24 PM, Lei Li wrote:
> On 10/25/2013 03:30 PM, Paolo Bonzini wrote:
>> On 25/10/2013 06:58, Lei Li wrote:
>>> Right now I just have rough numbers without the new vmsplice, based
>>> on the results from 'info migrate'. As the guest RAM size increases,
>>> the 'total time' is several times lower than with the current live
>>> migration, but the 'downtime' is still poor.
>> Of course.
>>> For a 1GB RAM guest:
>>>
>>> total time: 702 milliseconds
>>> downtime: 692 milliseconds
>>>
>>> And as the RAM size of the guest increases, those numbers grow in
>>> proportion to it.
>>> I will make a list of the performance numbers with the new vmsplice
>>> later; I am sure it will be much better than this at least.
>> Yes, please. Is the memory usage still 2x without vmsplice?
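(For context on the vmsplice question above: the point of the "new vmsplice" is to move guest RAM pages to the destination process instead of copying them. Below is a minimal, illustrative sketch of the page-gifting mechanism; it is not the flipping code from this series, and whether the kernel truly moves rather than copies the gifted page depends on the kernel-side work referenced later in this mail.)

#define _GNU_SOURCE
#include <fcntl.h>          /* vmsplice(), SPLICE_F_GIFT */
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>        /* struct iovec */
#include <unistd.h>

int main(void)
{
    int pipefd[2];
    long pagesz = sysconf(_SC_PAGESIZE);
    void *page;

    if (pipe(pipefd) < 0) {
        return 1;
    }

    /* Gifted pages must be page-aligned; this buffer stands in for a
     * guest RAM page on the outgoing QEMU. */
    if (posix_memalign(&page, pagesz, pagesz) != 0) {
        return 1;
    }
    memset(page, 0x5a, pagesz);

    struct iovec iov = { .iov_base = page, .iov_len = pagesz };

    /* SPLICE_F_GIFT: the sender promises not to touch the page again,
     * so the kernel is allowed to take ownership of it instead of
     * copying the data into the pipe. */
    if (vmsplice(pipefd[1], &iov, 1, SPLICE_F_GIFT) < 0) {
        return 1;
    }

    /* The incoming QEMU would splice() the page out of the pipe with
     * SPLICE_F_MOVE; reading it back here just keeps the example
     * self-contained. */
    void *dst = malloc(pagesz);
    read(pipefd[0], dst, pagesz);
    return 0;
}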
>>
>> I think you have a nice proof of concept, but on the other hand this
>> probably needs to be coupled with some kind of postcopy live migration,
>> that is:
>>
>> * the source starts sending data
>>
>> * but the destination starts running immediately
>>
>> * if the machine needs a page that is missing, the destination asks the
>>   source to send it
>>
>> * as soon as it arrives, the destination can restart
>>
>> Using postcopy is problematic for reliability: if the destination fails,
>> the virtual machine is lost because the source doesn't have the latest
>> content of memory. However, this is a much, much smaller problem for
>> live QEMU upgrade where the network cannot fail.
>>
>> If you do this, you can achieve pretty much instantaneous live upgrade,
>> well within your original 200 ms goals. But the flipping code with
>> vmsplice should be needed anyway to avoid doubling memory usage, and
>
> Yes, I have read the postcopy migration patches; they perform very well
> on downtime, since only the vmstate is sent before execution is switched
> to the destination host. And as you pointed out, postcopy cannot avoid
> doubling memory usage.
>
> The numbers listed above are based on the old vmsplice, as I have not
> yet worked on the performance benchmark; it actually copies data rather
> than moving it. Since the feedback for this version is positive, I am
> now trying to get real results with the new vmsplice.
>
> BTW, the kernel side is looking into a huge-page solution to improve
> performance.
>
> The recent kernel patches are here:
>
> http://article.gmane.org/gmane.linux.kernel/1574277

Hi Paolo,

I have been working on the performance benchmark; I am afraid it may take
a bit more time, as there are some problems with the new vmsplice that the
kernel side is working on right now. I will post a v3 of the series soon
with your comments on the previous version addressed.

>
>> it's looking pretty good in this version already! I'm relieved that the
>> RDMA code was designed right!
>
> I am happy with it too. :)
> Those RDMA hooks really make things more flexible!
>
>
>> Paolo

--
Lei
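P.S. For anyone curious how the "destination asks the source for a missing
page" step of the postcopy scheme quoted above could look, here is a rough
sketch. It assumes a page-fault notification interface along the lines of
Linux's userfaultfd (the kernel-side mechanism was still being worked on
when this thread was written), and it fakes the page that the source would
send back, so it only illustrates the idea rather than any actual postcopy
code.

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);

    /* Stand-in for a chunk of guest RAM on the destination; once it is
     * registered with userfaultfd, access to a missing page raises an
     * event instead of silently mapping a zero page. */
    char *guest_ram = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
    struct uffdio_api api = { .api = UFFD_API };
    ioctl(uffd, UFFDIO_API, &api);

    struct uffdio_register reg = {
        .range = { .start = (unsigned long)guest_ram, .len = pagesz },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    /* In a real setup the vCPU threads touch guest_ram while a service
     * thread runs this loop; here we only poll once to show the shape
     * of the fault-service path. */
    struct pollfd pfd = { .fd = uffd, .events = POLLIN };
    if (poll(&pfd, 1, 0) > 0) {
        struct uffd_msg msg;
        if (read(uffd, &msg, sizeof(msg)) == sizeof(msg) &&
            msg.event == UFFD_EVENT_PAGEFAULT) {
            /* "The destination asks the source to send it": here the
             * page the source would return is simply faked. */
            char *from_source = malloc(pagesz);
            memset(from_source, 0, pagesz);

            struct uffdio_copy copy = {
                .dst  = msg.arg.pagefault.address & ~(pagesz - 1),
                .src  = (unsigned long)from_source,
                .len  = pagesz,
                .mode = 0,
            };
            /* Atomically place the page; the faulting thread resumes as
             * soon as this ioctl returns. */
            ioctl(uffd, UFFDIO_COPY, &copy);
        }
    }
    return 0;
}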