From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38717)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <lancelotds@163.com>) id 1VJZtJ-0003oB-Qh
	for qemu-devel@nongnu.org; Tue, 10 Sep 2013 22:11:00 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <lancelotds@163.com>) id 1VJZtB-0005Ij-LD
	for qemu-devel@nongnu.org; Tue, 10 Sep 2013 22:10:49 -0400
Received: from m12-16.163.com ([220.181.12.16]:60116)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <lancelotds@163.com>) id 1VJZt9-0005Ct-DF
	for qemu-devel@nongnu.org; Tue, 10 Sep 2013 22:10:41 -0400
Date: Wed, 11 Sep 2013 09:54:47 +0800 (CST)
From: junqing.wang@cs2c.com.cn
Sender: lancelotds@163.com
In-Reply-To: <522F1045.2000705@redhat.com>
References: <1378784607-7398-1-git-send-email-junqing.wang@cs2c.com.cn>
	<522F1045.2000705@redhat.com>
Content-Type: multipart/alternative;
	boundary="----=_Part_41423_443349766.1378864487959"
MIME-Version: 1.0
Message-ID: <4bf58665.2cac.1410aba5e17.Coremail.junqing.wang@cs2c.com.cn>
Subject: Re: [Qemu-devel] [PATCH RFC 0/4] Curling: KVM Fault Tolerance
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Orit Wasserman <owasserm@redhat.com>
Cc: qemu-devel@nongnu.org

------=_Part_41423_443349766.1378864487959
Content-Type: text/plain; charset=GBK
Content-Transfer-Encoding: 7bit

Hi,

>The first is that if the VM failure happen in the middle on the live migration
>the backup VM state will be inconsistent which means you can't failover to it.

Yes, I have been concerned about this problem. That is why we need a prefetch buffer.

>Solving it is not simple as you need some transaction mechanism that will
>change the backup VM state only when the transaction completes (the live migration completes).
>Kemari has something like that.

The backup VM state is loaded only after the data of one whole migration has been prefetched. Until then, no VM state is loaded. So the backup VM is guaranteed to have a consistent state, like a checkpoint.
However, how close this checkpoint is to the point of the VM failure depends on the workload and the bandwidth.
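
As a rough illustration of the idea (a sketch only, not the code from the patch series; the class and method names are made up for this mail, and the VM state is modelled as plain bytes), the backup side buffers incoming data and switches to it only at the end of a complete migration:

```python
class BackupVM:
    """Hypothetical stand-in for the backup side; not QEMU code."""

    def __init__(self):
        self.state = None      # last fully loaded checkpoint, if any
        self._prefetch = []    # chunks of the migration still in flight

    def receive(self, chunk):
        # Buffer only; the loaded state is left untouched.
        self._prefetch.append(chunk)

    def end_of_migration(self):
        # One whole migration has arrived: load it in a single step.
        self.state = b"".join(self._prefetch)
        self._prefetch = []

    def failover(self):
        # Primary failed: discard any partial migration and resume
        # from the last complete checkpoint.
        self._prefetch = []
        return self.state


vm = BackupVM()
vm.receive(b"ram-1")
vm.receive(b"dev-1")
vm.end_of_migration()          # checkpoint 1 is now loaded
vm.receive(b"ram-2")           # primary dies in the middle of migration 2
recovered = vm.failover()      # still checkpoint 1, never a partial state
```

A failure in the middle of a migration costs only the work done since the last complete checkpoint; the backup never runs from a half-written state.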

>The second is that sadly live migration doesn't always converge this means
>that the backup VM won't have a consist state to failover to.
>You need to detect such a case and throttle down the guest to force convergence.

Yes, that's a problem. AFAIK, QEMU already has an auto-converge feature.
Looking at it from another perspective, if many migrations cannot converge, the workload is probably too high for the available bandwidth, and using FT is not recommended in that case.
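
To make the workload vs. bandwidth point concrete, here is a deliberately simplified model of iterative pre-copy (my own assumption for illustration, not QEMU's actual algorithm; the function name, parameters and default values are hypothetical). Each round sends what is currently dirty, and the guest keeps dirtying memory while that data is on the wire:

```python
def rounds_to_converge(ram_mb, dirty_mb_s, bw_mb_s, stop_mb=1.0, max_rounds=30):
    """Return the number of pre-copy rounds needed, or None if the
    remaining dirty data never drops to stop_mb within max_rounds."""
    remaining = float(ram_mb)
    for n in range(1, max_rounds + 1):
        # Sending `remaining` MB takes remaining / bw_mb_s seconds;
        # during that time the guest dirties dirty_mb_s MB per second.
        remaining = remaining * dirty_mb_s / bw_mb_s
        if remaining <= stop_mb:
            return n
    return None


ok = rounds_to_converge(1024, dirty_mb_s=50, bw_mb_s=100)    # shrinks each round
bad = rounds_to_converge(1024, dirty_mb_s=120, bw_mb_s=100)  # never shrinks
```

In this model the remaining data shrinks only when the dirty rate is below the bandwidth; otherwise the only options are throttling the guest or not using FT for that workload.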


------=_Part_41423_443349766.1378864487959--