From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38717)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1VJZtJ-0003oB-Qh for qemu-devel@nongnu.org;
 Tue, 10 Sep 2013 22:11:00 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1VJZtB-0005Ij-LD for qemu-devel@nongnu.org;
 Tue, 10 Sep 2013 22:10:49 -0400
Received: from m12-16.163.com ([220.181.12.16]:60116)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1VJZt9-0005Ct-DF for qemu-devel@nongnu.org;
 Tue, 10 Sep 2013 22:10:41 -0400
Date: Wed, 11 Sep 2013 09:54:47 +0800 (CST)
From: junqing.wang@cs2c.com.cn
Sender: lancelotds@163.com
In-Reply-To: <522F1045.2000705@redhat.com>
References: <1378784607-7398-1-git-send-email-junqing.wang@cs2c.com.cn>
 <522F1045.2000705@redhat.com>
Content-Type: multipart/alternative;
 boundary="----=_Part_41423_443349766.1378864487959"
MIME-Version: 1.0
Message-ID: <4bf58665.2cac.1410aba5e17.Coremail.junqing.wang@cs2c.com.cn>
Subject: Re: [Qemu-devel] [PATCH RFC 0/4] Curling: KVM Fault Tolerance
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Orit Wasserman
Cc: qemu-devel@nongnu.org

------=_Part_41423_443349766.1378864487959
Content-Type: text/plain; charset=GBK
Content-Transfer-Encoding: 7bit

Hi,

>The first is that if the VM failure happen in the middle on the live migration
>the backup VM state will be inconsistent which means you can't failover to it.

Yes, I have been concerned about this problem. That is why we need a prefetch buffer.

>Solving it is not simple as you need some transaction mechanism that will
>change the backup VM state only when the transaction completes (the live migration completes).
>Kemari has something like that.
>

The backup VM state is loaded only once the data of one whole migration has been prefetched; otherwise the VM state is not loaded. So the backup VM is guaranteed to hold a consistent state, like a checkpoint.
However, how close this checkpoint is to the point of the VM failure depends on the workload and the bandwidth.
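
To make the prefetch buffer idea concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not the Curling code: the class `BackupVM`, its method names and the (page address, data) chunk format are all made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class BackupVM:
    """Hypothetical backup side: a checkpoint is applied only when complete."""
    committed_state: dict = field(default_factory=dict)  # last consistent checkpoint
    prefetch_buffer: list = field(default_factory=list)  # chunks of the round in flight
    checkpoints_loaded: int = 0

    def receive_chunk(self, page_addr: int, data: bytes) -> None:
        # Only buffer the chunk; the loadable state is not touched yet.
        self.prefetch_buffer.append((page_addr, data))

    def end_of_migration(self) -> None:
        # One whole migration has arrived: apply it in one step, like a commit.
        for page_addr, data in self.prefetch_buffer:
            self.committed_state[page_addr] = data
        self.prefetch_buffer.clear()
        self.checkpoints_loaded += 1

    def primary_failed(self) -> dict:
        # Drop any partial round and fail over to the last complete checkpoint.
        self.prefetch_buffer.clear()
        return dict(self.committed_state)


backup = BackupVM()
backup.receive_chunk(0x1000, b"A")
backup.end_of_migration()            # checkpoint 1 is complete and loaded
backup.receive_chunk(0x1000, b"B")   # checkpoint 2 starts to arrive...
state = backup.primary_failed()      # ...but the primary fails mid-transfer
```

In this sketch the failover state still holds the data of checkpoint 1, because the half-received second round never left the buffer.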

>The second is that sadly live migration doesn't always converge this means
>that the backup VM won't have a consist state to failover to.
>You need to detect such a case and throttle down the guest to force convergence.

Yes, that's a problem. AFAIK, QEMU already has an auto-converge feature.
Looking at it from another perspective, if many migrations cannot converge, it may be that the workload is high and the bandwidth is low, in which case using FT is not recommended in general.
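
For reference, auto-converge is a migration capability that is switched on through the monitor (`migrate_set_capability auto-converge on` in HMP). Below is a small Python sketch of the equivalent QMP message. The helper name `auto_converge_command` is made up for this example, and the code only builds the JSON text; it does not connect to a QEMU monitor socket.

```python
import json


def auto_converge_command(enabled: bool = True) -> str:
    """Build the QMP message that toggles the auto-converge capability."""
    message = {
        "execute": "migrate-set-capabilities",
        "arguments": {
            "capabilities": [
                {"capability": "auto-converge", "state": enabled}
            ]
        },
    }
    return json.dumps(message)


# The resulting string would be sent over the QMP socket before migrating.
print(auto_converge_command())
```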



------=_Part_41423_443349766.1378864487959--