From: Orit Wasserman
Date: Thu, 12 Sep 2013 10:37:57 +0300
To: junqing.wang@cs2c.com.cn
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH RFC 0/4] Curling: KVM Fault Tolerance
Message-ID: <52316F55.7090307@redhat.com>
References: <1378784607-7398-1-git-send-email-junqing.wang@cs2c.com.cn> <522F1045.2000705@redhat.com> <4bf58665.2cac.1410aba5e17.Coremail.junqing.wang@cs2c.com.cn>
In-Reply-To: <4bf58665.2cac.1410aba5e17.Coremail.junqing.wang@cs2c.com.cn>

On 09/11/2013 04:54 AM, junqing.wang@cs2c.com.cn wrote:
> Hi,
>
>> The first is that if the VM failure happens in the middle of the live migration,
>> the backup VM state will be inconsistent, which means you can't fail over to it.
>
> Yes, I have been concerned about this problem. That is why we need a prefetch buffer.
>
You are right, I missed that.

>> Solving it is not simple, as you need some transaction mechanism that will
>> change the backup VM state only when the transaction completes (the live migration completes).
>> Kemari has something like that.
>
> The backup VM state will be loaded only when the data of one whole migration has been prefetched. Otherwise, the VM state will not be loaded. So the backup VM is guaranteed to have a consistent state, like a checkpoint.
> However, how close this checkpoint is to the point of the VM failure depends on the workload and the bandwidth.
>
At the moment, in your implementation, the prefetch buffer can be very large (several times the size of guest memory). Are you planning to address this issue?

>> The second is that, sadly, live migration doesn't always converge;
>> this means that the backup VM won't have a consistent state to fail over to.
>> You need to detect such a case and throttle down the guest to force convergence.
>
> Yes, that's a problem. AFAIK, QEMU already has an auto-convergence feature.
>
How about activating it automatically when you do fault tolerance?

> From another perspective, if many migrations cannot converge, maybe the workload is high and the bandwidth is low, and using FT is not recommended in general.
>
I agree, but we need some way to notify the user of such a problem.

Regards,
Orit