Message-ID: <1382486935.1780.6.camel@localhost>
From: Jules
Date: Wed, 23 Oct 2013 08:08:55 +0800
In-Reply-To: <20131017115059.GF10774@stefanha-thinkpad.redhat.com>
References: <1381821983-13932-1-git-send-email-junqing.wang@cs2c.com.cn> <20131017115059.GF10774@stefanha-thinkpad.redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3 0/4] Curling: KVM Fault Tolerance
To: Stefan Hajnoczi
Cc: pbonzini@redhat.com, quintela@redhat.com, qemu-devel@nongnu.org, owasserm@redhat.com

> On Tue, Oct 15, 2013 at 03:26:19PM +0800, Jules Wang wrote:
> > v2 -> v3:
> > * add documentation of new option in qapi-schema.
> >
> > * long option name: ft -> fault-tolerant
> >
> > v1 -> v2:
> > * cmdline: migrate curling:tcp:...:...
> >   -> migrate -f tcp:...:...
> >
> > * sender: use QEMU_VM_FILE_MAGIC_FT as the header of the migration
> >   to indicate this is a ft migration.
> >
> > * receiver: look for the signature:
> >   QEMU_VM_EOF_MAGIC + QEMU_VM_FILE_MAGIC_FT(64bit total)
> >   which indicates the end of one migration.
> > --
> > Jules Wang (4):
> >   Curling: add doc
> >   Curling: cmdline interface.
> >   Curling: the sender
> >   Curling: the receiver
>

First of all, thanks for your superb and spot-on comments.

> It would be helpful to clarify the status of Curling in the cover letter
> email so reviewers know what to expect.

OK, but I'm not sure how to clarify the status. Could you please give me
an example?

>
> This series does not address I/O or failover. I guess you are aware of
> the missing topics that I mentioned, here are my thoughts on them:
>
> I/O needs to be held back until the destination host has acknowledged
> receiving the last full migration state. The outside world cannot
> witness state changes in the guest until the migration state has been
> successfully transferred to the destination host. Otherwise the guest
> may appear to act incorrectly when resuming execution from the last
> snapshot.
>
> The time period used by the FT sender thread determines how much latency
> is added to I/O requests.

Yes, there is latency. That is inevitable.

I guess you mean the following situation: if a message 'hello' is sent to
a chat room server just a few seconds before failover happens, the
message may be delivered to the other participants twice, or be lost.
Am I right?

>
> Failover functionality is missing from these patches. We cannot simply
> start executing on the destination host when the migration connection
> ends. If the guest disk image is located on shared storage then
> split-brain occurs when a network error terminates the migration
> connection - will both hosts begin accessing the shared disk?

Yes.

I have a simple way to handle that. In a word: a third point, the gateway.
Both the sender and the receiver check their connectivity to the gateway
every X seconds. Let A and B denote whether the sender and the receiver,
respectively, are connected to the gateway. When the connection between
the sender and the receiver is down, A && B is false. If A is false, the
VM instance on the sender is stopped. If B is false, the VM instance on
the receiver is not started:

  a. A false, B false: 0 VMs run
  b. A false, B true:  1 VM runs
  c. A true,  B false: 1 VM runs
  d. A true,  B true:  1 VM runs (normal case)

It becomes complicated once we consider the transitions between these
four states. I suggest adding this feature to libvirt instead of QEMU.

> What is your plan to address these issues?
>
> Stefan
>
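To make the gateway rule above concrete, here is a minimal sketch in C of the decision each side would take. The names ft_decide, struct ft_decision and ft_running_vms are hypothetical and are not part of the Curling patches or of QEMU. The sketch assumes, as stated above, that a lost sender-receiver link implies A && B is false; it models only the four states, not the transitions between them, and leaves out the periodic gateway probe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type (not a QEMU or Curling API): which VM
 * instances are allowed to run. */
struct ft_decision {
    bool sender_runs;   /* VM on the sender host keeps running */
    bool receiver_runs; /* VM on the receiver host is started */
};

/*
 * Hypothetical helper.
 *   link_up: the sender-receiver migration connection is alive
 *   a:       the sender can reach the gateway (A)
 *   b:       the receiver can reach the gateway (B)
 */
static struct ft_decision ft_decide(bool link_up, bool a, bool b)
{
    struct ft_decision d;

    if (link_up) {
        /* Link alive (the normal case, d): the sender runs and the
         * receiver stays on standby, receiving migration state. */
        d.sender_runs = true;
        d.receiver_runs = false;
        return d;
    }

    /* Assumption taken from the mail: once the link is down,
     * A && B is false, so at most one VM can be allowed to run. */
    assert(!(a && b));

    d.sender_runs = a;   /* A false: stop the VM on the sender */
    d.receiver_runs = b; /* B false: do not start the VM on the receiver */
    return d;
}

/* Number of VM instances left running (0 or 1 for valid inputs). */
static int ft_running_vms(struct ft_decision d)
{
    return (int)d.sender_runs + (int)d.receiver_runs;
}
```

For example, ft_decide(false, false, true) corresponds to case b: the VM on the sender is stopped, the one on the receiver is started, and ft_running_vms() returns 1.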