Message-ID: <4B0AB3AB.5080204@codemonkey.ws>
Date: Mon, 23 Nov 2009 10:09:15 -0600
From: Anthony Liguori
To: Gleb Natapov
Cc: Paolo Bonzini, Juan Quintela, qemu-devel@nongnu.org
In-Reply-To: <20091123154951.GT2999@redhat.com>
Subject: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts

Gleb Natapov wrote:
> On Mon, Nov 23, 2009 at 09:32:48AM -0600, Anthony Liguori wrote:
>
>> Gleb Natapov wrote:
>>
>>> On Mon, Nov 23, 2009 at 09:05:58AM -0600, Anthony Liguori wrote:
>>>
>>>> Gleb Natapov wrote:
>>>>
>>>>> Then I don't see why Juan claims what he claims.
>>>>>
>>>> Live migration is unidirectional.
>>>> As long as qemu can send out all
>>>> of the data without the stream closing, it will "succeed" on the
>>>> source. While this may sound like a bug, it's an impossible problem
>>>> to solve as it's dealing with reliable communication between two
>>>> unreliable nodes (i.e. the two generals' problem). This is why the
>>>> source qemu does not exit after a successful live migration. It
>>>>
>>> As far as I remember the two generals' problem talks about an unreliable
>>> channel, not unreliable nodes.
>>>
>> That's just semantics. The problem is that one general does not
>> know if the other general received the message. Even if there was a
>> reliable channel between the two generals, if one of the generals
>> can die with no indication, then you still have the same problem,
>> i.e. the first general doesn't know for sure if the second general
>> received the message.
>>
>>
>>> Why not have the destination send ACK/NACK
>>> to the source when it knows that migration succeeded/failed?
>>>
>> 1) Source sends migration traffic
>> 2) Destination receives it, sends Ack
>> 3) Destination needs to wait to receive Ack from Source before
>> starting guest to ensure that guest does not start twice
>> 4) Source receives Ack from Destination, sends Ack
>> 5) Source kills guest
>> 6) Destination receives Ack from Source, starts guest
>>
>> If Destination dies in between 5 and 6, the VM disappears.
>>
>>
> 1) Source sends migration traffic
> 2) Destination receives it, sends Ack
> 3) Destination starts running
> 4) Source receives Ack from Destination
> 5) Source kills guest
>
> If Source does not receive Ack it stays paused and waits for management to
> sort things out.
>

Is it really useful to kill the source guest in this case?

I'm wary of how useful an unreliable ack is, namely because it introduces
rather complex semantics from a management tool perspective. If folks
think it would be really useful, I'm not fundamentally opposed to it.

Regards,

Anthony Liguori
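For readers comparing the two numbered handshakes quoted above, here is a toy model that replays each list and reports what is left of the guest on the source and on the destination. It is a hypothetical sketch in Python, not QEMU code: the function names three_way, two_way and live_copies, the state strings, and the crash model (the destination dies just before a given step and never acts again, or the destination's Ack is lost) are all assumptions made for illustration.

```python
"""Hypothetical toy model of the two migration handshakes in this thread.

Not QEMU code: the step numbers follow the lists in the mail, but the
function names, state strings and crash model (the destination dies just
before a given step and never acts again) are illustrative assumptions.
"""


def three_way(dest_dies_before=None):
    """Variant where the destination waits for the source's Ack (steps 1-6)."""
    if dest_dies_before is not None and not 2 <= dest_dies_before <= 6:
        raise ValueError("crash step must be None or 2..6")

    def dest_alive(step):
        return dest_dies_before is None or step < dest_dies_before

    source, dest = "paused", "waiting"  # 1) source streams state, guest stopped
    if dest_alive(2):                   # 2) destination receives it, sends Ack
        # 3) destination waits; 4) source receives Ack and sends its own Ack
        source = "killed"               # 5) source kills guest
        if dest_alive(6):               # 6) destination receives Ack, starts guest
            dest = "running"
    if not dest_alive(6):
        dest = "dead"
    return source, dest


def two_way(dest_dies_before=None, ack_lost=False):
    """Variant where the destination starts right after sending its Ack."""
    if dest_dies_before not in (None, 2, 3):
        raise ValueError("crash step must be None, 2 or 3")

    def dest_alive(step):
        return dest_dies_before is None or step < dest_dies_before

    source = "paused"                    # 1) source streams state, guest stopped
    dest = "running" if dest_alive(3) else "dead"  # 2) sends Ack, 3) starts
    if dest_alive(2) and not ack_lost:   # 4) source receives Ack
        source = "killed"                # 5) source kills guest
    # Without an Ack the source stays paused until management sorts things out.
    return source, dest


def live_copies(outcome):
    """Guest copies that are running or resumable: 0 = VM lost, 2 = ambiguous."""
    source, dest = outcome
    return (source == "paused") + (dest == "running")


if __name__ == "__main__":
    for label, outcome in [
        ("three-way, no failure", three_way()),
        ("three-way, dest dies before step 6", three_way(6)),
        ("two-way, no failure", two_way()),
        ("two-way, Ack lost", two_way(ack_lost=True)),
    ]:
        print(f"{label}: {outcome}, live copies = {live_copies(outcome)}")
```

Under these assumptions, three_way(6) ends with the source killed and the destination dead, the "dies between 5 and 6" case from the mail, while two_way(ack_lost=True) ends with a paused source next to a running destination, which is exactly the situation management would then have to sort out without resuming the source.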