From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1MPQgT-0003gc-95
	for qemu-devel@nongnu.org; Fri, 10 Jul 2009 20:43:21 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1MPQgN-0003dv-No
	for qemu-devel@nongnu.org; Fri, 10 Jul 2009 20:43:19 -0400
Received: from [199.232.76.173] (port=44445 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1MPQgN-0003ds-I1
	for qemu-devel@nongnu.org; Fri, 10 Jul 2009 20:43:15 -0400
Received: from mx20.gnu.org ([199.232.41.8]:55872)
	by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32)
	(Exim 4.60) (envelope-from <jamie@shareable.org>) id 1MPQgN-0001C2-3A
	for qemu-devel@nongnu.org; Fri, 10 Jul 2009 20:43:15 -0400
Received: from mail2.shareable.org ([80.68.89.115])
	by mx20.gnu.org with esmtp (Exim 4.60)
	(envelope-from <jamie@shareable.org>) id 1MPQgM-0000SS-9k
	for qemu-devel@nongnu.org; Fri, 10 Jul 2009 20:43:14 -0400
Date: Sat, 11 Jul 2009 01:42:58 +0100
From: Jamie Lokier <jamie@shareable.org>
Subject: Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
Message-ID: <20090711004258.GI30322@shareable.org>
References: <1247140059-5034-1-git-send-email-pbonzini@redhat.com>
	<1247140059-5034-3-git-send-email-pbonzini@redhat.com>
	<4A55F46F.6060705@codemonkey.ws> <4A55F510.5090801@redhat.com>
	<4A55F641.6000701@codemonkey.ws>
	<20090710231424.GD30322@shareable.org>
	<Pine.LNX.4.64.0907110404280.5668@linmac.oyster.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.LNX.4.64.0907110404280.5668@linmac.oyster.ru>
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: malc <av1474@comtv.ru>
Cc: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org

malc wrote:
> > What happens if the destination host sends "migration completed", and
> > then the connection drops before that message is delivered reliably to
> > the sending host?
> > 
> > The destination host will run the VM,
> > and the sending host will restart and run the VM too.
> > 
> > Two copies of the same VM running together doesn't sound healthy.
> > 
> > This is a classic handshaking problem and I'm not aware of any perfect
> > solution, only ways to ensure eventual recovery, and temporary
> > uncertainty errs on the side of caution.  In this case, caution would
> > be neither VM running but a notification to the system manager of this
> > rare condition, and the possibility to recover when the two hosts are
> > able to resume communication.  I don't know how to do better than that.
> 
> Sounds like http://en.wikipedia.org/wiki/Two_Generals%27_Problem

It's not the same.  Unlike the Two Generals, the handshake has
outcomes which allow progress with guaranteed safety.  Two outcomes
result in one or the other machine running; in the third, both
machines are stopped and repeatedly attempt to communicate in order
to recover.  Both machines stopped is undesirable (and may be a
catastrophe for some applications), but it is safe in some useful
sense: it's not a disastrous failure compared with both running.

Two Generals, on the other hand, doesn't have any safe solutions,
except for no progress at all.  There is no way for either General to
proceed without some risk of failure, so the only strategy is to
minimise that probability.

-- Jamie


there is uncertainty

  1. A sends "migration complete, you start running" to B, and A stops.
  2. B sends "migration complete accepted" to A, and starts running.

If message 2 is lost, B will be running and A will be stopped, though A
cannot be certain that B is running.  A defers to the system operator,
or keeps trying to communicate with B.

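If message 1 is lost, A will be stopped and B will not have started, so
neither is running.  From A's side this looks the same as losing
message 2: no reply arrives.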
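For illustration only, here is a minimal Python sketch of the
two-message exchange above.  It is not QEMU code: the names Outcome and
handshake are made up for this sketch, and it assumes that A stops as
soon as it sends message 1, that B starts only if message 1 actually
arrives, and that A is certain only once message 2 arrives.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Outcome:
    a_running: bool  # source host A is running the VM
    b_running: bool  # destination host B is running the VM
    a_certain: bool  # A knows for sure what B is doing


def handshake(msg1_delivered: bool, msg2_delivered: bool) -> Outcome:
    """Hypothetical model of the two-message exchange (an assumption)."""
    # Step 1: A sends "migration complete, you start running" and stops.
    a_running = False
    # Step 2: B starts running only if it actually received message 1.
    b_running = msg1_delivered
    # A is certain only once B's acceptance (message 2) has arrived,
    # which in turn requires that message 1 got through.
    a_certain = msg1_delivered and msg2_delivered
    return Outcome(a_running, b_running, a_certain)


if __name__ == "__main__":
    for m1, m2 in product([True, False], repeat=2):
        out = handshake(m1, m2)
        # Safety property: never two copies of the VM running at once.
        assert not (out.a_running and out.b_running)
        print(f"msg1={m1!s:5} msg2={m2!s:5} -> {out}")
```

In this model no combination of lost messages leaves both hosts
running; the cost is that A may stay stopped and uncertain until it can
reach B or the operator steps in.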


> 
> -- 
> mailto:av1474@comtv.ru