From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Michael S. Tsirkin"
Subject: Re: [PATCH 0/6] Save state error handling (kill off no_migrate)
Date: Tue, 9 Nov 2010 18:49:17 +0200
Message-ID: <20101109164917.GA27287@redhat.com>
References: <20101108205901.GB10777@redhat.com>
 <1289251417.28165.37.camel@x201>
 <20101109120020.GC22705@redhat.com>
 <1289314703.28165.53.camel@x201>
 <20101109150708.GA25725@redhat.com>
 <1289316894.14321.15.camel@x201>
 <20101109154217.GA26326@redhat.com>
 <1289317620.14321.19.camel@x201>
 <20101109161525.GA26897@redhat.com>
 <1289320245.14321.28.camel@x201>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, cam@cs.ualberta.ca,
 quintela@redhat.com, anthony@codemonkey.ws
To: Alex Williamson
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:60249 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752104Ab0KIQtZ
 (ORCPT ); Tue, 9 Nov 2010 11:49:25 -0500
Content-Disposition: inline
In-Reply-To: <1289320245.14321.28.camel@x201>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Tue, Nov 09, 2010 at 09:30:45AM -0700, Alex Williamson wrote:
> On Tue, 2010-11-09 at 18:15 +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 09, 2010 at 08:47:00AM -0700, Alex Williamson wrote:
> > > > > But it could. What if ivshmem is acting in a peer role, but has no
> > > > > clients, could it migrate? What if ivshmem is migratable when the
> > > > > migration begins, but while the migration continues, a connection is
> > > > > set up and it becomes unmigratable?
> > > >
> > > > Sounds like something we should work to prevent, not support :)
> > >
> > > s/:)/:(/  why?
> >
> > It will just confuse everyone. Also if it happens after sending
> > all of memory, it's pretty painful.
>
> It happens after sending all of memory with no_migrate, and I think
> pushing that earlier might introduce some races around when
> register_device_unmigratable() can be called.

Good point.
I guess we could check it twice just to speed things up.

> > > > > Using this series, ivshmem would
> > > > > have multiple options how to support this. It could a) NAK the
> > > > > migration, b) drop connections and prevent new connections until the
> > > > > migration finishes, c) detect that new connections have happened since
> > > > > the migration started and cancel. And probably more. no_migrate can
> > > > > only do a). And in fact, we can only test no_migrate after the VM is
> > > > > stopped (after all memory is migrated) because otherwise it could race
> > > > > with devices setting no_migrate during migration.
> > > >
> > > > We really want no_migrate to be static. Changing it is abusing
> > > > the infrastructure.
> > >
> > > You call it abusing, I call it making use of the infrastructure. Why
> > > unnecessarily restrict ourselves? Is return 0/-1 really that scary,
> > > unmaintainable, undebuggable? I don't understand the resistance.
> > >
> > > Alex
> >
> > Management really does not know how to handle unexpected
> > migration failures. They must be avoided.
> >
> > There are some very special cases that fail migration. They are
> > currently easy to find with grep register_device_unmigratable.
> > I prefer to keep it that way.
>
> How can management tools be improved to better handle unexpected
> migration failures when the only way for qemu to fail is an abort?
> We need the infrastructure to at least return an error first. Do we just
> need to add some fprintfs to the save core to print the id string of the
> device that failed to save? I just can't buy the "code is easier to
> grep" as an argument against adding better error handling to the save
> code path.

I just don't buy the "we'll return meaningless error codes at a random
point in time and management will figure it out" as an argument :)

> Anyone else want to chime in?
>
> Alex

Maybe try coding up a user of the new infrastructure that does
something useful which register_device_unmigratable can't do.

--
MST