From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=55922 helo=eggs.gnu.org)
    by lists.gnu.org with esmtp (Exim 4.43) id 1PFrNv-00051n-3a
    for qemu-devel@nongnu.org; Tue, 09 Nov 2010 11:49:28 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1PFrNt-0002Dk-8l
    for qemu-devel@nongnu.org; Tue, 09 Nov 2010 11:49:26 -0500
Received: from mx1.redhat.com ([209.132.183.28]:8406)
    by eggs.gnu.org with esmtp (Exim 4.71)
    (envelope-from ) id 1PFrNt-0002DJ-1p
    for qemu-devel@nongnu.org; Tue, 09 Nov 2010 11:49:25 -0500
Date: Tue, 9 Nov 2010 18:49:17 +0200
From: "Michael S. Tsirkin"
Message-ID: <20101109164917.GA27287@redhat.com>
References: <20101108205901.GB10777@redhat.com>
    <1289251417.28165.37.camel@x201>
    <20101109120020.GC22705@redhat.com>
    <1289314703.28165.53.camel@x201>
    <20101109150708.GA25725@redhat.com>
    <1289316894.14321.15.camel@x201>
    <20101109154217.GA26326@redhat.com>
    <1289317620.14321.19.camel@x201>
    <20101109161525.GA26897@redhat.com>
    <1289320245.14321.28.camel@x201>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1289320245.14321.28.camel@x201>
Subject: [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate)
List-Id: qemu-devel.nongnu.org
To: Alex Williamson
Cc: cam@cs.ualberta.ca, qemu-devel@nongnu.org, kvm@vger.kernel.org,
    quintela@redhat.com

On Tue, Nov 09, 2010 at 09:30:45AM -0700, Alex Williamson wrote:
> On Tue, 2010-11-09 at 18:15 +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 09, 2010 at 08:47:00AM -0700, Alex Williamson wrote:
> > > > > But it could. What if ivshmem is acting in a peer role, but has no
> > > > > clients, could it migrate? What if ivshmem is migratable when the
> > > > > migration begins, but while the migration continues, a connection is
> > > > > setup and it becomes unmigratable.
> > > >
> > > > Sounds like something we should work to prevent, not support :)
> > >
> > > s/:)/:(/ why?
> >
> > It will just confuse everyone. Also if it happens after sending
> > all of memory, it's pretty painful.
>
> It happens after sending all of memory with no_migrate, and I think
> pushing that earlier might introduce some races around when
> register_device_unmigratable() can be called.

Good point. I guess we could check it twice just to speed things up.

> > > > > Using this series, ivshmem would
> > > > > have multiple options how to support this. It could a) NAK the
> > > > > migration, b) drop connections and prevent new connections until the
> > > > > migration finishes, c) detect that new connections have happened since
> > > > > the migration started and cancel. And probably more. no_migrate can
> > > > > only do a). And in fact, we can only test no_migrate after the VM is
> > > > > stopped (after all memory is migrated) because otherwise it could race
> > > > > with devices setting no_migrate during migration.
> > > >
> > > > We really want no_migrate to be static. changing it is abusing
> > > > the infrastructure.
> > >
> > > You call it abusing, I call it making use of the infrastructure. Why
> > > unnecessarily restrict ourselves? Is return 0/-1 really that scary,
> > > unmaintainable, undebuggable? I don't understand the resistance.
> > >
> > > Alex
> >
> > management really does not know how to handle unexpected
> > migration failures. They must be avoided.
> >
> > There are some very special cases that fail migration. They are
> > currently easy to find with grep register_device_unmigratable.
> > I prefer to keep it that way.
>
> How can management tools be improved to better handle unexpected
> migration failures when the only way for qemu to fail is an abort?
> We need the infrastructure to at least return an error first. Do we just
> need to add some fprintfs to the save core to print the id string of the
> device that failed to save?
> I just can't buy the "code is easier to
> grep" as an argument against adding better error handling to the save
> code path.

I just don't buy the 'we'll return meaningless error codes at random
points in time and management will figure it out' as an argument :)

> Anyone else want to chime in?
>
> Alex

Maybe try coding up some user using the new infrastructure
to do something useful, that register_device_unmigratable can't do.

-- 
MST
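
For illustration only, the two approaches argued over in this thread can be
put side by side in a small standalone sketch. This is not QEMU code: the
Device struct, its fields, and the functions save_ok, save_if_no_peers,
find_unmigratable and save_all are all hypothetical names made up for the
example. It assumes a static per-device flag (in the spirit of no_migrate /
register_device_unmigratable) that can be checked before migration starts,
and a save handler that returns 0 or -1, with the save core printing the id
string of the device that failed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical device model; none of these names are real QEMU APIs. */
typedef struct Device {
    const char *id;                   /* id string reported on failure */
    int unmigratable;                 /* static flag, like no_migrate */
    int (*save)(struct Device *dev);  /* returns 0 on success, -1 on failure */
    int connections;                  /* e.g. peers currently attached */
} Device;

/* Save handler that always succeeds. */
int save_ok(Device *dev)
{
    (void)dev;
    return 0;
}

/* Save handler that refuses to save while any peer is connected. */
int save_if_no_peers(Device *dev)
{
    return dev->connections > 0 ? -1 : 0;
}

/*
 * Static approach: look for a flagged device before migration starts.
 * Returns the first flagged device, or NULL if all may migrate.
 */
const Device *find_unmigratable(const Device *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].unmigratable) {
            return &devs[i];
        }
    }
    return NULL;
}

/*
 * Dynamic approach: run every save handler, stop at the first failure,
 * print the failing device's id and hand it back through failed_id.
 * Returns 0 if every handler succeeded, -1 otherwise.
 */
int save_all(Device *devs, size_t n, const char **failed_id)
{
    assert(failed_id != NULL);
    *failed_id = NULL;
    for (size_t i = 0; i < n; i++) {
        if (devs[i].save(&devs[i]) < 0) {
            fprintf(stderr, "save failed for device '%s'\n", devs[i].id);
            *failed_id = devs[i].id;
            return -1;
        }
    }
    return 0;
}
```

In this sketch, find_unmigratable() gives an answer before any memory is
sent, but only for conditions known up front. save_all() can reject a device
whose state changed during the migration, at the cost of failing late, which
is the trade-off discussed above.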