[Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate)

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Alex Williamson <alex.williamson@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: cam@cs.ualberta.ca, qemu-devel@nongnu.org, kvm@vger.kernel.org,
	quintela@redhat.com
Subject: [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate)
Date: Tue, 09 Nov 2010 10:44:06 -0700	[thread overview]
Message-ID: <1289324646.14321.39.camel@x201> (raw)
In-Reply-To: <20101109164917.GA27287@redhat.com>

On Tue, 2010-11-09 at 18:49 +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 09, 2010 at 09:30:45AM -0700, Alex Williamson wrote:
> > On Tue, 2010-11-09 at 18:15 +0200, Michael S. Tsirkin wrote:
> > > On Tue, Nov 09, 2010 at 08:47:00AM -0700, Alex Williamson wrote:
> > > > > > But it could.  What if ivshmem is acting in a peer role, but has no
> > > > > > clients, could it migrate?  What if ivshmem is migratable when the
> > > > > > migration begins, but while the migration continues, a connection is
> > > > > > setup and it becomes unmigratable.
> > > > > 
> > > > > Sounds like something we should work to prevent, not support :)
> > > > 
> > > > s/:)/:(/  why?
> > > 
> > > It will just confuse everyone. Also if it happens after sending
> > > all of memory, it's pretty painful.
> > 
> > It happens after sending all of memory with no_migrate, and I think
> > pushing that earlier might introduce some races around when
> > register_device_unmigratable() can be called.
> 
> Good point. I guess we could check it twice just to speed things up.
> 
> > > > > >  Using this series, ivshmem would
> > > > > > have multiple options how to support this.  It could a) NAK the
> > > > > > migration, b) drop connections and prevent new connections until the
> > > > > > migration finishes, c) detect that new connections have happened since
> > > > > > the migration started and cancel.  And probably more.  no_migrate can
> > > > > > only do a).  And in fact, we can only test no_migrate after the VM is
> > > > > > stopped (after all memory is migrated) because otherwise it could race
> > > > > > with devices setting no_migrate during migration.
> > > > > 
> > > > > We really want no_migrate to be static. changing it is abusing
> > > > > the infrastructure.
> > > > 
> > > > You call it abusing, I call it making use of the infrastructure.  Why
> > > > unnecessarily restrict ourselves?  Is return 0/-1 really that scary,
> > > > unmaintainable, undebuggable?  I don't understand the resistance.
> > > > 
> > > > Alex
> > > 
> > > management really does not know how to handle unexpected
> > > migration failures. They must be avoided.
> > > 
> > > There are some very special cases that fail migration. They are
> > > currently easy to find with grep register_device_unmigratable.
> > > I prefer to keep it that way.
> > 
> > How can management tools be improved to better handle unexpected
> > migration failures when the only way for qemu to fail is an abort?
> > We need the infrastructure to at least return an error first.  Do we just
> > need to add some fprintfs to the save core to print the id string of the
> > device that failed to save?  I just can't buy the "code is easier to
> > grep" as an argument against adding better error handling to the save
> > code path.
> 
> I just don't buy the 'we'll return meaningless error codes at random
> point in time and management will figure it out' as an argument :)

Why is the error code meaningless?  The error code stops the migration
in qemu and hopefully prints an error message (we could easily add an
fprintf to the core save code to ensure we know the device responsible
for the NAK).  From there we can figure out how to return the error to
monitors and management tools, but we need to have a way to know there's
an error first.

> >  Anyone else want to chime in?
> > 
> > Alex
> 
> Maybe try coding up some user using the new infrastructure to do
> something useful, that register_device_unmigratable can't do.

With the number of people I hear complaining about how qemu has too many
aborts, no error checking, and no way to return errors, I'm a little
dumbfounded that there's such a roadblock to actually add some simple
error handling.  Is it the error handling you're opposed to, or the way
I'm using it to NAK a migration?

Alex

next prev parent reply	other threads:[~2010-11-09 17:44 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-06 20:58 [Qemu-devel] [PATCH 0/6] Save state error handling (kill off no_migrate) Alex Williamson
2010-10-06 20:59 ` [Qemu-devel] [PATCH 1/6] savevm: Allow SaveStateHandler() to return error Alex Williamson
2010-10-06 20:59 ` [Qemu-devel] [PATCH 2/6] savevm: Allow vmsd->pre_save " Alex Williamson
2010-10-06 20:59 ` [Qemu-devel] [PATCH 3/6] pci: Allow pci_device_save() " Alex Williamson
2010-10-06 20:59 ` [Qemu-devel] [PATCH 4/6] virtio: Allow virtio_save() errors Alex Williamson
2010-10-06 20:59 ` [Qemu-devel] [PATCH 5/6] savevm: Allow set_params and save_live_state to error Alex Williamson
2010-10-06 20:59 ` [Qemu-devel] [PATCH 6/6] savevm: Remove register_device_unmigratable() Alex Williamson
2010-10-07 16:55 ` [Qemu-devel] Re: [PATCH 0/6] Save state error handling (kill off no_migrate) Alex Williamson
2010-11-08 11:40 ` Michael S. Tsirkin
2010-11-08 14:59   ` Alex Williamson
2010-11-08 16:54     ` Michael S. Tsirkin
2010-11-08 17:20       ` Alex Williamson
2010-11-08 20:59         ` Michael S. Tsirkin
2010-11-08 21:23           ` Alex Williamson
2010-11-09 12:00             ` Michael S. Tsirkin
2010-11-09 14:58               ` Alex Williamson
2010-11-09 15:07                 ` Michael S. Tsirkin
2010-11-09 15:34                   ` Alex Williamson
2010-11-09 15:42                     ` Michael S. Tsirkin
2010-11-09 15:47                       ` Alex Williamson
2010-11-09 16:15                         ` Michael S. Tsirkin
2010-11-09 16:30                           ` Alex Williamson
2010-11-09 16:49                             ` Michael S. Tsirkin
2010-11-09 17:44                               ` Alex Williamson [this message]
2010-11-09 19:35                                 ` Alex Williamson
2010-11-16 10:23 ` Juan Quintela

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1289324646.14321.39.camel@x201 \
    --to=alex.williamson@redhat.com \
    --cc=cam@cs.ualberta.ca \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).