From: Alex Williamson <alex.williamson@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, cam@cs.ualberta.ca,
	quintela@redhat.com, anthony@codemonkey.ws
Subject: Re: [PATCH 0/6] Save state error handling (kill off no_migrate)
Date: Tue, 09 Nov 2010 08:47:00 -0700
Message-ID: <1289317620.14321.19.camel@x201>
In-Reply-To: <20101109154217.GA26326@redhat.com>

On Tue, 2010-11-09 at 17:42 +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 09, 2010 at 08:34:54AM -0700, Alex Williamson wrote:
> > On Tue, 2010-11-09 at 17:07 +0200, Michael S. Tsirkin wrote:
> > > On Tue, Nov 09, 2010 at 07:58:23AM -0700, Alex Williamson wrote:
> > > > On Tue, 2010-11-09 at 14:00 +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Nov 08, 2010 at 02:23:37PM -0700, Alex Williamson wrote:
> > > > > > On Mon, 2010-11-08 at 22:59 +0200, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Nov 08, 2010 at 10:20:46AM -0700, Alex Williamson wrote:
> > > > > > > > On Mon, 2010-11-08 at 18:54 +0200, Michael S. Tsirkin wrote:
> > > > > > > > > On Mon, Nov 08, 2010 at 07:59:57AM -0700, Alex Williamson wrote:
> > > > > > > > > > On Mon, 2010-11-08 at 13:40 +0200, Michael S. Tsirkin wrote:
> > > > > > > > > > > On Wed, Oct 06, 2010 at 02:58:57PM -0600, Alex Williamson wrote:
> > > > > > > > > > > > Our code paths for saving or migrating a VM are full of functions that
> > > > > > > > > > > > return void, leaving no opportunity for a device to cancel a migration,
> > > > > > > > > > > > either from error or incompatibility.  The ivshmem driver attempted to
> > > > > > > > > > > > solve this with a no_migrate flag on the save state entry.  I think the
> > > > > > > > > > > > more generic and flexible way to solve this is to allow driver save
> > > > > > > > > > > > functions to fail.  This series implements that and converts ivshmem
> > > > > > > > > > > > to use a set_params function to NAK migration much earlier in the
> > > > > > > > > > > > process.  This touches a lot of files, but the bulk of those changes are
> > > > > > > > > > > > simply s/void/int/ and tacking a "return 0" to the end of functions.
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > 
> > > > > > > > > > > > Alex
> > > > > > > > > > > 
> > > > > > > > > > > Well error handling is always tricky: it seems easier to
> > > > > > > > > > > require save handlers to never fail.
> > > > > > > > > > 
> > > > > > > > > > Sure it's easier, but does that make it robust?
> > > > > > > > > 
> > > > > > > > > More robust in the face of what kind of failure?
> > > > > > > > 
> > > > > > > > I really don't understand why we're having a discussion about whether
> > > > > > > > providing a means to return an error is a good thing or not.  These
> > > > > > > > patches touch a lot of files, but the change is dead simple.
> > > > > > > 
> > > > > > > I just don't see the motivation. Presumably your patches are
> > > > > > > there to achieve some kind of goal, right? I am trying to
> > > > > > > figure out what that goal is.
> > > > > > 
> > > > > > My goal is that I want to be able to NAK a migration when devices are
> > > > > > assigned, and I think we can do it more generically than the no_migrate
> > > > > > flag so that it supports this application and any other reason that
> > > > > > saves might fail in the future.
> > > > > 
> > > > > More generically but harder to understand and debug, IMO.
> > > > 
> > > > How is returning an error condition hard to understand?  Debugging seems
> > > > easier to me, especially if drivers follow the precedent set in the last
> > > > patch and fprintf the reason for the failure.  Ideally this would be
> > > > some kind of push out to qmp, but it still seems easier than figuring
> > > > out which driver called register_device_unmigratable().
> > > > 
> > > > > > > Currently savevm callbacks never fail. So they
> > > > > > > return void. Why is returning 0 and adding a bunch of code to test the
> > > > > > > condition that never happens a good idea?  It just seems to create more
> > > > > > > ways for devices to shoot themselves in the foot.
> > > > > > 
> > > > > > And more ways to indicate something bad happened and keep running.  We
> > > > > > already have far too many abort() calls in the code.
> > > > > 
> > > > > If you can keep running why can't you migrate?
> > > > 
> > > > Well, as you know device assignment is tied to the hardware, so can't
> > > > migrate, but can always keep running.  The ivshmem driver has a peer
> > > > role, where it's tied to the host memory, so can't migrate, but can keep
> > > > running.
> > > 
> > > Right. All these are covered with no_migrate flag well enough.
> > > Their inability to migrate does not change at runtime.
> > 
> > But it could.  What if ivshmem is acting in a peer role, but has no
> > clients, could it migrate?  What if ivshmem is migratable when the
> > migration begins, but while the migration continues, a connection is
> > set up and it becomes unmigratable?
> 
> Sounds like something we should work to prevent, not support :)

s/:)/:(/  why?

> >  Using this series, ivshmem would
> > have multiple options how to support this.  It could a) NAK the
> > migration, b) drop connections and prevent new connections until the
> > migration finishes, c) detect that new connections have happened since
> > the migration started and cancel.  And probably more.  no_migrate can
> > only do a).  And in fact, we can only test no_migrate after the VM is
> > stopped (after all memory is migrated) because otherwise it could race
> > with devices setting no_migrate during migration.
> 
> We really want no_migrate to be static. changing it is abusing
> the infrastructure.

You call it abusing, I call it making use of the infrastructure.  Why
unnecessarily restrict ourselves?  Is return 0/-1 really that scary,
unmaintainable, undebuggable?  I don't understand the resistance.

Alex
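
To make the disagreement concrete, here is a minimal sketch of the
"save handlers may fail" approach argued for above. It is not code from
the series and does not use QEMU's real savevm API: DemoDevice,
DemoEntry, demo_save_peer() and demo_save_all() are hypothetical names
chosen for illustration. The handler decides at save time whether the
device can migrate, prints the reason when it cannot, and the caller
stops at the first failure so the guest can keep running on the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical device state; not QEMU's real ivshmem structure. */
typedef struct DemoDevice {
    const char *name;
    int peer_connections;   /* > 0: state is tied to this host */
} DemoDevice;

/* A save handler that may fail: 0 on success, -1 to NAK the save. */
typedef int (*DemoSaveFn)(const DemoDevice *dev);

/* Typical device: nothing can go wrong, so it just returns 0. */
static int demo_save_ok(const DemoDevice *dev)
{
    (void)dev;
    return 0;
}

/* Peer-role device: migratability is checked when the save happens,
 * so a connection set up after migration started is still caught. */
static int demo_save_peer(const DemoDevice *dev)
{
    if (dev->peer_connections > 0) {
        fprintf(stderr, "%s: peer connections active, cannot migrate\n",
                dev->name);
        return -1;
    }
    return 0;
}

/* Hypothetical save-state entry pairing a device with its handler. */
typedef struct DemoEntry {
    DemoDevice dev;
    DemoSaveFn save;
} DemoEntry;

/* Call every handler in order and stop at the first failure. */
static int demo_save_all(const DemoEntry *entries, size_t n)
{
    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int ret = entries[i].save(&entries[i].dev);
        if (ret < 0) {
            return ret;   /* cancel the save; nothing is aborted */
        }
    }
    return 0;
}
```

A static no_migrate flag would have to be decided when the device is
registered. In this sketch the same device can save successfully while
it has no peers and refuse once peer_connections becomes non-zero.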