All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Michael Tokarev <mjt@tls.msk.ru>,
	quintela@redhat.com, qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] 3.1: second invocation of migrate crashes qemu
Date: Thu, 24 Jan 2019 20:04:28 +0000	[thread overview]
Message-ID: <20190124200428.GG2101@work-vm> (raw)
In-Reply-To: <20190121164550.GE5638@linux.fritz.box>

* Kevin Wolf (kwolf@redhat.com) wrote:
> Am 21.01.2019 um 17:05 hat Dr. David Alan Gilbert geschrieben:
> > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > Am 18.01.2019 um 16:57 hat Dr. David Alan Gilbert geschrieben:
> > > > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > > > Am 14.01.2019 um 11:51 hat Dr. David Alan Gilbert geschrieben:
> > > > > > * Michael Tokarev (mjt@tls.msk.ru) wrote:
> > > > > > > $ qemu-system-x86_64 -monitor stdio -hda foo.img
> > > > > > > QEMU 3.1.0 monitor - type 'help' for more information
> > > > > > > (qemu) stop
> > > > > > > (qemu) migrate "exec:cat >/dev/null"
> > > > > > > (qemu) migrate "exec:cat >/dev/null"
> > > > > > > qemu-system-x86_64: /build/qemu/qemu-3.1/block.c:4647: bdrv_inactivate_recurse: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
> > > > > > > Aborted
> > > > > > 
> > > > > > And on head as well;  it only happens if the 1st migrate is succesful;
> > > > > > if it got cancelled the 2nd one works, so it's not too bad.
> > > > > > 
> > > > > > I suspect the problem here is all around locking/ownership - the block
> > > > > > devices get shutdown at the end of migration since the assumption is
> > > > > > that the other end has them open now and we had better release them.
> > > > > 
> > > > > Yes, only "cont" gets control back to the source VM.
> > > > > 
> > > > > I think we really should limit the possible monitor commands in the
> > > > > postmigrate status, and possibly provide a way to get back to the
> > > > > regular paused state (which means getting back control of the resources)
> > > > > without resuming the VM first.
> > > > 
> > > > This error is a little interesting if you'd done something like:
> > > > 
> > > > 
> > > >      src:
> > > >          stop
> > > >          migrate
> > > > 
> > > >      dst:
> > > >          <kill qemu for some reason>
> > > >          start a new qemu
> > > > 
> > > >      src:
> > > >          migrate
> > > > 
> > > > Now that used to work (safely) - note we've not started
> > > > a VM succesfully anywhere else.
> > > > 
> > > > Now the source refuses to let that happen - with a rather
> > > > nasty abort.
> > > 
> > > Essentially it's another effect of the problem that migration has always
> > > lacked a proper model of ownership transfer. And it's still treating
> > > this as a block layer problem rather than making it a core concept of
> > > migration as it should.
> > > 
> > > We can stack another one-off fix on top, and get back control of the
> > > block devices automatically on a second 'migrate'. But it feels like a
> > > hack and not like VMs had a properly designed and respected state
> > > machine.
> > 
> > Hmm; I don't like to get back to this argument because I think
> > we've got a perfectly servicable model that's implemented at higher
> > levels outside qemu, and the real problem is the block layer added
> > new assumptions about the semantics without checking they were really
> > true.
> > qemu only has the view from a single host; it takes the higher level
> > view from something like libvirt to have the view across multiple hosts
> > to understand who has the ownership when.
> 
> Obviously the upper layer is not handling this without the help of QEMU
> or we wouldn't have had bugs that images were accessed by two QEMU
> processes at the same time. We didn't change the assumptions, but we
> only started to actually check the preconditions that have always been
> necessary to perform live migration correctly.

In this case there is a behaviour that was perfectly legal before that
fails now; further the case is safe - the source hasn't accessed the
disks after the first migration and isn't trying to access it again
either.

> But if you like to think the upper layer should handle all of this,

I don't really want the upper layer to handle all of this; but I don't
think we can handle it all either - we've not got the higher level
view of screwups that happen outside qemu.

>then
> it's on libvirt to handle the ownership transfer manually. If we really
> want, we can add explicit QMP commands to activate and inactivate block
> nodes. This can be done and requiring that the management layer does
> all of this would be a consistent interface, too.
> 
> I just don't like this design much for two reasons: The first is that
> you can't migrate a VM that has disks with a simple 'migrate' command
> any more. The second is that if you implement it consistently, this has
> an impact on compatibility. I think it's a design that could be
> considered if we were adding live migration as a new feature, but it's
> probably hard to switch to it now.
> 
> In any case, I do think we should finally make a decision how ownership
> of resources should work in the context of migration, and then implement
> that.

I think we're mostly OK, but what I'd like would be:
  a) I'd like things to fail gently rather than abort; so I'd either
     like the current functions to fail cleanly so I can fail the
     migration or add a check at the start of migration to tell the user
    they did something wrong.

  b) I'd like commands that can tell me the current state and a command
     to move it to the other state explicitly; so we've got a way
     to recover in weirder cases.

  c) I'd like to document what the states should be before/after/in
     various  middle states of migration.

I think the normal case is fine, and hence as you say, I wouldn't
want to break a normal 'migrate' - I just want cleaner failures
and ways to do the more unusual things.

Dave

> Kevin
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

      reply	other threads:[~2019-01-24 20:14 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-12 17:11 [Qemu-devel] 3.1: second invocation of migrate crashes qemu Michael Tokarev
2019-01-14 10:51 ` Dr. David Alan Gilbert
2019-01-14 11:52   ` Kevin Wolf
2019-01-18 15:57     ` Dr. David Alan Gilbert
2019-01-21 15:55       ` Kevin Wolf
2019-01-21 16:05         ` Dr. David Alan Gilbert
2019-01-21 16:45           ` Kevin Wolf
2019-01-24 20:04             ` Dr. David Alan Gilbert [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190124200428.GG2101@work-vm \
    --to=dgilbert@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mjt@tls.msk.ru \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.