From: Kevin Wolf <kwolf@redhat.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com>,
qemu-devel@nongnu.org, jdenemar@redhat.com, wangjie88@huawei.com,
quintela@redhat.com, peterx@redhat.com, mreitz@redhat.com,
eblake@redhat.com, fuweiwei2@huawei.com
Subject: Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device
Date: Thu, 12 Oct 2017 11:52:40 +0200
Message-ID: <20171012095240.GB5624@localhost.localdomain>
In-Reply-To: <20171012092726.GD16125@redhat.com>

On 12.10.2017 at 11:27, Daniel P. Berrange wrote:
> On Thu, Oct 12, 2017 at 11:18:31AM +0200, Kevin Wolf wrote:
> > On 12.10.2017 at 10:21, Daniel P. Berrange wrote:
> > > On Wed, Oct 11, 2017 at 08:13:10PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > >
> > > > Hi,
> > > > This set attempts to make a race condition between migration and
> > > > drive-mirror (and other block users) soluble by allowing the migration
> > > > to be paused after the source qemu releases the block devices but
> > > > before the serialisation of the device state.
> > > >
> > > > The symptom of this failure, as reported by Wangjie, is a:
> > > > _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed
> > > >
> > > > and the source qemu dying; so the problem is pretty nasty.
> > > > This has only been seen on 2.9 onwards, but the theory is that
> > > > prior to 2.9 it might have been happening anyway and we were
> > > > perhaps getting unreported corruptions (lost writes); so this
> > > > really needs fixing.
> > > >
> > > > This flow came from discussions between Kevin and me, and we can't
> > > > see a way of fixing it without exposing a new state to the management
> > > > layer.
> > > >
> > > > The flow is now:
> > > >
> > > > (qemu) migrate_set_capability pause-before-device on
> > > > (qemu) migrate -d ...
> > > > (qemu) info migrate
> > > > ...
> > > > Migration status: pause-before-device
> > > > ...
> > > > << issue commands to clean up any block jobs>>
> > > >
> > > > (qemu) migrate_continue pause-before-device
> > > > (qemu) info migrate
> > > > ...
> > > > Migration status: completed
> > >
> > > I'm curious why QEMU doesn't have enough info to clean up the block
> > > jobs automatically ? What is the key thing that libvirt knows about
> > > the block jobs, that QEMU is lacking ? If QEMU had the right info it
> > > could do it automatically & avoid this extra lock-step synchronization
> > > with libvirt.
> >
> > The key point is that the block job needs to be completed while the
> > source VM is stopped, but the source qemu is still in control of the
> > image files (e.g. still holds the file locks), so that it can do the
> > remaining writes.
> >
> > Without the additional migration phase, the only state where both sides
> > are stopped is when the destination is in control of the image files
> > (migration has completed, but -S prevents it from automatically
> > resuming), so the source can't write to the image any more.
>
> Hmm, I always thought that the target QEMU did not start using the
> image files until you ran 'cont' on the target. eg once source QEMU
> has migrate=completed, both QEMUs are in paused state and source QEMU
> still owns the images, until we run 'cont'.
>
> What you're saying seems to imply this is not the case, but if so what
> is triggering the target QEMU to acquire the locks on images ? Is it
> done implicitly when it finishes reading device state off the wire ?
>
> If so, could we instead add a migrate feature flag to tell the target
> QEMU not to automatically acquire image locks, until it receives an
> explicit 'cont'. That would then not require this extra lock-step
> migration state.

The handover consists of two parts: the destination acquires the locks,
but first the source needs to release them. Without a new command, the
source can't know when it is supposed to do that. The destination
receives the 'cont' command, but the source doesn't know about this. So
you need something that tells the source "management has made sure to
complete what needed to be completed, you can now give up control of
the images".
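
To make the ordering concrete, here is a minimal toy model of the
two-part handover, assuming a single image lock. None of these classes
or method names exist in QEMU; they are invented purely to illustrate
why the source needs its own explicit "continue" signal.

```python
from enum import Enum, auto


class Owner(Enum):
    SOURCE = auto()
    NOBODY = auto()
    DESTINATION = auto()


class ImageLock:
    """Toy stand-in for the image file lock (not a QEMU structure)."""

    def __init__(self):
        self.owner = Owner.SOURCE

    def release(self, who):
        if self.owner is not who:
            raise RuntimeError(f"{who.name} does not hold the lock")
        self.owner = Owner.NOBODY

    def acquire(self, who):
        if self.owner is not Owner.NOBODY:
            raise RuntimeError(f"lock still held by {self.owner.name}")
        self.owner = who


class Source:
    def __init__(self, lock):
        self.lock = lock
        self.vm_running = True
        self.pending_writes = 3  # hypothetical outstanding block-job writes

    def pause_before_handover(self):
        # The VM stops, but the source deliberately keeps the lock.
        self.vm_running = False

    def finish_block_job(self):
        # Remaining writes are only possible while the source owns the image.
        if self.lock.owner is not Owner.SOURCE:
            raise RuntimeError("source can no longer write to the image")
        self.pending_writes = 0

    def continue_migration(self):
        # Part one of the handover: management tells the source to let go.
        self.lock.release(Owner.SOURCE)


class Destination:
    def __init__(self, lock):
        self.lock = lock
        self.vm_running = False

    def cont(self):
        # Part two of the handover: only works after the source released.
        self.lock.acquire(Owner.DESTINATION)
        self.vm_running = True


lock = ImageLock()
src, dst = Source(lock), Destination(lock)
src.pause_before_handover()   # both sides stopped, source owns the image
src.finish_block_job()        # remaining writes complete safely
src.continue_migration()      # explicit signal from management
dst.cont()                    # destination takes over
```

In this model, calling dst.cont() before src.continue_migration() raises,
which is the situation the source cannot detect on its own.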

I also think that conceptually the cleanest option is a source-controlled
pre-handover phase with a paused VM, which simply mirrors the existing
post-handover phase that we have on the destination. This gives us a
clean model for the handover of any resources that require some
teardown on the source before they can be used on the destination, so
it appears to be the most future-proof option.

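
From the management side, the pre-handover phase could be driven roughly
as sketched below. The command strings are the HMP commands from the
cover letter; everything else is an assumption for illustration:
FakeMonitor is a hypothetical in-memory stand-in (not a real QEMU or
libvirt interface), its one-line "info migrate" output is simplified,
and the URI is a placeholder.

```python
import time


class FakeMonitor:
    """Hypothetical stand-in for a monitor connection to the source."""

    def __init__(self):
        self.status = "none"
        self.pause_cap = False

    def command(self, line):
        if line == "migrate_set_capability pause-before-device on":
            self.pause_cap = True
        elif line.startswith("migrate -d "):
            # Simplification: the fake reaches the paused state at once.
            self.status = ("pause-before-device" if self.pause_cap
                           else "completed")
        elif line == "migrate_continue pause-before-device":
            if self.status != "pause-before-device":
                raise RuntimeError("migration is not paused")
            self.status = "completed"
        elif line == "info migrate":
            return f"Migration status: {self.status}"
        else:
            raise ValueError(f"unknown command: {line}")
        return ""


def wait_for_status(mon, wanted, attempts=10, delay=0.0):
    """Poll 'info migrate' until the wanted status shows up."""
    for _ in range(attempts):
        if mon.command("info migrate") == f"Migration status: {wanted}":
            return
        time.sleep(delay)
    raise TimeoutError(f"status never became {wanted}")


def migrate_with_pause(mon, uri, finish_block_jobs):
    mon.command("migrate_set_capability pause-before-device on")
    mon.command(f"migrate -d {uri}")
    # Source VM is stopped here but still controls the image files.
    wait_for_status(mon, "pause-before-device")
    finish_block_jobs()
    # Tell the source it may now give up control of the images.
    mon.command("migrate_continue pause-before-device")
    wait_for_status(mon, "completed")


seen = []
mon = FakeMonitor()
migrate_with_pause(mon, "tcp:dest.example:4444",
                   lambda: seen.append(mon.status))
```

The callback runs strictly between the two status waits, so any block
job cleanup happens while the source is paused and before it is told to
continue.
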
Kevin
Thread overview: 17+ messages
2017-10-11 19:13 [Qemu-devel] [PATCH 0/7] migration: pause-before-device Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 1/7] migration: Add 'pause-before-device' capability Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 2/7] migration: Add 'pause-before-device' and 'device' statuses Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 3/7] migration: Wait for semaphore before completing migration Dr. David Alan Gilbert (git)
2017-10-18 3:35 ` Peter Xu
2017-10-18 8:59 ` Dr. David Alan Gilbert
2017-10-11 19:13 ` [Qemu-devel] [PATCH 4/7] migration: migrate-continue Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 5/7] migrate: HMP migate_continue Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 6/7] migration: allow cancel to unpause Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 7/7] migration: pause-before-device for postcopy Dr. David Alan Gilbert (git)
2017-10-11 20:03 ` [Qemu-devel] [PATCH 0/7] migration: pause-before-device no-reply
2017-10-12 8:21 ` Daniel P. Berrange
2017-10-12 9:18 ` Kevin Wolf
2017-10-12 9:27 ` Daniel P. Berrange
2017-10-12 9:52 ` Kevin Wolf [this message]
2017-10-12 9:55 ` Daniel P. Berrange
2017-10-12 10:02 ` Daniel P. Berrange