From: "Daniel P. Berrange" <berrange@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com>,
qemu-devel@nongnu.org, jdenemar@redhat.com, wangjie88@huawei.com,
quintela@redhat.com, peterx@redhat.com, mreitz@redhat.com,
eblake@redhat.com, fuweiwei2@huawei.com
Subject: Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device
Date: Thu, 12 Oct 2017 10:55:08 +0100 [thread overview]
Message-ID: <20171012095508.GF16125@redhat.com> (raw)
In-Reply-To: <20171012095240.GB5624@localhost.localdomain>
On Thu, Oct 12, 2017 at 11:52:40AM +0200, Kevin Wolf wrote:
> Am 12.10.2017 um 11:27 hat Daniel P. Berrange geschrieben:
> > On Thu, Oct 12, 2017 at 11:18:31AM +0200, Kevin Wolf wrote:
> > > Am 12.10.2017 um 10:21 hat Daniel P. Berrange geschrieben:
> > > > On Wed, Oct 11, 2017 at 08:13:10PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > >
> > > > > Hi,
> > > > > This set attempts to make a race condition between migration and
> > > > > drive-mirror (and other block users) soluble by allowing the migration
> > > > > to be paused after the source qemu releases the block devices but
> > > > > before the serialisation of the device state.
> > > > >
> > > > > The symptom of this failure, as reported by Wangjie, is a:
> > > > > _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed
> > > > >
> > > > > and the source qemu dieing; so the problem is pretty nasty.
> > > > > This has only been seen on 2.9 onwards, but the theory is that
> > > > > prior to 2.9 it might have been happening anyway and we were
> > > > > perhaps getting unreported corruptions (lost writes); so this
> > > > > really needs fixing.
> > > > >
> > > > > This flow came from discussions between Kevin and me, and we can't
> > > > > see a way of fixing it without exposing a new state to the management
> > > > > layer.
> > > > >
> > > > > The flow is now:
> > > > >
> > > > > (qemu) migrate_set_capability pause-before-device on
> > > > > (qemu) migrate -d ...
> > > > > (qemu) info migrate
> > > > > ...
> > > > > Migration status: pause-before-device
> > > > > ...
> > > > > << issue commands to clean up any block jobs>>
> > > > >
> > > > > (qemu) migrate_continue pause-before-device
> > > > > (qemu) info migrate
> > > > > ...
> > > > > Migration status: completed
> > > >
> > > > I'm curious why QEMU doesn't have enough info to clean up the block
> > > > jobs automatically ? What is the key thing that libvirt knows about
> > > > the block jobs, that QEMU is lacking ? If QEMU had the right info it
> > > > could do it automatically & avoid this extra lock-step synchronization
> > > > with libvirt.
> > >
> > > The key point is that the block job needs to be completed while the
> > > source VM is stopped, but the source qemu is still in control of the
> > > image files (e.g. still holds the file locks), so that it can do the
> > > remaining writes.
> > >
> > > Without the additional migration phase, the only state where both sides
> > > are stopped is when the destination is in control of the image files
> > > (migration has completed, but -S prevents it from automatically
> > > resuming), so the source can't write to the image any more.
> >
> > Hmm, I always thought that the target QEMU did not start using the
> > image files until you ran 'cont' on the target. eg once source QEMU
> > has migrate=completed, both QEMUs are in paused state and source QEMU
> > still owns the images, until we run 'cont'.
> >
> > What you're saying seems to imply this is not the case, but if so what
> > is triggering the target QEMU to acquire the locks on images ? Is it
> > done implicitly when it finishes reading device state off the wire ?
> >
> > If so, could we instead add a migrate feature flag to tell the target
> > QEMU not to automatically acquire image locks, until it receives an
> > explicit 'cont'. That would then not require this extra lock-step
> > migration state.
>
> The handover consists of two parts: The destination acquires the locks,
> but first the source needs to release them. Without a new command, the
> source can't know when it is supposed to do that. The destination
> receives the 'cont' command, but source doesn't know about this. So you
> have to have something that tells the source "management has made sure
> to complete what needed to be completed, you can now give up control of
> the images".
>
> I also think that conceptually it is the cleanest to have a source
> controlled pre-handover phase with paused VM, which is only symmetrical
> to the existing post-handover phase that we have on the destination.
> This gives us a clean model for the handover of any resources that
> require some tearing down on the source before they can be used on the
> destination, so it appears to be the most future-proof option.
Ok, I see what you mean now.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
next prev parent reply other threads:[~2017-10-12 9:55 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-11 19:13 [Qemu-devel] [PATCH 0/7] migration: pause-before-device Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 1/7] migration: Add 'pause-before-device' capability Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 2/7] migration: Add 'pause-before-device' and 'device' statuses Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 3/7] migration: Wait for semaphore before completing migration Dr. David Alan Gilbert (git)
2017-10-18 3:35 ` Peter Xu
2017-10-18 8:59 ` Dr. David Alan Gilbert
2017-10-11 19:13 ` [Qemu-devel] [PATCH 4/7] migration: migrate-continue Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 5/7] migrate: HMP migate_continue Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 6/7] migration: allow cancel to unpause Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 7/7] migration: pause-before-device for postcopy Dr. David Alan Gilbert (git)
2017-10-11 20:03 ` [Qemu-devel] [PATCH 0/7] migration: pause-before-device no-reply
2017-10-12 8:21 ` Daniel P. Berrange
2017-10-12 9:18 ` Kevin Wolf
2017-10-12 9:27 ` Daniel P. Berrange
2017-10-12 9:52 ` Kevin Wolf
2017-10-12 9:55 ` Daniel P. Berrange [this message]
2017-10-12 10:02 ` Daniel P. Berrange
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171012095508.GF16125@redhat.com \
--to=berrange@redhat.com \
--cc=dgilbert@redhat.com \
--cc=eblake@redhat.com \
--cc=fuweiwei2@huawei.com \
--cc=jdenemar@redhat.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=wangjie88@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).