qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v3 0/7] migration: pause-before-switchover
@ 2017-10-18 17:40 Dr. David Alan Gilbert (git)
From: Dr. David Alan Gilbert (git) @ 2017-10-18 17:40 UTC (permalink / raw)
  To: qemu-devel, kwolf, jdenemar, wangjie88, quintela, peterx, mreitz
  Cc: berrange, eblake, fuweiwei2

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This set attempts to make a race condition between migration and
drive-mirror (and other block users) soluble by allowing the migration
to be paused after the source qemu releases the block devices but
before the serialisation of the device state.

The symptom of this failure, as reported by Wangjie, is a:
   _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed

and the source qemu dying; so the problem is pretty nasty.
This has only been seen on 2.9 onwards, but the theory is that
prior to 2.9 it might have been happening anyway and we were
perhaps getting unreported corruptions (lost writes); so this
really needs fixing.

This flow came from discussions between Kevin and me, and we can't
see a way of fixing it without exposing a new state to the management
layer.

The flow is now:

(qemu) migrate_set_capability pause-before-switchover on
(qemu) migrate -d ...
(qemu) info migrate
...
Migration status: pre-switchover
...
<< issue commands to clean up any block jobs>>

(qemu) migrate_continue pre-switchover
(qemu) info migrate
...
Migration status: completed
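
For a management layer that drives QEMU over QMP rather than HMP, the
same sequence might look roughly like the sketch below.  It only builds
the JSON messages and does not talk to a socket.  The helper name
pause_switchover_commands and the URI are placeholders, and the 'state'
argument to migrate-continue is an assumption mirroring the HMP form
above; check qapi/migration.json in the series for the real schema.

```python
import json


def pause_switchover_commands(uri):
    """Return the QMP messages, in order, for a paused-switchover migration.

    Hypothetical helper: the command and argument names are assumptions
    modelled on the HMP flow in this cover letter.
    """
    return [
        # Enable the new capability before starting the migration.
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "pause-before-switchover", "state": True}]}},
        # Start the migration to the given destination.
        {"execute": "migrate", "arguments": {"uri": uri}},
        # Poll with this until the reported status is "pre-switchover",
        # then clean up any block jobs before continuing.
        {"execute": "query-migrate"},
        # Release the pause, naming the state we expect to be in.
        {"execute": "migrate-continue",
         "arguments": {"state": "pre-switchover"}},
    ]


if __name__ == "__main__":
    for msg in pause_switchover_commands("tcp:HOST:PORT"):
        print(json.dumps(msg))
```

Naming the expected state in the continue command means a stale
continue cannot accidentally release a pause it was not meant for.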

This set has been _very_ lightly tested, and only against the normal
migration code, without the addition of the drive mirror; so this is a
first cut.  I'd appreciate some feedback from libvirt on whether the
interface is OK and ideally a hack to test it in a full libvirt setup
to see if we hit any other issues.

The precopy flow is:
active->pre-switchover->device->completed

The postcopy flow is:
active->pre-switchover->postcopy-active->completed

Although the behaviour with postcopy only gets interesting when
we add something like Max's active-sync.
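
A management layer watching status changes could check them against the
two flows above with something like the following sketch.  It covers
only the success paths listed here; cancel and failure transitions are
deliberately left out, and the names PRECOPY, POSTCOPY and
is_valid_sequence are illustrative, not part of the series.

```python
# Status sequences with pause-before-switchover enabled, as listed in
# the cover letter.  Cancel/failure paths are not modelled.
PRECOPY = ["active", "pre-switchover", "device", "completed"]
POSTCOPY = ["active", "pre-switchover", "postcopy-active", "completed"]


def transitions(*flows):
    """Collect every adjacent (from, to) pair from the given flows."""
    allowed = set()
    for flow in flows:
        allowed.update(zip(flow, flow[1:]))
    return allowed


ALLOWED = transitions(PRECOPY, POSTCOPY)


def is_valid_sequence(statuses):
    """True if each consecutive pair of statuses is an allowed step."""
    return all(pair in ALLOWED for pair in zip(statuses, statuses[1:]))
```

For example, is_valid_sequence(["active", "device"]) is rejected because
the pause state was skipped.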

Dave

--
v3
  A couple of FIXUPs that had escaped v2's merge

v2
  Pause *before* block inactivation (thanks Peter)
  Rename state and capability to Dan+KWolf's combined suggestion


Dr. David Alan Gilbert (7):
  migration: Add 'pause-before-switchover' capability
  migration: Add 'pre-switchover' and 'device' statuses
  migration: Wait for semaphore before completing migration
  migration: migrate-continue
  migrate: HMP migate_continue
  migration: allow cancel to unpause
  migration: pause-before-switchover for postcopy

 hmp-commands.hx       | 12 +++++++
 hmp.c                 | 13 ++++++++
 hmp.h                 |  1 +
 migration/migration.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++--
 migration/migration.h |  4 +++
 qapi/migration.json   | 30 ++++++++++++++++--
 6 files changed, 144 insertions(+), 4 deletions(-)

-- 
2.13.6


Thread overview: 23+ messages
2017-10-18 17:40 [Qemu-devel] [PATCH v3 0/7] migration: pause-before-switchover Dr. David Alan Gilbert (git)
2017-10-18 17:40 ` [Qemu-devel] [PATCH v3 1/7] migration: Add 'pause-before-switchover' capability Dr. David Alan Gilbert (git)
2017-10-19  4:17   ` Peter Xu
2017-10-18 17:40 ` [Qemu-devel] [PATCH v3 2/7] migration: Add 'pre-switchover' and 'device' statuses Dr. David Alan Gilbert (git)
2017-10-19  4:34   ` Peter Xu
2017-10-18 17:40 ` [Qemu-devel] [PATCH v3 3/7] migration: Wait for semaphore before completing migration Dr. David Alan Gilbert (git)
2017-10-19  4:39   ` Peter Xu
2017-10-18 17:40 ` [Qemu-devel] [PATCH v3 4/7] migration: migrate-continue Dr. David Alan Gilbert (git)
2017-10-19  4:43   ` Peter Xu
2017-10-19 14:33   ` Jiri Denemark
2017-10-19 14:37     ` Dr. David Alan Gilbert
2017-10-18 17:40 ` [Qemu-devel] [PATCH v3 5/7] migrate: HMP migate_continue Dr. David Alan Gilbert (git)
2017-10-19  4:44   ` Peter Xu
2017-10-18 17:40 ` [Qemu-devel] [PATCH v3 6/7] migration: allow cancel to unpause Dr. David Alan Gilbert (git)
2017-10-19  4:44   ` Peter Xu
2017-10-18 17:40 ` [Qemu-devel] [PATCH v3 7/7] migration: pause-before-switchover for postcopy Dr. David Alan Gilbert (git)
2017-10-19  5:08   ` Peter Xu
2017-10-19  4:31 ` [Qemu-devel] [PATCH v3 0/7] migration: pause-before-switchover Peter Xu
2017-10-19 11:21   ` Dr. David Alan Gilbert
2017-10-20  2:42     ` Peter Xu
2017-10-20  8:04       ` Dr. David Alan Gilbert
2017-10-19 15:24 ` Jiri Denemark
2017-10-19 19:10   ` Dr. David Alan Gilbert
