From: Peter Xu <peterx@redhat.com>
To: qemu-devel@nongnu.org
Cc: Laurent Vivier <lvivier@redhat.com>,
"Daniel P . Berrange" <berrange@redhat.com>,
Alexey Perevalov <a.perevalov@samsung.com>,
Juan Quintela <quintela@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
peterx@redhat.com
Subject: [Qemu-devel] [RFC v2 00/33] Migration: postcopy failure recovery
Date: Wed, 30 Aug 2017 16:31:57 +0800
Message-ID: <1504081950-2528-1-git-send-email-peterx@redhat.com>
v2 note (the coarse-grained changelog):
- I appended the migrate-incoming re-use series to this one, since
that one depends on this one, and it is really there for the recovery
- I haven't yet added the per-monitor thread related patches to this
one (actually I added them and then removed them again), basically
the patches to set up "need-bql"="false" - the solution for the
monitor hang issue is still under discussion in the other thread.
I'll add them in when that is settled.
- Quite a lot of other changes and additions in response to v1 review
comments. I think I addressed all the comments, but God knows
better.
Feel free to skip this ugly longer changelog (it's too long to be
meaningful I'm afraid).
v2:
- rebased to alexey's received bitmap v9
- add Dave's r-bs for patches: 2/5/6/8/9/13/14/15/16/20/21
- patch 1: use target page size to calc bitmap [Dave]
- patch 3: move trace_*() after EINTR check [Dave]
- patch 4: dropped since I can use bitmap_complement() [Dave]
- patch 7: check file error right after data is read in both
qemu_loadvm_section_start_full() and qemu_loadvm_section_part_end(),
meanwhile also check in check_section_footer() [Dave]
- patch 8/9: fix error_report/commit message in both patches [Dave]
- patch 10: dropped (new parameter "x-postcopy-fast")
- patch 11: split the "postcopy-paused" patch into two, one to
introduce the new state, the other to implement the logic. Also,
print something when paused [Dave]
- patch 17: removed do_resume label, introduced migration_prepare()
[Dave]
- patch 18: removed do_pause label using a new loop [Dave]
- patch 20: removed incorrect comment [Dave]
- patch 21: use 256B buffer in qemu_savevm_send_recv_bitmap(), add
trace in loadvm_handle_recv_bitmap() [Dave]
- patch 22: fix MIG_RP_MSG_RECV_BITMAP for (1) endianness (2) 32/64bit
machines. More info in the commit message update.
- patch 23: add one check on migration state [Dave]
- patch 24: use macro instead of magic 1 [Dave]
- patch 26: use more trace_*() instead of one, and use one sem to
replace mutex+cond. [Dave]
- move sem init/destroy into migration_instance_init() and
migration_instance_finalize() (new function after rebase).
- patch 29: squashed most of this patch into:
"migration: implement "postcopy-pause" src logic" [Dave]
- split the two fix patches out of the series
- fixed two places where I misused "wake/woke/woken". [Dave]
- add new patch "bitmap: provide to_le/from_le helpers" to solve the
bitmap endianness issue [Dave]
- appended migrate_incoming series to this series, since that one
depends on the paused state. Using explicit g_source_remove() for
listening ports [Dan]
FUTURE TODO LIST
- support manually switching the source into PAUSED state
- support migrate_cancel during PAUSED/RECOVER state
- when anything goes wrong during PAUSED/RECOVER, switch back to
PAUSED state on both sides
As we all know, postcopy migration carries the risk of losing the VM
if the network breaks during the migration. This series tries to
solve the problem by allowing the migration to pause at the failure
point, and to recover after the link is reconnected.
There was existing work on this issue from Md Haris Iqbal:
https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03468.html
This series is a complete rework of the issue, based on Alexey
Perevalov's recved bitmap v8 series:
https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06401.html
Two new statuses are added to support the recovery (used on both
sides):
MIGRATION_STATUS_POSTCOPY_PAUSED
MIGRATION_STATUS_POSTCOPY_RECOVER
The MIGRATION_STATUS_POSTCOPY_PAUSED state is set when the network
failure is detected. It can last for a long time: once the failure is
detected, we stay in this state until a recovery is triggered. In
this state, all the threads (on source: send thread, return-path
thread; on destination: ram-load thread, page-fault thread) are
halted.
The MIGRATION_STATUS_POSTCOPY_RECOVER state is short. When a recovery
is triggered, both the source and destination VMs jump into this
state and do whatever is needed to prepare the recovery (currently
the most important thing is to synchronize the dirty bitmap; please
see the commit messages for more information). After the preparation
is done, the source does the final handshake with the destination,
then both sides switch back to MIGRATION_STATUS_POSTCOPY_ACTIVE.
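To make the state flow above easier to follow, here is a minimal Python sketch of the three postcopy states and the transitions between them. It is only an illustration and is not code from this series: the event names ("network-failure", "recover-requested", "handshake-done") and the helpers next_status() and run() are made up for this example, and the failure paths listed in the future TODO list are not modelled.

```python
from enum import Enum


class MigrationStatus(Enum):
    POSTCOPY_ACTIVE = "postcopy-active"
    POSTCOPY_PAUSED = "postcopy-paused"
    POSTCOPY_RECOVER = "postcopy-recover"


# Transitions as described in the cover letter. The event names are
# hypothetical labels chosen for this sketch.
TRANSITIONS = {
    (MigrationStatus.POSTCOPY_ACTIVE, "network-failure"):
        MigrationStatus.POSTCOPY_PAUSED,
    (MigrationStatus.POSTCOPY_PAUSED, "recover-requested"):
        MigrationStatus.POSTCOPY_RECOVER,
    (MigrationStatus.POSTCOPY_RECOVER, "handshake-done"):
        MigrationStatus.POSTCOPY_ACTIVE,
}


def next_status(current, event):
    """Return the state reached from `current` on `event`, or raise."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} is not valid in state {current.value}"
        ) from None


def run(events, start=MigrationStatus.POSTCOPY_ACTIVE):
    """Apply a sequence of events and return the final state."""
    status = start
    for event in events:
        status = next_status(status, event)
    return status
```

For example, run(["network-failure", "recover-requested", "handshake-done"]) walks through paused and recover and ends up back in the active state, while a "handshake-done" event in the paused state is rejected.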
New commands/messages are defined as well to satisfy the need:
MIG_CMD_RECV_BITMAP & MIG_RP_MSG_RECV_BITMAP are introduced for
delivering received bitmaps
MIG_CMD_RESUME & MIG_RP_MSG_RESUME_ACK are introduced to do the final
handshake of postcopy recovery.
Here are some more details on how the whole failure/recovery routine
happens:
- start migration
- ... (switch from precopy to postcopy)
- both sides are in "postcopy-active" state
- ... (failure happens, e.g., network unplugged)
- both sides switch to "postcopy-paused" state
  - all the migration threads are stopped on both sides
- ... (both VMs hang)
- ... (user triggers recovery using "migrate -r -d tcp:HOST:PORT" on
  source side, "-r" means "recover")
- both sides switch to "postcopy-recover" state
  - on source: send-thread and return-path-thread are woken up
  - on dest: ram-load-thread is woken up, fault-thread stays paused
- source calls the new SaveVMHandlers hook resume_prepare() (currently,
  only ram provides the hook):
  - ram_resume_prepare(): for each ramblock, fetch recved bitmap by:
    - src sends MIG_CMD_RECV_BITMAP to dst
    - dst replies MIG_RP_MSG_RECV_BITMAP to src, with bitmap data
    - src uses the recved bitmap to rebuild dirty bitmap
- source does the final handshake with destination
  - src sends MIG_CMD_RESUME to dst, telling "src is ready"
    - when dst receives the command, the fault thread is woken up;
      meanwhile, dst switches back to "postcopy-active"
  - dst sends MIG_RP_MSG_RESUME_ACK to src, telling "dst is ready"
    - when src receives the ack, its state switches to "postcopy-active"
- postcopy migration continues
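The message exchange in the steps above can be sketched in Python as follows. This is a toy model under stated assumptions, not the implementation in the patches: bitmaps are plain Python integers with one bit per page, the dirty bitmap is assumed to be rebuilt as the complement of the received bitmap (the changelog mentions bitmap_complement(), but the exact rebuild logic lives in the commit messages), and the ramblock name "pc.ram" and the functions rebuild_dirty_bitmap() and recover() are hypothetical.

```python
def rebuild_dirty_bitmap(received, nr_pages):
    """Assumption: every page dst has NOT received must be sent again,
    so the dirty bitmap is the complement of the received bitmap,
    limited to nr_pages bits."""
    mask = (1 << nr_pages) - 1
    return ~received & mask


def recover(ramblocks):
    """Simulate the recovery message flow.

    ramblocks maps a ramblock name to (nr_pages, received_bitmap), where
    received_bitmap is what dst would report for that block.
    Returns (dirty_bitmaps, trace); trace lists (direction, message,
    ramblock) tuples in the order they would be sent.
    """
    trace = []
    dirty = {}
    # resume_prepare(): one bitmap round trip per ramblock.
    for name, (nr_pages, received) in ramblocks.items():
        trace.append(("src->dst", "MIG_CMD_RECV_BITMAP", name))
        trace.append(("dst->src", "MIG_RP_MSG_RECV_BITMAP", name))
        dirty[name] = rebuild_dirty_bitmap(received, nr_pages)
    # Final handshake: src announces it is ready, dst acknowledges.
    trace.append(("src->dst", "MIG_CMD_RESUME", None))
    trace.append(("dst->src", "MIG_RP_MSG_RESUME_ACK", None))
    return dirty, trace


if __name__ == "__main__":
    # Hypothetical 8-page ramblock where dst already holds pages 0-3.
    dirty, trace = recover({"pc.ram": (8, 0b00001111)})
    print(format(dirty["pc.ram"], "08b"))
    for step in trace:
        print(step)
```

In this model the source only resumes sending pages after it has seen MIG_RP_MSG_RESUME_ACK, which matches the ordering described in the list: bitmap synchronization first, handshake last.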
Testing:
As I said, it's still an extremely simple test. I used socat to create
a socket bridge:
socat tcp-listen:6666 tcp-connect:localhost:5555 &
Then I did the migration via the bridge. I emulated the network
failure by killing the socat process (bridge down), then tried to
recover the migration using the other channel (default dst channel).
It looks like:
                 port:6666    +------------------+
            +---------------> | socat bridge [1] |-------+
            |                 +------------------+       |
            |                  (Original channel)        |
            |                                            | port: 5555
+---------+ |  (Recovery channel)                        +--->+---------+
| src VM  |-+------------------------------------------------>| dst VM  |
+---------+                                                   +---------+
Known issues/notes:
- currently the destination listening port still cannot change, i.e.,
the recovery has to use the same port on the destination, for
simplicity. (On the source, we can specify a new URL.)
- the patch: "migration: let dst listen on port always" is still
hacky: for now it just keeps the incoming accept open forever...
- some migration numbers might still be inaccurate, like total
migration time, etc. (But I don't really think that matters much
now)
- the patches are very lightly tested.
- Dave reported one problem that may hang the destination main loop
thread (one vcpu thread holds the BQL) and the rest. I haven't
encountered it yet, but that does not mean this series survives it.
- other potential issues that I may have forgotten or not noticed...
Anyway, the work is still at a preliminary stage. Any suggestions and
comments are very welcome. Thanks.
Peter Xu (33):
bitmap: remove BITOP_WORD()
bitmap: introduce bitmap_count_one()
bitmap: provide to_le/from_le helpers
migration: dump str in migrate_set_state trace
migration: better error handling with QEMUFile
migration: reuse mis->userfault_quit_fd
migration: provide postcopy_fault_thread_notify()
migration: new postcopy-pause state
migration: implement "postcopy-pause" src logic
migration: allow dst vm pause on postcopy
migration: allow src return path to pause
migration: allow send_rq to fail
migration: allow fault thread to pause
qmp: hmp: add migrate "resume" option
migration: pass MigrationState to migrate_init()
migration: rebuild channel on source
migration: new state "postcopy-recover"
migration: wakeup dst ram-load-thread for recover
migration: new cmd MIG_CMD_RECV_BITMAP
migration: new message MIG_RP_MSG_RECV_BITMAP
migration: new cmd MIG_CMD_POSTCOPY_RESUME
migration: new message MIG_RP_MSG_RESUME_ACK
migration: introduce SaveVMHandlers.resume_prepare
migration: synchronize dirty bitmap for resume
migration: setup ramstate for resume
migration: final handshake for the resume
migration: free SocketAddress where allocated
migration: return incoming task tag for sockets
migration: return incoming task tag for exec
migration: return incoming task tag for fd
migration: store listen task tag
migration: allow migrate_incoming for paused VM
migration: init dst in migration_object_init too
hmp-commands.hx | 7 +-
hmp.c | 4 +-
include/migration/register.h | 2 +
include/qemu/bitmap.h | 17 ++
migration/exec.c | 20 +-
migration/exec.h | 2 +-
migration/fd.c | 20 +-
migration/fd.h | 2 +-
migration/migration.c | 578 ++++++++++++++++++++++++++++++++++++++-----
migration/migration.h | 26 +-
migration/postcopy-ram.c | 107 ++++++--
migration/postcopy-ram.h | 2 +
migration/ram.c | 265 +++++++++++++++++++-
migration/ram.h | 3 +
migration/savevm.c | 229 ++++++++++++++++-
migration/savevm.h | 3 +
migration/socket.c | 42 ++--
migration/socket.h | 4 +-
migration/trace-events | 21 +-
qapi-schema.json | 12 +-
util/bitmap.c | 47 ++++
util/bitops.c | 6 +-
22 files changed, 1266 insertions(+), 153 deletions(-)
--
2.7.4