From: Juan Quintela <quintela@redhat.com>
To: qemu-devel@nongnu.org
Cc: dgilbert@redhat.com, lvivier@redhat.com, peterx@redhat.com
Subject: [Qemu-devel] [PULL 21/40] migration: new state "postcopy-recover"
Date: Wed, 16 May 2018 01:39:58 +0200 [thread overview]
Message-ID: <20180515234017.2277-22-quintela@redhat.com> (raw)
In-Reply-To: <20180515234017.2277-1-quintela@redhat.com>
From: Peter Xu <peterx@redhat.com>
Introducing new migration state "postcopy-recover". If a migration
procedure is paused and the connection is rebuilt afterward
successfully, we'll switch the source VM state from "postcopy-paused" to
the new state "postcopy-recover", then we'll do the resume logic in the
migration thread (along with the return path thread).
This patch only do the state switch on source side. Another following up
patch will handle the state switching on destination side using the same
status bit.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-10-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
s/2.11/2.13/
---
migration/migration.c | 78 ++++++++++++++++++++++++++++++++-----------
qapi/migration.json | 4 ++-
2 files changed, 61 insertions(+), 21 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 038fae93ab..4ab637a1fe 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -577,6 +577,7 @@ static bool migration_is_setup_or_active(int state)
case MIGRATION_STATUS_ACTIVE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
case MIGRATION_STATUS_POSTCOPY_PAUSED:
+ case MIGRATION_STATUS_POSTCOPY_RECOVER:
case MIGRATION_STATUS_SETUP:
case MIGRATION_STATUS_PRE_SWITCHOVER:
case MIGRATION_STATUS_DEVICE:
@@ -658,6 +659,7 @@ static void fill_source_migration_info(MigrationInfo *info)
case MIGRATION_STATUS_PRE_SWITCHOVER:
case MIGRATION_STATUS_DEVICE:
case MIGRATION_STATUS_POSTCOPY_PAUSED:
+ case MIGRATION_STATUS_POSTCOPY_RECOVER:
/* TODO add some postcopy stats */
info->has_status = true;
info->has_total_time = true;
@@ -2318,6 +2320,13 @@ typedef enum MigThrError {
MIG_THR_ERR_FATAL = 2,
} MigThrError;
+/* Return zero if success, or <0 for error */
+static int postcopy_do_resume(MigrationState *s)
+{
+ /* TODO: do the resume logic */
+ return 0;
+}
+
/*
* We don't return until we are in a safe state to continue current
* postcopy migration. Returns MIG_THR_ERR_RECOVERED if recovered, or
@@ -2326,29 +2335,55 @@ typedef enum MigThrError {
static MigThrError postcopy_pause(MigrationState *s)
{
assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
- migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
- MIGRATION_STATUS_POSTCOPY_PAUSED);
- /* Current channel is possibly broken. Release it. */
- assert(s->to_dst_file);
- qemu_file_shutdown(s->to_dst_file);
- qemu_fclose(s->to_dst_file);
- s->to_dst_file = NULL;
+ while (true) {
+ migrate_set_state(&s->state, s->state,
+ MIGRATION_STATUS_POSTCOPY_PAUSED);
- error_report("Detected IO failure for postcopy. "
- "Migration paused.");
+ /* Current channel is possibly broken. Release it. */
+ assert(s->to_dst_file);
+ qemu_file_shutdown(s->to_dst_file);
+ qemu_fclose(s->to_dst_file);
+ s->to_dst_file = NULL;
- /*
- * We wait until things fixed up. Then someone will setup the
- * status back for us.
- */
- while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
- qemu_sem_wait(&s->postcopy_pause_sem);
+ error_report("Detected IO failure for postcopy. "
+ "Migration paused.");
+
+ /*
+ * We wait until things fixed up. Then someone will setup the
+ * status back for us.
+ */
+ while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+ qemu_sem_wait(&s->postcopy_pause_sem);
+ }
+
+ if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
+ /* Woken up by a recover procedure. Give it a shot */
+
+ /*
+ * Firstly, let's wake up the return path now, with a new
+ * return path channel.
+ */
+ qemu_sem_post(&s->postcopy_pause_rp_sem);
+
+ /* Do the resume logic */
+ if (postcopy_do_resume(s) == 0) {
+ /* Let's continue! */
+ trace_postcopy_pause_continued();
+ return MIG_THR_ERR_RECOVERED;
+ } else {
+ /*
+ * Something wrong happened during the recovery, let's
+ * pause again. Pause is always better than throwing
+ * data away.
+ */
+ continue;
+ }
+ } else {
+ /* This is not right... Time to quit. */
+ return MIG_THR_ERR_FATAL;
+ }
}
-
- trace_postcopy_pause_continued();
-
- return MIG_THR_ERR_RECOVERED;
}
static MigThrError migration_detect_error(MigrationState *s)
@@ -2668,7 +2703,10 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
}
if (resume) {
- /* TODO: do the resume logic */
+ /* Wakeup the main migration thread to do the recovery */
+ migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
+ MIGRATION_STATUS_POSTCOPY_RECOVER);
+ qemu_sem_post(&s->postcopy_pause_sem);
return;
}
diff --git a/qapi/migration.json b/qapi/migration.json
index b8ca60ac43..4e8e61ceef 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -91,6 +91,8 @@
#
# @postcopy-paused: during postcopy but paused. (since 2.13)
#
+# @postcopy-recover: trying to recover from a paused postcopy. (since 2.13)
+#
# @completed: migration is finished.
#
# @failed: some error occurred during migration process.
@@ -109,7 +111,7 @@
{ 'enum': 'MigrationStatus',
'data': [ 'none', 'setup', 'cancelling', 'cancelled',
'active', 'postcopy-active', 'postcopy-paused',
- 'completed', 'failed', 'colo',
+ 'postcopy-recover', 'completed', 'failed', 'colo',
'pre-switchover', 'device' ] }
##
--
2.17.0
next prev parent reply other threads:[~2018-05-15 23:40 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-15 23:39 [Qemu-devel] [PULL 00/40] Migration PULL requset (take 2) Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 01/40] migration: fix saving normal page even if it's been compressed Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 02/40] tests: Add migration precopy test Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 03/40] tests: Migration ppc now inlines its program Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 04/40] migration: Set error state in case of error Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 05/40] migration: Introduce multifd_recv_new_channel() Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 06/40] migration: terminate_* can be called for other threads Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 07/40] migration: Be sure all recv channels are created Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 08/40] migration: Export functions to create send channels Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 09/40] migration: Create multifd channels Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 10/40] migration: Delay start of migration main routines Juan Quintela
2018-05-18 8:59 ` Kevin Wolf
2018-05-18 10:34 ` Dr. David Alan Gilbert
2018-05-18 12:14 ` Kevin Wolf
2018-05-22 16:20 ` Kevin Wolf
2018-05-23 6:29 ` Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 11/40] migration: Transmit initial package through the multifd channels Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 12/40] migration: Define MultifdRecvParams sooner Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 13/40] migration: let incoming side use thread context Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 14/40] migration: new postcopy-pause state Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 15/40] migration: implement "postcopy-pause" src logic Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 16/40] migration: allow dst vm pause on postcopy Juan Quintela
2018-06-04 13:49 ` Peter Maydell
2018-06-05 7:48 ` Peter Xu
2018-06-05 10:45 ` Peter Xu
2018-05-15 23:39 ` [Qemu-devel] [PULL 17/40] migration: allow src return path to pause Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 18/40] migration: allow fault thread " Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 19/40] qmp: hmp: add migrate "resume" option Juan Quintela
2018-05-15 23:39 ` [Qemu-devel] [PULL 20/40] migration: rebuild channel on source Juan Quintela
2018-05-15 23:39 ` Juan Quintela [this message]
2018-05-15 23:39 ` [Qemu-devel] [PULL 22/40] migration: wakeup dst ram-load-thread for recover Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 23/40] migration: new cmd MIG_CMD_RECV_BITMAP Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 24/40] migration: new message MIG_RP_MSG_RECV_BITMAP Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 25/40] migration: new cmd MIG_CMD_POSTCOPY_RESUME Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 26/40] migration: new message MIG_RP_MSG_RESUME_ACK Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 27/40] migration: introduce SaveVMHandlers.resume_prepare Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 28/40] migration: synchronize dirty bitmap for resume Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 29/40] migration: setup ramstate " Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 30/40] migration: final handshake for the resume Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 31/40] migration: init dst in migration_object_init too Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 32/40] qmp/migration: new command migrate-recover Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 33/40] hmp/migration: add migrate_recover command Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 34/40] migration: introduce lock for to_dst_file Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 35/40] migration/qmp: add command migrate-pause Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 36/40] migration/hmp: add migrate_pause command Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 37/40] migration: update docs Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 38/40] migration: update index field when delete or qsort RDMALocalBlock Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 39/40] migration: Textual fixups for blocktime Juan Quintela
2018-05-15 23:40 ` [Qemu-devel] [PULL 40/40] Migration+TLS: Fix crash due to double cleanup Juan Quintela
2018-05-17 10:59 ` [Qemu-devel] [PULL 00/40] Migration PULL requset (take 2) Peter Maydell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180515234017.2277-22-quintela@redhat.com \
--to=quintela@redhat.com \
--cc=dgilbert@redhat.com \
--cc=lvivier@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).