qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/3] migration: switchover-hold flag
@ 2023-06-02  1:16 Peter Xu
  2023-06-02  1:16 ` [PATCH RFC 1/3] migration: switchover-hold parameter Peter Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Peter Xu @ 2023-06-02  1:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Avihai Horon, Paolo Bonzini, Daniel P . Berrangé,
	Leonardo Bras Soares Passos, Juan Quintela, peterx,
	Laurent Vivier, Thomas Huth

This RFC patchset is based on Daniel's recent refactoring on migration-test:

https://lore.kernel.org/r/20230601161347.1803440-1-berrange@redhat.com
Based-on: <20230601161347.1803440-1-berrange@redhat.com>

A new flag "switchover-hold" is added to allow src qemu explicitly hold
switchover for precopy migration.  Note that this flag will not affect
postcopy switchover because src qemu already has migrate-start-postcopy,
which is a finer grained knob just for that.  In general this flag only
affects reaching migration completion phase, when set it'll block it from
happening while keep the migration iteration going.

This can be used in two cases so far in my mind:

  (1) One can use this parameter to start pre-heating migration (but not
      really migrating, so a migrate-cancel will cancel the preheat).  When
      the user wants to really migrate, just clear the flag.  It'll in most
      cases migrate immediately because most pages are already synced.

  (2) Can also be used as a clean way to do qtest, in many of the precopy
      tests we have requirement to run after 1 iteration without completing
      the precopy migration.  Before that we have either set bandwidth to
      ridiculous low value, or tricks on detecting guest memory change over
      some adhoc guest memory position.  Now we can simply set this flag
      then we know precopy won't complete and will just keep going.

The 1st use case may look a bit like COLO where we can actually keep both
QEMU _mostly_ in sync.  I'm not sure whether it can be useful anywhere,
though.

Patch 1 will introduce the new flag.

Patch 2 will temprarily revert the last patch from Daniel's series, so
potentially this will temporarily make migration-test slow again.  If we
want to avoid this we can merge patch 2 & 3 too.

Patch 3 will leverage the new flag to speed up migration-test.  There're
still some difference comparing to Daniel's solution (e.g., we can still
wait for a whole iteration for each test since we run the initial 3MB also
with full bw).  An initial test is this can make migration-test finish
within a little bit more than 1m.

Please have a look, thanks.

Peter Xu (3):
  migration: switchover-hold parameter
  Revert "tests/qtest: massively speed up migration-test"
  qtest/migration: Use switchover-hold to speedup

 qapi/migration.json            |  25 ++++-
 migration/migration.h          |   7 ++
 migration/migration-hmp-cmds.c |   3 +
 migration/migration.c          |  56 ++++++++++-
 migration/options.c            |  17 ++++
 tests/qtest/migration-test.c   | 163 +++++++--------------------------
 6 files changed, 134 insertions(+), 137 deletions(-)

-- 
2.40.1



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH RFC 1/3] migration: switchover-hold parameter
  2023-06-02  1:16 [PATCH RFC 0/3] migration: switchover-hold flag Peter Xu
@ 2023-06-02  1:16 ` Peter Xu
  2023-06-02 11:14   ` Daniel P. Berrangé
  2023-06-02  1:16 ` [PATCH RFC 2/3] Revert "tests/qtest: massively speed up migration-test" Peter Xu
  2023-06-02  1:16 ` [PATCH RFC 3/3] qtest/migration: Use switchover-hold to speedup Peter Xu
  2 siblings, 1 reply; 8+ messages in thread
From: Peter Xu @ 2023-06-02  1:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Avihai Horon, Paolo Bonzini, Daniel P . Berrangé,
	Leonardo Bras Soares Passos, Juan Quintela, peterx,
	Laurent Vivier, Thomas Huth

Add a new migration parameter switchover-hold which can block src qemu
migration from switching over to dest from running.

One can set this flag to true so src qemu will keep iterating the VM data,
not switching over to dest even if it can.

It means now live migration works somehow like COLO; we keep syncing data
from src to dst without stopping.

When the user is ready for the switchover, one can set the parameter from
true->false.  That'll contain a implicit kick to migration thread to be
alive and re-evaluate the switchover decision.

This can be used in two cases so far in my mind:

  (1) One can use this parameter to start pre-heating migration (but not
      really migrating, so a migrate-cancel will cancel the preheat).  When
      the user wants to really migrate, just clear the flag.  It'll in most
      cases migrate immediately because most pages are already synced.

  (2) Can also be used as a clean way to do qtest, in many of the precopy
      tests we have requirement to run after 1 iteration without completing
      the precopy migration.  Before that we have either set bandwidth to
      ridiculous low value, or tricks on detecting guest memory change over
      some adhoc guest memory position.  Now we can simply set this flag
      then we know precopy won't complete and will just keep going.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json            | 25 +++++++++++++--
 migration/migration.h          |  7 +++++
 migration/migration-hmp-cmds.c |  3 ++
 migration/migration.c          | 56 ++++++++++++++++++++++++++++++++--
 migration/options.c            | 17 +++++++++++
 5 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 179af0c4d8..1d0059d125 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -779,6 +779,15 @@
 #     Nodes are mapped to their block device name if there is one, and
 #     to their node name otherwise.  (Since 5.2)
 #
+# @switchover-hold: Whether we should hold-off precopy switchover from src
+#     to dest QEMU, even if we can finish migration in the downtime
+#     specified.  By default off, so precopy migration will complete as
+#     soon as possible.  One can set it to explicitly keep iterating during
+#     precopy migration until set the flag to false again to kick off the
+#     final switchover.  Note, this does not affect postcopy switchover,
+#     because the user can control that using "migrate-start-postcopy"
+#     command explicitly. (Since 8.1)
+#
 # Features:
 #
 # @unstable: Member @x-checkpoint-delay is experimental.
@@ -800,7 +809,7 @@
            'xbzrle-cache-size', 'max-postcopy-bandwidth',
            'max-cpu-throttle', 'multifd-compression',
            'multifd-zlib-level' ,'multifd-zstd-level',
-           'block-bitmap-mapping' ] }
+           'block-bitmap-mapping', 'switchover-hold' ] }
 
 ##
 # @MigrateSetParameters:
@@ -935,6 +944,10 @@
 #     Nodes are mapped to their block device name if there is one, and
 #     to their node name otherwise.  (Since 5.2)
 #
+# @switchover-hold: Whether we should hold-off precopy switchover from src
+#     to dest QEMU.  For more details, please refer to MigrationParameter
+#     entry of the same field. (Since 8.1)
+#
 # Features:
 #
 # @unstable: Member @x-checkpoint-delay is experimental.
@@ -972,7 +985,8 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
-            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
+            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
+            '*switchover-hold': 'bool' } }
 
 ##
 # @migrate-set-parameters:
@@ -1127,6 +1141,10 @@
 #     Nodes are mapped to their block device name if there is one, and
 #     to their node name otherwise.  (Since 5.2)
 #
+# @switchover-hold: Whether we should hold-off precopy switchover from src
+#     to dest QEMU.  For more details, please refer to MigrationParameter
+#     entry of the same field. (Since 8.1)
+#
 # Features:
 #
 # @unstable: Member @x-checkpoint-delay is experimental.
@@ -1161,7 +1179,8 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
-            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
+            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
+            '*switchover-hold': 'bool' } }
 
 ##
 # @query-migrate-parameters:
diff --git a/migration/migration.h b/migration/migration.h
index 48a46123a0..086ceec754 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -437,6 +437,13 @@ struct MigrationState {
 
     /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. */
     JSONWriter *vmdesc;
+    /*
+     * Only migration thread will wait on it when switchover_hold==true.
+     *
+     * Only qmp set param will kick it when switching switchover_hold from
+     * true->false.
+     */
+    QemuSemaphore switchover_hold_sem;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 9885d7c9f7..63a2c8a4a3 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -338,6 +338,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, "%s: '%s'\n",
             MigrationParameter_str(MIGRATION_PARAMETER_TLS_AUTHZ),
             params->tls_authz);
+        monitor_printf(mon, "%s: %s\n",
+            MigrationParameter_str(MIGRATION_PARAMETER_SWITCHOVER_HOLD),
+            params->switchover_hold ? "on" : "off");
 
         if (params->has_block_bitmap_mapping) {
             const BitmapMigrationNodeAliasList *bmnal;
diff --git a/migration/migration.c b/migration/migration.c
index 5de7f734b9..30f165527b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2700,6 +2700,55 @@ static void migration_update_counters(MigrationState *s,
                               bandwidth, s->threshold_size);
 }
 
+static bool
+migration_should_complete(MigrationState *s, uint64_t pending_size)
+{
+    /* We still have large pending data to send? */
+    if (pending_size && (pending_size >= s->threshold_size)) {
+        return false;
+    }
+
+    /* The user doesn't want us to switchover yet */
+    if (s->parameters.switchover_hold) {
+        /*
+         * Note: when reaching here it probably means we've migrated almost
+         * everything and ready to switchover.  If user asked not to switch
+         * wait for a short period and respond to kicks immediately.
+         *
+         * If we wait too long, there can be a lot of dirty data generated,
+         * while we could have done something to sync data between src/dst.
+         *
+         * If we wait too short, migration thread can eat most/all cpu
+         * resource looping over switchover_hold.
+         *
+         * Make it 10ms which seems to be a good intermediate value.
+         */
+        qemu_sem_timedwait(&s->switchover_hold_sem, 10);
+
+        /*
+         * Return false here always even if user changed it, because we'd
+         * like to re-evaluate everything (e.g. pending_size).
+         */
+        return false;
+    }
+
+    return true;
+}
+
+static bool
+migration_should_start_postcopy(MigrationState *s, uint64_t must_precopy)
+{
+    /* If we're already in postcopy phase, don't bother */
+    if (migration_in_postcopy()) {
+        return false;
+    }
+    /* We still have lots of thing that must be migrated in precopy */
+    if (must_precopy > s->threshold_size) {
+        return false;
+    }
+    return qatomic_read(&s->start_postcopy);
+}
+
 /* Migration thread iteration status */
 typedef enum {
     MIG_ITERATE_RESUME,         /* Resume current iteration */
@@ -2727,15 +2776,14 @@ static MigIterateState migration_iteration_run(MigrationState *s)
         trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
     }
 
-    if (!pending_size || pending_size < s->threshold_size) {
+    if (migration_should_complete(s, pending_size)) {
         trace_migration_thread_low_pending(pending_size);
         migration_completion(s);
         return MIG_ITERATE_BREAK;
     }
 
     /* Still a significant amount to transfer */
-    if (!in_postcopy && must_precopy <= s->threshold_size &&
-        qatomic_read(&s->start_postcopy)) {
+    if (migration_should_start_postcopy(s, must_precopy)) {
         if (postcopy_start(s)) {
             error_report("%s: postcopy failed to start", __func__);
         }
@@ -3287,6 +3335,7 @@ static void migration_instance_finalize(Object *obj)
     qemu_sem_destroy(&ms->rp_state.rp_sem);
     qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
     qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
+    qemu_sem_destroy(&ms->switchover_hold_sem);
     error_free(ms->error);
 }
 
@@ -3309,6 +3358,7 @@ static void migration_instance_init(Object *obj)
     qemu_sem_init(&ms->rate_limit_sem, 0);
     qemu_sem_init(&ms->wait_unplug_sem, 0);
     qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);
+    qemu_sem_init(&ms->switchover_hold_sem, 0);
     qemu_mutex_init(&ms->qemu_file_lock);
 }
 
diff --git a/migration/options.c b/migration/options.c
index b62ab30cd5..2d6b138352 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -163,6 +163,8 @@ Property migration_properties[] = {
     DEFINE_PROP_STRING("tls-creds", MigrationState, parameters.tls_creds),
     DEFINE_PROP_STRING("tls-hostname", MigrationState, parameters.tls_hostname),
     DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
+    DEFINE_PROP_BOOL("switchover-hold", MigrationState,
+                     parameters.switchover_hold, false),
 
     /* Migration capabilities */
     DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
@@ -883,6 +885,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->announce_rounds = s->parameters.announce_rounds;
     params->has_announce_step = true;
     params->announce_step = s->parameters.announce_step;
+    params->has_switchover_hold = true;
+    params->switchover_hold = s->parameters.switchover_hold;
 
     if (s->parameters.has_block_bitmap_mapping) {
         params->has_block_bitmap_mapping = true;
@@ -923,6 +927,7 @@ void migrate_params_init(MigrationParameters *params)
     params->has_announce_max = true;
     params->has_announce_rounds = true;
     params->has_announce_step = true;
+    params->has_switchover_hold = true;
 }
 
 /*
@@ -1177,6 +1182,9 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_announce_step) {
         dest->announce_step = params->announce_step;
     }
+    if (params->has_switchover_hold) {
+        dest->switchover_hold = params->switchover_hold;
+    }
 
     if (params->has_block_bitmap_mapping) {
         dest->has_block_bitmap_mapping = true;
@@ -1290,6 +1298,15 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
     if (params->has_announce_step) {
         s->parameters.announce_step = params->announce_step;
     }
+    if (params->has_switchover_hold) {
+        bool old = s->parameters.switchover_hold;
+        bool new = params->switchover_hold;
+
+        s->parameters.switchover_hold = params->switchover_hold;
+        if (old && !new) {
+            qemu_sem_post(&s->switchover_hold_sem);
+        }
+    }
 
     if (params->has_block_bitmap_mapping) {
         qapi_free_BitmapMigrationNodeAliasList(
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH RFC 2/3] Revert "tests/qtest: massively speed up migration-test"
  2023-06-02  1:16 [PATCH RFC 0/3] migration: switchover-hold flag Peter Xu
  2023-06-02  1:16 ` [PATCH RFC 1/3] migration: switchover-hold parameter Peter Xu
@ 2023-06-02  1:16 ` Peter Xu
  2023-06-02  1:16 ` [PATCH RFC 3/3] qtest/migration: Use switchover-hold to speedup Peter Xu
  2 siblings, 0 replies; 8+ messages in thread
From: Peter Xu @ 2023-06-02  1:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Avihai Horon, Paolo Bonzini, Daniel P . Berrangé,
	Leonardo Bras Soares Passos, Juan Quintela, peterx,
	Laurent Vivier, Thomas Huth

This reverts commit e76a92b869f17d7a3f554890fb89b7da595dd652.
---
 tests/qtest/migration-test.c | 143 +++++------------------------------
 1 file changed, 18 insertions(+), 125 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index d2cd71e6cf..b0c355bbd9 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -46,20 +46,6 @@ static bool uffd_feature_thread_id;
 static bool got_src_stop;
 static bool got_dst_resume;
 
-/*
- * An initial 3 MB offset is used as that corresponds
- * to ~1 sec of data transfer with our bandwidth setting.
- */
-#define MAGIC_OFFSET_BASE (3 * 1024 * 1024)
-/*
- * A further 1k is added to ensure we're not a multiple
- * of TEST_MEM_PAGE_SIZE, thus avoid clash with writes
- * from the migration guest workload.
- */
-#define MAGIC_OFFSET_SHUFFLE 1024
-#define MAGIC_OFFSET (MAGIC_OFFSET_BASE + MAGIC_OFFSET_SHUFFLE)
-#define MAGIC_MARKER 0xFEED12345678CAFEULL
-
 /*
  * Dirtylimit stop working if dirty page rate error
  * value less than DIRTYLIMIT_TOLERANCE_RANGE
@@ -459,91 +445,6 @@ static void migrate_ensure_converge(QTestState *who)
     migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
 }
 
-/*
- * Our goal is to ensure that we run a single full migration
- * iteration, and also dirty memory, ensuring that at least
- * one further iteration is required.
- *
- * We can't directly synchronize with the start of a migration
- * so we have to apply some tricks monitoring memory that is
- * transferred.
- *
- * Initially we set the migration bandwidth to an insanely
- * low value, with tiny max downtime too. This basically
- * guarantees migration will never complete.
- *
- * This will result in a test that is unacceptably slow though,
- * so we can't let the entire migration pass run at this speed.
- * Our intent is to let it run just long enough that we can
- * prove data prior to the marker has been transferred *AND*
- * also prove this transferred data is dirty again.
- *
- * Before migration starts, we write a 64-bit magic marker
- * into a fixed location in the src VM RAM.
- *
- * Then watch dst memory until the marker appears. This is
- * proof that start_address -> MAGIC_OFFSET_BASE has been
- * transferred.
- *
- * Finally we go back to the source and read a byte just
- * before the marker untill we see it flip in value. This
- * is proof that start_address -> MAGIC_OFFSET_BASE
- * is now dirty again.
- *
- * IOW, we're guaranteed at least a 2nd migration pass
- * at this point.
- *
- * We can now let migration run at full speed to finish
- * the test
- */
-static void migrate_prepare_for_dirty_mem(QTestState *from)
-{
-    /*
-     * The guest workflow iterates from start_address to
-     * end_address, writing 1 byte every TEST_MEM_PAGE_SIZE
-     * bytes.
-     *
-     * IOW, if we write to mem at a point which is NOT
-     * a multiple of TEST_MEM_PAGE_SIZE, our write won't
-     * conflict with the migration workflow.
-     *
-     * We put in a marker here, that we'll use to determine
-     * when the data has been transferred to the dst.
-     */
-    qtest_writeq(from, start_address + MAGIC_OFFSET, MAGIC_MARKER);
-}
-
-static void migrate_wait_for_dirty_mem(QTestState *from,
-                                       QTestState *to)
-{
-    uint64_t watch_address = start_address + MAGIC_OFFSET_BASE;
-    uint64_t marker_address = start_address + MAGIC_OFFSET;
-    uint8_t watch_byte;
-
-    /*
-     * Wait for the MAGIC_MARKER to get transferred, as an
-     * indicator that a migration pass has made some known
-     * amount of progress.
-     */
-    do {
-        usleep(1000 * 10);
-    } while (qtest_readq(to, marker_address) != MAGIC_MARKER);
-
-    /*
-     * Now ensure that already transferred bytes are
-     * dirty again from the guest workload. Note the
-     * guest byte value will wrap around and by chance
-     * match the original watch_byte. This is harmless
-     * as we'll eventually see a different value if we
-     * keep watching
-     */
-    watch_byte = qtest_readb(from, watch_address);
-    do {
-        usleep(1000 * 10);
-    } while (qtest_readb(from, watch_address) == watch_byte);
-}
-
-
 static void migrate_pause(QTestState *who)
 {
     qtest_qmp_assert_success(who, "{ 'execute': 'migrate-pause' }");
@@ -676,10 +577,7 @@ typedef struct {
         MIG_TEST_FAIL_DEST_QUIT_ERR,
     } result;
 
-    /*
-     * Optional: set number of migration passes to wait for, if live==true.
-     * If zero, then merely wait for a few MB of dirty data
-     */
+    /* Optional: set number of migration passes to wait for, if live==true */
     unsigned int iterations;
 
     /*
@@ -1267,14 +1165,12 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 
     migrate_ensure_non_converge(from);
 
-    migrate_prepare_for_dirty_mem(from);
-
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
 
     migrate_qmp(from, uri, "{}");
 
-    migrate_wait_for_dirty_mem(from, to);
+    wait_for_migration_pass(from);
 
     *from_ptr = from;
     *to_ptr = to;
@@ -1509,8 +1405,14 @@ static void test_precopy_common(MigrateCommon *args)
     }
 
     if (args->live) {
+        /*
+         * Testing live migration, we want to ensure that some
+         * memory is re-dirtied after being transferred, so that
+         * we exercise logic for dirty page handling. We achieve
+         * this with a ridiculosly low bandwidth that guarantees
+         * non-convergance.
+         */
         migrate_ensure_non_converge(from);
-        migrate_prepare_for_dirty_mem(from);
     } else {
         /*
          * Testing non-live migration, we allow it to run at
@@ -1545,16 +1447,13 @@ static void test_precopy_common(MigrateCommon *args)
         }
     } else {
         if (args->live) {
-            /*
-             * For initial iteration(s) we must do a full pass,
-             * but for the final iteration, we need only wait
-             * for some dirty mem before switching to converge
-             */
-            while (args->iterations > 1) {
+            if (args->iterations) {
+                while (args->iterations--) {
+                    wait_for_migration_pass(from);
+                }
+            } else {
                 wait_for_migration_pass(from);
-                args->iterations--;
             }
-            migrate_wait_for_dirty_mem(from, to);
 
             migrate_ensure_converge(from);
 
@@ -1687,9 +1586,6 @@ static void test_ignore_shared(void)
         return;
     }
 
-    migrate_ensure_non_converge(from);
-    migrate_prepare_for_dirty_mem(from);
-
     migrate_set_capability(from, "x-ignore-shared", true);
     migrate_set_capability(to, "x-ignore-shared", true);
 
@@ -1698,7 +1594,7 @@ static void test_ignore_shared(void)
 
     migrate_qmp(from, uri, "{}");
 
-    migrate_wait_for_dirty_mem(from, to);
+    wait_for_migration_pass(from);
 
     if (!got_src_stop) {
         qtest_qmp_eventwait(from, "STOP");
@@ -2402,7 +2298,6 @@ static void test_multifd_tcp_cancel(void)
     }
 
     migrate_ensure_non_converge(from);
-    migrate_prepare_for_dirty_mem(from);
 
     migrate_set_parameter_int(from, "multifd-channels", 16);
     migrate_set_parameter_int(to, "multifd-channels", 16);
@@ -2421,7 +2316,7 @@ static void test_multifd_tcp_cancel(void)
 
     migrate_qmp(from, uri, "{}");
 
-    migrate_wait_for_dirty_mem(from, to);
+    wait_for_migration_pass(from);
 
     migrate_cancel(from);
 
@@ -2450,13 +2345,11 @@ static void test_multifd_tcp_cancel(void)
 
     wait_for_migration_status(from, "cancelled", NULL);
 
-    migrate_ensure_non_converge(from);
+    migrate_ensure_converge(from);
 
     migrate_qmp(from, uri, "{}");
 
-    migrate_wait_for_dirty_mem(from, to);
-
-    migrate_ensure_converge(from);
+    wait_for_migration_pass(from);
 
     if (!got_src_stop) {
         qtest_qmp_eventwait(from, "STOP");
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH RFC 3/3] qtest/migration: Use switchover-hold to speedup
  2023-06-02  1:16 [PATCH RFC 0/3] migration: switchover-hold flag Peter Xu
  2023-06-02  1:16 ` [PATCH RFC 1/3] migration: switchover-hold parameter Peter Xu
  2023-06-02  1:16 ` [PATCH RFC 2/3] Revert "tests/qtest: massively speed up migration-test" Peter Xu
@ 2023-06-02  1:16 ` Peter Xu
  2023-06-02 11:03   ` Daniel P. Berrangé
  2 siblings, 1 reply; 8+ messages in thread
From: Peter Xu @ 2023-06-02  1:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Avihai Horon, Paolo Bonzini, Daniel P . Berrangé,
	Leonardo Bras Soares Passos, Juan Quintela, peterx,
	Laurent Vivier, Thomas Huth

Use the switchover-hold flag rather than tuning bw+downtime to guide test
convergence.

This can achieve similar goal of previous patch "tests/qtest: massively
speed up migration-test" but without magic offset to write or monitoring.

The initial solution can reduce migration-test time from 8min to 1min40s,
this patch can further reduce it from 1m40s to 1m1s per my local test.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/migration-test.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b0c355bbd9..62bdd67fd9 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -433,16 +433,15 @@ static void migrate_set_parameter_bool(QTestState *who, const char *parameter,
 
 static void migrate_ensure_non_converge(QTestState *who)
 {
-    /* Can't converge with 1ms downtime + 3 mbs bandwidth limit */
-    migrate_set_parameter_int(who, "max-bandwidth", 3 * 1000 * 1000);
-    migrate_set_parameter_int(who, "downtime-limit", 1);
+    /* Hold off switchover for precopy only */
+    migrate_set_parameter_bool(who, "switchover-hold", true);
 }
 
 static void migrate_ensure_converge(QTestState *who)
 {
-    /* Should converge with 30s downtime + 1 gbs bandwidth limit */
-    migrate_set_parameter_int(who, "max-bandwidth", 1 * 1000 * 1000 * 1000);
-    migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
+    /* No limitation on bandwidth so converge faster */
+    migrate_set_parameter_int(who, "max-bandwidth", 0);
+    migrate_set_parameter_bool(who, "switchover-hold", false);
 }
 
 static void migrate_pause(QTestState *who)
@@ -492,6 +491,13 @@ static void migrate_postcopy_start(QTestState *from, QTestState *to)
     }
 
     qtest_qmp_eventwait(to, "RESUME");
+
+    /*
+     * Now allow precopy switchover (which will allow completion).  This
+     * needs to be done after migrate-start-postcopy to make sure we switch
+     * to postcopy first.
+     */
+    migrate_ensure_converge(from);
 }
 
 typedef struct {
@@ -1164,6 +1170,8 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
     }
 
     migrate_ensure_non_converge(from);
+    /* Still use unlimited precopy speed to finish 1st iteration fast */
+    migrate_set_parameter_int(from, "max-bandwidth", 0);
 
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 3/3] qtest/migration: Use switchover-hold to speedup
  2023-06-02  1:16 ` [PATCH RFC 3/3] qtest/migration: Use switchover-hold to speedup Peter Xu
@ 2023-06-02 11:03   ` Daniel P. Berrangé
  2023-06-02 13:23     ` Peter Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel P. Berrangé @ 2023-06-02 11:03 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Avihai Horon, Paolo Bonzini,
	Leonardo Bras Soares Passos, Juan Quintela, Laurent Vivier,
	Thomas Huth

On Thu, Jun 01, 2023 at 09:16:26PM -0400, Peter Xu wrote:
> Use the switchover-hold flag rather than tuning bw+downtime to guide test
> convergence.
> 
> This can achieve similar goal of previous patch "tests/qtest: massively
> speed up migration-test" but without magic offset to write or monitoring.
> 
> The initial solution can reduce migration-test time from 8min to 1min40s,
> this patch can further reduce it from 1m40s to 1m1s per my local test.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  tests/qtest/migration-test.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index b0c355bbd9..62bdd67fd9 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -433,16 +433,15 @@ static void migrate_set_parameter_bool(QTestState *who, const char *parameter,
>  
>  static void migrate_ensure_non_converge(QTestState *who)
>  {
> -    /* Can't converge with 1ms downtime + 3 mbs bandwidth limit */
> -    migrate_set_parameter_int(who, "max-bandwidth", 3 * 1000 * 1000);
> -    migrate_set_parameter_int(who, "downtime-limit", 1);
> +    /* Hold off switchover for precopy only */
> +    migrate_set_parameter_bool(who, "switchover-hold", true);
>  }
>  
>  static void migrate_ensure_converge(QTestState *who)
>  {
> -    /* Should converge with 30s downtime + 1 gbs bandwidth limit */
> -    migrate_set_parameter_int(who, "max-bandwidth", 1 * 1000 * 1000 * 1000);
> -    migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
> +    /* No limitation on bandwidth so converge faster */
> +    migrate_set_parameter_int(who, "max-bandwidth", 0);

You're already setting max-bandwith==0 in migrate_postcopy_prepare.

If you also set it in test_precopy_common, so we don't need to
set it here, and we'll avoid the initial phase running with
bandwidth=32mbs.

> +    migrate_set_parameter_bool(who, "switchover-hold", false);
>  }
>  
>  static void migrate_pause(QTestState *who)
> @@ -492,6 +491,13 @@ static void migrate_postcopy_start(QTestState *from, QTestState *to)
>      }
>  
>      qtest_qmp_eventwait(to, "RESUME");
> +
> +    /*
> +     * Now allow precopy switchover (which will allow completion).  This
> +     * needs to be done after migrate-start-postcopy to make sure we switch
> +     * to postcopy first.
> +     */
> +    migrate_ensure_converge(from);
>  }
>  
>  typedef struct {
> @@ -1164,6 +1170,8 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
>      }
>  
>      migrate_ensure_non_converge(from);
> +    /* Still use unlimited precopy speed to finish 1st iteration fast */
> +    migrate_set_parameter_int(from, "max-bandwidth", 0);
>  
>      /* Wait for the first serial output from the source */
>      wait_for_serial("src_serial");
> -- 
> 2.40.1
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 1/3] migration: switchover-hold parameter
  2023-06-02  1:16 ` [PATCH RFC 1/3] migration: switchover-hold parameter Peter Xu
@ 2023-06-02 11:14   ` Daniel P. Berrangé
  2023-06-02 13:20     ` Peter Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel P. Berrangé @ 2023-06-02 11:14 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Avihai Horon, Paolo Bonzini,
	Leonardo Bras Soares Passos, Juan Quintela, Laurent Vivier,
	Thomas Huth

On Thu, Jun 01, 2023 at 09:16:24PM -0400, Peter Xu wrote:
> Add a new migration parameter switchover-hold which can block src qemu
> migration from switching over to dest from running.
> 
> One can set this flag to true so src qemu will keep iterating the VM data,
> not switching over to dest even if it can.
> 
> It means now live migration works somehow like COLO; we keep syncing data
> from src to dst without stopping.
> 
> When the user is ready for the switchover, one can set the parameter from
> true->false.  That'll contain a implicit kick to migration thread to be
> alive and re-evaluate the switchover decision.
> 
> This can be used in two cases so far in my mind:
> 
>   (1) One can use this parameter to start pre-heating migration (but not
>       really migrating, so a migrate-cancel will cancel the preheat).  When
>       the user wants to really migrate, just clear the flag.  It'll in most
>       cases migrate immediately because most pages are already synced.
> 
>   (2) Can also be used as a clean way to do qtest, in many of the precopy
>       tests we have requirement to run after 1 iteration without completing
>       the precopy migration.  Before that we have either set bandwidth to
>       ridiculous low value, or tricks on detecting guest memory change over
>       some adhoc guest memory position.  Now we can simply set this flag
>       then we know precopy won't complete and will just keep going.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  qapi/migration.json            | 25 +++++++++++++--
>  migration/migration.h          |  7 +++++
>  migration/migration-hmp-cmds.c |  3 ++
>  migration/migration.c          | 56 ++++++++++++++++++++++++++++++++--
>  migration/options.c            | 17 +++++++++++
>  5 files changed, 102 insertions(+), 6 deletions(-)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 179af0c4d8..1d0059d125 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -779,6 +779,15 @@
>  #     Nodes are mapped to their block device name if there is one, and
>  #     to their node name otherwise.  (Since 5.2)
>  #
> +# @switchover-hold: Whether we should hold-off precopy switchover from src
> +#     to dest QEMU, even if we can finish migration in the downtime
> +#     specified.  By default off, so precopy migration will complete as
> +#     soon as possible.  One can set it to explicitly keep iterating during
> +#     precopy migration until set the flag to false again to kick off the
> +#     final switchover.  Note, this does not affect postcopy switchover,
> +#     because the user can control that using "migrate-start-postcopy"
> +#     command explicitly. (Since 8.1)
> +#
>  # Features:
>  #
>  # @unstable: Member @x-checkpoint-delay is experimental.
> @@ -800,7 +809,7 @@
>             'xbzrle-cache-size', 'max-postcopy-bandwidth',
>             'max-cpu-throttle', 'multifd-compression',
>             'multifd-zlib-level' ,'multifd-zstd-level',
> -           'block-bitmap-mapping' ] }
> +           'block-bitmap-mapping', 'switchover-hold' ] }
>  
>  ##
>  # @MigrateSetParameters:
> @@ -935,6 +944,10 @@
>  #     Nodes are mapped to their block device name if there is one, and
>  #     to their node name otherwise.  (Since 5.2)
>  #
> +# @switchover-hold: Whether we should hold-off precopy switchover from src
> +#     to dest QEMU.  For more details, please refer to MigrationParameter
> +#     entry of the same field. (Since 8.1)
> +#
>  # Features:
>  #
>  # @unstable: Member @x-checkpoint-delay is experimental.
> @@ -972,7 +985,8 @@
>              '*multifd-compression': 'MultiFDCompression',
>              '*multifd-zlib-level': 'uint8',
>              '*multifd-zstd-level': 'uint8',
> -            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> +            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
> +            '*switchover-hold': 'bool' } }
>  
>  ##
>  # @migrate-set-parameters:
> @@ -1127,6 +1141,10 @@
>  #     Nodes are mapped to their block device name if there is one, and
>  #     to their node name otherwise.  (Since 5.2)
>  #
> +# @switchover-hold: Whether we should hold-off precopy switchover from src
> +#     to dest QEMU.  For more details, please refer to MigrationParameter
> +#     entry of the same field. (Since 8.1)
> +#
>  # Features:
>  #
>  # @unstable: Member @x-checkpoint-delay is experimental.
> @@ -1161,7 +1179,8 @@
>              '*multifd-compression': 'MultiFDCompression',
>              '*multifd-zlib-level': 'uint8',
>              '*multifd-zstd-level': 'uint8',
> -            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> +            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
> +            '*switchover-hold': 'bool' } }
>  
>  ##
>  # @query-migrate-parameters:
> diff --git a/migration/migration.h b/migration/migration.h
> index 48a46123a0..086ceec754 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -437,6 +437,13 @@ struct MigrationState {
>  
>      /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. */
>      JSONWriter *vmdesc;
> +    /*
> +     * Only migration thread will wait on it when switchover_hold==true.
> +     *
> +     * Only qmp set param will kick it when switching switchover_hold from
> +     * true->false.
> +     */
> +    QemuSemaphore switchover_hold_sem;
>  };
>  
>  void migrate_set_state(int *state, int old_state, int new_state);
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 9885d7c9f7..63a2c8a4a3 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -338,6 +338,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>          monitor_printf(mon, "%s: '%s'\n",
>              MigrationParameter_str(MIGRATION_PARAMETER_TLS_AUTHZ),
>              params->tls_authz);
> +        monitor_printf(mon, "%s: %s\n",
> +            MigrationParameter_str(MIGRATION_PARAMETER_SWITCHOVER_HOLD),
> +            params->switchover_hold ? "on" : "off");
>  
>          if (params->has_block_bitmap_mapping) {
>              const BitmapMigrationNodeAliasList *bmnal;
> diff --git a/migration/migration.c b/migration/migration.c
> index 5de7f734b9..30f165527b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2700,6 +2700,55 @@ static void migration_update_counters(MigrationState *s,
>                                bandwidth, s->threshold_size);
>  }
>  
> +static bool
> +migration_should_complete(MigrationState *s, uint64_t pending_size)
> +{
> +    /* We still have large pending data to send? */
> +    if (pending_size && (pending_size >= s->threshold_size)) {
> +        return false;
> +    }
> +
> +    /* The user doesn't want us to switchover yet */
> +    if (s->parameters.switchover_hold) {
> +        /*
> +         * Note: when reaching here it probably means we've migrated almost
> +         * everything and ready to switchover.  If user asked not to switch
> +         * wait for a short period and respond to kicks immediately.
> +         *
> +         * If we wait too long, there can be a lot of dirty data generated,
> +         * while we could have done something to sync data between src/dst.
> +         *
> +         * If we wait too short, migration thread can eat most/all cpu
> +         * resource looping over switchover_hold.
> +         *
> +         * Make it 10ms which seems to be a good intermediate value.
> +         */
> +        qemu_sem_timedwait(&s->switchover_hold_sem, 10);
> +
> +        /*
> +         * Return false here always even if user changed it, because we'd
> +         * like to re-evaluate everything (e.g. pending_size).
> +         */
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +static bool
> +migration_should_start_postcopy(MigrationState *s, uint64_t must_precopy)
> +{
> +    /* If we're already in postcopy phase, don't bother */
> +    if (migration_in_postcopy()) {
> +        return false;
> +    }
> +    /* We still have lots of thing that must be migrated in precopy */
> +    if (must_precopy > s->threshold_size) {
> +        return false;
> +    }
> +    return qatomic_read(&s->start_postcopy);
> +}
> +
>  /* Migration thread iteration status */
>  typedef enum {
>      MIG_ITERATE_RESUME,         /* Resume current iteration */
> @@ -2727,15 +2776,14 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>          trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
>      }
>  
> -    if (!pending_size || pending_size < s->threshold_size) {
> +    if (migration_should_complete(s, pending_size)) {
>          trace_migration_thread_low_pending(pending_size);
>          migration_completion(s);
>          return MIG_ITERATE_BREAK;
>      }
>  
>      /* Still a significant amount to transfer */
> -    if (!in_postcopy && must_precopy <= s->threshold_size &&
> -        qatomic_read(&s->start_postcopy)) {
> +    if (migration_should_start_postcopy(s, must_precopy)) {
>          if (postcopy_start(s)) {
>              error_report("%s: postcopy failed to start", __func__);
>          }
> @@ -3287,6 +3335,7 @@ static void migration_instance_finalize(Object *obj)
>      qemu_sem_destroy(&ms->rp_state.rp_sem);
>      qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
>      qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
> +    qemu_sem_destroy(&ms->switchover_hold_sem);
>      error_free(ms->error);
>  }
>  
> @@ -3309,6 +3358,7 @@ static void migration_instance_init(Object *obj)
>      qemu_sem_init(&ms->rate_limit_sem, 0);
>      qemu_sem_init(&ms->wait_unplug_sem, 0);
>      qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);
> +    qemu_sem_init(&ms->switchover_hold_sem, 0);
>      qemu_mutex_init(&ms->qemu_file_lock);
>  }
>  
> diff --git a/migration/options.c b/migration/options.c
> index b62ab30cd5..2d6b138352 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -163,6 +163,8 @@ Property migration_properties[] = {
>      DEFINE_PROP_STRING("tls-creds", MigrationState, parameters.tls_creds),
>      DEFINE_PROP_STRING("tls-hostname", MigrationState, parameters.tls_hostname),
>      DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
> +    DEFINE_PROP_BOOL("switchover-hold", MigrationState,
> +                     parameters.switchover_hold, false),
>  
>      /* Migration capabilities */
>      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
> @@ -883,6 +885,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>      params->announce_rounds = s->parameters.announce_rounds;
>      params->has_announce_step = true;
>      params->announce_step = s->parameters.announce_step;
> +    params->has_switchover_hold = true;
> +    params->switchover_hold = s->parameters.switchover_hold;
>  
>      if (s->parameters.has_block_bitmap_mapping) {
>          params->has_block_bitmap_mapping = true;
> @@ -923,6 +927,7 @@ void migrate_params_init(MigrationParameters *params)
>      params->has_announce_max = true;
>      params->has_announce_rounds = true;
>      params->has_announce_step = true;
> +    params->has_switchover_hold = true;
>  }
>  
>  /*
> @@ -1177,6 +1182,9 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>      if (params->has_announce_step) {
>          dest->announce_step = params->announce_step;
>      }
> +    if (params->has_switchover_hold) {
> +        dest->switchover_hold = params->switchover_hold;
> +    }
>  
>      if (params->has_block_bitmap_mapping) {
>          dest->has_block_bitmap_mapping = true;
> @@ -1290,6 +1298,15 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>      if (params->has_announce_step) {
>          s->parameters.announce_step = params->announce_step;
>      }
> +    if (params->has_switchover_hold) {
> +        bool old = s->parameters.switchover_hold;
> +        bool new = params->switchover_hold;
> +
> +        s->parameters.switchover_hold = params->switchover_hold;
> +        if (old && !new) {
> +            qemu_sem_post(&s->switchover_hold_sem);
> +        }
> +    }

IIUC, MigrationState is global to QEMU and gets reused on every
migration.

If we qemu_sem_post() here, and the migration fails, and the
user starts a new migration on the same QEMU, the switchover_hold_sem
needs resetting somewhere, as there's a (tiny) risk that the
qemu_sem_timedwait() won't have run before the migration fail.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 1/3] migration: switchover-hold parameter
  2023-06-02 11:14   ` Daniel P. Berrangé
@ 2023-06-02 13:20     ` Peter Xu
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Xu @ 2023-06-02 13:20 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Avihai Horon, Paolo Bonzini,
	Leonardo Bras Soares Passos, Juan Quintela, Laurent Vivier,
	Thomas Huth

On Fri, Jun 02, 2023 at 12:14:20PM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 01, 2023 at 09:16:24PM -0400, Peter Xu wrote:
> > Add a new migration parameter switchover-hold which can block src qemu
> > migration from switching over to dest from running.
> > 
> > One can set this flag to true so src qemu will keep iterating the VM data,
> > not switching over to dest even if it can.
> > 
> > It means now live migration works somehow like COLO; we keep syncing data
> > from src to dst without stopping.
> > 
> > When the user is ready for the switchover, one can set the parameter from
> > true->false.  That'll contain a implicit kick to migration thread to be
> > alive and re-evaluate the switchover decision.
> > 
> > This can be used in two cases so far in my mind:
> > 
> >   (1) One can use this parameter to start pre-heating migration (but not
> >       really migrating, so a migrate-cancel will cancel the preheat).  When
> >       the user wants to really migrate, just clear the flag.  It'll in most
> >       cases migrate immediately because most pages are already synced.
> > 
> >   (2) Can also be used as a clean way to do qtest, in many of the precopy
> >       tests we have requirement to run after 1 iteration without completing
> >       the precopy migration.  Before that we have either set bandwidth to
> >       ridiculous low value, or tricks on detecting guest memory change over
> >       some adhoc guest memory position.  Now we can simply set this flag
> >       then we know precopy won't complete and will just keep going.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  qapi/migration.json            | 25 +++++++++++++--
> >  migration/migration.h          |  7 +++++
> >  migration/migration-hmp-cmds.c |  3 ++
> >  migration/migration.c          | 56 ++++++++++++++++++++++++++++++++--
> >  migration/options.c            | 17 +++++++++++
> >  5 files changed, 102 insertions(+), 6 deletions(-)
> > 
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 179af0c4d8..1d0059d125 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -779,6 +779,15 @@
> >  #     Nodes are mapped to their block device name if there is one, and
> >  #     to their node name otherwise.  (Since 5.2)
> >  #
> > +# @switchover-hold: Whether we should hold-off precopy switchover from src
> > +#     to dest QEMU, even if we can finish migration in the downtime
> > +#     specified.  By default off, so precopy migration will complete as
> > +#     soon as possible.  One can set it to explicitly keep iterating during
> > +#     precopy migration until set the flag to false again to kick off the
> > +#     final switchover.  Note, this does not affect postcopy switchover,
> > +#     because the user can control that using "migrate-start-postcopy"
> > +#     command explicitly. (Since 8.1)
> > +#
> >  # Features:
> >  #
> >  # @unstable: Member @x-checkpoint-delay is experimental.
> > @@ -800,7 +809,7 @@
> >             'xbzrle-cache-size', 'max-postcopy-bandwidth',
> >             'max-cpu-throttle', 'multifd-compression',
> >             'multifd-zlib-level' ,'multifd-zstd-level',
> > -           'block-bitmap-mapping' ] }
> > +           'block-bitmap-mapping', 'switchover-hold' ] }
> >  
> >  ##
> >  # @MigrateSetParameters:
> > @@ -935,6 +944,10 @@
> >  #     Nodes are mapped to their block device name if there is one, and
> >  #     to their node name otherwise.  (Since 5.2)
> >  #
> > +# @switchover-hold: Whether we should hold-off precopy switchover from src
> > +#     to dest QEMU.  For more details, please refer to MigrationParameter
> > +#     entry of the same field. (Since 8.1)
> > +#
> >  # Features:
> >  #
> >  # @unstable: Member @x-checkpoint-delay is experimental.
> > @@ -972,7 +985,8 @@
> >              '*multifd-compression': 'MultiFDCompression',
> >              '*multifd-zlib-level': 'uint8',
> >              '*multifd-zstd-level': 'uint8',
> > -            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> > +            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
> > +            '*switchover-hold': 'bool' } }
> >  
> >  ##
> >  # @migrate-set-parameters:
> > @@ -1127,6 +1141,10 @@
> >  #     Nodes are mapped to their block device name if there is one, and
> >  #     to their node name otherwise.  (Since 5.2)
> >  #
> > +# @switchover-hold: Whether we should hold-off precopy switchover from src
> > +#     to dest QEMU.  For more details, please refer to MigrationParameter
> > +#     entry of the same field. (Since 8.1)
> > +#
> >  # Features:
> >  #
> >  # @unstable: Member @x-checkpoint-delay is experimental.
> > @@ -1161,7 +1179,8 @@
> >              '*multifd-compression': 'MultiFDCompression',
> >              '*multifd-zlib-level': 'uint8',
> >              '*multifd-zstd-level': 'uint8',
> > -            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> > +            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
> > +            '*switchover-hold': 'bool' } }
> >  
> >  ##
> >  # @query-migrate-parameters:
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 48a46123a0..086ceec754 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -437,6 +437,13 @@ struct MigrationState {
> >  
> >      /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. */
> >      JSONWriter *vmdesc;
> > +    /*
> > +     * Only migration thread will wait on it when switchover_hold==true.
> > +     *
> > +     * Only qmp set param will kick it when switching switchover_hold from
> > +     * true->false.
> > +     */
> > +    QemuSemaphore switchover_hold_sem;
> >  };
> >  
> >  void migrate_set_state(int *state, int old_state, int new_state);
> > diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> > index 9885d7c9f7..63a2c8a4a3 100644
> > --- a/migration/migration-hmp-cmds.c
> > +++ b/migration/migration-hmp-cmds.c
> > @@ -338,6 +338,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
> >          monitor_printf(mon, "%s: '%s'\n",
> >              MigrationParameter_str(MIGRATION_PARAMETER_TLS_AUTHZ),
> >              params->tls_authz);
> > +        monitor_printf(mon, "%s: %s\n",
> > +            MigrationParameter_str(MIGRATION_PARAMETER_SWITCHOVER_HOLD),
> > +            params->switchover_hold ? "on" : "off");
> >  
> >          if (params->has_block_bitmap_mapping) {
> >              const BitmapMigrationNodeAliasList *bmnal;
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 5de7f734b9..30f165527b 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -2700,6 +2700,55 @@ static void migration_update_counters(MigrationState *s,
> >                                bandwidth, s->threshold_size);
> >  }
> >  
> > +static bool
> > +migration_should_complete(MigrationState *s, uint64_t pending_size)
> > +{
> > +    /* We still have large pending data to send? */
> > +    if (pending_size && (pending_size >= s->threshold_size)) {
> > +        return false;
> > +    }
> > +
> > +    /* The user doesn't want us to switchover yet */
> > +    if (s->parameters.switchover_hold) {
> > +        /*
> > +         * Note: when reaching here it probably means we've migrated almost
> > +         * everything and ready to switchover.  If user asked not to switch
> > +         * wait for a short period and respond to kicks immediately.
> > +         *
> > +         * If we wait too long, there can be a lot of dirty data generated,
> > +         * while we could have done something to sync data between src/dst.
> > +         *
> > +         * If we wait too short, migration thread can eat most/all cpu
> > +         * resource looping over switchover_hold.
> > +         *
> > +         * Make it 10ms which seems to be a good intermediate value.
> > +         */
> > +        qemu_sem_timedwait(&s->switchover_hold_sem, 10);
> > +
> > +        /*
> > +         * Return false here always even if user changed it, because we'd
> > +         * like to re-evaluate everything (e.g. pending_size).
> > +         */
> > +        return false;
> > +    }
> > +
> > +    return true;
> > +}
> > +
> > +static bool
> > +migration_should_start_postcopy(MigrationState *s, uint64_t must_precopy)
> > +{
> > +    /* If we're already in postcopy phase, don't bother */
> > +    if (migration_in_postcopy()) {
> > +        return false;
> > +    }
> > +    /* We still have lots of thing that must be migrated in precopy */
> > +    if (must_precopy > s->threshold_size) {
> > +        return false;
> > +    }
> > +    return qatomic_read(&s->start_postcopy);
> > +}
> > +
> >  /* Migration thread iteration status */
> >  typedef enum {
> >      MIG_ITERATE_RESUME,         /* Resume current iteration */
> > @@ -2727,15 +2776,14 @@ static MigIterateState migration_iteration_run(MigrationState *s)
> >          trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
> >      }
> >  
> > -    if (!pending_size || pending_size < s->threshold_size) {
> > +    if (migration_should_complete(s, pending_size)) {
> >          trace_migration_thread_low_pending(pending_size);
> >          migration_completion(s);
> >          return MIG_ITERATE_BREAK;
> >      }
> >  
> >      /* Still a significant amount to transfer */
> > -    if (!in_postcopy && must_precopy <= s->threshold_size &&
> > -        qatomic_read(&s->start_postcopy)) {
> > +    if (migration_should_start_postcopy(s, must_precopy)) {
> >          if (postcopy_start(s)) {
> >              error_report("%s: postcopy failed to start", __func__);
> >          }
> > @@ -3287,6 +3335,7 @@ static void migration_instance_finalize(Object *obj)
> >      qemu_sem_destroy(&ms->rp_state.rp_sem);
> >      qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
> >      qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
> > +    qemu_sem_destroy(&ms->switchover_hold_sem);
> >      error_free(ms->error);
> >  }
> >  
> > @@ -3309,6 +3358,7 @@ static void migration_instance_init(Object *obj)
> >      qemu_sem_init(&ms->rate_limit_sem, 0);
> >      qemu_sem_init(&ms->wait_unplug_sem, 0);
> >      qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);
> > +    qemu_sem_init(&ms->switchover_hold_sem, 0);
> >      qemu_mutex_init(&ms->qemu_file_lock);
> >  }
> >  
> > diff --git a/migration/options.c b/migration/options.c
> > index b62ab30cd5..2d6b138352 100644
> > --- a/migration/options.c
> > +++ b/migration/options.c
> > @@ -163,6 +163,8 @@ Property migration_properties[] = {
> >      DEFINE_PROP_STRING("tls-creds", MigrationState, parameters.tls_creds),
> >      DEFINE_PROP_STRING("tls-hostname", MigrationState, parameters.tls_hostname),
> >      DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
> > +    DEFINE_PROP_BOOL("switchover-hold", MigrationState,
> > +                     parameters.switchover_hold, false),
> >  
> >      /* Migration capabilities */
> >      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
> > @@ -883,6 +885,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
> >      params->announce_rounds = s->parameters.announce_rounds;
> >      params->has_announce_step = true;
> >      params->announce_step = s->parameters.announce_step;
> > +    params->has_switchover_hold = true;
> > +    params->switchover_hold = s->parameters.switchover_hold;
> >  
> >      if (s->parameters.has_block_bitmap_mapping) {
> >          params->has_block_bitmap_mapping = true;
> > @@ -923,6 +927,7 @@ void migrate_params_init(MigrationParameters *params)
> >      params->has_announce_max = true;
> >      params->has_announce_rounds = true;
> >      params->has_announce_step = true;
> > +    params->has_switchover_hold = true;
> >  }
> >  
> >  /*
> > @@ -1177,6 +1182,9 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
> >      if (params->has_announce_step) {
> >          dest->announce_step = params->announce_step;
> >      }
> > +    if (params->has_switchover_hold) {
> > +        dest->switchover_hold = params->switchover_hold;
> > +    }
> >  
> >      if (params->has_block_bitmap_mapping) {
> >          dest->has_block_bitmap_mapping = true;
> > @@ -1290,6 +1298,15 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> >      if (params->has_announce_step) {
> >          s->parameters.announce_step = params->announce_step;
> >      }
> > +    if (params->has_switchover_hold) {
> > +        bool old = s->parameters.switchover_hold;
> > +        bool new = params->switchover_hold;
> > +
> > +        s->parameters.switchover_hold = params->switchover_hold;
> > +        if (old && !new) {
> > +            qemu_sem_post(&s->switchover_hold_sem);
> > +        }
> > +    }
> 
> IIUC, MigrationState is global to QEMU and gets reused on every
> migration.
> 
> If we qemu_sem_post() here, and the migration fails, and the
> user starts a new migration on the same QEMU, the switchover_hold_sem
> needs resetting somewhere, as there's a (tiny) risk that the
> qemu_sem_timedwait() won't have run before the migration fail.

Yes I can do that, but may not be necessary - I think it's fine to have
this sem accumulate outdated events.

The sem may go not "accurate" easily, e.g. one can set this flag
true->false->true->false... so it keeps accumulate.  I considered checking
that too, but decided to didn't bother because each outdated post will only
cause migration_thread (only when reaching end of migration and hitting the
sem_timedwait()) to spin another round - the migration thread will "eat"
that outdated sem count and spin one more time, reaching timedwait() again,
then do the real sleep.

How about I add some comment on explaining outdated counts are safe,
instead?  Then we don't need to worry on anything happen to this sem, since
the only goal it services is to kick the migration thread faster, meanwhile
don't let migration thread eat 100% cpu looping there.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 3/3] qtest/migration: Use switchover-hold to speedup
  2023-06-02 11:03   ` Daniel P. Berrangé
@ 2023-06-02 13:23     ` Peter Xu
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Xu @ 2023-06-02 13:23 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Avihai Horon, Paolo Bonzini,
	Leonardo Bras Soares Passos, Juan Quintela, Laurent Vivier,
	Thomas Huth

On Fri, Jun 02, 2023 at 12:03:05PM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 01, 2023 at 09:16:26PM -0400, Peter Xu wrote:
> > Use the switchover-hold flag rather than tuning bw+downtime to guide test
> > convergence.
> > 
> > This can achieve similar goal of previous patch "tests/qtest: massively
> > speed up migration-test" but without magic offset to write or monitoring.
> > 
> > The initial solution can reduce migration-test time from 8min to 1min40s,
> > this patch can further reduce it from 1m40s to 1m1s per my local test.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  tests/qtest/migration-test.c | 20 ++++++++++++++------
> >  1 file changed, 14 insertions(+), 6 deletions(-)
> > 
> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > index b0c355bbd9..62bdd67fd9 100644
> > --- a/tests/qtest/migration-test.c
> > +++ b/tests/qtest/migration-test.c
> > @@ -433,16 +433,15 @@ static void migrate_set_parameter_bool(QTestState *who, const char *parameter,
> >  
> >  static void migrate_ensure_non_converge(QTestState *who)
> >  {
> > -    /* Can't converge with 1ms downtime + 3 mbs bandwidth limit */
> > -    migrate_set_parameter_int(who, "max-bandwidth", 3 * 1000 * 1000);
> > -    migrate_set_parameter_int(who, "downtime-limit", 1);
> > +    /* Hold off switchover for precopy only */
> > +    migrate_set_parameter_bool(who, "switchover-hold", true);
> >  }
> >  
> >  static void migrate_ensure_converge(QTestState *who)
> >  {
> > -    /* Should converge with 30s downtime + 1 gbs bandwidth limit */
> > -    migrate_set_parameter_int(who, "max-bandwidth", 1 * 1000 * 1000 * 1000);
> > -    migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
> > +    /* No limitation on bandwidth so converge faster */
> > +    migrate_set_parameter_int(who, "max-bandwidth", 0);
> 
> You're already setting max-bandwith==0 in migrate_postcopy_prepare.
> 
> If you also set it in test_precopy_common, so we don't need to
> set it here, and we'll avoid the initial phase running with
> bandwidth=32mbs.

There are more tests than test_precopy_common() that leverages this, so I
used to kept it there to still share some code:

test_migrate_auto_converge[2021] migrate_ensure_converge(from);
test_multifd_tcp_cancel[2349]  migrate_ensure_converge(from);

But I think you're right - moving the bw=0 setup out should be cleaner.
I'll see whether I should just move it into all the specific tests.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-06-02 13:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-02  1:16 [PATCH RFC 0/3] migration: switchover-hold flag Peter Xu
2023-06-02  1:16 ` [PATCH RFC 1/3] migration: switchover-hold parameter Peter Xu
2023-06-02 11:14   ` Daniel P. Berrangé
2023-06-02 13:20     ` Peter Xu
2023-06-02  1:16 ` [PATCH RFC 2/3] Revert "tests/qtest: massively speed up migration-test" Peter Xu
2023-06-02  1:16 ` [PATCH RFC 3/3] qtest/migration: Use switchover-hold to speedup Peter Xu
2023-06-02 11:03   ` Daniel P. Berrangé
2023-06-02 13:23     ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).