* [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1
@ 2025-06-09 16:18 Peter Xu
2025-06-09 16:18 ` [PATCH v2 01/11] migration/hmp: Reorg "info migrate" once more Peter Xu
` (12 more replies)
0 siblings, 13 replies; 22+ messages in thread
From: Peter Xu @ 2025-06-09 16:18 UTC (permalink / raw)
To: qemu-devel; +Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert, Fabiano Rosas
v2:
- Collected R-bs
- Avoid using "\b" in HMP dumps [Markus, Dave]
The series is based on a small patch from Yanfei Xu here:
Based-on: <20250514115827.3216082-1-yanfei.xu@bytedance.com>
https://lore.kernel.org/r/20250514115827.3216082-1-yanfei.xu@bytedance.com
This series collects a number of enhancements and cleanups I accumulated
for QEMU 10.1, almost all of which came out of working on the last patch.
The last patch, a one-liner, can further reduce postcopy page fault
latency by 10% with preempt mode enabled.
Before: 268.00us (+-1.87%)
After: 232.67us (+-2.01%)
The patch layout is as follows:
Patch 1: A follow up of HMP change for "info migrate", per
suggestion from Dave
Patch 2: Yet another HMP fix for blocktime displays
Patch 3-10: Cleanups everywhere, especially please take a look at
patch 10 which changes the core switchover decision logic
Patch 11: The one-liner optimization
Comments welcomed, thanks.
Peter Xu (11):
migration/hmp: Reorg "info migrate" once more
migration/hmp: Fix postcopy-blocktime per-vCPU results
migration/docs: Move docs for postcopy blocktime feature
migration/bg-snapshot: Do not check for SKIP in iterator
migration: Drop save_live_complete_postcopy hook
migration: Rename save_live_complete_precopy to save_complete
migration: qemu_savevm_complete*() helpers
migration/ram: One less indent for ram_find_and_save_block()
migration/ram: Add tracepoints for ram_save_complete()
migration: Rewrite the migration complete detect logic
migration/postcopy: Avoid clearing dirty bitmap for postcopy too
docs/devel/migration/postcopy.rst | 36 +++++++-------
include/migration/register.h | 26 ++++------
hw/ppc/spapr.c | 2 +-
hw/s390x/s390-stattrib.c | 2 +-
hw/vfio/migration.c | 2 +-
migration/block-dirty-bitmap.c | 3 +-
migration/migration-hmp-cmds.c | 81 ++++++++++++++++--------------
migration/migration.c | 61 ++++++++++++++++-------
migration/ram.c | 32 +++++++-----
migration/savevm.c | 83 +++++++++++++++++--------------
migration/trace-events | 1 +
11 files changed, 184 insertions(+), 145 deletions(-)
--
2.49.0
* [PATCH v2 01/11] migration/hmp: Reorg "info migrate" once more
2025-06-09 16:18 [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Peter Xu
@ 2025-06-09 16:18 ` Peter Xu
2025-06-09 16:18 ` [PATCH v2 02/11] migration/hmp: Fix postcopy-blocktime per-vCPU results Peter Xu
` (11 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Peter Xu @ 2025-06-09 16:18 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert, Fabiano Rosas,
Li Zhijian
Dave suggested that the HMP output for "info migrate" could make better use
of each line and also group the fields better:
https://lore.kernel.org/r/aC4_-nMc7FwsMf9p@gallifrey
I followed Dave's suggestion, with some more modifications on top:
- Added all elements into the picture
- Use size_to_str() and drop most of the units: the benefit is output
  that's friendlier to most human eyes, the downside is a loss of detail,
  but that should be a corner case in my usage, and one can still fall
  back to the QMP interface when necessary (see the sketch after the
  sample output below).
- Sub-grouping for "Transfers" ("Channels" and "Page Types")
- Better indentation
Sample output:
(qemu) info migrate
Status: postcopy-active
Time (ms): total=47317, setup=5, down=8
RAM info:
Throughput (Mbps): 1342.83
Sizes: pagesize=4 KiB, total=4.02 GiB
Transfers: transferred=1.41 GiB, remain=2.46 GiB
Channels: precopy=15.2 MiB, multifd=0 B, postcopy=1.39 GiB
Page Types: normal=367713, zero=41195
Page Rates (pps): transfer=40900, dirty=4
Others: dirty_syncs=2, postcopy_req=57503
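For the cases where exact numbers matter, the raw byte counts remain
available over QMP; a minimal usage sketch (reply abbreviated to a few of
the fields defined by the QAPI MigrationInfo/MigrationStats types):

-> { "execute": "query-migrate" }
<- { "return": { "status": "...",
                 "ram": { "transferred": ..., "remaining": ..., ... } } }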
Suggested-by: Dr. David Alan Gilbert <dave@treblig.org>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Acked-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration-hmp-cmds.c | 59 ++++++++++++++++++----------------
1 file changed, 31 insertions(+), 28 deletions(-)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index e8a563c7d8..367ff6037f 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -69,7 +69,7 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
}
if (info->has_status) {
- monitor_printf(mon, "Status: %s",
+ monitor_printf(mon, "Status: \t\t%s",
MigrationStatus_str(info->status));
if (info->status == MIGRATION_STATUS_FAILED && info->error_desc) {
monitor_printf(mon, " (%s)\n", info->error_desc);
@@ -78,7 +78,7 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
}
if (info->total_time) {
- monitor_printf(mon, "Time (ms): total=%" PRIu64,
+ monitor_printf(mon, "Time (ms): \t\ttotal=%" PRIu64,
info->total_time);
if (info->has_setup_time) {
monitor_printf(mon, ", setup=%" PRIu64,
@@ -110,48 +110,51 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
}
if (info->ram) {
+ g_autofree char *str_psize = size_to_str(info->ram->page_size);
+ g_autofree char *str_total = size_to_str(info->ram->total);
+ g_autofree char *str_transferred = size_to_str(info->ram->transferred);
+ g_autofree char *str_remaining = size_to_str(info->ram->remaining);
+ g_autofree char *str_precopy = size_to_str(info->ram->precopy_bytes);
+ g_autofree char *str_multifd = size_to_str(info->ram->multifd_bytes);
+ g_autofree char *str_postcopy = size_to_str(info->ram->postcopy_bytes);
+
monitor_printf(mon, "RAM info:\n");
- monitor_printf(mon, " Throughput (Mbps): %0.2f\n",
+ monitor_printf(mon, " Throughput (Mbps): \t%0.2f\n",
info->ram->mbps);
- monitor_printf(mon, " Sizes (KiB): pagesize=%" PRIu64
- ", total=%" PRIu64 ",\n",
- info->ram->page_size >> 10,
- info->ram->total >> 10);
- monitor_printf(mon, " transferred=%" PRIu64
- ", remain=%" PRIu64 ",\n",
- info->ram->transferred >> 10,
- info->ram->remaining >> 10);
- monitor_printf(mon, " precopy=%" PRIu64
- ", multifd=%" PRIu64
- ", postcopy=%" PRIu64,
- info->ram->precopy_bytes >> 10,
- info->ram->multifd_bytes >> 10,
- info->ram->postcopy_bytes >> 10);
+ monitor_printf(mon, " Sizes: \t\tpagesize=%s, total=%s\n",
+ str_psize, str_total);
+ monitor_printf(mon, " Transfers: \t\ttransferred=%s, remain=%s\n",
+ str_transferred, str_remaining);
+ monitor_printf(mon, " Channels: \t\tprecopy=%s, "
+ "multifd=%s, postcopy=%s",
+ str_precopy, str_multifd, str_postcopy);
if (info->vfio) {
- monitor_printf(mon, ", vfio=%" PRIu64,
- info->vfio->transferred >> 10);
+ g_autofree char *str_vfio = size_to_str(info->vfio->transferred);
+
+ monitor_printf(mon, ", vfio=%s", str_vfio);
}
monitor_printf(mon, "\n");
- monitor_printf(mon, " Pages: normal=%" PRIu64 ", zero=%" PRIu64
- ", rate_per_sec=%" PRIu64 "\n",
- info->ram->normal,
- info->ram->duplicate,
+ monitor_printf(mon, " Page Types: \tnormal=%" PRIu64
+ ", zero=%" PRIu64 "\n",
+ info->ram->normal, info->ram->duplicate);
+ monitor_printf(mon, " Page Rates (pps): \ttransfer=%" PRIu64,
info->ram->pages_per_second);
- monitor_printf(mon, " Others: dirty_syncs=%" PRIu64,
- info->ram->dirty_sync_count);
-
if (info->ram->dirty_pages_rate) {
- monitor_printf(mon, ", dirty_pages_rate=%" PRIu64,
+ monitor_printf(mon, ", dirty=%" PRIu64,
info->ram->dirty_pages_rate);
}
+ monitor_printf(mon, "\n");
+
+ monitor_printf(mon, " Others: \t\tdirty_syncs=%" PRIu64,
+ info->ram->dirty_sync_count);
if (info->ram->postcopy_requests) {
monitor_printf(mon, ", postcopy_req=%" PRIu64,
info->ram->postcopy_requests);
}
if (info->ram->downtime_bytes) {
- monitor_printf(mon, ", downtime_ram=%" PRIu64,
+ monitor_printf(mon, ", downtime_bytes=%" PRIu64,
info->ram->downtime_bytes);
}
if (info->ram->dirty_sync_missed_zero_copy) {
--
2.49.0
* [PATCH v2 02/11] migration/hmp: Fix postcopy-blocktime per-vCPU results
2025-06-09 16:18 [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Peter Xu
2025-06-09 16:18 ` [PATCH v2 01/11] migration/hmp: Reorg "info migrate" once more Peter Xu
@ 2025-06-09 16:18 ` Peter Xu
2025-07-01 14:38 ` Fabiano Rosas
2025-06-09 16:18 ` [PATCH v2 03/11] migration/docs: Move docs for postcopy blocktime feature Peter Xu
` (10 subsequent siblings)
12 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2025-06-09 16:18 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert, Fabiano Rosas,
Alexey Perevalov, Markus Armbruster
Unfortunately, the per-vCPU results were never shown correctly.
I only found this when I started to look into making the blocktime
feature more useful (so as to avoid using bpftrace, even though I'm not
sure which of the two will be harder to use..).
So the old dump would look like this:
Postcopy vCPU Blocktime: 0-1,4,10,21,33,46,48,59
There are actually 40 vCPUs, but the string output visitor merges
identical elements and also sorts them.
To fix it, simply loop over the uint32List manually. Now it looks like:
Postcopy vCPU Blocktime (ms):
[15, 0, 0, 43, 29, 34, 36, 29, 37, 41,
33, 37, 45, 52, 50, 38, 40, 37, 40, 49,
40, 35, 35, 35, 81, 19, 18, 19, 18, 30,
22, 3, 0, 0, 0, 0, 0, 0, 0, 0]
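For illustration, the printing logic boils down to something like the
following standalone sketch, using a stand-in list type instead of QEMU's
uint32List (not the patched code itself, which is in the diff below):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Stand-in for QEMU's uint32List: a singly linked list of uint32_t. */
typedef struct U32Node {
    uint32_t value;
    struct U32Node *next;
} U32Node;

static void dump_blocktime(const U32Node *item)
{
    const char *sep = "";
    int count = 0;

    printf("Postcopy vCPU Blocktime (ms):\n [");
    while (item) {
        printf("%s%" PRIu32, sep, item->value);
        item = item->next;
        /* Ten results per line; only wrap if more entries follow. */
        sep = ((++count % 10 == 0) && item) ? ",\n  " : ", ";
    }
    printf("]\n");
}

int main(void)
{
    U32Node c = { 34, NULL }, b = { 29, &c }, a = { 15, &b };

    dump_blocktime(&a);
    return 0;
}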
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Cc: Alexey Perevalov <a.perevalov@samsung.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration-hmp-cmds.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 367ff6037f..6c36e202a0 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -208,15 +208,19 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
}
if (info->has_postcopy_vcpu_blocktime) {
- Visitor *v;
- char *str;
- v = string_output_visitor_new(false, &str);
- visit_type_uint32List(v, NULL, &info->postcopy_vcpu_blocktime,
- &error_abort);
- visit_complete(v, &str);
- monitor_printf(mon, "Postcopy vCPU Blocktime: %s\n", str);
- g_free(str);
- visit_free(v);
+ uint32List *item = info->postcopy_vcpu_blocktime;
+ const char *sep = "";
+ int count = 0;
+
+ monitor_printf(mon, "Postcopy vCPU Blocktime (ms): \n [");
+
+ while (item) {
+ monitor_printf(mon, "%s%"PRIu32, sep, item->value);
+ item = item->next;
+ /* Each line 10 vcpu results, newline if there's more */
+ sep = ((++count % 10 == 0) && item) ? ",\n " : ", ";
+ }
+ monitor_printf(mon, "]\n");
}
out:
--
2.49.0
* [PATCH v2 03/11] migration/docs: Move docs for postcopy blocktime feature
2025-06-09 16:18 [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Peter Xu
2025-06-09 16:18 ` [PATCH v2 01/11] migration/hmp: Reorg "info migrate" once more Peter Xu
2025-06-09 16:18 ` [PATCH v2 02/11] migration/hmp: Fix postcopy-blocktime per-vCPU results Peter Xu
@ 2025-06-09 16:18 ` Peter Xu
2025-06-09 16:18 ` [PATCH v2 04/11] migration/bg-snapshot: Do not check for SKIP in iterator Peter Xu
` (9 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Peter Xu @ 2025-06-09 16:18 UTC (permalink / raw)
To: qemu-devel; +Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert, Fabiano Rosas
Move it out of the vanilla postcopy section and make it a standalone
feature section instead.
While at it, remove the NOTE, because it has become incorrect since the
introduction of max-postcopy-bandwidth, which can control the throughput
even during the postcopy phase.
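For reference, that throughput cap can be adjusted at runtime, e.g. via
HMP (a usage sketch; the parameter takes bytes per second, and 0, the
default, means unlimited):

(qemu) migrate_set_parameter max-postcopy-bandwidth 104857600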
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
docs/devel/migration/postcopy.rst | 36 +++++++++++++++----------------
1 file changed, 17 insertions(+), 19 deletions(-)
diff --git a/docs/devel/migration/postcopy.rst b/docs/devel/migration/postcopy.rst
index 82e7a848c6..e319388d8f 100644
--- a/docs/devel/migration/postcopy.rst
+++ b/docs/devel/migration/postcopy.rst
@@ -33,25 +33,6 @@ will now cause the transition from precopy to postcopy.
It can be issued immediately after migration is started or any
time later on. Issuing it after the end of a migration is harmless.
-Blocktime is a postcopy live migration metric, intended to show how
-long the vCPU was in state of interruptible sleep due to pagefault.
-That metric is calculated both for all vCPUs as overlapped value, and
-separately for each vCPU. These values are calculated on destination
-side. To enable postcopy blocktime calculation, enter following
-command on destination monitor:
-
-``migrate_set_capability postcopy-blocktime on``
-
-Postcopy blocktime can be retrieved by query-migrate qmp command.
-postcopy-blocktime value of qmp command will show overlapped blocking
-time for all vCPU, postcopy-vcpu-blocktime will show list of blocking
-time per vCPU.
-
-.. note::
- During the postcopy phase, the bandwidth limits set using
- ``migrate_set_parameter`` is ignored (to avoid delaying requested pages that
- the destination is waiting for).
-
Postcopy internals
==================
@@ -312,3 +293,20 @@ explicitly) to be sent in a separate preempt channel, rather than queued in
the background migration channel. Anyone who cares about latencies of page
faults during a postcopy migration should enable this feature. By default,
it's not enabled.
+
+Postcopy blocktime statistics
+-----------------------------
+
+Blocktime is a postcopy live migration metric, intended to show how
+long the vCPU was in state of interruptible sleep due to pagefault.
+That metric is calculated both for all vCPUs as overlapped value, and
+separately for each vCPU. These values are calculated on destination
+side. To enable postcopy blocktime calculation, enter following
+command on destination monitor:
+
+``migrate_set_capability postcopy-blocktime on``
+
+Postcopy blocktime can be retrieved by query-migrate qmp command.
+postcopy-blocktime value of qmp command will show overlapped blocking
+time for all vCPU, postcopy-vcpu-blocktime will show list of blocking
+time per vCPU.
--
2.49.0
* [PATCH v2 04/11] migration/bg-snapshot: Do not check for SKIP in iterator
2025-06-09 16:18 [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Peter Xu
` (2 preceding siblings ...)
2025-06-09 16:18 ` [PATCH v2 03/11] migration/docs: Move docs for postcopy blocktime feature Peter Xu
@ 2025-06-09 16:18 ` Peter Xu
2025-06-09 16:18 ` [PATCH v2 05/11] migration: Drop save_live_complete_postcopy hook Peter Xu
` (8 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Peter Xu @ 2025-06-09 16:18 UTC (permalink / raw)
To: qemu-devel; +Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert, Fabiano Rosas
The SKIP state can't happen in the bg-snapshot case.
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 4098870bce..e33e39ac74 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3887,9 +3887,8 @@ static void *bg_migration_thread(void *opaque)
while (migration_is_active()) {
MigIterateState iter_state = bg_migration_iteration_run(s);
- if (iter_state == MIG_ITERATE_SKIP) {
- continue;
- } else if (iter_state == MIG_ITERATE_BREAK) {
+
+ if (iter_state == MIG_ITERATE_BREAK) {
break;
}
--
2.49.0
* [PATCH v2 05/11] migration: Drop save_live_complete_postcopy hook
2025-06-09 16:18 [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Peter Xu
` (3 preceding siblings ...)
2025-06-09 16:18 ` [PATCH v2 04/11] migration/bg-snapshot: Do not check for SKIP in iterator Peter Xu
@ 2025-06-09 16:18 ` Peter Xu
2025-07-01 14:40 ` Fabiano Rosas
2025-06-09 16:18 ` [PATCH v2 06/11] migration: Rename save_live_complete_precopy to save_complete Peter Xu
` (7 subsequent siblings)
12 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2025-06-09 16:18 UTC (permalink / raw)
To: qemu-devel; +Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert, Fabiano Rosas
The hook is only defined by two vmstate users ("ram" and "block dirty
bitmap"), and both of them define it exactly the same as the precopy
version. Hence, this postcopy version isn't needed.
No functional change intended.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/migration/register.h | 24 ++++++++----------------
migration/block-dirty-bitmap.c | 1 -
migration/ram.c | 1 -
migration/savevm.c | 9 ++++-----
4 files changed, 12 insertions(+), 23 deletions(-)
diff --git a/include/migration/register.h b/include/migration/register.h
index b79dc81b8d..e022195785 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -77,26 +77,18 @@ typedef struct SaveVMHandlers {
*/
void (*save_cleanup)(void *opaque);
- /**
- * @save_live_complete_postcopy
- *
- * Called at the end of postcopy for all postcopyable devices.
- *
- * @f: QEMUFile where to send the data
- * @opaque: data pointer passed to register_savevm_live()
- *
- * Returns zero to indicate success and negative for error
- */
- int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
-
/**
* @save_live_complete_precopy
*
* Transmits the last section for the device containing any
- * remaining data at the end of a precopy phase. When postcopy is
- * enabled, devices that support postcopy will skip this step,
- * where the final data will be flushed at the end of postcopy via
- * @save_live_complete_postcopy instead.
+ * remaining data at the end phase of migration.
+ *
+ * For precopy, this will be invoked _during_ the switchover phase
+ * after source VM is stopped.
+ *
+ * For postcopy, this will be invoked _after_ the switchover phase
+ * (except some very unusual cases, like PMEM ramblocks), while
+ * destination VM can be running.
*
* @f: QEMUFile where to send the data
* @opaque: data pointer passed to register_savevm_live()
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index f2c352d4a7..6ee3c32a76 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1248,7 +1248,6 @@ static bool dirty_bitmap_has_postcopy(void *opaque)
static SaveVMHandlers savevm_dirty_bitmap_handlers = {
.save_setup = dirty_bitmap_save_setup,
- .save_live_complete_postcopy = dirty_bitmap_save_complete,
.save_live_complete_precopy = dirty_bitmap_save_complete,
.has_postcopy = dirty_bitmap_has_postcopy,
.state_pending_exact = dirty_bitmap_state_pending,
diff --git a/migration/ram.c b/migration/ram.c
index fd8d83b63c..8b43b9e1e8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4545,7 +4545,6 @@ void postcopy_preempt_shutdown_file(MigrationState *s)
static SaveVMHandlers savevm_ram_handlers = {
.save_setup = ram_save_setup,
.save_live_iterate = ram_save_iterate,
- .save_live_complete_postcopy = ram_save_complete,
.save_live_complete_precopy = ram_save_complete,
.has_postcopy = ram_has_postcopy,
.state_pending_exact = ram_state_pending_exact,
diff --git a/migration/savevm.c b/migration/savevm.c
index 52105dd2f1..dfb67bf679 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1485,9 +1485,8 @@ bool should_send_vmdesc(void)
}
/*
- * Calls the save_live_complete_postcopy methods
- * causing the last few pages to be sent immediately and doing any associated
- * cleanup.
+ * Complete saving any postcopy-able devices.
+ *
* Note postcopy also calls qemu_savevm_state_complete_precopy to complete
* all the other devices, but that happens at the point we switch to postcopy.
*/
@@ -1497,7 +1496,7 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
int ret;
QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
- if (!se->ops || !se->ops->save_live_complete_postcopy) {
+ if (!se->ops || !se->ops->save_live_complete_precopy) {
continue;
}
if (se->ops->is_active) {
@@ -1510,7 +1509,7 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
qemu_put_byte(f, QEMU_VM_SECTION_END);
qemu_put_be32(f, se->section_id);
- ret = se->ops->save_live_complete_postcopy(f, se->opaque);
+ ret = se->ops->save_live_complete_precopy(f, se->opaque);
trace_savevm_section_end(se->idstr, se->section_id, ret);
save_section_footer(f, se);
if (ret < 0) {
--
2.49.0
* [PATCH v2 06/11] migration: Rename save_live_complete_precopy to save_complete
2025-06-09 16:18 [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Peter Xu
` (4 preceding siblings ...)
2025-06-09 16:18 ` [PATCH v2 05/11] migration: Drop save_live_complete_postcopy hook Peter Xu
@ 2025-06-09 16:18 ` Peter Xu
2025-07-01 14:41 ` Fabiano Rosas
2025-06-09 16:18 ` [PATCH v2 07/11] migration: qemu_savevm_complete*() helpers Peter Xu
` (6 subsequent siblings)
12 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2025-06-09 16:18 UTC (permalink / raw)
To: qemu-devel; +Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert, Fabiano Rosas
Now that the precopy and postcopy versions of the complete() hook are
merged, rename the precopy version from save_live_complete_precopy() to
save_complete().
Drop the "live" while at it, because in most cases (precopy) the VM is
not live anymore when the hook is invoked.
No functional change intended.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/migration/register.h | 4 ++--
hw/ppc/spapr.c | 2 +-
hw/s390x/s390-stattrib.c | 2 +-
hw/vfio/migration.c | 2 +-
migration/block-dirty-bitmap.c | 2 +-
migration/ram.c | 2 +-
migration/savevm.c | 8 ++++----
7 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/include/migration/register.h b/include/migration/register.h
index e022195785..0510534515 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -78,7 +78,7 @@ typedef struct SaveVMHandlers {
void (*save_cleanup)(void *opaque);
/**
- * @save_live_complete_precopy
+ * @save_complete
*
* Transmits the last section for the device containing any
* remaining data at the end phase of migration.
@@ -95,7 +95,7 @@ typedef struct SaveVMHandlers {
*
* Returns zero to indicate success and negative for error
*/
- int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
+ int (*save_complete)(QEMUFile *f, void *opaque);
/**
* @save_live_complete_precopy_thread (invoked in a separate thread)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 702f774cda..c5d30f2ebd 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2518,7 +2518,7 @@ static void htab_save_cleanup(void *opaque)
static SaveVMHandlers savevm_htab_handlers = {
.save_setup = htab_save_setup,
.save_live_iterate = htab_save_iterate,
- .save_live_complete_precopy = htab_save_complete,
+ .save_complete = htab_save_complete,
.save_cleanup = htab_save_cleanup,
.load_state = htab_load,
};
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index f74cf32636..13a678a803 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -338,7 +338,7 @@ static const TypeInfo qemu_s390_stattrib_info = {
static SaveVMHandlers savevm_s390_stattrib_handlers = {
.save_setup = cmma_save_setup,
.save_live_iterate = cmma_save_iterate,
- .save_live_complete_precopy = cmma_save_complete,
+ .save_complete = cmma_save_complete,
.state_pending_exact = cmma_state_pending,
.state_pending_estimate = cmma_state_pending,
.save_cleanup = cmma_save_cleanup,
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index b76697bd1a..33a71f8999 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -824,7 +824,7 @@ static const SaveVMHandlers savevm_vfio_handlers = {
.state_pending_exact = vfio_state_pending_exact,
.is_active_iterate = vfio_is_active_iterate,
.save_live_iterate = vfio_save_iterate,
- .save_live_complete_precopy = vfio_save_complete_precopy,
+ .save_complete = vfio_save_complete_precopy,
.save_state = vfio_save_state,
.load_setup = vfio_load_setup,
.load_cleanup = vfio_load_cleanup,
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 6ee3c32a76..a061aad817 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1248,7 +1248,7 @@ static bool dirty_bitmap_has_postcopy(void *opaque)
static SaveVMHandlers savevm_dirty_bitmap_handlers = {
.save_setup = dirty_bitmap_save_setup,
- .save_live_complete_precopy = dirty_bitmap_save_complete,
+ .save_complete = dirty_bitmap_save_complete,
.has_postcopy = dirty_bitmap_has_postcopy,
.state_pending_exact = dirty_bitmap_state_pending,
.state_pending_estimate = dirty_bitmap_state_pending,
diff --git a/migration/ram.c b/migration/ram.c
index 8b43b9e1e8..ed380ac86f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4545,7 +4545,7 @@ void postcopy_preempt_shutdown_file(MigrationState *s)
static SaveVMHandlers savevm_ram_handlers = {
.save_setup = ram_save_setup,
.save_live_iterate = ram_save_iterate,
- .save_live_complete_precopy = ram_save_complete,
+ .save_complete = ram_save_complete,
.has_postcopy = ram_has_postcopy,
.state_pending_exact = ram_state_pending_exact,
.state_pending_estimate = ram_state_pending_estimate,
diff --git a/migration/savevm.c b/migration/savevm.c
index dfb67bf679..7dba367d33 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1496,7 +1496,7 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
int ret;
QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
- if (!se->ops || !se->ops->save_live_complete_precopy) {
+ if (!se->ops || !se->ops->save_complete) {
continue;
}
if (se->ops->is_active) {
@@ -1509,7 +1509,7 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
qemu_put_byte(f, QEMU_VM_SECTION_END);
qemu_put_be32(f, se->section_id);
- ret = se->ops->save_live_complete_precopy(f, se->opaque);
+ ret = se->ops->save_complete(f, se->opaque);
trace_savevm_section_end(se->idstr, se->section_id, ret);
save_section_footer(f, se);
if (ret < 0) {
@@ -1583,7 +1583,7 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
if (!se->ops ||
(in_postcopy && se->ops->has_postcopy &&
se->ops->has_postcopy(se->opaque)) ||
- !se->ops->save_live_complete_precopy) {
+ !se->ops->save_complete) {
continue;
}
@@ -1598,7 +1598,7 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
save_section_header(f, se, QEMU_VM_SECTION_END);
- ret = se->ops->save_live_complete_precopy(f, se->opaque);
+ ret = se->ops->save_complete(f, se->opaque);
trace_savevm_section_end(se->idstr, se->section_id, ret);
save_section_footer(f, se);
if (ret < 0) {
--
2.49.0
* [PATCH v2 07/11] migration: qemu_savevm_complete*() helpers
2025-06-09 16:18 [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Peter Xu
` (5 preceding siblings ...)
2025-06-09 16:18 ` [PATCH v2 06/11] migration: Rename save_live_complete_precopy to save_complete Peter Xu
@ 2025-06-09 16:18 ` Peter Xu
2025-07-01 14:53 ` Fabiano Rosas
2025-06-09 16:18 ` [PATCH v2 08/11] migration/ram: One less indent for ram_find_and_save_block() Peter Xu
` (5 subsequent siblings)
12 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2025-06-09 16:18 UTC (permalink / raw)
To: qemu-devel; +Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert, Fabiano Rosas
Since we use the same save_complete() hook for both precopy and postcopy,
add a set of helpers to invoke the hook and deduplicate the code.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/savevm.c | 78 ++++++++++++++++++++++++++--------------------
1 file changed, 44 insertions(+), 34 deletions(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index 7dba367d33..1efb7199c0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1484,6 +1484,38 @@ bool should_send_vmdesc(void)
return !machine->suppress_vmdesc;
}
+static bool qemu_savevm_complete_exists(SaveStateEntry *se)
+{
+ return se->ops && se->ops->save_complete;
+}
+
+/*
+ * Invoke the ->save_complete() if necessary.
+ * Returns: 0 if skip the current SE or succeeded, <0 if error happened.
+ */
+static int qemu_savevm_complete(SaveStateEntry *se, QEMUFile *f)
+{
+ int ret;
+
+ if (se->ops->is_active) {
+ if (!se->ops->is_active(se->opaque)) {
+ return 0;
+ }
+ }
+
+ trace_savevm_section_start(se->idstr, se->section_id);
+ save_section_header(f, se, QEMU_VM_SECTION_END);
+ ret = se->ops->save_complete(f, se->opaque);
+ trace_savevm_section_end(se->idstr, se->section_id, ret);
+ save_section_footer(f, se);
+
+ if (ret < 0) {
+ qemu_file_set_error(f, ret);
+ }
+
+ return ret;
+}
+
/*
* Complete saving any postcopy-able devices.
*
@@ -1493,27 +1525,13 @@ bool should_send_vmdesc(void)
void qemu_savevm_state_complete_postcopy(QEMUFile *f)
{
SaveStateEntry *se;
- int ret;
QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
- if (!se->ops || !se->ops->save_complete) {
+ if (!qemu_savevm_complete_exists(se)) {
continue;
}
- if (se->ops->is_active) {
- if (!se->ops->is_active(se->opaque)) {
- continue;
- }
- }
- trace_savevm_section_start(se->idstr, se->section_id);
- /* Section type */
- qemu_put_byte(f, QEMU_VM_SECTION_END);
- qemu_put_be32(f, se->section_id);
- ret = se->ops->save_complete(f, se->opaque);
- trace_savevm_section_end(se->idstr, se->section_id, ret);
- save_section_footer(f, se);
- if (ret < 0) {
- qemu_file_set_error(f, ret);
+ if (qemu_savevm_complete(se, f) < 0) {
return;
}
}
@@ -1559,7 +1577,6 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
{
int64_t start_ts_each, end_ts_each;
SaveStateEntry *se;
- int ret;
bool multifd_device_state = multifd_device_state_supported();
if (multifd_device_state) {
@@ -1580,32 +1597,25 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
}
QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
- if (!se->ops ||
- (in_postcopy && se->ops->has_postcopy &&
- se->ops->has_postcopy(se->opaque)) ||
- !se->ops->save_complete) {
+ if (!qemu_savevm_complete_exists(se)) {
continue;
}
- if (se->ops->is_active) {
- if (!se->ops->is_active(se->opaque)) {
- continue;
- }
+ if (in_postcopy && se->ops->has_postcopy &&
+ se->ops->has_postcopy(se->opaque)) {
+ /*
+ * If postcopy will start soon, and if the SE supports
+ * postcopy, then we can skip the SE for the postcopy phase.
+ */
+ continue;
}
start_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
- trace_savevm_section_start(se->idstr, se->section_id);
-
- save_section_header(f, se, QEMU_VM_SECTION_END);
-
- ret = se->ops->save_complete(f, se->opaque);
- trace_savevm_section_end(se->idstr, se->section_id, ret);
- save_section_footer(f, se);
- if (ret < 0) {
- qemu_file_set_error(f, ret);
+ if (qemu_savevm_complete(se, f) < 0) {
goto ret_fail_abort_threads;
}
end_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+
trace_vmstate_downtime_save("iterable", se->idstr, se->instance_id,
end_ts_each - start_ts_each);
}
--
2.49.0
* [PATCH v2 08/11] migration/ram: One less indent for ram_find_and_save_block()
2025-06-09 16:18 [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Peter Xu
` (6 preceding siblings ...)
2025-06-09 16:18 ` [PATCH v2 07/11] migration: qemu_savevm_complete*() helpers Peter Xu
@ 2025-06-09 16:18 ` Peter Xu
2025-06-09 16:18 ` [PATCH v2 09/11] migration/ram: Add tracepoints for ram_save_complete() Peter Xu
` (4 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Peter Xu @ 2025-06-09 16:18 UTC (permalink / raw)
To: qemu-devel; +Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert, Fabiano Rosas
The check for PAGE_DIRTY_FOUND isn't necessary. We can drop one level of
indentation and assert it instead.
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/ram.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index ed380ac86f..c66ad3cf8b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2285,16 +2285,18 @@ static int ram_find_and_save_block(RAMState *rs)
if (!get_queued_page(rs, pss)) {
/* priority queue empty, so just search for something dirty */
int res = find_dirty_block(rs, pss);
- if (res != PAGE_DIRTY_FOUND) {
- if (res == PAGE_ALL_CLEAN) {
- break;
- } else if (res == PAGE_TRY_AGAIN) {
- continue;
- } else if (res < 0) {
- pages = res;
- break;
- }
+
+ if (res == PAGE_ALL_CLEAN) {
+ break;
+ } else if (res == PAGE_TRY_AGAIN) {
+ continue;
+ } else if (res < 0) {
+ pages = res;
+ break;
}
+
+ /* Otherwise we must have a dirty page to move */
+ assert(res == PAGE_DIRTY_FOUND);
}
pages = ram_save_host_page(rs, pss);
if (pages) {
--
2.49.0
* [PATCH v2 09/11] migration/ram: Add tracepoints for ram_save_complete()
2025-06-09 16:18 [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Peter Xu
` (7 preceding siblings ...)
2025-06-09 16:18 ` [PATCH v2 08/11] migration/ram: One less indent for ram_find_and_save_block() Peter Xu
@ 2025-06-09 16:18 ` Peter Xu
2025-06-09 16:18 ` [PATCH v2 10/11] migration: Rewrite the migration complete detect logic Peter Xu
` (3 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Peter Xu @ 2025-06-09 16:18 UTC (permalink / raw)
To: qemu-devel; +Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert, Fabiano Rosas
Record the start/end state of the dirty page count for the whole system.
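For reference, the new tracepoint can be enabled at runtime like any other
migration tracepoint, e.g. via HMP (a usage sketch; it can also be enabled
with "-trace ram_save_complete" on the command line):

(qemu) trace-event ram_save_complete on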
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/ram.c | 5 +++++
migration/trace-events | 1 +
2 files changed, 6 insertions(+)
diff --git a/migration/ram.c b/migration/ram.c
index c66ad3cf8b..a1d0e8ada2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3289,6 +3289,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
RAMState *rs = *temp;
int ret = 0;
+ trace_ram_save_complete(rs->migration_dirty_pages, 0);
+
rs->last_stage = !migration_in_colo_state();
WITH_RCU_READ_LOCK_GUARD() {
@@ -3352,6 +3354,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
}
qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+
+ trace_ram_save_complete(rs->migration_dirty_pages, 1);
+
return qemu_fflush(f);
}
diff --git a/migration/trace-events b/migration/trace-events
index c506e11a2e..dcd8fe9a0c 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -105,6 +105,7 @@ ram_load_postcopy_loop(int channel, uint64_t addr, int flags) "chan=%d addr=0x%"
ram_postcopy_send_discard_bitmap(void) ""
ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p"
ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx"
+ram_save_complete(uint64_t dirty_pages, int done) "dirty=%" PRIu64 ", done=%d"
ram_dirty_bitmap_request(char *str) "%s"
ram_dirty_bitmap_reload_begin(char *str) "%s"
ram_dirty_bitmap_reload_complete(char *str) "%s"
--
2.49.0
* [PATCH v2 10/11] migration: Rewrite the migration complete detect logic
2025-06-09 16:18 [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Peter Xu
` (8 preceding siblings ...)
2025-06-09 16:18 ` [PATCH v2 09/11] migration/ram: Add tracepoints for ram_save_complete() Peter Xu
@ 2025-06-09 16:18 ` Peter Xu
2025-07-01 17:35 ` Fabiano Rosas
2025-06-09 16:18 ` [PATCH v2 11/11] migration/postcopy: Avoid clearing dirty bitmap for postcopy too Peter Xu
` (2 subsequent siblings)
12 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2025-06-09 16:18 UTC (permalink / raw)
To: qemu-devel; +Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert, Fabiano Rosas
There are a few things off in that logic; rewrite it. While at it, add
rich comments to explain each of the decisions.
Since this is a very sensitive path for migration, below is the list of
things changed, with their reasoning.
(1) The exact pending size is only needed for precopy, not postcopy
Fundamentally that's because the "exact" version only does one more
deep sync to fetch the pending results, while in the postcopy case it
will never sync anything beyond the estimate, as the VM on the source
is stopped.
(2) Do _not_ rely on threshold_size anymore to decide whether postcopy
should complete
threshold_size was calculated from the expected downtime and
bandwidth only during precopy, as an efficient way to decide when to
switch over. It's not sensible to rely on threshold_size in postcopy.
For precopy, once switchover is decided, the migration will complete
soon. That's not true for postcopy. Logically speaking, postcopy
should only complete the migration once all pending data is flushed.
Here it used to work because save_complete() used to implicitly do
the work of save_live_iterate() when there was still pending data.
Even if that looks benign, having RAM still to be migrated in
postcopy's save_complete() has other bad side effects:
(a) Since save_complete() hooks run one at a time, while moving RAM
there's no way to move anything else (unlike the round-robin iteration
over the vmstate handlers that we do in the ITERABLE phase). Not an
immediate concern, but it may stop working in the future when there's
more than one iterable (e.g. vfio postcopy).
(b) Postcopy recovery, unfortunately, only works during the ITERABLE
phase. IOW, if the src QEMU moves RAM during postcopy's save_complete()
and the network fails, it'll crash both QEMUs... OTOH if it fails
during iteration it's still recoverable. IOW, this change should
further reduce the window in which the QEMUs can split-brain and crash
in extreme cases.
If we enable the ram_save_complete() tracepoints, we'll see this
before this patch:
1267959@1748381938.294066:ram_save_complete dirty=9627, done=0
1267959@1748381938.308884:ram_save_complete dirty=0, done=1
It means that in this migration, 9627 pages were migrated in complete()
of the postcopy phase.
After this change, all the postcopy RAM should be migrated in the
iterable phase, rather than in save_complete():
1267959@1748381938.294066:ram_save_complete dirty=0, done=0
1267959@1748381938.308884:ram_save_complete dirty=0, done=1
(3) Adjust when to decide to switch to postcopy
This shouldn't be super important; the movement makes sure there's
only one in_postcopy check, so we are clear on what we do with the
two completely different use cases (precopy vs. postcopy).
(4) Trivial touch-up of the threshold_size comparison
Which changes:
"(!pending_size || pending_size < s->threshold_size)"
into:
"(pending_size <= s->threshold_size)"
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 56 +++++++++++++++++++++++++++++++------------
1 file changed, 41 insertions(+), 15 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index e33e39ac74..1a26a4bfef 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3436,33 +3436,59 @@ static MigIterateState migration_iteration_run(MigrationState *s)
Error *local_err = NULL;
bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
bool can_switchover = migration_can_switchover(s);
+ bool complete_ready;
+ /* Fast path - get the estimated amount of pending data */
qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy);
pending_size = must_precopy + can_postcopy;
trace_migrate_pending_estimate(pending_size, must_precopy, can_postcopy);
- if (pending_size < s->threshold_size) {
- qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
- pending_size = must_precopy + can_postcopy;
- trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
+ if (in_postcopy) {
+ /*
+ * Iterate in postcopy until all pending data flushed. Note that
+ * postcopy completion doesn't rely on can_switchover, because when
+ * POSTCOPY_ACTIVE it means switchover already happened.
+ */
+ complete_ready = !pending_size;
+ } else {
+ /*
+ * Exact pending reporting is only needed for precopy. Taking RAM
+ * as example, there'll be no extra dirty information after
+ * postcopy started, so ESTIMATE should always match with EXACT
+ * during postcopy phase.
+ */
+ if (pending_size < s->threshold_size) {
+ qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
+ pending_size = must_precopy + can_postcopy;
+ trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
+ }
+
+ /* Should we switch to postcopy now? */
+ if (must_precopy <= s->threshold_size &&
+ can_switchover && qatomic_read(&s->start_postcopy)) {
+ if (postcopy_start(s, &local_err)) {
+ migrate_set_error(s, local_err);
+ error_report_err(local_err);
+ }
+ return MIG_ITERATE_SKIP;
+ }
+
+ /*
+ * For precopy, migration can complete only if:
+ *
+ * (1) Switchover is acknowledged by destination
+ * (2) Pending size is no more than the threshold specified
+ * (which was calculated from expected downtime)
+ */
+ complete_ready = can_switchover && (pending_size <= s->threshold_size);
}
- if ((!pending_size || pending_size < s->threshold_size) && can_switchover) {
+ if (complete_ready) {
trace_migration_thread_low_pending(pending_size);
migration_completion(s);
return MIG_ITERATE_BREAK;
}
- /* Still a significant amount to transfer */
- if (!in_postcopy && must_precopy <= s->threshold_size && can_switchover &&
- qatomic_read(&s->start_postcopy)) {
- if (postcopy_start(s, &local_err)) {
- migrate_set_error(s, local_err);
- error_report_err(local_err);
- }
- return MIG_ITERATE_SKIP;
- }
-
/* Just another iteration step */
qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
return MIG_ITERATE_RESUME;
--
2.49.0
* [PATCH v2 11/11] migration/postcopy: Avoid clearing dirty bitmap for postcopy too
2025-06-09 16:18 [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Peter Xu
` (9 preceding siblings ...)
2025-06-09 16:18 ` [PATCH v2 10/11] migration: Rewrite the migration complete detect logic Peter Xu
@ 2025-06-09 16:18 ` Peter Xu
2025-06-11 6:15 ` [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Mario Casquero
2025-06-11 21:35 ` Peter Xu
12 siblings, 0 replies; 22+ messages in thread
From: Peter Xu @ 2025-06-09 16:18 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert, Fabiano Rosas,
Yanfei Xu
This is a follow-up to the other commit "migration/ram: avoid to do log
clear in the last round", but for postcopy.
https://lore.kernel.org/r/20250514115827.3216082-1-yanfei.xu@bytedance.com
I can observe a more than 10% reduction in average page fault latency
during the postcopy phase with this optimization:
Before: 268.00us (+-1.87%)
After: 232.67us (+-2.01%)
The test was done with a 16GB VM with 80 vCPUs, running a workload doing
busy random writes to 13GB of memory (see the sketch below).
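For reference, the dirtying workload was along the lines of the following
standalone sketch (buffer size, thread count and page size here are
illustrative, not the exact harness used for the numbers above):

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_THREADS  8
#define BUF_SIZE    (13ULL << 30)       /* ~13GB of memory to keep dirty */
#define PAGE_SHIFT  12

static uint8_t *buf;

static void *dirty_worker(void *arg)
{
    unsigned int seed = (unsigned int)(uintptr_t)arg;

    for (;;) {
        /* Touch one random 4K page per iteration to keep it dirty. */
        uint64_t page = (uint64_t)rand_r(&seed) % (BUF_SIZE >> PAGE_SHIFT);
        buf[page << PAGE_SHIFT] = (uint8_t)page;
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NR_THREADS];

    buf = malloc(BUF_SIZE);
    if (!buf) {
        return 1;
    }
    for (int i = 0; i < NR_THREADS; i++) {
        pthread_create(&threads[i], NULL, dirty_worker,
                       (void *)(uintptr_t)(i + 1));
    }
    pthread_join(threads[0], NULL);     /* runs until killed */
    return 0;
}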
Cc: Yanfei Xu <yanfei.xu@bytedance.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/ram.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/migration/ram.c b/migration/ram.c
index a1d0e8ada2..cd4aafd15c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -835,8 +835,10 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
* protections isn't needed as we know there will be either (1) no
* further writes if migration will complete, or (2) migration fails
* at last then tracking isn't needed either.
+ *
+ * Do the same for postcopy due to the same reason.
*/
- if (!rs->last_stage) {
+ if (!rs->last_stage && !migration_in_postcopy()) {
/*
* Clear dirty bitmap if needed. This _must_ be called before we
* send any of the page in the chunk because we need to make sure
--
2.49.0
* Re: [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1
2025-06-09 16:18 [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Peter Xu
` (10 preceding siblings ...)
2025-06-09 16:18 ` [PATCH v2 11/11] migration/postcopy: Avoid clearing dirty bitmap for postcopy too Peter Xu
@ 2025-06-11 6:15 ` Mario Casquero
2025-06-11 13:06 ` Peter Xu
2025-06-11 21:35 ` Peter Xu
12 siblings, 1 reply; 22+ messages in thread
From: Mario Casquero @ 2025-06-11 6:15 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Juraj Marcin, Dr . David Alan Gilbert, Fabiano Rosas
This series has been successfully tested. The information displayed
from the HMP info migrate command is more user-friendly, with the
possibility of displaying the globals with info migrate -a.
(qemu) info migrate -a
Status: active
Sockets: [
tcp::::8888
]
Globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
send-switchover-start: on
clear-bitmap-shift: 18
Tested-by: Mario Casquero <mcasquer@redhat.com>
On Mon, Jun 9, 2025 at 6:20 PM Peter Xu <peterx@redhat.com> wrote:
>
> v2:
> - Collected R-bs
> - Avoid using "\b" in HMP dumps [Markus, Dave]
>
> The series is based on a small patch from Yanfei Xu here:
>
> Based-on: <20250514115827.3216082-1-yanfei.xu@bytedance.com>
> https://lore.kernel.org/r/20250514115827.3216082-1-yanfei.xu@bytedance.com
>
> This is a series that collected many of either enhancements or cleanups I
> got for QEMU 10.1, which almost came from when working on the last patch.
>
> The last patch, which is a oneliner, can further reduce 10% postcopy page
> fault latency with preempt mode enabled.
>
> Before: 268.00us (+-1.87%)
> After: 232.67us (+-2.01%)
>
> The patch layout is as following:
>
> Patch 1: A follow up of HMP change for "info migrate", per
> suggestion from Dave
> Patch 2: Yet another HMP fix for blocktime displays
> Patch 3-10: Cleanups everywhere, especially please take a look at
> patch 10 which changes the core switchover decision logic
> Patch 11: The one-liner optimization
>
> Comments welcomed, thanks.
>
> Peter Xu (11):
> migration/hmp: Reorg "info migrate" once more
> migration/hmp: Fix postcopy-blocktime per-vCPU results
> migration/docs: Move docs for postcopy blocktime feature
> migration/bg-snapshot: Do not check for SKIP in iterator
> migration: Drop save_live_complete_postcopy hook
> migration: Rename save_live_complete_precopy to save_complete
> migration: qemu_savevm_complete*() helpers
> migration/ram: One less indent for ram_find_and_save_block()
> migration/ram: Add tracepoints for ram_save_complete()
> migration: Rewrite the migration complete detect logic
> migration/postcopy: Avoid clearing dirty bitmap for postcopy too
>
> docs/devel/migration/postcopy.rst | 36 +++++++-------
> include/migration/register.h | 26 ++++------
> hw/ppc/spapr.c | 2 +-
> hw/s390x/s390-stattrib.c | 2 +-
> hw/vfio/migration.c | 2 +-
> migration/block-dirty-bitmap.c | 3 +-
> migration/migration-hmp-cmds.c | 81 ++++++++++++++++--------------
> migration/migration.c | 61 ++++++++++++++++-------
> migration/ram.c | 32 +++++++-----
> migration/savevm.c | 83 +++++++++++++++++--------------
> migration/trace-events | 1 +
> 11 files changed, 184 insertions(+), 145 deletions(-)
>
> --
> 2.49.0
>
>
* Re: [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1
2025-06-11 6:15 ` [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Mario Casquero
@ 2025-06-11 13:06 ` Peter Xu
2025-06-12 10:35 ` Mario Casquero
0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2025-06-11 13:06 UTC (permalink / raw)
To: Mario Casquero
Cc: qemu-devel, Juraj Marcin, Dr . David Alan Gilbert, Fabiano Rosas
On Wed, Jun 11, 2025 at 08:15:55AM +0200, Mario Casquero wrote:
> This series has been successfully tested. The information displayed
> from the HMP info migrate command is more user-friendly, with the
> possibility of displaying the globals with info migrate -a.
> (qemu) info migrate -a
> Status: active
> Sockets: [
> tcp::::8888
> ]
> Globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> send-switchover-start: on
> clear-bitmap-shift: 18
>
> Tested-by: Mario Casquero <mcasquer@redhat.com>
Hey, Mario,
Thanks for doing this!
This is a specific HMP dump test on the recv side; just to mention, the
major change will be on the src side, so feel free to try that too.
That's what patch 1 does.
Patch 2 changed the recv side report for blocktime, but in your case you
didn't enable it. To cover patch 2, you can enable the postcopy-blocktime
feature and kick off a postcopy migration.
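Roughly, on top of an otherwise normal precopy setup (a command sketch;
postcopy-ram needs to be set on both sides, postcopy-blocktime on the
destination, and the URI adjusted to your setup):

(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate_set_capability postcopy-blocktime on
(qemu) migrate -d tcp:<dest-host>:8888
(qemu) migrate_start_postcopy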
And just to mention, the real meat in this series is actually the last
patch. :) If you want to test that, you'd likely want to apply another of
my series:
https://lore.kernel.org/r/20250609191259.9053-1-peterx@redhat.com
Then run a postcopy test with some load, and check the blocktime reports
again. The other series added latency tracking to blocktime. With that
extra series applied, you should be able to observe the average page fault
latency reduction from the last patch, aka, the meat.
Note that this is not a request to have you test everything! I'm just
mentioning the bits from a test perspective, so take it as an FYI. I
already appreciate your help testing the recv side!
Thanks,
--
Peter Xu
* Re: [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1
2025-06-09 16:18 [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Peter Xu
` (11 preceding siblings ...)
2025-06-11 6:15 ` [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Mario Casquero
@ 2025-06-11 21:35 ` Peter Xu
12 siblings, 0 replies; 22+ messages in thread
From: Peter Xu @ 2025-06-11 21:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Juraj Marcin, Dr . David Alan Gilbert, Fabiano Rosas
On Mon, Jun 09, 2025 at 12:18:44PM -0400, Peter Xu wrote:
> v2:
> - Collected R-bs
> - Avoid using "\b" in HMP dumps [Markus, Dave]
>
> The series is based on a small patch from Yanfei Xu here:
>
> Based-on: <20250514115827.3216082-1-yanfei.xu@bytedance.com>
> https://lore.kernel.org/r/20250514115827.3216082-1-yanfei.xu@bytedance.com
>
> This is a series that collected many of either enhancements or cleanups I
> got for QEMU 10.1, which almost came from when working on the last patch.
>
> The last patch, which is a oneliner, can further reduce 10% postcopy page
> fault latency with preempt mode enabled.
>
> Before: 268.00us (+-1.87%)
> After: 232.67us (+-2.01%)
>
> The patch layout is as following:
>
> Patch 1: A follow up of HMP change for "info migrate", per
> suggestion from Dave
> Patch 2: Yet another HMP fix for blocktime displays
> Patch 3-10: Cleanups everywhere, especially please take a look at
> patch 10 which changes the core switchover decision logic
> Patch 11: The one-liner optimization
>
> Comments welcomed, thanks.
>
> Peter Xu (11):
> migration/hmp: Reorg "info migrate" once more
> migration/hmp: Fix postcopy-blocktime per-vCPU results
> migration/docs: Move docs for postcopy blocktime feature
> migration/bg-snapshot: Do not check for SKIP in iterator
> migration: Drop save_live_complete_postcopy hook
> migration: Rename save_live_complete_precopy to save_complete
> migration: qemu_savevm_complete*() helpers
> migration/ram: One less indent for ram_find_and_save_block()
> migration/ram: Add tracepoints for ram_save_complete()
> migration: Rewrite the migration complete detect logic
> migration/postcopy: Avoid clearing dirty bitmap for postcopy too
There are two checkpatch issues that need fixing. Two fixups are needed,
as below: one removes a space, the other fixes an over-80-character line.
I'll squash them in if I send a new version.
Sorry for the noise.
===8<===
From 25356e1262006fd668ba4e29b01325b5e784e19a Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Wed, 11 Jun 2025 17:23:00 -0400
Subject: [PATCH] fixup! migration/hmp: Fix postcopy-blocktime per-vCPU results
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration-hmp-cmds.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 6c36e202a0..867e017b32 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -212,7 +212,7 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
const char *sep = "";
int count = 0;
- monitor_printf(mon, "Postcopy vCPU Blocktime (ms): \n [");
+ monitor_printf(mon, "Postcopy vCPU Blocktime (ms):\n [");
while (item) {
monitor_printf(mon, "%s%"PRIu32, sep, item->value);
--
2.49.0
From 58dfb3e311fb477732d0f109886d02adcb439e14 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Wed, 11 Jun 2025 17:23:38 -0400
Subject: [PATCH] fixup! migration: Rewrite the migration complete detect logic
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/migration/migration.c b/migration/migration.c
index 1a26a4bfef..923400f801 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3460,7 +3460,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
if (pending_size < s->threshold_size) {
qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
pending_size = must_precopy + can_postcopy;
- trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
+ trace_migrate_pending_exact(pending_size, must_precopy,
+ can_postcopy);
}
/* Should we switch to postcopy now? */
--
2.49.0
--
Peter Xu
* Re: [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1
2025-06-11 13:06 ` Peter Xu
@ 2025-06-12 10:35 ` Mario Casquero
2025-06-12 11:43 ` Peter Xu
0 siblings, 1 reply; 22+ messages in thread
From: Mario Casquero @ 2025-06-12 10:35 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Juraj Marcin, Dr . David Alan Gilbert, Fabiano Rosas
Hello Peter,
Thanks for pointing this out! I retested it with the series you
mentioned and everything works fine.
Booted up 2 VMs as usual, one on the source and one on the destination
with -incoming defer. Set the postcopy-blocktime and postcopy-ram
capabilities and query them to verify that they are enabled.
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate_set_capability postcopy-blocktime on
(qemu) info migrate_capabilities
...
postcopy-ram: on
...
postcopy-blocktime: on
...
Do the migration with postcopy; this time check the full info migrate on the source.
(qemu) info migrate -a
Status: postcopy-active
Time (ms): total=6522, setup=33, down=16
RAM info:
Throughput (Mbps): 949.60
Sizes: pagesize=4 KiB, total=16 GiB
Transfers: transferred=703 MiB, remain=5.4 GiB
Channels: precopy=111 MiB, multifd=0 B, postcopy=592 MiB
Page Types: normal=178447, zero=508031
Page Rates (pps): transfer=167581
Others: dirty_syncs=2, postcopy_req=1652
Globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
send-switchover-start: on
clear-bitmap-shift: 18
Once migration is completed, compare the postcopy blocktime differences
on the destination.
(qemu) info migrate -a
Status: completed
Globals:
...
Postcopy Blocktime (ms): 712
Postcopy vCPU Blocktime (ms):
[1633, 1635, 1710, 2097, 2595, 1993, 1958, 1214]
With all the series applied and same VM:
(qemu) info migrate -a
Status: completed
Globals:
...
Postcopy Blocktime (ms): 134
Postcopy vCPU Blocktime (ms):
[1310, 1064, 1112, 1400, 1334, 756, 1216, 1420]
Postcopy Latency (us): 16075
Postcopy non-vCPU Latencies (us): 14743
Postcopy vCPU Latencies (us):
[24730, 25350, 27125, 25930, 23825, 29110, 22960, 26304]
Indeed the Postcopy Blocktime has been reduced a lot :)
Thanks,
Mario
On Wed, Jun 11, 2025 at 3:06 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Jun 11, 2025 at 08:15:55AM +0200, Mario Casquero wrote:
> > This series has been successfully tested. The information displayed
> > from the HMP info migrate command is more user-friendly, with the
> > possibility of displaying the globals with info migrate -a.
> > (qemu) info migrate -a
> > Status: active
> > Sockets: [
> > tcp::::8888
> > ]
> > Globals:
> > store-global-state: on
> > only-migratable: off
> > send-configuration: on
> > send-section-footer: on
> > send-switchover-start: on
> > clear-bitmap-shift: 18
> >
> > Tested-by: Mario Casquero <mcasquer@redhat.com>
>
> Hey, Mario,
>
> Thanks for doing this!
>
> This is a specific HMP dump test on recv side, just to mention the major
> change will be on the src side, so feel free to try that too. That's what
> patch 1 does.
>
> Patch 2 changed recv side report for blocktime, but in your case you didn't
> enable it, to cover tests on patch 2, you can enable postcopy-blocktime
> feature and kickoff a postcopy migration.
>
> And just to mention, the real meat in this series is actually the last
> patch. :) If you want to test that, you'd likely want to apply another of
> my series:
>
> https://lore.kernel.org/r/20250609191259.9053-1-peterx@redhat.com
>
> Then invoke a postcopy test with some load, and check the blocktime
> reports again. The other series added latency tracking to blocktime.
> With that extra series applied, you should be able to observe the average
> page fault latency reduction after the last patch, aka, the meat.
>
> Note that this is not a request to have you test everything! Just
> mentioning the bits from a test perspective, so take it as an FYI. I
> already appreciate your help testing the recv side!
>
> Thanks,
>
> --
> Peter Xu
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1
2025-06-12 10:35 ` Mario Casquero
@ 2025-06-12 11:43 ` Peter Xu
0 siblings, 0 replies; 22+ messages in thread
From: Peter Xu @ 2025-06-12 11:43 UTC (permalink / raw)
To: Mario Casquero
Cc: qemu-devel, Juraj Marcin, Dr . David Alan Gilbert, Fabiano Rosas
On Thu, Jun 12, 2025 at 12:35:46PM +0200, Mario Casquero wrote:
> Hello Peter,
Hi, Mario,
>
> Thanks for pointing this out! I retested it with the series you
> mentioned and everything works fine.
>
> Booted up 2 VMs as usual, one on the source and one on the destination with
> -incoming defer. Set the postcopy-blocktime and postcopy-ram
> capabilities and query them to verify that they are enabled.
>
> (qemu) migrate_set_capability postcopy-ram on
> (qemu) migrate_set_capability postcopy-blocktime on
> (qemu) info migrate_capabilities
>
> ...
> postcopy-ram: on
> ...
> postcopy-blocktime: on
> ...
>
> Do the migration with postcopy; this time check the full info migrate on the source.
> (qemu) info migrate -a
> Status: postcopy-active
> Time (ms): total=6522, setup=33, down=16
> RAM info:
> Throughput (Mbps): 949.60
> Sizes: pagesize=4 KiB, total=16 GiB
> Transfers: transferred=703 MiB, remain=5.4 GiB
> Channels: precopy=111 MiB, multifd=0 B, postcopy=592 MiB
> Page Types: normal=178447, zero=508031
> Page Rates (pps): transfer=167581
> Others: dirty_syncs=2, postcopy_req=1652
> Globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> send-switchover-start: on
> clear-bitmap-shift: 18
>
> Once migration is completed, compare the postcopy blocktime differences
> on the destination.
>
> (qemu) info migrate -a
> Status: completed
> Globals:
> ...
> Postcopy Blocktime (ms): 712
> Postcopy vCPU Blocktime (ms):
> [1633, 1635, 1710, 2097, 2595, 1993, 1958, 1214]
>
> With all the series applied and the same VM:
>
> (qemu) info migrate -a
> Status: completed
> Globals:
> ...
> Postcopy Blocktime (ms): 134
> Postcopy vCPU Blocktime (ms):
> [1310, 1064, 1112, 1400, 1334, 756, 1216, 1420]
> Postcopy Latency (us): 16075
Here the latency is 16ms; my fault here - I forgot to ask you to enable
postcopy-preempt as well, sorry.
The optimization won't help much without preempt, because it works at the
tens-of-microseconds level, so it can easily be buried in the noise without
preempt mode. It's suggested to always enable preempt mode for a postcopy
migration whenever available.
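For reference, a minimal sketch of the HMP steps to enable it, following the
same flow as the test above (postcopy-preempt is the capability name; setting
it on both source and destination before starting the migration is assumed
here):
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate_set_capability postcopy-blocktime on
(qemu) migrate_set_capability postcopy-preempt on
(qemu) info migrate_capabilities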
> Postcopy non-vCPU Latencies (us): 14743
> Postcopy vCPU Latencies (us):
> [24730, 25350, 27125, 25930, 23825, 29110, 22960, 26304]
>
> Indeed the Postcopy Blocktime has been reduced a lot :)
I haven't compared blocktime before; I'm surprised it changed that much.
Though maybe you didn't really run any workload inside? In that case the
results can be unpredictable. The perf test would make more sense if you
run some load, so that the majority of the faults triggered are not ad-hoc
system probes but something more predictable. I normally use mig_mon [1]
with something like this:
[1] https://github.com/xzpeter/mig_mon
$ ./mig_mon mm_dirty -m 13G -p random
This first pre-faults the whole memory with writes using all the CPUs,
then dirties the 13G of memory single-threaded as fast as possible in a
random fashion.
What I did for the test was apply both series and then revert the last
patch of the 1st series, as the "postcopy-latency" metric wasn't around
before applying the 2nd series - otherwise you'd need to use some kernel
tracepoints. This is definitely an awkward series to test with the two
entangled.
Again, feel free to skip that, just FYI!
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 02/11] migration/hmp: Fix postcopy-blocktime per-vCPU results
2025-06-09 16:18 ` [PATCH v2 02/11] migration/hmp: Fix postcopy-blocktime per-vCPU results Peter Xu
@ 2025-07-01 14:38 ` Fabiano Rosas
0 siblings, 0 replies; 22+ messages in thread
From: Fabiano Rosas @ 2025-07-01 14:38 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert, Alexey Perevalov,
Markus Armbruster
Peter Xu <peterx@redhat.com> writes:
> Unfortunately, it was never correctly shown..
>
> This was only found when I started to look into making the blocktime
> feature more useful (so as to avoid using bpftrace, even though I'm not
> sure which one will be harder to use..).
>
> So the old dump would look like this:
>
> Postcopy vCPU Blocktime: 0-1,4,10,21,33,46,48,59
>
> Even though there are actually 40 vCPUs; the string output merges
> identical elements and also sorts them.
>
> To fix it, simply loop over the uint32List manually. Now it looks like:
>
> Postcopy vCPU Blocktime (ms):
> [15, 0, 0, 43, 29, 34, 36, 29, 37, 41,
> 33, 37, 45, 52, 50, 38, 40, 37, 40, 49,
> 40, 35, 35, 35, 81, 19, 18, 19, 18, 30,
> 22, 3, 0, 0, 0, 0, 0, 0, 0, 0]
>
> Cc: Dr. David Alan Gilbert <dave@treblig.org>
> Cc: Alexey Perevalov <a.perevalov@samsung.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
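For illustration, a standalone sketch of the manual loop described in the
quoted commit message; the struct below only mimics the QAPI uint32List shape
(value/next) and the function is an assumption, not the actual HMP code:
#include <stdio.h>

/* Mimics the QAPI uint32List shape (value + next); not QEMU's type. */
typedef struct U32List {
    struct U32List *next;
    unsigned int value;
} U32List;

/* Print one entry per vCPU, wrapping every 10 values as in the dump above. */
void dump_vcpu_blocktime(const U32List *list)
{
    int i;

    printf("Postcopy vCPU Blocktime (ms):\n [");
    for (i = 0; list; list = list->next, i++) {
        if (i) {
            printf(i % 10 ? ", " : ",\n  ");
        }
        printf("%u", list->value);
    }
    printf("]\n");
}

int main(void)
{
    U32List c = { NULL, 43 }, b = { &c, 0 }, a = { &b, 15 };

    dump_vcpu_blocktime(&a);    /* prints: [15, 0, 43] */
    return 0;
}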
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 05/11] migration: Drop save_live_complete_postcopy hook
2025-06-09 16:18 ` [PATCH v2 05/11] migration: Drop save_live_complete_postcopy hook Peter Xu
@ 2025-07-01 14:40 ` Fabiano Rosas
0 siblings, 0 replies; 22+ messages in thread
From: Fabiano Rosas @ 2025-07-01 14:40 UTC (permalink / raw)
To: Peter Xu, qemu-devel; +Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert
Peter Xu <peterx@redhat.com> writes:
> The hook is only defined in two vmstate users ("ram" and "block dirty
> bitmap"), and both of them define the hook exactly the same as the
> precopy version. Hence, the postcopy version isn't needed.
>
> No functional change intended.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 06/11] migration: Rename save_live_complete_precopy to save_complete
2025-06-09 16:18 ` [PATCH v2 06/11] migration: Rename save_live_complete_precopy to save_complete Peter Xu
@ 2025-07-01 14:41 ` Fabiano Rosas
0 siblings, 0 replies; 22+ messages in thread
From: Fabiano Rosas @ 2025-07-01 14:41 UTC (permalink / raw)
To: Peter Xu, qemu-devel; +Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert
Peter Xu <peterx@redhat.com> writes:
> Now that the precopy and postcopy versions of the complete() hook are
> merged, rename the precopy version from save_live_complete_precopy() to
> save_complete().
>
> Drop the "live" while at it, because in most cases it's not live when it
> happens (in precopy).
>
> No functional change intended.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 07/11] migration: qemu_savevm_complete*() helpers
2025-06-09 16:18 ` [PATCH v2 07/11] migration: qemu_savevm_complete*() helpers Peter Xu
@ 2025-07-01 14:53 ` Fabiano Rosas
0 siblings, 0 replies; 22+ messages in thread
From: Fabiano Rosas @ 2025-07-01 14:53 UTC (permalink / raw)
To: Peter Xu, qemu-devel; +Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert
Peter Xu <peterx@redhat.com> writes:
> Since we use the same save_complete() hook for both precopy and postcopy,
> add a set of helpers to invoke the hook and dedup the code.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
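For illustration only, a standalone sketch of the dedup idea in the quoted
commit message: both the precopy and postcopy completion paths funnel through
one routine that walks the handlers and invokes the unified save_complete()
hook. The types and names below are assumptions, not the helpers the patch
actually adds:
/* Illustrative handler shape; not QEMU's real vmstate handler type. */
typedef struct Handler {
    const char *name;
    int (*save_complete)(void *opaque);
    void *opaque;
    struct Handler *next;
} Handler;

/* Shared by the precopy and postcopy completion paths. */
int savevm_complete_all(Handler *h)
{
    for (; h; h = h->next) {
        if (!h->save_complete) {
            continue;           /* device has no completion hook */
        }
        int ret = h->save_complete(h->opaque);
        if (ret < 0) {
            return ret;         /* propagate the first failure */
        }
    }
    return 0;
}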
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 10/11] migration: Rewrite the migration complete detect logic
2025-06-09 16:18 ` [PATCH v2 10/11] migration: Rewrite the migration complete detect logic Peter Xu
@ 2025-07-01 17:35 ` Fabiano Rosas
0 siblings, 0 replies; 22+ messages in thread
From: Fabiano Rosas @ 2025-07-01 17:35 UTC (permalink / raw)
To: Peter Xu, qemu-devel; +Cc: Juraj Marcin, peterx, Dr . David Alan Gilbert
Peter Xu <peterx@redhat.com> writes:
> There are a few things off in that logic; rewrite it. While at it, add
> rich comments to explain each of the decisions.
>
> Since this is a very sensitive path for migration, below is the list of
> things changed with their reasoning.
>
> (1) Exact pending size is only needed for precopy, not postcopy
>
> Fundamentally it's because the "exact" version only does one more deep
> sync to fetch the pending results, while in postcopy's case it's never
> going to sync anything more than the estimate, as the VM on the source
> is stopped.
>
> (2) Do _not_ rely on threshold_size anymore to decide whether postcopy
> should complete
>
> threshold_size was calculated from the expected downtime and
> bandwidth only during precopy, as an efficient way to decide when to
> switch over. It's not sensible to rely on threshold_size in postcopy.
>
> For precopy, if switchover is decided, the migration will complete
> soon. It's not true for postcopy. Logically speaking, postcopy
> should only complete the migration if all pending data is flushed.
>
> Here it used to work because save_complete() used to implicitly
> contain save_live_iterate() when there was still pending data.
>
> Even if that looks benign, having RAM still to be migrated in postcopy's
> save_complete() has other bad side effects:
>
> (a) Since save_complete() hooks need to run one at a time, when moving
> RAM there's no way to move anything else concurrently (as opposed to
> round-robin iterating the vmstate handlers like what we do in the
> ITERABLE phase). Not an immediate concern, but it may stop working
> in the future when there's more than one iterable (e.g. vfio
> postcopy).
>
> (b) Postcopy recovery, unfortunately, only works during the ITERABLE
> phase. IOW, if src QEMU moves RAM during postcopy's save_complete()
> and the network fails, it'll crash both QEMUs... OTOH, if it fails
> during iteration it's still recoverable. IOW, this change should
> further reduce the window where QEMU can split-brain and crash in
> extreme cases.
>
> If we enable the ram_save_complete() tracepoints, we'll see this
> before this patch:
>
> 1267959@1748381938.294066:ram_save_complete dirty=9627, done=0
> 1267959@1748381938.308884:ram_save_complete dirty=0, done=1
>
> It means in this migration there were 9627 pages migrated at complete()
> of the postcopy phase.
>
> After this change, all the postcopy RAM should be migrated in the
> iterable phase, rather than in save_complete():
>
> 1267959@1748381938.294066:ram_save_complete dirty=0, done=0
> 1267959@1748381938.308884:ram_save_complete dirty=0, done=1
>
> (3) Adjust when to decide to switch to postcopy
>
> This shouldn't be super important; the movement makes sure there's
> only one in_postcopy check, so we are clear on what we do with the
> two completely different use cases (precopy vs. postcopy).
>
> (4) Trivial touch-up on the threshold_size comparison
>
> Which changes:
>
> "(!pending_size || pending_size < s->threshold_size)"
>
> into:
>
> "(pending_size <= s->threshold_size)"
>
> Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
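To make the decision flow in points (1)-(4) above easier to follow, here is a
self-contained sketch of the resulting switchover logic. The variable names
mirror the commit message (must_precopy, can_postcopy, threshold_size), but
the function, the enum, and the postcopy_requested trigger are illustrative
assumptions, not the actual QEMU code; the exact/estimate pending refresh is
assumed to happen in the caller:
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    SWITCHOVER_NONE,        /* keep iterating */
    SWITCHOVER_COMPLETE,    /* finish the migration now */
    SWITCHOVER_POSTCOPY,    /* switch to the postcopy phase */
} SwitchoverAction;

SwitchoverAction decide_switchover(bool in_postcopy, bool postcopy_requested,
                                   uint64_t must_precopy, uint64_t can_postcopy,
                                   uint64_t threshold_size)
{
    uint64_t pending_size = must_precopy + can_postcopy;

    if (in_postcopy) {
        /* (1)+(2): estimate only, no threshold_size check; complete only
         * when nothing is pending, so save_complete() moves no RAM. */
        return pending_size ? SWITCHOVER_NONE : SWITCHOVER_COMPLETE;
    }

    /* (4): precopy completes once the remainder fits the downtime budget. */
    if (pending_size <= threshold_size) {
        return SWITCHOVER_COMPLETE;
    }

    /* (3): single in_postcopy check above; otherwise consider switching to
     * postcopy once what must be sent in precopy fits the budget and the
     * user requested it. */
    if (postcopy_requested && must_precopy <= threshold_size) {
        return SWITCHOVER_POSTCOPY;
    }

    return SWITCHOVER_NONE;
}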
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2025-07-01 17:36 UTC | newest]
Thread overview: 22+ messages
2025-06-09 16:18 [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Peter Xu
2025-06-09 16:18 ` [PATCH v2 01/11] migration/hmp: Reorg "info migrate" once more Peter Xu
2025-06-09 16:18 ` [PATCH v2 02/11] migration/hmp: Fix postcopy-blocktime per-vCPU results Peter Xu
2025-07-01 14:38 ` Fabiano Rosas
2025-06-09 16:18 ` [PATCH v2 03/11] migration/docs: Move docs for postcopy blocktime feature Peter Xu
2025-06-09 16:18 ` [PATCH v2 04/11] migration/bg-snapshot: Do not check for SKIP in iterator Peter Xu
2025-06-09 16:18 ` [PATCH v2 05/11] migration: Drop save_live_complete_postcopy hook Peter Xu
2025-07-01 14:40 ` Fabiano Rosas
2025-06-09 16:18 ` [PATCH v2 06/11] migration: Rename save_live_complete_precopy to save_complete Peter Xu
2025-07-01 14:41 ` Fabiano Rosas
2025-06-09 16:18 ` [PATCH v2 07/11] migration: qemu_savevm_complete*() helpers Peter Xu
2025-07-01 14:53 ` Fabiano Rosas
2025-06-09 16:18 ` [PATCH v2 08/11] migration/ram: One less indent for ram_find_and_save_block() Peter Xu
2025-06-09 16:18 ` [PATCH v2 09/11] migration/ram: Add tracepoints for ram_save_complete() Peter Xu
2025-06-09 16:18 ` [PATCH v2 10/11] migration: Rewrite the migration complete detect logic Peter Xu
2025-07-01 17:35 ` Fabiano Rosas
2025-06-09 16:18 ` [PATCH v2 11/11] migration/postcopy: Avoid clearing dirty bitmap for postcopy too Peter Xu
2025-06-11 6:15 ` [PATCH v2 00/11] migration: Some enhancements and cleanups for 10.1 Mario Casquero
2025-06-11 13:06 ` Peter Xu
2025-06-12 10:35 ` Mario Casquero
2025-06-12 11:43 ` Peter Xu
2025-06-11 21:35 ` Peter Xu