[PULL 00/26] Migration patches for 2025-07-11

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PULL 00/26] Migration patches for 2025-07-11
@ 2025-07-11 14:10 Fabiano Rosas
  2025-07-11 14:10 ` [PULL 01/26] migration/hmp: Reorg "info migrate" once more Fabiano Rosas
                   ` (26 more replies)
  0 siblings, 27 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu

The following changes since commit df6fe2abf2e990f767ce755d426bc439c7bba336:

  Merge tag 'pull-target-arm-20250704' of https://gitlab.com/pm215/qemu into staging (2025-07-07 09:22:41 -0400)

are available in the Git repository at:

  https://gitlab.com/farosas/qemu.git tags/migration-20250711-pull-request

for you to fetch changes up to beeac2df5ff0850299e58f4ad27f83dae64c54df:

  migration: Rename save_live_complete_precopy_thread to save_complete_precopy_thread (2025-07-11 10:37:39 -0300)

----------------------------------------------------------------
Migration pull request

- General cleanups around: postcopy, bg-snapshot, migration hooks,
  migration completion and formatting of 'info migrate'.

- Overhaul of postcopy blocktime tracking.

----------------------------------------------------------------

Juraj Marcin (1):
  migration: Rename save_live_complete_precopy_thread to
    save_complete_precopy_thread

Peter Xu (25):
  migration/hmp: Reorg "info migrate" once more
  migration/hmp: Fix postcopy-blocktime per-vCPU results
  migration/docs: Move docs for postcopy blocktime feature
  migration/bg-snapshot: Do not check for SKIP in iterator
  migration: Drop save_live_complete_postcopy hook
  migration: Rename save_live_complete_precopy to save_complete
  migration: qemu_savevm_complete*() helpers
  migration/ram: One less indent for ram_find_and_save_block()
  migration/ram: Add tracepoints for ram_save_complete()
  migration: Rewrite the migration complete detect logic
  migration/postcopy: Avoid clearing dirty bitmap for postcopy too
  migration: Add option to set postcopy-blocktime
  migration/postcopy: Push blocktime start/end into page req mutex
  migration/postcopy: Drop all atomic ops in blocktime feature
  migration/postcopy: Make all blocktime vars 64bits
  migration/postcopy: Drop PostcopyBlocktimeContext.start_time
  migration/postcopy: Bring blocktime layer to ns level
  migration/postcopy: Add blocktime fault counts per-vcpu
  migration/postcopy: Report fault latencies in blocktime
  migration/postcopy: Initialize blocktime context only until listen
  migration/postcopy: Cache the tid->vcpu mapping for blocktime
  migration/postcopy: Cleanup the total blocktime accounting
  migration/postcopy: Optimize blocktime fault tracking with hashtable
  migration/postcopy: blocktime allows track / report non-vCPU faults
  migration/postcopy: Add latency distribution report for blocktime

 docs/devel/migration/main.rst         |   4 +-
 docs/devel/migration/postcopy.rst     |  36 +-
 docs/devel/migration/vfio.rst         |  16 +-
 hw/ppc/spapr.c                        |   2 +-
 hw/s390x/s390-stattrib.c              |   2 +-
 hw/vfio/migration-multifd.c           |   4 +-
 hw/vfio/migration-multifd.h           |   2 +-
 hw/vfio/migration.c                   |   4 +-
 include/migration/misc.h              |   8 +-
 include/migration/register.h          |  36 +-
 include/qemu/typedefs.h               |   6 +-
 migration/block-dirty-bitmap.c        |   3 +-
 migration/migration-hmp-cmds.c        | 159 ++++++--
 migration/migration.c                 |  87 ++--
 migration/migration.h                 |   2 +-
 migration/multifd-device-state.c      |  10 +-
 migration/options.c                   |   2 +
 migration/postcopy-ram.c              | 563 ++++++++++++++++++++------
 migration/postcopy-ram.h              |   2 +
 migration/ram.c                       |  32 +-
 migration/savevm.c                    |  89 ++--
 migration/trace-events                |   9 +-
 qapi/migration.json                   |  38 ++
 tests/qtest/migration/migration-qmp.c |   5 +
 24 files changed, 799 insertions(+), 322 deletions(-)

-- 
2.35.3



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PULL 01/26] migration/hmp: Reorg "info migrate" once more
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 02/26] migration/hmp: Fix postcopy-blocktime per-vCPU results Fabiano Rosas
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Dr . David Alan Gilbert, Li Zhijian, Juraj Marcin,
	Mario Casquero

From: Peter Xu <peterx@redhat.com>

Dave suggested the HMP output for "info migrate" can not only leverage the
lines but also better grouping:

  https://lore.kernel.org/r/aC4_-nMc7FwsMf9p@gallifrey

I followed Dave's suggestion, and some more modifications on top:

  - Added all elements into the picture

  - Use size_to_str() and drop most of the units: benefit is more friendly
  to most human eyes, bad side effect is lose of details, but that should
  be corner case per my uses, and one can still leverage the QMP interface
  when necessary.

  - Sub-grouping for "Transfers" ("Channels" and "Page Types").

  - Better indentations

Sample output:

(qemu) info migrate
Status:                 postcopy-active
Time (ms):              total=47317, setup=5, down=8
RAM info:
  Throughput (Mbps):    1342.83
  Sizes:                pagesize=4 KiB, total=4.02 GiB
  Transfers:            transferred=1.41 GiB, remain=2.46 GiB
    Channels:           precopy=15.2 MiB, multifd=0 B, postcopy=1.39 GiB
    Page Types:         normal=367713, zero=41195
  Page Rates (pps):     transfer=40900, dirty=4
  Others:               dirty_syncs=2, postcopy_req=57503

Suggested-by: Dr. David Alan Gilbert <dave@treblig.org>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Acked-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Link: https://lore.kernel.org/r/20250613140801.474264-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration-hmp-cmds.c | 59 ++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 28 deletions(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index e8a563c7d8..367ff6037f 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -69,7 +69,7 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
     }
 
     if (info->has_status) {
-        monitor_printf(mon, "Status: %s",
+        monitor_printf(mon, "Status: \t\t%s",
                        MigrationStatus_str(info->status));
         if (info->status == MIGRATION_STATUS_FAILED && info->error_desc) {
             monitor_printf(mon, " (%s)\n", info->error_desc);
@@ -78,7 +78,7 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
         }
 
         if (info->total_time) {
-            monitor_printf(mon, "Time (ms): total=%" PRIu64,
+            monitor_printf(mon, "Time (ms): \t\ttotal=%" PRIu64,
                            info->total_time);
             if (info->has_setup_time) {
                 monitor_printf(mon, ", setup=%" PRIu64,
@@ -110,48 +110,51 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
     }
 
     if (info->ram) {
+        g_autofree char *str_psize = size_to_str(info->ram->page_size);
+        g_autofree char *str_total = size_to_str(info->ram->total);
+        g_autofree char *str_transferred = size_to_str(info->ram->transferred);
+        g_autofree char *str_remaining = size_to_str(info->ram->remaining);
+        g_autofree char *str_precopy = size_to_str(info->ram->precopy_bytes);
+        g_autofree char *str_multifd = size_to_str(info->ram->multifd_bytes);
+        g_autofree char *str_postcopy = size_to_str(info->ram->postcopy_bytes);
+
         monitor_printf(mon, "RAM info:\n");
-        monitor_printf(mon, "  Throughput (Mbps): %0.2f\n",
+        monitor_printf(mon, "  Throughput (Mbps): \t%0.2f\n",
                        info->ram->mbps);
-        monitor_printf(mon, "  Sizes (KiB): pagesize=%" PRIu64
-                       ", total=%" PRIu64 ",\n",
-                       info->ram->page_size >> 10,
-                       info->ram->total >> 10);
-        monitor_printf(mon, "    transferred=%" PRIu64
-                       ", remain=%" PRIu64 ",\n",
-                       info->ram->transferred >> 10,
-                       info->ram->remaining >> 10);
-        monitor_printf(mon, "    precopy=%" PRIu64
-                       ", multifd=%" PRIu64
-                       ", postcopy=%" PRIu64,
-                       info->ram->precopy_bytes >> 10,
-                       info->ram->multifd_bytes >> 10,
-                       info->ram->postcopy_bytes >> 10);
+        monitor_printf(mon, "  Sizes: \t\tpagesize=%s, total=%s\n",
+                       str_psize, str_total);
+        monitor_printf(mon, "  Transfers: \t\ttransferred=%s, remain=%s\n",
+                       str_transferred, str_remaining);
+        monitor_printf(mon, "    Channels: \t\tprecopy=%s, "
+                       "multifd=%s, postcopy=%s",
+                       str_precopy, str_multifd, str_postcopy);
 
         if (info->vfio) {
-            monitor_printf(mon, ", vfio=%" PRIu64,
-                           info->vfio->transferred >> 10);
+            g_autofree char *str_vfio = size_to_str(info->vfio->transferred);
+
+            monitor_printf(mon, ", vfio=%s", str_vfio);
         }
         monitor_printf(mon, "\n");
 
-        monitor_printf(mon, "  Pages: normal=%" PRIu64 ", zero=%" PRIu64
-                       ", rate_per_sec=%" PRIu64 "\n",
-                       info->ram->normal,
-                       info->ram->duplicate,
+        monitor_printf(mon, "    Page Types: \tnormal=%" PRIu64
+                       ", zero=%" PRIu64 "\n",
+                       info->ram->normal, info->ram->duplicate);
+        monitor_printf(mon, "  Page Rates (pps): \ttransfer=%" PRIu64,
                        info->ram->pages_per_second);
-        monitor_printf(mon, "  Others: dirty_syncs=%" PRIu64,
-                       info->ram->dirty_sync_count);
-
         if (info->ram->dirty_pages_rate) {
-            monitor_printf(mon, ", dirty_pages_rate=%" PRIu64,
+            monitor_printf(mon, ", dirty=%" PRIu64,
                            info->ram->dirty_pages_rate);
         }
+        monitor_printf(mon, "\n");
+
+        monitor_printf(mon, "  Others: \t\tdirty_syncs=%" PRIu64,
+                       info->ram->dirty_sync_count);
         if (info->ram->postcopy_requests) {
             monitor_printf(mon, ", postcopy_req=%" PRIu64,
                            info->ram->postcopy_requests);
         }
         if (info->ram->downtime_bytes) {
-            monitor_printf(mon, ", downtime_ram=%" PRIu64,
+            monitor_printf(mon, ", downtime_bytes=%" PRIu64,
                            info->ram->downtime_bytes);
         }
         if (info->ram->dirty_sync_missed_zero_copy) {
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 02/26] migration/hmp: Fix postcopy-blocktime per-vCPU results
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
  2025-07-11 14:10 ` [PULL 01/26] migration/hmp: Reorg "info migrate" once more Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 03/26] migration/docs: Move docs for postcopy blocktime feature Fabiano Rosas
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Dr . David Alan Gilbert, Alexey Perevalov,
	Markus Armbruster, Mario Casquero, Juraj Marcin

From: Peter Xu <peterx@redhat.com>

Unfortunately, it was never correctly shown..

This is only found when I started to look into making the blocktime feature
more useful (so as to avoid using bpftrace, even though I'm not sure which
one will be harder to use..).

So the old dump would look like this:

  Postcopy vCPU Blocktime: 0-1,4,10,21,33,46,48,59

Even though there're actually 40 vcpus, and the string will merge same
elements and also sort them.

To fix it, simply loop over the uint32List manually.  Now it looks like:

  Postcopy vCPU Blocktime (ms):
   [15, 0, 0, 43, 29, 34, 36, 29, 37, 41,
    33, 37, 45, 52, 50, 38, 40, 37, 40, 49,
    40, 35, 35, 35, 81, 19, 18, 19, 18, 30,
    22, 3, 0, 0, 0, 0, 0, 0, 0, 0]

Cc: Dr. David Alan Gilbert <dave@treblig.org>
Cc: Alexey Perevalov <a.perevalov@samsung.com>
Cc: Markus Armbruster <armbru@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613140801.474264-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration-hmp-cmds.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 367ff6037f..867e017b32 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -208,15 +208,19 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
     }
 
     if (info->has_postcopy_vcpu_blocktime) {
-        Visitor *v;
-        char *str;
-        v = string_output_visitor_new(false, &str);
-        visit_type_uint32List(v, NULL, &info->postcopy_vcpu_blocktime,
-                              &error_abort);
-        visit_complete(v, &str);
-        monitor_printf(mon, "Postcopy vCPU Blocktime: %s\n", str);
-        g_free(str);
-        visit_free(v);
+        uint32List *item = info->postcopy_vcpu_blocktime;
+        const char *sep = "";
+        int count = 0;
+
+        monitor_printf(mon, "Postcopy vCPU Blocktime (ms):\n [");
+
+        while (item) {
+            monitor_printf(mon, "%s%"PRIu32, sep, item->value);
+            item = item->next;
+            /* Each line 10 vcpu results, newline if there's more */
+            sep = ((++count % 10 == 0) && item) ? ",\n  " : ", ";
+        }
+        monitor_printf(mon, "]\n");
     }
 
 out:
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 03/26] migration/docs: Move docs for postcopy blocktime feature
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
  2025-07-11 14:10 ` [PULL 01/26] migration/hmp: Reorg "info migrate" once more Fabiano Rosas
  2025-07-11 14:10 ` [PULL 02/26] migration/hmp: Fix postcopy-blocktime per-vCPU results Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 04/26] migration/bg-snapshot: Do not check for SKIP in iterator Fabiano Rosas
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Juraj Marcin

From: Peter Xu <peterx@redhat.com>

Move it out of vanilla postcopy session, but instead a standalone feature.
When at it, removing the NOTE because it's incorrect now after introduction
of max-postcopy-bandwidth, which can control the throughput even for
postcopy phase.

Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250613140801.474264-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 docs/devel/migration/postcopy.rst | 36 +++++++++++++++----------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/docs/devel/migration/postcopy.rst b/docs/devel/migration/postcopy.rst
index 82e7a848c6..e319388d8f 100644
--- a/docs/devel/migration/postcopy.rst
+++ b/docs/devel/migration/postcopy.rst
@@ -33,25 +33,6 @@ will now cause the transition from precopy to postcopy.
 It can be issued immediately after migration is started or any
 time later on.  Issuing it after the end of a migration is harmless.
 
-Blocktime is a postcopy live migration metric, intended to show how
-long the vCPU was in state of interruptible sleep due to pagefault.
-That metric is calculated both for all vCPUs as overlapped value, and
-separately for each vCPU. These values are calculated on destination
-side.  To enable postcopy blocktime calculation, enter following
-command on destination monitor:
-
-``migrate_set_capability postcopy-blocktime on``
-
-Postcopy blocktime can be retrieved by query-migrate qmp command.
-postcopy-blocktime value of qmp command will show overlapped blocking
-time for all vCPU, postcopy-vcpu-blocktime will show list of blocking
-time per vCPU.
-
-.. note::
-  During the postcopy phase, the bandwidth limits set using
-  ``migrate_set_parameter`` is ignored (to avoid delaying requested pages that
-  the destination is waiting for).
-
 Postcopy internals
 ==================
 
@@ -312,3 +293,20 @@ explicitly) to be sent in a separate preempt channel, rather than queued in
 the background migration channel.  Anyone who cares about latencies of page
 faults during a postcopy migration should enable this feature.  By default,
 it's not enabled.
+
+Postcopy blocktime statistics
+-----------------------------
+
+Blocktime is a postcopy live migration metric, intended to show how
+long the vCPU was in state of interruptible sleep due to pagefault.
+That metric is calculated both for all vCPUs as overlapped value, and
+separately for each vCPU. These values are calculated on destination
+side.  To enable postcopy blocktime calculation, enter following
+command on destination monitor:
+
+``migrate_set_capability postcopy-blocktime on``
+
+Postcopy blocktime can be retrieved by query-migrate qmp command.
+postcopy-blocktime value of qmp command will show overlapped blocking
+time for all vCPU, postcopy-vcpu-blocktime will show list of blocking
+time per vCPU.
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 04/26] migration/bg-snapshot: Do not check for SKIP in iterator
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (2 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 03/26] migration/docs: Move docs for postcopy blocktime feature Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 05/26] migration: Drop save_live_complete_postcopy hook Fabiano Rosas
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Juraj Marcin

From: Peter Xu <peterx@redhat.com>

It's not possible to happen in bg-snapshot case.

Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250613140801.474264-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 4098870bce..e33e39ac74 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3887,9 +3887,8 @@ static void *bg_migration_thread(void *opaque)
 
     while (migration_is_active()) {
         MigIterateState iter_state = bg_migration_iteration_run(s);
-        if (iter_state == MIG_ITERATE_SKIP) {
-            continue;
-        } else if (iter_state == MIG_ITERATE_BREAK) {
+
+        if (iter_state == MIG_ITERATE_BREAK) {
             break;
         }
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 05/26] migration: Drop save_live_complete_postcopy hook
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (3 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 04/26] migration/bg-snapshot: Do not check for SKIP in iterator Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 06/26] migration: Rename save_live_complete_precopy to save_complete Fabiano Rosas
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Juraj Marcin

From: Peter Xu <peterx@redhat.com>

The hook is only defined in two vmstate users ("ram" and "block dirty
bitmap"), meanwhile both of them define the hook exactly the same as the
precopy version.  Hence, this postcopy version isn't needed.

No functional change intended.

Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613140801.474264-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 include/migration/register.h   | 24 ++++++++----------------
 migration/block-dirty-bitmap.c |  1 -
 migration/ram.c                |  1 -
 migration/savevm.c             |  9 ++++-----
 4 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index b79dc81b8d..e022195785 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -77,26 +77,18 @@ typedef struct SaveVMHandlers {
      */
     void (*save_cleanup)(void *opaque);
 
-    /**
-     * @save_live_complete_postcopy
-     *
-     * Called at the end of postcopy for all postcopyable devices.
-     *
-     * @f: QEMUFile where to send the data
-     * @opaque: data pointer passed to register_savevm_live()
-     *
-     * Returns zero to indicate success and negative for error
-     */
-    int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
-
     /**
      * @save_live_complete_precopy
      *
      * Transmits the last section for the device containing any
-     * remaining data at the end of a precopy phase. When postcopy is
-     * enabled, devices that support postcopy will skip this step,
-     * where the final data will be flushed at the end of postcopy via
-     * @save_live_complete_postcopy instead.
+     * remaining data at the end phase of migration.
+     *
+     * For precopy, this will be invoked _during_ the switchover phase
+     * after source VM is stopped.
+     *
+     * For postcopy, this will be invoked _after_ the switchover phase
+     * (except some very unusual cases, like PMEM ramblocks), while
+     * destination VM can be running.
      *
      * @f: QEMUFile where to send the data
      * @opaque: data pointer passed to register_savevm_live()
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index f2c352d4a7..6ee3c32a76 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1248,7 +1248,6 @@ static bool dirty_bitmap_has_postcopy(void *opaque)
 
 static SaveVMHandlers savevm_dirty_bitmap_handlers = {
     .save_setup = dirty_bitmap_save_setup,
-    .save_live_complete_postcopy = dirty_bitmap_save_complete,
     .save_live_complete_precopy = dirty_bitmap_save_complete,
     .has_postcopy = dirty_bitmap_has_postcopy,
     .state_pending_exact = dirty_bitmap_state_pending,
diff --git a/migration/ram.c b/migration/ram.c
index 2140785a05..6bb4f13031 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4548,7 +4548,6 @@ void postcopy_preempt_shutdown_file(MigrationState *s)
 static SaveVMHandlers savevm_ram_handlers = {
     .save_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
-    .save_live_complete_postcopy = ram_save_complete,
     .save_live_complete_precopy = ram_save_complete,
     .has_postcopy = ram_has_postcopy,
     .state_pending_exact = ram_state_pending_exact,
diff --git a/migration/savevm.c b/migration/savevm.c
index bb04a4520d..3e20f8608a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1485,9 +1485,8 @@ bool should_send_vmdesc(void)
 }
 
 /*
- * Calls the save_live_complete_postcopy methods
- * causing the last few pages to be sent immediately and doing any associated
- * cleanup.
+ * Complete saving any postcopy-able devices.
+ *
  * Note postcopy also calls qemu_savevm_state_complete_precopy to complete
  * all the other devices, but that happens at the point we switch to postcopy.
  */
@@ -1497,7 +1496,7 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
     int ret;
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-        if (!se->ops || !se->ops->save_live_complete_postcopy) {
+        if (!se->ops || !se->ops->save_live_complete_precopy) {
             continue;
         }
         if (se->ops->is_active) {
@@ -1510,7 +1509,7 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
         qemu_put_byte(f, QEMU_VM_SECTION_END);
         qemu_put_be32(f, se->section_id);
 
-        ret = se->ops->save_live_complete_postcopy(f, se->opaque);
+        ret = se->ops->save_live_complete_precopy(f, se->opaque);
         trace_savevm_section_end(se->idstr, se->section_id, ret);
         save_section_footer(f, se);
         if (ret < 0) {
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 06/26] migration: Rename save_live_complete_precopy to save_complete
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (4 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 05/26] migration: Drop save_live_complete_postcopy hook Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 07/26] migration: qemu_savevm_complete*() helpers Fabiano Rosas
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Juraj Marcin

From: Peter Xu <peterx@redhat.com>

Now after merging the precopy and postcopy version of complete() hook,
rename the precopy version from save_live_complete_precopy() to
save_complete().

Dropping the "live" when at it, because it's in most cases not live when
happening (in precopy).

No functional change intended.

Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613140801.474264-7-peterx@redhat.com
[peterx: squash the fixup that covers a few more doc spots, per Juraj]
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 docs/devel/migration/main.rst  |  4 ++--
 docs/devel/migration/vfio.rst  | 12 ++++++------
 hw/ppc/spapr.c                 |  2 +-
 hw/s390x/s390-stattrib.c       |  2 +-
 hw/vfio/migration.c            |  2 +-
 include/migration/register.h   |  6 +++---
 migration/block-dirty-bitmap.c |  2 +-
 migration/ram.c                |  2 +-
 migration/savevm.c             |  8 ++++----
 9 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
index cdd4f4a6d7..6493c1d2bc 100644
--- a/docs/devel/migration/main.rst
+++ b/docs/devel/migration/main.rst
@@ -508,8 +508,8 @@ An iterative device must provide:
     the point that stream bandwidth limits tell it to stop.  Each call
     generates one section.
 
-  - A ``save_live_complete_precopy`` function that must transmit the
-    last section for the device containing any remaining data.
+  - A ``save_complete`` function that must transmit the last section for
+    the device containing any remaining data.
 
   - A ``load_state`` function used to load sections generated by
     any of the save functions that generate sections.
diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst
index 673e354754..8ff5ab0c74 100644
--- a/docs/devel/migration/vfio.rst
+++ b/docs/devel/migration/vfio.rst
@@ -75,10 +75,10 @@ VFIO implements the device hooks for the iterative approach as follows:
   in the non-multifd mode.
   In the multifd mode it just emits either a dummy EOS marker.
 
-* A ``save_live_complete_precopy`` function that sets the VFIO device in
-  _STOP_COPY state and iteratively copies the data for the VFIO device until
-  the vendor driver indicates that no data remains.
-  In the multifd mode it just emits a dummy EOS marker.
+* A ``save_complete`` function that sets the VFIO device in _STOP_COPY
+  state and iteratively copies the data for the VFIO device until the
+  vendor driver indicates that no data remains.  In the multifd mode it
+  just emits a dummy EOS marker.
 
 * A ``save_live_complete_precopy_thread`` function that in the multifd mode
   provides thread handler performing multifd device state transfer.
@@ -195,9 +195,9 @@ Live migration save path
                                       |
                 Then the VFIO device is put in _STOP_COPY state
                      (FINISH_MIGRATE, _ACTIVE, _STOP_COPY)
-         .save_live_complete_precopy() is called for each active device
+               .save_complete() is called for each active device
               For the VFIO device: in the non-multifd mode iterate in
-                        .save_live_complete_precopy() until
+                             .save_complete() until
                                pending data is 0
 	          In the multifd mode this iteration is done in
 	          .save_live_complete_precopy_thread() instead.
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 08615f6c90..40f53ad7b3 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2518,7 +2518,7 @@ static void htab_save_cleanup(void *opaque)
 static SaveVMHandlers savevm_htab_handlers = {
     .save_setup = htab_save_setup,
     .save_live_iterate = htab_save_iterate,
-    .save_live_complete_precopy = htab_save_complete,
+    .save_complete = htab_save_complete,
     .save_cleanup = htab_save_cleanup,
     .load_state = htab_load,
 };
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index f74cf32636..13a678a803 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -338,7 +338,7 @@ static const TypeInfo qemu_s390_stattrib_info = {
 static SaveVMHandlers savevm_s390_stattrib_handlers = {
     .save_setup = cmma_save_setup,
     .save_live_iterate = cmma_save_iterate,
-    .save_live_complete_precopy = cmma_save_complete,
+    .save_complete = cmma_save_complete,
     .state_pending_exact = cmma_state_pending,
     .state_pending_estimate = cmma_state_pending,
     .save_cleanup = cmma_save_cleanup,
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index b76697bd1a..33a71f8999 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -824,7 +824,7 @@ static const SaveVMHandlers savevm_vfio_handlers = {
     .state_pending_exact = vfio_state_pending_exact,
     .is_active_iterate = vfio_is_active_iterate,
     .save_live_iterate = vfio_save_iterate,
-    .save_live_complete_precopy = vfio_save_complete_precopy,
+    .save_complete = vfio_save_complete_precopy,
     .save_state = vfio_save_state,
     .load_setup = vfio_load_setup,
     .load_cleanup = vfio_load_cleanup,
diff --git a/include/migration/register.h b/include/migration/register.h
index e022195785..2a26e76a68 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -78,7 +78,7 @@ typedef struct SaveVMHandlers {
     void (*save_cleanup)(void *opaque);
 
     /**
-     * @save_live_complete_precopy
+     * @save_complete
      *
      * Transmits the last section for the device containing any
      * remaining data at the end phase of migration.
@@ -95,7 +95,7 @@ typedef struct SaveVMHandlers {
      *
      * Returns zero to indicate success and negative for error
      */
-    int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
+    int (*save_complete)(QEMUFile *f, void *opaque);
 
     /**
      * @save_live_complete_precopy_thread (invoked in a separate thread)
@@ -103,7 +103,7 @@ typedef struct SaveVMHandlers {
      * Called at the end of a precopy phase from a separate worker thread
      * in configurations where multifd device state transfer is supported
      * in order to perform asynchronous transmission of the remaining data in
-     * parallel with @save_live_complete_precopy handlers.
+     * parallel with @save_complete handlers.
      * When postcopy is enabled, devices that support postcopy will skip this
      * step.
      *
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 6ee3c32a76..a061aad817 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1248,7 +1248,7 @@ static bool dirty_bitmap_has_postcopy(void *opaque)
 
 static SaveVMHandlers savevm_dirty_bitmap_handlers = {
     .save_setup = dirty_bitmap_save_setup,
-    .save_live_complete_precopy = dirty_bitmap_save_complete,
+    .save_complete = dirty_bitmap_save_complete,
     .has_postcopy = dirty_bitmap_has_postcopy,
     .state_pending_exact = dirty_bitmap_state_pending,
     .state_pending_estimate = dirty_bitmap_state_pending,
diff --git a/migration/ram.c b/migration/ram.c
index 6bb4f13031..ffd839f57c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4548,7 +4548,7 @@ void postcopy_preempt_shutdown_file(MigrationState *s)
 static SaveVMHandlers savevm_ram_handlers = {
     .save_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
-    .save_live_complete_precopy = ram_save_complete,
+    .save_complete = ram_save_complete,
     .has_postcopy = ram_has_postcopy,
     .state_pending_exact = ram_state_pending_exact,
     .state_pending_estimate = ram_state_pending_estimate,
diff --git a/migration/savevm.c b/migration/savevm.c
index 3e20f8608a..454e914b56 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1496,7 +1496,7 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
     int ret;
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-        if (!se->ops || !se->ops->save_live_complete_precopy) {
+        if (!se->ops || !se->ops->save_complete) {
             continue;
         }
         if (se->ops->is_active) {
@@ -1509,7 +1509,7 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
         qemu_put_byte(f, QEMU_VM_SECTION_END);
         qemu_put_be32(f, se->section_id);
 
-        ret = se->ops->save_live_complete_precopy(f, se->opaque);
+        ret = se->ops->save_complete(f, se->opaque);
         trace_savevm_section_end(se->idstr, se->section_id, ret);
         save_section_footer(f, se);
         if (ret < 0) {
@@ -1583,7 +1583,7 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
         if (!se->ops ||
             (in_postcopy && se->ops->has_postcopy &&
              se->ops->has_postcopy(se->opaque)) ||
-            !se->ops->save_live_complete_precopy) {
+            !se->ops->save_complete) {
             continue;
         }
 
@@ -1598,7 +1598,7 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
 
         save_section_header(f, se, QEMU_VM_SECTION_END);
 
-        ret = se->ops->save_live_complete_precopy(f, se->opaque);
+        ret = se->ops->save_complete(f, se->opaque);
         trace_savevm_section_end(se->idstr, se->section_id, ret);
         save_section_footer(f, se);
         if (ret < 0) {
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 07/26] migration: qemu_savevm_complete*() helpers
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (5 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 06/26] migration: Rename save_live_complete_precopy to save_complete Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 08/26] migration/ram: One less indent for ram_find_and_save_block() Fabiano Rosas
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Juraj Marcin

From: Peter Xu <peterx@redhat.com>

Since we use the same save_complete() hook for both precopy and postcopy,
add a set of helpers to invoke the hook() to dedup the code.

Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613140801.474264-8-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/savevm.c | 78 ++++++++++++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 34 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 454e914b56..c4fd5f5a5b 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1484,6 +1484,38 @@ bool should_send_vmdesc(void)
     return !machine->suppress_vmdesc;
 }
 
+static bool qemu_savevm_complete_exists(SaveStateEntry *se)
+{
+    return se->ops && se->ops->save_complete;
+}
+
+/*
+ * Invoke the ->save_complete() if necessary.
+ * Returns: 0 if skip the current SE or succeeded, <0 if error happened.
+ */
+static int qemu_savevm_complete(SaveStateEntry *se, QEMUFile *f)
+{
+    int ret;
+
+    if (se->ops->is_active) {
+        if (!se->ops->is_active(se->opaque)) {
+            return 0;
+        }
+    }
+
+    trace_savevm_section_start(se->idstr, se->section_id);
+    save_section_header(f, se, QEMU_VM_SECTION_END);
+    ret = se->ops->save_complete(f, se->opaque);
+    trace_savevm_section_end(se->idstr, se->section_id, ret);
+    save_section_footer(f, se);
+
+    if (ret < 0) {
+        qemu_file_set_error(f, ret);
+    }
+
+    return ret;
+}
+
 /*
  * Complete saving any postcopy-able devices.
  *
@@ -1493,27 +1525,13 @@ bool should_send_vmdesc(void)
 void qemu_savevm_state_complete_postcopy(QEMUFile *f)
 {
     SaveStateEntry *se;
-    int ret;
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-        if (!se->ops || !se->ops->save_complete) {
+        if (!qemu_savevm_complete_exists(se)) {
             continue;
         }
-        if (se->ops->is_active) {
-            if (!se->ops->is_active(se->opaque)) {
-                continue;
-            }
-        }
-        trace_savevm_section_start(se->idstr, se->section_id);
-        /* Section type */
-        qemu_put_byte(f, QEMU_VM_SECTION_END);
-        qemu_put_be32(f, se->section_id);
 
-        ret = se->ops->save_complete(f, se->opaque);
-        trace_savevm_section_end(se->idstr, se->section_id, ret);
-        save_section_footer(f, se);
-        if (ret < 0) {
-            qemu_file_set_error(f, ret);
+        if (qemu_savevm_complete(se, f) < 0) {
             return;
         }
     }
@@ -1559,7 +1577,6 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
 {
     int64_t start_ts_each, end_ts_each;
     SaveStateEntry *se;
-    int ret;
     bool multifd_device_state = multifd_device_state_supported();
 
     if (multifd_device_state) {
@@ -1580,32 +1597,25 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
     }
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-        if (!se->ops ||
-            (in_postcopy && se->ops->has_postcopy &&
-             se->ops->has_postcopy(se->opaque)) ||
-            !se->ops->save_complete) {
+        if (!qemu_savevm_complete_exists(se)) {
             continue;
         }
 
-        if (se->ops->is_active) {
-            if (!se->ops->is_active(se->opaque)) {
-                continue;
-            }
+        if (in_postcopy && se->ops->has_postcopy &&
+            se->ops->has_postcopy(se->opaque)) {
+            /*
+             * If postcopy will start soon, and if the SE supports
+             * postcopy, then we can skip the SE for the postcopy phase.
+             */
+            continue;
         }
 
         start_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
-        trace_savevm_section_start(se->idstr, se->section_id);
-
-        save_section_header(f, se, QEMU_VM_SECTION_END);
-
-        ret = se->ops->save_complete(f, se->opaque);
-        trace_savevm_section_end(se->idstr, se->section_id, ret);
-        save_section_footer(f, se);
-        if (ret < 0) {
-            qemu_file_set_error(f, ret);
+        if (qemu_savevm_complete(se, f) < 0) {
             goto ret_fail_abort_threads;
         }
         end_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+
         trace_vmstate_downtime_save("iterable", se->idstr, se->instance_id,
                                     end_ts_each - start_ts_each);
     }
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 08/26] migration/ram: One less indent for ram_find_and_save_block()
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (6 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 07/26] migration: qemu_savevm_complete*() helpers Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 09/26] migration/ram: Add tracepoints for ram_save_complete() Fabiano Rosas
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Juraj Marcin

From: Peter Xu <peterx@redhat.com>

The check over PAGE_DIRTY_FOUND isn't necessary. We could indent one less
and assert that instead.

Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250613140801.474264-9-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/ram.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index ffd839f57c..aa76f0a53f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2286,16 +2286,18 @@ static int ram_find_and_save_block(RAMState *rs)
         if (!get_queued_page(rs, pss)) {
             /* priority queue empty, so just search for something dirty */
             int res = find_dirty_block(rs, pss);
-            if (res != PAGE_DIRTY_FOUND) {
-                if (res == PAGE_ALL_CLEAN) {
-                    break;
-                } else if (res == PAGE_TRY_AGAIN) {
-                    continue;
-                } else if (res < 0) {
-                    pages = res;
-                    break;
-                }
+
+            if (res == PAGE_ALL_CLEAN) {
+                break;
+            } else if (res == PAGE_TRY_AGAIN) {
+                continue;
+            } else if (res < 0) {
+                pages = res;
+                break;
             }
+
+            /* Otherwise we must have a dirty page to move */
+            assert(res == PAGE_DIRTY_FOUND);
         }
         pages = ram_save_host_page(rs, pss);
         if (pages) {
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 09/26] migration/ram: Add tracepoints for ram_save_complete()
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (7 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 08/26] migration/ram: One less indent for ram_find_and_save_block() Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 10/26] migration: Rewrite the migration complete detect logic Fabiano Rosas
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Juraj Marcin

From: Peter Xu <peterx@redhat.com>

Take notes on start/end state of dirty pages for the whole system.

Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250613140801.474264-10-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/ram.c        | 5 +++++
 migration/trace-events | 1 +
 2 files changed, 6 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index aa76f0a53f..e4871b4c17 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3290,6 +3290,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     RAMState *rs = *temp;
     int ret = 0;
 
+    trace_ram_save_complete(rs->migration_dirty_pages, 0);
+
     rs->last_stage = !migration_in_colo_state();
 
     WITH_RCU_READ_LOCK_GUARD() {
@@ -3353,6 +3355,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     }
 
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+
+    trace_ram_save_complete(rs->migration_dirty_pages, 1);
+
     return qemu_fflush(f);
 }
 
diff --git a/migration/trace-events b/migration/trace-events
index c506e11a2e..dcd8fe9a0c 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -105,6 +105,7 @@ ram_load_postcopy_loop(int channel, uint64_t addr, int flags) "chan=%d addr=0x%"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p"
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx"
+ram_save_complete(uint64_t dirty_pages, int done) "dirty=%" PRIu64 ", done=%d"
 ram_dirty_bitmap_request(char *str) "%s"
 ram_dirty_bitmap_reload_begin(char *str) "%s"
 ram_dirty_bitmap_reload_complete(char *str) "%s"
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 10/26] migration: Rewrite the migration complete detect logic
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (8 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 09/26] migration/ram: Add tracepoints for ram_save_complete() Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 11/26] migration/postcopy: Avoid clearing dirty bitmap for postcopy too Fabiano Rosas
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Juraj Marcin

From: Peter Xu <peterx@redhat.com>

There're a few things off here in that logic, rewrite it.  When at it, add
rich comment to explain each of the decisions.

Since this is very sensitive path for migration, below are the list of
things changed with their reasonings.

  (1) Exact pending size is only needed for precopy not postcopy

      Fundamentally it's because "exact" version only does one more deep
      sync to fetch the pending results, while in postcopy's case it's
      never going to sync anything more than estimate as the VM on source
      is stopped.

  (2) Do _not_ rely on threshold_size anymore to decide whether postcopy
      should complete

      threshold_size was calculated from the expected downtime and
      bandwidth only during precopy as an efficient way to decide when to
      switchover.  It's not sensible to rely on threshold_size in postcopy.

      For precopy, if switchover is decided, the migration will complete
      soon.  It's not true for postcopy.  Logically speaking, postcopy
      should only complete the migration if all pending data is flushed.

      Here it used to work because save_complete() used to implicitly
      contain save_live_iterate() when there's pending size.

      Even if that looks benign, having RAMs to be migrated in postcopy's
      save_complete() has other bad side effects:

      (a) Since save_complete() needs to be run once at a time, it means
      when moving RAM there's no way moving other things (rather than
      round-robin iterating the vmstate handlers like what we do with
      ITERABLE phase).  Not an immediate concern, but it may stop working
      in the future when there're more than one iterables (e.g. vfio
      postcopy).

      (b) postcopy recovery, unfortunately, only works during ITERABLE
      phase. IOW, if src QEMU moves RAM during postcopy's save_complete()
      and network failed, then it'll crash both QEMUs... OTOH if it failed
      during iteration it'll still be recoverable.  IOW, this change should
      further reduce the window QEMU split brain and crash in extreme cases.

      If we enable the ram_save_complete() tracepoints, we'll see this
      before this patch:

      1267959@1748381938.294066:ram_save_complete dirty=9627, done=0
      1267959@1748381938.308884:ram_save_complete dirty=0, done=1

      It means in this migration there're 9627 pages migrated at complete()
      of postcopy phase.

      After this change, all the postcopy RAM should be migrated in iterable
      phase, rather than save_complete():

      1267959@1748381938.294066:ram_save_complete dirty=0, done=0
      1267959@1748381938.308884:ram_save_complete dirty=0, done=1

  (3) Adjust when to decide to switch to postcopy

      This shouldn't be super important, the movement makes sure there's
      only one in_postcopy check, then we are clear on what we do with the
      two completely differnt use cases (precopy v.s. postcopy).

  (4) Trivial touch up on threshold_size comparision

      Which changes:

      "(!pending_size || pending_size < s->threshold_size)"

      into:

      "(pending_size <= s->threshold_size)"

Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613140801.474264-11-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 57 +++++++++++++++++++++++++++++++------------
 1 file changed, 42 insertions(+), 15 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index e33e39ac74..923400f801 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3436,33 +3436,60 @@ static MigIterateState migration_iteration_run(MigrationState *s)
     Error *local_err = NULL;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
     bool can_switchover = migration_can_switchover(s);
+    bool complete_ready;
 
+    /* Fast path - get the estimated amount of pending data */
     qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy);
     pending_size = must_precopy + can_postcopy;
     trace_migrate_pending_estimate(pending_size, must_precopy, can_postcopy);
 
-    if (pending_size < s->threshold_size) {
-        qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
-        pending_size = must_precopy + can_postcopy;
-        trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy);
+    if (in_postcopy) {
+        /*
+         * Iterate in postcopy until all pending data flushed.  Note that
+         * postcopy completion doesn't rely on can_switchover, because when
+         * POSTCOPY_ACTIVE it means switchover already happened.
+         */
+        complete_ready = !pending_size;
+    } else {
+        /*
+         * Exact pending reporting is only needed for precopy.  Taking RAM
+         * as example, there'll be no extra dirty information after
+         * postcopy started, so ESTIMATE should always match with EXACT
+         * during postcopy phase.
+         */
+        if (pending_size < s->threshold_size) {
+            qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
+            pending_size = must_precopy + can_postcopy;
+            trace_migrate_pending_exact(pending_size, must_precopy,
+                                        can_postcopy);
+        }
+
+        /* Should we switch to postcopy now? */
+        if (must_precopy <= s->threshold_size &&
+            can_switchover && qatomic_read(&s->start_postcopy)) {
+            if (postcopy_start(s, &local_err)) {
+                migrate_set_error(s, local_err);
+                error_report_err(local_err);
+            }
+            return MIG_ITERATE_SKIP;
+        }
+
+        /*
+         * For precopy, migration can complete only if:
+         *
+         * (1) Switchover is acknowledged by destination
+         * (2) Pending size is no more than the threshold specified
+         *     (which was calculated from expected downtime)
+         */
+        complete_ready = can_switchover && (pending_size <= s->threshold_size);
     }
 
-    if ((!pending_size || pending_size < s->threshold_size) && can_switchover) {
+    if (complete_ready) {
         trace_migration_thread_low_pending(pending_size);
         migration_completion(s);
         return MIG_ITERATE_BREAK;
     }
 
-    /* Still a significant amount to transfer */
-    if (!in_postcopy && must_precopy <= s->threshold_size && can_switchover &&
-        qatomic_read(&s->start_postcopy)) {
-        if (postcopy_start(s, &local_err)) {
-            migrate_set_error(s, local_err);
-            error_report_err(local_err);
-        }
-        return MIG_ITERATE_SKIP;
-    }
-
     /* Just another iteration step */
     qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
     return MIG_ITERATE_RESUME;
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 11/26] migration/postcopy: Avoid clearing dirty bitmap for postcopy too
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (9 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 10/26] migration: Rewrite the migration complete detect logic Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 12/26] migration: Add option to set postcopy-blocktime Fabiano Rosas
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Yanfei Xu

From: Peter Xu <peterx@redhat.com>

This is a follow up on the other commit "migration/ram: avoid to do log
clear in the last round" but for postcopy.

https://lore.kernel.org/r/20250514115827.3216082-1-yanfei.xu@bytedance.com

I can observe more than 10% reduction of average page fault latency during
postcopy phase with this optimization:

  Before: 268.00us (+-1.87%)
  After:  232.67us (+-2.01%)

The test was done with a 16GB VM with 80 vCPUs, running a workload that
busy random writes to 13GB memory.

Cc: Yanfei Xu <yanfei.xu@bytedance.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613140801.474264-12-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/ram.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index e4871b4c17..7208bc114f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -835,8 +835,10 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
      * protections isn't needed as we know there will be either (1) no
      * further writes if migration will complete, or (2) migration fails
      * at last then tracking isn't needed either.
+     *
+     * Do the same for postcopy due to the same reason.
      */
-    if (!rs->last_stage) {
+    if (!rs->last_stage && !migration_in_postcopy()) {
         /*
          * Clear dirty bitmap if needed.  This _must_ be called before we
          * send any of the page in the chunk because we need to make sure
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 12/26] migration: Add option to set postcopy-blocktime
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (10 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 11/26] migration/postcopy: Avoid clearing dirty bitmap for postcopy too Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 13/26] migration/postcopy: Push blocktime start/end into page req mutex Fabiano Rosas
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu

From: Peter Xu <peterx@redhat.com>

Add a global property to allow enabling postcopy-blocktime feature.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613141217.474825-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/options.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/options.c b/migration/options.c
index 162c72cda4..4e923a2e07 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -187,6 +187,8 @@ const Property migration_properties[] = {
     DEFINE_PROP_MIG_CAP("x-postcopy-ram", MIGRATION_CAPABILITY_POSTCOPY_RAM),
     DEFINE_PROP_MIG_CAP("x-postcopy-preempt",
                         MIGRATION_CAPABILITY_POSTCOPY_PREEMPT),
+    DEFINE_PROP_MIG_CAP("postcopy-blocktime",
+                        MIGRATION_CAPABILITY_POSTCOPY_BLOCKTIME),
     DEFINE_PROP_MIG_CAP("x-colo", MIGRATION_CAPABILITY_X_COLO),
     DEFINE_PROP_MIG_CAP("x-release-ram", MIGRATION_CAPABILITY_RELEASE_RAM),
     DEFINE_PROP_MIG_CAP("x-return-path", MIGRATION_CAPABILITY_RETURN_PATH),
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 13/26] migration/postcopy: Push blocktime start/end into page req mutex
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (11 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 12/26] migration: Add option to set postcopy-blocktime Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 14/26] migration/postcopy: Drop all atomic ops in blocktime feature Fabiano Rosas
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu

From: Peter Xu <peterx@redhat.com>

The postcopy blocktime feature was tricky that it used quite some atomic
operations over quite a few arrays and vars, without explaining how that
would be thread safe.  The thread safety here is about concurrency between
the fault thread and the fault resolution threads, possible to access the
same chunk of data.  All these atomic ops can be expensive too before
knowing clearly how it works.

OTOH, postcopy has one page_request_mutex used to serialize the received
bitmap updates.  So far it's ok - we don't yet have a lot of threads
contending the lock.  It might change after multifd will be supported, but
that's a separate story.  What is important is, with that mutex, it's
pretty lightweight to move all the blocktime maintenance into the mutex
critical section.  It's because the blocktime layer is lightweighted:
almost "remember which vcpu faulted on which address", and "ok we get some
fault resolved, calculate how long it takes".  It's also an optional
feature for now (but I have thought of changing that, maybe in the future).

Let's push the blocktime layer into the mutex, so that it's always
thread-safe even without any atomic ops.

To achieve that, I'll need to add a tid parameter on fault path so that
it'll start to pass the faulted thread ID into deeper the stack, but not
too deep.  When at it, add a comment for the shared fault handler (for
example, vhost-user devices running with postcopy), to mention a TODO.  One
reason it might not be trivial is that vhost-user's userfaultfds should be
opened by vhost-user process, so it's pretty hard to control making sure
the TID feature will be around.  It wasn't supported before, so keep it
like that for now.

Now we should be as ease when everything is protected by a mutex that we
always take anyway.

One side effect: we can finally remove one ramblock_recv_bitmap_test() in
mark_postcopy_blocktime_begin(), which was pretty weird and which also
includes a weird (but maybe necessary.. but maybe not?) operation to inject
a blocktime entry then quickly erase it..  When we're with the mutex, and
when we make sure it's invoked after checking the receive bitmap, it's not
needed anymore.  Instead, we assert.

As another side effect, this paves way for removing all atomic ops in all
the mem accesses in blocktime layer.

Note that we need a stub for mark_postcopy_blocktime_begin() for Windows
builds.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613141217.474825-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c    | 25 +++++++++++-------
 migration/migration.h    |  2 +-
 migration/postcopy-ram.c | 56 +++++++++++++++++++++-------------------
 migration/postcopy-ram.h |  2 ++
 migration/trace-events   |  2 +-
 5 files changed, 48 insertions(+), 39 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 923400f801..10c216d25d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -576,22 +576,27 @@ int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
 }
 
 int migrate_send_rp_req_pages(MigrationIncomingState *mis,
-                              RAMBlock *rb, ram_addr_t start, uint64_t haddr)
+                              RAMBlock *rb, ram_addr_t start, uint64_t haddr,
+                              uint32_t tid)
 {
     void *aligned = (void *)(uintptr_t)ROUND_DOWN(haddr, qemu_ram_pagesize(rb));
     bool received = false;
 
     WITH_QEMU_LOCK_GUARD(&mis->page_request_mutex) {
         received = ramblock_recv_bitmap_test_byte_offset(rb, start);
-        if (!received && !g_tree_lookup(mis->page_requested, aligned)) {
-            /*
-             * The page has not been received, and it's not yet in the page
-             * request list.  Queue it.  Set the value of element to 1, so that
-             * things like g_tree_lookup() will return TRUE (1) when found.
-             */
-            g_tree_insert(mis->page_requested, aligned, (gpointer)1);
-            qatomic_inc(&mis->page_requested_count);
-            trace_postcopy_page_req_add(aligned, mis->page_requested_count);
+        if (!received) {
+            if (!g_tree_lookup(mis->page_requested, aligned)) {
+                /*
+                 * The page has not been received, and it's not yet in the
+                 * page request list.  Queue it.  Set the value of element
+                 * to 1, so that things like g_tree_lookup() will return
+                 * TRUE (1) when found.
+                 */
+                g_tree_insert(mis->page_requested, aligned, (gpointer)1);
+                qatomic_inc(&mis->page_requested_count);
+                trace_postcopy_page_req_add(aligned, mis->page_requested_count);
+            }
+            mark_postcopy_blocktime_begin(haddr, tid, rb);
         }
     }
 
diff --git a/migration/migration.h b/migration/migration.h
index 739289de93..01329bf824 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -546,7 +546,7 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
 void migrate_send_rp_pong(MigrationIncomingState *mis,
                           uint32_t value);
 int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
-                              ram_addr_t start, uint64_t haddr);
+                              ram_addr_t start, uint64_t haddr, uint32_t tid);
 int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
                                       RAMBlock *rb, ram_addr_t start);
 void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 75fd310fb2..32fa06dabd 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -752,8 +752,12 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd,
                        pagesize);
 }
 
+/*
+ * NOTE: @tid is only used when postcopy-blocktime feature is enabled, and
+ * also optional: when zero is provided, the fault accounting will be ignored.
+ */
 static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb,
-                                 ram_addr_t start, uint64_t haddr)
+                                 ram_addr_t start, uint64_t haddr, uint32_t tid)
 {
     void *aligned = (void *)(uintptr_t)ROUND_DOWN(haddr, qemu_ram_pagesize(rb));
 
@@ -772,7 +776,7 @@ static int postcopy_request_page(MigrationIncomingState *mis, RAMBlock *rb,
         return received ? 0 : postcopy_place_page_zero(mis, aligned, rb);
     }
 
-    return migrate_send_rp_req_pages(mis, rb, start, haddr);
+    return migrate_send_rp_req_pages(mis, rb, start, haddr, tid);
 }
 
 /*
@@ -793,7 +797,8 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
                                         qemu_ram_get_idstr(rb), rb_offset);
         return postcopy_wake_shared(pcfd, client_addr, rb);
     }
-    postcopy_request_page(mis, rb, aligned_rbo, client_addr);
+    /* TODO: support blocktime tracking */
+    postcopy_request_page(mis, rb, aligned_rbo, client_addr, 0);
     return 0;
 }
 
@@ -819,17 +824,17 @@ static uint32_t get_low_time_offset(PostcopyBlocktimeContext *dc)
 }
 
 /*
- * This function is being called when pagefault occurs. It
- * tracks down vCPU blocking time.
+ * This function is being called when pagefault occurs. It tracks down vCPU
+ * blocking time.  It's protected by @page_request_mutex.
  *
  * @addr: faulted host virtual address
  * @ptid: faulted process thread id
  * @rb: ramblock appropriate to addr
  */
-static void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
-                                          RAMBlock *rb)
+void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
+                                   RAMBlock *rb)
 {
-    int cpu, already_received;
+    int cpu;
     MigrationIncomingState *mis = migration_incoming_get_current();
     PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
     uint32_t low_time_offset;
@@ -852,24 +857,19 @@ static void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
     qatomic_xchg(&dc->vcpu_addr[cpu], addr);
 
     /*
-     * check it here, not at the beginning of the function,
-     * due to, check could occur early than bitmap_set in
-     * qemu_ufd_copy_ioctl
+     * The caller should only inject a blocktime entry when the page is
+     * yet missing.
      */
-    already_received = ramblock_recv_bitmap_test(rb, (void *)addr);
-    if (already_received) {
-        qatomic_xchg(&dc->vcpu_addr[cpu], 0);
-        qatomic_xchg(&dc->page_fault_vcpu_time[cpu], 0);
-        qatomic_dec(&dc->smp_cpus_down);
-    }
+    assert(!ramblock_recv_bitmap_test(rb, (void *)addr));
+
     trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
-                                        cpu, already_received);
+                                        cpu);
 }
 
 /*
- *  This function just provide calculated blocktime per cpu and trace it.
- *  Total blocktime is calculated in mark_postcopy_blocktime_end.
- *
+ * This function just provide calculated blocktime per cpu and trace it.
+ * Total blocktime is calculated in mark_postcopy_blocktime_end.  It's
+ * protected by @page_request_mutex.
  *
  * Assume we have 3 CPU
  *
@@ -1068,17 +1068,14 @@ static void *postcopy_ram_fault_thread(void *opaque)
                                                 qemu_ram_get_idstr(rb),
                                                 rb_offset,
                                                 msg.arg.pagefault.feat.ptid);
-            mark_postcopy_blocktime_begin(
-                    (uintptr_t)(msg.arg.pagefault.address),
-                                msg.arg.pagefault.feat.ptid, rb);
-
 retry:
             /*
              * Send the request to the source - we want to request one
              * of our host page sizes (which is >= TPS)
              */
             ret = postcopy_request_page(mis, rb, rb_offset,
-                                        msg.arg.pagefault.address);
+                                        msg.arg.pagefault.address,
+                                        msg.arg.pagefault.feat.ptid);
             if (ret) {
                 /* May be network failure, try to wait for recovery */
                 postcopy_pause_fault_thread(mis);
@@ -1299,8 +1296,8 @@ static int qemu_ufd_copy_ioctl(MigrationIncomingState *mis, void *host_addr,
                 qemu_cond_signal(&mis->page_request_cond);
             }
         }
-        qemu_mutex_unlock(&mis->page_request_mutex);
         mark_postcopy_blocktime_end((uintptr_t)host_addr);
+        qemu_mutex_unlock(&mis->page_request_mutex);
     }
     return ret;
 }
@@ -1430,6 +1427,11 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd,
 {
     g_assert_not_reached();
 }
+
+void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
+                                   RAMBlock *rb)
+{
+}
 #endif
 
 /* ------------------------------------------------------------------------- */
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index a6df1b2811..3852141d7e 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -196,5 +196,7 @@ void postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file);
 void postcopy_preempt_setup(MigrationState *s);
 int postcopy_preempt_establish_channel(MigrationState *s);
 bool postcopy_is_paused(MigrationStatus status);
+void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
+                                   RAMBlock *rb);
 
 #endif
diff --git a/migration/trace-events b/migration/trace-events
index dcd8fe9a0c..917f521e88 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -285,7 +285,7 @@ postcopy_nhp_range(const char *ramblock, void *host_addr, size_t offset, size_t
 postcopy_place_page(void *host_addr) "host=%p"
 postcopy_place_page_zero(void *host_addr) "host=%p"
 postcopy_ram_enable_notify(void) ""
-mark_postcopy_blocktime_begin(uint64_t addr, void *dd, uint32_t time, int cpu, int received) "addr: 0x%" PRIx64 ", dd: %p, time: %u, cpu: %d, already_received: %d"
+mark_postcopy_blocktime_begin(uint64_t addr, void *dd, uint32_t time, int cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %u, cpu: %d"
 mark_postcopy_blocktime_end(uint64_t addr, void *dd, uint32_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %u, affected_cpu: %d"
 postcopy_pause_fault_thread(void) ""
 postcopy_pause_fault_thread_continued(void) ""
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 14/26] migration/postcopy: Drop all atomic ops in blocktime feature
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (12 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 13/26] migration/postcopy: Push blocktime start/end into page req mutex Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 15/26] migration/postcopy: Make all blocktime vars 64bits Fabiano Rosas
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu

From: Peter Xu <peterx@redhat.com>

Now with the mutex protection it's not needed anymore.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613141217.474825-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/postcopy-ram.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 32fa06dabd..81925532de 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -849,12 +849,12 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
 
     low_time_offset = get_low_time_offset(dc);
     if (dc->vcpu_addr[cpu] == 0) {
-        qatomic_inc(&dc->smp_cpus_down);
+        dc->smp_cpus_down++;
     }
 
-    qatomic_xchg(&dc->last_begin, low_time_offset);
-    qatomic_xchg(&dc->page_fault_vcpu_time[cpu], low_time_offset);
-    qatomic_xchg(&dc->vcpu_addr[cpu], addr);
+    dc->last_begin = low_time_offset;
+    dc->page_fault_vcpu_time[cpu] = low_time_offset;
+    dc->vcpu_addr[cpu] = addr;
 
     /*
      * The caller should only inject a blocktime entry when the page is
@@ -915,29 +915,26 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
     for (i = 0; i < smp_cpus; i++) {
         uint32_t vcpu_blocktime = 0;
 
-        read_vcpu_time = qatomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
-        if (qatomic_fetch_add(&dc->vcpu_addr[i], 0) != addr ||
-            read_vcpu_time == 0) {
+        read_vcpu_time = dc->page_fault_vcpu_time[i];
+        if (dc->vcpu_addr[i] != addr || read_vcpu_time == 0) {
             continue;
         }
-        qatomic_xchg(&dc->vcpu_addr[i], 0);
+        dc->vcpu_addr[i] = 0;
         vcpu_blocktime = low_time_offset - read_vcpu_time;
         affected_cpu += 1;
         /* we need to know is that mark_postcopy_end was due to
          * faulted page, another possible case it's prefetched
          * page and in that case we shouldn't be here */
-        if (!vcpu_total_blocktime &&
-            qatomic_fetch_add(&dc->smp_cpus_down, 0) == smp_cpus) {
+        if (!vcpu_total_blocktime && dc->smp_cpus_down == smp_cpus) {
             vcpu_total_blocktime = true;
         }
         /* continue cycle, due to one page could affect several vCPUs */
         dc->vcpu_blocktime[i] += vcpu_blocktime;
     }
 
-    qatomic_sub(&dc->smp_cpus_down, affected_cpu);
+    dc->smp_cpus_down -= affected_cpu;
     if (vcpu_total_blocktime) {
-        dc->total_blocktime += low_time_offset - qatomic_fetch_add(
-                &dc->last_begin, 0);
+        dc->total_blocktime += low_time_offset - dc->last_begin;
     }
     trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime,
                                       affected_cpu);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 15/26] migration/postcopy: Make all blocktime vars 64bits
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (13 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 14/26] migration/postcopy: Drop all atomic ops in blocktime feature Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 16/26] migration/postcopy: Drop PostcopyBlocktimeContext.start_time Fabiano Rosas
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu

From: Peter Xu <peterx@redhat.com>

I am guessing it was used to be 32bits because of the atomic ops.  Now all
the atomic ops are gone and we're protected by a mutex instead, it's ok we
can switch to 64 bits.

Reasons to move over:

  - Allow further patches to change the unit from ms to us: with postcopy
  preempt mode, we're really into hundreds of microseconds level on
  blocktime.  We'd better be able to trap those.

  - This also paves way for some other tricks that the original version
  used to avoid overflows, e.g., start_time was almost only useful before
  to make sure the sampled timestamp won't overflow a 32-bit field.

  - This prepares further reports on top of existing data collected,
  e.g. average page fault latencies.  When average operation is taken into
  account, milliseconds are simply too coarse grained.

When at it:

  - Rename page_fault_vcpu_time to vcpu_blocktime_start.

  - Rename vcpu_blocktime to vcpu_blocktime_total.

  - Touch up the trace-events to not dump blocktime ctx pointer

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613141217.474825-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/postcopy-ram.c | 50 ++++++++++++++++++++--------------------
 migration/trace-events   |  4 ++--
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 81925532de..ec91821b85 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -112,14 +112,15 @@ void postcopy_thread_create(MigrationIncomingState *mis,
 
 typedef struct PostcopyBlocktimeContext {
     /* time when page fault initiated per vCPU */
-    uint32_t *page_fault_vcpu_time;
+    uint64_t *vcpu_blocktime_start;
+    /* blocktime per vCPU */
+    uint64_t *vcpu_blocktime_total;
     /* page address per vCPU */
     uintptr_t *vcpu_addr;
-    uint32_t total_blocktime;
-    /* blocktime per vCPU */
-    uint32_t *vcpu_blocktime;
+    /* total blocktime when all vCPUs are stopped */
+    uint64_t total_blocktime;
     /* point in time when last page fault was initiated */
-    uint32_t last_begin;
+    uint64_t last_begin;
     /* number of vCPU are suspended */
     int smp_cpus_down;
     uint64_t start_time;
@@ -133,9 +134,9 @@ typedef struct PostcopyBlocktimeContext {
 
 static void destroy_blocktime_context(struct PostcopyBlocktimeContext *ctx)
 {
-    g_free(ctx->page_fault_vcpu_time);
+    g_free(ctx->vcpu_blocktime_start);
+    g_free(ctx->vcpu_blocktime_total);
     g_free(ctx->vcpu_addr);
-    g_free(ctx->vcpu_blocktime);
     g_free(ctx);
 }
 
@@ -151,13 +152,14 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
     MachineState *ms = MACHINE(qdev_get_machine());
     unsigned int smp_cpus = ms->smp.cpus;
     PostcopyBlocktimeContext *ctx = g_new0(PostcopyBlocktimeContext, 1);
-    ctx->page_fault_vcpu_time = g_new0(uint32_t, smp_cpus);
+
+    ctx->vcpu_blocktime_start = g_new0(uint64_t, smp_cpus);
+    ctx->vcpu_blocktime_total = g_new0(uint64_t, smp_cpus);
     ctx->vcpu_addr = g_new0(uintptr_t, smp_cpus);
-    ctx->vcpu_blocktime = g_new0(uint32_t, smp_cpus);
-
     ctx->exit_notifier.notify = migration_exit_cb;
     ctx->start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     qemu_add_exit_notifier(&ctx->exit_notifier);
+
     return ctx;
 }
 
@@ -168,7 +170,7 @@ static uint32List *get_vcpu_blocktime_list(PostcopyBlocktimeContext *ctx)
     int i;
 
     for (i = ms->smp.cpus - 1; i >= 0; i--) {
-        QAPI_LIST_PREPEND(list, ctx->vcpu_blocktime[i]);
+        QAPI_LIST_PREPEND(list, (uint32_t)ctx->vcpu_blocktime_total[i]);
     }
 
     return list;
@@ -191,12 +193,12 @@ void fill_destination_postcopy_migration_info(MigrationInfo *info)
     }
 
     info->has_postcopy_blocktime = true;
-    info->postcopy_blocktime = bc->total_blocktime;
+    info->postcopy_blocktime = (uint32_t)bc->total_blocktime;
     info->has_postcopy_vcpu_blocktime = true;
     info->postcopy_vcpu_blocktime = get_vcpu_blocktime_list(bc);
 }
 
-static uint32_t get_postcopy_total_blocktime(void)
+static uint64_t get_postcopy_total_blocktime(void)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
     PostcopyBlocktimeContext *bc = mis->blocktime_ctx;
@@ -816,11 +818,9 @@ static int get_mem_fault_cpu_index(uint32_t pid)
     return -1;
 }
 
-static uint32_t get_low_time_offset(PostcopyBlocktimeContext *dc)
+static uint64_t get_low_time_offset(PostcopyBlocktimeContext *dc)
 {
-    int64_t start_time_offset = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) -
-                                    dc->start_time;
-    return start_time_offset < 1 ? 1 : start_time_offset & UINT32_MAX;
+    return (uint64_t)qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - dc->start_time;
 }
 
 /*
@@ -837,7 +837,7 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
     int cpu;
     MigrationIncomingState *mis = migration_incoming_get_current();
     PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
-    uint32_t low_time_offset;
+    uint64_t low_time_offset;
 
     if (!dc || ptid == 0) {
         return;
@@ -853,7 +853,7 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
     }
 
     dc->last_begin = low_time_offset;
-    dc->page_fault_vcpu_time[cpu] = low_time_offset;
+    dc->vcpu_blocktime_start[cpu] = low_time_offset;
     dc->vcpu_addr[cpu] = addr;
 
     /*
@@ -862,7 +862,7 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
      */
     assert(!ramblock_recv_bitmap_test(rb, (void *)addr));
 
-    trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
+    trace_mark_postcopy_blocktime_begin(addr, dc->vcpu_blocktime_start[cpu],
                                         cpu);
 }
 
@@ -901,7 +901,7 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
     unsigned int smp_cpus = ms->smp.cpus;
     int i, affected_cpu = 0;
     bool vcpu_total_blocktime = false;
-    uint32_t read_vcpu_time, low_time_offset;
+    uint64_t read_vcpu_time, low_time_offset;
 
     if (!dc) {
         return;
@@ -913,9 +913,9 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
      * optimal, more optimal algorithm is keeping tree or hash
      * where key is address value is a list of  */
     for (i = 0; i < smp_cpus; i++) {
-        uint32_t vcpu_blocktime = 0;
+        uint64_t vcpu_blocktime = 0;
 
-        read_vcpu_time = dc->page_fault_vcpu_time[i];
+        read_vcpu_time = dc->vcpu_blocktime_start[i];
         if (dc->vcpu_addr[i] != addr || read_vcpu_time == 0) {
             continue;
         }
@@ -929,14 +929,14 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
             vcpu_total_blocktime = true;
         }
         /* continue cycle, due to one page could affect several vCPUs */
-        dc->vcpu_blocktime[i] += vcpu_blocktime;
+        dc->vcpu_blocktime_total[i] += vcpu_blocktime;
     }
 
     dc->smp_cpus_down -= affected_cpu;
     if (vcpu_total_blocktime) {
         dc->total_blocktime += low_time_offset - dc->last_begin;
     }
-    trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime,
+    trace_mark_postcopy_blocktime_end(addr, dc->total_blocktime,
                                       affected_cpu);
 }
 
diff --git a/migration/trace-events b/migration/trace-events
index 917f521e88..02cdb6e7cc 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -285,8 +285,8 @@ postcopy_nhp_range(const char *ramblock, void *host_addr, size_t offset, size_t
 postcopy_place_page(void *host_addr) "host=%p"
 postcopy_place_page_zero(void *host_addr) "host=%p"
 postcopy_ram_enable_notify(void) ""
-mark_postcopy_blocktime_begin(uint64_t addr, void *dd, uint32_t time, int cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %u, cpu: %d"
-mark_postcopy_blocktime_end(uint64_t addr, void *dd, uint32_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %u, affected_cpu: %d"
+mark_postcopy_blocktime_begin(uint64_t addr, uint64_t time, int cpu) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", cpu: %d"
+mark_postcopy_blocktime_end(uint64_t addr, uint64_t time, int affected_cpu) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", affected_cpus: %d"
 postcopy_pause_fault_thread(void) ""
 postcopy_pause_fault_thread_continued(void) ""
 postcopy_pause_fast_load(void) ""
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 16/26] migration/postcopy: Drop PostcopyBlocktimeContext.start_time
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (14 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 15/26] migration/postcopy: Make all blocktime vars 64bits Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 17/26] migration/postcopy: Bring blocktime layer to ns level Fabiano Rosas
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu

From: Peter Xu <peterx@redhat.com>

Now with 64bits, the offseting using start_time is not needed anymore,
because the array can always remember the whole timestamp.

Then drop the unused parameter in get_low_time_offset() altogether.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613141217.474825-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/postcopy-ram.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index ec91821b85..e9acb4ef6e 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -123,7 +123,6 @@ typedef struct PostcopyBlocktimeContext {
     uint64_t last_begin;
     /* number of vCPU are suspended */
     int smp_cpus_down;
-    uint64_t start_time;
 
     /*
      * Handler for exit event, necessary for
@@ -157,7 +156,6 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
     ctx->vcpu_blocktime_total = g_new0(uint64_t, smp_cpus);
     ctx->vcpu_addr = g_new0(uintptr_t, smp_cpus);
     ctx->exit_notifier.notify = migration_exit_cb;
-    ctx->start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     qemu_add_exit_notifier(&ctx->exit_notifier);
 
     return ctx;
@@ -818,9 +816,9 @@ static int get_mem_fault_cpu_index(uint32_t pid)
     return -1;
 }
 
-static uint64_t get_low_time_offset(PostcopyBlocktimeContext *dc)
+static uint64_t get_low_time_offset(void)
 {
-    return (uint64_t)qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - dc->start_time;
+    return (uint64_t)qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 }
 
 /*
@@ -847,7 +845,7 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
         return;
     }
 
-    low_time_offset = get_low_time_offset(dc);
+    low_time_offset = get_low_time_offset();
     if (dc->vcpu_addr[cpu] == 0) {
         dc->smp_cpus_down++;
     }
@@ -907,7 +905,7 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
         return;
     }
 
-    low_time_offset = get_low_time_offset(dc);
+    low_time_offset = get_low_time_offset();
     /* lookup cpu, to clear it,
      * that algorithm looks straightforward, but it's not
      * optimal, more optimal algorithm is keeping tree or hash
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 17/26] migration/postcopy: Bring blocktime layer to ns level
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (15 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 16/26] migration/postcopy: Drop PostcopyBlocktimeContext.start_time Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 18/26] migration/postcopy: Add blocktime fault counts per-vcpu Fabiano Rosas
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu

From: Peter Xu <peterx@redhat.com>

With 64-bit fields, it is trivial.  The caution is when exposing any values
in QMP, it was still declared with milliseconds (ms).  Hence it's needed to
do the convertion when exporting the values to existing QMP queries.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613141217.474825-7-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/postcopy-ram.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index e9acb4ef6e..9dfa92a62d 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -110,6 +110,7 @@ void postcopy_thread_create(MigrationIncomingState *mis,
 #include <sys/eventfd.h>
 #include <linux/userfaultfd.h>
 
+/* All the time records are in unit of nanoseconds */
 typedef struct PostcopyBlocktimeContext {
     /* time when page fault initiated per vCPU */
     uint64_t *vcpu_blocktime_start;
@@ -168,7 +169,9 @@ static uint32List *get_vcpu_blocktime_list(PostcopyBlocktimeContext *ctx)
     int i;
 
     for (i = ms->smp.cpus - 1; i >= 0; i--) {
-        QAPI_LIST_PREPEND(list, (uint32_t)ctx->vcpu_blocktime_total[i]);
+        /* Convert ns -> ms */
+        QAPI_LIST_PREPEND(
+            list, (uint32_t)(ctx->vcpu_blocktime_total[i] / SCALE_MS));
     }
 
     return list;
@@ -191,7 +194,8 @@ void fill_destination_postcopy_migration_info(MigrationInfo *info)
     }
 
     info->has_postcopy_blocktime = true;
-    info->postcopy_blocktime = (uint32_t)bc->total_blocktime;
+    /* Convert ns -> ms */
+    info->postcopy_blocktime = (uint32_t)(bc->total_blocktime / SCALE_MS);
     info->has_postcopy_vcpu_blocktime = true;
     info->postcopy_vcpu_blocktime = get_vcpu_blocktime_list(bc);
 }
@@ -816,9 +820,9 @@ static int get_mem_fault_cpu_index(uint32_t pid)
     return -1;
 }
 
-static uint64_t get_low_time_offset(void)
+static uint64_t get_current_ns(void)
 {
-    return (uint64_t)qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    return (uint64_t)qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 }
 
 /*
@@ -835,7 +839,7 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
     int cpu;
     MigrationIncomingState *mis = migration_incoming_get_current();
     PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
-    uint64_t low_time_offset;
+    uint64_t current;
 
     if (!dc || ptid == 0) {
         return;
@@ -845,13 +849,13 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
         return;
     }
 
-    low_time_offset = get_low_time_offset();
+    current = get_current_ns();
     if (dc->vcpu_addr[cpu] == 0) {
         dc->smp_cpus_down++;
     }
 
-    dc->last_begin = low_time_offset;
-    dc->vcpu_blocktime_start[cpu] = low_time_offset;
+    dc->last_begin = current;
+    dc->vcpu_blocktime_start[cpu] = current;
     dc->vcpu_addr[cpu] = addr;
 
     /*
@@ -899,13 +903,13 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
     unsigned int smp_cpus = ms->smp.cpus;
     int i, affected_cpu = 0;
     bool vcpu_total_blocktime = false;
-    uint64_t read_vcpu_time, low_time_offset;
+    uint64_t read_vcpu_time, current;
 
     if (!dc) {
         return;
     }
 
-    low_time_offset = get_low_time_offset();
+    current = get_current_ns();
     /* lookup cpu, to clear it,
      * that algorithm looks straightforward, but it's not
      * optimal, more optimal algorithm is keeping tree or hash
@@ -918,7 +922,7 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
             continue;
         }
         dc->vcpu_addr[i] = 0;
-        vcpu_blocktime = low_time_offset - read_vcpu_time;
+        vcpu_blocktime = current - read_vcpu_time;
         affected_cpu += 1;
         /* we need to know is that mark_postcopy_end was due to
          * faulted page, another possible case it's prefetched
@@ -932,7 +936,7 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
 
     dc->smp_cpus_down -= affected_cpu;
     if (vcpu_total_blocktime) {
-        dc->total_blocktime += low_time_offset - dc->last_begin;
+        dc->total_blocktime += current - dc->last_begin;
     }
     trace_mark_postcopy_blocktime_end(addr, dc->total_blocktime,
                                       affected_cpu);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 18/26] migration/postcopy: Add blocktime fault counts per-vcpu
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (16 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 17/26] migration/postcopy: Bring blocktime layer to ns level Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 19/26] migration/postcopy: Report fault latencies in blocktime Fabiano Rosas
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu

From: Peter Xu <peterx@redhat.com>

Add a field to count how many remote faults one vCPU has taken.  So far
it's still not used, but will be soon.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613141217.474825-8-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/postcopy-ram.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 9dfa92a62d..15ea106910 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -116,6 +116,8 @@ typedef struct PostcopyBlocktimeContext {
     uint64_t *vcpu_blocktime_start;
     /* blocktime per vCPU */
     uint64_t *vcpu_blocktime_total;
+    /* count of faults per vCPU */
+    uint64_t *vcpu_faults_count;
     /* page address per vCPU */
     uintptr_t *vcpu_addr;
     /* total blocktime when all vCPUs are stopped */
@@ -136,6 +138,7 @@ static void destroy_blocktime_context(struct PostcopyBlocktimeContext *ctx)
 {
     g_free(ctx->vcpu_blocktime_start);
     g_free(ctx->vcpu_blocktime_total);
+    g_free(ctx->vcpu_faults_count);
     g_free(ctx->vcpu_addr);
     g_free(ctx);
 }
@@ -155,6 +158,7 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
 
     ctx->vcpu_blocktime_start = g_new0(uint64_t, smp_cpus);
     ctx->vcpu_blocktime_total = g_new0(uint64_t, smp_cpus);
+    ctx->vcpu_faults_count = g_new0(uint64_t, smp_cpus);
     ctx->vcpu_addr = g_new0(uintptr_t, smp_cpus);
     ctx->exit_notifier.notify = migration_exit_cb;
     qemu_add_exit_notifier(&ctx->exit_notifier);
@@ -857,6 +861,7 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
     dc->last_begin = current;
     dc->vcpu_blocktime_start[cpu] = current;
     dc->vcpu_addr[cpu] = addr;
+    dc->vcpu_faults_count[cpu]++;
 
     /*
      * The caller should only inject a blocktime entry when the page is
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 19/26] migration/postcopy: Report fault latencies in blocktime
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (17 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 18/26] migration/postcopy: Add blocktime fault counts per-vcpu Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 20/26] migration/postcopy: Initialize blocktime context only until listen Fabiano Rosas
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Markus Armbruster, Dr . David Alan Gilbert,
	Mario Casquero

From: Peter Xu <peterx@redhat.com>

Blocktime so far only cares about the time one vcpu (or the whole system)
got blocked.  It would be also be helpful if it can also report the latency
of page requests, which could be very sensitive during postcopy.

Blocktime itself is sometimes not very important, especially when one
thinks about KVM async PF support, which means vCPUs are literally almost
not blocked at all because the guest OS is smart enough to switch to
another task when a remote fault is needed.

However, latency is still sensitive and important because even if the guest
vCPU is running on threads that do not need a remote fault, the workload
that accesses some missing page is still affected.

Add two entries to the report, showing how long it takes to resolve a
remote fault.  Mention in the QAPI doc that this is not the real average
fault latency, but only the ones that was requested for a remote fault.

Unwrap get_vcpu_blocktime_list() so we don't need to walk the list twice,
meanwhile add the entry checks in qtests for all postcopy tests.

Cc: Markus Armbruster <armbru@redhat.com>
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Link: https://lore.kernel.org/r/20250613141217.474825-9-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration-hmp-cmds.c        | 67 ++++++++++++++++++---------
 migration/postcopy-ram.c              | 49 +++++++++++++-------
 qapi/migration.json                   | 20 ++++++++
 tests/qtest/migration/migration-qmp.c |  3 ++
 4 files changed, 102 insertions(+), 37 deletions(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 867e017b32..8b3846dab5 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -52,6 +52,51 @@ static void migration_global_dump(Monitor *mon)
                    ms->clear_bitmap_shift);
 }
 
+static void migration_dump_blocktime(Monitor *mon, MigrationInfo *info)
+{
+    if (info->has_postcopy_blocktime) {
+        monitor_printf(mon, "Postcopy Blocktime (ms): %" PRIu32 "\n",
+                       info->postcopy_blocktime);
+    }
+
+    if (info->has_postcopy_vcpu_blocktime) {
+        uint32List *item = info->postcopy_vcpu_blocktime;
+        const char *sep = "";
+        int count = 0;
+
+        monitor_printf(mon, "Postcopy vCPU Blocktime (ms):\n [");
+
+        while (item) {
+            monitor_printf(mon, "%s%"PRIu32, sep, item->value);
+            item = item->next;
+            /* Each line 10 vcpu results, newline if there's more */
+            sep = ((++count % 10 == 0) && item) ? ",\n  " : ", ";
+        }
+        monitor_printf(mon, "]\n");
+    }
+
+    if (info->has_postcopy_latency) {
+        monitor_printf(mon, "Postcopy Latency (ns): %" PRIu64 "\n",
+                       info->postcopy_latency);
+    }
+
+    if (info->has_postcopy_vcpu_latency) {
+        uint64List *item = info->postcopy_vcpu_latency;
+        const char *sep = "";
+        int count = 0;
+
+        monitor_printf(mon, "Postcopy vCPU Latencies (ns):\n [");
+
+        while (item) {
+            monitor_printf(mon, "%s%"PRIu64, sep, item->value);
+            item = item->next;
+            /* Each line 10 vcpu results, newline if there's more */
+            sep = ((++count % 10 == 0) && item) ? ",\n  " : ", ";
+        }
+        monitor_printf(mon, "]\n");
+    }
+}
+
 void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 {
     bool show_all = qdict_get_try_bool(qdict, "all", false);
@@ -202,27 +247,7 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
                        info->dirty_limit_ring_full_time);
     }
 
-    if (info->has_postcopy_blocktime) {
-        monitor_printf(mon, "Postcopy Blocktime (ms): %" PRIu32 "\n",
-                       info->postcopy_blocktime);
-    }
-
-    if (info->has_postcopy_vcpu_blocktime) {
-        uint32List *item = info->postcopy_vcpu_blocktime;
-        const char *sep = "";
-        int count = 0;
-
-        monitor_printf(mon, "Postcopy vCPU Blocktime (ms):\n [");
-
-        while (item) {
-            monitor_printf(mon, "%s%"PRIu32, sep, item->value);
-            item = item->next;
-            /* Each line 10 vcpu results, newline if there's more */
-            sep = ((++count % 10 == 0) && item) ? ",\n  " : ", ";
-        }
-        monitor_printf(mon, "]\n");
-    }
-
+    migration_dump_blocktime(mon, info);
 out:
     qapi_free_MigrationInfo(info);
 }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 15ea106910..fe940f89b9 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -166,21 +166,6 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
     return ctx;
 }
 
-static uint32List *get_vcpu_blocktime_list(PostcopyBlocktimeContext *ctx)
-{
-    MachineState *ms = MACHINE(qdev_get_machine());
-    uint32List *list = NULL;
-    int i;
-
-    for (i = ms->smp.cpus - 1; i >= 0; i--) {
-        /* Convert ns -> ms */
-        QAPI_LIST_PREPEND(
-            list, (uint32_t)(ctx->vcpu_blocktime_total[i] / SCALE_MS));
-    }
-
-    return list;
-}
-
 /*
  * This function just populates MigrationInfo from postcopy's
  * blocktime context. It will not populate MigrationInfo,
@@ -192,16 +177,48 @@ void fill_destination_postcopy_migration_info(MigrationInfo *info)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
     PostcopyBlocktimeContext *bc = mis->blocktime_ctx;
+    MachineState *ms = MACHINE(qdev_get_machine());
+    uint64_t latency_total = 0, faults = 0;
+    uint32List *list_blocktime = NULL;
+    uint64List *list_latency = NULL;
+    int i;
 
     if (!bc) {
         return;
     }
 
+    for (i = ms->smp.cpus - 1; i >= 0; i--) {
+        uint64_t latency, total, count;
+
+        /* Convert ns -> ms */
+        QAPI_LIST_PREPEND(list_blocktime,
+                          (uint32_t)(bc->vcpu_blocktime_total[i] / SCALE_MS));
+
+        /* The rest in nanoseconds */
+        total = bc->vcpu_blocktime_total[i];
+        latency_total += total;
+        count = bc->vcpu_faults_count[i];
+        faults += count;
+
+        if (count) {
+            latency = total / count;
+        } else {
+            /* No fault detected */
+            latency = 0;
+        }
+
+        QAPI_LIST_PREPEND(list_latency, latency);
+    }
+
     info->has_postcopy_blocktime = true;
     /* Convert ns -> ms */
     info->postcopy_blocktime = (uint32_t)(bc->total_blocktime / SCALE_MS);
     info->has_postcopy_vcpu_blocktime = true;
-    info->postcopy_vcpu_blocktime = get_vcpu_blocktime_list(bc);
+    info->postcopy_vcpu_blocktime = list_blocktime;
+    info->has_postcopy_latency = true;
+    info->postcopy_latency = faults ? (latency_total / faults) : 0;
+    info->has_postcopy_vcpu_latency = true;
+    info->postcopy_vcpu_latency = list_latency;
 }
 
 static uint64_t get_postcopy_total_blocktime(void)
diff --git a/qapi/migration.json b/qapi/migration.json
index e8a7d3b2a9..bb41dc0795 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -236,6 +236,17 @@
 #     This is only present when the postcopy-blocktime migration
 #     capability is enabled.  (Since 3.0)
 #
+# @postcopy-latency: average remote page fault latency (in ns).  Note that
+#     this doesn't include all faults, but only the ones that require a
+#     remote page request.  So it should be always bigger than the real
+#     average page fault latency. This is only present when the
+#     postcopy-blocktime migration capability is enabled.  (Since 10.1)
+#
+# @postcopy-vcpu-latency: average remote page fault latency per vCPU (in
+#     ns).  It has the same definition of @postcopy-latency, but instead
+#     this is the per-vCPU statistics.  This is only present when the
+#     postcopy-blocktime migration capability is enabled.  (Since 10.1)
+#
 # @socket-address: Only used for tcp, to know what the real port is
 #     (Since 4.0)
 #
@@ -260,6 +271,11 @@
 #     average memory load of the virtual CPU indirectly.  Note that
 #     zero means guest doesn't dirty memory.  (Since 8.1)
 #
+# Features:
+#
+# @unstable: Members @postcopy-latency, @postcopy-vcpu-latency are
+#     experimental.
+#
 # Since: 0.14
 ##
 { 'struct': 'MigrationInfo',
@@ -275,6 +291,10 @@
            '*blocked-reasons': ['str'],
            '*postcopy-blocktime': 'uint32',
            '*postcopy-vcpu-blocktime': ['uint32'],
+           '*postcopy-latency': {
+               'type': 'uint64', 'features': [ 'unstable' ] },
+           '*postcopy-vcpu-latency': {
+               'type': ['uint64'], 'features': [ 'unstable' ] },
            '*socket-address': ['SocketAddress'],
            '*dirty-limit-throttle-time-per-round': 'uint64',
            '*dirty-limit-ring-full-time': 'uint64'} }
diff --git a/tests/qtest/migration/migration-qmp.c b/tests/qtest/migration/migration-qmp.c
index fb59741b2c..1a5ab2d229 100644
--- a/tests/qtest/migration/migration-qmp.c
+++ b/tests/qtest/migration/migration-qmp.c
@@ -358,6 +358,9 @@ void read_blocktime(QTestState *who)
 
     rsp_return = migrate_query_not_failed(who);
     g_assert(qdict_haskey(rsp_return, "postcopy-blocktime"));
+    g_assert(qdict_haskey(rsp_return, "postcopy-vcpu-blocktime"));
+    g_assert(qdict_haskey(rsp_return, "postcopy-latency"));
+    g_assert(qdict_haskey(rsp_return, "postcopy-vcpu-latency"));
     qobject_unref(rsp_return);
 }
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 20/26] migration/postcopy: Initialize blocktime context only until listen
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (18 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 19/26] migration/postcopy: Report fault latencies in blocktime Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 21/26] migration/postcopy: Cache the tid->vcpu mapping for blocktime Fabiano Rosas
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu

From: Peter Xu <peterx@redhat.com>

Before this patch, the blocktime context can be created very early, because
postcopy_ram_supported_by_host() <- migrate_caps_check() can happen during
migration object init.

The trick here is the blocktime context needs system vCPU information,
which seems to be possible to change after that point.  I didn't verify it,
but it doesn't sound right.

Now move it out and initialize the context only when postcopy listen
starts.  That is already during a migration so it should be guaranteed the
vCPU topology can never change on both sides.

While at it, assert that the ctx isn't created instead this time; the old
"if" trick isn't needed when we're sure it will only happen once now.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613141217.474825-10-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/postcopy-ram.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index fe940f89b9..dd3615663f 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -325,13 +325,13 @@ static bool ufd_check_and_apply(int ufd, MigrationIncomingState *mis,
     }
 
 #ifdef UFFD_FEATURE_THREAD_ID
+    /*
+     * Postcopy blocktime conditionally needs THREAD_ID feature (introduced
+     * to Linux in 2017). Always try to enable it when QEMU is compiled
+     * with such environment.
+     */
     if (UFFD_FEATURE_THREAD_ID & supported_features) {
         asked_features |= UFFD_FEATURE_THREAD_ID;
-        if (migrate_postcopy_blocktime()) {
-            if (!mis->blocktime_ctx) {
-                mis->blocktime_ctx = blocktime_context_new();
-            }
-        }
     }
 #endif
 
@@ -1239,6 +1239,11 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
         return -1;
     }
 
+    if (migrate_postcopy_blocktime()) {
+        assert(mis->blocktime_ctx == NULL);
+        mis->blocktime_ctx = blocktime_context_new();
+    }
+
     /* Now an eventfd we use to tell the fault-thread to quit */
     mis->userfault_event_fd = eventfd(0, EFD_CLOEXEC);
     if (mis->userfault_event_fd == -1) {
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 21/26] migration/postcopy: Cache the tid->vcpu mapping for blocktime
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (19 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 20/26] migration/postcopy: Initialize blocktime context only until listen Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 22/26] migration/postcopy: Cleanup the total blocktime accounting Fabiano Rosas
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu

From: Peter Xu <peterx@redhat.com>

Looking up the vCPU index for each fault can be expensive when there're
hundreds of vCPUs.  Provide a cache for tid->vcpu instead with a hash
table, then lookup from there.

When at it, add another counter to record how many non-vCPU faults it gets.
For example, the main thread can also access a guest page that was missing.
These kind of faults are not accounted by blocktime so far.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613141217.474825-11-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/postcopy-ram.c | 68 ++++++++++++++++++++++++++++++++++------
 migration/trace-events   |  3 +-
 2 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index dd3615663f..bf65d6035c 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -127,6 +127,17 @@ typedef struct PostcopyBlocktimeContext {
     /* number of vCPU are suspended */
     int smp_cpus_down;
 
+    /*
+     * Fast path for looking up vcpu_index from tid.  NOTE: this result
+     * only reflects the vcpu setup when postcopy is running.  It may not
+     * always match with the current vcpu setup because vcpus can be hot
+     * attached/detached after migration completes.  However this should be
+     * stable when blocktime is using the structure.
+     */
+    GHashTable *tid_to_vcpu_hash;
+    /* Count of non-vCPU faults.  This is only for debugging purpose. */
+    uint64_t non_vcpu_faults;
+
     /*
      * Handler for exit event, necessary for
      * releasing whole blocktime_ctx
@@ -136,6 +147,7 @@ typedef struct PostcopyBlocktimeContext {
 
 static void destroy_blocktime_context(struct PostcopyBlocktimeContext *ctx)
 {
+    g_hash_table_destroy(ctx->tid_to_vcpu_hash);
     g_free(ctx->vcpu_blocktime_start);
     g_free(ctx->vcpu_blocktime_total);
     g_free(ctx->vcpu_faults_count);
@@ -150,6 +162,36 @@ static void migration_exit_cb(Notifier *n, void *data)
     destroy_blocktime_context(ctx);
 }
 
+static GHashTable *blocktime_init_tid_to_vcpu_hash(void)
+{
+    /*
+     * TID as an unsigned int can be directly used as the key.  However,
+     * CPU index can NOT be directly used as value, because CPU index can
+     * be 0, which means NULL.  Then when lookup we can never know whether
+     * it's 0 or "not found".  Hence use an indirection for CPU index.
+     */
+    GHashTable *table = g_hash_table_new_full(g_direct_hash, g_direct_equal,
+                                              NULL, g_free);
+    CPUState *cpu;
+
+    /*
+     * Initialize the tid->cpu_id mapping for lookups.  The caller needs to
+     * make sure when reaching here the CPU topology is frozen and will be
+     * stable for the whole blocktime trapping period.
+     */
+    CPU_FOREACH(cpu) {
+        int *value = g_new(int, 1);
+
+        *value = cpu->cpu_index;
+        g_hash_table_insert(table,
+                            GUINT_TO_POINTER((uint32_t)cpu->thread_id),
+                            value);
+        trace_postcopy_blocktime_tid_cpu_map(cpu->cpu_index, cpu->thread_id);
+    }
+
+    return table;
+}
+
 static struct PostcopyBlocktimeContext *blocktime_context_new(void)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
@@ -160,6 +202,8 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
     ctx->vcpu_blocktime_total = g_new0(uint64_t, smp_cpus);
     ctx->vcpu_faults_count = g_new0(uint64_t, smp_cpus);
     ctx->vcpu_addr = g_new0(uintptr_t, smp_cpus);
+    ctx->tid_to_vcpu_hash = blocktime_init_tid_to_vcpu_hash();
+
     ctx->exit_notifier.notify = migration_exit_cb;
     qemu_add_exit_notifier(&ctx->exit_notifier);
 
@@ -827,18 +871,21 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
     return 0;
 }
 
-static int get_mem_fault_cpu_index(uint32_t pid)
+static int blocktime_get_vcpu(PostcopyBlocktimeContext *ctx, uint32_t tid)
 {
-    CPUState *cpu_iter;
+    int *found;
 
-    CPU_FOREACH(cpu_iter) {
-        if (cpu_iter->thread_id == pid) {
-            trace_get_mem_fault_cpu_index(cpu_iter->cpu_index, pid);
-            return cpu_iter->cpu_index;
-        }
+    found = g_hash_table_lookup(ctx->tid_to_vcpu_hash, GUINT_TO_POINTER(tid));
+    if (!found) {
+        /*
+         * NOTE: this is possible, because QEMU's non-vCPU threads can
+         * also access a missing page.  Or, when KVM async pf is enabled, a
+         * fault can even happen from a kworker..
+         */
+        return -1;
     }
-    trace_get_mem_fault_cpu_index(-1, pid);
-    return -1;
+
+    return *found;
 }
 
 static uint64_t get_current_ns(void)
@@ -865,8 +912,9 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
     if (!dc || ptid == 0) {
         return;
     }
-    cpu = get_mem_fault_cpu_index(ptid);
+    cpu = blocktime_get_vcpu(dc, ptid);
     if (cpu < 0) {
+        dc->non_vcpu_faults++;
         return;
     }
 
diff --git a/migration/trace-events b/migration/trace-events
index 02cdb6e7cc..9c1f3b7044 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -310,8 +310,7 @@ postcopy_preempt_tls_handshake(void) ""
 postcopy_preempt_new_channel(void) ""
 postcopy_preempt_thread_entry(void) ""
 postcopy_preempt_thread_exit(void) ""
-
-get_mem_fault_cpu_index(int cpu, uint32_t pid) "cpu: %d, pid: %u"
+postcopy_blocktime_tid_cpu_map(int cpu, uint32_t tid) "cpu: %d, tid: %u"
 
 # exec.c
 migration_exec_outgoing(const char *cmd) "cmd=%s"
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 22/26] migration/postcopy: Cleanup the total blocktime accounting
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (20 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 21/26] migration/postcopy: Cache the tid->vcpu mapping for blocktime Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 23/26] migration/postcopy: Optimize blocktime fault tracking with hashtable Fabiano Rosas
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu

From: Peter Xu <peterx@redhat.com>

The variable vcpu_total_blocktime isn't easy to follow.  In reality, it
wants to capture the case where all vCPUs are stopped, and now there will
be some vCPUs starts running.

The name now starts to conflict with vcpu_blocktime_total[], meanwhile it's
actually not necessary to have the variable at all: since nobody is
touching smp_cpus_down except ourselves, we can safely do the calculation
at the end before decrementing smp_cpus_down.

Hopefully this makes the logic easier to read, side benefit is we drop one
temp var.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613141217.474825-12-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/postcopy-ram.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index bf65d6035c..fd6c0bdb1e 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -972,7 +972,6 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
     MachineState *ms = MACHINE(qdev_get_machine());
     unsigned int smp_cpus = ms->smp.cpus;
     int i, affected_cpu = 0;
-    bool vcpu_total_blocktime = false;
     uint64_t read_vcpu_time, current;
 
     if (!dc) {
@@ -994,20 +993,19 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
         dc->vcpu_addr[i] = 0;
         vcpu_blocktime = current - read_vcpu_time;
         affected_cpu += 1;
-        /* we need to know is that mark_postcopy_end was due to
-         * faulted page, another possible case it's prefetched
-         * page and in that case we shouldn't be here */
-        if (!vcpu_total_blocktime && dc->smp_cpus_down == smp_cpus) {
-            vcpu_total_blocktime = true;
-        }
         /* continue cycle, due to one page could affect several vCPUs */
         dc->vcpu_blocktime_total[i] += vcpu_blocktime;
     }
 
-    dc->smp_cpus_down -= affected_cpu;
-    if (vcpu_total_blocktime) {
+    /*
+     * If all vCPUs used to be down, and copying this page would free some
+     * vCPUs, then the system-level blocktime ends here.
+     */
+    if (dc->smp_cpus_down == smp_cpus && affected_cpu) {
         dc->total_blocktime += current - dc->last_begin;
     }
+    dc->smp_cpus_down -= affected_cpu;
+
     trace_mark_postcopy_blocktime_end(addr, dc->total_blocktime,
                                       affected_cpu);
 }
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 23/26] migration/postcopy: Optimize blocktime fault tracking with hashtable
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (21 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 22/26] migration/postcopy: Cleanup the total blocktime accounting Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 24/26] migration/postcopy: blocktime allows track / report non-vCPU faults Fabiano Rosas
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu

From: Peter Xu <peterx@redhat.com>

Currently, the postcopy blocktime feature maintains vCPU fault information
using an array (vcpu_addr[]).  It has two issues.

Issue 1: Performance Concern
============================

The old algorithm was almost OK and fast on inserts, except that the lookup
is slow and won't scale if there are a lot of vCPUs: when a page is copied
during postcopy, mark_postcopy_blocktime_end() will walk the whole array
trying to find which vCPUs are blocked by the address.  So it needs
constant O(N) walk for each page resolution.

Alexey (the author of postcopy blocktime) mentioned the perf issue and how
to optimize it in a piece of comment in the page resolution path.  The
comment was (interestingly..) not complete, but it's relatively clear what
he wanted to say about this perf issue.

Issue 2: Wrong Accounting on re-entrancies
==========================================

People might think that each vCPU should only and always get one fault at a
time, so that when the blocktime layer captured one fault on one vCPU, we
should never see another fault message on this vCPU.

It's almost correct, except in some extreme rare cases.

Case 1: it's possible the fault thread processes the userfaultfd messages
too fast so it can see >1 messages on one vCPU before the previous one was
resolved.

Case 2: it's theoretically also possible one vCPU can get even more than
one message on the same fault address if a fault is retried by the
kernel (e.g., handle_userfault() got interrupted before page resolution).

As this info might be important, instead of using commit message, I put
more details into the code as comment, when introducing an array
maintaining concurrent faults on one vCPU.  Please refer to the comments
for details on both cases, especially case 1 which can be tricky.

Case 1 sounds rare, but it can be easily reproduced locally for me when we
run blocktime together with the migration-test on the vanilla postcopy.

New Design
==========

This patch should do almost what Alexey mentioned, but slightly
differently: instead of having an array to maintain vCPU fault addresses,
for each of the fault message we push a message into a hash, indexed by the
fault address.

With the hash, it can replace the old two structs: both the vcpu_addr[]
array, and also the array to store the start time of the fault.  However
due to above we need one more counter array to account concurrent faults on
the same vCPU - that should even be needed in the old code, it's just that
the old code was buggy and it will blindly overwrite an existing
entry.. now we'll start to really track everything.

The hash structure might be more efficient than tree to maintain such
addr->(cpu, fault_time) information, so that the insert() and lookup()
paths should ideally both be ~O(1).  After all, we do not need to sort.

Here we need to do one remove() though after the lookup().  It could be
slow but only if many vCPUs faulted exactly on the same address (so when
the list of cpu entries is long), which should be unlikely. Even with that,
it's still a worst case O(N) (consider 400 vCPUs faulted on the same
address and how likely is it..) rather than a constant O(N) complexity.

When at it, touch up the tracepoints to make them slightly more useful.
One tracepoint is added when walking all the fault entries.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613141217.474825-13-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/postcopy-ram.c | 265 +++++++++++++++++++++++++++++++--------
 migration/trace-events   |   5 +-
 2 files changed, 219 insertions(+), 51 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index fd6c0bdb1e..91c23b446e 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -112,14 +112,69 @@ void postcopy_thread_create(MigrationIncomingState *mis,
 
 /* All the time records are in unit of nanoseconds */
 typedef struct PostcopyBlocktimeContext {
-    /* time when page fault initiated per vCPU */
-    uint64_t *vcpu_blocktime_start;
     /* blocktime per vCPU */
     uint64_t *vcpu_blocktime_total;
     /* count of faults per vCPU */
     uint64_t *vcpu_faults_count;
-    /* page address per vCPU */
-    uintptr_t *vcpu_addr;
+    /*
+     * count of currently blocked faults per vCPU.
+     *
+     * NOTE: Normally there should only be one fault in-progress per vCPU
+     * thread, so logically it _seems_ vcpu_faults_count[] for any vCPU
+     * should be either zero or one.  However, there can be reasons we see
+     * >1 faults on the same vCPU thread.
+     *
+     * CASE (1): since the process to resolve faults (ioctl(UFFDIO_COPY),
+     * for example) is done before taking the mutex that protects the
+     * blocktime context, it can happen that we read more than one faulted
+     * addresses per vCPU.
+     *
+     * One example when we can see >1 faulted addresses for one vCPU:
+     *
+     *  vcpu1 thread       fault thread         resolve thread
+     *  ============       ============         ==============
+     *
+     *  faulted on addr1
+     *                     read uffd msg (addr1)
+     *                     MUTEX_LOCK
+     *                     add entry (cpu1, addr1)
+     *                     MUTEX_UNLOCK
+     *                     request remote fault (addr1)
+     *                                          resolve fault (addr1)
+     *  addr1 resolved, continue..
+     *  faulted on addr2
+     *                     read uffd msg (addr2)
+     *                     MUTEX_LOCK
+     *                     add entry (cpu1, addr2) <--------------- [A]
+     *                     MUTEX_UNLOCK
+     *                                          MUTEX_LOCK
+     *                                          remove entry (cpu1, addr1)
+     *                                          MUTEX_UNLOCK
+     *
+     * In above case, we may see (cpu1, addr1) and (cpu1, addr2) entries to
+     * appear together at [A], when it gets the lock before the resolve
+     * thread.  Use this counter to maintain such case, and only when it
+     * reaches zero we know the vCPU is not blocked anymore.
+     *
+     * CASE (2): theoretically (the author admit to not have verified
+     * this..), one vCPU thread can also generate more than one userfaultfd
+     * message on the same address. It can happen e.g. for whatever reason
+     * the fault got retried before a resolution arrives. In that extremely
+     * rare case, we could also see two (cpu1, addr1) entries.
+     *
+     * In all cases, be prepared with such re-entrancies with this array.
+     *
+     * Using uint8_t should be far enough for now.  For example, when
+     * there're only one resolve thread (postcopy ram listening thread),
+     * the max (concurrent fault entries) should be two.
+     */
+    uint8_t *vcpu_faults_current;
+    /*
+     * The hash that contains addr1->[(cpu1,ts1),(cpu2,ts2) ...] mappings.
+     * Each of the entry is a tuple of (CPU index, fault timestamp) showing
+     * that a fault was requested.
+     */
+    GHashTable *vcpu_addr_hash;
     /* total blocktime when all vCPUs are stopped */
     uint64_t total_blocktime;
     /* point in time when last page fault was initiated */
@@ -145,13 +200,38 @@ typedef struct PostcopyBlocktimeContext {
     Notifier exit_notifier;
 } PostcopyBlocktimeContext;
 
+typedef struct {
+    /* The time the fault was triggered */
+    uint64_t fault_time;
+    /* The vCPU index that was blocked */
+    int cpu;
+} BlocktimeVCPUEntry;
+
+/* Alloc an entry to record a vCPU fault */
+static BlocktimeVCPUEntry *
+blocktime_vcpu_entry_alloc(int cpu, uint64_t fault_time)
+{
+    BlocktimeVCPUEntry *entry = g_new(BlocktimeVCPUEntry, 1);
+
+    entry->fault_time = fault_time;
+    entry->cpu = cpu;
+
+    return entry;
+}
+
+/* Free a @GList of @BlocktimeVCPUEntry */
+static void blocktime_vcpu_list_free(gpointer data)
+{
+    g_list_free_full(data, g_free);
+}
+
 static void destroy_blocktime_context(struct PostcopyBlocktimeContext *ctx)
 {
     g_hash_table_destroy(ctx->tid_to_vcpu_hash);
-    g_free(ctx->vcpu_blocktime_start);
+    g_hash_table_destroy(ctx->vcpu_addr_hash);
     g_free(ctx->vcpu_blocktime_total);
     g_free(ctx->vcpu_faults_count);
-    g_free(ctx->vcpu_addr);
+    g_free(ctx->vcpu_faults_current);
     g_free(ctx);
 }
 
@@ -198,12 +278,22 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
     unsigned int smp_cpus = ms->smp.cpus;
     PostcopyBlocktimeContext *ctx = g_new0(PostcopyBlocktimeContext, 1);
 
-    ctx->vcpu_blocktime_start = g_new0(uint64_t, smp_cpus);
     ctx->vcpu_blocktime_total = g_new0(uint64_t, smp_cpus);
     ctx->vcpu_faults_count = g_new0(uint64_t, smp_cpus);
-    ctx->vcpu_addr = g_new0(uintptr_t, smp_cpus);
+    ctx->vcpu_faults_current = g_new0(uint8_t, smp_cpus);
     ctx->tid_to_vcpu_hash = blocktime_init_tid_to_vcpu_hash();
 
+    /*
+     * The key (host virtual addresses) will always be gpointer-sized on
+     * either 32bits or 64bits systems, so it'll fit as a direct key.
+     *
+     * The value will be a list of BlocktimeVCPUEntry entries.
+     */
+    ctx->vcpu_addr_hash = g_hash_table_new_full(g_direct_hash,
+                                                g_direct_equal,
+                                                NULL,
+                                                blocktime_vcpu_list_free);
+
     ctx->exit_notifier.notify = migration_exit_cb;
     qemu_add_exit_notifier(&ctx->exit_notifier);
 
@@ -893,6 +983,39 @@ static uint64_t get_current_ns(void)
     return (uint64_t)qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 }
 
+/* Inject an (cpu, fault_time) entry into the database, using addr as key */
+static void blocktime_fault_inject(PostcopyBlocktimeContext *ctx,
+                                   uintptr_t addr, int cpu, uint64_t time)
+{
+    BlocktimeVCPUEntry *entry = blocktime_vcpu_entry_alloc(cpu, time);
+    GHashTable *table = ctx->vcpu_addr_hash;
+    gpointer key = (gpointer)addr;
+    GList *head, *list;
+    gboolean result;
+
+    head = g_hash_table_lookup(table, key);
+    if (head) {
+        /*
+         * If existed, steal the @head for list operation rather than
+         * freeing it, making sure steal succeeded.
+         */
+        result = g_hash_table_steal(table, key);
+        assert(result == TRUE);
+    }
+
+    /*
+     * Now the key is guaranteed to be absent.  Two cases:
+     *
+     * (1) There's no existing entry, list contains the only one. Insert.
+     * (2) There're existing entries, after stealing we own it, prepend the
+     *     result and re-insert.
+     */
+    list = g_list_prepend(head, entry);
+    g_hash_table_insert(table, key, list);
+
+    trace_postcopy_blocktime_begin(addr, time, cpu, !!head);
+}
+
 /*
  * This function is being called when pagefault occurs. It tracks down vCPU
  * blocking time.  It's protected by @page_request_mutex.
@@ -912,21 +1035,6 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
     if (!dc || ptid == 0) {
         return;
     }
-    cpu = blocktime_get_vcpu(dc, ptid);
-    if (cpu < 0) {
-        dc->non_vcpu_faults++;
-        return;
-    }
-
-    current = get_current_ns();
-    if (dc->vcpu_addr[cpu] == 0) {
-        dc->smp_cpus_down++;
-    }
-
-    dc->last_begin = current;
-    dc->vcpu_blocktime_start[cpu] = current;
-    dc->vcpu_addr[cpu] = addr;
-    dc->vcpu_faults_count[cpu]++;
 
     /*
      * The caller should only inject a blocktime entry when the page is
@@ -934,8 +1042,67 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
      */
     assert(!ramblock_recv_bitmap_test(rb, (void *)addr));
 
-    trace_mark_postcopy_blocktime_begin(addr, dc->vcpu_blocktime_start[cpu],
-                                        cpu);
+    current = get_current_ns();
+    cpu = blocktime_get_vcpu(dc, ptid);
+
+    if (cpu >= 0) {
+        /* How many faults on this vCPU in total? */
+        dc->vcpu_faults_count[cpu]++;
+
+        /*
+         * Account how many concurrent faults on this vCPU we trapped.  See
+         * comments above vcpu_faults_current[] on why it can be more than one.
+         */
+        if (dc->vcpu_faults_current[cpu]++ == 0) {
+            dc->smp_cpus_down++;
+            /*
+             * We use last_begin to cover (1) the 1st fault on this specific
+             * vCPU, but meanwhile (2) the last vCPU that got blocked.  It's
+             * only used to calculate system-wide blocktime.
+             */
+            dc->last_begin = current;
+        }
+
+        /* Making sure it won't overflow - it really should never! */
+        assert(dc->vcpu_faults_current[cpu] <= 255);
+    } else {
+        /* We do not support non-vCPU thread tracking yet */
+        dc->non_vcpu_faults++;
+        return;
+    }
+
+    blocktime_fault_inject(dc, addr, cpu, current);
+}
+
+typedef struct {
+    PostcopyBlocktimeContext *ctx;
+    uint64_t current;
+    int affected_cpus;
+} BlockTimeVCPUIter;
+
+static void blocktime_cpu_list_iter_fn(gpointer data, gpointer user_data)
+{
+    BlockTimeVCPUIter *iter = user_data;
+    PostcopyBlocktimeContext *ctx = iter->ctx;
+    BlocktimeVCPUEntry *entry = data;
+    int cpu = entry->cpu;
+
+    /*
+     * Time should never go back.. so when the fault is resolved it must be
+     * later than when it was faulted.
+     */
+    assert(iter->current >= entry->fault_time);
+
+    /*
+     * If we resolved all pending faults on one vCPU due to this page
+     * resolution, take a note.
+     */
+    if (--ctx->vcpu_faults_current[cpu] == 0) {
+        ctx->vcpu_blocktime_total[cpu] += iter->current - entry->fault_time;
+        iter->affected_cpus += 1;
+    }
+
+    trace_postcopy_blocktime_end_one(cpu, ctx->vcpu_faults_current[cpu]);
 }
 
 /*
@@ -971,43 +1138,43 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
     PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
     MachineState *ms = MACHINE(qdev_get_machine());
     unsigned int smp_cpus = ms->smp.cpus;
-    int i, affected_cpu = 0;
-    uint64_t read_vcpu_time, current;
+    BlockTimeVCPUIter iter = {
+        .current = get_current_ns(),
+        .affected_cpus = 0,
+        .ctx = dc,
+    };
+    gpointer key = (gpointer)addr;
+    GHashTable *table;
+    GList *list;
 
     if (!dc) {
         return;
     }
 
-    current = get_current_ns();
-    /* lookup cpu, to clear it,
-     * that algorithm looks straightforward, but it's not
-     * optimal, more optimal algorithm is keeping tree or hash
-     * where key is address value is a list of  */
-    for (i = 0; i < smp_cpus; i++) {
-        uint64_t vcpu_blocktime = 0;
-
-        read_vcpu_time = dc->vcpu_blocktime_start[i];
-        if (dc->vcpu_addr[i] != addr || read_vcpu_time == 0) {
-            continue;
-        }
-        dc->vcpu_addr[i] = 0;
-        vcpu_blocktime = current - read_vcpu_time;
-        affected_cpu += 1;
-        /* continue cycle, due to one page could affect several vCPUs */
-        dc->vcpu_blocktime_total[i] += vcpu_blocktime;
+    table = dc->vcpu_addr_hash;
+    /* the address wasn't tracked at all? */
+    list = g_hash_table_lookup(table, key);
+    if (!list) {
+        return;
     }
 
+    /*
+     * Loop over the set of vCPUs that got blocked on this addr, do the
+     * blocktime accounting.  After that, remove the whole list.
+     */
+    g_list_foreach(list, blocktime_cpu_list_iter_fn, &iter);
+    g_hash_table_remove(table, key);
+
     /*
      * If all vCPUs used to be down, and copying this page would free some
      * vCPUs, then the system-level blocktime ends here.
      */
-    if (dc->smp_cpus_down == smp_cpus && affected_cpu) {
-        dc->total_blocktime += current - dc->last_begin;
+    if (dc->smp_cpus_down == smp_cpus && iter.affected_cpus) {
+        dc->total_blocktime += iter.current - dc->last_begin;
     }
-    dc->smp_cpus_down -= affected_cpu;
+    dc->smp_cpus_down -= iter.affected_cpus;
 
-    trace_mark_postcopy_blocktime_end(addr, dc->total_blocktime,
-                                      affected_cpu);
+    trace_postcopy_blocktime_end(addr, iter.current, iter.affected_cpus);
 }
 
 static void postcopy_pause_fault_thread(MigrationIncomingState *mis)
diff --git a/migration/trace-events b/migration/trace-events
index 9c1f3b7044..a36a78f01a 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -285,8 +285,6 @@ postcopy_nhp_range(const char *ramblock, void *host_addr, size_t offset, size_t
 postcopy_place_page(void *host_addr) "host=%p"
 postcopy_place_page_zero(void *host_addr) "host=%p"
 postcopy_ram_enable_notify(void) ""
-mark_postcopy_blocktime_begin(uint64_t addr, uint64_t time, int cpu) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", cpu: %d"
-mark_postcopy_blocktime_end(uint64_t addr, uint64_t time, int affected_cpu) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", affected_cpus: %d"
 postcopy_pause_fault_thread(void) ""
 postcopy_pause_fault_thread_continued(void) ""
 postcopy_pause_fast_load(void) ""
@@ -311,6 +309,9 @@ postcopy_preempt_new_channel(void) ""
 postcopy_preempt_thread_entry(void) ""
 postcopy_preempt_thread_exit(void) ""
 postcopy_blocktime_tid_cpu_map(int cpu, uint32_t tid) "cpu: %d, tid: %u"
+postcopy_blocktime_begin(uint64_t addr, uint64_t time, int cpu, bool exists) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", cpu: %d, exist: %d"
+postcopy_blocktime_end(uint64_t addr, uint64_t time, int affected_cpu) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", affected_cpus: %d"
+postcopy_blocktime_end_one(int cpu, uint8_t left_faults) "cpu: %d, left_faults: %" PRIu8
 
 # exec.c
 migration_exec_outgoing(const char *cmd) "cmd=%s"
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 24/26] migration/postcopy: blocktime allows track / report non-vCPU faults
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (22 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 23/26] migration/postcopy: Optimize blocktime fault tracking with hashtable Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-11 14:10 ` [PULL 25/26] migration/postcopy: Add latency distribution report for blocktime Fabiano Rosas
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Markus Armbruster, Dr . David Alan Gilbert,
	Mario Casquero

From: Peter Xu <peterx@redhat.com>

When used to report page fault latencies, the blocktime feature can be
almost useless when KVM async page fault is enabled, because in most cases
such remote fault will kickoff async page faults, then it's not trackable
from blocktime layer.

After all these recent rewrites to blocktime layer, it's finally so easy to
also support tracking non-vCPU faults.  It'll be even faster if we could
always index fault records with TIDs, unfortunately we need to maintain the
blocktime API which report things in vCPU indexes.

Of course this can work not only for kworkers, but also any guest accesses
that may reach a missing page, for example, very likely when in the QEMU
main thread too (and all other threads whenever applicable).

In this case, we don't care about "how long the threads are blocked", but
we only care about "how long the fault will be resolved".

Cc: Markus Armbruster <armbru@redhat.com>
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Link: https://lore.kernel.org/r/20250613141217.474825-14-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration-hmp-cmds.c        |  5 +++
 migration/postcopy-ram.c              | 64 +++++++++++++++++++++------
 migration/trace-events                |  2 +-
 qapi/migration.json                   | 12 ++++-
 tests/qtest/migration/migration-qmp.c |  1 +
 5 files changed, 67 insertions(+), 17 deletions(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 8b3846dab5..e1f9530520 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -80,6 +80,11 @@ static void migration_dump_blocktime(Monitor *mon, MigrationInfo *info)
                        info->postcopy_latency);
     }
 
+    if (info->has_postcopy_non_vcpu_latency) {
+        monitor_printf(mon, "Postcopy non-vCPU Latencies (ns): %" PRIu64 "\n",
+                       info->postcopy_non_vcpu_latency);
+    }
+
     if (info->has_postcopy_vcpu_latency) {
         uint64List *item = info->postcopy_vcpu_latency;
         const char *sep = "";
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 91c23b446e..f4cb23b3e0 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -192,6 +192,8 @@ typedef struct PostcopyBlocktimeContext {
     GHashTable *tid_to_vcpu_hash;
     /* Count of non-vCPU faults.  This is only for debugging purpose. */
     uint64_t non_vcpu_faults;
+    /* total blocktime when a non-vCPU thread is stopped */
+    uint64_t non_vcpu_blocktime_total;
 
     /*
      * Handler for exit event, necessary for
@@ -203,7 +205,10 @@ typedef struct PostcopyBlocktimeContext {
 typedef struct {
     /* The time the fault was triggered */
     uint64_t fault_time;
-    /* The vCPU index that was blocked */
+    /*
+     * The vCPU index that was blocked, when cpu==-1, it means it's a
+     * fault from non-vCPU threads.
+     */
     int cpu;
 } BlocktimeVCPUEntry;
 
@@ -344,6 +349,12 @@ void fill_destination_postcopy_migration_info(MigrationInfo *info)
         QAPI_LIST_PREPEND(list_latency, latency);
     }
 
+    latency_total += bc->non_vcpu_blocktime_total;
+    faults += bc->non_vcpu_faults;
+
+    info->has_postcopy_non_vcpu_latency = true;
+    info->postcopy_non_vcpu_latency = bc->non_vcpu_faults ?
+        (bc->non_vcpu_blocktime_total / bc->non_vcpu_faults) : 0;
     info->has_postcopy_blocktime = true;
     /* Convert ns -> ms */
     info->postcopy_blocktime = (uint32_t)(bc->total_blocktime / SCALE_MS);
@@ -983,7 +994,10 @@ static uint64_t get_current_ns(void)
     return (uint64_t)qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 }
 
-/* Inject an (cpu, fault_time) entry into the database, using addr as key */
+/*
+ * Inject an (cpu, fault_time) entry into the database, using addr as key.
+ * When cpu==-1, it means it's a non-vCPU fault.
+ */
 static void blocktime_fault_inject(PostcopyBlocktimeContext *ctx,
                                    uintptr_t addr, int cpu, uint64_t time)
 {
@@ -1066,9 +1080,17 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
         /* Making sure it won't overflow - it really should never! */
         assert(dc->vcpu_faults_current[cpu] <= 255);
     } else {
-        /* We do not support non-vCPU thread tracking yet */
+        /*
+         * For non-vCPU thread faults, we don't care about tid or cpu index
+         * or time the thread is blocked (e.g., a kworker trying to help
+         * KVM when async_pf=on is OK to be blocked and not affect guest
+         * responsiveness), but we care about latency.  Track it with
+         * cpu=-1.
+         *
+         * Note that this will NOT affect blocktime reports on vCPU being
+         * blocked, but only about system-wide latency reports.
+         */
         dc->non_vcpu_faults++;
-        return;
     }
 
     blocktime_fault_inject(dc, addr, cpu, current);
@@ -1078,6 +1100,7 @@ typedef struct {
     PostcopyBlocktimeContext *ctx;
     uint64_t current;
     int affected_cpus;
+    int affected_non_cpus;
 } BlockTimeVCPUIter;
 
 static void blocktime_cpu_list_iter_fn(gpointer data, gpointer user_data)
@@ -1085,6 +1108,7 @@ static void blocktime_cpu_list_iter_fn(gpointer data, gpointer user_data)
     BlockTimeVCPUIter *iter = user_data;
     PostcopyBlocktimeContext *ctx = iter->ctx;
     BlocktimeVCPUEntry *entry = data;
+    uint64_t time_passed;
     int cpu = entry->cpu;
 
     /*
@@ -1092,17 +1116,27 @@ static void blocktime_cpu_list_iter_fn(gpointer data, gpointer user_data)
      * later than when it was faulted.
      */
     assert(iter->current >= entry->fault_time);
+    time_passed = iter->current - entry->fault_time;
 
-    /*
-     * If we resolved all pending faults on one vCPU due to this page
-     * resolution, take a note.
-     */
-    if (--ctx->vcpu_faults_current[cpu] == 0) {
-        ctx->vcpu_blocktime_total[cpu] += iter->current - entry->fault_time;
-        iter->affected_cpus += 1;
+    if (cpu >= 0) {
+        /*
+         * If we resolved all pending faults on one vCPU due to this page
+         * resolution, take a note.
+         */
+        if (--ctx->vcpu_faults_current[cpu] == 0) {
+            ctx->vcpu_blocktime_total[cpu] += time_passed;
+            iter->affected_cpus += 1;
+        }
+        trace_postcopy_blocktime_end_one(cpu, ctx->vcpu_faults_current[cpu]);
+    } else {
+        iter->affected_non_cpus++;
+        ctx->non_vcpu_blocktime_total += time_passed;
+        /*
+         * We do not maintain how many pending non-vCPU faults because we
+         * do not care about blocktime, only latency.
+         */
+        trace_postcopy_blocktime_end_one(-1, 0);
     }
-
-    trace_postcopy_blocktime_end_one(cpu, ctx->vcpu_faults_current[cpu]);
 }
 
 /*
@@ -1141,6 +1175,7 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
     BlockTimeVCPUIter iter = {
         .current = get_current_ns(),
         .affected_cpus = 0,
+        .affected_non_cpus = 0,
         .ctx = dc,
     };
     gpointer key = (gpointer)addr;
@@ -1174,7 +1209,8 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
     }
     dc->smp_cpus_down -= iter.affected_cpus;
 
-    trace_postcopy_blocktime_end(addr, iter.current, iter.affected_cpus);
+    trace_postcopy_blocktime_end(addr, iter.current, iter.affected_cpus,
+                                 iter.affected_non_cpus);
 }
 
 static void postcopy_pause_fault_thread(MigrationIncomingState *mis)
diff --git a/migration/trace-events b/migration/trace-events
index a36a78f01a..706db97def 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -310,7 +310,7 @@ postcopy_preempt_thread_entry(void) ""
 postcopy_preempt_thread_exit(void) ""
 postcopy_blocktime_tid_cpu_map(int cpu, uint32_t tid) "cpu: %d, tid: %u"
 postcopy_blocktime_begin(uint64_t addr, uint64_t time, int cpu, bool exists) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", cpu: %d, exist: %d"
-postcopy_blocktime_end(uint64_t addr, uint64_t time, int affected_cpu) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", affected_cpus: %d"
+postcopy_blocktime_end(uint64_t addr, uint64_t time, int affected_cpu, int affected_non_cpus) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", affected_cpus: %d, affected_non_cpus: %d"
 postcopy_blocktime_end_one(int cpu, uint8_t left_faults) "cpu: %d, left_faults: %" PRIu8
 
 # exec.c
diff --git a/qapi/migration.json b/qapi/migration.json
index bb41dc0795..66fb8ac74d 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -247,6 +247,12 @@
 #     this is the per-vCPU statistics.  This is only present when the
 #     postcopy-blocktime migration capability is enabled.  (Since 10.1)
 #
+# @postcopy-non-vcpu-latency: average remote page fault latency for all
+#     faults happend in non-vCPU threads (in ns).  It has the same
+#     definition of @postcopy-latency but this only provides statistics to
+#     non-vCPU faults.  This is only present when the postcopy-blocktime
+#     migration capability is enabled.  (Since 10.1)
+#
 # @socket-address: Only used for tcp, to know what the real port is
 #     (Since 4.0)
 #
@@ -273,8 +279,8 @@
 #
 # Features:
 #
-# @unstable: Members @postcopy-latency, @postcopy-vcpu-latency are
-#     experimental.
+# @unstable: Members @postcopy-latency, @postcopy-vcpu-latency,
+#     @postcopy-non-vcpu-latency are experimental.
 #
 # Since: 0.14
 ##
@@ -295,6 +301,8 @@
                'type': 'uint64', 'features': [ 'unstable' ] },
            '*postcopy-vcpu-latency': {
                'type': ['uint64'], 'features': [ 'unstable' ] },
+           '*postcopy-non-vcpu-latency': {
+               'type': 'uint64', 'features': [ 'unstable' ] },
            '*socket-address': ['SocketAddress'],
            '*dirty-limit-throttle-time-per-round': 'uint64',
            '*dirty-limit-ring-full-time': 'uint64'} }
diff --git a/tests/qtest/migration/migration-qmp.c b/tests/qtest/migration/migration-qmp.c
index 1a5ab2d229..67a67d4bd6 100644
--- a/tests/qtest/migration/migration-qmp.c
+++ b/tests/qtest/migration/migration-qmp.c
@@ -361,6 +361,7 @@ void read_blocktime(QTestState *who)
     g_assert(qdict_haskey(rsp_return, "postcopy-vcpu-blocktime"));
     g_assert(qdict_haskey(rsp_return, "postcopy-latency"));
     g_assert(qdict_haskey(rsp_return, "postcopy-vcpu-latency"));
+    g_assert(qdict_haskey(rsp_return, "postcopy-non-vcpu-latency"));
     qobject_unref(rsp_return);
 }
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 25/26] migration/postcopy: Add latency distribution report for blocktime
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (23 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 24/26] migration/postcopy: blocktime allows track / report non-vCPU faults Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-14 10:47   ` Peter Maydell
  2025-07-11 14:10 ` [PULL 26/26] migration: Rename save_live_complete_precopy_thread to save_complete_precopy_thread Fabiano Rosas
  2025-07-13  7:06 ` [PULL 00/26] Migration patches for 2025-07-11 Stefan Hajnoczi
  26 siblings, 1 reply; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Markus Armbruster, Dr . David Alan Gilbert

From: Peter Xu <peterx@redhat.com>

Add the latency distribution too for blocktime, using order-of-two buckets.
It accounts for all the faults, from either vCPU or non-vCPU threads.  With
prior rework, it's very easy to achieve by adding an array to account for
faults in each buckets.

Sample output for HMP (while for QMP it's simply an array):

Postcopy Latency Distribution:
  [     1 us -     2 us ]:          0
  [     2 us -     4 us ]:          0
  [     4 us -     8 us ]:          1
  [     8 us -    16 us ]:          2
  [    16 us -    32 us ]:          2
  [    32 us -    64 us ]:          3
  [    64 us -   128 us ]:      10169
  [   128 us -   256 us ]:      50151
  [   256 us -   512 us ]:      12876
  [   512 us -     1 ms ]:         97
  [     1 ms -     2 ms ]:         42
  [     2 ms -     4 ms ]:         44
  [     4 ms -     8 ms ]:         93
  [     8 ms -    16 ms ]:        138
  [    16 ms -    32 ms ]:          0
  [    32 ms -    65 ms ]:          0
  [    65 ms -   131 ms ]:          0
  [   131 ms -   262 ms ]:          0
  [   262 ms -   524 ms ]:          0
  [   524 ms -    1 sec ]:          0
  [    1 sec -    2 sec ]:          0
  [    2 sec -    4 sec ]:          0
  [    4 sec -    8 sec ]:          0
  [    8 sec -   16 sec ]:          0

Cc: Markus Armbruster <armbru@redhat.com>
Acked-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613141217.474825-15-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration-hmp-cmds.c        | 32 +++++++++++++++++++
 migration/postcopy-ram.c              | 46 +++++++++++++++++++++++++++
 qapi/migration.json                   | 12 ++++++-
 tests/qtest/migration/migration-qmp.c |  1 +
 4 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index e1f9530520..cef5608210 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -52,6 +52,21 @@ static void migration_global_dump(Monitor *mon)
                    ms->clear_bitmap_shift);
 }
 
+static const gchar *format_time_str(uint64_t us)
+{
+    const char *units[] = {"us", "ms", "sec"};
+    int index = 0;
+
+    while (us > 1000) {
+        us /= 1000;
+        if (++index >= (sizeof(units) - 1)) {
+            break;
+        }
+    }
+
+    return g_strdup_printf("%"PRIu64" %s", us, units[index]);
+}
+
 static void migration_dump_blocktime(Monitor *mon, MigrationInfo *info)
 {
     if (info->has_postcopy_blocktime) {
@@ -100,6 +115,23 @@ static void migration_dump_blocktime(Monitor *mon, MigrationInfo *info)
         }
         monitor_printf(mon, "]\n");
     }
+
+    if (info->has_postcopy_latency_dist) {
+        uint64List *item = info->postcopy_latency_dist;
+        int count = 0;
+
+        monitor_printf(mon, "Postcopy Latency Distribution:\n");
+
+        while (item) {
+            g_autofree const gchar *from = format_time_str(1UL << count);
+            g_autofree const gchar *to = format_time_str(1UL << (count + 1));
+
+            monitor_printf(mon, "  [ %8s - %8s ]: %10"PRIu64"\n",
+                           from, to, item->value);
+            item = item->next;
+            count++;
+        }
+    }
 }
 
 void hmp_info_migrate(Monitor *mon, const QDict *qdict)
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index f4cb23b3e0..45af9a361e 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -110,6 +110,15 @@ void postcopy_thread_create(MigrationIncomingState *mis,
 #include <sys/eventfd.h>
 #include <linux/userfaultfd.h>
 
+/*
+ * Here we use 24 buckets, which means the last bucket will cover [2^24 us,
+ * 2^25 us) ~= [16, 32) seconds.  It should be far enough to record even
+ * extreme (perf-wise broken) 1G pages moving over, which can sometimes
+ * take a few seconds due to various reasons.  Anything more than that
+ * might be unsensible to account anymore.
+ */
+#define  BLOCKTIME_LATENCY_BUCKET_N  (24)
+
 /* All the time records are in unit of nanoseconds */
 typedef struct PostcopyBlocktimeContext {
     /* blocktime per vCPU */
@@ -175,6 +184,11 @@ typedef struct PostcopyBlocktimeContext {
      * that a fault was requested.
      */
     GHashTable *vcpu_addr_hash;
+    /*
+     * Each bucket stores the count of faults that were resolved within the
+     * bucket window [2^N us, 2^(N+1) us).
+     */
+    uint64_t latency_buckets[BLOCKTIME_LATENCY_BUCKET_N];
     /* total blocktime when all vCPUs are stopped */
     uint64_t total_blocktime;
     /* point in time when last page fault was initiated */
@@ -283,6 +297,9 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
     unsigned int smp_cpus = ms->smp.cpus;
     PostcopyBlocktimeContext *ctx = g_new0(PostcopyBlocktimeContext, 1);
 
+    /* Initialize all counters to be zeros */
+    memset(ctx->latency_buckets, 0, sizeof(ctx->latency_buckets));
+
     ctx->vcpu_blocktime_total = g_new0(uint64_t, smp_cpus);
     ctx->vcpu_faults_count = g_new0(uint64_t, smp_cpus);
     ctx->vcpu_faults_current = g_new0(uint8_t, smp_cpus);
@@ -320,6 +337,7 @@ void fill_destination_postcopy_migration_info(MigrationInfo *info)
     uint64_t latency_total = 0, faults = 0;
     uint32List *list_blocktime = NULL;
     uint64List *list_latency = NULL;
+    uint64List *latency_buckets = NULL;
     int i;
 
     if (!bc) {
@@ -349,6 +367,10 @@ void fill_destination_postcopy_migration_info(MigrationInfo *info)
         QAPI_LIST_PREPEND(list_latency, latency);
     }
 
+    for (i = BLOCKTIME_LATENCY_BUCKET_N - 1; i >= 0; i--) {
+        QAPI_LIST_PREPEND(latency_buckets, bc->latency_buckets[i]);
+    }
+
     latency_total += bc->non_vcpu_blocktime_total;
     faults += bc->non_vcpu_faults;
 
@@ -364,6 +386,8 @@ void fill_destination_postcopy_migration_info(MigrationInfo *info)
     info->postcopy_latency = faults ? (latency_total / faults) : 0;
     info->has_postcopy_vcpu_latency = true;
     info->postcopy_vcpu_latency = list_latency;
+    info->has_postcopy_latency_dist = true;
+    info->postcopy_latency_dist = latency_buckets;
 }
 
 static uint64_t get_postcopy_total_blocktime(void)
@@ -1096,6 +1120,25 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
     blocktime_fault_inject(dc, addr, cpu, current);
 }
 
+static void blocktime_latency_account(PostcopyBlocktimeContext *ctx,
+                                      uint64_t time_us)
+{
+    /*
+     * Convert time (in us) to bucket index it belongs.  Take extra caution
+     * of time_us==0 even if normally rare - when happens put into bucket 0.
+     */
+    int index = time_us ? (63 - clz64(time_us)) : 0;
+
+    assert(index >= 0);
+
+    /* If it's too large, put into top bucket */
+    if (index >= BLOCKTIME_LATENCY_BUCKET_N) {
+        index = BLOCKTIME_LATENCY_BUCKET_N - 1;
+    }
+
+    ctx->latency_buckets[index]++;
+}
+
 typedef struct {
     PostcopyBlocktimeContext *ctx;
     uint64_t current;
@@ -1118,6 +1161,9 @@ static void blocktime_cpu_list_iter_fn(gpointer data, gpointer user_data)
     assert(iter->current >= entry->fault_time);
     time_passed = iter->current - entry->fault_time;
 
+    /* Latency buckets are in microseconds */
+    blocktime_latency_account(ctx, time_passed / SCALE_US);
+
     if (cpu >= 0) {
         /*
          * If we resolved all pending faults on one vCPU due to this page
diff --git a/qapi/migration.json b/qapi/migration.json
index 66fb8ac74d..2d39a8f748 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -242,6 +242,14 @@
 #     average page fault latency. This is only present when the
 #     postcopy-blocktime migration capability is enabled.  (Since 10.1)
 #
+# @postcopy-latency-dist: remote page fault latency distributions.  Each
+#     element of the array is the number of faults that fall into the
+#     bucket period.  For the N-th bucket (N>=0), the latency window is
+#     [2^Nus, 2^(N+1)us).  For example, the 8th element stores how many
+#     remote faults got resolved within [256us, 512us) window. This is only
+#     present when the postcopy-blocktime migration capability is enabled.
+#     (Since 10.1)
+#
 # @postcopy-vcpu-latency: average remote page fault latency per vCPU (in
 #     ns).  It has the same definition of @postcopy-latency, but instead
 #     this is the per-vCPU statistics.  This is only present when the
@@ -280,7 +288,7 @@
 # Features:
 #
 # @unstable: Members @postcopy-latency, @postcopy-vcpu-latency,
-#     @postcopy-non-vcpu-latency are experimental.
+#     @postcopy-latency-dist, @postcopy-non-vcpu-latency are experimental.
 #
 # Since: 0.14
 ##
@@ -299,6 +307,8 @@
            '*postcopy-vcpu-blocktime': ['uint32'],
            '*postcopy-latency': {
                'type': 'uint64', 'features': [ 'unstable' ] },
+           '*postcopy-latency-dist': {
+               'type': ['uint64'], 'features': [ 'unstable' ] },
            '*postcopy-vcpu-latency': {
                'type': ['uint64'], 'features': [ 'unstable' ] },
            '*postcopy-non-vcpu-latency': {
diff --git a/tests/qtest/migration/migration-qmp.c b/tests/qtest/migration/migration-qmp.c
index 67a67d4bd6..66dd369ba7 100644
--- a/tests/qtest/migration/migration-qmp.c
+++ b/tests/qtest/migration/migration-qmp.c
@@ -360,6 +360,7 @@ void read_blocktime(QTestState *who)
     g_assert(qdict_haskey(rsp_return, "postcopy-blocktime"));
     g_assert(qdict_haskey(rsp_return, "postcopy-vcpu-blocktime"));
     g_assert(qdict_haskey(rsp_return, "postcopy-latency"));
+    g_assert(qdict_haskey(rsp_return, "postcopy-latency-dist"));
     g_assert(qdict_haskey(rsp_return, "postcopy-vcpu-latency"));
     g_assert(qdict_haskey(rsp_return, "postcopy-non-vcpu-latency"));
     qobject_unref(rsp_return);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PULL 26/26] migration: Rename save_live_complete_precopy_thread to save_complete_precopy_thread
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (24 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 25/26] migration/postcopy: Add latency distribution report for blocktime Fabiano Rosas
@ 2025-07-11 14:10 ` Fabiano Rosas
  2025-07-13  7:06 ` [PULL 00/26] Migration patches for 2025-07-11 Stefan Hajnoczi
  26 siblings, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Juraj Marcin, Alex Williamson, Cédric Le Goater

From: Juraj Marcin <jmarcin@redhat.com>

Recent patch [1] renames the save_live_complete_precopy handler to
save_complete, as the machine is not live in most cases when this
handler is executed. The same is true also for
save_live_complete_precopy_thread, therefore this patch removes the
"live" keyword from the handler itself and related types to keep the
naming unified.

In contrast to save_complete, this handler is only executed at the end
of precopy, therefore the "precopy" keyword is retained.

[1]: https://lore.kernel.org/all/20250613140801.474264-7-peterx@redhat.com/

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250626085235.294690-1-jmarcin@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 docs/devel/migration/vfio.rst    |  4 ++--
 hw/vfio/migration-multifd.c      |  4 ++--
 hw/vfio/migration-multifd.h      |  2 +-
 hw/vfio/migration.c              |  2 +-
 include/migration/misc.h         |  8 ++++----
 include/migration/register.h     |  6 +++---
 include/qemu/typedefs.h          |  6 +++---
 migration/multifd-device-state.c | 10 +++++-----
 migration/savevm.c               |  6 +++---
 9 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst
index 8ff5ab0c74..2d8e5ca9dd 100644
--- a/docs/devel/migration/vfio.rst
+++ b/docs/devel/migration/vfio.rst
@@ -80,7 +80,7 @@ VFIO implements the device hooks for the iterative approach as follows:
   vendor driver indicates that no data remains.  In the multifd mode it
   just emits a dummy EOS marker.
 
-* A ``save_live_complete_precopy_thread`` function that in the multifd mode
+* A ``save_complete_precopy_thread`` function that in the multifd mode
   provides thread handler performing multifd device state transfer.
   It sets the VFIO device to _STOP_COPY state, iteratively reads the data
   from the VFIO device and queues it for multifd transmission until the vendor
@@ -200,7 +200,7 @@ Live migration save path
                              .save_complete() until
                                pending data is 0
 	          In the multifd mode this iteration is done in
-	          .save_live_complete_precopy_thread() instead.
+	          .save_complete_precopy_thread() instead.
                                       |
                      (POSTMIGRATE, _COMPLETED, _STOP_COPY)
             Migraton thread schedules cleanup bottom half and exits
diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index 850a319488..55635486c8 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -583,7 +583,7 @@ vfio_save_complete_precopy_thread_config_state(VFIODevice *vbasedev,
 
 /*
  * This thread is spawned by the migration core directly via
- * .save_live_complete_precopy_thread SaveVMHandler.
+ * .save_complete_precopy_thread SaveVMHandler.
  *
  * It exits after either:
  * * completing saving the remaining device state and device config, OR:
@@ -592,7 +592,7 @@ vfio_save_complete_precopy_thread_config_state(VFIODevice *vbasedev,
  *   multifd_device_state_save_thread_should_exit() returning true.
  */
 bool
-vfio_multifd_save_complete_precopy_thread(SaveLiveCompletePrecopyThreadData *d,
+vfio_multifd_save_complete_precopy_thread(SaveCompletePrecopyThreadData *d,
                                           Error **errp)
 {
     VFIODevice *vbasedev = d->handler_opaque;
diff --git a/hw/vfio/migration-multifd.h b/hw/vfio/migration-multifd.h
index 0bab63211d..ebf22a7997 100644
--- a/hw/vfio/migration-multifd.h
+++ b/hw/vfio/migration-multifd.h
@@ -26,7 +26,7 @@ bool vfio_multifd_load_state_buffer(void *opaque, char *data, size_t data_size,
 void vfio_multifd_emit_dummy_eos(VFIODevice *vbasedev, QEMUFile *f);
 
 bool
-vfio_multifd_save_complete_precopy_thread(SaveLiveCompletePrecopyThreadData *d,
+vfio_multifd_save_complete_precopy_thread(SaveCompletePrecopyThreadData *d,
                                           Error **errp);
 
 int vfio_multifd_switchover_start(VFIODevice *vbasedev);
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 33a71f8999..c329578eec 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -835,7 +835,7 @@ static const SaveVMHandlers savevm_vfio_handlers = {
      */
     .load_state_buffer = vfio_multifd_load_state_buffer,
     .switchover_start = vfio_switchover_start,
-    .save_live_complete_precopy_thread = vfio_multifd_save_complete_precopy_thread,
+    .save_complete_precopy_thread = vfio_multifd_save_complete_precopy_thread,
 };
 
 /* ---------------------------------------------------------------------- */
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 8fd36eba1d..a261f99d89 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -119,19 +119,19 @@ bool migrate_uri_parse(const char *uri, MigrationChannel **channel,
                        Error **errp);
 
 /* migration/multifd-device-state.c */
-typedef struct SaveLiveCompletePrecopyThreadData {
-    SaveLiveCompletePrecopyThreadHandler hdlr;
+typedef struct SaveCompletePrecopyThreadData {
+    SaveCompletePrecopyThreadHandler hdlr;
     char *idstr;
     uint32_t instance_id;
     void *handler_opaque;
-} SaveLiveCompletePrecopyThreadData;
+} SaveCompletePrecopyThreadData;
 
 bool multifd_queue_device_state(char *idstr, uint32_t instance_id,
                                 char *data, size_t len);
 bool multifd_device_state_supported(void);
 
 void
-multifd_spawn_device_state_save_thread(SaveLiveCompletePrecopyThreadHandler hdlr,
+multifd_spawn_device_state_save_thread(SaveCompletePrecopyThreadHandler hdlr,
                                        char *idstr, uint32_t instance_id,
                                        void *opaque);
 
diff --git a/include/migration/register.h b/include/migration/register.h
index 2a26e76a68..ae79794cdd 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -98,7 +98,7 @@ typedef struct SaveVMHandlers {
     int (*save_complete)(QEMUFile *f, void *opaque);
 
     /**
-     * @save_live_complete_precopy_thread (invoked in a separate thread)
+     * @save_complete_precopy_thread (invoked in a separate thread)
      *
      * Called at the end of a precopy phase from a separate worker thread
      * in configurations where multifd device state transfer is supported
@@ -107,14 +107,14 @@ typedef struct SaveVMHandlers {
      * When postcopy is enabled, devices that support postcopy will skip this
      * step.
      *
-     * @d: a #SaveLiveCompletePrecopyThreadData containing parameters that the
+     * @d: a #SaveCompletePrecopyThreadData containing parameters that the
      * handler may need, including this device section idstr and instance_id,
      * and opaque data pointer passed to register_savevm_live().
      * @errp: pointer to Error*, to store an error if it happens.
      *
      * Returns true to indicate success and false for errors.
      */
-    SaveLiveCompletePrecopyThreadHandler save_live_complete_precopy_thread;
+    SaveCompletePrecopyThreadHandler save_complete_precopy_thread;
 
     /* This runs both outside and inside the BQL.  */
 
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 507f0814d5..4a94af9665 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -109,7 +109,7 @@ typedef struct QString QString;
 typedef struct RAMBlock RAMBlock;
 typedef struct Range Range;
 typedef struct ReservedRegion ReservedRegion;
-typedef struct SaveLiveCompletePrecopyThreadData SaveLiveCompletePrecopyThreadData;
+typedef struct SaveCompletePrecopyThreadData SaveCompletePrecopyThreadData;
 typedef struct SHPCDevice SHPCDevice;
 typedef struct SSIBus SSIBus;
 typedef struct TCGCPUOps TCGCPUOps;
@@ -135,7 +135,7 @@ typedef struct IRQState *qemu_irq;
 typedef void (*qemu_irq_handler)(void *opaque, int n, int level);
 typedef bool (*MigrationLoadThread)(void *opaque, bool *should_quit,
                                     Error **errp);
-typedef bool (*SaveLiveCompletePrecopyThreadHandler)(SaveLiveCompletePrecopyThreadData *d,
-                                                     Error **errp);
+typedef bool (*SaveCompletePrecopyThreadHandler)(SaveCompletePrecopyThreadData *d,
+                                                 Error **errp);
 
 #endif /* QEMU_TYPEDEFS_H */
diff --git a/migration/multifd-device-state.c b/migration/multifd-device-state.c
index 94222d0eb0..fce64f00b0 100644
--- a/migration/multifd-device-state.c
+++ b/migration/multifd-device-state.c
@@ -131,7 +131,7 @@ bool multifd_device_state_supported(void)
 
 static void multifd_device_state_save_thread_data_free(void *opaque)
 {
-    SaveLiveCompletePrecopyThreadData *data = opaque;
+    SaveCompletePrecopyThreadData *data = opaque;
 
     g_clear_pointer(&data->idstr, g_free);
     g_free(data);
@@ -139,7 +139,7 @@ static void multifd_device_state_save_thread_data_free(void *opaque)
 
 static int multifd_device_state_save_thread(void *opaque)
 {
-    SaveLiveCompletePrecopyThreadData *data = opaque;
+    SaveCompletePrecopyThreadData *data = opaque;
     g_autoptr(Error) local_err = NULL;
 
     if (!data->hdlr(data, &local_err)) {
@@ -170,18 +170,18 @@ bool multifd_device_state_save_thread_should_exit(void)
 }
 
 void
-multifd_spawn_device_state_save_thread(SaveLiveCompletePrecopyThreadHandler hdlr,
+multifd_spawn_device_state_save_thread(SaveCompletePrecopyThreadHandler hdlr,
                                        char *idstr, uint32_t instance_id,
                                        void *opaque)
 {
-    SaveLiveCompletePrecopyThreadData *data;
+    SaveCompletePrecopyThreadData *data;
 
     assert(multifd_device_state_supported());
     assert(multifd_send_device_state);
 
     assert(!qatomic_read(&multifd_send_device_state->threads_abort));
 
-    data = g_new(SaveLiveCompletePrecopyThreadData, 1);
+    data = g_new(SaveCompletePrecopyThreadData, 1);
     data->hdlr = hdlr;
     data->idstr = g_strdup(idstr);
     data->instance_id = instance_id;
diff --git a/migration/savevm.c b/migration/savevm.c
index c4fd5f5a5b..fabbeb296a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1581,15 +1581,15 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
 
     if (multifd_device_state) {
         QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-            SaveLiveCompletePrecopyThreadHandler hdlr;
+            SaveCompletePrecopyThreadHandler hdlr;
 
             if (!se->ops || (in_postcopy && se->ops->has_postcopy &&
                              se->ops->has_postcopy(se->opaque)) ||
-                !se->ops->save_live_complete_precopy_thread) {
+                !se->ops->save_complete_precopy_thread) {
                 continue;
             }
 
-            hdlr = se->ops->save_live_complete_precopy_thread;
+            hdlr = se->ops->save_complete_precopy_thread;
             multifd_spawn_device_state_save_thread(hdlr,
                                                    se->idstr, se->instance_id,
                                                    se->opaque);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PULL 00/26] Migration patches for 2025-07-11
  2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
                   ` (25 preceding siblings ...)
  2025-07-11 14:10 ` [PULL 26/26] migration: Rename save_live_complete_precopy_thread to save_complete_precopy_thread Fabiano Rosas
@ 2025-07-13  7:06 ` Stefan Hajnoczi
  26 siblings, 0 replies; 31+ messages in thread
From: Stefan Hajnoczi @ 2025-07-13  7:06 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Peter Xu

[-- Attachment #1: Type: text/plain, Size: 116 bytes --]

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/10.1 for any user-visible changes.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PULL 25/26] migration/postcopy: Add latency distribution report for blocktime
  2025-07-11 14:10 ` [PULL 25/26] migration/postcopy: Add latency distribution report for blocktime Fabiano Rosas
@ 2025-07-14 10:47   ` Peter Maydell
  2025-07-14 19:11     ` Fabiano Rosas
  2025-07-14 19:39     ` Philippe Mathieu-Daudé
  0 siblings, 2 replies; 31+ messages in thread
From: Peter Maydell @ 2025-07-14 10:47 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Peter Xu, Markus Armbruster, Dr . David Alan Gilbert

On Fri, 11 Jul 2025 at 15:20, Fabiano Rosas <farosas@suse.de> wrote:
>
> From: Peter Xu <peterx@redhat.com>
>
> Add the latency distribution too for blocktime, using order-of-two buckets.
> It accounts for all the faults, from either vCPU or non-vCPU threads.  With
> prior rework, it's very easy to achieve by adding an array to account for
> faults in each buckets.
>
> Sample output for HMP (while for QMP it's simply an array):
>
> Postcopy Latency Distribution:

Hi; Coverity points out that there is a bug here (CID 1612248):

> +static const gchar *format_time_str(uint64_t us)
> +{
> +    const char *units[] = {"us", "ms", "sec"};
> +    int index = 0;
> +
> +    while (us > 1000) {
> +        us /= 1000;
> +        if (++index >= (sizeof(units) - 1)) {

sizeof(units) is the size in bytes, which in this case is
24, as it is an array of three 8-byte pointers. So it's
not the right thing to use in bounds checking the array index.

You probably wanted ARRAY_SIZE(units). Also, the ++index
inside the comparison here seems unnecessarily confusing.
I would suggest something like

    while (us > 1000 && index + 1 < ARRAY_SIZE(units)) {
        us /= 1000;
        index++;
    }

which puts into the while condition the two conditions under
which we are OK to shift down a unit, and keeps it
clear that we maintain the invariant of "when we shift
down a unit we also divide the value by 1000".

> +            break;
> +        }
> +    }
> +
> +    return g_strdup_printf("%"PRIu64" %s", us, units[index]);

Otherwise this units[index] access could be off the end of
the array.

> +}

thanks
-- PMM

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PULL 25/26] migration/postcopy: Add latency distribution report for blocktime
  2025-07-14 10:47   ` Peter Maydell
@ 2025-07-14 19:11     ` Fabiano Rosas
  2025-07-14 19:39     ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 31+ messages in thread
From: Fabiano Rosas @ 2025-07-14 19:11 UTC (permalink / raw)
  To: Peter Maydell
  Cc: qemu-devel, Peter Xu, Markus Armbruster, Dr . David Alan Gilbert

Peter Maydell <peter.maydell@linaro.org> writes:

> On Fri, 11 Jul 2025 at 15:20, Fabiano Rosas <farosas@suse.de> wrote:
>>
>> From: Peter Xu <peterx@redhat.com>
>>
>> Add the latency distribution too for blocktime, using order-of-two buckets.
>> It accounts for all the faults, from either vCPU or non-vCPU threads.  With
>> prior rework, it's very easy to achieve by adding an array to account for
>> faults in each buckets.
>>
>> Sample output for HMP (while for QMP it's simply an array):
>>
>> Postcopy Latency Distribution:
>
> Hi; Coverity points out that there is a bug here (CID 1612248):
>
>> +static const gchar *format_time_str(uint64_t us)
>> +{
>> +    const char *units[] = {"us", "ms", "sec"};
>> +    int index = 0;
>> +
>> +    while (us > 1000) {
>> +        us /= 1000;
>> +        if (++index >= (sizeof(units) - 1)) {
>
> sizeof(units) is the size in bytes, which in this case is
> 24, as it is an array of three 8-byte pointers. So it's
> not the right thing to use in bounds checking the array index.
>
> You probably wanted ARRAY_SIZE(units). Also, the ++index
> inside the comparison here seems unnecessarily confusing.
> I would suggest something like
>
>     while (us > 1000 && index + 1 < ARRAY_SIZE(units)) {
>         us /= 1000;
>         index++;
>     }
>
> which puts into the while condition the two conditions under
> which we are OK to shift down a unit, and keeps it
> clear that we maintain the invariant of "when we shift
> down a unit we also divide the value by 1000".
>
>> +            break;
>> +        }
>> +    }
>> +
>> +    return g_strdup_printf("%"PRIu64" %s", us, units[index]);
>
> Otherwise this units[index] access could be off the end of
> the array.
>
>> +}

Thanks for the report. Peter Xu is not available at the moment, I'll
work on this.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PULL 25/26] migration/postcopy: Add latency distribution report for blocktime
  2025-07-14 10:47   ` Peter Maydell
  2025-07-14 19:11     ` Fabiano Rosas
@ 2025-07-14 19:39     ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 31+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-07-14 19:39 UTC (permalink / raw)
  To: Peter Maydell, Fabiano Rosas
  Cc: qemu-devel, Peter Xu, Markus Armbruster, Dr . David Alan Gilbert

On 14/7/25 12:47, Peter Maydell wrote:
> On Fri, 11 Jul 2025 at 15:20, Fabiano Rosas <farosas@suse.de> wrote:
>>
>> From: Peter Xu <peterx@redhat.com>
>>
>> Add the latency distribution too for blocktime, using order-of-two buckets.
>> It accounts for all the faults, from either vCPU or non-vCPU threads.  With
>> prior rework, it's very easy to achieve by adding an array to account for
>> faults in each buckets.
>>
>> Sample output for HMP (while for QMP it's simply an array):
>>
>> Postcopy Latency Distribution:
> 
> Hi; Coverity points out that there is a bug here (CID 1612248):
> 
>> +static const gchar *format_time_str(uint64_t us)
>> +{
>> +    const char *units[] = {"us", "ms", "sec"};
>> +    int index = 0;
>> +
>> +    while (us > 1000) {
>> +        us /= 1000;
>> +        if (++index >= (sizeof(units) - 1)) {
> 
> sizeof(units) is the size in bytes, which in this case is
> 24, as it is an array of three 8-byte pointers. So it's
> not the right thing to use in bounds checking the array index.
> 
> You probably wanted ARRAY_SIZE(units). Also, the ++index
> inside the comparison here seems unnecessarily confusing.
> I would suggest something like
> 
>      while (us > 1000 && index + 1 < ARRAY_SIZE(units)) {
>          us /= 1000;
>          index++;
>      }
> 
> which puts into the while condition the two conditions under
> which we are OK to shift down a unit, and keeps it
> clear that we maintain the invariant of "when we shift
> down a unit we also divide the value by 1000".
> 
>> +            break;
>> +        }
>> +    }
>> +
>> +    return g_strdup_printf("%"PRIu64" %s", us, units[index]);
> 
> Otherwise this units[index] access could be off the end of
> the array.
> 
>> +}

As a follow-up, this method could be declared in "qemu/cutils.h"
along with size_to_str() and freq_to_str(), maybe timeus_to_str()?

Regards,

Phil.


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2025-07-14 21:23 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-11 14:10 [PULL 00/26] Migration patches for 2025-07-11 Fabiano Rosas
2025-07-11 14:10 ` [PULL 01/26] migration/hmp: Reorg "info migrate" once more Fabiano Rosas
2025-07-11 14:10 ` [PULL 02/26] migration/hmp: Fix postcopy-blocktime per-vCPU results Fabiano Rosas
2025-07-11 14:10 ` [PULL 03/26] migration/docs: Move docs for postcopy blocktime feature Fabiano Rosas
2025-07-11 14:10 ` [PULL 04/26] migration/bg-snapshot: Do not check for SKIP in iterator Fabiano Rosas
2025-07-11 14:10 ` [PULL 05/26] migration: Drop save_live_complete_postcopy hook Fabiano Rosas
2025-07-11 14:10 ` [PULL 06/26] migration: Rename save_live_complete_precopy to save_complete Fabiano Rosas
2025-07-11 14:10 ` [PULL 07/26] migration: qemu_savevm_complete*() helpers Fabiano Rosas
2025-07-11 14:10 ` [PULL 08/26] migration/ram: One less indent for ram_find_and_save_block() Fabiano Rosas
2025-07-11 14:10 ` [PULL 09/26] migration/ram: Add tracepoints for ram_save_complete() Fabiano Rosas
2025-07-11 14:10 ` [PULL 10/26] migration: Rewrite the migration complete detect logic Fabiano Rosas
2025-07-11 14:10 ` [PULL 11/26] migration/postcopy: Avoid clearing dirty bitmap for postcopy too Fabiano Rosas
2025-07-11 14:10 ` [PULL 12/26] migration: Add option to set postcopy-blocktime Fabiano Rosas
2025-07-11 14:10 ` [PULL 13/26] migration/postcopy: Push blocktime start/end into page req mutex Fabiano Rosas
2025-07-11 14:10 ` [PULL 14/26] migration/postcopy: Drop all atomic ops in blocktime feature Fabiano Rosas
2025-07-11 14:10 ` [PULL 15/26] migration/postcopy: Make all blocktime vars 64bits Fabiano Rosas
2025-07-11 14:10 ` [PULL 16/26] migration/postcopy: Drop PostcopyBlocktimeContext.start_time Fabiano Rosas
2025-07-11 14:10 ` [PULL 17/26] migration/postcopy: Bring blocktime layer to ns level Fabiano Rosas
2025-07-11 14:10 ` [PULL 18/26] migration/postcopy: Add blocktime fault counts per-vcpu Fabiano Rosas
2025-07-11 14:10 ` [PULL 19/26] migration/postcopy: Report fault latencies in blocktime Fabiano Rosas
2025-07-11 14:10 ` [PULL 20/26] migration/postcopy: Initialize blocktime context only until listen Fabiano Rosas
2025-07-11 14:10 ` [PULL 21/26] migration/postcopy: Cache the tid->vcpu mapping for blocktime Fabiano Rosas
2025-07-11 14:10 ` [PULL 22/26] migration/postcopy: Cleanup the total blocktime accounting Fabiano Rosas
2025-07-11 14:10 ` [PULL 23/26] migration/postcopy: Optimize blocktime fault tracking with hashtable Fabiano Rosas
2025-07-11 14:10 ` [PULL 24/26] migration/postcopy: blocktime allows track / report non-vCPU faults Fabiano Rosas
2025-07-11 14:10 ` [PULL 25/26] migration/postcopy: Add latency distribution report for blocktime Fabiano Rosas
2025-07-14 10:47   ` Peter Maydell
2025-07-14 19:11     ` Fabiano Rosas
2025-07-14 19:39     ` Philippe Mathieu-Daudé
2025-07-11 14:10 ` [PULL 26/26] migration: Rename save_live_complete_precopy_thread to save_complete_precopy_thread Fabiano Rosas
2025-07-13  7:06 ` [PULL 00/26] Migration patches for 2025-07-11 Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.