* [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports
@ 2026-04-21 20:20 Peter Xu
  2026-04-21 20:20 ` [PATCH v2 01/16] qemu-iotests: Add query-migrate test for dirty-bitmap Peter Xu
                   ` (16 more replies)
  0 siblings, 17 replies; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin

CI:  https://gitlab.com/peterx/qemu/-/pipelines/2469074018
rfc: https://lore.kernel.org/r/20260319231302.123135-1-peterx@redhat.com
v1:  https://lore.kernel.org/r/20260408165559.157108-1-peterx@redhat.com

v2:
- Added tags
- Patch 4
  - Fix and rework doc for @save_query_pending [Juraj]
  - Trace "exact" in trace_vfio_state_pending() [Avihai]
  - Avoid mentioning "pre-copy" in vfio.rst doc for query [Avihai]
- Patch 12
  - English errors [Fabiano]
- Patch 13
  - Remove " (bytes)" in HMP line [Fabiano]
- Added patch "qemu-iotests: Add query-migrate test for dirty-bitmap"
  - This covers a bug that I found when testing v1
- Added patch "vfio/migration: Add tracepoints for precopy/stopcopy query
  ioctls" to be able to dump the raw results from the two VFIO ioctls
- Replace patch "migration: Make qemu_savevm_query_pending() available
  anytime" with patch "migration: Remember total dirty bytes in mig_stats"
  - I fell back to the "cache the total dirty bytes" idea on this one to
    avoid the complication of save_query_pending() being invoked anywhere.

Overview
========

VFIO migration was merged quite a while ago, but we still see things that
are off here and there.  This series tries to address some of them, based
on my limited understanding.

Two major issues I wanted to resolve:

(1) VFIO reports state_pending_{exact|estimate}() differently

VFIO reports the stop-copy size only in exact() (which includes both
precopy and stop-copy data), while estimate() reports only precopy data.
This violates the API.  It was done this way solely to trigger a proper
sync on the VFIO ioctls, but it was only a workaround.  This series fixes
it by introducing a stop-copy size reporting facility for vmstate handlers.

(2) expected_downtime / remaining doesn't take VFIO devices into account

When migration is queried, QEMU reports a field called "expected-downtime".
The documentation phrased this almost entirely from a RAM perspective, but
ideally it should describe the estimated blackout window (in milliseconds)
if we were to switch over at any time, based on known information.

This did not yet take VFIO into account, especially in the case of VFIO
devices that may carry a large amount of device state (like GPUs).

For problem (2), the use case is that a mgmt app migrating a VFIO GPU
device always needs to adjust the downtime for migration to converge,
because when such a device is involved, a normal downtime like 300ms will
usually not suffice.

The issue is that the mgmt app doesn't have a good way to know exactly how
well precopy is going for the whole system, including the GPU device.

The hope is that a fixed expected_downtime will give the mgmt app a
reasonable hint for the downtime value to set in order to converge a
migration.

Meanwhile, with a system-wide "remaining" field introduced, the mgmt app
can query this result at the beginning of each iteration to know whether a
stall is happening, IOW, whether it's likely that this migration will not
converge at all.  When that is detected, the mgmt app can start to consider
the expected_downtime value reported above for converging this migration.
See more on testing below.
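As a rough illustration of that mgmt-side logic (a hypothetical sketch, not
QEMU or libvirt code; the helper names and the stall heuristic here are
made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the mgmt-side decisions described above.
 *
 * expected_downtime_ms() mirrors the idea behind expected-downtime: the
 * blackout window if we switched over right now, i.e. remaining data
 * divided by the estimated switchover bandwidth.
 */
static uint64_t expected_downtime_ms(uint64_t remaining_bytes,
                                     uint64_t bandwidth_bytes_per_sec)
{
    if (!bandwidth_bytes_per_sec) {
        return UINT64_MAX;      /* no bandwidth estimate yet */
    }
    return remaining_bytes * 1000 / bandwidth_bytes_per_sec;
}

/*
 * A crude stall heuristic: "remaining" did not shrink across one full
 * iteration, so precopy alone is unlikely to converge.
 */
static bool migration_stalled(uint64_t prev_remaining, uint64_t cur_remaining)
{
    return cur_remaining >= prev_remaining;
}
```

For example, with 4 GiB still remaining at a 1 GiB/s switchover bandwidth,
the hint would be ~4000 ms, far above a typical 300 ms downtime limit, so
the mgmt app would know it needs to raise the limit for convergence.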

Tests
=====

Thanks to Cédric for helping test v2.  One thing to mention is that we
encountered one case where the reported dirty size overflowed uint64_t (in
both expected_downtime and the system-wide remaining data).

Quotes from test results from Cédric, migrating a RHEL9 VM with a vGPU
(NVIDIA L4-2B) and an MLX5 VF, from a RHEL9 host (vGPU mdev) to a RHEL10
host (vGPU VF), with the vGPU under load (glxgears):

(qemu) info migrate
Status:                 active
Time (ms):              total=21140, setup=86, exp_down=152455434886355 <---- !?!
Remaining:              16 EiB                                          <---- !?!
RAM info:
  Throughput (Mbps):    967.98
  Sizes:                pagesize=4 KiB, total=4 GiB
  Transfers:            transferred=2.29 GiB, remain=4.7 MiB
    Channels:           precopy=1.91 GiB, multifd=0 B, postcopy=0 B, vfio=387 MiB
    Page Types:         normal=499427, zero=559708
  Page Rates (pps):     transfer=0, dirty=1892
  Others:               dirty_syncs=3

It fixed itself after a few more rounds of iteration, so it ultimately
didn't affect the migration.  Further attempts didn't reproduce it after I
added the tracepoint patch.  It would be good if someone knows whether it
was a known driver issue.
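The "16 EiB" remaining is the classic signature of uint64_t wrap-around:
2^64 bytes pretty-prints as 16 EiB.  A minimal illustration (not the actual
QEMU code path, just the arithmetic):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only: if a device reports having sent more data than the
 * cached remaining size, an unchecked subtraction on an unsigned 64-bit
 * counter wraps around.  The wrapped value is close to 2^64 bytes, which
 * pretty-prints as "16 EiB".
 */
static uint64_t remaining_unchecked(uint64_t cached, uint64_t sent)
{
    return cached - sent;               /* wraps when sent > cached */
}

/* The safe variant: clamp to zero instead of wrapping. */
static uint64_t remaining_clamped(uint64_t cached, uint64_t sent)
{
    return sent > cached ? 0 : cached - sent;
}
```

Patch 4 in this series applies the same kind of clamp when updating the
cached stopcopy_size, which is why the driver-reported sizes must be
treated as estimates that "may either grow or even shrink".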

For detailed testing steps, please refer to v1's cover letter.

Peter Xu (16):
  qemu-iotests: Add query-migrate test for dirty-bitmap
  migration: Fix low possibility downtime violation
  migration/qapi: Rename MigrationStats to MigrationRAMStats
  vfio/migration: Cache stop size in VFIOMigration
  migration/treewide: Merge @state_pending_{exact|estimate} APIs
  migration: Use the new save_query_pending() API directly
  migration: Introduce stopcopy_bytes in save_query_pending()
  vfio/migration: Fix incorrect reporting for VFIO pending data
  migration: Move iteration counter out of RAM
  migration: Introduce a helper to return switchover bw estimate
  migration: Calculate expected downtime on demand
  migration: Fix calculation of expected_downtime to take VFIO info
  migration: Remember total dirty bytes in mig_stats
  migration/qapi: Introduce system-wise "remaining" reports
  migration/qapi: Update unit for avail-switchover-bandwidth
  vfio/migration: Add tracepoints for precopy/stopcopy query ioctls

 docs/about/removed-features.rst               |   2 +-
 docs/devel/migration/main.rst                 |   9 +-
 docs/devel/migration/vfio.rst                 |   9 +-
 qapi/migration.json                           |  32 ++--
 hw/vfio/vfio-migration-internal.h             |   8 +
 include/migration/register.h                  |  59 +++---
 migration/migration-stats.h                   |  20 +-
 migration/migration.h                         |   2 +-
 migration/savevm.h                            |   7 +-
 hw/s390x/s390-stattrib.c                      |   9 +-
 hw/vfio/migration.c                           | 123 +++++++-----
 migration/block-dirty-bitmap.c                |  10 +-
 migration/migration-hmp-cmds.c                |   5 +
 migration/migration.c                         | 177 +++++++++++++-----
 migration/ram.c                               |  40 +---
 migration/savevm.c                            |  42 ++---
 hw/vfio/trace-events                          |   5 +-
 migration/trace-events                        |   3 +-
 .../tests/migrate-bitmaps-postcopy-test       |   6 +
 19 files changed, 322 insertions(+), 246 deletions(-)

-- 
2.53.0



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 01/16] qemu-iotests: Add query-migrate test for dirty-bitmap
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
@ 2026-04-21 20:20 ` Peter Xu
  2026-04-22  8:08   ` Vladimir Sementsov-Ogievskiy
  2026-04-21 20:20 ` [PATCH v2 02/16] migration: Fix low possibility downtime violation Peter Xu
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin,
	Vladimir Sementsov-Ogievskiy, Eric Blake

This helped me identify a hang issue with a recent change in migration.
Add it to the test suite.

Cc: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test b/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test
index c519e6db8c..67d69d9d1e 100755
--- a/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test
+++ b/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test
@@ -162,8 +162,14 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
         self.vm_a.cmd('migrate', uri='exec:cat>' + fifo)
 
+        # Verify query-migrate working with dirty-bitmaps in precopy mode
+        self.vm_a.qmp('query-migrate')
+
         self.vm_a.cmd('migrate-start-postcopy')
 
+        # Verify query-migrate working with dirty-bitmaps in postcopy mode
+        self.vm_a.qmp('query-migrate')
+
         event_resume = self.vm_b.event_wait('RESUME')
         self.vm_b_events.append(event_resume)
         return (event_resume, discards1_sha256, all_discards_sha256)
-- 
2.53.0




* [PATCH v2 02/16] migration: Fix low possibility downtime violation
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
  2026-04-21 20:20 ` [PATCH v2 01/16] qemu-iotests: Add query-migrate test for dirty-bitmap Peter Xu
@ 2026-04-21 20:20 ` Peter Xu
  2026-04-21 20:20 ` [PATCH v2 03/16] migration/qapi: Rename MigrationStats to MigrationRAMStats Peter Xu
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin, qemu-stable

When QEMU queries the estimated version of the pending data and thinks it's
ready to converge, it sends another, accurate query to make sure of it.
This is needed to collect the latest reports and verify that the equation
still holds.

However, we missed one tiny difference here between "<" and "<=" when
comparing pending_size (A) to threshold_size (B).

The QEMU source only re-queries if A<B, but will kick off switchover if
A<=B.

This means that if A (so far only an estimate) happens to equal B, the
re-query won't happen and switchover will proceed without considering newly
dirtied data.

It turns out this was an accident in my commit 7aaa1fc072 when refactoring
the code.  Fix it by using the same comparison in both places.

Fixes: 7aaa1fc072 ("migration: Rewrite the migration complete detect logic")
Cc: qemu-stable@nongnu.org
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 5c9aaa6e58..dfc60372cf 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3242,7 +3242,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
          * postcopy started, so ESTIMATE should always match with EXACT
          * during postcopy phase.
          */
-        if (pending_size < s->threshold_size) {
+        if (pending_size <= s->threshold_size) {
             qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
             pending_size = must_precopy + can_postcopy;
             trace_migrate_pending_exact(pending_size, must_precopy,
-- 
2.53.0




* [PATCH v2 03/16] migration/qapi: Rename MigrationStats to MigrationRAMStats
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
  2026-04-21 20:20 ` [PATCH v2 01/16] qemu-iotests: Add query-migrate test for dirty-bitmap Peter Xu
  2026-04-21 20:20 ` [PATCH v2 02/16] migration: Fix low possibility downtime violation Peter Xu
@ 2026-04-21 20:20 ` Peter Xu
  2026-04-24  9:03   ` Markus Armbruster
  2026-04-21 20:20 ` [PATCH v2 04/16] vfio/migration: Cache stop size in VFIOMigration Peter Xu
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin, devel,
	Michal Privoznik

These stats are only about RAM; make the name say so.  This paves the way
for statistics covering all devices.

Thanks to Markus, who pointed out that docs/devel/qapi-code-gen.rst has a
section "Compatibility considerations" which states:

    Since type names are not visible in the Client JSON Protocol, types
    may be freely renamed.  Even certain refactorings are invisible, such
    as splitting members from one type into a common base type.

Hence this change is not an ABI violation according to the document.

While at it, touch up the lines to make them read better, and correct the
restriction on migration status being 'active' or 'completed': over time we
have grown too many new statuses that also report the "ram" section.

Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: devel@lists.libvirt.org
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 docs/about/removed-features.rst |  2 +-
 qapi/migration.json             | 10 +++++-----
 migration/migration-stats.h     |  2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index e75db08410..626162022a 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -699,7 +699,7 @@ was superseded by ``sections``.
 ``query-migrate`` return value member ``skipped`` (removed in 9.1)
 ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
 
-Member ``skipped`` of the ``MigrationStats`` struct hasn't been used
+Member ``skipped`` of the ``MigrationRAMStats`` struct hasn't been used
 for more than 10 years. Removed with no replacement.
 
 ``migrate`` command option ``inc`` (removed in 9.1)
diff --git a/qapi/migration.json b/qapi/migration.json
index 7134d4ce47..e3ad3f0604 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -12,7 +12,7 @@
 { 'include': 'sockets.json' }
 
 ##
-# @MigrationStats:
+# @MigrationRAMStats:
 #
 # Detailed migration status.
 #
@@ -64,7 +64,7 @@
 #
 # Since: 0.14
 ##
-{ 'struct': 'MigrationStats',
+{ 'struct': 'MigrationRAMStats',
   'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' ,
            'duplicate': 'int',
            'normal': 'int',
@@ -209,8 +209,8 @@
 #     If this field is not returned, no migration process has been
 #     initiated
 #
-# @ram: `MigrationStats` containing detailed migration status, only
-#     returned if status is 'active' or 'completed'(since 1.2)
+# @ram: Detailed migration RAM statistics, only returned if migration
+#     is in progress or completed (since 1.2)
 #
 # @xbzrle-cache: `XBZRLECacheStats` containing detailed XBZRLE
 #     migration statistics, only returned if XBZRLE feature is on and
@@ -309,7 +309,7 @@
 # Since: 0.14
 ##
 { 'struct': 'MigrationInfo',
-  'data': {'*status': 'MigrationStatus', '*ram': 'MigrationStats',
+  'data': {'*status': 'MigrationStatus', '*ram': 'MigrationRAMStats',
            '*vfio': 'VfioStats',
            '*xbzrle-cache': 'XBZRLECacheStats',
            '*total-time': 'int',
diff --git a/migration/migration-stats.h b/migration/migration-stats.h
index c0f50144c9..1153520f7a 100644
--- a/migration/migration-stats.h
+++ b/migration/migration-stats.h
@@ -27,7 +27,7 @@
 
 /*
  * These are the ram migration statistic counters.  It is loosely
- * based on MigrationStats.
+ * based on MigrationRAMStats.
  */
 typedef struct {
     /*
-- 
2.53.0




* [PATCH v2 04/16] vfio/migration: Cache stop size in VFIOMigration
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
                   ` (2 preceding siblings ...)
  2026-04-21 20:20 ` [PATCH v2 03/16] migration/qapi: Rename MigrationStats to MigrationRAMStats Peter Xu
@ 2026-04-21 20:20 ` Peter Xu
  2026-04-21 20:20 ` [PATCH v2 05/16] migration/treewide: Merge @state_pending_{exact|estimate} APIs Peter Xu
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin

Add a field to cache the stop-copy size.  Note that there's an initial
value change in vfio_save_setup for the stop size default, but it shouldn't
matter since it is followed by a MIN() against
VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE.

Document that all three sizes we read from VFIO's uAPI (for dirty or
stop-copy sizes) are estimates, so QEMU always needs to remember they can
be anything.

Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/vfio/vfio-migration-internal.h |  8 +++++
 hw/vfio/migration.c               | 50 ++++++++++++++++++-------------
 2 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/hw/vfio/vfio-migration-internal.h b/hw/vfio/vfio-migration-internal.h
index 814fbd9eba..a15fc74703 100644
--- a/hw/vfio/vfio-migration-internal.h
+++ b/hw/vfio/vfio-migration-internal.h
@@ -45,8 +45,16 @@ typedef struct VFIOMigration {
     void *data_buffer;
     size_t data_buffer_size;
     uint64_t mig_flags;
+    /*
+     * NOTE: all three sizes cached are reported from VFIO's uAPI, which
+     * are defined as estimate only.  QEMU should not trust these values
+     * but only use them to do best-effort estimates.  Always be prepared
+     * that these sizes may either grow or even shrink in reality while
+     * read()ing from the VFIO fds.
+     */
     uint64_t precopy_init_size;
     uint64_t precopy_dirty_size;
+    uint64_t stopcopy_size;
     bool multifd_transfer;
     VFIOMultifd *multifd;
     bool initial_data_sent;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 83327b6573..5d5fca09bd 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -41,6 +41,12 @@
  */
 #define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB)
 
+/*
+ * Migration size of VFIO devices can be as little as a few KBs or as big as
+ * many GBs. This value should be big enough to cover the worst case.
+ */
+#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
+
 static unsigned long bytes_transferred;
 
 static const char *mig_state_to_str(enum vfio_device_mig_state state)
@@ -314,8 +320,7 @@ static void vfio_migration_cleanup(VFIODevice *vbasedev)
     migration->data_fd = -1;
 }
 
-static int vfio_query_stop_copy_size(VFIODevice *vbasedev,
-                                     uint64_t *stop_copy_size)
+static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
 {
     uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
                               sizeof(struct vfio_device_feature_mig_data_size),
@@ -323,16 +328,22 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev,
     struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
     struct vfio_device_feature_mig_data_size *mig_data_size =
         (struct vfio_device_feature_mig_data_size *)feature->data;
+    VFIOMigration *migration = vbasedev->migration;
 
     feature->argsz = sizeof(buf);
     feature->flags =
         VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIG_DATA_SIZE;
 
     if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
+        /*
+         * If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE
+         * is reported so downtime limit won't be violated.
+         */
+        migration->stopcopy_size = VFIO_MIG_STOP_COPY_SIZE;
         return -errno;
     }
 
-    *stop_copy_size = mig_data_size->stop_copy_length;
+    migration->stopcopy_size = mig_data_size->stop_copy_length;
 
     return 0;
 }
@@ -409,6 +420,16 @@ static void vfio_update_estimated_pending_data(VFIOMigration *migration,
         return;
     }
 
+    /*
+     * The total size remaining requires separate accounting.  Do not trust
+     * the counter, so what we have read() may be more than what reported.
+     */
+    if (migration->stopcopy_size > data_size) {
+        migration->stopcopy_size -= data_size;
+    } else {
+        migration->stopcopy_size = 0;
+    }
+
     if (migration->precopy_init_size) {
         uint64_t init_size = MIN(migration->precopy_init_size, data_size);
 
@@ -463,7 +484,6 @@ static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
-    uint64_t stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE;
     int ret;
 
     if (!vfio_multifd_setup(vbasedev, false, errp)) {
@@ -472,9 +492,9 @@ static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp)
 
     qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
 
-    vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
+    vfio_query_stop_copy_size(vbasedev);
     migration->data_buffer_size = MIN(VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE,
-                                      stop_copy_size);
+                                      migration->stopcopy_size);
     migration->data_buffer = g_try_malloc0(migration->data_buffer_size);
     if (!migration->data_buffer) {
         error_setg(errp, "%s: Failed to allocate migration data buffer",
@@ -570,32 +590,22 @@ static void vfio_state_pending_estimate(void *opaque, uint64_t *must_precopy,
                                       migration->precopy_dirty_size);
 }
 
-/*
- * Migration size of VFIO devices can be as little as a few KBs or as big as
- * many GBs. This value should be big enough to cover the worst case.
- */
-#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
-
 static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy,
                                      uint64_t *can_postcopy)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
-    uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
 
-    /*
-     * If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE is
-     * reported so downtime limit won't be violated.
-     */
-    vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
-    *must_precopy += stop_copy_size;
+    vfio_query_stop_copy_size(vbasedev);
+    *must_precopy += migration->stopcopy_size;
 
     if (vfio_device_state_is_precopy(vbasedev)) {
         vfio_query_precopy_size(migration);
     }
 
     trace_vfio_state_pending_exact(vbasedev->name, *must_precopy, *can_postcopy,
-                                   stop_copy_size, migration->precopy_init_size,
+                                   migration->stopcopy_size,
+                                   migration->precopy_init_size,
                                    migration->precopy_dirty_size);
 }
 
-- 
2.53.0




* [PATCH v2 05/16] migration/treewide: Merge @state_pending_{exact|estimate} APIs
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
                   ` (3 preceding siblings ...)
  2026-04-21 20:20 ` [PATCH v2 04/16] vfio/migration: Cache stop size in VFIOMigration Peter Xu
@ 2026-04-21 20:20 ` Peter Xu
  2026-04-22  8:23   ` Vladimir Sementsov-Ogievskiy
  2026-04-22  8:29   ` Vladimir Sementsov-Ogievskiy
  2026-04-21 20:21 ` [PATCH v2 06/16] migration: Use the new save_query_pending() API directly Peter Xu
                   ` (11 subsequent siblings)
  16 siblings, 2 replies; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin, Halil Pasic,
	Christian Borntraeger, Eric Farman, Matthew Rosato,
	Richard Henderson, Ilya Leoshkevich, David Hildenbrand,
	Cornelia Huck, Eric Blake, Vladimir Sementsov-Ogievskiy,
	John Snow, Jason J. Herne

These two APIs are largely duplicated.  For example, a few users directly
pass the same function in for both hooks.

Providing two hooks is also error prone, in that it makes it easier for one
module to report different things via the two hooks.

In reality, they should always report the same thing; the only difference
is whether we should use a fast path when the slow path might be too slow,
as QEMU may query this information quite frequently during the migration
process.

Merge them into one API, with a bool showing whether the query is an exact
query or not.  No functional change intended.

Export qemu_savevm_query_pending().  The new API provided here should be
used when there are new users to do the query.  This will happen very soon.

Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Eric Farman <farman@linux.ibm.com>
Cc: Matthew Rosato <mjrosato@linux.ibm.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Cc: John Snow <jsnow@redhat.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 docs/devel/migration/main.rst  |  9 ++----
 docs/devel/migration/vfio.rst  |  9 ++----
 include/migration/register.h   | 52 ++++++++++++----------------------
 migration/savevm.h             |  3 ++
 hw/s390x/s390-stattrib.c       |  9 +++---
 hw/vfio/migration.c            | 48 ++++++++++++++-----------------
 migration/block-dirty-bitmap.c | 10 +++----
 migration/ram.c                | 33 +++++++--------------
 migration/savevm.c             | 42 +++++++++++++--------------
 hw/vfio/trace-events           |  3 +-
 10 files changed, 86 insertions(+), 132 deletions(-)

diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
index 234d280249..e6a6ca3681 100644
--- a/docs/devel/migration/main.rst
+++ b/docs/devel/migration/main.rst
@@ -515,13 +515,8 @@ An iterative device must provide:
   - A ``load_setup`` function that initialises the data structures on the
     destination.
 
-  - A ``state_pending_exact`` function that indicates how much more
-    data we must save.  The core migration code will use this to
-    determine when to pause the CPUs and complete the migration.
-
-  - A ``state_pending_estimate`` function that indicates how much more
-    data we must save.  When the estimated amount is smaller than the
-    threshold, we call ``state_pending_exact``.
+  - A ``save_query_pending`` function that indicates how much more
+    data we must save.
 
   - A ``save_live_iterate`` function should send a chunk of data until
     the point that stream bandwidth limits tell it to stop.  Each call
diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst
index 0790e5031d..691061d182 100644
--- a/docs/devel/migration/vfio.rst
+++ b/docs/devel/migration/vfio.rst
@@ -50,13 +50,8 @@ VFIO implements the device hooks for the iterative approach as follows:
 * A ``load_setup`` function that sets the VFIO device on the destination in
   _RESUMING state.
 
-* A ``state_pending_estimate`` function that reports an estimate of the
-  remaining pre-copy data that the vendor driver has yet to save for the VFIO
-  device.
-
-* A ``state_pending_exact`` function that reads pending_bytes from the vendor
-  driver, which indicates the amount of data that the vendor driver has yet to
-  save for the VFIO device.
+* A ``save_query_pending`` function that reports the remaining data that
+  the vendor driver has yet to save for the VFIO device.
 
 * An ``is_active_iterate`` function that indicates ``save_live_iterate`` is
   active only when the VFIO device is in pre-copy states.
diff --git a/include/migration/register.h b/include/migration/register.h
index d0f37f5f43..e2117e8dd4 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -16,6 +16,13 @@
 
 #include "hw/core/vmstate-if.h"
 
+typedef struct MigPendingData {
+    /* Amount of pending bytes can be transferred in precopy or stopcopy */
+    uint64_t precopy_bytes;
+    /* Amount of pending bytes can be transferred in postcopy */
+    uint64_t postcopy_bytes;
+} MigPendingData;
+
 /**
  * struct SaveVMHandlers: handler structure to finely control
  * migration of complex subsystems and devices, such as RAM, block and
@@ -197,46 +204,23 @@ typedef struct SaveVMHandlers {
     bool (*save_postcopy_prepare)(QEMUFile *f, void *opaque, Error **errp);
 
     /**
-     * @state_pending_estimate
-     *
-     * This estimates the remaining data to transfer
+     * @save_query_pending
      *
-     * Sum of @can_postcopy and @must_postcopy is the whole amount of
-     * pending data.
-     *
-     * @opaque: data pointer passed to register_savevm_live()
-     * @must_precopy: amount of data that must be migrated in precopy
-     *                or in stopped state, i.e. that must be migrated
-     *                before target start.
-     * @can_postcopy: amount of data that can be migrated in postcopy
-     *                or in stopped state, i.e. after target start.
-     *                Some can also be migrated during precopy (RAM).
-     *                Some must be migrated after source stops
-     *                (block-dirty-bitmap)
-     */
-    void (*state_pending_estimate)(void *opaque, uint64_t *must_precopy,
-                                   uint64_t *can_postcopy);
-
-    /**
-     * @state_pending_exact
+     * This estimates the remaining data to transfer on the source side.
      *
-     * This calculates the exact remaining data to transfer
+     * When @exact is true, a module must report accurate results.  When
+     * @exact is false, a module may report estimates.
      *
-     * Sum of @can_postcopy and @must_postcopy is the whole amount of
-     * pending data.
+     * It's highly recommended that modules implement a faster version of
+     * the query path (for example, by proper caching on the counters) if
+     * an accurate query will be time-consuming.
      *
      * @opaque: data pointer passed to register_savevm_live()
-     * @must_precopy: amount of data that must be migrated in precopy
-     *                or in stopped state, i.e. that must be migrated
-     *                before target start.
-     * @can_postcopy: amount of data that can be migrated in postcopy
-     *                or in stopped state, i.e. after target start.
-     *                Some can also be migrated during precopy (RAM).
-     *                Some must be migrated after source stops
-     *                (block-dirty-bitmap)
+     * @pending: pointer to a MigPendingData struct
+     * @exact: set to true for an accurate (slow) query
      */
-    void (*state_pending_exact)(void *opaque, uint64_t *must_precopy,
-                                uint64_t *can_postcopy);
+    void (*save_query_pending)(void *opaque, MigPendingData *pending,
+                               bool exact);
 
     /**
      * @load_state
diff --git a/migration/savevm.h b/migration/savevm.h
index b3d1e8a13c..e4efd243f3 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -14,6 +14,8 @@
 #ifndef MIGRATION_SAVEVM_H
 #define MIGRATION_SAVEVM_H
 
+#include "migration/register.h"
+
 #define QEMU_VM_FILE_MAGIC           0x5145564d
 #define QEMU_VM_FILE_VERSION_COMPAT  0x00000002
 #define QEMU_VM_FILE_VERSION         0x00000003
@@ -43,6 +45,7 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
 void qemu_savevm_state_cleanup(void);
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 int qemu_savevm_state_complete_precopy(MigrationState *s);
+void qemu_savevm_query_pending(MigPendingData *pending, bool exact);
 void qemu_savevm_state_pending_exact(uint64_t *must_precopy,
                                      uint64_t *can_postcopy);
 void qemu_savevm_state_pending_estimate(uint64_t *must_precopy,
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index d808ece3b9..a22469a9e9 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -187,15 +187,15 @@ static int cmma_save_setup(QEMUFile *f, void *opaque, Error **errp)
     return 0;
 }
 
-static void cmma_state_pending(void *opaque, uint64_t *must_precopy,
-                               uint64_t *can_postcopy)
+static void cmma_state_pending(void *opaque, MigPendingData *pending,
+                               bool exact)
 {
     S390StAttribState *sas = S390_STATTRIB(opaque);
     S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
     long long res = sac->get_dirtycount(sas);
 
     if (res >= 0) {
-        *must_precopy += res;
+        pending->precopy_bytes += res;
     }
 }
 
@@ -340,8 +340,7 @@ static SaveVMHandlers savevm_s390_stattrib_handlers = {
     .save_setup = cmma_save_setup,
     .save_live_iterate = cmma_save_iterate,
     .save_complete = cmma_save_complete,
-    .state_pending_exact = cmma_state_pending,
-    .state_pending_estimate = cmma_state_pending,
+    .save_query_pending = cmma_state_pending,
     .save_cleanup = cmma_save_cleanup,
     .load_state = cmma_load,
     .is_active = cmma_active,
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 5d5fca09bd..e965ba51fb 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -571,42 +571,39 @@ static void vfio_save_cleanup(void *opaque)
     trace_vfio_save_cleanup(vbasedev->name);
 }
 
-static void vfio_state_pending_estimate(void *opaque, uint64_t *must_precopy,
-                                        uint64_t *can_postcopy)
+static void vfio_state_pending_sync(VFIODevice *vbasedev)
 {
-    VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
 
-    if (!vfio_device_state_is_precopy(vbasedev)) {
-        return;
-    }
-
-    *must_precopy +=
-        migration->precopy_init_size + migration->precopy_dirty_size;
+    vfio_query_stop_copy_size(vbasedev);
 
-    trace_vfio_state_pending_estimate(vbasedev->name, *must_precopy,
-                                      *can_postcopy,
-                                      migration->precopy_init_size,
-                                      migration->precopy_dirty_size);
+    if (vfio_device_state_is_precopy(vbasedev)) {
+        vfio_query_precopy_size(migration);
+    }
 }
 
-static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy,
-                                     uint64_t *can_postcopy)
+static void vfio_state_pending(void *opaque, MigPendingData *pending,
+                               bool exact)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
+    uint64_t remain;
 
-    vfio_query_stop_copy_size(vbasedev);
-    *must_precopy += migration->stopcopy_size;
-
-    if (vfio_device_state_is_precopy(vbasedev)) {
-        vfio_query_precopy_size(migration);
+    if (exact) {
+        vfio_state_pending_sync(vbasedev);
+        remain = migration->stopcopy_size;
+    } else {
+        if (!vfio_device_state_is_precopy(vbasedev)) {
+            return;
+        }
+        remain = migration->precopy_init_size + migration->precopy_dirty_size;
     }
 
-    trace_vfio_state_pending_exact(vbasedev->name, *must_precopy, *can_postcopy,
-                                   migration->stopcopy_size,
-                                   migration->precopy_init_size,
-                                   migration->precopy_dirty_size);
+    pending->precopy_bytes += remain;
+
+    trace_vfio_state_pending(vbasedev->name, migration->stopcopy_size,
+                             migration->precopy_init_size,
+                             migration->precopy_dirty_size, exact);
 }
 
 static bool vfio_is_active_iterate(void *opaque)
@@ -851,8 +848,7 @@ static const SaveVMHandlers savevm_vfio_handlers = {
     .save_prepare = vfio_save_prepare,
     .save_setup = vfio_save_setup,
     .save_cleanup = vfio_save_cleanup,
-    .state_pending_estimate = vfio_state_pending_estimate,
-    .state_pending_exact = vfio_state_pending_exact,
+    .save_query_pending = vfio_state_pending,
     .is_active_iterate = vfio_is_active_iterate,
     .save_live_iterate = vfio_save_iterate,
     .save_complete = vfio_save_complete_precopy,
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index a061aad817..15d417013c 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -766,9 +766,8 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void dirty_bitmap_state_pending(void *opaque,
-                                       uint64_t *must_precopy,
-                                       uint64_t *can_postcopy)
+static void dirty_bitmap_state_pending(void *opaque, MigPendingData *data,
+                                       bool exact)
 {
     DBMSaveState *s = &((DBMState *)opaque)->save;
     SaveBitmapState *dbms;
@@ -788,7 +787,7 @@ static void dirty_bitmap_state_pending(void *opaque,
 
     trace_dirty_bitmap_state_pending(pending);
 
-    *can_postcopy += pending;
+    data->postcopy_bytes += pending;
 }
 
 /* First occurrence of this bitmap. It should be created if doesn't exist */
@@ -1250,8 +1249,7 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
     .save_setup = dirty_bitmap_save_setup,
     .save_complete = dirty_bitmap_save_complete,
     .has_postcopy = dirty_bitmap_has_postcopy,
-    .state_pending_exact = dirty_bitmap_state_pending,
-    .state_pending_estimate = dirty_bitmap_state_pending,
+    .save_query_pending = dirty_bitmap_state_pending,
     .save_live_iterate = dirty_bitmap_save_iterate,
     .is_active_iterate = dirty_bitmap_is_active_iterate,
     .load_state = dirty_bitmap_load,
diff --git a/migration/ram.c b/migration/ram.c
index 979751f61b..e5b7217bf5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3443,30 +3443,18 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     return qemu_fflush(f);
 }
 
-static void ram_state_pending_estimate(void *opaque, uint64_t *must_precopy,
-                                       uint64_t *can_postcopy)
-{
-    RAMState **temp = opaque;
-    RAMState *rs = *temp;
-
-    uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
-
-    if (migrate_postcopy_ram()) {
-        /* We can do postcopy, and all the data is postcopiable */
-        *can_postcopy += remaining_size;
-    } else {
-        *must_precopy += remaining_size;
-    }
-}
-
-static void ram_state_pending_exact(void *opaque, uint64_t *must_precopy,
-                                    uint64_t *can_postcopy)
+static void ram_state_pending(void *opaque, MigPendingData *pending,
+                              bool exact)
 {
     RAMState **temp = opaque;
     RAMState *rs = *temp;
     uint64_t remaining_size;
 
-    if (!migration_in_postcopy()) {
+    /*
+     * Sync is not needed for: (1) a fast query, or (2) after postcopy
+     * has started (no new dirty pages will be generated anymore).
+     */
+    if (exact && !migration_in_postcopy()) {
         bql_lock();
         WITH_RCU_READ_LOCK_GUARD() {
             migration_bitmap_sync_precopy(false);
@@ -3478,9 +3466,9 @@ static void ram_state_pending_exact(void *opaque, uint64_t *must_precopy,
 
     if (migrate_postcopy_ram()) {
         /* We can do postcopy, and all the data is postcopiable */
-        *can_postcopy += remaining_size;
+        pending->postcopy_bytes += remaining_size;
     } else {
-        *must_precopy += remaining_size;
+        pending->precopy_bytes += remaining_size;
     }
 }
 
@@ -4703,8 +4691,7 @@ static SaveVMHandlers savevm_ram_handlers = {
     .save_live_iterate = ram_save_iterate,
     .save_complete = ram_save_complete,
     .has_postcopy = ram_has_postcopy,
-    .state_pending_exact = ram_state_pending_exact,
-    .state_pending_estimate = ram_state_pending_estimate,
+    .save_query_pending = ram_state_pending,
     .load_state = ram_load,
     .save_cleanup = ram_save_cleanup,
     .load_setup = ram_load_setup,
diff --git a/migration/savevm.c b/migration/savevm.c
index dd58f2a705..392d840955 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1762,46 +1762,44 @@ int qemu_savevm_state_complete_precopy(MigrationState *s)
     return qemu_fflush(f);
 }
 
-/* Give an estimate of the amount left to be transferred,
- * the result is split into the amount for units that can and
- * for units that can't do postcopy.
- */
-void qemu_savevm_state_pending_estimate(uint64_t *must_precopy,
-                                        uint64_t *can_postcopy)
+void qemu_savevm_query_pending(MigPendingData *pending, bool exact)
 {
     SaveStateEntry *se;
 
-    *must_precopy = 0;
-    *can_postcopy = 0;
+    pending->precopy_bytes = 0;
+    pending->postcopy_bytes = 0;
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-        if (!se->ops || !se->ops->state_pending_estimate) {
+        if (!se->ops || !se->ops->save_query_pending) {
             continue;
         }
         if (!qemu_savevm_state_active(se)) {
             continue;
         }
-        se->ops->state_pending_estimate(se->opaque, must_precopy, can_postcopy);
+        se->ops->save_query_pending(se->opaque, pending, exact);
     }
 }
 
+void qemu_savevm_state_pending_estimate(uint64_t *must_precopy,
+                                        uint64_t *can_postcopy)
+{
+    MigPendingData pending;
+
+    qemu_savevm_query_pending(&pending, false);
+
+    *must_precopy = pending.precopy_bytes;
+    *can_postcopy = pending.postcopy_bytes;
+}
+
 void qemu_savevm_state_pending_exact(uint64_t *must_precopy,
                                      uint64_t *can_postcopy)
 {
-    SaveStateEntry *se;
+    MigPendingData pending;
 
-    *must_precopy = 0;
-    *can_postcopy = 0;
+    qemu_savevm_query_pending(&pending, true);
 
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-        if (!se->ops || !se->ops->state_pending_exact) {
-            continue;
-        }
-        if (!qemu_savevm_state_active(se)) {
-            continue;
-        }
-        se->ops->state_pending_exact(se->opaque, must_precopy, can_postcopy);
-    }
+    *must_precopy = pending.precopy_bytes;
+    *can_postcopy = pending.postcopy_bytes;
 }
 
 void qemu_savevm_state_cleanup(void)
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 846e3625c5..287df0b8cb 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -173,8 +173,7 @@ vfio_save_device_config_state(const char *name) " (%s)"
 vfio_save_iterate(const char *name, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy initial size %"PRIu64" precopy dirty size %"PRIu64
 vfio_save_iterate_start(const char *name) " (%s)"
 vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size %"PRIu64
-vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy %"PRIu64" postcopy %"PRIu64" precopy initial size %"PRIu64" precopy dirty size %"PRIu64
-vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy %"PRIu64" postcopy %"PRIu64" stopcopy size %"PRIu64" precopy initial size %"PRIu64" precopy dirty size %"PRIu64
+vfio_state_pending(const char *name, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size, bool exact) " (%s) stopcopy size %"PRIu64" precopy initial size %"PRIu64" precopy dirty size %"PRIu64" exact %d"
 vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
 vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
 
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 06/16] migration: Use the new save_query_pending() API directly
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
                   ` (4 preceding siblings ...)
  2026-04-21 20:20 ` [PATCH v2 05/16] migration/treewide: Merge @state_pending_{exact|estimate} APIs Peter Xu
@ 2026-04-21 20:21 ` Peter Xu
  2026-04-21 20:21 ` [PATCH v2 07/16] migration: Introduce stopcopy_bytes in save_query_pending() Peter Xu
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin

It's easier to use the new API directly in the migration iteration code.
This also paves the way for follow-up patches to report new data directly
to the iterator function.

While at it, merge the two tracepoints into one.

No functional change intended.

Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.h     |  4 ----
 migration/migration.c  | 16 +++++++---------
 migration/savevm.c     | 23 ++---------------------
 migration/trace-events |  3 +--
 4 files changed, 10 insertions(+), 36 deletions(-)

diff --git a/migration/savevm.h b/migration/savevm.h
index e4efd243f3..96fdf96d4e 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -46,10 +46,6 @@ void qemu_savevm_state_cleanup(void);
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 int qemu_savevm_state_complete_precopy(MigrationState *s);
 void qemu_savevm_query_pending(MigPendingData *pending, bool exact);
-void qemu_savevm_state_pending_exact(uint64_t *must_precopy,
-                                     uint64_t *can_postcopy);
-void qemu_savevm_state_pending_estimate(uint64_t *must_precopy,
-                                        uint64_t *can_postcopy);
 int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy);
 bool qemu_savevm_state_postcopy_prepare(QEMUFile *f, Error **errp);
 void qemu_savevm_state_end(QEMUFile *f);
diff --git a/migration/migration.c b/migration/migration.c
index dfc60372cf..68cfe2d3bf 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3204,17 +3204,17 @@ typedef enum {
  */
 static MigIterateState migration_iteration_run(MigrationState *s)
 {
-    uint64_t must_precopy, can_postcopy, pending_size;
     Error *local_err = NULL;
     bool in_postcopy = (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
                         s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
     bool can_switchover = migration_can_switchover(s);
+    MigPendingData pending = { };
+    uint64_t pending_size;
     bool complete_ready;
 
     /* Fast path - get the estimated amount of pending data */
-    qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy);
-    pending_size = must_precopy + can_postcopy;
-    trace_migrate_pending_estimate(pending_size, must_precopy, can_postcopy);
+    qemu_savevm_query_pending(&pending, false);
+    pending_size = pending.precopy_bytes + pending.postcopy_bytes;
 
     if (in_postcopy) {
         /*
@@ -3243,14 +3243,12 @@ static MigIterateState migration_iteration_run(MigrationState *s)
          * during postcopy phase.
          */
         if (pending_size <= s->threshold_size) {
-            qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
-            pending_size = must_precopy + can_postcopy;
-            trace_migrate_pending_exact(pending_size, must_precopy,
-                                        can_postcopy);
+            qemu_savevm_query_pending(&pending, true);
+            pending_size = pending.precopy_bytes + pending.postcopy_bytes;
         }
 
         /* Should we switch to postcopy now? */
-        if (must_precopy <= s->threshold_size &&
+        if (pending.precopy_bytes <= s->threshold_size &&
             can_switchover && qatomic_read(&s->start_postcopy)) {
             if (postcopy_start(s, &local_err)) {
                 migrate_error_propagate(s, error_copy(local_err));
diff --git a/migration/savevm.c b/migration/savevm.c
index 392d840955..397f602257 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1778,28 +1778,9 @@ void qemu_savevm_query_pending(MigPendingData *pending, bool exact)
         }
         se->ops->save_query_pending(se->opaque, pending, exact);
     }
-}
-
-void qemu_savevm_state_pending_estimate(uint64_t *must_precopy,
-                                        uint64_t *can_postcopy)
-{
-    MigPendingData pending;
-
-    qemu_savevm_query_pending(&pending, false);
-
-    *must_precopy = pending.precopy_bytes;
-    *can_postcopy = pending.postcopy_bytes;
-}
-
-void qemu_savevm_state_pending_exact(uint64_t *must_precopy,
-                                     uint64_t *can_postcopy)
-{
-    MigPendingData pending;
-
-    qemu_savevm_query_pending(&pending, true);
 
-    *must_precopy = pending.precopy_bytes;
-    *can_postcopy = pending.postcopy_bytes;
+    trace_qemu_savevm_query_pending(exact, pending->precopy_bytes,
+                                    pending->postcopy_bytes);
 }
 
 void qemu_savevm_state_cleanup(void)
diff --git a/migration/trace-events b/migration/trace-events
index 60e5087e38..f8995b8d0d 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -7,6 +7,7 @@ qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
 qemu_loadvm_state_post_main(int ret) "%d"
 qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
 qemu_savevm_send_packaged(void) ""
+qemu_savevm_query_pending(bool exact, uint64_t precopy, uint64_t postcopy) "exact=%d, precopy=%"PRIu64", postcopy=%"PRIu64
 loadvm_state_switchover_ack_needed(unsigned int switchover_ack_pending_num) "Switchover ack pending num=%u"
 loadvm_state_setup(void) ""
 loadvm_state_cleanup(void) ""
@@ -159,8 +160,6 @@ migration_cleanup(void) ""
 migrate_error(const char *error_desc) "error=%s"
 migration_cancel(void) ""
 migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
-migrate_pending_exact(uint64_t size, uint64_t pre, uint64_t post) "exact pending size %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
-migrate_pending_estimate(uint64_t size, uint64_t pre, uint64_t post) "estimate pending size %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
 migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
 migration_completion_file_err(void) ""
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 07/16] migration: Introduce stopcopy_bytes in save_query_pending()
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
                   ` (5 preceding siblings ...)
  2026-04-21 20:21 ` [PATCH v2 06/16] migration: Use the new save_query_pending() API directly Peter Xu
@ 2026-04-21 20:21 ` Peter Xu
  2026-04-22 13:16   ` Juraj Marcin
  2026-04-23 15:05   ` Avihai Horon
  2026-04-21 20:21 ` [PATCH v2 08/16] vfio/migration: Fix incorrect reporting for VFIO pending data Peter Xu
                   ` (9 subsequent siblings)
  16 siblings, 2 replies; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin

Allow modules to report data that can only be migrated after the VM is
stopped.

With this concept introduced, we need to keep accounting the stopcopy size
as part of pending_size, as before.

However, when there is data that can only be migrated in the stopcopy
phase, the old "pending_size" may never drop low enough to kick off the
slow version of the pending query.

That used to be almost guaranteed to happen, as none of the prior iterative
modules had stop-only data.  VFIO may change that fact by having data that
must be copied during the stop phase.

So we need to make sure QEMU kicks off a synchronized version of the
pending query once all precopy data is migrated.  This is important for
VFIO to keep making progress even when the downtime requirement cannot yet
be satisfied.

So far, this patch should introduce no functional change, as no module
reports a stopcopy size yet.

This paves the way for VFIO to properly report its pending data sizes,
which will start to include stop-only data.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/migration/register.h |  7 ++++
 migration/migration.c        | 65 ++++++++++++++++++++++++++++++------
 migration/savevm.c           | 10 ++++--
 migration/trace-events       |  2 +-
 4 files changed, 70 insertions(+), 14 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index e2117e8dd4..5e5e0ee432 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -21,6 +21,13 @@ typedef struct MigPendingData {
     uint64_t precopy_bytes;
+    /* Amount of pending bytes that can be transferred in postcopy */
     uint64_t postcopy_bytes;
+    /* Amount of pending bytes that can only be transferred in stopcopy */
+    uint64_t stopcopy_bytes;
+    /*
+     * Total pending data.  Modules do not need to update this field; it
+     * is calculated automatically by the migration core.
+     */
+    uint64_t total_bytes;
 } MigPendingData;
 
 /**
diff --git a/migration/migration.c b/migration/migration.c
index 68cfe2d3bf..4b54fda4d7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3198,6 +3198,54 @@ typedef enum {
     MIG_ITERATE_BREAK,          /* Break the loop */
 } MigIterateState;
 
+/* Are we ready to move to the next iteration phase? */
+static bool migration_iteration_next_ready(MigrationState *s,
+                                           MigPendingData *pending)
+{
+    /*
+     * If the estimated values already suggest us to switchover, mark this
+     * iteration finished, time to do a slow sync.
+     */
+    if (pending->total_bytes <= s->threshold_size) {
+        return true;
+    }
+
+    /*
+     * Since we may have modules reporting stop-only data, we also want to
+     * re-query with slow mode if all precopy data is moved over.  This
+     * will also mark the current iteration done.
+     *
+     * This could happen when e.g. a module (like, VFIO) reports stopcopy
+     * size too large so it will never yet satisfy the downtime with the
+     * current setup (above check).  Here, slow version of re-query helps
+     * because we keep trying the best to move whatever we have.
+     */
+    if (pending->precopy_bytes == 0) {
+        return true;
+    }
+
+    return false;
+}
+
+static void migration_iteration_go_next(MigPendingData *pending)
+{
+    /*
+     * Doing a slow sync will achieve this.  TODO: move RAM iteration code
+     * into the core layer.
+     */
+    qemu_savevm_query_pending(pending, true);
+}
+
+static bool postcopy_should_start(MigrationState *s, MigPendingData *pending)
+{
+    /* If postcopy switchover would violate the user-specified downtime, stop */
+    if (pending->precopy_bytes + pending->stopcopy_bytes > s->threshold_size) {
+        return false;
+    }
+
+    return qatomic_read(&s->start_postcopy);
+}
+
 /*
  * Return true if continue to the next iteration directly, false
  * otherwise.
@@ -3209,12 +3257,10 @@ static MigIterateState migration_iteration_run(MigrationState *s)
                         s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
     bool can_switchover = migration_can_switchover(s);
     MigPendingData pending = { };
-    uint64_t pending_size;
     bool complete_ready;
 
     /* Fast path - get the estimated amount of pending data */
     qemu_savevm_query_pending(&pending, false);
-    pending_size = pending.precopy_bytes + pending.postcopy_bytes;
 
     if (in_postcopy) {
         /*
@@ -3222,7 +3268,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
          * postcopy completion doesn't rely on can_switchover, because when
          * POSTCOPY_ACTIVE it means switchover already happened.
          */
-        complete_ready = !pending_size;
+        complete_ready = !pending.total_bytes;
         if (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE &&
             (s->postcopy_package_loaded || complete_ready)) {
             /*
@@ -3242,14 +3288,12 @@ static MigIterateState migration_iteration_run(MigrationState *s)
          * postcopy started, so ESTIMATE should always match with EXACT
          * during postcopy phase.
          */
-        if (pending_size <= s->threshold_size) {
-            qemu_savevm_query_pending(&pending, true);
-            pending_size = pending.precopy_bytes + pending.postcopy_bytes;
+        if (migration_iteration_next_ready(s, &pending)) {
+            migration_iteration_go_next(&pending);
         }
 
         /* Should we switch to postcopy now? */
-        if (pending.precopy_bytes <= s->threshold_size &&
-            can_switchover && qatomic_read(&s->start_postcopy)) {
+        if (can_switchover && postcopy_should_start(s, &pending)) {
             if (postcopy_start(s, &local_err)) {
                 migrate_error_propagate(s, error_copy(local_err));
                 error_report_err(local_err);
@@ -3264,11 +3308,12 @@ static MigIterateState migration_iteration_run(MigrationState *s)
          * (2) Pending size is no more than the threshold specified
          *     (which was calculated from expected downtime)
          */
-        complete_ready = can_switchover && (pending_size <= s->threshold_size);
+        complete_ready = can_switchover &&
+            (pending.total_bytes <= s->threshold_size);
     }
 
     if (complete_ready) {
-        trace_migration_thread_low_pending(pending_size);
+        trace_migration_thread_low_pending(pending.total_bytes);
         migration_completion(s);
         return MIG_ITERATE_BREAK;
     }
diff --git a/migration/savevm.c b/migration/savevm.c
index 397f602257..d221e2961b 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1766,8 +1766,7 @@ void qemu_savevm_query_pending(MigPendingData *pending, bool exact)
 {
     SaveStateEntry *se;
 
-    pending->precopy_bytes = 0;
-    pending->postcopy_bytes = 0;
+    memset(pending, 0, sizeof(*pending));
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (!se->ops || !se->ops->save_query_pending) {
@@ -1779,8 +1778,13 @@ void qemu_savevm_query_pending(MigPendingData *pending, bool exact)
         se->ops->save_query_pending(se->opaque, pending, exact);
     }
 
+    pending->total_bytes = pending->precopy_bytes +
+        pending->stopcopy_bytes + pending->postcopy_bytes;
+
     trace_qemu_savevm_query_pending(exact, pending->precopy_bytes,
-                                    pending->postcopy_bytes);
+                                    pending->stopcopy_bytes,
+                                    pending->postcopy_bytes,
+                                    pending->total_bytes);
 }
 
 void qemu_savevm_state_cleanup(void)
diff --git a/migration/trace-events b/migration/trace-events
index f8995b8d0d..d2134af862 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -7,7 +7,7 @@ qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
 qemu_loadvm_state_post_main(int ret) "%d"
 qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
 qemu_savevm_send_packaged(void) ""
-qemu_savevm_query_pending(bool exact, uint64_t precopy, uint64_t postcopy) "exact=%d, precopy=%"PRIu64", postcopy=%"PRIu64
+qemu_savevm_query_pending(bool exact, uint64_t precopy, uint64_t stopcopy, uint64_t postcopy, uint64_t total) "exact=%d, precopy=%"PRIu64", stopcopy=%"PRIu64", postcopy=%"PRIu64", total=%"PRIu64
 loadvm_state_switchover_ack_needed(unsigned int switchover_ack_pending_num) "Switchover ack pending num=%u"
 loadvm_state_setup(void) ""
 loadvm_state_cleanup(void) ""
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 08/16] vfio/migration: Fix incorrect reporting for VFIO pending data
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
                   ` (6 preceding siblings ...)
  2026-04-21 20:21 ` [PATCH v2 07/16] migration: Introduce stopcopy_bytes in save_query_pending() Peter Xu
@ 2026-04-21 20:21 ` Peter Xu
  2026-04-21 20:21 ` [PATCH v2 09/16] migration: Move iteration counter out of RAM Peter Xu
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin

VFIO reports different things in the fast and slow versions of the pending
query.  That was because it wanted to make sure the precopy data could
reach 0, which is needed for the sync queries to keep happening
periodically over time.

Now, with the stopcopy size reporting facility, it doesn't need this hack
anymore.  Fix it by reporting the same values in both the fast and slow
versions of the pending query, except that the slow version also does a
slow sync with the hardware.

While at it, remove the special-casing of vfio_device_state_is_precopy(),
which could report nothing in a fast query.  The reporting is then
consistent with VFIO devices that do not support the precopy phase.

Copying stable might be too much; just skip it, along with the Fixes tag.

Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Tested-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/vfio/migration.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index e965ba51fb..e6e6a0d53d 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -587,19 +587,23 @@ static void vfio_state_pending(void *opaque, MigPendingData *pending,
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
-    uint64_t remain;
+    uint64_t precopy_size, stopcopy_size;
 
     if (exact) {
         vfio_state_pending_sync(vbasedev);
-        remain = migration->stopcopy_size;
+    }
+
+    precopy_size =
+        migration->precopy_init_size + migration->precopy_dirty_size;
+
+    if (migration->stopcopy_size > precopy_size) {
+        stopcopy_size = migration->stopcopy_size - precopy_size;
     } else {
-        if (!vfio_device_state_is_precopy(vbasedev)) {
-            return;
-        }
-        remain = migration->precopy_init_size + migration->precopy_dirty_size;
+        stopcopy_size = 0;
     }
 
-    pending->precopy_bytes += remain;
+    pending->precopy_bytes += precopy_size;
+    pending->stopcopy_bytes += stopcopy_size;
 
     trace_vfio_state_pending(vbasedev->name, migration->stopcopy_size,
                              migration->precopy_init_size,
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 09/16] migration: Move iteration counter out of RAM
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
                   ` (7 preceding siblings ...)
  2026-04-21 20:21 ` [PATCH v2 08/16] vfio/migration: Fix incorrect reporting for VFIO pending data Peter Xu
@ 2026-04-21 20:21 ` Peter Xu
  2026-04-21 20:21 ` [PATCH v2 10/16] migration: Introduce a helper to return switchover bw estimate Peter Xu
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin, Hyman Huang,
	Prasad Pandit

The iteration counter used to hide in the RAM dirty sync path.  Now that
more modules can slow-sync their dirty information, keeping it there is no
longer a good fit, because iterations are not a RAM-only concept: all
modules should follow them.

More importantly, mgmt may try to query dirty info (to make policy
decisions like adjusting downtime) by listening to iteration count changes
via QMP events.  So we must make sure the bump of the iteration count only
happens _after_ the dirty sync operations, in whatever form (RAM's dirty
bitmap sync, or VFIO's various ioctls fetching the latest dirty info from
the kernel).

Move this into the core migration path to manage, together with the event
generation, so that it can be properly ordered with the sync operations of
all modules.

This brings a good side effect: there used to be an issue where
cpu_throttle_dirty_sync_timer_tick() could randomly bump the iteration
count (because it invokes sync ops).  Now it won't, which is actually the
right behavior.

That said, we have code (not only QEMU, but likely mgmt too) assuming the
1st iteration will always show a dirty sync count of 1.  Initialize it to
1 now, because the counter will no longer be bumped by the dirty sync done
during setup().

Reviewed-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration-stats.h |  3 ++-
 migration/migration.c       | 29 ++++++++++++++++++++++++++---
 migration/ram.c             |  6 ------
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/migration/migration-stats.h b/migration/migration-stats.h
index 1153520f7a..326ddb0088 100644
--- a/migration/migration-stats.h
+++ b/migration/migration-stats.h
@@ -43,7 +43,8 @@ typedef struct {
      */
     uint64_t dirty_pages_rate;
     /*
-     * Number of times we have synchronized guest bitmaps.
+     * Number of times we have synchronized guest bitmaps.  This always
+     * starts from 1 for the 1st iteration.
      */
     uint64_t dirty_sync_count;
     /*
diff --git a/migration/migration.c b/migration/migration.c
index 4b54fda4d7..e3f82baaac 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1654,10 +1654,15 @@ int migrate_init(MigrationState *s, Error **errp)
     s->threshold_size = 0;
     s->switchover_acked = false;
     s->rdma_migration = false;
+
     /*
-     * set mig_stats memory to zero for a new migration
+     * set mig_stats memory to zero for a new migration.. except the
+     * iteration counter, which we want to make sure it returns 1 for the
+     * first iteration.
      */
     memset(&mig_stats, 0, sizeof(mig_stats));
+    mig_stats.dirty_sync_count = 1;
+
     migration_reset_vfio_bytes_transferred();
 
     s->postcopy_package_loaded = false;
@@ -3230,10 +3235,28 @@ static bool migration_iteration_next_ready(MigrationState *s,
 static void migration_iteration_go_next(MigPendingData *pending)
 {
     /*
-     * Do a slow sync will achieve this.  TODO: move RAM iteration code
-     * into the core layer.
+     * Do a slow sync first before boosting the iteration count.
      */
     qemu_savevm_query_pending(pending, true);
+
+    /*
+     * Boost dirty sync count to reflect we finished one iteration.
+     *
+     * NOTE: we need to make sure when this happens (together with the
+     * event sent below) all modules have slow-synced the pending data
+     * above.  That means a write mem barrier, but qatomic_add() should be
+     * enough.
+     *
+     * It's because a mgmt could wait on the iteration event to query again
+     * on pending data for policy changes (e.g. downtime adjustments).  The
+     * ordering will make sure the query will fetch the latest results from
+     * all the modules.
+     */
+    qatomic_add(&mig_stats.dirty_sync_count, 1);
+
+    if (migrate_events()) {
+        qapi_event_send_migration_pass(mig_stats.dirty_sync_count);
+    }
 }
 
 static bool postcopy_should_start(MigrationState *s, MigPendingData *pending)
diff --git a/migration/ram.c b/migration/ram.c
index e5b7217bf5..686162643d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1136,8 +1136,6 @@ static void migration_bitmap_sync(RAMState *rs, bool last_stage)
     RAMBlock *block;
     int64_t end_time;
 
-    qatomic_add(&mig_stats.dirty_sync_count, 1);
-
     if (!rs->time_last_bitmap_sync) {
         rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     }
@@ -1172,10 +1170,6 @@ static void migration_bitmap_sync(RAMState *rs, bool last_stage)
         rs->num_dirty_pages_period = 0;
         rs->bytes_xfer_prev = migration_transferred_bytes();
     }
-    if (migrate_events()) {
-        uint64_t generation = qatomic_read(&mig_stats.dirty_sync_count);
-        qapi_event_send_migration_pass(generation);
-    }
 }
 
 void migration_bitmap_sync_precopy(bool last_stage)
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 10/16] migration: Introduce a helper to return switchover bw estimate
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
                   ` (8 preceding siblings ...)
  2026-04-21 20:21 ` [PATCH v2 09/16] migration: Move iteration counter out of RAM Peter Xu
@ 2026-04-21 20:21 ` Peter Xu
  2026-04-21 20:21 ` [PATCH v2 11/16] migration: Calculate expected downtime on demand Peter Xu
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin

Add a helper migration_get_switchover_bw() to return an estimate of the
switchover bandwidth.  Use it to simplify the current code.

This will be used later to remove expected_downtime.

While at it, remove two qatomic_read() calls to shrink the lines; atomic
ops are not needed when it's always the same thread doing the updates.
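For reference, the unit conversion performed by the helper's fallback path
(s->mbps / 8 * 1000 * 1000) can be sketched standalone; this is an
illustrative snippet with a made-up function name, not QEMU code:

```python
def mbps_to_bytes_per_sec(mbps: float) -> float:
    """Convert a megabits-per-second rate into bytes per second,
    mirroring the s->mbps / 8 * 1000 * 1000 expression above."""
    return mbps / 8 * 1000 * 1000

# 8 Mbps is exactly 1,000,000 bytes/sec
rate = mbps_to_bytes_per_sec(8)
```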

Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 48 +++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index e3f82baaac..caa1d13130 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -984,6 +984,21 @@ void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value)
     migrate_send_rp_message(mis, MIG_RP_MSG_RESUME_ACK, sizeof(buf), &buf);
 }
 
+/*
+ * Returns the estimated switchover bandwidth (unit: bytes / seconds)
+ */
+static double migration_get_switchover_bw(MigrationState *s)
+{
+    uint64_t switchover_bw = migrate_avail_switchover_bandwidth();
+
+    if (switchover_bw) {
+        /* If user specified, prioritize this value and don't estimate */
+        return (double)switchover_bw;
+    }
+
+    return s->mbps / 8 * 1000 * 1000;
+}
+
 bool migration_is_running(void)
 {
     MigrationState *s = current_migration;
@@ -3126,37 +3141,22 @@ static void migration_update_counters(MigrationState *s,
 {
     uint64_t transferred, transferred_pages, time_spent;
     uint64_t current_bytes; /* bytes transferred since the beginning */
-    uint64_t switchover_bw;
-    /* Expected bandwidth when switching over to destination QEMU */
-    double expected_bw_per_ms;
-    double bandwidth;
+    double switchover_bw_per_ms;
 
     if (current_time < s->iteration_start_time + BUFFER_DELAY) {
         return;
     }
 
-    switchover_bw = migrate_avail_switchover_bandwidth();
     current_bytes = migration_transferred_bytes();
     transferred = current_bytes - s->iteration_initial_bytes;
     time_spent = current_time - s->iteration_start_time;
-    bandwidth = (double)transferred / time_spent;
-
-    if (switchover_bw) {
-        /*
-         * If the user specified a switchover bandwidth, let's trust the
-         * user so that can be more accurate than what we estimated.
-         */
-        expected_bw_per_ms = (double)switchover_bw / 1000;
-    } else {
-        /* If the user doesn't specify bandwidth, we use the estimated */
-        expected_bw_per_ms = bandwidth;
-    }
-
-    s->threshold_size = expected_bw_per_ms * migrate_downtime_limit();
-
     s->mbps = (((double) transferred * 8.0) /
                ((double) time_spent / 1000.0)) / 1000.0 / 1000.0;
 
+    /* NOTE: only update this after bandwidth (s->mbps) updated */
+    switchover_bw_per_ms = migration_get_switchover_bw(s) / 1000;
+    s->threshold_size = switchover_bw_per_ms * migrate_downtime_limit();
+
     transferred_pages = ram_get_total_transferred_pages() -
                             s->iteration_initial_pages;
     s->pages_per_second = (double) transferred_pages /
@@ -3166,10 +3166,9 @@ static void migration_update_counters(MigrationState *s,
      * if we haven't sent anything, we don't want to
      * recalculate. 10000 is a small enough number for our purposes
      */
-    if (qatomic_read(&mig_stats.dirty_pages_rate) &&
-        transferred > 10000) {
+    if (mig_stats.dirty_pages_rate && transferred > 10000) {
         s->expected_downtime =
-            qatomic_read(&mig_stats.dirty_bytes_last_sync) / expected_bw_per_ms;
+            mig_stats.dirty_bytes_last_sync / switchover_bw_per_ms;
     }
 
     migration_rate_reset();
@@ -3178,7 +3177,8 @@ static void migration_update_counters(MigrationState *s,
 
     trace_migrate_transferred(transferred, time_spent,
                               /* Both in unit bytes/ms */
-                              bandwidth, switchover_bw / 1000,
+                              (uint64_t)s->mbps,
+                              (uint64_t)switchover_bw_per_ms,
                               s->threshold_size);
 }
 
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 11/16] migration: Calculate expected downtime on demand
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
                   ` (9 preceding siblings ...)
  2026-04-21 20:21 ` [PATCH v2 10/16] migration: Introduce a helper to return switchover bw estimate Peter Xu
@ 2026-04-21 20:21 ` Peter Xu
  2026-04-21 20:21 ` [PATCH v2 12/16] migration: Fix calculation of expected_downtime to take VFIO info Peter Xu
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin

This value does not need to be calculated as frequently.  Only calculate
it on demand when query-migrate happens.  With that we can remove the
variable from MigrationState.

This paves the way for fixing this value to include all modules (not only
RAM but others too).

Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h |  2 +-
 migration/migration.c | 25 ++++++++++++-------------
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index b6888daced..ba0f9e0f9c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -359,7 +359,6 @@ struct MigrationState {
     /* Timestamp when VM is down (ms) to migrate the last stuff */
     int64_t downtime_start;
     int64_t downtime;
-    int64_t expected_downtime;
     bool capabilities[MIGRATION_CAPABILITY__MAX];
     int64_t setup_time;
 
@@ -586,6 +585,7 @@ void migration_cancel(void);
 void migration_populate_vfio_info(MigrationInfo *info);
 void migration_reset_vfio_bytes_transferred(void);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
+int64_t migration_downtime_calc_expected(MigrationState *s);
 
 /*
  * Migration thread waiting for return path thread.  Return non-zero if an
diff --git a/migration/migration.c b/migration/migration.c
index caa1d13130..d4d3534cf1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1041,6 +1041,17 @@ static bool migrate_show_downtime(MigrationState *s)
     return (s->state == MIGRATION_STATUS_COMPLETED) || migration_in_postcopy();
 }
 
+/* Return expected downtime (unit: milliseconds) */
+int64_t migration_downtime_calc_expected(MigrationState *s)
+{
+    if (mig_stats.dirty_sync_count <= 1) {
+        return migrate_downtime_limit();
+    }
+
+    return mig_stats.dirty_bytes_last_sync /
+        migration_get_switchover_bw(s) * 1000;
+}
+
 static void populate_time_info(MigrationInfo *info, MigrationState *s)
 {
     info->has_status = true;
@@ -1061,7 +1072,7 @@ static void populate_time_info(MigrationInfo *info, MigrationState *s)
         info->downtime = s->downtime;
     } else {
         info->has_expected_downtime = true;
-        info->expected_downtime = s->expected_downtime;
+        info->expected_downtime = migration_downtime_calc_expected(s);
     }
 }
 
@@ -1649,7 +1660,6 @@ int migrate_init(MigrationState *s, Error **errp)
     s->mbps = 0.0;
     s->pages_per_second = 0.0;
     s->downtime = 0;
-    s->expected_downtime = 0;
     s->setup_time = 0;
     s->start_postcopy = false;
     s->migration_thread_running = false;
@@ -3162,15 +3172,6 @@ static void migration_update_counters(MigrationState *s,
     s->pages_per_second = (double) transferred_pages /
                              (((double) time_spent / 1000.0));
 
-    /*
-     * if we haven't sent anything, we don't want to
-     * recalculate. 10000 is a small enough number for our purposes
-     */
-    if (mig_stats.dirty_pages_rate && transferred > 10000) {
-        s->expected_downtime =
-            mig_stats.dirty_bytes_last_sync / switchover_bw_per_ms;
-    }
-
     migration_rate_reset();
 
     update_iteration_initial_status(s);
@@ -3825,8 +3826,6 @@ void migration_start_outgoing(MigrationState *s)
     bool resume = (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP);
     int ret;
 
-    s->expected_downtime = migrate_downtime_limit();
-
     if (resume) {
         /* This is a resumed migration */
         rate_limit = migrate_max_postcopy_bandwidth();
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 12/16] migration: Fix calculation of expected_downtime to take VFIO info
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
                   ` (10 preceding siblings ...)
  2026-04-21 20:21 ` [PATCH v2 11/16] migration: Calculate expected downtime on demand Peter Xu
@ 2026-04-21 20:21 ` Peter Xu
  2026-04-21 20:21 ` [PATCH v2 13/16] migration: Remember total dirty bytes in mig_stats Peter Xu
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin

QEMU provides an expected downtime for the whole system during migration,
by remembering the total dirty RAM from the last sync, divided by the
estimated switchover bandwidth.

That was flawed once VFIO is taken into account: consider a VFIO GPU
device that contains GBs of data to migrate during the stop phase.  Those
bytes were not accounted for in this math.

Fix it by updating dirty_bytes_last_sync properly only when we go to the
next iteration, rather than hiding this update in the RAM code.  Meanwhile,
fetch the total (rather than RAM-only) portion of dirty bytes, so as to
include GPU device states too.

Update the comment of the field to reflect its new meaning.

Now after this change, the expected-downtime read from query-migrate
should be very accurate even with VFIO devices involved.
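As a rough standalone illustration of the estimate described above
(hypothetical helper name, not QEMU code): the bytes still dirty at the
last system-wise sync, divided by the switchover bandwidth, give the
expected downtime:

```python
def expected_downtime_ms(dirty_bytes_last_sync: int,
                         switchover_bw: float) -> float:
    """Best-effort estimate: bytes still dirty at the last system-wise
    sync, divided by the bandwidth (bytes/sec) available at switchover,
    converted to milliseconds."""
    return dirty_bytes_last_sync / switchover_bw * 1000

# e.g. 2 GiB left (RAM + VFIO device state), 10 GiB/s switchover bandwidth
est = expected_downtime_ms(2 * 1024 ** 3, 10 * 1024 ** 3)  # 200 ms
```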

Tested-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration-stats.h |  8 +++-----
 migration/migration.c       | 11 ++++++++---
 migration/ram.c             |  1 -
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/migration/migration-stats.h b/migration/migration-stats.h
index 326ddb0088..1775b916df 100644
--- a/migration/migration-stats.h
+++ b/migration/migration-stats.h
@@ -31,11 +31,9 @@
  */
 typedef struct {
     /*
-     * Number of bytes that were dirty last time that we synced with
-     * the guest memory.  We use that to calculate the downtime.  As
-     * the remaining dirty amounts to what we know that is still dirty
-     * since last iteration, not counting what the guest has dirtied
-     * since we synchronized bitmaps.
+     * Number of bytes that were reported dirty after the latest
+     * system-wise synchronization of dirty information.  It is used to do
+     * best-effort estimation on expected downtime.
      */
     uint64_t dirty_bytes_last_sync;
     /*
diff --git a/migration/migration.c b/migration/migration.c
index d4d3534cf1..5d68591215 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3240,18 +3240,23 @@ static void migration_iteration_go_next(MigPendingData *pending)
      */
     qemu_savevm_query_pending(pending, true);
 
+    /*
+     * Update the dirty information for the whole system for this
+     * iteration.  This value is used to calculate expected downtime.
+     */
+    qatomic_set(&mig_stats.dirty_bytes_last_sync, pending->total_bytes);
+
     /*
      * Boost dirty sync count to reflect we finished one iteration.
      *
      * NOTE: we need to make sure when this happens (together with the
      * event sent below) all modules have slow-synced the pending data
-     * above.  That means a write mem barrier, but qatomic_add() should be
-     * enough.
+     * above and updated corresponding fields (e.g. dirty_bytes_last_sync).
      *
      * It's because a mgmt could wait on the iteration event to query again
      * on pending data for policy changes (e.g. downtime adjustments).  The
      * ordering will make sure the query will fetch the latest results from
-     * all the modules.
+     * all the modules on everything.
      */
     qatomic_add(&mig_stats.dirty_sync_count, 1);
 
diff --git a/migration/ram.c b/migration/ram.c
index 686162643d..d927ad7508 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1148,7 +1148,6 @@ static void migration_bitmap_sync(RAMState *rs, bool last_stage)
             RAMBLOCK_FOREACH_NOT_IGNORED(block) {
                 ramblock_sync_dirty_bitmap(rs, block);
             }
-            qatomic_set(&mig_stats.dirty_bytes_last_sync, ram_bytes_remaining());
         }
     }
 
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 13/16] migration: Remember total dirty bytes in mig_stats
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
                   ` (11 preceding siblings ...)
  2026-04-21 20:21 ` [PATCH v2 12/16] migration: Fix calculation of expected_downtime to take VFIO info Peter Xu
@ 2026-04-21 20:21 ` Peter Xu
  2026-04-22 13:18   ` Juraj Marcin
  2026-04-21 20:21 ` [PATCH v2 14/16] migration/qapi: Introduce system-wise "remaining" reports Peter Xu
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin

Introduce this new counter to remember the total dirty bytes for the whole
system.  It will be used by the query-migrate command to fetch the
system-wise remaining data.

A prior attempt was made to avoid this counter and query all the modules
directly from a QMP handler, but it exposed complexity both in migration
state machine race conditions (the query may be invoked at any point of
the state machine) and in locking implications (some of the query hooks
may take the BQL, which is illegal at least in a QMP handler).  For more
information, see:

https://lore.kernel.org/r/aeZMtxqrKWAMKzdN@x1.local

This one-liner resolves everything, except that it is not as accurate.
The hope is that it is a worthwhile trade-off, knowing the above
challenges.

Now there is one more reason to make each invocation of
save_live_iterate() lightweight: this counter only gets updated once per
loop over all the save_live_iterate() hooks.  But that has always been the
goal.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration-stats.h | 7 +++++++
 migration/savevm.c          | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/migration/migration-stats.h b/migration/migration-stats.h
index 1775b916df..9f9a8eb9eb 100644
--- a/migration/migration-stats.h
+++ b/migration/migration-stats.h
@@ -36,6 +36,13 @@ typedef struct {
      * best-effort estimation on expected downtime.
      */
     uint64_t dirty_bytes_last_sync;
+    /*
+     * Number of bytes that were reported dirty now.  This is an estimate
+     * value and will be updated every time migration thread queries from
+     * modules in an iteration loop.  It is used to provide best-effort
+     * estimation on total remaining data.
+     */
+    uint64_t dirty_bytes_total;
     /*
      * Number of pages dirtied per second.
      */
diff --git a/migration/savevm.c b/migration/savevm.c
index d221e2961b..b49a80f574 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1781,6 +1781,13 @@ void qemu_savevm_query_pending(MigPendingData *pending, bool exact)
     pending->total_bytes = pending->precopy_bytes +
         pending->stopcopy_bytes + pending->postcopy_bytes;
 
+    /*
+     * Update system remaining dirty bytes whenever QEMU queries.  It will
+     * make the value to be not as accurate, but should still be pretty
+     * close to reality when this got invoked frequently while iterating.
+     */
+    mig_stats.dirty_bytes_total = pending->total_bytes;
+
     trace_qemu_savevm_query_pending(exact, pending->precopy_bytes,
                                     pending->stopcopy_bytes,
                                     pending->postcopy_bytes,
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 14/16] migration/qapi: Introduce system-wise "remaining" reports
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
                   ` (12 preceding siblings ...)
  2026-04-21 20:21 ` [PATCH v2 13/16] migration: Remember total dirty bytes in mig_stats Peter Xu
@ 2026-04-21 20:21 ` Peter Xu
  2026-04-24  7:17   ` Markus Armbruster
  2026-04-21 20:21 ` [PATCH v2 15/16] migration/qapi: Update unit for avail-switchover-bandwidth Peter Xu
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin,
	Dr. David Alan Gilbert

Currently, mgmt can only query the remaining RAM, not the system-wise
remaining data.  This was not a problem before, because for a very long
time RAM was the only part that mattered.

After VFIO migration landed upstream, that may not be true anymore,
especially considering that there can be GPU devices that contain GBs of
device state.

Add a new "remaining" field in query-migrate results, reflecting
system-wise remaining data, which will include everything (e.g. VFIO).

This information will be useful for mgmt to implement a generic way of
stall detection that covers all system resources.  Say, when the system
remaining data does not decrease anymore for a relatively long period of
time, it may mean that the migration has trouble converging, so mgmt can
act based on how this value changes over time (especially if sampled after
each migration iteration).

Before this patch, "expected_downtime" almost played this role.  For
example, monitoring "expected_downtime" at the beginning of each iteration
can in most cases also reflect system-wise migration progress.  That said,
"expected_downtime" was always calculated based on a bandwidth value that
can fluctuate a lot if avail-switchover-bandwidth is not used.  The new
"remaining" field removes that part of the uncertainty for mgmt.

With the new field, HMP "info migrate" now reports this:

(qemu) info migrate
Status:                 active
Time (ms):              total=12080, setup=14, exp_down=300
Remaining:              1.36 GiB        <------------------- newline
RAM info:
  Throughput (Mbps):    840.50
  Sizes:                pagesize=4 KiB, total=4.02 GiB
  Transfers:            transferred=1.18 GiB, remain=1.36 GiB
    Channels:           precopy=1.18 GiB, multifd=0 B, postcopy=0 B
    Page Types:         normal=307923, zero=388148
  Page Rates (pps):     transfer=25660
  Others:               dirty_syncs=1

It should be the same value as RAM's remaining report when VFIO is not
involved, and it should report more than that when VFIO is involved.
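A mgmt-side consumer could use the new field for stall detection roughly
like this (an illustrative sketch; the window size and sampling policy are
made up, and the samples stand in for per-iteration query-migrate
"remaining" values):

```python
def stalled(remaining_samples, window=6):
    """Given per-iteration samples of query-migrate's system-wise
    'remaining' value, report a potential convergence stall when the
    value has not decreased across the last `window` samples."""
    if len(remaining_samples) < window:
        return False
    tail = remaining_samples[-window:]
    # No sample in the window dropped below the window's first value:
    # the migration made no net progress during that period.
    return min(tail) >= tail[0]

# steadily shrinking -> no stall; flat tail -> stall suspected
ok = stalled([100, 90, 80, 70, 60, 50, 40])       # False
bad = stalled([100, 90, 80, 80, 80, 80, 80, 80])  # True
```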

Cc: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json            | 4 ++++
 migration/migration-hmp-cmds.c | 5 +++++
 migration/migration.c          | 7 +++++++
 3 files changed, 16 insertions(+)

diff --git a/qapi/migration.json b/qapi/migration.json
index e3ad3f0604..a6e24b5685 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -300,6 +300,9 @@
 #     average memory load of the virtual CPU indirectly.  Note that
 #     zero means guest doesn't dirty memory.  (Since 8.1)
 #
+# @remaining: amount of bytes remaining to be migrated system-wise,
+#     includes both RAM and all devices (like VFIO).  (Since 11.1)
+#
 # Features:
 #
 # @unstable: Members @postcopy-latency, @postcopy-vcpu-latency,
@@ -310,6 +313,7 @@
 ##
 { 'struct': 'MigrationInfo',
   'data': {'*status': 'MigrationStatus', '*ram': 'MigrationRAMStats',
+           '*remaining': 'uint64',
            '*vfio': 'VfioStats',
            '*xbzrle-cache': 'XBZRLECacheStats',
            '*total-time': 'int',
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 0a193b8f54..a3887cc0d7 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -178,6 +178,11 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
         }
     }
 
+    if (info->has_remaining) {
+        g_autofree char *remaining = size_to_str(info->remaining);
+        monitor_printf(mon, "Remaining: \t\t%s\n", remaining);
+    }
+
     if (info->has_socket_address) {
         SocketAddressList *addr;
 
diff --git a/migration/migration.c b/migration/migration.c
index 5d68591215..6fd89995a2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1076,6 +1076,12 @@ static void populate_time_info(MigrationInfo *info, MigrationState *s)
     }
 }
 
+static void populate_global_info(MigrationInfo *info, MigrationState *s)
+{
+    info->has_remaining = true;
+    info->remaining = qatomic_read(&mig_stats.dirty_bytes_total);
+}
+
 static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 {
     size_t page_size = qemu_target_page_size();
@@ -1177,6 +1183,7 @@ static void fill_source_migration_info(MigrationInfo *info)
         /* TODO add some postcopy stats */
         populate_time_info(info, s);
         populate_ram_info(info, s);
+        populate_global_info(info, s);
         migration_populate_vfio_info(info);
         break;
     case MIGRATION_STATUS_COLO:
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 15/16] migration/qapi: Update unit for avail-switchover-bandwidth
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
                   ` (13 preceding siblings ...)
  2026-04-21 20:21 ` [PATCH v2 14/16] migration/qapi: Introduce system-wise "remaining" reports Peter Xu
@ 2026-04-21 20:21 ` Peter Xu
  2026-04-24  7:18   ` Markus Armbruster
  2026-04-21 20:21 ` [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls Peter Xu
  2026-04-29 19:52 ` [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
  16 siblings, 1 reply; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin

Add ", in bytes per second".  Unfortunately the indentation needs to be
updated throughout, but there is no change to the rest.

Cc: Markus Armbruster <armbru@redhat.com>
Suggested-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index a6e24b5685..b7518b29c6 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -921,15 +921,15 @@
 #     (Since 2.8)
 #
 # @avail-switchover-bandwidth: to set the available bandwidth that
-#     migration can use during switchover phase.  **Note:** this does
-#     not limit the bandwidth during switchover, but only for
-#     calculations when making decisions to switchover.  By default,
-#     this value is zero, which means QEMU will estimate the bandwidth
-#     automatically.  This can be set when the estimated value is not
-#     accurate, while the user is able to guarantee such bandwidth is
-#     available when switching over.  When specified correctly, this
-#     can make the switchover decision much more accurate.
-#     (Since 8.2)
+#     migration can use during switchover phase, in bytes per
+#     second.  **Note:** this does not limit the bandwidth during
+#     switchover, but only for calculations when making decisions to
+#     switchover.  By default, this value is zero, which means QEMU
+#     will estimate the bandwidth automatically.  This can be set
+#     when the estimated value is not accurate, while the user is
+#     able to guarantee such bandwidth is available when switching
+#     over.  When specified correctly, this can make the switchover
+#     decision much more accurate.  (Since 8.2)
 #
 # @downtime-limit: set maximum tolerated downtime for migration.
 #     maximum downtime in milliseconds (Since 2.8)
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
                   ` (14 preceding siblings ...)
  2026-04-21 20:21 ` [PATCH v2 15/16] migration/qapi: Update unit for avail-switchover-bandwidth Peter Xu
@ 2026-04-21 20:21 ` Peter Xu
  2026-04-22  7:51   ` Cédric Le Goater
                     ` (4 more replies)
  2026-04-29 19:52 ` [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
  16 siblings, 5 replies; 41+ messages in thread
From: Peter Xu @ 2026-04-21 20:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin

Add two tracepoints for the precopy and stopcopy query ioctls.  While at
it, add one warn_report_once() for each of them when the ioctl fails.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/vfio/migration.c  | 33 +++++++++++++++++++++++----------
 hw/vfio/trace-events |  2 ++
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index e6e6a0d53d..04d9f94edb 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -329,6 +329,7 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
     struct vfio_device_feature_mig_data_size *mig_data_size =
         (struct vfio_device_feature_mig_data_size *)feature->data;
     VFIOMigration *migration = vbasedev->migration;
+    int ret;
 
     feature->argsz = sizeof(buf);
     feature->flags =
@@ -340,12 +341,18 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
          * is reported so downtime limit won't be violated.
          */
         migration->stopcopy_size = VFIO_MIG_STOP_COPY_SIZE;
-        return -errno;
+        ret = -errno;
+        warn_report_once("VFIO device %s ioctl(VFIO_DEVICE_FEATURE) on "
+                         "VFIO_DEVICE_FEATURE_MIG_DATA_SIZE failed (%d)",
+                         vbasedev->name, ret);
+    } else {
+        migration->stopcopy_size = mig_data_size->stop_copy_length;
+        ret = 0;
     }
 
-    migration->stopcopy_size = mig_data_size->stop_copy_length;
+    trace_vfio_query_stop_copy_size(migration->stopcopy_size, ret);
 
-    return 0;
+    return ret;
 }
 
 static int vfio_query_precopy_size(VFIOMigration *migration)
@@ -353,18 +360,24 @@ static int vfio_query_precopy_size(VFIOMigration *migration)
     struct vfio_precopy_info precopy = {
         .argsz = sizeof(precopy),
     };
-
-    migration->precopy_init_size = 0;
-    migration->precopy_dirty_size = 0;
+    int ret;
 
     if (ioctl(migration->data_fd, VFIO_MIG_GET_PRECOPY_INFO, &precopy)) {
-        return -errno;
+        migration->precopy_init_size = 0;
+        migration->precopy_dirty_size = 0;
+        ret = -errno;
+        warn_report_once("VFIO device %s ioctl(VFIO_MIG_GET_PRECOPY_INFO) "
+                         "failed (%d)", migration->vbasedev->name, ret);
+    } else {
+        migration->precopy_init_size = precopy.initial_bytes;
+        migration->precopy_dirty_size = precopy.dirty_bytes;
+        ret = 0;
     }
 
-    migration->precopy_init_size = precopy.initial_bytes;
-    migration->precopy_dirty_size = precopy.dirty_bytes;
+    trace_vfio_query_precopy_size(migration->precopy_init_size,
+                                  migration->precopy_dirty_size, ret);
 
-    return 0;
+    return ret;
 }
 
 /* Returns the size of saved data on success and -errno on error */
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 287df0b8cb..854a7e4b19 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -176,6 +176,8 @@ vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer
 vfio_state_pending(const char *name, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size, bool exact) " (%s) stopcopy size %"PRIu64" precopy initial size %"PRIu64" precopy dirty size %"PRIu64 " exact %d"
 vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
 vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
+vfio_query_stop_copy_size(uint64_t size, int ret) "stopcopy size %"PRIu64" ret %d"
+vfio_query_precopy_size(uint64_t init_size, uint64_t dirty_size, int ret) "init %"PRIu64" dirty %"PRIu64" ret %d"
 
 #iommufd.c
 
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls
  2026-04-21 20:21 ` [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls Peter Xu
@ 2026-04-22  7:51   ` Cédric Le Goater
  2026-04-22  7:52   ` Cédric Le Goater
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2026-04-22  7:51 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Joao Martins, Markus Armbruster, Avihai Horon,
	Daniel P . Berrangé, Fabiano Rosas, Prasad Pandit,
	Alex Williamson, Kirti Wankhede, Zhiyi Guo, Maciej S . Szmigiero,
	Juraj Marcin

On 4/21/26 22:21, Peter Xu wrote:
> Add two tracepoints for the precopy and stopcopy query ioctls.  While at
> it, add one warn_report_once() for each of them when the ioctl fails.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>



Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   hw/vfio/migration.c  | 33 +++++++++++++++++++++++----------
>   hw/vfio/trace-events |  2 ++
>   2 files changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index e6e6a0d53d..04d9f94edb 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -329,6 +329,7 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
>       struct vfio_device_feature_mig_data_size *mig_data_size =
>           (struct vfio_device_feature_mig_data_size *)feature->data;
>       VFIOMigration *migration = vbasedev->migration;
> +    int ret;
>   
>       feature->argsz = sizeof(buf);
>       feature->flags =
> @@ -340,12 +341,18 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
>            * is reported so downtime limit won't be violated.
>            */
>           migration->stopcopy_size = VFIO_MIG_STOP_COPY_SIZE;
> -        return -errno;
> +        ret = -errno;
> +        warn_report_once("VFIO device %s ioctl(VFIO_DEVICE_FEATURE) on "
> +                         "VFIO_DEVICE_FEATURE_MIG_DATA_SIZE failed (%d)",
> +                         vbasedev->name, ret);
> +    } else {
> +        migration->stopcopy_size = mig_data_size->stop_copy_length;
> +        ret = 0;
>       }
>   
> -    migration->stopcopy_size = mig_data_size->stop_copy_length;
> +    trace_vfio_query_stop_copy_size(migration->stopcopy_size, ret);
>   
> -    return 0;
> +    return ret;
>   }
>   
>   static int vfio_query_precopy_size(VFIOMigration *migration)
> @@ -353,18 +360,24 @@ static int vfio_query_precopy_size(VFIOMigration *migration)
>       struct vfio_precopy_info precopy = {
>           .argsz = sizeof(precopy),
>       };
> -
> -    migration->precopy_init_size = 0;
> -    migration->precopy_dirty_size = 0;
> +    int ret;
>   
>       if (ioctl(migration->data_fd, VFIO_MIG_GET_PRECOPY_INFO, &precopy)) {
> -        return -errno;
> +        migration->precopy_init_size = 0;
> +        migration->precopy_dirty_size = 0;
> +        ret = -errno;
> +        warn_report_once("VFIO device %s ioctl(VFIO_MIG_GET_PRECOPY_INFO) "
> +                         "failed (%d)", migration->vbasedev->name, ret);
> +    } else {
> +        migration->precopy_init_size = precopy.initial_bytes;
> +        migration->precopy_dirty_size = precopy.dirty_bytes;
> +        ret = 0;
>       }
>   
> -    migration->precopy_init_size = precopy.initial_bytes;
> -    migration->precopy_dirty_size = precopy.dirty_bytes;
> +    trace_vfio_query_precopy_size(migration->precopy_init_size,
> +                                  migration->precopy_dirty_size, ret);
>   
> -    return 0;
> +    return ret;
>   }
>   
>   /* Returns the size of saved data on success and -errno on error */
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 287df0b8cb..854a7e4b19 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -176,6 +176,8 @@ vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer
>   vfio_state_pending(const char *name, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size, bool exact) " (%s) stopcopy size %"PRIu64" precopy initial size %"PRIu64" precopy dirty size %"PRIu64 " exact %d"
>   vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
>   vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
> +vfio_query_stop_copy_size(uint64_t size, int ret) "stopcopy size %"PRIu64" ret %d"
> +vfio_query_precopy_size(uint64_t init_size, uint64_t dirty_size, int ret) "init %"PRIu64" dirty %"PRIu64" ret %d"
>   
>   #iommufd.c
>   



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls
  2026-04-21 20:21 ` [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls Peter Xu
  2026-04-22  7:51   ` Cédric Le Goater
@ 2026-04-22  7:52   ` Cédric Le Goater
  2026-04-22  9:56   ` Cédric Le Goater
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2026-04-22  7:52 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Joao Martins, Markus Armbruster, Avihai Horon,
	Daniel P . Berrangé, Fabiano Rosas, Prasad Pandit,
	Alex Williamson, Kirti Wankhede, Zhiyi Guo, Maciej S . Szmigiero,
	Juraj Marcin

On 4/21/26 22:21, Peter Xu wrote:
> Add two tracepoints for the precopy and stopcopy query ioctls.  While at
> it, add one warn_report_once() for each of them when the ioctl fails.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>


Tested-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.

> ---
>   hw/vfio/migration.c  | 33 +++++++++++++++++++++++----------
>   hw/vfio/trace-events |  2 ++
>   2 files changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index e6e6a0d53d..04d9f94edb 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -329,6 +329,7 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
>       struct vfio_device_feature_mig_data_size *mig_data_size =
>           (struct vfio_device_feature_mig_data_size *)feature->data;
>       VFIOMigration *migration = vbasedev->migration;
> +    int ret;
>   
>       feature->argsz = sizeof(buf);
>       feature->flags =
> @@ -340,12 +341,18 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
>            * is reported so downtime limit won't be violated.
>            */
>           migration->stopcopy_size = VFIO_MIG_STOP_COPY_SIZE;
> -        return -errno;
> +        ret = -errno;
> +        warn_report_once("VFIO device %s ioctl(VFIO_DEVICE_FEATURE) on "
> +                         "VFIO_DEVICE_FEATURE_MIG_DATA_SIZE failed (%d)",
> +                         vbasedev->name, ret);
> +    } else {
> +        migration->stopcopy_size = mig_data_size->stop_copy_length;
> +        ret = 0;
>       }
>   
> -    migration->stopcopy_size = mig_data_size->stop_copy_length;
> +    trace_vfio_query_stop_copy_size(migration->stopcopy_size, ret);
>   
> -    return 0;
> +    return ret;
>   }
>   
>   static int vfio_query_precopy_size(VFIOMigration *migration)
> @@ -353,18 +360,24 @@ static int vfio_query_precopy_size(VFIOMigration *migration)
>       struct vfio_precopy_info precopy = {
>           .argsz = sizeof(precopy),
>       };
> -
> -    migration->precopy_init_size = 0;
> -    migration->precopy_dirty_size = 0;
> +    int ret;
>   
>       if (ioctl(migration->data_fd, VFIO_MIG_GET_PRECOPY_INFO, &precopy)) {
> -        return -errno;
> +        migration->precopy_init_size = 0;
> +        migration->precopy_dirty_size = 0;
> +        ret = -errno;
> +        warn_report_once("VFIO device %s ioctl(VFIO_MIG_GET_PRECOPY_INFO) "
> +                         "failed (%d)", migration->vbasedev->name, ret);
> +    } else {
> +        migration->precopy_init_size = precopy.initial_bytes;
> +        migration->precopy_dirty_size = precopy.dirty_bytes;
> +        ret = 0;
>       }
>   
> -    migration->precopy_init_size = precopy.initial_bytes;
> -    migration->precopy_dirty_size = precopy.dirty_bytes;
> +    trace_vfio_query_precopy_size(migration->precopy_init_size,
> +                                  migration->precopy_dirty_size, ret);
>   
> -    return 0;
> +    return ret;
>   }
>   
>   /* Returns the size of saved data on success and -errno on error */
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 287df0b8cb..854a7e4b19 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -176,6 +176,8 @@ vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer
>   vfio_state_pending(const char *name, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size, bool exact) " (%s) stopcopy size %"PRIu64" precopy initial size %"PRIu64" precopy dirty size %"PRIu64 " exact %d"
>   vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
>   vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
> +vfio_query_stop_copy_size(uint64_t size, int ret) "stopcopy size %"PRIu64" ret %d"
> +vfio_query_precopy_size(uint64_t init_size, uint64_t dirty_size, int ret) "init %"PRIu64" dirty %"PRIu64" ret %d"
>   
>   #iommufd.c
>   



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 01/16] qemu-iotests: Add query-migrate test for dirty-bitmap
  2026-04-21 20:20 ` [PATCH v2 01/16] qemu-iotests: Add query-migrate test for dirty-bitmap Peter Xu
@ 2026-04-22  8:08   ` Vladimir Sementsov-Ogievskiy
  2026-04-24 14:50     ` Peter Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2026-04-22  8:08 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Maciej S . Szmigiero, Juraj Marcin, Eric Blake

On 21.04.26 23:20, Peter Xu wrote:
> This helped me identify a hang issue caused by a recent change in
> migration.  Add it to the test suite.
> 
> Cc: Vladimir Sementsov-Ogievskiy<vsementsov@yandex-team.ru>
> Cc: Eric Blake<eblake@redhat.com>
> Signed-off-by: Peter Xu<peterx@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 05/16] migration/treewide: Merge @state_pending_{exact|estimate} APIs
  2026-04-21 20:20 ` [PATCH v2 05/16] migration/treewide: Merge @state_pending_{exact|estimate} APIs Peter Xu
@ 2026-04-22  8:23   ` Vladimir Sementsov-Ogievskiy
  2026-04-22  8:29   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 41+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2026-04-22  8:23 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Maciej S . Szmigiero, Juraj Marcin, Halil Pasic,
	Christian Borntraeger, Eric Farman, Matthew Rosato,
	Richard Henderson, Ilya Leoshkevich, David Hildenbrand,
	Cornelia Huck, Eric Blake, John Snow, Jason J. Herne

On 21.04.26 23:20, Peter Xu wrote:
> -     * @can_postcopy: amount of data that can be migrated in postcopy
> -     *                or in stopped state, i.e. after target start.
> -     *                Some can also be migrated during precopy (RAM).
> -     *                Some must be migrated after source stops
> -     *                (block-dirty-bitmap)

Shouldn't we preserve this bit of information as part of the MigPendingData documentation?


Currently it seems incomplete, and it uses "can" with two slightly different meanings:

+typedef struct MigPendingData {
+    /* Amount of pending bytes can be transferred in precopy or stopcopy */

this "can" is restrictive: these bytes can be transferred ONLY in precopy or stopcopy

+    uint64_t precopy_bytes;
+    /* Amount of pending bytes can be transferred in postcopy */

this "can" _looks_ like permissive, it seems obvious that we should be able to transfer
these bytes in precopy as well. And that's true for RAM but wrong for block dirty bitmaps.

+    uint64_t postcopy_bytes;
+} MigPendingData;
+

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 05/16] migration/treewide: Merge @state_pending_{exact|estimate} APIs
  2026-04-21 20:20 ` [PATCH v2 05/16] migration/treewide: Merge @state_pending_{exact|estimate} APIs Peter Xu
  2026-04-22  8:23   ` Vladimir Sementsov-Ogievskiy
@ 2026-04-22  8:29   ` Vladimir Sementsov-Ogievskiy
  2026-04-22 15:44     ` Peter Xu
  1 sibling, 1 reply; 41+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2026-04-22  8:29 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Maciej S . Szmigiero, Juraj Marcin, Halil Pasic,
	Christian Borntraeger, Eric Farman, Matthew Rosato,
	Richard Henderson, Ilya Leoshkevich, David Hildenbrand,
	Cornelia Huck, Eric Blake, John Snow, Jason J. Herne

On 21.04.26 23:20, Peter Xu wrote:
> These two APIs are a slight duplication.  For example, there are a few
> users that pass the same function to both.
> 
> It is also error-prone to provide two hooks, as it makes it easier for
> one module to report different things via the two hooks.
> 
> In reality, they should always report the same thing; the only question
> is whether a fast path should be used when the slow path might be too
> slow, as QEMU may query this information quite frequently during the
> migration process.
> 
> Merge them into one API, providing a bool that shows whether the query
> is an exact query or not.  No functional change intended.
> 
> Export qemu_savevm_query_pending().  The new API should be used when
> there are new users to do the query.  This will happen very soon.
> 
> Cc: Halil Pasic<pasic@linux.ibm.com>
> Cc: Christian Borntraeger<borntraeger@linux.ibm.com>
> Cc: Eric Farman<farman@linux.ibm.com>
> Cc: Matthew Rosato<mjrosato@linux.ibm.com>
> Cc: Richard Henderson<richard.henderson@linaro.org>
> Cc: Ilya Leoshkevich<iii@linux.ibm.com>
> Cc: David Hildenbrand<david@kernel.org>
> Cc: Cornelia Huck<cohuck@redhat.com>
> Cc: Eric Blake<eblake@redhat.com>
> Cc: Vladimir Sementsov-Ogievskiy<vsementsov@yandex-team.ru>
> Cc: John Snow<jsnow@redhat.com>
> Reviewed-by: Jason J. Herne<jjherne@linux.ibm.com>
> Reviewed-by: Juraj Marcin<jmarcin@redhat.com>
> Reviewed-by: Avihai Horon<avihaih@nvidia.com>
> Signed-off-by: Peter Xu<peterx@redhat.com>


Probably too late after all these R-b's, but it would have been simpler
to review if renaming the fields, reworking their documentation, moving
them into a separate structure, and combining the two handler functions
into one had all been separate patches.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls
  2026-04-21 20:21 ` [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls Peter Xu
  2026-04-22  7:51   ` Cédric Le Goater
  2026-04-22  7:52   ` Cédric Le Goater
@ 2026-04-22  9:56   ` Cédric Le Goater
  2026-04-23 15:10   ` Avihai Horon
  2026-04-29 14:46   ` Avihai Horon
  4 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2026-04-22  9:56 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Joao Martins, Markus Armbruster, Avihai Horon,
	Daniel P . Berrangé, Fabiano Rosas, Prasad Pandit,
	Alex Williamson, Kirti Wankhede, Zhiyi Guo, Maciej S . Szmigiero,
	Juraj Marcin

On 4/21/26 22:21, Peter Xu wrote:
> Add two tracepoints for the precopy and stopcopy query ioctls.  While at
> it, add one warn_report_once() for each of them when the ioctl fails.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   hw/vfio/migration.c  | 33 +++++++++++++++++++++++----------
>   hw/vfio/trace-events |  2 ++
>   2 files changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index e6e6a0d53d..04d9f94edb 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -329,6 +329,7 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
>       struct vfio_device_feature_mig_data_size *mig_data_size =
>           (struct vfio_device_feature_mig_data_size *)feature->data;
>       VFIOMigration *migration = vbasedev->migration;
> +    int ret;
>   
>       feature->argsz = sizeof(buf);
>       feature->flags =
> @@ -340,12 +341,18 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
>            * is reported so downtime limit won't be violated.
>            */
>           migration->stopcopy_size = VFIO_MIG_STOP_COPY_SIZE;
> -        return -errno;
> +        ret = -errno;
> +        warn_report_once("VFIO device %s ioctl(VFIO_DEVICE_FEATURE) on "
> +                         "VFIO_DEVICE_FEATURE_MIG_DATA_SIZE failed (%d)",
> +                         vbasedev->name, ret);
> +    } else {
> +        migration->stopcopy_size = mig_data_size->stop_copy_length;
> +        ret = 0;
>       }
>   
> -    migration->stopcopy_size = mig_data_size->stop_copy_length;
> +    trace_vfio_query_stop_copy_size(migration->stopcopy_size, ret);
>   
> -    return 0;
> +    return ret;
>   }
>   
>   static int vfio_query_precopy_size(VFIOMigration *migration)
> @@ -353,18 +360,24 @@ static int vfio_query_precopy_size(VFIOMigration *migration)
>       struct vfio_precopy_info precopy = {
>           .argsz = sizeof(precopy),
>       };
> -
> -    migration->precopy_init_size = 0;
> -    migration->precopy_dirty_size = 0;
> +    int ret;
>   
>       if (ioctl(migration->data_fd, VFIO_MIG_GET_PRECOPY_INFO, &precopy)) {
> -        return -errno;
> +        migration->precopy_init_size = 0;
> +        migration->precopy_dirty_size = 0;
> +        ret = -errno;
> +        warn_report_once("VFIO device %s ioctl(VFIO_MIG_GET_PRECOPY_INFO) "
> +                         "failed (%d)", migration->vbasedev->name, ret);
> +    } else {
> +        migration->precopy_init_size = precopy.initial_bytes;
> +        migration->precopy_dirty_size = precopy.dirty_bytes;
> +        ret = 0;
>       }
>   
> -    migration->precopy_init_size = precopy.initial_bytes;
> -    migration->precopy_dirty_size = precopy.dirty_bytes;
> +    trace_vfio_query_precopy_size(migration->precopy_init_size,
> +                                  migration->precopy_dirty_size, ret);


This is possibly an overflow (in the kernel):

vfio_query_precopy_size init 18446744073281946832 dirty 0 ret 0
vfio_state_pending  (4fbce62c-8ce2-4cc9-b429-41635bc94f24) stopcopy size 3735178384 precopy initial size 18446744073281946832 precopy dirty size 0 exact 1
vfio_state_pending  (0000:b1:01.0) stopcopy size 7106032 precopy initial size 0 precopy dirty size 496 exact 0
vfio_state_pending  (4fbce62c-8ce2-4cc9-b429-41635bc94f24) stopcopy size 3734129808 precopy initial size 18446744073280898256 precopy dirty size 0 exact 0
vfio_state_pending  (0000:b1:01.0) stopcopy size 7106032 precopy initial size 0 precopy dirty size 0 exact 0
vfio_state_pending  (4fbce62c-8ce2-4cc9-b429-41635bc94f24) stopcopy size 3733081232 precopy initial size 18446744073279849680 precopy dirty size 0 exact 0
vfio_state_pending  (0000:b1:01.0) stopcopy size 7106032 precopy initial size 0 precopy dirty size 0 exact 0
vfio_state_pending  (4fbce62c-8ce2-4cc9-b429-41635bc94f24) stopcopy size 3732032656 precopy initial size 18446744073278801104 precopy dirty size 0 exact 0
vfio_state_pending  (0000:b1:01.0) stopcopy size 7106032 precopy initial size 0 precopy dirty size 0 exact 0
vfio_state_pending  (4fbce62c-8ce2-4cc9-b429-41635bc94f24) stopcopy size 3730984080 precopy initial size 18446744073277752528 precopy dirty size 0 exact 0

C.



>   
> -    return 0;
> +    return ret;
>   }
>   
>   /* Returns the size of saved data on success and -errno on error */
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 287df0b8cb..854a7e4b19 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -176,6 +176,8 @@ vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer
>   vfio_state_pending(const char *name, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size, bool exact) " (%s) stopcopy size %"PRIu64" precopy initial size %"PRIu64" precopy dirty size %"PRIu64 " exact %d"
>   vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
>   vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
> +vfio_query_stop_copy_size(uint64_t size, int ret) "stopcopy size %"PRIu64" ret %d"
> +vfio_query_precopy_size(uint64_t init_size, uint64_t dirty_size, int ret) "init %"PRIu64" dirty %"PRIu64" ret %d"
>   
>   #iommufd.c
>   



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 07/16] migration: Introduce stopcopy_bytes in save_query_pending()
  2026-04-21 20:21 ` [PATCH v2 07/16] migration: Introduce stopcopy_bytes in save_query_pending() Peter Xu
@ 2026-04-22 13:16   ` Juraj Marcin
  2026-04-23 15:05   ` Avihai Horon
  1 sibling, 0 replies; 41+ messages in thread
From: Juraj Marcin @ 2026-04-22 13:16 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Joao Martins, Markus Armbruster,
	Cédric Le Goater, Avihai Horon, Daniel P . Berrangé,
	Fabiano Rosas, Prasad Pandit, Alex Williamson, Kirti Wankhede,
	Zhiyi Guo, Maciej S . Szmigiero

On 2026-04-21 16:21, Peter Xu wrote:
> Allow modules to report data that can only be migrated after the VM is
> stopped.
> 
> When this concept is introduced, the stopcopy size still needs to be
> accounted as part of pending_size, as before.
> 
> However, when there is data that can only be migrated in the stopcopy
> phase, the old "pending_size" may not always drop low enough to kick
> off a slow (exact) version of the pending query.
> 
> That used to be almost guaranteed to happen, as none of the prior
> iterative modules had stopcopy-only data.  VFIO may change that fact by
> having some data that must be copied during the stop phase.
> 
> So we need to make sure QEMU kicks off a synchronized query of pending
> data once all precopy data is migrated.  This may be important for VFIO
> to keep making progress even when the downtime limit cannot yet be
> satisfied.
> 
> So far, this patch should introduce no functional change, as no module
> reports a stopcopy size yet.
> 
> This paves the way for VFIO to properly report its pending data sizes,
> which will start to include stop-only data.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/migration/register.h |  7 ++++
>  migration/migration.c        | 65 ++++++++++++++++++++++++++++++------
>  migration/savevm.c           | 10 ++++--
>  migration/trace-events       |  2 +-
>  4 files changed, 70 insertions(+), 14 deletions(-)

Reviewed-by: Juraj Marcin <jmarcin@redhat.com>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 13/16] migration: Remember total dirty bytes in mig_stats
  2026-04-21 20:21 ` [PATCH v2 13/16] migration: Remember total dirty bytes in mig_stats Peter Xu
@ 2026-04-22 13:18   ` Juraj Marcin
  0 siblings, 0 replies; 41+ messages in thread
From: Juraj Marcin @ 2026-04-22 13:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Joao Martins, Markus Armbruster,
	Cédric Le Goater, Avihai Horon, Daniel P . Berrangé,
	Fabiano Rosas, Prasad Pandit, Alex Williamson, Kirti Wankhede,
	Zhiyi Guo, Maciej S . Szmigiero

On 2026-04-21 16:21, Peter Xu wrote:
> Introduce this new counter to remember the total dirty bytes for the whole
> system.  It will be used for query-migrate command to fetch system-wise
> remaining data.
> 
> A prior attempt was made to avoid this counter and instead query all
> the modules directly from a QMP handler, but that exposed complexity
> both in migration state machine race conditions (the query may be
> invoked at any point of the state machine) and in locking implications
> (some of the query hooks may take the BQL, which is illegal in a QMP
> handler).  For more information, see:
> 
> https://lore.kernel.org/r/aeZMtxqrKWAMKzdN@x1.local
> 
> This one-liner resolves all of that, except that it is not as accurate.
> The hope is that it is a worthwhile trade-off, given the challenges
> above.
> 
> Now there is one more reason to keep each invocation of
> save_live_iterate() lightweight: this counter only gets updated once
> per loop over all the save_live_iterate() hooks.  But that has always
> been the goal.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration-stats.h | 7 +++++++
>  migration/savevm.c          | 7 +++++++
>  2 files changed, 14 insertions(+)

Reviewed-by: Juraj Marcin <jmarcin@redhat.com>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 05/16] migration/treewide: Merge @state_pending_{exact|estimate} APIs
  2026-04-22  8:29   ` Vladimir Sementsov-Ogievskiy
@ 2026-04-22 15:44     ` Peter Xu
  2026-04-22 17:06       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Xu @ 2026-04-22 15:44 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, Joao Martins, Markus Armbruster,
	Cédric Le Goater, Avihai Horon, Daniel P . Berrangé,
	Fabiano Rosas, Prasad Pandit, Alex Williamson, Kirti Wankhede,
	Zhiyi Guo, Maciej S . Szmigiero, Juraj Marcin, Halil Pasic,
	Christian Borntraeger, Eric Farman, Matthew Rosato,
	Richard Henderson, Ilya Leoshkevich, David Hildenbrand,
	Cornelia Huck, Eric Blake, John Snow, Jason J. Herne

On Wed, Apr 22, 2026 at 11:29:11AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 21.04.26 23:20, Peter Xu wrote:
> > These two APIs largely duplicate each other.  For example, a few users
> > directly pass in the same function for both.
> > 
> > Providing two hooks is also error prone: it makes it easier for one
> > module to report different things via the two hooks.
> > 
> > In reality, they should always report the same thing; the only
> > difference is whether to use a fast path when the slow path might be
> > too slow, since QEMU may query this information quite frequently
> > during the migration process.
> > 
> > Merge them into one API, with a bool indicating whether the query is
> > an exact query or not.  No functional change intended.
> > 
> > Export qemu_savevm_query_pending().  The new API provided here should
> > be used when new users need to do the query.  This will happen very
> > soon.
> > 
> > Cc: Halil Pasic<pasic@linux.ibm.com>
> > Cc: Christian Borntraeger<borntraeger@linux.ibm.com>
> > Cc: Eric Farman<farman@linux.ibm.com>
> > Cc: Matthew Rosato<mjrosato@linux.ibm.com>
> > Cc: Richard Henderson<richard.henderson@linaro.org>
> > Cc: Ilya Leoshkevich<iii@linux.ibm.com>
> > Cc: David Hildenbrand<david@kernel.org>
> > Cc: Cornelia Huck<cohuck@redhat.com>
> > Cc: Eric Blake<eblake@redhat.com>
> > Cc: Vladimir Sementsov-Ogievskiy<vsementsov@yandex-team.ru>
> > Cc: John Snow<jsnow@redhat.com>
> > Reviewed-by: Jason J. Herne<jjherne@linux.ibm.com>
> > Reviewed-by: Juraj Marcin<jmarcin@redhat.com>
> > Reviewed-by: Avihai Horon<avihaih@nvidia.com>
> > Signed-off-by: Peter Xu<peterx@redhat.com>
> 
> 
> Probably too late after all these r-b-s, but it would be simpler to review,
> if renaming and reworking the documentation of the fields, moving them
> to a separate structure, and combining two handler functions into one
> would all be different patches.

Splitting the patch might involve some code being added and then quickly
removed again, which I also want to avoid.

Considering the reviews already done on this series (and the challenge of
finding hardware to test on..), I'd appreciate it if you're OK with
landing this sooner and then reworking things on top, as long as the
changes are cosmetic.  That said, I agree with your point about the
documentation's use of "can" in the new API: it is indeed a bit
ambiguous.  I'll see how best to integrate that; it may depend on whether
there are other reasons for a full repost / retest.

Thanks,

-- 
Peter Xu
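For context, the merged hook described in the quoted commit message could be sketched roughly like this — a hedged illustration only, where the signature, names, and state layout are guesses rather than the actual QEMU API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative per-module migration state. */
typedef struct {
    uint64_t precopy_pending;
    uint64_t postcopy_pending;
} DemoModule;

/* One hook replaces the state_pending_{exact|estimate} pair: the
 * "exact" flag selects the slow-but-precise path, otherwise a cheap
 * cached estimate is returned.  Both paths report the same quantities,
 * which is what the two separate hooks should have done anyway. */
static void demo_query_pending(void *opaque, bool exact,
                               uint64_t *must_precopy,
                               uint64_t *can_postcopy)
{
    DemoModule *m = opaque;

    if (exact) {
        /* a real module would do an expensive sync here, e.g. a
         * dirty-bitmap sync or a device ioctl */
    }
    *must_precopy = m->precopy_pending;
    *can_postcopy = m->postcopy_pending;
}
```

With a single hook, a module can no longer accidentally report different things through two diverging implementations.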



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 05/16] migration/treewide: Merge @state_pending_{exact|estimate} APIs
  2026-04-22 15:44     ` Peter Xu
@ 2026-04-22 17:06       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 41+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2026-04-22 17:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Joao Martins, Markus Armbruster,
	Cédric Le Goater, Avihai Horon, Daniel P. Berrangé,
	Fabiano Rosas, Prasad Pandit, Alex Williamson, Kirti Wankhede,
	Zhiyi Guo, Maciej S . Szmigiero, Juraj Marcin, Halil Pasic,
	Christian Borntraeger, Eric Farman, Matthew Rosato,
	Richard Henderson, Ilya Leoshkevich, David Hildenbrand,
	Cornelia Huck, Eric Blake, John Snow, Jason J. Herne

On 22.04.26 18:44, Peter Xu wrote:
> On Wed, Apr 22, 2026 at 11:29:11AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 21.04.26 23:20, Peter Xu wrote:
>>> These two APIs largely duplicate each other.  For example, a few users
>>> directly pass in the same function for both.
>>>
>>> Providing two hooks is also error prone: it makes it easier for one
>>> module to report different things via the two hooks.
>>>
>>> In reality, they should always report the same thing; the only
>>> difference is whether to use a fast path when the slow path might be
>>> too slow, since QEMU may query this information quite frequently
>>> during the migration process.
>>>
>>> Merge them into one API, with a bool indicating whether the query is
>>> an exact query or not.  No functional change intended.
>>>
>>> Export qemu_savevm_query_pending().  The new API provided here should
>>> be used when new users need to do the query.  This will happen very
>>> soon.
>>>
>>> Cc: Halil Pasic<pasic@linux.ibm.com>
>>> Cc: Christian Borntraeger<borntraeger@linux.ibm.com>
>>> Cc: Eric Farman<farman@linux.ibm.com>
>>> Cc: Matthew Rosato<mjrosato@linux.ibm.com>
>>> Cc: Richard Henderson<richard.henderson@linaro.org>
>>> Cc: Ilya Leoshkevich<iii@linux.ibm.com>
>>> Cc: David Hildenbrand<david@kernel.org>
>>> Cc: Cornelia Huck<cohuck@redhat.com>
>>> Cc: Eric Blake<eblake@redhat.com>
>>> Cc: Vladimir Sementsov-Ogievskiy<vsementsov@yandex-team.ru>
>>> Cc: John Snow<jsnow@redhat.com>
>>> Reviewed-by: Jason J. Herne<jjherne@linux.ibm.com>
>>> Reviewed-by: Juraj Marcin<jmarcin@redhat.com>
>>> Reviewed-by: Avihai Horon<avihaih@nvidia.com>
>>> Signed-off-by: Peter Xu<peterx@redhat.com>
>>
>>
>> Probably too late after all these r-b-s, but it would be simpler to review,
>> if renaming and reworking the documentation of the fields, moving them
>> to a separate structure, and combining two handler functions into one
>> would all be different patches.
> 
> Splitting the patch might involve some code being added and then quickly
> removed again, which I also want to avoid.
> 
> Considering the reviews already done on this series (and the challenge of
> finding hardware to test on..), I'd appreciate it if you're OK with
> landing this sooner and then reworking things on top, as long as the
> changes are cosmetic.  That said, I agree with your point about the
> documentation's use of "can" in the new API: it is indeed a bit
> ambiguous.  I'll see how best to integrate that; it may depend on whether
> there are other reasons for a full repost / retest.

Of course, I agree.

for block dirty bitmaps:

Acked-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 07/16] migration: Introduce stopcopy_bytes in save_query_pending()
  2026-04-21 20:21 ` [PATCH v2 07/16] migration: Introduce stopcopy_bytes in save_query_pending() Peter Xu
  2026-04-22 13:16   ` Juraj Marcin
@ 2026-04-23 15:05   ` Avihai Horon
  1 sibling, 0 replies; 41+ messages in thread
From: Avihai Horon @ 2026-04-23 15:05 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Daniel P . Berrangé, Fabiano Rosas, Prasad Pandit,
	Alex Williamson, Kirti Wankhede, Zhiyi Guo, Maciej S . Szmigiero,
	Juraj Marcin


On 4/21/2026 23:21, Peter Xu wrote:
> External email: Use caution opening links or attachments
>
>
> Allow modules to report data that can only be migrated after the VM is
> stopped.
>
> When this concept is introduced, we need to keep accounting the stopcopy
> size as part of pending_size, as before.
>
> However, when there is data that can only be migrated in the stopcopy
> phase, the old "pending_size" may never drop low enough to kick off the
> slow version of the query sync.
>
> That used to be almost guaranteed to happen, as none of the prior
> iterative modules had stopcopy-only data.  VFIO may change that by
> having some data that must be copied during the stop phase.
>
> So we need to make sure QEMU kicks off a synchronized version of the
> pending query once all precopy data is migrated.  This can be important
> for VFIO to keep making progress even if the downtime requirement cannot
> yet be satisfied.
>
> So far, this patch should introduce no functional change, as no module
> reports a stopcopy size yet.
>
> This paves the way for VFIO to properly report its pending data sizes,
> which will start to include stop-only data.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   include/migration/register.h |  7 ++++
>   migration/migration.c        | 65 ++++++++++++++++++++++++++++++------
>   migration/savevm.c           | 10 ++++--
>   migration/trace-events       |  2 +-
>   4 files changed, 70 insertions(+), 14 deletions(-)

Reviewed-by: Avihai Horon <avihaih@nvidia.com>
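The convergence concern in the quoted commit message — stop-copy-only data keeping the total pending size permanently above the threshold — can be sketched as follows (a hedged illustration with invented names; the actual decision logic in the patch may differ):

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide whether to run the slow, synchronized pending query.  With
 * stopcopy-only data (e.g. VFIO device state), the total pending size
 * may never drop below the threshold, so additionally trigger the
 * exact query once all precopy data has been migrated, to keep making
 * progress even when the downtime target is not yet satisfiable. */
static bool demo_should_query_exact(uint64_t precopy_pending,
                                    uint64_t stopcopy_pending,
                                    uint64_t threshold)
{
    uint64_t total = precopy_pending + stopcopy_pending;

    return total <= threshold || precopy_pending == 0;
}
```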



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls
  2026-04-21 20:21 ` [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls Peter Xu
                     ` (2 preceding siblings ...)
  2026-04-22  9:56   ` Cédric Le Goater
@ 2026-04-23 15:10   ` Avihai Horon
  2026-04-29 14:46   ` Avihai Horon
  4 siblings, 0 replies; 41+ messages in thread
From: Avihai Horon @ 2026-04-23 15:10 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Daniel P . Berrangé, Fabiano Rosas, Prasad Pandit,
	Alex Williamson, Kirti Wankhede, Zhiyi Guo, Maciej S . Szmigiero,
	Juraj Marcin


On 4/21/2026 23:21, Peter Xu wrote:
> Add two tracepoints, one for each of the precopy and stopcopy query
> ioctls.  While at it, add a warn_report_once() for each of them in case
> it fails.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   hw/vfio/migration.c  | 33 +++++++++++++++++++++++----------
>   hw/vfio/trace-events |  2 ++
>   2 files changed, 25 insertions(+), 10 deletions(-)

Reviewed-by: Avihai Horon <avihaih@nvidia.com>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 14/16] migration/qapi: Introduce system-wise "remaining" reports
  2026-04-21 20:21 ` [PATCH v2 14/16] migration/qapi: Introduce system-wise "remaining" reports Peter Xu
@ 2026-04-24  7:17   ` Markus Armbruster
  2026-04-24 15:15     ` Peter Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Markus Armbruster @ 2026-04-24  7:17 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Joao Martins, Markus Armbruster,
	Cédric Le Goater, Avihai Horon, Daniel P . Berrangé,
	Fabiano Rosas, Prasad Pandit, Alex Williamson, Kirti Wankhede,
	Zhiyi Guo, Maciej S . Szmigiero, Juraj Marcin,
	Dr. David Alan Gilbert

Peter Xu <peterx@redhat.com> writes:

> Currently, mgmt can only query for remaining RAM,

Remind me: how?

>                                                   not system-wise remaining
> data.  It was not a problem before, because for a very long time RAM was
> the only part that matters.
>
> After VFIO migrations landed upstream, it may not be true anymore
> especially considering that there can be GPU devices that contain GBs of
> device states.
>
> Add a new "remaining" field in query-migrate results, reflecting
> system-wise remaining data, which will include everything (e.g. VFIO).

"system-wise"?  Do you mean "system-wide"?  Maybe "total"?
>
> This information will be useful for mgmt to implement generic way of stall
> detection that covers all system resources.  Say, when system remaining
> data does not decrease anymore for a relatively long period of time, then
> it may mean that there is a challenge of converging, so mgmt can act based
> on how this value changes over time (especially if sampled after each
> migration iteration).
>
> Before this patch, "expected_downtime" almost played this role. For
> example, by monitoring "expected_downtime" at the beginning of each
> iteration can in most cases also reflect the progress of migration
> system-wise.  Said that, "expected_downtime" was always calculated based on
> a bandwidth value that can fluctuate a lot if avail-switchover-bandwidth is
> not used. This new "remaining" field will remove that part of uncertainty
> for mgmt.
>
> With the new field, HMP "info migrate" now reports this:
>
> (qemu) info migrate
> Status:                 active
> Time (ms):              total=12080, setup=14, exp_down=300
> Remaining:              1.36 GiB        <------------------- newline

"Newline" is the ASCII character '\n'.  I guess you mean "this is the new
line".

> RAM info:
>   Throughput (Mbps):    840.50
>   Sizes:                pagesize=4 KiB, total=4.02 GiB
>   Transfers:            transferred=1.18 GiB, remain=1.36 GiB
>     Channels:           precopy=1.18 GiB, multifd=0 B, postcopy=0 B
>     Page Types:         normal=307923, zero=388148
>   Page Rates (pps):     transfer=25660
>   Others:               dirty_syncs=1
>
> It should be the same value as RAM's remaining report when VFIO is not
> involved, and it should report more than that when VFIO is involved.

"RAM's remaining report" is the "remain=1.36 GiB" part, isn't it?

> Cc: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
> Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  qapi/migration.json            | 4 ++++
>  migration/migration-hmp-cmds.c | 5 +++++
>  migration/migration.c          | 7 +++++++
>  3 files changed, 16 insertions(+)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index e3ad3f0604..a6e24b5685 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -300,6 +300,9 @@
>  #     average memory load of the virtual CPU indirectly.  Note that
>  #     zero means guest doesn't dirty memory.  (Since 8.1)
>  #
> +# @remaining: amount of bytes remaining to be migrated system-wise,
> +#     includes both RAM and all devices (like VFIO).  (Since 11.1)
> +#
>  # Features:
>  #
>  # @unstable: Members @postcopy-latency, @postcopy-vcpu-latency,
> @@ -310,6 +313,7 @@
>  ##
>  { 'struct': 'MigrationInfo',
>    'data': {'*status': 'MigrationStatus', '*ram': 'MigrationRAMStats',
> +           '*remaining': 'uint64',

It's a byte count, so let's make it 'size'.

>             '*vfio': 'VfioStats',
>             '*xbzrle-cache': 'XBZRLECacheStats',
>             '*total-time': 'int',

[...]
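The stall-detection idea in the quoted commit message — watching whether the new "remaining" value keeps decreasing across iterations — might be implemented on the mgmt side roughly like this (a hedged sketch; the sampling policy and window size are assumptions, not anything prescribed by the patch):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag a possible convergence stall when the "remaining" byte count
 * (sampled e.g. once per migration iteration via query-migrate) has
 * not decreased anywhere within the last `window` samples. */
static bool demo_remaining_stalled(const uint64_t *samples, size_t n,
                                   size_t window)
{
    if (n < window) {
        return false;               /* not enough history yet */
    }
    for (size_t i = n - window + 1; i < n; i++) {
        if (samples[i] < samples[i - 1]) {
            return false;           /* still making progress */
        }
    }
    return true;
}
```

Unlike watching expected_downtime, this check does not depend on a fluctuating bandwidth estimate, which is the uncertainty the new field removes.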



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 15/16] migration/qapi: Update unit for avail-switchover-bandwidth
  2026-04-21 20:21 ` [PATCH v2 15/16] migration/qapi: Update unit for avail-switchover-bandwidth Peter Xu
@ 2026-04-24  7:18   ` Markus Armbruster
  0 siblings, 0 replies; 41+ messages in thread
From: Markus Armbruster @ 2026-04-24  7:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Joao Martins, Cédric Le Goater, Avihai Horon,
	Daniel P . Berrangé, Fabiano Rosas, Prasad Pandit,
	Alex Williamson, Kirti Wankhede, Zhiyi Guo, Maciej S . Szmigiero,
	Juraj Marcin

Peter Xu <peterx@redhat.com> writes:

> Add ", in bytes per second".  Unfortunately indentations need to be updated
> completely, but no change on the rest.
>
> Cc: Markus Armbruster <armbru@redhat.com>
> Suggested-by: Juraj Marcin <jmarcin@redhat.com>
> Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Acked-by: Markus Armbruster <armbru@redhat.com>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 03/16] migration/qapi: Rename MigrationStats to MigrationRAMStats
  2026-04-21 20:20 ` [PATCH v2 03/16] migration/qapi: Rename MigrationStats to MigrationRAMStats Peter Xu
@ 2026-04-24  9:03   ` Markus Armbruster
  0 siblings, 0 replies; 41+ messages in thread
From: Markus Armbruster @ 2026-04-24  9:03 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Joao Martins, Cédric Le Goater, Avihai Horon,
	Daniel P . Berrangé, Fabiano Rosas, Prasad Pandit,
	Alex Williamson, Kirti Wankhede, Zhiyi Guo, Maciej S . Szmigiero,
	Juraj Marcin, devel, Michal Privoznik

Peter Xu <peterx@redhat.com> writes:

> These stats are only about RAM; make the name accurate.  This paves the
> way for statistics covering all devices.
>
> Thanks to Markus, who pointed out that docs/devel/qapi-code-gen.rst has
> a section "Compatibility considerations" which states:
>
>     Since type names are not visible in the Client JSON Protocol, types
>     may be freely renamed.  Even certain refactorings are invisible, such
>     as splitting members from one type into a common base type.
>
> Hence this change is not an ABI violation according to the document.
>
> While at it, touch up the lines to read better, and correct the
> restriction on migration status being 'active' or 'completed': over time
> we grew too many new statuses that also report the "ram" section.
>
> Cc: Daniel P. Berrangé <berrange@redhat.com>
> Cc: devel@lists.libvirt.org
> Reviewed-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  docs/about/removed-features.rst |  2 +-
>  qapi/migration.json             | 10 +++++-----
>  migration/migration-stats.h     |  2 +-
>  3 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
> index e75db08410..626162022a 100644
> --- a/docs/about/removed-features.rst
> +++ b/docs/about/removed-features.rst
> @@ -699,7 +699,7 @@ was superseded by ``sections``.
>  ``query-migrate`` return value member ``skipped`` (removed in 9.1)
>  ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
>  
> -Member ``skipped`` of the ``MigrationStats`` struct hasn't been used
> +Member ``skipped`` of the ``MigrationRAMStats`` struct hasn't been used
>  for more than 10 years. Removed with no replacement.
>  
>  ``migrate`` command option ``inc`` (removed in 9.1)

docs/about/removed-features.rst and docs/about/deprecated.rst are meant
for consumers of external interfaces.  Since QAPI types are not relevant
there, I try to avoid mentioning them.

Your patch is just fine as is.

"Member ``skipped`` of the return value" would also be fine.

[...]

Reviewed-by: Markus Armbruster <armbru@redhat.com>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 01/16] qemu-iotests: Add query-migrate test for dirty-bitmap
  2026-04-22  8:08   ` Vladimir Sementsov-Ogievskiy
@ 2026-04-24 14:50     ` Peter Xu
  0 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2026-04-24 14:50 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Kevin Wolf, Hanna Czenczek
  Cc: qemu-devel, Joao Martins, Markus Armbruster,
	Cédric Le Goater, Avihai Horon, Daniel P . Berrangé,
	Fabiano Rosas, Prasad Pandit, Alex Williamson, Kirti Wankhede,
	Zhiyi Guo, Maciej S . Szmigiero, Juraj Marcin, Eric Blake

On Wed, Apr 22, 2026 at 11:08:56AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 21.04.26 23:20, Peter Xu wrote:
> > This helps identify a hang issue with a recent change in migration.
> > Add it to the test suite.
> > 
> > Cc: Vladimir Sementsov-Ogievskiy<vsementsov@yandex-team.ru>
> > Cc: Eric Blake<eblake@redhat.com>
> > Signed-off-by: Peter Xu<peterx@redhat.com>
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

Thanks, Vladimir.

I just noticed Kevin and Hanna were not copied on this patch.. I was
expecting git-publish and the auto scripts to have done it..

Kevin/Hanna, do you want me to pick up this patch together with this series
if I'm going to queue most of it?

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 14/16] migration/qapi: Introduce system-wise "remaining" reports
  2026-04-24  7:17   ` Markus Armbruster
@ 2026-04-24 15:15     ` Peter Xu
  2026-04-25  5:46       ` Markus Armbruster
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Xu @ 2026-04-24 15:15 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Joao Martins, Cédric Le Goater, Avihai Horon,
	Daniel P . Berrangé, Fabiano Rosas, Prasad Pandit,
	Alex Williamson, Kirti Wankhede, Zhiyi Guo, Maciej S . Szmigiero,
	Juraj Marcin, Dr. David Alan Gilbert

On Fri, Apr 24, 2026 at 09:17:21AM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > Currently, mgmt can only query for remaining RAM,
> 
> Remind me: how?

It is the same command, as mentioned in [1] below.  I'll enrich the commit
message here to explain.

> 
> >                                                   not system-wise remaining
> > data.  It was not a problem before, because for a very long time RAM was
> > the only part that matters.
> >
> > After VFIO migrations landed upstream, it may not be true anymore
> > especially considering that there can be GPU devices that contain GBs of
> > device states.
> >
> > Add a new "remaining" field in query-migrate results, reflecting
> > system-wise remaining data, which will include everything (e.g. VFIO).
> 
> "system-wise"?  Do you mean "system-wide"?  Maybe "total"?

Since "total" has been used elsewhere, I'll use "system-wide", hoping
that's easier to digest.

> >
> > This information will be useful for mgmt to implement generic way of stall
> > detection that covers all system resources.  Say, when system remaining
> > data does not decrease anymore for a relatively long period of time, then
> > it may mean that there is a challenge of converging, so mgmt can act based
> > on how this value changes over time (especially if sampled after each
> > migration iteration).
> >
> > Before this patch, "expected_downtime" almost played this role. For
> > example, by monitoring "expected_downtime" at the beginning of each
> > iteration can in most cases also reflect the progress of migration
> > system-wise.  Said that, "expected_downtime" was always calculated based on
> > a bandwidth value that can fluctuate a lot if avail-switchover-bandwidth is
> > not used. This new "remaining" field will remove that part of uncertainty
> > for mgmt.
> >
> > With the new field, HMP "info migrate" now reports this:
> >
> > (qemu) info migrate
> > Status:                 active
> > Time (ms):              total=12080, setup=14, exp_down=300
> > Remaining:              1.36 GiB        <------------------- newline
> 
> "Newline" is the ASCII character '\n'.  I guess you mean "this is the new
> line".

Yes.  I'll remove this "<----..." if it causes any confusion.

> 
> > RAM info:
> >   Throughput (Mbps):    840.50
> >   Sizes:                pagesize=4 KiB, total=4.02 GiB
> >   Transfers:            transferred=1.18 GiB, remain=1.36 GiB
> >     Channels:           precopy=1.18 GiB, multifd=0 B, postcopy=0 B
> >     Page Types:         normal=307923, zero=388148
> >   Page Rates (pps):     transfer=25660
> >   Others:               dirty_syncs=1
> >
> > It should be the same value as RAM's remaining report when VFIO is not
> > involved, and it should report more than that when VFIO is involved.
> 
> "RAM's remaining report" is the "remain=1.36 GiB" part, isn't it?

[1]

Correct.

> 
> > Cc: Markus Armbruster <armbru@redhat.com>
> > Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
> > Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  qapi/migration.json            | 4 ++++
> >  migration/migration-hmp-cmds.c | 5 +++++
> >  migration/migration.c          | 7 +++++++
> >  3 files changed, 16 insertions(+)
> >
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index e3ad3f0604..a6e24b5685 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -300,6 +300,9 @@
> >  #     average memory load of the virtual CPU indirectly.  Note that
> >  #     zero means guest doesn't dirty memory.  (Since 8.1)
> >  #
> > +# @remaining: amount of bytes remaining to be migrated system-wise,
> > +#     includes both RAM and all devices (like VFIO).  (Since 11.1)
> > +#
> >  # Features:
> >  #
> >  # @unstable: Members @postcopy-latency, @postcopy-vcpu-latency,
> > @@ -310,6 +313,7 @@
> >  ##
> >  { 'struct': 'MigrationInfo',
> >    'data': {'*status': 'MigrationStatus', '*ram': 'MigrationRAMStats',
> > +           '*remaining': 'uint64',
> 
> It's a byte count, so let's make it 'size'.

Will do.

Since this is the last functional change in the whole series, and the
update seems well under control (e.g. the QAPI generator emits the same C
code for both "size" and "uint64"), could I request an ACK on this one
with the short diff below, instead of reposting the whole series?

The diff is attached here (I'll also fix the commit messages, e.g. the
"system-wide" wording, if I don't repost):

diff --git a/qapi/migration.json b/qapi/migration.json
index b7518b29c6..c701ef1cf5 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -300,7 +300,7 @@
 #     average memory load of the virtual CPU indirectly.  Note that
 #     zero means guest doesn't dirty memory.  (Since 8.1)
 #
-# @remaining: amount of bytes remaining to be migrated system-wise,
+# @remaining: amount of bytes remaining to be migrated system-wide,
 #     includes both RAM and all devices (like VFIO).  (Since 11.1)
 #
 # Features:
@@ -313,7 +313,7 @@
 ##
 { 'struct': 'MigrationInfo',
   'data': {'*status': 'MigrationStatus', '*ram': 'MigrationRAMStats',
-           '*remaining': 'uint64',
+           '*remaining': 'size',
            '*vfio': 'VfioStats',
            '*xbzrle-cache': 'XBZRLECacheStats',
            '*total-time': 'int',

===8<====

The complete new version of the patch is here (I updated the commit
message in quite a few places):

https://gitlab.com/peterx/qemu/-/commit/86d973360890cecc564a4a5bcf9a01b9efde368a

Thanks,

-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 14/16] migration/qapi: Introduce system-wise "remaining" reports
  2026-04-24 15:15     ` Peter Xu
@ 2026-04-25  5:46       ` Markus Armbruster
  2026-04-28 15:26         ` Peter Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Markus Armbruster @ 2026-04-25  5:46 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Joao Martins, Cédric Le Goater, Avihai Horon,
	Daniel P . Berrangé, Fabiano Rosas, Prasad Pandit,
	Alex Williamson, Kirti Wankhede, Zhiyi Guo, Maciej S . Szmigiero,
	Juraj Marcin, Dr. David Alan Gilbert

Peter Xu <peterx@redhat.com> writes:

> On Fri, Apr 24, 2026 at 09:17:21AM +0200, Markus Armbruster wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > Currently, mgmt can only query for remaining RAM,
>> 
>> Remind me: how?
>
> It is the same command, as mentioned in [1] below.  I'll enrich the commit
> message here to explain.
>
>> 
>> >                                                   not system-wise remaining
>> > data.  It was not a problem before, because for a very long time RAM was
>> > the only part that matters.
>> >
>> > After VFIO migrations landed upstream, it may not be true anymore
>> > especially considering that there can be GPU devices that contain GBs of
>> > device states.
>> >
>> > Add a new "remaining" field in query-migrate results, reflecting
>> > system-wise remaining data, which will include everything (e.g. VFIO).
>> 
>> "system-wise"?  Do you mean "system-wide"?  Maybe "total"?
>
> Since "total" has been used elsewhere, I'll use "system-wide", hoping
> that's easier to digest.

Which "total" do you mean?  Perhaps MigrationStats member

    # @total: total amount of bytes involved in the migration process

What does this @total count?  RAM only?  If yes, the description is
misleading and needs fixing.  Separate patch, followup fine.

>> > This information will be useful for mgmt to implement generic way of stall
>> > detection that covers all system resources.  Say, when system remaining
>> > data does not decrease anymore for a relatively long period of time, then
>> > it may mean that there is a challenge of converging, so mgmt can act based
>> > on how this value changes over time (especially if sampled after each
>> > migration iteration).
>> >
>> > Before this patch, "expected_downtime" almost played this role. For
>> > example, by monitoring "expected_downtime" at the beginning of each
>> > iteration can in most cases also reflect the progress of migration
>> > system-wise.  Said that, "expected_downtime" was always calculated based on
>> > a bandwidth value that can fluctuate a lot if avail-switchover-bandwidth is
>> > not used. This new "remaining" field will remove that part of uncertainty
>> > for mgmt.
>> >
>> > With the new field, HMP "info migrate" now reports this:
>> >
>> > (qemu) info migrate
>> > Status:                 active
>> > Time (ms):              total=12080, setup=14, exp_down=300

"exp_down" isn't nice for humans.  I *guess* it's for "expected
downtime".  Could use "expected_downtime=300" instead.  Not this patch's
problem, of course.

>> > Remaining:              1.36 GiB        <------------------- newline
>> 
>> "Newline" is the ASCII character '\n'.  I guess you mean "this is the new
>> line".
>
> Yes.  I'll remove this "<----..." if it causes any confusion.

Annotating output like you did feels just fine, only the word you chose
makes it mildly confusing.  Perhaps

     Remaining:              1.36 GiB        <--- this is the new line

would be clearer.

>> > RAM info:
>> >   Throughput (Mbps):    840.50
>> >   Sizes:                pagesize=4 KiB, total=4.02 GiB
>> >   Transfers:            transferred=1.18 GiB, remain=1.36 GiB
>> >     Channels:           precopy=1.18 GiB, multifd=0 B, postcopy=0 B
>> >     Page Types:         normal=307923, zero=388148
>> >   Page Rates (pps):     transfer=25660
>> >   Others:               dirty_syncs=1
>> >
>> > It should be the same value as RAM's remaining report when VFIO is not
>> > involved, and it should report more than that when VFIO is involved.
>> 
>> "RAM's remaining report" is the "remain=1.36 GiB" part, isn't it?
>
> [1]
>
> Correct.

Thanks.  Could be a bit more explicit.  Up to you.

>> > Cc: Markus Armbruster <armbru@redhat.com>
>> > Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
>> > Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org>
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> >  qapi/migration.json            | 4 ++++
>> >  migration/migration-hmp-cmds.c | 5 +++++
>> >  migration/migration.c          | 7 +++++++
>> >  3 files changed, 16 insertions(+)
>> >
>> > diff --git a/qapi/migration.json b/qapi/migration.json
>> > index e3ad3f0604..a6e24b5685 100644
>> > --- a/qapi/migration.json
>> > +++ b/qapi/migration.json
>> > @@ -300,6 +300,9 @@
>> >  #     average memory load of the virtual CPU indirectly.  Note that
>> >  #     zero means guest doesn't dirty memory.  (Since 8.1)
>> >  #
>> > +# @remaining: amount of bytes remaining to be migrated system-wise,
>> > +#     includes both RAM and all devices (like VFIO).  (Since 11.1)
>> > +#
>> >  # Features:
>> >  #
>> >  # @unstable: Members @postcopy-latency, @postcopy-vcpu-latency,
>> > @@ -310,6 +313,7 @@
>> >  ##
>> >  { 'struct': 'MigrationInfo',
>> >    'data': {'*status': 'MigrationStatus', '*ram': 'MigrationRAMStats',
>> > +           '*remaining': 'uint64',
>> 
>> It's a byte count, so let's make it 'size'.
>
> Will do.
>
> Since this will be the last functional change so far on the whole series,
> and the update seems to be pretty under control (say, qapi schema.py
> generates same c code for both "size" and "uint64"),

Yes, 'size' is almost exactly the same as 'uint64'.  If I remember
correctly, the one difference is the use of visit_type_size() instead of
visit_type_uint64().  visit_type_size() recognizes additional syntax
with "human" visitors: qobject keyval, string input, and opts visitor.
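To illustrate the difference being described: with the human-oriented visitors, a 'size' value accepts unit suffixes that a plain 'uint64' would reject.  A toy parser of the same flavor (illustrative only — QEMU's real handling lives in qemu_strtosz() and visit_type_size(), with more suffixes, fractions, and error checking):

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy size-suffix handling: "4k" -> 4096, "2M" -> 2 MiB, etc.
 * No error handling; purely to show what 'size' buys over 'uint64'. */
static uint64_t demo_parse_size(const char *s)
{
    char *end;
    uint64_t v = strtoull(s, &end, 10);

    switch (*end) {
    case 'k': case 'K': return v << 10;
    case 'm': case 'M': return v << 20;
    case 'g': case 'G': return v << 30;
    default:            return v;
    }
}
```

For query-migrate output the distinction is invisible on the wire, which is why switching the member from 'uint64' to 'size' is safe here.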

>                                                      could I request an ACK
> on this one with a short diff below, instead of reposting the whole series?
>
> The diff attached here (I'll also fix the commit messages on
> e.g. system-wide wordings if I'll not repost):
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index b7518b29c6..c701ef1cf5 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -300,7 +300,7 @@
>  #     average memory load of the virtual CPU indirectly.  Note that
>  #     zero means guest doesn't dirty memory.  (Since 8.1)
>  #
> -# @remaining: amount of bytes remaining to be migrated system-wise,
> +# @remaining: amount of bytes remaining to be migrated system-wide,
>  #     includes both RAM and all devices (like VFIO).  (Since 11.1)
>  #
>  # Features:
> @@ -313,7 +313,7 @@
>  ##
>  { 'struct': 'MigrationInfo',
>    'data': {'*status': 'MigrationStatus', '*ram': 'MigrationRAMStats',
> -           '*remaining': 'uint64',
> +           '*remaining': 'size',
>             '*vfio': 'VfioStats',
>             '*xbzrle-cache': 'XBZRLECacheStats',
>             '*total-time': 'int',
>
> ===8<====
>
> The complete new version of patch is here (I updated quite a few places on
> the commit message):
>
> https://gitlab.com/peterx/qemu/-/commit/86d973360890cecc564a4a5bcf9a01b9efde368a
>
> Thanks,

I read the commit message.  No surprises except

    It should be the same value as RAM's remaining report when VFIO is not
    involved, and it should report more than that when VFIO is involved.

    One note is that this field will be an estimate and may not be sampled the
    exact same time versus the RAM remaining section.  So it may report
    slightly different values even if only RAM is involved.  The difference
    shouldn't matter though to mgmt to make correct decisions.

The second paragraph is new.  The first paragraph says they "should be
the same", the second that they "may [be] slightly different".
Suboptimal.

Here's my try:

    It should be approximately the same value ...

    Only approximately, because this field will be ...

It's just a commit message, though.  Up to you.

QAPI schema
Acked-by: Markus Armbruster <armbru@redhat.com>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 14/16] migration/qapi: Introduce system-wise "remaining" reports
  2026-04-25  5:46       ` Markus Armbruster
@ 2026-04-28 15:26         ` Peter Xu
  2026-04-28 19:02           ` Markus Armbruster
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Xu @ 2026-04-28 15:26 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Joao Martins, Cédric Le Goater, Avihai Horon,
	Daniel P . Berrangé, Fabiano Rosas, Prasad Pandit,
	Alex Williamson, Kirti Wankhede, Zhiyi Guo, Maciej S . Szmigiero,
	Juraj Marcin, Dr. David Alan Gilbert

On Sat, Apr 25, 2026 at 07:46:45AM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, Apr 24, 2026 at 09:17:21AM +0200, Markus Armbruster wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > Currently, mgmt can only query for remaining RAM,
> >> 
> >> Remind me: how?
> >
> > It is the same command, as mentioned in [1] below.  I'll enrich the commit
> > message here to explain.
> >
> >> 
> >> >                                                   not system-wise remaining
> >> > data.  It was not a problem before, because for a very long time RAM was
> >> > the only part that matters.
> >> >
> >> > After VFIO migrations landed upstream, it may not be true anymore
> >> > especially considering that there can be GPU devices that contain GBs of
> >> > device states.
> >> >
> >> > Add a new "remaining" field in query-migrate results, reflecting
> >> > system-wise remaining data, which will include everything (e.g. VFIO).
> >> 
> >> "system-wise"?  Do you mean "system-wide"?  Maybe "total"?
> >
> > Since "total" has been used elsewhere, I'll use "system-wide", hoping
> > that's easier to digest.
> 
> Which "total" do you mean?  Perhaps MigrationStats member
> 
>     # @total: total amount of bytes involved in the migration process
> 
> What does this @total count?  RAM only?  If yes, the description is
> misleading and needs fixing.  Separate patch, followup fine.

I'll follow up.

> 
> >> > This information will be useful for mgmt to implement generic way of stall
> >> > detection that covers all system resources.  Say, when system remaining
> >> > data does not decrease anymore for a relatively long period of time, then
> >> > it may mean that there is a challenge of converging, so mgmt can act based
> >> > on how this value changes over time (especially if sampled after each
> >> > migration iteration).
> >> >
> >> > Before this patch, "expected_downtime" almost played this role. For
> >> > example, by monitoring "expected_downtime" at the beginning of each
> >> > iteration can in most cases also reflect the progress of migration
> >> > system-wise.  Said that, "expected_downtime" was always calculated based on
> >> > a bandwidth value that can fluctuate a lot if avail-switchover-bandwidth is
> >> > not used. This new "remaining" field will remove that part of uncertainty
> >> > for mgmt.
> >> >
> >> > With the new field, HMP "info migrate" now reports this:
> >> >
> >> > (qemu) info migrate
> >> > Status:                 active
> >> > Time (ms):              total=12080, setup=14, exp_down=300
> 
> "exp_down" isn't nice for humans.  I *guess* it's for "expected
> downtime".  Could use "expected_downtime=300" instead.  Not this patch's
> problem, of course.

Will follow up too.

> 
> >> > Remaining:              1.36 GiB        <------------------- newline
> >> 
> >> "Newline" is ASCII character '\n'.  I guess you mean "this is the new
> >> line".
> >
> > Yes.  I'll remove this "<----..." if it causes any confusion.
> 
> Annotating output like you did feels just fine, only the word you chose
> makes it mildly confusing.  Perhaps
> 
>      Remaining:              1.36 GiB        <--- this is the new line
> 
> would be clearer.

Sure.

> 
> >> > RAM info:
> >> >   Throughput (Mbps):    840.50
> >> >   Sizes:                pagesize=4 KiB, total=4.02 GiB
> >> >   Transfers:            transferred=1.18 GiB, remain=1.36 GiB
> >> >     Channels:           precopy=1.18 GiB, multifd=0 B, postcopy=0 B
> >> >     Page Types:         normal=307923, zero=388148
> >> >   Page Rates (pps):     transfer=25660
> >> >   Others:               dirty_syncs=1
> >> >
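As an aside, the "GiB" figures in the HMP output above come from binary-prefix formatting; an illustrative Python equivalent (QEMU uses its own C helper for this) could look like:

```python
def fmt_size(n):
    """Format a byte count with binary prefixes, e.g. 'Remaining: 1.36 GiB'
    (illustrative only, not QEMU's implementation)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(n)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            break
        value /= 1024
    if unit == "B":
        return f"{int(value)} B"
    return f"{value:.2f} {unit}"
```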
> >> > It should be the same value as RAM's remaining report when VFIO is not
> >> > involved, and it should report more than that when VFIO is involved.
> >> 
> >> "RAM's remaining report" is the "remain=1.36 GiB" part, isn't it?
> >
> > [1]
> >
> > Correct.
> 
> Thanks.  Could be a bit more explicit.  Up to you.

I'll attach a new version at the end.

> 
> >> > Cc: Markus Armbruster <armbru@redhat.com>
> >> > Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
> >> > Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org>
> >> > Signed-off-by: Peter Xu <peterx@redhat.com>
> >> > ---
> >> >  qapi/migration.json            | 4 ++++
> >> >  migration/migration-hmp-cmds.c | 5 +++++
> >> >  migration/migration.c          | 7 +++++++
> >> >  3 files changed, 16 insertions(+)
> >> >
> >> > diff --git a/qapi/migration.json b/qapi/migration.json
> >> > index e3ad3f0604..a6e24b5685 100644
> >> > --- a/qapi/migration.json
> >> > +++ b/qapi/migration.json
> >> > @@ -300,6 +300,9 @@
> >> >  #     average memory load of the virtual CPU indirectly.  Note that
> >> >  #     zero means guest doesn't dirty memory.  (Since 8.1)
> >> >  #
> >> > +# @remaining: amount of bytes remaining to be migrated system-wise,
> >> > +#     includes both RAM and all devices (like VFIO).  (Since 11.1)
> >> > +#
> >> >  # Features:
> >> >  #
> >> >  # @unstable: Members @postcopy-latency, @postcopy-vcpu-latency,
> >> > @@ -310,6 +313,7 @@
> >> >  ##
> >> >  { 'struct': 'MigrationInfo',
> >> >    'data': {'*status': 'MigrationStatus', '*ram': 'MigrationRAMStats',
> >> > +           '*remaining': 'uint64',
> >> 
> >> It's a byte count, so let's make it 'size'.
> >
> > Will do.
> >
> > Since this will be the last functional change so far on the whole series,
> > and the update seems to be pretty under control (say, qapi schema.py
> > generates same c code for both "size" and "uint64"),
> 
> Yes, 'size' is almost exactly the same as 'uint64'.  If I remember
> correctly, the one difference is the use of visit_type_size() instead of
> visit_type_uint64().  visit_type_size() recognizes additional syntax
> with "human" visitors: qobject keyval, string input, and opts visitor.
> 
> >                                                      could I request an ACK
> > on this one with a short diff below, instead of reposting the whole series?
> >
> > The diff attached here (I'll also fix the commit messages on
> > e.g. system-wide wordings if I'll not repost):
> >
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index b7518b29c6..c701ef1cf5 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -300,7 +300,7 @@
> >  #     average memory load of the virtual CPU indirectly.  Note that
> >  #     zero means guest doesn't dirty memory.  (Since 8.1)
> >  #
> > -# @remaining: amount of bytes remaining to be migrated system-wise,
> > +# @remaining: amount of bytes remaining to be migrated system-wide,
> >  #     includes both RAM and all devices (like VFIO).  (Since 11.1)
> >  #
> >  # Features:
> > @@ -313,7 +313,7 @@
> >  ##
> >  { 'struct': 'MigrationInfo',
> >    'data': {'*status': 'MigrationStatus', '*ram': 'MigrationRAMStats',
> > -           '*remaining': 'uint64',
> > +           '*remaining': 'size',
> >             '*vfio': 'VfioStats',
> >             '*xbzrle-cache': 'XBZRLECacheStats',
> >             '*total-time': 'int',
> >
> > ===8<====
> >
> > The complete new version of patch is here (I updated quite a few places on
> > the commit message):
> >
> > https://gitlab.com/peterx/qemu/-/commit/86d973360890cecc564a4a5bcf9a01b9efde368a
> >
> > Thanks,
> 
> I read the commit message.  No surprises except
> 
>     It should be the same value as RAM's remaining report when VFIO is not
>     involved, and it should report more than that when VFIO is involved.
> 
>     One note is that this field will be an estimate and may not be sampled the
>     exact same time versus the RAM remaining section.  So it may report
>     slightly different values even if only RAM is involved.  The difference
>     shouldn't matter though to mgmt to make correct decisions.
> 
> The second paragraph is new.  The first paragraph says they "should be
> the same", the second that they "may [be] slightly different".
> Suboptimal.

Yes, it is misleading, I overlooked that. :(

> 
> Here's my try:
> 
>     It should be approximately the same value ...
> 
>     Only approximately, because this field will be ...
> 
> It's just a commit message, though.  Up to you.

New version:

    When VFIO is not involved, the value reported in the new field should
    be approximately the same as reported in the "remaining" field of the
    RAM section.  It is only an approximate value because the system-wide
    remaining data is a cached value, which gets frequently updated by
    migration core.  OTOH, the RAM's remaining data is accurate.
    
    When VFIO is involved, the new value reported should normally be
    larger, because it will include the size of VFIO remaining data too.
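As a mgmt-side illustration of the stall detection this field enables (everything below is hypothetical, not part of the series), a client could sample "remaining" from query-migrate once per iteration and flag a stall when it stops shrinking:

```python
# Hypothetical mgmt-side sketch: 'samples' holds successive values of
# the system-wide "remaining" field (bytes) from query-migrate.
def detect_stall(samples, min_progress=1024 * 1024):
    """Return True when no consecutive sample pair made at least
    min_progress bytes of progress, i.e. migration may not converge."""
    return all(prev - cur < min_progress
               for prev, cur in zip(samples, samples[1:]))
```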

> 
> QAPI schema
> Acked-by: Markus Armbruster <armbru@redhat.com>

Thanks, I will wait at least 1-2 days in case it still needs updates.

-- 
Peter Xu




* Re: [PATCH v2 14/16] migration/qapi: Introduce system-wise "remaining" reports
  2026-04-28 15:26         ` Peter Xu
@ 2026-04-28 19:02           ` Markus Armbruster
  0 siblings, 0 replies; 41+ messages in thread
From: Markus Armbruster @ 2026-04-28 19:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: Markus Armbruster, qemu-devel, Joao Martins,
	Cédric Le Goater, Avihai Horon, Daniel P . Berrangé,
	Fabiano Rosas, Prasad Pandit, Alex Williamson, Kirti Wankhede,
	Zhiyi Guo, Maciej S . Szmigiero, Juraj Marcin,
	Dr. David Alan Gilbert

Peter Xu <peterx@redhat.com> writes:

> On Sat, Apr 25, 2026 at 07:46:45AM +0200, Markus Armbruster wrote:
>> Peter Xu <peterx@redhat.com> writes:

[...]

>> > The complete new version of patch is here (I updated quite a few places on
>> > the commit message):
>> >
>> > https://gitlab.com/peterx/qemu/-/commit/86d973360890cecc564a4a5bcf9a01b9efde368a
>> >
>> > Thanks,
>> 
>> I read the commit message.  No surprises except
>> 
>>     It should be the same value as RAM's remaining report when VFIO is not
>>     involved, and it should report more than that when VFIO is involved.
>> 
>>     One note is that this field will be an estimate and may not be sampled the
>>     exact same time versus the RAM remaining section.  So it may report
>>     slightly different values even if only RAM is involved.  The difference
>>     shouldn't matter though to mgmt to make correct decisions.
>> 
>> The second paragraph is new.  The first paragraph says they "should be
>> the same", the second that they "may [be] slightly different".
>> Suboptimal.
>
> Yes, it is misleading, I overlooked that. :(
>
>> 
>> Here's my try:
>> 
>>     It should be approximately the same value ...
>> 
>>     Only approximately, because this field will be ...
>> 
>> It's just a commit message, though.  Up to you.
>
> New version:
>
>     When VFIO is not involved, the value reported in the new field should
>     be approximately the same as reported in the "remaining" field of the
>     RAM section.  It is only an approximate value because the system-wide
>     remaining data is a cached value, which gets frequently updated by
>     migration core.  OTOH, the RAM's remaining data is accurate.
>     
>     When VFIO is involved, the new value reported should normally be
>     larger, because it will include the size of VFIO remaining data too.

Looks good to me, thanks!

>> QAPI schema
>> Acked-by: Markus Armbruster <armbru@redhat.com>
>
> Thanks, I will at least wait for 1-2 days if it still needs update.




* Re: [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls
  2026-04-21 20:21 ` [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls Peter Xu
                     ` (3 preceding siblings ...)
  2026-04-23 15:10   ` Avihai Horon
@ 2026-04-29 14:46   ` Avihai Horon
  2026-04-29 15:43     ` Peter Xu
  4 siblings, 1 reply; 41+ messages in thread
From: Avihai Horon @ 2026-04-29 14:46 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Daniel P . Berrangé, Fabiano Rosas, Prasad Pandit,
	Alex Williamson, Kirti Wankhede, Zhiyi Guo, Maciej S . Szmigiero,
	Juraj Marcin


On 4/21/2026 23:21, Peter Xu wrote:
>
> Add two tracepoints for both precopy and stopcopy query ioctls.  When at
> it, add one warn_report_once() for each of them when it fails.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   hw/vfio/migration.c  | 33 +++++++++++++++++++++++----------
>   hw/vfio/trace-events |  2 ++
>   2 files changed, 25 insertions(+), 10 deletions(-)
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index e6e6a0d53d..04d9f94edb 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -329,6 +329,7 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
>       struct vfio_device_feature_mig_data_size *mig_data_size =
>           (struct vfio_device_feature_mig_data_size *)feature->data;
>       VFIOMigration *migration = vbasedev->migration;
> +    int ret;
>
>       feature->argsz = sizeof(buf);
>       feature->flags =
> @@ -340,12 +341,18 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
>            * is reported so downtime limit won't be violated.
>            */
>           migration->stopcopy_size = VFIO_MIG_STOP_COPY_SIZE;
> -        return -errno;
> +        ret = -errno;
> +        warn_report_once("VFIO device %s ioctl(VFIO_DEVICE_FEATURE) on "
> +                         "VFIO_DEVICE_FEATURE_MIG_DATA_SIZE failed (%d)",
> +                         vbasedev->name, ret);
> +    } else {
> +        migration->stopcopy_size = mig_data_size->stop_copy_length;
> +        ret = 0;
>       }
>
> -    migration->stopcopy_size = mig_data_size->stop_copy_length;
> +    trace_vfio_query_stop_copy_size(migration->stopcopy_size, ret);
>
> -    return 0;
> +    return ret;
>   }
>
>   static int vfio_query_precopy_size(VFIOMigration *migration)
> @@ -353,18 +360,24 @@ static int vfio_query_precopy_size(VFIOMigration *migration)
>       struct vfio_precopy_info precopy = {
>           .argsz = sizeof(precopy),
>       };
> -
> -    migration->precopy_init_size = 0;
> -    migration->precopy_dirty_size = 0;
> +    int ret;
>
>       if (ioctl(migration->data_fd, VFIO_MIG_GET_PRECOPY_INFO, &precopy)) {
> -        return -errno;
> +        migration->precopy_init_size = 0;
> +        migration->precopy_dirty_size = 0;
> +        ret = -errno;
> +        warn_report_once("VFIO device %s ioctl(VFIO_MIG_GET_PRECOPY_INFO) "
> +                         "failed (%d)", migration->vbasedev->name, ret);
> +    } else {
> +        migration->precopy_init_size = precopy.initial_bytes;
> +        migration->precopy_dirty_size = precopy.dirty_bytes;
> +        ret = 0;
>       }
>
> -    migration->precopy_init_size = precopy.initial_bytes;
> -    migration->precopy_dirty_size = precopy.dirty_bytes;
> +    trace_vfio_query_precopy_size(migration->precopy_init_size,
> +                                  migration->precopy_dirty_size, ret);
>
> -    return 0;
> +    return ret;
>   }
>
>   /* Returns the size of saved data on success and -errno on error */
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 287df0b8cb..854a7e4b19 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -176,6 +176,8 @@ vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer
>   vfio_state_pending(const char *name, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size, bool exact) " (%s) stopcopy size %"PRIu64" precopy initial size %"PRIu64" precopy dirty size %"PRIu64 " exact %d"
>   vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
>   vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
> +vfio_query_stop_copy_size(uint64_t size, int ret) "stopcopy size %"PRIu64" ret %d"
> +vfio_query_precopy_size(uint64_t init_size, uint64_t dirty_size, int ret) "init %"PRIu64" dirty %"PRIu64" ret %d"

Ah sorry, I just noticed this now while doing some other work -- if you 
respin the series, could you add the device name to both traces? And 
while at it keep the traces alphabetically sorted?

Thanks.

>
>   #iommufd.c
>
> --
> 2.53.0
>



* Re: [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls
  2026-04-29 14:46   ` Avihai Horon
@ 2026-04-29 15:43     ` Peter Xu
  2026-04-29 18:03       ` Avihai Horon
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Xu @ 2026-04-29 15:43 UTC (permalink / raw)
  To: Avihai Horon
  Cc: qemu-devel, Joao Martins, Markus Armbruster,
	Cédric Le Goater, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Maciej S . Szmigiero, Juraj Marcin

On Wed, Apr 29, 2026 at 05:46:19PM +0300, Avihai Horon wrote:
> Ah sorry, I just noticed this now while doing some other work -- if you
> respin the series, could you add the device name to both traces? And while
> at it keep the traces alphabetically sorted?

Could you help check if below fixup is suitable to be squashed?

Thanks,

===8<===
From 930723da46e16c2ed5405916a7e10d4f560e22fa Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Wed, 29 Apr 2026 11:41:49 -0400
Subject: [PATCH] fixup! vfio/migration: Add tracepoints for precopy/stopcopy
 query ioctls

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/vfio/migration.c  | 6 ++++--
 hw/vfio/trace-events | 4 ++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 04d9f94edb..150e28656e 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -350,7 +350,8 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
         ret = 0;
     }
 
-    trace_vfio_query_stop_copy_size(migration->stopcopy_size, ret);
+    trace_vfio_query_stop_copy_size(vbasedev->name,
+                                    migration->stopcopy_size, ret);
 
     return ret;
 }
@@ -374,7 +375,8 @@ static int vfio_query_precopy_size(VFIOMigration *migration)
         ret = 0;
     }
 
-    trace_vfio_query_precopy_size(migration->precopy_init_size,
+    trace_vfio_query_precopy_size(migration->vbasedev->name,
+                                  migration->precopy_init_size,
                                   migration->precopy_dirty_size, ret);
 
     return ret;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 854a7e4b19..ab27ff5ea2 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -162,6 +162,8 @@ vfio_migration_realize(const char *name) " (%s)"
 vfio_migration_set_device_state(const char *name, const char *state) " (%s) state %s"
 vfio_migration_set_state(const char *name, const char *new_state, const char *recover_state) " (%s) new state %s, recover state %s"
 vfio_migration_state_notifier(const char *name, int state) " (%s) state %d"
+vfio_query_stop_copy_size(const char *name, uint64_t size, int ret) " (%s) stopcopy size %"PRIu64" ret %d"
+vfio_query_precopy_size(const char *name, uint64_t init_size, uint64_t dirty_size, int ret) " (%s) init %"PRIu64" dirty %"PRIu64" ret %d"
 vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
 vfio_save_block_precopy_empty_hit(const char *name) " (%s)"
 vfio_save_cleanup(const char *name) " (%s)"
@@ -176,8 +178,6 @@ vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer
 vfio_state_pending(const char *name, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size, bool exact) " (%s) stopcopy size %"PRIu64" precopy initial size %"PRIu64" precopy dirty size %"PRIu64 " exact %d"
 vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
 vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
-vfio_query_stop_copy_size(uint64_t size, int ret) "stopcopy size %"PRIu64" ret %d"
-vfio_query_precopy_size(uint64_t init_size, uint64_t dirty_size, int ret) "init %"PRIu64" dirty %"PRIu64" ret %d"
 
 #iommufd.c
 
-- 
2.53.0


-- 
Peter Xu




* Re: [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls
  2026-04-29 15:43     ` Peter Xu
@ 2026-04-29 18:03       ` Avihai Horon
  0 siblings, 0 replies; 41+ messages in thread
From: Avihai Horon @ 2026-04-29 18:03 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Joao Martins, Markus Armbruster,
	Cédric Le Goater, Daniel P. Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Maciej S . Szmigiero, Juraj Marcin


On 4/29/2026 18:43, Peter Xu wrote:
>
> On Wed, Apr 29, 2026 at 05:46:19PM +0300, Avihai Horon wrote:
>> Ah sorry, I just noticed this now while doing some other work -- if you
>> respin the series, could you add the device name to both traces? And while
>> at it keep the traces alphabetically sorted?
> Could you help check if below fixup is suitable to be squashed?

Yes, looks good, thanks!

>
> Thanks,
>
> ===8<===
>  From 930723da46e16c2ed5405916a7e10d4f560e22fa Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Wed, 29 Apr 2026 11:41:49 -0400
> Subject: [PATCH] fixup! vfio/migration: Add tracepoints for precopy/stopcopy
>   query ioctls
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   hw/vfio/migration.c  | 6 ++++--
>   hw/vfio/trace-events | 4 ++--
>   2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 04d9f94edb..150e28656e 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -350,7 +350,8 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
>           ret = 0;
>       }
>
> -    trace_vfio_query_stop_copy_size(migration->stopcopy_size, ret);
> +    trace_vfio_query_stop_copy_size(vbasedev->name,
> +                                    migration->stopcopy_size, ret);
>
>       return ret;
>   }
> @@ -374,7 +375,8 @@ static int vfio_query_precopy_size(VFIOMigration *migration)
>           ret = 0;
>       }
>
> -    trace_vfio_query_precopy_size(migration->precopy_init_size,
> +    trace_vfio_query_precopy_size(migration->vbasedev->name,
> +                                  migration->precopy_init_size,
>                                     migration->precopy_dirty_size, ret);
>
>       return ret;
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 854a7e4b19..ab27ff5ea2 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -162,6 +162,8 @@ vfio_migration_realize(const char *name) " (%s)"
>   vfio_migration_set_device_state(const char *name, const char *state) " (%s) state %s"
>   vfio_migration_set_state(const char *name, const char *new_state, const char *recover_state) " (%s) new state %s, recover state %s"
>   vfio_migration_state_notifier(const char *name, int state) " (%s) state %d"
> +vfio_query_stop_copy_size(const char *name, uint64_t size, int ret) " (%s) stopcopy size %"PRIu64" ret %d"
> +vfio_query_precopy_size(const char *name, uint64_t init_size, uint64_t dirty_size, int ret) " (%s) init %"PRIu64" dirty %"PRIu64" ret %d"
>   vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
>   vfio_save_block_precopy_empty_hit(const char *name) " (%s)"
>   vfio_save_cleanup(const char *name) " (%s)"
> @@ -176,8 +178,6 @@ vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer
>   vfio_state_pending(const char *name, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size, bool exact) " (%s) stopcopy size %"PRIu64" precopy initial size %"PRIu64" precopy dirty size %"PRIu64 " exact %d"
>   vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
>   vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
> -vfio_query_stop_copy_size(uint64_t size, int ret) "stopcopy size %"PRIu64" ret %d"
> -vfio_query_precopy_size(uint64_t init_size, uint64_t dirty_size, int ret) "init %"PRIu64" dirty %"PRIu64" ret %d"
>
>   #iommufd.c
>
> --
> 2.53.0
>
>
> --
> Peter Xu
>



* Re: [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports
  2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
                   ` (15 preceding siblings ...)
  2026-04-21 20:21 ` [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls Peter Xu
@ 2026-04-29 19:52 ` Peter Xu
  16 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2026-04-29 19:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Maciej S . Szmigiero, Juraj Marcin

On Tue, Apr 21, 2026 at 04:20:54PM -0400, Peter Xu wrote:
> Peter Xu (16):
>   qemu-iotests: Add query-migrate test for dirty-bitmap
>   migration: Fix low possibility downtime violation
>   migration/qapi: Rename MigrationStats to MigrationRAMStats
>   vfio/migration: Cache stop size in VFIOMigration
>   migration/treewide: Merge @state_pending_{exact|estimate} APIs
>   migration: Use the new save_query_pending() API directly
>   migration: Introduce stopcopy_bytes in save_query_pending()
>   vfio/migration: Fix incorrect reporting for VFIO pending data
>   migration: Move iteration counter out of RAM
>   migration: Introduce a helper to return switchover bw estimate
>   migration: Calculate expected downtime on demand
>   migration: Fix calculation of expected_downtime to take VFIO info
>   migration: Remember total dirty bytes in mig_stats
>   migration/qapi: Introduce system-wise "remaining" reports
>   migration/qapi: Update unit for avail-switchover-bandwidth
>   vfio/migration: Add tracepoints for precopy/stopcopy query ioctls

I queued patches 2-16, with slight amendments on some patches per reviewers' comments.

-- 
Peter Xu




end of thread, other threads:[~2026-04-29 19:52 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-21 20:20 [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
2026-04-21 20:20 ` [PATCH v2 01/16] qemu-iotests: Add query-migrate test for dirty-bitmap Peter Xu
2026-04-22  8:08   ` Vladimir Sementsov-Ogievskiy
2026-04-24 14:50     ` Peter Xu
2026-04-21 20:20 ` [PATCH v2 02/16] migration: Fix low possibility downtime violation Peter Xu
2026-04-21 20:20 ` [PATCH v2 03/16] migration/qapi: Rename MigrationStats to MigrationRAMStats Peter Xu
2026-04-24  9:03   ` Markus Armbruster
2026-04-21 20:20 ` [PATCH v2 04/16] vfio/migration: Cache stop size in VFIOMigration Peter Xu
2026-04-21 20:20 ` [PATCH v2 05/16] migration/treewide: Merge @state_pending_{exact|estimate} APIs Peter Xu
2026-04-22  8:23   ` Vladimir Sementsov-Ogievskiy
2026-04-22  8:29   ` Vladimir Sementsov-Ogievskiy
2026-04-22 15:44     ` Peter Xu
2026-04-22 17:06       ` Vladimir Sementsov-Ogievskiy
2026-04-21 20:21 ` [PATCH v2 06/16] migration: Use the new save_query_pending() API directly Peter Xu
2026-04-21 20:21 ` [PATCH v2 07/16] migration: Introduce stopcopy_bytes in save_query_pending() Peter Xu
2026-04-22 13:16   ` Juraj Marcin
2026-04-23 15:05   ` Avihai Horon
2026-04-21 20:21 ` [PATCH v2 08/16] vfio/migration: Fix incorrect reporting for VFIO pending data Peter Xu
2026-04-21 20:21 ` [PATCH v2 09/16] migration: Move iteration counter out of RAM Peter Xu
2026-04-21 20:21 ` [PATCH v2 10/16] migration: Introduce a helper to return switchover bw estimate Peter Xu
2026-04-21 20:21 ` [PATCH v2 11/16] migration: Calculate expected downtime on demand Peter Xu
2026-04-21 20:21 ` [PATCH v2 12/16] migration: Fix calculation of expected_downtime to take VFIO info Peter Xu
2026-04-21 20:21 ` [PATCH v2 13/16] migration: Remember total dirty bytes in mig_stats Peter Xu
2026-04-22 13:18   ` Juraj Marcin
2026-04-21 20:21 ` [PATCH v2 14/16] migration/qapi: Introduce system-wise "remaining" reports Peter Xu
2026-04-24  7:17   ` Markus Armbruster
2026-04-24 15:15     ` Peter Xu
2026-04-25  5:46       ` Markus Armbruster
2026-04-28 15:26         ` Peter Xu
2026-04-28 19:02           ` Markus Armbruster
2026-04-21 20:21 ` [PATCH v2 15/16] migration/qapi: Update unit for avail-switchover-bandwidth Peter Xu
2026-04-24  7:18   ` Markus Armbruster
2026-04-21 20:21 ` [PATCH v2 16/16] vfio/migration: Add tracepoints for precopy/stopcopy query ioctls Peter Xu
2026-04-22  7:51   ` Cédric Le Goater
2026-04-22  7:52   ` Cédric Le Goater
2026-04-22  9:56   ` Cédric Le Goater
2026-04-23 15:10   ` Avihai Horon
2026-04-29 14:46   ` Avihai Horon
2026-04-29 15:43     ` Peter Xu
2026-04-29 18:03       ` Avihai Horon
2026-04-29 19:52 ` [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports Peter Xu
