* [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports
@ 2026-04-21 20:20 Peter Xu
From: Peter Xu @ 2026-04-21 20:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Joao Martins, Markus Armbruster, Cédric Le Goater,
	Avihai Horon, Daniel P . Berrangé, Fabiano Rosas,
	Prasad Pandit, Alex Williamson, Kirti Wankhede, Zhiyi Guo,
	Peter Xu, Maciej S . Szmigiero, Juraj Marcin

CI:  https://gitlab.com/peterx/qemu/-/pipelines/2469074018
rfc: https://lore.kernel.org/r/20260319231302.123135-1-peterx@redhat.com
v1:  https://lore.kernel.org/r/20260408165559.157108-1-peterx@redhat.com

v2:
- Added tags
- Patch 4
  - Fix and rework doc for @save_query_pending [Juraj]
  - Trace "exact" in trace_vfio_state_pending() [Avihai]
  - Avoid mentioning "pre-copy" in vfio.rst doc for query [Avihai]
- Patch 12
  - English errors [Fabiano]
- Patch 13
  - Remove " (bytes)" in HMP line [Fabiano]
- Added patch "qemu-iotests: Add query-migrate test for dirty-bitmap"
  - This covers a bug that I found when testing v1
- Added patch "vfio/migration: Add tracepoints for precopy/stopcopy query
  ioctls" to be able to dump the raw results from the two VFIO ioctls
- Replace patch "migration: Make qemu_savevm_query_pending() available
  anytime" with patch "migration: Remember total dirty bytes in mig_stats"
  - I fell back to the "cache the total dirty bytes" idea on this one to
    avoid the complication of save_query_pending() being invoked anywhere.

Overview
========

VFIO migration was merged quite a while ago, but we still see things off
here and there.  This series tries to address some of them, though only
based on my limited understanding.

Two major issues I wanted to resolve:

(1) VFIO reports state_pending_{exact|estimate}() differently

It reports stop-only sizes in exact() only (which therefore includes both
precopy and stopcopy data), while estimate() reports precopy data only.
This violates the API.  It was done this way solely to trigger a proper
sync on the VFIO ioctls, but that was a workaround.  This series should fix
it by introducing a stopcopy size reporting facility for vmstate handlers.
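
As a rough illustration of the contract described in (1), here is a minimal
sketch in C.  All names below (PendingSketch, DeviceSketch,
sketch_query_pending) are hypothetical and are not taken from this series
or from QEMU.  The only point is that the estimate and exact paths report
the same two categories and differ only in how fresh the numbers are.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for illustration; not the actual QEMU structure. */
typedef struct {
    uint64_t precopy_bytes;   /* can be sent while the VM is running */
    uint64_t stopcopy_bytes;  /* can only be sent after the VM is stopped */
} PendingSketch;

/*
 * Hypothetical device: the "fresh" values stand in for what a driver would
 * return on a sync, the "cached" values are the last synced results.
 */
typedef struct {
    uint64_t fresh_precopy, fresh_stopcopy;
    uint64_t cached_precopy, cached_stopcopy;
} DeviceSketch;

/* Refresh the cache; a real handler would query the device here. */
static void sketch_device_sync(DeviceSketch *dev)
{
    dev->cached_precopy = dev->fresh_precopy;
    dev->cached_stopcopy = dev->fresh_stopcopy;
}

/* Accumulate this device's pending data into *out. */
void sketch_query_pending(DeviceSketch *dev, bool exact, PendingSketch *out)
{
    if (exact) {
        /* Only the exact path pays for a sync. */
        sketch_device_sync(dev);
    }
    /* Both paths report both categories, never a subset. */
    out->precopy_bytes += dev->cached_precopy;
    out->stopcopy_bytes += dev->cached_stopcopy;
}
```

With this shape a caller can sum the two fields when it needs the total, and
no handler needs to report different kinds of data just to force a sync.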

(2) expected_downtime / remaining doesn't take VFIO devices into account

When querying migration, QEMU reports a field called "expected-downtime".
The documentation phrases it almost entirely from the RAM perspective, but
ideally it should be the estimated blackout window (in milliseconds) if we
were to switch over at any moment, based on known information.

This did not yet take VFIO into account, especially in the case of VFIO
devices that may contain a large amount of device state (like GPUs).
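
To make the arithmetic behind (2) concrete, here is a small sketch in C,
assuming the estimate is simply "bytes still to send at switchover divided
by the estimated switchover bandwidth".  The function name and parameters
are hypothetical, and the actual calculation in the series may differ.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the blackout window in milliseconds.
 *
 * ram_bytes:        RAM still dirty at switchover
 * device_bytes:     device state still to be sent at switchover
 * bw_bytes_per_sec: estimated switchover bandwidth
 *
 * Returns UINT64_MAX when the bandwidth is unknown (zero).  The sum of the
 * two sizes is assumed not to overflow, and the bandwidth is assumed to be
 * far below UINT64_MAX / 1000 so the remainder term cannot overflow.
 */
uint64_t sketch_expected_downtime_ms(uint64_t ram_bytes,
                                     uint64_t device_bytes,
                                     uint64_t bw_bytes_per_sec)
{
    if (bw_bytes_per_sec == 0) {
        return UINT64_MAX;
    }

    uint64_t total = ram_bytes + device_bytes;

    /* Divide first so that total * 1000 never has to fit in 64 bits. */
    uint64_t whole_sec = total / bw_bytes_per_sec;
    uint64_t rest = total % bw_bytes_per_sec;

    return whole_sec * 1000 + rest * 1000 / bw_bytes_per_sec;
}
```

For example, with 100 MiB of dirty RAM and a 1 GiB/s link this gives 97ms
when only RAM is counted, but 2097ms once 2 GiB of device state is included,
which is the kind of gap a RAM-only estimate hides.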

For problem (2), the use case is that a mgmt app migrating a VFIO GPU
device always needs to adjust the downtime for the migration to converge,
because with such a device involved a normal downtime like 300ms will
usually not suffice.

The issue is that the mgmt app doesn't have a good way to know exactly how
well precopy is going for the whole system and the GPU device.

The hope is that a fixed expected_downtime will give the mgmt app a
reasonable hint for which downtime to set so that a migration converges.

Meanwhile, with a system-wide "remaining" field introduced, mgmt can query
this result at the beginning of each iteration to know if a stall is
happening, IOW, if it's likely that this migration will not converge at
all.  When detected, mgmt can start to consider the expected_downtime value
reported above for converging this migration.  See more on testing below.
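
The stall check described above could look roughly like the following on
the mgmt side.  This is only a sketch in C under my own assumptions: the
StallTracker type, the function name and the "N non-decreasing samples in a
row" policy are all hypothetical, and a real mgmt app may well want
something smarter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-migration tracker; zero-initialize before first use. */
typedef struct {
    uint64_t last_remaining;   /* "remaining" seen at the previous iteration */
    unsigned stalled_rounds;   /* consecutive iterations without progress */
    bool have_sample;          /* false until the first sample is recorded */
} StallTracker;

/*
 * Feed one "remaining" sample, taken at the beginning of an iteration.
 * Returns true once "remaining" has failed to shrink for `threshold`
 * consecutive iterations.  `threshold` is expected to be at least 1.
 */
bool sketch_stall_update(StallTracker *t, uint64_t remaining,
                         unsigned threshold)
{
    if (t->have_sample && remaining >= t->last_remaining) {
        t->stalled_rounds++;
    } else {
        /* First sample, or progress was made: reset the counter. */
        t->stalled_rounds = 0;
    }

    t->last_remaining = remaining;
    t->have_sample = true;

    return t->stalled_rounds >= threshold;
}
```

Once this returns true, the mgmt app would read expected_downtime and decide
whether raising the downtime limit to around that value is acceptable.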

Tests
=====

Thanks to Cédric for helping test v2.  One thing to mention is that we did
encounter one case where the reported dirty size overflowed uint64_t (in
both expected_downtime and the system remaining data).

Quoting test results from Cédric, migrating a RHEL9 VM with a vGPU
(NVIDIA L4-2B) and an MLX5 VF, from a RHEL9 host (vGPU mdev) to a RHEL10
host (vGPU VF), with the vGPU under load (glxgears):

(qemu) info migrate
Status:                 active
Time (ms):              total=21140, setup=86, exp_down=152455434886355 <---- !?!
Remaining:              16 EiB                                          <---- !?!
RAM info:
  Throughput (Mbps):    967.98
  Sizes:                pagesize=4 KiB, total=4 GiB
  Transfers:            transferred=2.29 GiB, remain=4.7 MiB
    Channels:           precopy=1.91 GiB, multifd=0 B, postcopy=0 B, vfio=387 MiB
    Page Types:         normal=499427, zero=559708
  Page Rates (pps):     transfer=0, dirty=1892
  Others:               dirty_syncs=3

It fixed itself after a few more iterations, so it ultimately didn't affect
the migration.  Further attempts didn't reproduce it after I added the
tracepoint patch.  It would be good to hear from anyone who knows whether
this was a known driver issue.
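
The root cause of the value above is not identified.  Purely as a generic
illustration of how a "16 EiB" reading can appear, the sketch below shows
an unsigned 64-bit subtraction wrapping around when the subtrahend is
slightly larger, next to a clamped variant.  The function names are
hypothetical and this is not a claim about what happened in the driver or
in QEMU.

```c
#include <stdint.h>

/* Plain unsigned subtraction: wraps modulo 2^64 when sent > total. */
uint64_t sketch_remaining_wrapping(uint64_t total, uint64_t sent)
{
    return total - sent;
}

/* Clamped variant: never reports more than `total` as remaining. */
uint64_t sketch_remaining_clamped(uint64_t total, uint64_t sent)
{
    return sent > total ? 0 : total - sent;
}
```

With total=100 and sent=101 the wrapping version returns UINT64_MAX, which
is one byte short of 2^64 bytes and therefore displays as 16 EiB, while the
clamped version returns 0.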

For detailed testing steps, please refer to v1's cover letter.

Peter Xu (16):
  qemu-iotests: Add query-migrate test for dirty-bitmap
  migration: Fix low possibility downtime violation
  migration/qapi: Rename MigrationStats to MigrationRAMStats
  vfio/migration: Cache stop size in VFIOMigration
  migration/treewide: Merge @state_pending_{exact|estimate} APIs
  migration: Use the new save_query_pending() API directly
  migration: Introduce stopcopy_bytes in save_query_pending()
  vfio/migration: Fix incorrect reporting for VFIO pending data
  migration: Move iteration counter out of RAM
  migration: Introduce a helper to return switchover bw estimate
  migration: Calculate expected downtime on demand
  migration: Fix calculation of expected_downtime to take VFIO info
  migration: Remember total dirty bytes in mig_stats
  migration/qapi: Introduce system-wise "remaining" reports
  migration/qapi: Update unit for avail-switchover-bandwidth
  vfio/migration: Add tracepoints for precopy/stopcopy query ioctls

 docs/about/removed-features.rst               |   2 +-
 docs/devel/migration/main.rst                 |   9 +-
 docs/devel/migration/vfio.rst                 |   9 +-
 qapi/migration.json                           |  32 ++--
 hw/vfio/vfio-migration-internal.h             |   8 +
 include/migration/register.h                  |  59 +++---
 migration/migration-stats.h                   |  20 +-
 migration/migration.h                         |   2 +-
 migration/savevm.h                            |   7 +-
 hw/s390x/s390-stattrib.c                      |   9 +-
 hw/vfio/migration.c                           | 123 +++++++-----
 migration/block-dirty-bitmap.c                |  10 +-
 migration/migration-hmp-cmds.c                |   5 +
 migration/migration.c                         | 177 +++++++++++++-----
 migration/ram.c                               |  40 +---
 migration/savevm.c                            |  42 ++---
 hw/vfio/trace-events                          |   5 +-
 migration/trace-events                        |   3 +-
 .../tests/migrate-bitmaps-postcopy-test       |   6 +
 19 files changed, 322 insertions(+), 246 deletions(-)

-- 
2.53.0



