* [PATCH v10 00/12] vfio/migration: Implement VFIO migration protocol v2
@ 2023-02-09 19:20 Avihai Horon
From: Avihai Horon @ 2023-02-09 19:20 UTC
  To: qemu-devel
  Cc: Alex Williamson, Juan Quintela, Dr. David Alan Gilbert,
	Michael S. Tsirkin, Cornelia Huck, Paolo Bonzini,
	Vladimir Sementsov-Ogievskiy, Cédric Le Goater, Yishai Hadas,
	Jason Gunthorpe, Maor Gottlieb, Avihai Horon, Kirti Wankhede,
	Tarun Gupta, Joao Martins

Hello,

This v10 is rebased over Juan's pull request [1], which made some changes
to the migration handlers. As a result, I had to make more significant
changes to the main patch (#9) and therefore removed Cedric's R-b.
The full changelog is below.



Following the acceptance of VFIO migration protocol v2 in the kernel, this
series implements VFIO migration according to the new v2 protocol and
replaces the now-deprecated v1 implementation.

The main differences between v1 and v2 migration protocols are:
1. VFIO device state is represented as a finite state machine instead of
   a bitmap.

2. The migration interface with the kernel uses the VFIO_DEVICE_FEATURE
   ioctl and normal read() and write() instead of the migration region
   used in v1.

3. Pre-copy is made optional in the v2 protocol. Support for pre-copy will
   be added later on.

Full description of the v2 protocol and the differences from v1 can be
found here [2].
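
As a rough illustration of difference 1, the sketch below models the device
state as a small finite state machine with an explicit table of permitted
arcs. The enum names, the mig_arc_allowed() helper and the arc table are
illustrative assumptions, not the kernel UAPI or the QEMU code in this
series. The arcs shown are the non-P2P subset as understood from [2] and
should be checked against linux/vfio.h before being relied on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; the real UAPI enum lives in linux/vfio.h. */
enum mig_state {
    MIG_ERROR,
    MIG_STOP,
    MIG_RUNNING,
    MIG_STOP_COPY,
    MIG_RESUMING,
    MIG_NR_STATES
};

/* allowed[cur][next] is true when cur -> next is a direct arc in this sketch. */
static const bool allowed[MIG_NR_STATES][MIG_NR_STATES] = {
    [MIG_RUNNING]   = { [MIG_STOP] = true },
    [MIG_STOP]      = { [MIG_RUNNING] = true,
                        [MIG_STOP_COPY] = true,
                        [MIG_RESUMING] = true },
    [MIG_STOP_COPY] = { [MIG_STOP] = true },
    [MIG_RESUMING]  = { [MIG_STOP] = true },
    /* MIG_ERROR has no outgoing arcs here: the device has to be reset. */
};

/* Returns true if the sketch permits a direct transition cur -> next. */
static bool mig_arc_allowed(enum mig_state cur, enum mig_state next)
{
    assert(cur < MIG_NR_STATES && next < MIG_NR_STATES);
    return allowed[cur][next];
}
```

With a table like this, a request such as RUNNING -> STOP_COPY is not a
direct arc and has to pass through STOP, which a v1-style bitmap could not
express.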



Patch list:

Patch 1 updates the Linux headers so that the MIG_DATA_SIZE ioctl is
available.

Patches 2-8 are prep patches that fix bugs, add a QEMUFile function
that will be used later, and refactor the v1 protocol code to make it
easier to add the v2 protocol.

Patches 9-12 implement the v2 protocol and remove the v1 protocol.

Thanks.



Changes from v9 [3]:
- Rebased on latest master branch. As part of it:
  1. Dropped patch #2 as it was already merged in Juan's pull request.
  2. Dropped patch #11 (vfio_save_pending() optimization). With Juan's
     changes we can achieve this optimization by implementing only a
     .state_pending_exact() handler and not a .state_pending_estimate()
     handler (see comment in code).
  3. Some function refactoring to be aligned with Juan's changes.

- Made multiple_devices_migration_blocker static (Cedric).

- Addressed Alex's comments:
  1. Changed multiple devices migration block/unblock logic.
  2. Removed recover_state local variable in
     vfio_save_complete_precopy().
  3. Added comments explaining the purpose of ERROR state usage as a
     recover state (three places).
  4. Changed vfio_migration_set_state() to directly reset the device if
     ERROR is given as recover_state, instead of setting the device in
     ERROR state, failing and only then resetting the device.
  5. Added a note in VFIO migration documentation that migration is
     supported only with a single device.
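
The recover-state behavior described in items 3 and 4 above can be sketched
roughly as follows. This is a minimal model under stated assumptions, not
the vfio_migration_set_state() code from patch #9: struct fake_dev,
dev_set_state(), dev_reset() and set_state_or_recover() are hypothetical
stand-ins for the device and the kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not QEMU's types or the kernel UAPI. */
enum dev_state { DEV_ERROR, DEV_STOP, DEV_RUNNING, DEV_STOP_COPY };

struct fake_dev {
    enum dev_state state;
    int reject;   /* state the fake device refuses to enter, or -1 for none */
    int resets;   /* number of resets performed so far */
};

/* Pretend "set device state" call: fails with -EINVAL for the rejected state. */
static int dev_set_state(struct fake_dev *d, enum dev_state s)
{
    if ((int)s == d->reject) {
        return -EINVAL;
    }
    d->state = s;
    return 0;
}

/* Pretend device reset: the device comes back in RUNNING. */
static void dev_reset(struct fake_dev *d)
{
    d->state = DEV_RUNNING;
    d->resets++;
}

/* Returns 0 on success, or the -errno of the failed transition to new_state. */
static int set_state_or_recover(struct fake_dev *d, enum dev_state new_state,
                                enum dev_state recover_state)
{
    int ret;

    assert(d != NULL);
    ret = dev_set_state(d, new_state);
    if (ret == 0) {
        return 0;
    }
    if (recover_state == DEV_ERROR) {
        /* ERROR as recover state means "nothing sane to fall back to": reset directly. */
        dev_reset(d);
        return ret;
    }
    if (dev_set_state(d, recover_state) != 0) {
        /* Falling back failed too, so reset as a last resort. */
        dev_reset(d);
    }
    return ret;
}
```

The point of the direct reset is that the caller never has to put the
device in ERROR, watch that fail, and only then reset it.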



Changes from v8 [4]:
- Added patch that blocks migration of multiple devices. As discussed,
  this is necessary since VFIO migration doesn't support P2P yet.
- Removed unnecessary P2P code. This should be added when P2P support is
  added.
- Fixed vfio_save_block() comment to say -errno is returned on error.
- Added Reviewed-by tag to linux headers sync patch.



Changes from v7 [5]:
- Fixed compilation error on Windows in patch #9 reported by Cedric.



Changes from v6 [6]:
- Fixed another compilation error in patch #9 reported by Cedric.
- Added Reviewed-by tags.



Changes from v5 [7]:
- Dropped patch #3.
- Simplified patch #5 as per Alex's suggestion.
- Changed qemu_file_get_to_fd() to return -EIO instead of -1, as
  suggested by Cedric.
  Also changed it so that a failed write now returns -errno instead of -1.
- Fixed compilation error reported by Cedric.
- Changed vfio_migration_query_flags() to print error message and return
  -errno in error case as suggested by Cedric.
- Added Reviewed-by tags.
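
For readers unfamiliar with the -errno convention mentioned in this
changelog, here is a minimal sketch of a write loop that reports failures
that way. write_all() is a hypothetical helper written for this note, not
qemu_file_get_to_fd() itself; it only uses POSIX write().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write exactly len bytes from buf to fd.
 * Returns 0 on success or -errno on failure, so the caller can tell
 * what went wrong instead of seeing a bare -1.
 * Hypothetical helper for illustration only.
 */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted before writing anything: retry */
            }
            return -errno;      /* propagate the precise error code */
        }
        if (n == 0) {
            return -EIO;        /* no progress: treat as a generic I/O error */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller can then log strerror(-ret) or pass the value up unchanged.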



Changes from v4 [8]:
- Rebased on latest master branch.
- Added linux header update to kernel v6.2-rc1.
- Merged preview patches (#13-14) into this series.



Changes from v3 [9]:
- Rebased on latest master branch.

- Dropped patch #1 "migration: Remove res_compatible parameter" as
  it's not mandatory to this series and needs some further discussion.

- Dropped patch #3 "migration: Block migration comment or code is
  wrong" as it has been merged already.

- Addressed overlooked corner case reported by Vladimir in patch #4
  "migration: Simplify migration_iteration_run()".

- Dropped patch #5 "vfio/migration: Fix wrong enum usage" as it has
  been merged already.

- In patch #12 "vfio/migration: Implement VFIO migration protocol v2":
  1. Changed vfio_save_pending() to update res_precopy_only instead of
     res_postcopy_only (as VFIO migration doesn’t support postcopy).
  2. Moved VFIOMigration->data_buffer allocation to vfio_save_setup()
     and its de-allocation to vfio_save_cleanup(), so now it's
     allocated when actually used (during migration and only on source
     side).

- Addressed Alex's comments:
  1. Eliminated code duplication in patch #7 "vfio/migration: Allow
     migration without VFIO IOMMU dirty tracking support".
  2. Removed redundant initialization of vfio_region_info in patch #10
     "vfio/migration: Move migration v1 logic to vfio_migration_init()".
  3. Added comment about VFIO_MIG_DATA_BUFFER_SIZE heuristic (and
     renamed to VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE).
  4. Cast migration structs to their actual types instead of void *.
  5. Return -errno and -EBADF instead of -1 in vfio_migration_set_state().
  6. Set migration->device_state to new_state even in case of data_fd
     out of sync. Although migration will be aborted, setting device
     state succeeded so we should reflect that.
  7. Renamed VFIO_MIG_PENDING_SIZE to VFIO_MIG_STOP_COPY_SIZE, set it
     to 100G and added a comment about the size choice.
  8. Changed vfio_save_block() to return -errno on error.
  9. Squashed Patch #14 to patch #12.
  10. Adjusted migration data buffer size according to MIG_DATA_SIZE
      ioctl.

- In preview patch #17 "vfio/migration: Query device data size in
  vfio_save_pending()" - changed vfio_save_pending() to report
  VFIO_MIG_STOP_COPY_SIZE on any error.
   
- Added another preview patch "vfio/migration: Optimize
  vfio_save_pending()".

- Added ret value on some traces as suggested by David.

- Added Reviewed-By tags.



Changes from v2 [10]:
- Rebased on top of latest master branch.

- Added relevant patches from Juan's RFC [11] with minor changes:
  1. Added Reviewed-by tag to patch #3 in the RFC.
  2. Adjusted patch #6 to work without patch #4 in the RFC.

- Added a new patch "vfio/migration: Fix wrong enum usage" that fixes a
  small bug in v1 code. This patch was sent a few weeks ago [12] but
  hasn't been taken yet.

- Patch #2 (vfio/migration: Skip pre-copy if dirty page tracking is not
  supported):
  1. Dropped this patch and replaced it with
     "vfio/migration: Allow migration without VFIO IOMMU dirty tracking
     support".
     The new patch uses a different approach – instead of skipping
     pre-copy phase completely, QEMU VFIO code will mark RAM dirty
     (instead of kernel). This ensures that current migration behavior
     is not changed and SLA is taken into account.

- Patch #4 (vfio/common: Change vfio_devices_all_running_and_saving()
  logic to equivalent one):
  1. Improved commit message to better explain the change.

- Patch #7 (vfio/migration: Implement VFIO migration protocol v2):
  1. Enhanced vfio_migration_set_state() error reporting.
  2. In vfio_save_complete_precopy() of v2 protocol - when changing
     device state to STOP, set recover state to ERROR instead of STOP as
     suggested by Joao.
  3. Constify SaveVMHandlers of v2 protocol.
  4. Modified trace_vfio_vmstate_change and
     trace_vfio_migration_set_state
     to print device state string instead of enum.
  5. Replaced qemu_put_buffer_async() with qemu_put_buffer() in
     vfio_save_block(), as requested by Juan.
  6. Implemented v2 protocol version of vfio_save_pending() as requested
     by Juan. Until ioctl to get device state size is added, we just
     report some big hard coded value, as agreed in KVM call.

- Patch #9 (vfio/migration: Reset device if setting recover state
  fails):
  1. Enhanced error reporting.
  2. Set VFIOMigration->device_state to RUNNING after device reset.

- Patch #11 (docs/devel: Align vfio-migration docs to VFIO migration
  v2):
  1. Adjusted VFIO migration documentation to the added
     vfio_save_pending().

- Added the last patch (which is not for merging yet) that demonstrates
  how the new ioctl to get device state size will work once added.



Changes from v1 [13]:
- Split the big patch that replaced v1 with v2 into several patches as
  suggested by Joao, to make review easier.
- Change warn_report to warn_report_once when container doesn't support
  dirty tracking.
- Add Reviewed-by tag.

[1]
https://lore.kernel.org/qemu-devel/20230202160640.2300-1-quintela@redhat.com/

[2]
https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/

[3]
https://lore.kernel.org/qemu-devel/20230206123137.31149-1-avihaih@nvidia.com/

[4]
https://lore.kernel.org/qemu-devel/20230116141135.12021-1-avihaih@nvidia.com/

[5]
https://lore.kernel.org/qemu-devel/20230115183556.7691-1-avihaih@nvidia.com/

[6]
https://lore.kernel.org/qemu-devel/20230112085020.15866-1-avihaih@nvidia.com/

[7]
https://lore.kernel.org/qemu-devel/20221229110345.12480-1-avihaih@nvidia.com/

[8]
https://lore.kernel.org/qemu-devel/20221130094414.27247-1-avihaih@nvidia.com/

[9]
https://lore.kernel.org/qemu-devel/20221103161620.13120-1-avihaih@nvidia.com/

[10]
https://lore.kernel.org/all/20220530170739.19072-1-avihaih@nvidia.com/

[11]
https://lore.kernel.org/qemu-devel/20221003031600.20084-1-quintela@redhat.com/T/

[12]
https://lore.kernel.org/all/20221016085752.32740-1-avihaih@nvidia.com/

[13]
https://lore.kernel.org/all/20220512154320.19697-1-avihaih@nvidia.com/

Avihai Horon (12):
  linux-headers: Update to v6.2-rc1
  vfio/migration: Fix NULL pointer dereference bug
  vfio/migration: Allow migration without VFIO IOMMU dirty tracking
    support
  migration/qemu-file: Add qemu_file_get_to_fd()
  vfio/common: Change vfio_devices_all_running_and_saving() logic to
    equivalent one
  vfio/migration: Block multiple devices migration
  vfio/migration: Move migration v1 logic to vfio_migration_init()
  vfio/migration: Rename functions/structs related to v1 protocol
  vfio/migration: Implement VFIO migration protocol v2
  vfio/migration: Remove VFIO migration protocol v1
  vfio: Alphabetize migration section of VFIO trace-events file
  docs/devel: Align VFIO migration docs to v2 protocol

 docs/devel/vfio-migration.rst                 |  72 +-
 include/hw/vfio/vfio-common.h                 |  10 +-
 include/standard-headers/drm/drm_fourcc.h     |  63 +-
 include/standard-headers/linux/ethtool.h      |  81 +-
 include/standard-headers/linux/fuse.h         |  20 +-
 .../linux/input-event-codes.h                 |   4 +
 include/standard-headers/linux/pci_regs.h     |   2 +
 include/standard-headers/linux/virtio_blk.h   |  19 +
 include/standard-headers/linux/virtio_bt.h    |   8 +
 include/standard-headers/linux/virtio_net.h   |   4 +
 linux-headers/asm-arm64/kvm.h                 |   1 +
 linux-headers/asm-generic/hugetlb_encode.h    |  26 +-
 linux-headers/asm-generic/mman-common.h       |   2 +
 linux-headers/asm-mips/mman.h                 |   2 +
 linux-headers/asm-riscv/kvm.h                 |   7 +
 linux-headers/asm-x86/kvm.h                   |  11 +-
 linux-headers/linux/kvm.h                     |  32 +-
 linux-headers/linux/psci.h                    |  14 +
 linux-headers/linux/userfaultfd.h             |   4 +
 linux-headers/linux/vfio.h                    | 278 ++++++-
 migration/qemu-file.h                         |   1 +
 hw/vfio/common.c                              |  92 ++-
 hw/vfio/migration.c                           | 741 ++++++------------
 migration/qemu-file.c                         |  34 +
 hw/vfio/trace-events                          |  28 +-
 25 files changed, 940 insertions(+), 616 deletions(-)

-- 
2.26.3


