From: "Cédric Le Goater" <clg@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Peter Xu" <peterx@redhat.com>, "Fabiano Rosas" <farosas@suse.de>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Avihai Horon" <avihaih@nvidia.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Markus Armbruster" <armbru@redhat.com>,
"Prasad Pandit" <pjp@fedoraproject.org>,
"Cédric Le Goater" <clg@redhat.com>
Subject: [PATCH v4 00/25] migration: Improve error reporting
Date: Wed, 6 Mar 2024 14:34:15 +0100
Message-ID: <20240306133441.2351700-1-clg@redhat.com>
Hello,
The motivation behind these changes is to report more detailed errors
to the upper management layer (libvirt), so that it can decide,
depending on the reported error, whether to try the migration again
later. This would be useful in cases where migration fails due to a
lack of HW resources on the host. For instance, some adapters can only
initiate a limited number of simultaneous dirty tracking requests, and
this imposes a limit on the number of VMs that can be migrated
simultaneously.
We are not quite ready for such a mechanism, but what we can do first is
to clean up the error reporting in the early save_setup sequence. This
is what the following changes propose, by adding an Error** argument to
various handlers and propagating it to the core migration subsystem.
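To illustrate the pattern, here is a minimal standalone sketch of a setup handler that fills an Error** argument and of a core routine that passes it up unchanged. It is not QEMU code: the Error struct, sketch_error_setg(), sketch_device_save_setup() and sketch_savevm_state_setup() are hypothetical stand-ins, and the bool return value is an assumption made for brevity.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; NOT the real QEMU Error API. */
typedef struct Error {
    char msg[256];
} Error;

/* Simplified analogue of an error setter: allocate and format a message. */
static void sketch_error_setg(Error **errp, const char *fmt, ...)
{
    va_list ap;

    if (!errp) {
        return; /* the caller does not want error details */
    }
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort(); /* out of memory: give up, as a sketch-level policy */
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

static void sketch_error_free(Error *err)
{
    free(err);
}

/*
 * Hypothetical device handler: it fails when the adapter has no dirty
 * tracking slot left, and never returns false without setting an error.
 */
static bool sketch_device_save_setup(int free_slots, Error **errp)
{
    if (free_slots <= 0) {
        sketch_error_setg(errp, "no dirty tracking slot available (free=%d)",
                          free_slots);
        return false;
    }
    return true;
}

/*
 * Hypothetical core routine: call each handler in turn and stop at the
 * first failure. The error set by the handler is handed to the caller
 * untouched, so the upper layer sees the original reason.
 */
static bool sketch_savevm_state_setup(const int *slots, size_t n, Error **errp)
{
    for (size_t i = 0; i < n; i++) {
        if (!sketch_device_save_setup(slots[i], errp)) {
            return false;
        }
    }
    return true;
}
```

A caller would pass the address of an Error pointer initialized to NULL, check the boolean result, report msg on failure, and release the object with sketch_error_free().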
The patchset is organized as follows:
* [1-4] already queued in migration-next.
migration: Report error when shutdown fails
migration: Remove SaveStateHandler and LoadStateHandler typedefs
migration: Add documentation for SaveVMHandlers
migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
* [5-9] are prerequisite changes in other components related to the
migration save_setup() handler. They make sure a failure is not
returned without setting an error.
s390/stattrib: Add Error** argument to set_migrationmode() handler
vfio: Always report an error in vfio_save_setup()
migration: Always report an error in block_save_setup()
migration: Always report an error in ram_save_setup()
migration: Add Error** argument to vmstate_save()
* [10-15] are the core changes in migration and memory components to
propagate an error reported in a save_setup() handler.
migration: Add Error** argument to qemu_savevm_state_setup()
migration: Add Error** argument to .save_setup() handler
migration: Add Error** argument to .load_setup() handler
memory: Add Error** argument to .log_global_start() handler
memory: Add Error** argument to the global_dirty_log routines
migration: Modify ram_init_bitmaps() to report dirty tracking errors
* [16-19] contain the VFIO changes we are interested in. Can go
through vfio-next.
vfio: Add Error** argument to .set_dirty_page_tracking() handler
vfio: Add Error** argument to vfio_devices_dma_logging_start()
vfio: Add Error** argument to vfio_devices_dma_logging_stop()
vfio: Use new Error** argument in vfio_save_setup()
* [20-25] are followups for better error handling in VFIO. Good to
have but not necessary for the issue described in the intro. Can go
through vfio-next.
vfio: Add Error** argument to .vfio_save_config() handler
vfio: Reverse test on vfio_get_dirty_bitmap()
memory: Add Error** argument to memory_get_xlat_addr()
vfio: Add Error** argument to .get_dirty_bitmap() handler
vfio: Also trace event failures in vfio_save_complete_precopy()
vfio: Extend vfio_set_migration_error() with Error* argument
Thanks,
C.
Changes in v4:
- Fixed frenchism futur to future
- Fixed typo in set_migrationmode() handler
- Added error_free() in hmp_migrationmode()
- Fixed state name printed out in error returned by vfio_save_setup()
- Fixed test on error returned by qemu_file_get_error()
- Added an error when bdrv_nb_sectors() returns a negative value
- Dropped log_global_stop() and log_global_sync() changes
- Dropped MEMORY_LISTENER_CALL_LOG_GLOBAL
- Modified memory_global_dirty_log_start() to loop on the list of
listeners and handle errors directly.
- Introduced memory_global_dirty_log_rollback() to revert operations
previously done
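As a rough illustration of the last two items, the sketch below loops over a list of listeners, starts dirty logging on each, and reverts the ones already started when one of them fails. It is a standalone approximation and not the code from the patch: SketchListener, sketch_log_start(), sketch_log_stop() and sketch_global_dirty_log_start() are hypothetical names, the error is a plain string buffer, and the reverse-order rollback is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical listener; the real MemoryListener is far richer. */
typedef struct SketchListener {
    const char *name;
    bool fail_on_start; /* knob used to simulate a failing backend */
    bool logging;       /* true once dirty logging has been started */
} SketchListener;

/* Start logging on one listener, writing a message on failure. */
static bool sketch_log_start(SketchListener *l, char *errbuf, size_t len)
{
    if (l->fail_on_start) {
        snprintf(errbuf, len, "%s: failed to start dirty logging", l->name);
        return false;
    }
    l->logging = true;
    return true;
}

static void sketch_log_stop(SketchListener *l)
{
    l->logging = false;
}

/*
 * Loop over the listeners and handle errors directly: on the first
 * failure, stop the listeners that were already started (in reverse
 * order, by assumption) and report the failure to the caller.
 */
static bool sketch_global_dirty_log_start(SketchListener *ls, size_t n,
                                          char *errbuf, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (!sketch_log_start(&ls[i], errbuf, len)) {
            while (i-- > 0) {
                sketch_log_stop(&ls[i]);
            }
            return false;
        }
    }
    return true;
}
```

The point of the rollback is that a failed start leaves no listener with logging enabled, so the caller can simply report the error and retry later.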
Changes in v3:
- New changes to make sure an error is always set in case of failure.
This is the reason behind the 5/6 extra patches. (Markus)
- Documentation fixup (Peter + Avihai)
- Set migration state to MIGRATION_STATUS_FAILED always
- Fixed error handling in bg_migration_thread() (Peter)
- Fixed return value of vfio_listener_log_global_start/stop().
Went unnoticed because value is not tested. (Peter)
- Add ERRP_GUARD() when error_prepend is used
- Use error_setg_errno() when possible
Changes in v2:
- Removed v1 patches addressing the return-path thread termination as
they are now superseded by:
https://lore.kernel.org/qemu-devel/20240226203122.22894-1-farosas@suse.de/
- Documentation updates of handlers
- Removed call to PRECOPY_NOTIFY_SETUP notifiers in case of errors
- Modified routines taking an Error** argument to return a bool when
possible and made adjustments in callers.
- new MEMORY_LISTENER_CALL_LOG_GLOBAL macro for .log_global*()
handlers
- Handled SETUP state when migration terminates
- Modified memory_get_xlat_addr() to take an Error** argument
- Various refinements on error handling
Cédric Le Goater (25):
migration: Report error when shutdown fails
migration: Remove SaveStateHandler and LoadStateHandler typedefs
migration: Add documentation for SaveVMHandlers
migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
s390/stattrib: Add Error** argument to set_migrationmode() handler
vfio: Always report an error in vfio_save_setup()
migration: Always report an error in block_save_setup()
migration: Always report an error in ram_save_setup()
migration: Add Error** argument to vmstate_save()
migration: Add Error** argument to qemu_savevm_state_setup()
migration: Add Error** argument to .save_setup() handler
migration: Add Error** argument to .load_setup() handler
memory: Add Error** argument to .log_global_start() handler
memory: Add Error** argument to the global_dirty_log routines
migration: Modify ram_init_bitmaps() to report dirty tracking errors
vfio: Add Error** argument to .set_dirty_page_tracking() handler
vfio: Add Error** argument to vfio_devices_dma_logging_start()
vfio: Add Error** argument to vfio_devices_dma_logging_stop()
vfio: Use new Error** argument in vfio_save_setup()
vfio: Add Error** argument to .vfio_save_config() handler
vfio: Reverse test on vfio_get_dirty_bitmap()
memory: Add Error** argument to memory_get_xlat_addr()
vfio: Add Error** argument to .get_dirty_bitmap() handler
vfio: Also trace event failures in vfio_save_complete_precopy()
vfio: Extend vfio_set_migration_error() with Error* argument
include/exec/memory.h | 25 ++-
include/hw/s390x/storage-attributes.h | 2 +-
include/hw/vfio/vfio-common.h | 29 ++-
include/hw/vfio/vfio-container-base.h | 35 +++-
include/migration/register.h | 273 +++++++++++++++++++++++---
include/qemu/typedefs.h | 2 -
migration/savevm.h | 2 +-
hw/i386/xen/xen-hvm.c | 5 +-
hw/ppc/spapr.c | 2 +-
hw/s390x/s390-stattrib-kvm.c | 12 +-
hw/s390x/s390-stattrib.c | 15 +-
hw/vfio/common.c | 161 +++++++++------
hw/vfio/container-base.c | 9 +-
hw/vfio/container.c | 19 +-
hw/vfio/migration.c | 99 ++++++----
hw/vfio/pci.c | 5 +-
hw/virtio/vhost-vdpa.c | 5 +-
hw/virtio/vhost.c | 3 +-
migration/block-dirty-bitmap.c | 4 +-
migration/block.c | 19 +-
migration/dirtyrate.c | 13 +-
migration/migration.c | 27 ++-
migration/qemu-file.c | 5 +-
migration/ram.c | 46 ++++-
migration/savevm.c | 59 +++---
system/memory.c | 56 +++++-
26 files changed, 713 insertions(+), 219 deletions(-)
--
2.44.0