* [PATCH v9 00/27] migration: propagate vTPM errors using Error objects
@ 2025-08-05 18:25 Arun Menon
  2025-08-05 18:25 ` [PATCH v9 01/27] migration: push Error **errp into vmstate_subsection_load() Arun Menon
                   ` (26 more replies)
  0 siblings, 27 replies; 63+ messages in thread
From: Arun Menon @ 2025-08-05 18:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Bennée, Akihiko Odaki,
	Dmitry Osipenko, Michael S. Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Halil Pasic, Eric Farman, Thomas Huth,
	Christian Borntraeger, Matthew Rosato, Richard Henderson,
	David Hildenbrand, Ilya Leoshkevich, Nicholas Piggin,
	Harsh Prateek Bora, Paolo Bonzini, Fam Zheng, Alex Williamson,
	Cédric Le Goater, Steve Sistare, Marc-André Lureau,
	qemu-s390x, qemu-ppc, Hailiang Zhang, Stefan Berger, Arun Menon,
	Daniel P. Berrangé, Stefan Berger

Hello,

Currently, when a migration of a VM with an encrypted vTPM
fails on the destination host (e.g., due to a mismatch in secret values),
the error message displayed on the source host is generic and unhelpful.

For example, a typical error looks like this:
"operation failed: job 'migration out' failed: Sibling indicated error 1.
operation failed: job 'migration in' failed: load of migration failed:
Input/output error"

This message does not provide any specific indication of a vTPM failure.
Such generic errors are logged using error_report(), which prints to
the console/monitor but does not make the detailed error accessible via
the QMP query-migrate command.

This series addresses the issue by ensuring that specific TPM error
messages are propagated via the QEMU Error object.
To make this possible:
- A set of functions in the call stack is changed
  to take an Error object as an additional parameter.
- Also, the TPM backend makes use of a new hook called post_load_errp()
  that explicitly passes an Error object.
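
The two points above can be sketched in a few lines of stand-alone C. This is only an illustration under stated assumptions: the Error type, SketchVMSD, every sketch_* function and the error strings are simplified stand-ins invented for this example, not QEMU's real definitions. Only the hook name post_load_errp comes from this series. The idea shown is that a load function gains an Error **errp parameter and prefers the errp-aware hook when a device provides one.

```c
/*
 * Stand-alone sketch. Error, SketchVMSD and every sketch_* name are
 * simplified stand-ins for illustration, not QEMU's real definitions.
 */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef struct Error {
    const char *msg;   /* points at a string literal, for brevity */
} Error;

/* Record a message in *errp if the caller supplied somewhere to put it. */
static void sketch_error_set(Error **errp, const char *msg)
{
    if (errp == NULL) {
        return;                 /* caller does not want the details */
    }
    assert(*errp == NULL);      /* an existing error must not be overwritten */
    *errp = malloc(sizeof(**errp));
    assert(*errp != NULL);
    (*errp)->msg = msg;
}

static void sketch_error_free(Error *err)
{
    free(err);
}

/* Device description carrying the legacy hook and the errp-aware variant. */
typedef struct SketchVMSD {
    const char *name;
    int (*post_load)(void *opaque);                      /* legacy hook */
    int (*post_load_errp)(void *opaque, Error **errp);   /* new variant */
} SketchVMSD;

/* Hypothetical backend hook: fails with a specific, descriptive message. */
static int sketch_tpm_post_load_errp(void *opaque, Error **errp)
{
    const int *secret_matches = opaque;

    if (!*secret_matches) {
        sketch_error_set(errp, "vTPM: could not restore state (illustrative)");
        return -EIO;
    }
    return 0;
}

/* Load path: prefer the errp variant, fall back to the legacy hook. */
static int sketch_load_state(const SketchVMSD *vmsd, void *opaque, Error **errp)
{
    int ret = 0;

    if (vmsd->post_load_errp) {
        ret = vmsd->post_load_errp(opaque, errp);
    } else if (vmsd->post_load) {
        ret = vmsd->post_load(opaque);
        if (ret < 0) {
            /* The legacy hook can only offer a generic message. */
            sketch_error_set(errp, "post_load hook failed");
        }
    }
    return ret;
}
```

A caller would pass the address of an Error pointer initialized to NULL to sketch_load_state(), then inspect and free it on failure. With the errp-aware hook, the caller receives the device-specific message instead of only a negative return code.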

The series is organized as follows:
 - Patches 1-22 focus on pushing the Error object into the functions
   that are important in the call stack where TPM errors are observed.
   We still need to make changes in the rest of the functions in savevm.c
   such that they also incorporate the errp object for propagating errors.
 - Patches 23-25 refactor vmstate_save_state_v(), propagate the last
   encountered error from it, and remove the error variant of
   vmstate_save_state().
 - Patch 26 introduces the new variants of the hooks in the
   VMStateDescription structure. These hooks should be used in future
   implementations.
 - Patch 27 changes the TPM backend such that the errors are
   set in the Error object.

While this series focuses specifically on TPM error reporting during
live migration, it lays the groundwork for broader improvements.
Many functions in savevm.c that previously only returned an integer now
also capture errors in the Error object, enabling other modules to adopt the
post_load_errp hook in the future.

One such change was attempted previously:
https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg01727.html

Resolves: https://issues.redhat.com/browse/RHEL-82826

Signed-off-by: Arun Menon <armenon@redhat.com>
---
Changes in v9:
- Reorder the patches so that errors are reported in each one of them.
- Drop the single quotes around format specifiers, i.e. '%d' becomes %d.
- Report errors where they were missed before. Set errp to NULL
  in case of retry.
- Link to v8: https://lore.kernel.org/qemu-devel/20250731-propagate_tpm_error-v8-0-28fd82fdfdb2@redhat.com

Changes in v8:
- 3 new patches added:
  - patch 23:
	- Changes the error propagation by returning the most recent error
	  to the caller when both save device state and post_save fail.
  - patch 24:
	- Refactors the vmstate_save_state_v() function by adding wrapper
	  functions to separate concerns.
  - patch 25:
	- Removes the error variant of the vmstate_save_state()
	  function introduced in commit 969298f9d7.
- Use ERRP_GUARD() where there is an errp dereference or an error_prepend call.
- Pass &error_warn in place of NULL, in vmstate_load_state() calls so
  that the caller knows about the error.
- Remove unnecessary null check before setting errp. Dereferencing it is not required.
- Documentation for the new variants of post/pre save/load hooks added.
- Some patches, although they received a 'Reviewed-by' tag, have undergone a few minor changes:
	Patch 1 : removed extra space
	Patch 2 : Commit message changed, refactoring the function to
		always set errp and return.
	Patch 8 : Commit message changed.
	Patch 9 : use error_setg_errno instead of error_setg.
	Patch 27 : use error_setg_errno instead of error_setg.
- Link to v7: https://lore.kernel.org/qemu-devel/20250725-propagate_tpm_error-v7-0-d52704443975@redhat.com

Changes in v7:
- Fix error propagation in post_save_errp. The latest error encountered is
  propagated.
- User strings in error_prepend() calls now end with ': ' so that the
  messages print cleanly.
- Change the order of one of the patches.
- Link to v6: https://lore.kernel.org/qemu-devel/20250721-propagate_tpm_error-v6-0-fef740e15e17@redhat.com
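
To illustrate the first two v7 items above, here is a stand-alone sketch of "the latest error wins" combined with a prefix ending in ': '. Everything in it is an assumption made for illustration: the Error type, the sketch_* helpers, the messages and the control flow are simplified stand-ins, not the code in vmstate.c.

```c
/*
 * Stand-alone sketch. All names and messages are hypothetical stand-ins,
 * not QEMU's real error API or save path.
 */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Error {
    char *msg;   /* heap-allocated, owned by the Error */
} Error;

/* Return a newly allocated string holding a followed by b. */
static char *sketch_concat(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    char *s = malloc(la + lb + 1);

    assert(s != NULL);
    memcpy(s, a, la);
    memcpy(s + la, b, lb + 1);   /* includes the terminating NUL */
    return s;
}

static void sketch_error_set(Error **errp, const char *msg)
{
    assert(errp != NULL && *errp == NULL);
    *errp = malloc(sizeof(**errp));
    assert(*errp != NULL);
    (*errp)->msg = sketch_concat("", msg);
}

/* Prepend context; the prefix ends with ": " so the result reads well. */
static void sketch_error_prepend(Error **errp, const char *prefix)
{
    char *joined = sketch_concat(prefix, (*errp)->msg);

    free((*errp)->msg);
    (*errp)->msg = joined;
}

static void sketch_error_free(Error *err)
{
    if (err) {
        free(err->msg);
        free(err);
    }
}

/*
 * Hypothetical save path: the device save and the post_save hook can both
 * fail. The most recent failure is the one handed back to the caller.
 */
static int sketch_save_state(int save_ret, int post_save_ret, Error **errp)
{
    Error *local_err = NULL;
    int ret = 0;

    if (save_ret < 0) {
        sketch_error_set(&local_err, "device state failed");
        sketch_error_prepend(&local_err, "save: ");
        ret = save_ret;
    }
    if (post_save_ret < 0) {
        sketch_error_free(local_err);   /* drop the older error, if any */
        local_err = NULL;
        sketch_error_set(&local_err, "hook failed");
        sketch_error_prepend(&local_err, "post_save: ");
        ret = post_save_ret;
    }
    if (errp) {
        *errp = local_err;              /* hand ownership to the caller */
    } else {
        sketch_error_free(local_err);
    }
    return ret;
}
```

When both steps fail, the caller sees only the post_save message. When only the save fails, that error is returned unchanged.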

Changes in v6:
- Incorporate review comments from Daniel and Akihiko, fixing a few
  semantic errors and improving error logging.
- Add one more patch that removes NULL checks after calling
  qemu_file_get_return_path() because it does not fail.
- Link to v5: https://lore.kernel.org/qemu-devel/20250717-propagate_tpm_error-v5-0-1f406f88ee65@redhat.com

Changes in v5:
- Solve a bug that set errp even though it was not NULL, pointed out by Fabiano in v4.
- Link to v4: https://lore.kernel.org/qemu-devel/20250716-propagate_tpm_error-v4-0-7141902077c0@redhat.com

Changes in v4:
- Split the patches into smaller ones based on functions. Pass NULL in the
  caller until errp is made available. Every function that has an
  Error **errp object passed to it, ensures that it sets the errp object
  in case of failure.
- A few more functions within loadvm_process_command() now handle errors using
  the errp object. I've converted these for consistency, taking Daniel's
  patches (link above) as a reference.
- Along with the post_load_errp() hook, other duplicate hooks are also introduced.
  This will enable us to migrate to the newer versions eventually.
- Fix some semantic errors, like using error_propagate_prepend() in places where
  we need to preserve existing behaviour of accumulating the error in local_err
  and then propagating it to errp. This can be refactored in a later commit.
- Add more information in commit messages explaining the changes.
- Link to v3: https://lore.kernel.org/qemu-devel/20250702-propagate_tpm_error-v3-0-986d94540528@redhat.com

Changes in v3:
- Split the 2nd patch into two. Introducing the post_load_with_error() hook
  has been separated from using it in the TPM backend module, so that
  each change can be acknowledged on its own.
- Link to v2: https://lore.kernel.org/qemu-devel/20250627-propagate_tpm_error-v2-0-85990c89da29@redhat.com

Changes in v2:
- Combine the first two changes into one, focusing on passing the
  Error object (errp) consistently through functions involved in
  loading the VM's state. Other functions are not yet changed.
- As suggested in the review comment, add null checks for errp
  before adding error messages, preventing crashes.
  We also now correctly set errors when post-copy migration fails.
- In process_incoming_migration_co(), switch to error_prepend
  instead of error_setg. This means we now null-check local_err in
  the "fail" section before using it, preventing dereferencing issues.
- Link to v1: https://lore.kernel.org/qemu-devel/20250624-propagate_tpm_error-v1-0-2171487a593d@redhat.com

---
Arun Menon (27):
      migration: push Error **errp into vmstate_subsection_load()
      migration: push Error **errp into vmstate_load_state()
      migration: push Error **errp into qemu_loadvm_state_header()
      migration: push Error **errp into vmstate_load()
      migration: push Error **errp into loadvm_process_command()
      migration: push Error **errp into loadvm_handle_cmd_packaged()
      migration: push Error **errp into qemu_loadvm_state()
      migration: push Error **errp into qemu_load_device_state()
      migration: push Error **errp into qemu_loadvm_state_main()
      migration: push Error **errp into qemu_loadvm_section_start_full()
      migration: push Error **errp into qemu_loadvm_section_part_end()
      migration: Update qemu_file_get_return_path() docs and remove dead checks
      migration: make loadvm_postcopy_handle_resume() void
      migration: push Error **errp into ram_postcopy_incoming_init()
      migration: push Error **errp into loadvm_postcopy_handle_advise()
      migration: push Error **errp into loadvm_postcopy_handle_listen()
      migration: push Error **errp into loadvm_postcopy_handle_run()
      migration: push Error **errp into loadvm_postcopy_ram_handle_discard()
      migration: push Error **errp into loadvm_handle_recv_bitmap()
      migration: push Error **errp into loadvm_process_enable_colo()
      migration: push Error **errp into loadvm_postcopy_handle_switchover_start()
      migration: Capture error in postcopy_ram_listen_thread()
      migration: Refactor vmstate_save_state_v() function
      migration: Propagate last encountered error in vmstate_save_state_v() function
      migration: Remove error variant of vmstate_save_state() function
      migration: Add error-parameterized function variants in VMSD struct
      backends/tpm: Propagate vTPM error on migration failure

 backends/tpm/tpm_emulator.c   |  40 ++---
 docs/devel/migration/main.rst |  24 +++
 hw/display/virtio-gpu.c       |   5 +-
 hw/pci/pci.c                  |   5 +-
 hw/s390x/virtio-ccw.c         |   4 +-
 hw/scsi/spapr_vscsi.c         |   4 +-
 hw/vfio/pci.c                 |   6 +-
 hw/virtio/virtio-mmio.c       |   5 +-
 hw/virtio/virtio-pci.c        |   4 +-
 hw/virtio/virtio.c            |   8 +-
 include/migration/colo.h      |   2 +-
 include/migration/vmstate.h   |  19 ++-
 migration/colo.c              |  13 +-
 migration/cpr.c               |  10 +-
 migration/migration.c         |  31 ++--
 migration/postcopy-ram.c      |   9 +-
 migration/postcopy-ram.h      |   2 +-
 migration/qemu-file.c         |   1 -
 migration/ram.c               |  12 +-
 migration/ram.h               |   4 +-
 migration/savevm.c            | 330 ++++++++++++++++++++++++------------------
 migration/savevm.h            |   7 +-
 migration/vmstate-types.c     |  23 +--
 migration/vmstate.c           | 191 ++++++++++++++++++------
 tests/unit/test-vmstate.c     |  28 ++--
 ui/vdagent.c                  |   5 +-
 26 files changed, 497 insertions(+), 295 deletions(-)
---
base-commit: 4e06566dbd1b1251c2788af26a30bd148d4eb6c1
change-id: 20250624-propagate_tpm_error-bf4ae6c23d30

Best regards,
-- 
Arun Menon <armenon@redhat.com>



Thread overview: 63+ messages
2025-08-05 18:25 [PATCH v9 00/27] migration: propagate vTPM errors using Error objects Arun Menon
2025-08-05 18:25 ` [PATCH v9 01/27] migration: push Error **errp into vmstate_subsection_load() Arun Menon
2025-08-05 18:25 ` [PATCH v9 02/27] migration: push Error **errp into vmstate_load_state() Arun Menon
2025-08-06  8:31   ` Marc-André Lureau
2025-08-06  9:47     ` Arun Menon
2025-08-10  4:47   ` Akihiko Odaki
2025-08-05 18:25 ` [PATCH v9 03/27] migration: push Error **errp into qemu_loadvm_state_header() Arun Menon
2025-08-05 18:25 ` [PATCH v9 04/27] migration: push Error **errp into vmstate_load() Arun Menon
2025-08-05 18:25 ` [PATCH v9 05/27] migration: push Error **errp into loadvm_process_command() Arun Menon
2025-08-06  8:31   ` Marc-André Lureau
2025-08-06  9:46     ` Arun Menon
2025-08-05 18:25 ` [PATCH v9 06/27] migration: push Error **errp into loadvm_handle_cmd_packaged() Arun Menon
2025-08-05 18:25 ` [PATCH v9 07/27] migration: push Error **errp into qemu_loadvm_state() Arun Menon
2025-08-06  5:17   ` Akihiko Odaki
2025-08-07  6:07     ` Arun Menon
2025-08-06  7:24   ` Marc-André Lureau
2025-08-05 18:25 ` [PATCH v9 08/27] migration: push Error **errp into qemu_load_device_state() Arun Menon
2025-08-06  7:27   ` Marc-André Lureau
2025-08-05 18:25 ` [PATCH v9 09/27] migration: push Error **errp into qemu_loadvm_state_main() Arun Menon
2025-08-06  5:14   ` Akihiko Odaki
2025-08-06  7:34   ` Marc-André Lureau
2025-08-06  7:43     ` Arun Menon
2025-08-05 18:25 ` [PATCH v9 10/27] migration: push Error **errp into qemu_loadvm_section_start_full() Arun Menon
2025-08-06  7:37   ` Marc-André Lureau
2025-08-05 18:25 ` [PATCH v9 11/27] migration: push Error **errp into qemu_loadvm_section_part_end() Arun Menon
2025-08-06  7:46   ` Marc-André Lureau
2025-08-05 18:25 ` [PATCH v9 12/27] migration: Update qemu_file_get_return_path() docs and remove dead checks Arun Menon
2025-08-08 13:07   ` Fabiano Rosas
2025-08-05 18:25 ` [PATCH v9 13/27] migration: make loadvm_postcopy_handle_resume() void Arun Menon
2025-08-08 13:08   ` Fabiano Rosas
2025-08-05 18:25 ` [PATCH v9 14/27] migration: push Error **errp into ram_postcopy_incoming_init() Arun Menon
2025-08-05 18:25 ` [PATCH v9 15/27] migration: push Error **errp into loadvm_postcopy_handle_advise() Arun Menon
2025-08-05 18:25 ` [PATCH v9 16/27] migration: push Error **errp into loadvm_postcopy_handle_listen() Arun Menon
2025-08-05 18:25 ` [PATCH v9 17/27] migration: push Error **errp into loadvm_postcopy_handle_run() Arun Menon
2025-08-05 18:25 ` [PATCH v9 18/27] migration: push Error **errp into loadvm_postcopy_ram_handle_discard() Arun Menon
2025-08-06  7:54   ` Marc-André Lureau
2025-08-07  6:06     ` Arun Menon
2025-08-05 18:25 ` [PATCH v9 19/27] migration: push Error **errp into loadvm_handle_recv_bitmap() Arun Menon
2025-08-05 18:25 ` [PATCH v9 20/27] migration: push Error **errp into loadvm_process_enable_colo() Arun Menon
2025-08-06  8:07   ` Marc-André Lureau
2025-08-06  9:56     ` Arun Menon
2025-08-06 10:04       ` Marc-André Lureau
2025-08-06 10:19         ` Arun Menon
2025-08-05 18:25 ` [PATCH v9 21/27] migration: push Error **errp into loadvm_postcopy_handle_switchover_start() Arun Menon
2025-08-05 18:25 ` [PATCH v9 22/27] migration: Capture error in postcopy_ram_listen_thread() Arun Menon
2025-08-08 13:27   ` Fabiano Rosas
2025-08-05 18:25 ` [PATCH v9 23/27] migration: Refactor vmstate_save_state_v() function Arun Menon
2025-08-06  5:10   ` Akihiko Odaki
2025-08-06  8:19   ` Marc-André Lureau
2025-08-06  9:45     ` Arun Menon
2025-08-05 18:25 ` [PATCH v9 24/27] migration: Propagate last encountered error in " Arun Menon
2025-08-06  5:24   ` Akihiko Odaki
2025-08-07  6:07     ` Arun Menon
2025-08-05 18:25 ` [PATCH v9 25/27] migration: Remove error variant of vmstate_save_state() function Arun Menon
2025-08-06  8:28   ` Marc-André Lureau
2025-08-06  9:46     ` Arun Menon
2025-08-06 10:01       ` Marc-André Lureau
2025-08-06 10:30         ` Arun Menon
2025-08-06 11:16     ` Arun Menon
2025-08-05 18:25 ` [PATCH v9 26/27] migration: Add error-parameterized function variants in VMSD struct Arun Menon
2025-08-06  5:45   ` Akihiko Odaki
2025-08-06  7:27     ` Arun Menon
2025-08-05 18:25 ` [PATCH v9 27/27] backends/tpm: Propagate vTPM error on migration failure Arun Menon
