[PATCH v6 00/35] Introduce support for confidential guest reset (x86)

* [PATCH v6 00/35] Introduce support for confidential guest reset (x86)
@ 2026-02-25  3:49 Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 01/35] i386/kvm: avoid installing duplicate msr entries in msr_handlers Ani Sinha
                   ` (34 more replies)
  0 siblings, 35 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  Cc: Ani Sinha, kraxel, pbonzini, ani, qemu-devel, vkuznets, graf

This series adds support for resetting/rebooting confidential guests
(SEV-ES, SEV-SNP and TDX) just like non-confidential guests. Currently, a
reboot initiated from a confidential guest terminates QEMU because the CPUs
are not resettable. Because the initial state of the guest, including private
memory, is locked and encrypted, the contents of that memory are not
accessible after reset. Hence the old KVM VM file descriptor must be closed
and a new one opened to create a new confidential VM context. All KVM VM
specific ioctls must be issued again. New VCPU file descriptors must be
created against the new KVM fd, and most VCPU ioctls must be issued again as
well.

This series closes the old KVM fd and creates a new one. After the new KVM fd
is opened, all generic and architecture specific ioctl calls are issued again.
Notifiers are added to tell subsystems that:
- The KVM VM fd is about to change, so state should be synced from KVM to
  QEMU if required.
- The KVM VM fd has changed, so ioctl calls have to be issued again on the
  new KVM fd.
- New VCPU fds have been created, so VCPU ioctl calls must be issued again
  where required.

Specific subsystems use these notifiers to re-issue ioctl calls where required.
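
The notification flow above can be sketched roughly as follows. This is a
standalone illustration and not QEMU's Notifier API: vmfd_notifier,
vmfd_event, vmfd_notifier_add(), vmfd_notify_all() and the clock_* names are
hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types mirroring the three notifications listed above. */
enum vmfd_event {
    VMFD_PRE_CHANGE,   /* VM fd is about to change: sync state from KVM */
    VMFD_POST_CHANGE,  /* VM fd has changed: re-issue VM ioctls */
    VCPU_FD_CHANGED,   /* new VCPU fds exist: re-issue VCPU ioctls */
};

struct vmfd_notifier {
    void (*notify)(struct vmfd_notifier *n, enum vmfd_event ev);
    struct vmfd_notifier *next;
};

static struct vmfd_notifier *notifier_list;

/* Register a subsystem callback (prepends to a singly linked list). */
static void vmfd_notifier_add(struct vmfd_notifier *n)
{
    assert(n && n->notify);
    n->next = notifier_list;
    notifier_list = n;
}

/* Deliver one event to every registered subsystem. */
static void vmfd_notify_all(enum vmfd_event ev)
{
    for (struct vmfd_notifier *n = notifier_list; n; n = n->next) {
        n->notify(n, ev);
    }
}

/* Example subsystem: counts how often it had to redo its setup. */
static int clock_reinit_count;

static void clock_on_event(struct vmfd_notifier *n, enum vmfd_event ev)
{
    (void)n;
    if (ev == VMFD_POST_CHANGE) {
        clock_reinit_count++;   /* stand-in for re-issuing ioctls */
    }
}

static struct vmfd_notifier clock_notifier = { .notify = clock_on_event };
```

In this sketch a subsystem registers once with vmfd_notifier_add() and reacts
only to the events it cares about; the reset path would call
vmfd_notify_all() before closing the old fd and again after the new one
exists.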

Changes are made to SEV and TDX modules to reinitialize the confidential guest
state and seal it again. Along the way, some bug fixes are made so that some
initialization functions can be called again. Some refactoring of existing
code is done so that both init and reset paths can use them.

Tested by triggering reset through guest console as well as QMP "system_reset" command.
All tests use multi vcpu (SMP) guests.

Tested on TDX hardware running with RHEL-9.6 on the host.
Used Fedora 43 guest (cloud image) + basic UKI kernel boot (fc43 kernel).

Tested on SEV-SNP on two different hosts: one running RHEL 9.8, the other Fedora 43.
Used Fedora 43 guest (cloud image) + basic UKI kernel boot (fc43 kernel).

Tested SEV-ES on a SEV-SNP host running Fedora 42.
Basic UKI kernel boot only (fc39 and fc43 kernels).
Note that Fedora 43 and RHEL hosts seem to no longer support SEV-ES and
bail out with the following error:

qemu-system-x86_64: -accel kvm: sev_launch_start: LAUNCH_START ret=1 fw_error=21 'Feature not supported'
qemu-system-x86_64: -accel kvm: sev_common_kvm_init: failed to create encryption context
qemu-system-x86_64: -accel kvm: failed to initialize kvm: Operation not permitted

For non-coco guests, ran the added functional tests on an fc43 host:

$ ./build/run tests/functional/x86_64/test_vmfd_change_reboot.py 
TAP version 13
ok 1 test_vmfd_change_reboot.KVMGuest.test_reset_console
ok 2 test_vmfd_change_reboot.KVMGuest.test_reset_hyperv_vmbus
ok 3 test_vmfd_change_reboot.KVMGuest.test_reset_kvmpit
ok 4 test_vmfd_change_reboot.KVMGuest.test_reset_qmp
ok 5 test_vmfd_change_reboot.KVMGuest.test_reset_xen_emulation
1..5

For regression, ran QEMU CI pipeline and it passes:
https://gitlab.com/anisinha/qemu/-/pipelines/2346276955

CC: qemu-devel@nongnu.org
CC: pbonzini@redhat.com
CC: kraxel@redhat.com
CC: vkuznets@redhat.com
CC: graf@amazon.com

Changelog:

v6:
 - fixed regression issue reported by Cleber on v4/v5.
 - reworked patch #28 as per review comments from v5.
 - tags added.
 - rebased.

v5:
 - suggestions from v4 added.
 - xen code refactoring separated into a new patch.
 - minor fixes.
 - tags added.
 - rebased.

v4:
 - Fixed reset on non-coco with qmp "system_reset" command.
 - Numerous misc fixes.
 - addressed review comments from v3.
 - dropped three patches that are not required.
 - Added more functional tests including one vmbus test.
 - added noop callbacks to stubs/kvm removing them from arch specific headers.
 - Tags added.
 - Rebased.

v3:
 - Combined pre and post vmfd change notifier into one.
 - rename kvm_arch_vmfd_change_ops() -> kvm_arch_on_vmfd_change()
 - reuse kvm_arch_init() code in kvm_arch_on_vmfd_change()
 - moved around migration blockers and notifiers to a more appropriate place.
 - fixed Xen emulation.
 - fixed SEV-ES reset.
 - fixed/reorganized reset code in system/runstate.c
 - can_rebuild_guest_state is now a boolean not a callback.
 - misc fixes.
 - added a functional test for Xen emulation with vmfd change.
 - rebased.

v2:
 - Bugfixes.
 - Added a new machine option so that we can exercise most of the non-coco changes
   related to reboot on non-coco platforms.
 - added a new functional test. Currently it is skipped on the CI pipeline as KVM is not
   enabled (no /dev/kvm in the container) for QEMU CI tests. It can be run manually and it
   passes on those systems where KVM is enabled.
 - Addressed comments from v1 with regard to refactoring of code, code simplification by
   removal of redundant stuff, moved around code
   so that notifiers, migration blockers are added only in one place.
 - Added some tracepoints for future debugging on newly added functions.
 - Rebased.


Ani Sinha (35):
  i386/kvm: avoid installing duplicate msr entries in msr_handlers
  accel/kvm: add confidential class member to indicate guest rebuild
    capability
  hw/accel: add a per-accelerator callback to change VM accelerator
    handle
  system/physmem: add helper to reattach existing memory after KVM VM fd
    change
  accel/kvm: add changes required to support KVM VM file descriptor
    change
  accel/kvm: mark guest state as unprotected after vm file descriptor
    change
  accel/kvm: add a notifier to indicate KVM VM file descriptor has
    changed
  accel/kvm: notify when KVM VM file fd is about to be changed
  i386/kvm: unregister smram listeners prior to vm file descriptor
    change
  kvm/i386: implement architecture support for kvm file descriptor
    change
  i386/kvm: refactor xen init into a new function
  hw/i386: refactor x86_bios_rom_init for reuse in confidential guest
    reset
  hw/i386: export a new function x86_bios_rom_reload
  kvm/i386: reload firmware for confidential guest reset
  accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon
    reset
  i386/tdx: refactor TDX firmware memory initialization code into a new
    function
  i386/tdx: finalize TDX guest state upon reset
  i386/tdx: add a pre-vmfd change notifier to reset tdx state
  i386/sev: add migration blockers only once
  i386/sev: add notifiers only once
  i386/sev: free existing launch update data and kernel hashes data on
    init
  i386/sev: add support for confidential guest reset
  hw/vfio: generate new file fd for pseudo device and rebind existing
    descriptors
  kvm/i8254: refactor pit initialization into a helper
  kvm/i8254: add support for confidential guest reset
  kvm/hyperv: add synic feature to CPU only if its not enabled
  hw/hyperv/vmbus: add support for confidential guest reset
  kvm/xen-emu: re-initialize capabilities during confidential guest
    reset
  ppc/openpic: create a new openpic device and reattach mem region on
    coco reset
  kvm/vcpu: add notifiers to inform vcpu file descriptor change
  kvm/clock: add support for confidential guest reset
  hw/machine: introduce machine specific option 'x-change-vmfd-on-reset'
  tests/functional/x86_64: add functional test to exercise vm fd change
    on reset
  qom: add 'confidential-guest-reset' property for x86 confidential vms
  migration: return EEXIST when trying to add the same migration blocker

 MAINTAINERS                                  |   7 +
 accel/kvm/kvm-all.c                          | 371 ++++++++++++++++---
 accel/kvm/trace-events                       |   2 +
 accel/stubs/kvm-stub.c                       |  18 +
 hw/core/machine.c                            |  22 ++
 hw/hyperv/trace-events                       |   1 +
 hw/hyperv/vmbus.c                            |  37 ++
 hw/i386/kvm/clock.c                          |  59 +++
 hw/i386/kvm/i8254.c                          |  91 +++--
 hw/i386/kvm/trace-events                     |   1 +
 hw/i386/x86-common.c                         |  71 +++-
 hw/intc/openpic_kvm.c                        | 112 ++++--
 hw/vfio/helpers.c                            |  92 +++++
 include/accel/accel-ops.h                    |   2 +
 include/hw/core/boards.h                     |   6 +
 include/hw/i386/x86.h                        |   1 +
 include/system/confidential-guest-support.h  |  20 +
 include/system/kvm.h                         |  43 +++
 include/system/physmem.h                     |   1 +
 migration/migration.c                        |   4 +
 qapi/qom.json                                |  16 +-
 stubs/kvm.c                                  |  22 ++
 stubs/meson.build                            |   1 +
 system/physmem.c                             |  28 ++
 system/runstate.c                            |  44 ++-
 target/i386/kvm/kvm.c                        | 136 +++++--
 target/i386/kvm/tdx.c                        | 141 +++++--
 target/i386/kvm/tdx.h                        |   1 +
 target/i386/kvm/trace-events                 |   4 +
 target/i386/kvm/xen-emu.c                    |  38 +-
 target/i386/sev.c                            | 127 +++++--
 target/i386/trace-events                     |   1 +
 tests/functional/x86_64/meson.build          |   1 +
 tests/functional/x86_64/test_rebuild_vmfd.py | 136 +++++++
 34 files changed, 1438 insertions(+), 219 deletions(-)
 create mode 100644 stubs/kvm.c
 create mode 100755 tests/functional/x86_64/test_rebuild_vmfd.py

-- 
2.42.0



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v6 01/35] i386/kvm: avoid installing duplicate msr entries in msr_handlers
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 02/35] accel/kvm: add confidential class member to indicate guest rebuild capability Ani Sinha
                   ` (33 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

kvm_filter_msr() does not check if an msr entry is already present in the
msr_handlers table and installs a new handler unconditionally. If the function
is called again with the same MSR, it will result in duplicate entries in the
table and multiple such calls will fill up the table needlessly. Fix that.
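
The idea of the fix can be shown in isolation with a small fixed-size table.
This is only a sketch under assumptions: MAX_HANDLERS, handler_entry,
table_add_once() and table_used() are hypothetical names, and the step that
re-installs the KVM MSR filters after the table update is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HANDLERS 4   /* hypothetical table size for the example */

struct handler_entry {
    uint32_t msr;        /* 0 marks a free slot */
    int tag;             /* stand-in for the rdmsr/wrmsr callbacks */
};

static struct handler_entry table[MAX_HANDLERS];

/*
 * Returns 0 if msr is present after the call (newly added or already
 * there), -EINVAL if the table is full.
 */
static int table_add_once(uint32_t msr, int tag)
{
    size_t i;

    assert(msr != 0);    /* 0 is reserved as the free-slot marker */
    for (i = 0; i < MAX_HANDLERS; i++) {
        if (table[i].msr == msr) {
            return 0;    /* duplicate: keep the existing slot */
        }
        if (table[i].msr == 0) {
            table[i] = (struct handler_entry) { .msr = msr, .tag = tag };
            return 0;
        }
    }
    return -EINVAL;
}

/* Number of occupied slots. */
static size_t table_used(void)
{
    size_t n = 0;

    for (size_t i = 0; i < MAX_HANDLERS; i++) {
        n += table[i].msr != 0;
    }
    return n;
}
```

Calling table_add_once() repeatedly with the same MSR leaves table_used()
unchanged, which is the property the init path needs once it can run more
than once.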

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 target/i386/kvm/kvm.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9f1a4d4cbb..6d823a7991 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -6278,27 +6278,33 @@ static int kvm_install_msr_filters(KVMState *s)
 static int kvm_filter_msr(KVMState *s, uint32_t msr, QEMURDMSRHandler *rdmsr,
                           QEMUWRMSRHandler *wrmsr)
 {
-    int i, ret;
+    int i, ret = 0;
 
     for (i = 0; i < ARRAY_SIZE(msr_handlers); i++) {
-        if (!msr_handlers[i].msr) {
+        if (msr_handlers[i].msr == msr) {
+            break;
+        } else if (!msr_handlers[i].msr) {
             msr_handlers[i] = (KVMMSRHandlers) {
                 .msr = msr,
                 .rdmsr = rdmsr,
                 .wrmsr = wrmsr,
             };
+            break;
+        }
+    }
 
-            ret = kvm_install_msr_filters(s);
-            if (ret) {
-                msr_handlers[i] = (KVMMSRHandlers) { };
-                return ret;
-            }
+    if (i == ARRAY_SIZE(msr_handlers)) {
+        ret = -EINVAL;
+        goto end;
+    }
 
-            return 0;
-        }
+    ret = kvm_install_msr_filters(s);
+    if (ret) {
+        msr_handlers[i] = (KVMMSRHandlers) { };
     }
 
-    return -EINVAL;
+ end:
+    return ret;
 }
 
 static int kvm_handle_rdmsr(X86CPU *cpu, struct kvm_run *run)
-- 
2.42.0



* [PATCH v6 02/35] accel/kvm: add confidential class member to indicate guest rebuild capability
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 01/35] i386/kvm: avoid installing duplicate msr entries in msr_handlers Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 03/35] hw/accel: add a per-accelerator callback to change VM accelerator handle Ani Sinha
                   ` (32 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Zhao Liu
  Cc: Ani Sinha, kraxel, ani, qemu-devel, kvm

As a part of the confidential guest reset process, the existing encrypted guest
state must be made mutable since it would be discarded after reset. A new
encrypted and locked guest state must be established after the reset. To this
end, a new boolean member per confidential guest support class
(e.g., tdx or sev-snp) is added that indicates whether it is possible to
rebuild guest state:

bool can_rebuild_guest_state;

This is true if rebuilding guest state is possible, false otherwise.
A KVM based confidential guest reset is only possible when
the existing state is locked but it is possible to rebuild guest state.
Otherwise, the guest is not resettable.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 include/system/confidential-guest-support.h | 20 ++++++++++++++++++++
 system/runstate.c                           |  6 +++---
 target/i386/kvm/tdx.c                       |  1 +
 target/i386/sev.c                           |  1 +
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/include/system/confidential-guest-support.h b/include/system/confidential-guest-support.h
index 0cc8b26e64..5dca717308 100644
--- a/include/system/confidential-guest-support.h
+++ b/include/system/confidential-guest-support.h
@@ -152,6 +152,11 @@ typedef struct ConfidentialGuestSupportClass {
      */
     int (*get_mem_map_entry)(int index, ConfidentialGuestMemoryMapEntry *entry,
                              Error **errp);
+
+    /*
+     * is it possible to rebuild the guest state?
+     */
+    bool can_rebuild_guest_state;
 } ConfidentialGuestSupportClass;
 
 static inline int confidential_guest_kvm_init(ConfidentialGuestSupport *cgs,
@@ -167,6 +172,21 @@ static inline int confidential_guest_kvm_init(ConfidentialGuestSupport *cgs,
     return 0;
 }
 
+static inline bool
+confidential_guest_can_rebuild_state(ConfidentialGuestSupport *cgs)
+{
+    ConfidentialGuestSupportClass *klass;
+
+    if (!cgs) {
+        /* non-confidential guests */
+        return true;
+    }
+
+    klass = CONFIDENTIAL_GUEST_SUPPORT_GET_CLASS(cgs);
+    return klass->can_rebuild_guest_state;
+
+}
+
 static inline int confidential_guest_kvm_reset(ConfidentialGuestSupport *cgs,
                                                Error **errp)
 {
diff --git a/system/runstate.c b/system/runstate.c
index d091a2bddd..13f32bed8c 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -57,6 +57,7 @@
 #include "system/reset.h"
 #include "system/runstate.h"
 #include "system/runstate-action.h"
+#include "system/confidential-guest-support.h"
 #include "system/system.h"
 #include "system/tpm.h"
 #include "trace.h"
@@ -543,8 +544,6 @@ void qemu_system_reset(ShutdownCause reason)
      */
     if (cpus_are_resettable()) {
         cpu_synchronize_all_post_reset();
-    } else {
-        assert(runstate_check(RUN_STATE_PRELAUNCH));
     }
 
     vm_set_suspended(false);
@@ -697,7 +696,8 @@ void qemu_system_reset_request(ShutdownCause reason)
     if (reboot_action == REBOOT_ACTION_SHUTDOWN &&
         reason != SHUTDOWN_CAUSE_SUBSYSTEM_RESET) {
         shutdown_requested = reason;
-    } else if (!cpus_are_resettable()) {
+    } else if (!cpus_are_resettable() &&
+               !confidential_guest_can_rebuild_state(current_machine->cgs)) {
         error_report("cpus are not resettable, terminating");
         shutdown_requested = reason;
     } else {
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 0161985768..a3e81e1c0c 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -1543,6 +1543,7 @@ static void tdx_guest_class_init(ObjectClass *oc, const void *data)
     X86ConfidentialGuestClass *x86_klass = X86_CONFIDENTIAL_GUEST_CLASS(oc);
 
     klass->kvm_init = tdx_kvm_init;
+    klass->can_rebuild_guest_state = true;
     x86_klass->kvm_type = tdx_kvm_type;
     x86_klass->cpu_instance_init = tdx_cpu_instance_init;
     x86_klass->adjust_cpuid_features = tdx_adjust_cpuid_features;
diff --git a/target/i386/sev.c b/target/i386/sev.c
index acdcb9c4e6..66e38ca32e 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -2760,6 +2760,7 @@ sev_common_instance_init(Object *obj)
     cgs->set_guest_state = cgs_set_guest_state;
     cgs->get_mem_map_entry = cgs_get_mem_map_entry;
     cgs->set_guest_policy = cgs_set_guest_policy;
+    cgs->can_rebuild_guest_state = true;
 
     QTAILQ_INIT(&sev_common->launch_vmsa);
 }
-- 
2.42.0



* [PATCH v6 03/35] hw/accel: add a per-accelerator callback to change VM accelerator handle
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 01/35] i386/kvm: avoid installing duplicate msr entries in msr_handlers Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 02/35] accel/kvm: add confidential class member to indicate guest rebuild capability Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 04/35] system/physmem: add helper to reattach existing memory after KVM VM fd change Ani Sinha
                   ` (31 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Richard Henderson, Paolo Bonzini, Philippe Mathieu-Daudé
  Cc: Ani Sinha, kraxel, ani, qemu-devel

When a confidential virtual machine is reset, a new guest context in the
accelerator must be generated post reset. Therefore, the old accelerator guest
file handle must be closed and a new one created. To this end, a per-accelerator
callback, "rebuild_guest", is introduced that is called when a confidential
guest is reset. Subsequent patches will introduce a specific implementation of
this callback for the KVM accelerator.
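
The optional-callback pattern described here can be sketched as below. This
is a standalone illustration, not the QEMU code: struct accel_class is a
trimmed-down hypothetical stand-in for AccelClass, and try_rebuild_guest()
and the fake_rebuild_* functions exist only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down accelerator class with an optional hook. */
struct accel_class {
    const char *name;
    int (*rebuild_guest)(void *machine);   /* NULL if unsupported */
};

/*
 * Try to rebuild the guest context on reset.
 * Returns true only if the hook exists and succeeded; *err holds 0 or a
 * negative errno value.
 */
static bool try_rebuild_guest(const struct accel_class *ac, void *machine,
                              int *err)
{
    assert(ac && err);
    *err = 0;
    if (!ac->rebuild_guest) {
        *err = -ENOTSUP;    /* accelerator did not provide the hook */
        return false;
    }
    *err = ac->rebuild_guest(machine);
    return *err >= 0;
}

/* Fake hooks standing in for a real accelerator implementation. */
static int fake_rebuild_ok(void *machine)
{
    (void)machine;
    return 0;
}

static int fake_rebuild_fail(void *machine)
{
    (void)machine;
    return -EIO;
}
```

An accelerator that leaves the hook NULL is reported as unsupported, while a
failing hook surfaces its error code so the caller can decide whether to stop
the VM.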

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 include/accel/accel-ops.h |  2 ++
 system/runstate.c         | 38 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/include/accel/accel-ops.h b/include/accel/accel-ops.h
index 23a8c246e1..f46492e3fe 100644
--- a/include/accel/accel-ops.h
+++ b/include/accel/accel-ops.h
@@ -23,6 +23,8 @@ struct AccelClass {
     AccelOpsClass *ops;
 
     int (*init_machine)(AccelState *as, MachineState *ms);
+    /* used mainly by confidential guests to rebuild guest state upon reset */
+    int (*rebuild_guest)(MachineState *ms);
     bool (*cpu_common_realize)(CPUState *cpu, Error **errp);
     void (*cpu_common_unrealize)(CPUState *cpu);
     /* get_stats: Append statistics to @buf */
diff --git a/system/runstate.c b/system/runstate.c
index 13f32bed8c..e7b50e6a3b 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -42,6 +42,7 @@
 #include "qapi/qapi-commands-run-state.h"
 #include "qapi/qapi-events-run-state.h"
 #include "qemu/accel.h"
+#include "accel/accel-ops.h"
 #include "qemu/error-report.h"
 #include "qemu/job.h"
 #include "qemu/log.h"
@@ -509,6 +510,9 @@ void qemu_system_reset(ShutdownCause reason)
 {
     MachineClass *mc;
     ResetType type;
+    AccelClass *ac = ACCEL_GET_CLASS(current_accel());
+    bool guest_state_rebuilt = false;
+    int ret;
 
     mc = current_machine ? MACHINE_GET_CLASS(current_machine) : NULL;
 
@@ -521,6 +525,29 @@ void qemu_system_reset(ShutdownCause reason)
     default:
         type = RESET_TYPE_COLD;
     }
+
+    if (!cpus_are_resettable() &&
+        (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
+         reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
+        if (ac->rebuild_guest) {
+            ret = ac->rebuild_guest(current_machine);
+            if (ret < 0) {
+                error_report("unable to rebuild guest: %s(%d)",
+                             strerror(-ret), ret);
+                vm_stop(RUN_STATE_INTERNAL_ERROR);
+            } else {
+                info_report("virtual machine state has been rebuilt with new "
+                            "guest file handle.");
+                guest_state_rebuilt = true;
+            }
+        } else if (!cpus_are_resettable())  {
+            error_report("accelerator does not support reset!");
+        } else {
+            error_report("accelerator does not support rebuilding guest state,"
+                         " proceeding with normal reset!");
+        }
+    }
+
     if (mc && mc->reset) {
         mc->reset(current_machine, type);
     } else {
@@ -543,7 +570,16 @@ void qemu_system_reset(ShutdownCause reason)
      * it does _more_  than cpu_synchronize_all_post_reset().
      */
     if (cpus_are_resettable()) {
-        cpu_synchronize_all_post_reset();
+        if (guest_state_rebuilt) {
+            /*
+             * If guest state has been rebuilt, then we
+             * need to sync full cpu state for non confidential guests post
+             * reset.
+             */
+            cpu_synchronize_all_post_init();
+        } else {
+            cpu_synchronize_all_post_reset();
+        }
     }
 
     vm_set_suspended(false);
-- 
2.42.0




* [PATCH v6 04/35] system/physmem: add helper to reattach existing memory after KVM VM fd change
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (2 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 03/35] hw/accel: add a per-accelerator callback to change VM accelerator handle Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 05/35] accel/kvm: add changes required to support KVM VM file descriptor change Ani Sinha
                   ` (30 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé
  Cc: Ani Sinha, kraxel, ani, qemu-devel

After the guest KVM file descriptor has changed as part of the confidential
guest reset mechanism, existing memory needs to be reattached to the new file
descriptor. This change adds a helper function, ram_block_rebind(), for this
purpose. The next patch will make use of this function.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 include/system/physmem.h |  1 +
 system/physmem.c         | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/include/system/physmem.h b/include/system/physmem.h
index 7bb7d3e154..da91b77bd9 100644
--- a/include/system/physmem.h
+++ b/include/system/physmem.h
@@ -51,5 +51,6 @@ physical_memory_snapshot_and_clear_dirty(MemoryRegion *mr, hwaddr offset,
 bool physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap,
                                         ram_addr_t start,
                                         ram_addr_t length);
+int ram_block_rebind(Error **errp);
 
 #endif
diff --git a/system/physmem.c b/system/physmem.c
index 2fb0c25c93..e5ff26acec 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2826,6 +2826,34 @@ found:
     return block;
 }
 
+/*
+ * Creates new guest memfd for the ramblocks and closes the
+ * existing memfd.
+ */
+int ram_block_rebind(Error **errp)
+{
+    RAMBlock *block;
+
+    qemu_mutex_lock_ramlist();
+
+    RAMBLOCK_FOREACH(block) {
+        if (block->flags & RAM_GUEST_MEMFD) {
+            if (block->guest_memfd >= 0) {
+                close(block->guest_memfd);
+            }
+            block->guest_memfd = kvm_create_guest_memfd(block->max_length,
+                                                        0, errp);
+            if (block->guest_memfd < 0) {
+                qemu_mutex_unlock_ramlist();
+                return -1;
+            }
+
+        }
+    }
+    qemu_mutex_unlock_ramlist();
+    return 0;
+}
+
 /*
  * Finds the named RAMBlock
  *
-- 
2.42.0




* [PATCH v6 05/35] accel/kvm: add changes required to support KVM VM file descriptor change
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (3 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 04/35] system/physmem: add helper to reattach existing memory after KVM VM fd change Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 06/35] accel/kvm: mark guest state as unprotected after vm " Ani Sinha
                   ` (29 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Ani Sinha, Marcelo Tosatti; +Cc: kraxel, ani, qemu-devel, kvm

This change adds common KVM support for handling a change of the KVM VM file
descriptor. The KVM VM file descriptor can change as part of the confidential
guest reset mechanism. A new per-architecture function,
kvm_arch_on_vmfd_change(), is added to implement the architecture specific
changes required to support it. A subsequent patch will add the x86 specific
implementation of kvm_arch_on_vmfd_change(), as currently only x86 supports
confidential guest reset.
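
The guard-then-call pattern around the architecture hook can be sketched as
follows. This is an illustration under assumptions: the patch selects the
hook implementation at link time (stub versus target code), whereas this
example uses a hypothetical struct arch_hooks with function pointers so that
both variants fit in one file. The stub_*, x86_* and reset_vmfd() names are
made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Default (stub) hooks for architectures without support. */
static bool stub_supports_vmfd_change(void)
{
    return false;
}

static int stub_on_vmfd_change(void)
{
    abort();    /* must never be reached: callers check support first */
}

/* Hypothetical hooks for an architecture that does support the change. */
static int x86_calls;

static bool x86_supports_vmfd_change(void)
{
    return true;
}

static int x86_on_vmfd_change(void)
{
    x86_calls++;    /* stand-in for redoing arch specific VM setup */
    return 0;
}

struct arch_hooks {
    bool (*supports_vmfd_change)(void);
    int (*on_vmfd_change)(void);
};

/* Generic reset path: bail out early if the architecture lacks support. */
static int reset_vmfd(const struct arch_hooks *arch)
{
    assert(arch);
    if (!arch->supports_vmfd_change()) {
        return -EOPNOTSUPP;    /* the aborting stub is never called */
    }
    return arch->on_vmfd_change();
}
```

Because the support check comes first, an aborting stub is safe: generic code
returns an error before it could ever reach the unimplemented hook.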

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 MAINTAINERS            |  6 +++
 accel/kvm/kvm-all.c    | 88 ++++++++++++++++++++++++++++++++++++++++--
 accel/kvm/trace-events |  1 +
 include/system/kvm.h   |  3 ++
 stubs/kvm.c            | 22 +++++++++++
 stubs/meson.build      |  1 +
 target/i386/kvm/kvm.c  | 10 +++++
 7 files changed, 128 insertions(+), 3 deletions(-)
 create mode 100644 stubs/kvm.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 233d2a5e71..6377ff5898 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -152,6 +152,12 @@ F: tools/i386/
 F: tests/functional/i386/
 F: tests/functional/x86_64/
 
+X86 VM file descriptor change on reset test
+M: Ani Sinha <anisinha@redhat.com>
+M: Paolo Bonzini <pbonzini@redhat.com>
+S: Maintained
+F: stubs/kvm.c
+
 Guest CPU cores (TCG)
 ---------------------
 Overall TCG CPUs
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0d8b0c4347..cc5c42ce4d 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2415,11 +2415,9 @@ void kvm_irqchip_set_qemuirq_gsi(KVMState *s, qemu_irq irq, int gsi)
     g_hash_table_insert(s->gsimap, irq, GINT_TO_POINTER(gsi));
 }
 
-static void kvm_irqchip_create(KVMState *s)
+static void do_kvm_irqchip_create(KVMState *s)
 {
     int ret;
-
-    assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
     if (kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
         ;
     } else if (kvm_check_extension(s, KVM_CAP_S390_IRQCHIP)) {
@@ -2452,7 +2450,13 @@ static void kvm_irqchip_create(KVMState *s)
         fprintf(stderr, "Create kernel irqchip failed: %s\n", strerror(-ret));
         exit(1);
     }
+}
+
+static void kvm_irqchip_create(KVMState *s)
+{
+    assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
 
+    do_kvm_irqchip_create(s);
     kvm_kernel_irqchip = true;
     /* If we have an in-kernel IRQ chip then we must have asynchronous
      * interrupt delivery (though the reverse is not necessarily true)
@@ -2607,6 +2611,83 @@ static int kvm_setup_dirty_ring(KVMState *s)
     return 0;
 }
 
+static int kvm_reset_vmfd(MachineState *ms)
+{
+    KVMState *s;
+    KVMMemoryListener *kml;
+    int ret = 0, type;
+    Error *err = NULL;
+
+    /*
+     * bail if the current architecture does not support VM file
+     * descriptor change.
+     */
+    if (!kvm_arch_supports_vmfd_change()) {
+        error_report("This target architecture does not support KVM VM "
+                     "file descriptor change.");
+        return -EOPNOTSUPP;
+    }
+
+    s = KVM_STATE(ms->accelerator);
+    kml = &s->memory_listener;
+
+    memory_listener_unregister(&kml->listener);
+    memory_listener_unregister(&kvm_io_listener);
+
+    if (s->vmfd >= 0) {
+        close(s->vmfd);
+    }
+
+    type = find_kvm_machine_type(ms);
+    if (type < 0) {
+        return -EINVAL;
+    }
+
+    ret = do_kvm_create_vm(s, type);
+    if (ret < 0) {
+        return ret;
+    }
+
+    s->vmfd = ret;
+
+    kvm_setup_dirty_ring(s);
+
+    /* rebind memory to new vm fd */
+    ret = ram_block_rebind(&err);
+    if (ret < 0) {
+        return ret;
+    }
+    assert(!err);
+
+    ret = kvm_arch_on_vmfd_change(ms, s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (s->kernel_irqchip_allowed) {
+        do_kvm_irqchip_create(s);
+    }
+
+    /* these can be only called after ram_block_rebind() */
+    memory_listener_register(&kml->listener, &address_space_memory);
+    memory_listener_register(&kvm_io_listener, &address_space_io);
+
+    /*
+     * kvm fd has changed. Commit the irq routes to KVM once more.
+     */
+    kvm_irqchip_commit_routes(s);
+    /*
+     * for confidential guest, this is the last possible place where we
+     * can call synchronize_all_post_init() to sync all vcpu states to
+     * kvm.
+     */
+    if (ms->cgs) {
+        cpu_synchronize_all_post_init();
+    }
+    trace_kvm_reset_vmfd();
+    return ret;
+}
+
 static int kvm_init(AccelState *as, MachineState *ms)
 {
     MachineClass *mc = MACHINE_GET_CLASS(ms);
@@ -4015,6 +4096,7 @@ static void kvm_accel_class_init(ObjectClass *oc, const void *data)
     AccelClass *ac = ACCEL_CLASS(oc);
     ac->name = "KVM";
     ac->init_machine = kvm_init;
+    ac->rebuild_guest = kvm_reset_vmfd;
     ac->has_memory = kvm_accel_has_memory;
     ac->allowed = &kvm_allowed;
     ac->gdbstub_supported_sstep_flags = kvm_gdbstub_sstep_flags;
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index e43d18a869..e4beda0148 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -14,6 +14,7 @@ kvm_destroy_vcpu(int cpu_index, unsigned long arch_cpu_id) "index: %d id: %lu"
 kvm_park_vcpu(int cpu_index, unsigned long arch_cpu_id) "index: %d id: %lu"
 kvm_unpark_vcpu(unsigned long arch_cpu_id, const char *msg) "id: %lu %s"
 kvm_irqchip_commit_routes(void) ""
+kvm_reset_vmfd(void) ""
 kvm_irqchip_add_msi_route(char *name, int vector, int virq) "dev %s vector %d virq %d"
 kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
 kvm_irqchip_release_virq(int virq) "virq %d"
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 8f9eecf044..5fc7251fd9 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -456,6 +456,9 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram_addr,
 
 #endif /* COMPILING_PER_TARGET */
 
+bool kvm_arch_supports_vmfd_change(void);
+int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s);
+
 void kvm_cpu_synchronize_state(CPUState *cpu);
 
 void kvm_init_cpu_signals(CPUState *cpu);
diff --git a/stubs/kvm.c b/stubs/kvm.c
new file mode 100644
index 0000000000..2db61d89a7
--- /dev/null
+++ b/stubs/kvm.c
@@ -0,0 +1,22 @@
+/*
+ * kvm target arch specific stubs
+ *
+ * Copyright (c) 2026 Red Hat, Inc.
+ *
+ * Author:
+ *   Ani Sinha <anisinha@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include "qemu/osdep.h"
+#include "system/kvm.h"
+
+int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
+{
+    abort();
+}
+
+bool kvm_arch_supports_vmfd_change(void)
+{
+    return false;
+}
diff --git a/stubs/meson.build b/stubs/meson.build
index 8a07059500..6ae478bacc 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -74,6 +74,7 @@ if have_system
   if igvm.found()
     stub_ss.add(files('igvm.c'))
   endif
+  stub_ss.add(files('kvm.c'))
   stub_ss.add(files('target-get-monitor-def.c'))
   stub_ss.add(files('target-monitor-defs.c'))
   stub_ss.add(files('win32-kbd-hook.c'))
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6d823a7991..a4e18734b1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3389,6 +3389,16 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
     return 0;
 }
 
+int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
+{
+    abort();
+}
+
+bool kvm_arch_supports_vmfd_change(void)
+{
+    return false;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     int ret;
-- 
2.42.0



* [PATCH v6 06/35] accel/kvm: mark guest state as unprotected after vm file descriptor change
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (4 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 05/35] accel/kvm: add changes required to support KVM VM file descriptor change Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 07/35] accel/kvm: add a notifier to indicate KVM VM file descriptor has changed Ani Sinha
                   ` (28 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

When the KVM VM file descriptor changes and a new one is created, the guest
state is no longer protected. Mark it as such.
The guest state becomes protected again when TDX, SEV-ES or SEV-SNP marks
it as such.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 accel/kvm/kvm-all.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index cc5c42ce4d..096edb5e19 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2650,6 +2650,9 @@ static int kvm_reset_vmfd(MachineState *ms)
 
     s->vmfd = ret;
 
+    /* guest state is now unprotected again */
+    kvm_state->guest_state_protected = false;
+
     kvm_setup_dirty_ring(s);
 
     /* rebind memory to new vm fd */
-- 
2.42.0



* [PATCH v6 07/35] accel/kvm: add a notifier to indicate KVM VM file descriptor has changed
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (5 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 06/35] accel/kvm: mark guest state as unprotected after vm " Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 08/35] accel/kvm: notify when KVM VM file fd is about to be changed Ani Sinha
                   ` (27 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

Various subsystems can use a notifier callback to perform actions when the
KVM file descriptor for a virtual machine changes as part of the confidential
guest reset process. This change adds this notifier mechanism. Subsequent
patches will add the callback implementations for the subsystems that need
to take action when the KVM VM file descriptor changes.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
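Note (not part of the commit): a minimal, self-contained sketch of the
notifier-with-return pattern this patch relies on. It does not use QEMU's real
NotifierWithReturn or QLIST types; every Toy* name below is hypothetical. The
assumption modelled here is that the list walk stops at the first callback
that returns non-zero and propagates that value to the caller.

```c
#include <stddef.h>

/* Simplified stand-in for a notifier with a return value (not QEMU's type). */
typedef struct ToyNotifier ToyNotifier;
struct ToyNotifier {
    int (*notify)(ToyNotifier *n, void *data);
    ToyNotifier *next;
};

/* Argument handed to every callback, modelled on VmfdChangeNotifier. */
typedef struct ToyVmfdChange {
    int vmfd;
} ToyVmfdChange;

static ToyNotifier *toy_vmfd_notifiers;
static int toy_last_seen_vmfd = -1;

/* Register a callback at the head of the list. */
static void toy_vmfd_add_change_notifier(ToyNotifier *n)
{
    n->next = toy_vmfd_notifiers;
    toy_vmfd_notifiers = n;
}

/* Call each notifier; stop at the first non-zero return and propagate it. */
static int toy_vmfd_change_notify(ToyVmfdChange *change)
{
    for (ToyNotifier *n = toy_vmfd_notifiers; n; n = n->next) {
        int ret = n->notify(n, change);
        if (ret != 0) {
            return ret;
        }
    }
    return 0;
}

/* Example subscriber: remembers the new VM fd. */
static int toy_record_vmfd(ToyNotifier *n, void *data)
{
    (void)n;
    toy_last_seen_vmfd = ((ToyVmfdChange *)data)->vmfd;
    return 0;
}

/* Example subscriber that always reports failure. */
static int toy_always_fail(ToyNotifier *n, void *data)
{
    (void)n;
    (void)data;
    return -1;
}
```

With this shape, a failing subscriber makes the whole notification fail, which
is why kvm_reset_vmfd() checks the return value before continuing.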
 accel/kvm/kvm-all.c    | 30 ++++++++++++++++++++++++++++++
 accel/stubs/kvm-stub.c |  8 ++++++++
 include/system/kvm.h   | 21 +++++++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 096edb5e19..3b57d2f976 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -90,6 +90,7 @@ struct KVMParkedVcpu {
 };
 
 KVMState *kvm_state;
+VmfdChangeNotifier vmfd_notifier;
 bool kvm_kernel_irqchip;
 bool kvm_split_irqchip;
 bool kvm_async_interrupts_allowed;
@@ -123,6 +124,9 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
 static NotifierList kvm_irqchip_change_notifiers =
     NOTIFIER_LIST_INITIALIZER(kvm_irqchip_change_notifiers);
 
+static NotifierWithReturnList register_vmfd_changed_notifiers =
+    NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_changed_notifiers);
+
 struct KVMResampleFd {
     int gsi;
     EventNotifier *resample_event;
@@ -2173,6 +2177,22 @@ void kvm_irqchip_change_notify(void)
     notifier_list_notify(&kvm_irqchip_change_notifiers, NULL);
 }
 
+void kvm_vmfd_add_change_notifier(NotifierWithReturn *n)
+{
+    notifier_with_return_list_add(&register_vmfd_changed_notifiers, n);
+}
+
+void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n)
+{
+    notifier_with_return_remove(n);
+}
+
+static int kvm_vmfd_change_notify(Error **errp)
+{
+    return notifier_with_return_list_notify(&register_vmfd_changed_notifiers,
+                                            &vmfd_notifier, errp);
+}
+
 int kvm_irqchip_get_virq(KVMState *s)
 {
     int next_virq;
@@ -2671,6 +2691,16 @@ static int kvm_reset_vmfd(MachineState *ms)
         do_kvm_irqchip_create(s);
     }
 
+    /*
+     * notify everyone that vmfd has changed.
+     */
+    vmfd_notifier.vmfd = s->vmfd;
+    ret = kvm_vmfd_change_notify(&err);
+    if (ret < 0) {
+        return ret;
+    }
+    assert(!err);
+
     /* these can be only called after ram_block_rebind() */
     memory_listener_register(&kml->listener, &address_space_memory);
     memory_listener_register(&kvm_io_listener, &address_space_io);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 68cd33ba97..a6e8a6e16c 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -79,6 +79,14 @@ void kvm_irqchip_change_notify(void)
 {
 }
 
+void kvm_vmfd_add_change_notifier(NotifierWithReturn *n)
+{
+}
+
+void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n)
+{
+}
+
 int kvm_irqchip_add_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
                                        EventNotifier *rn, int virq)
 {
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 5fc7251fd9..f11729f432 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -181,6 +181,7 @@ DECLARE_INSTANCE_CHECKER(KVMState, KVM_STATE,
 
 extern KVMState *kvm_state;
 typedef struct Notifier Notifier;
+typedef struct NotifierWithReturn NotifierWithReturn;
 
 typedef struct KVMRouteChange {
      KVMState *s;
@@ -567,4 +568,24 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size);
 
 int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
 
+/* argument to vmfd change notifier */
+typedef struct VmfdChangeNotifier {
+    int vmfd;
+} VmfdChangeNotifier;
+
+/**
+ * kvm_vmfd_add_change_notifier - register a notifier to get notified when
+ * a KVM vm file descriptor changes as a part of the confidential guest "reset"
+ * process. Various subsystems should use this mechanism to take actions such
+ * as creating new fds against this new vm file descriptor.
+ * @n: notifier with return value.
+ */
+void kvm_vmfd_add_change_notifier(NotifierWithReturn *n);
+/**
+ * kvm_vmfd_remove_change_notifier - de-register a notifier previously
+ * registered with kvm_vmfd_add_change_notifier call.
+ * @n: notifier that was previously registered.
+ */
+void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n);
+
 #endif
-- 
2.42.0



* [PATCH v6 08/35] accel/kvm: notify when KVM VM file fd is about to be changed
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (6 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 07/35] accel/kvm: add a notifier to indicate KVM VM file descriptor has changed Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 09/35] i386/kvm: unregister smram listeners prior to vm file descriptor change Ani Sinha
                   ` (26 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

Various subsystems might need to take some steps before the KVM file descriptor
for a virtual machine is changed. So a new boolean attribute is added to the
vmfd_notifier structure which is passed to the notifier callbacks.
vmfd_notifier.pre is true for pre-notification of vmfd change and false for
post-notification. Notifier callback implementations can simply check
the boolean value of (VmfdChangeNotifier *)->pre and take actions for pre or
post vmfd change based on the value.

Subsequent patches will add callback implementations for specific components
that need this pre-notification.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
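Note (not part of the commit): a small sketch of how a single callback can
serve both phases by branching on the pre flag. The two-field struct mirrors
VmfdChangeNotifier as extended by this patch; ToySubsystem and
toy_on_vmfd_change() are hypothetical names used only for illustration, and
the callback signature is simplified compared to QEMU's.

```c
#include <stdbool.h>

/* Mirrors the two fields of VmfdChangeNotifier after this patch. */
typedef struct ToyPhaseChange {
    int vmfd;
    bool pre;
} ToyPhaseChange;

/* Hypothetical subsystem state tied to a VM file descriptor. */
typedef struct ToySubsystem {
    bool listener_registered;
    int bound_vmfd;
} ToySubsystem;

/* One callback handles both phases by checking ->pre. */
static int toy_on_vmfd_change(ToySubsystem *sub, const ToyPhaseChange *change)
{
    if (change->pre) {
        /* Pre-notification: tear down state tied to the old VM fd. */
        sub->listener_registered = false;
        sub->bound_vmfd = -1;
        return 0;
    }

    /* Post-notification: set things up again against the new VM fd. */
    sub->bound_vmfd = change->vmfd;
    sub->listener_registered = true;
    return 0;
}
```

A subsystem that only cares about one phase can return 0 early for the other
one.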
 accel/kvm/kvm-all.c  | 9 +++++++++
 include/system/kvm.h | 6 ++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 3b57d2f976..d244156f6f 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2654,6 +2654,13 @@ static int kvm_reset_vmfd(MachineState *ms)
     memory_listener_unregister(&kml->listener);
     memory_listener_unregister(&kvm_io_listener);
 
+    vmfd_notifier.pre = true;
+    ret = kvm_vmfd_change_notify(&err);
+    if (ret < 0) {
+        return ret;
+    }
+    assert(!err);
+
     if (s->vmfd >= 0) {
         close(s->vmfd);
     }
@@ -2695,6 +2702,8 @@ static int kvm_reset_vmfd(MachineState *ms)
      * notify everyone that vmfd has changed.
      */
     vmfd_notifier.vmfd = s->vmfd;
+    vmfd_notifier.pre = false;
+
     ret = kvm_vmfd_change_notify(&err);
     if (ret < 0) {
         return ret;
diff --git a/include/system/kvm.h b/include/system/kvm.h
index f11729f432..fbe23608a1 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -571,12 +571,14 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
 /* argument to vmfd change notifier */
 typedef struct VmfdChangeNotifier {
     int vmfd;
+    bool pre;
 } VmfdChangeNotifier;
 
 /**
  * kvm_vmfd_add_change_notifier - register a notifier to get notified when
- * a KVM vm file descriptor changes as a part of the confidential guest "reset"
- * process. Various subsystems should use this mechanism to take actions such
+ * a KVM vm file descriptor changes or is about to be changed as part of the
+ * confidential guest "reset" process.
+ * Various subsystems should use this mechanism to take actions such
  * as creating new fds against this new vm file descriptor.
  * @n: notifier with return value.
  */
-- 
2.42.0



* [PATCH v6 09/35] i386/kvm: unregister smram listeners prior to vm file descriptor change
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (7 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 08/35] accel/kvm: notify when KVM VM file fd is about to be changed Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 10/35] kvm/i386: implement architecture support for kvm " Ani Sinha
                   ` (25 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

We will re-register smram listeners after the VM file descriptor has changed.
We need to unregister them first to make sure addresses and reference counters
work properly.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 target/i386/kvm/kvm.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a4e18734b1..83657fe832 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -112,6 +112,11 @@ typedef struct {
 static void kvm_init_msrs(X86CPU *cpu);
 static int kvm_filter_msr(KVMState *s, uint32_t msr, QEMURDMSRHandler *rdmsr,
                           QEMUWRMSRHandler *wrmsr);
+static int unregister_smram_listener(NotifierWithReturn *notifier,
+                                     void *data, Error **errp);
+NotifierWithReturn kvm_vmfd_change_notifier = {
+    .notify = unregister_smram_listener,
+};
 
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
     KVM_CAP_INFO(SET_TSS_ADDR),
@@ -2885,6 +2890,17 @@ static void register_smram_listener(Notifier *n, void *unused)
     }
 }
 
+static int unregister_smram_listener(NotifierWithReturn *notifier,
+                                     void *data, Error **errp)
+{
+    if (!((VmfdChangeNotifier *)data)->pre) {
+        return 0;
+    }
+
+    memory_listener_unregister(&smram_listener.listener);
+    return 0;
+}
+
 /* It should only be called in cpu's hotplug callback */
 void kvm_smm_cpu_address_space_init(X86CPU *cpu)
 {
@@ -3538,6 +3554,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     }
 
     pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+    kvm_vmfd_add_change_notifier(&kvm_vmfd_change_notifier);
 
     return 0;
 }
-- 
2.42.0



* [PATCH v6 10/35] kvm/i386: implement architecture support for kvm file descriptor change
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (8 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 09/35] i386/kvm: unregister smram listeners prior to vm file descriptor change Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 11/35] i386/kvm: refactor xen init into a new function Ani Sinha
                   ` (24 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

When the kvm file descriptor changes as a part of confidential guest reset,
some architecture-specific setup, including SEV/SEV-SNP/TDX-specific setup,
needs to be redone. These changes are implemented as a part of the
kvm_arch_on_vmfd_change() callback which was introduced previously.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
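Note (not part of the commit): the diff below guards several steps in
kvm_arch_init() with a function-local "static bool first" so that they run
only on the initial call and are skipped when the function is re-entered from
kvm_arch_on_vmfd_change(). A minimal sketch of that guard, with hypothetical
counters standing in for the real work:

```c
#include <stdbool.h>

static int toy_one_time_steps; /* e.g. registering a notifier, adding e820 */
static int toy_per_vm_steps;   /* e.g. ioctls issued against the VM fd */

/* Called once at startup and again after every VM fd change. */
static int toy_arch_init(void)
{
    static bool first = true;

    /* Work tied to the VM fd must be redone on every call. */
    toy_per_vm_steps++;

    /* Process-wide setup must not be repeated. */
    if (first) {
        toy_one_time_steps++;
    }
    first = false;

    return 0;
}
```

The trade-off of this approach is that the function carries hidden state, so
callers cannot force the one-time steps to run again.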
 target/i386/kvm/kvm.c        | 49 ++++++++++++++++++++++++++++--------
 target/i386/kvm/trace-events |  1 +
 2 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 83657fe832..8679e7d3fa 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3407,12 +3407,30 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
 
 int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
 {
-    abort();
+    int ret;
+
+    ret = kvm_arch_init(ms, s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
+        X86MachineState *x86ms = X86_MACHINE(ms);
+
+        if (x86_machine_is_smm_enabled(x86ms)) {
+            memory_listener_register(&smram_listener.listener,
+                                     &smram_address_space);
+        }
+        kvm_set_max_apic_id(x86ms->apic_id_limit);
+    }
+
+    trace_kvm_arch_on_vmfd_change();
+    return 0;
 }
 
 bool kvm_arch_supports_vmfd_change(void)
 {
-    return false;
+    return true;
 }
 
 int kvm_arch_init(MachineState *ms, KVMState *s)
@@ -3420,6 +3438,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     int ret;
     struct utsname utsname;
     Error *local_err = NULL;
+    static bool first = true;
 
     /*
      * Initialize confidential guest (SEV/TDX) context, if required
@@ -3489,16 +3508,17 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         return ret;
     }
 
-    /* Tell fw_cfg to notify the BIOS to reserve the range. */
-    e820_add_entry(KVM_IDENTITY_BASE, 0x4000, E820_RESERVED);
-
+    if (first) {
+        /* Tell fw_cfg to notify the BIOS to reserve the range. */
+        e820_add_entry(KVM_IDENTITY_BASE, 0x4000, E820_RESERVED);
+    }
     ret = kvm_vm_set_nr_mmu_pages(s);
     if (ret < 0) {
         return ret;
     }
 
     if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE) &&
-        x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
+        x86_machine_is_smm_enabled(X86_MACHINE(ms)) && first) {
         smram_machine_done.notify = register_smram_listener;
         qemu_add_machine_init_done_notifier(&smram_machine_done);
     }
@@ -3545,16 +3565,23 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
                 return ret;
             }
 
-            ret = kvm_msr_energy_thread_init(s, ms);
-            if (ret < 0) {
-                error_report("kvm : error RAPL feature requirement not met");
-                return ret;
+            if (first) {
+                ret = kvm_msr_energy_thread_init(s, ms);
+                if (ret < 0) {
+                    error_report("kvm : "
+                                 "error RAPL feature requirement not met");
+                    return ret;
+                }
             }
         }
     }
 
     pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
-    kvm_vmfd_add_change_notifier(&kvm_vmfd_change_notifier);
+
+    if (first) {
+        kvm_vmfd_add_change_notifier(&kvm_vmfd_change_notifier);
+    }
+    first = false;
 
     return 0;
 }
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index 74a6234ff7..2d213c9f9b 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -6,6 +6,7 @@ kvm_x86_add_msi_route(int virq) "Adding route entry for virq %d"
 kvm_x86_remove_msi_route(int virq) "Removing route entry for virq %d"
 kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
 kvm_hc_map_gpa_range(uint64_t gpa, uint64_t size, uint64_t attributes, uint64_t flags) "gpa 0x%" PRIx64 " size 0x%" PRIx64 " attributes 0x%" PRIx64 " flags 0x%" PRIx64
+kvm_arch_on_vmfd_change(void) ""
 
 # xen-emu.c
 kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
-- 
2.42.0



* [PATCH v6 11/35] i386/kvm: refactor xen init into a new function
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (9 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 10/35] kvm/i386: implement architecture support for kvm " Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 12/35] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset Ani Sinha
                   ` (23 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

Cosmetic - no new functionality added. Xen initialisation code is refactored
into its own function.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 target/i386/kvm/kvm.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8679e7d3fa..feb3f3cf3c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3433,6 +3433,24 @@ bool kvm_arch_supports_vmfd_change(void)
     return true;
 }
 
+static int xen_init(MachineState *ms, KVMState *s)
+{
+#ifdef CONFIG_XEN_EMU
+    int ret = 0;
+    if (!object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE)) {
+        error_report("kvm: Xen support only available in PC machine");
+        return -ENOTSUP;
+    }
+    /* hyperv_enabled() doesn't work yet. */
+    uint32_t msr = XEN_HYPERCALL_MSR;
+    ret = kvm_xen_init(s, msr);
+    return ret;
+#else
+    error_report("kvm: Xen support not enabled in qemu");
+    return -ENOTSUP;
+#endif
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     int ret;
@@ -3467,21 +3485,10 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     }
 
     if (s->xen_version) {
-#ifdef CONFIG_XEN_EMU
-        if (!object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE)) {
-            error_report("kvm: Xen support only available in PC machine");
-            return -ENOTSUP;
-        }
-        /* hyperv_enabled() doesn't work yet. */
-        uint32_t msr = XEN_HYPERCALL_MSR;
-        ret = kvm_xen_init(s, msr);
+        ret = xen_init(ms, s);
         if (ret < 0) {
             return ret;
         }
-#else
-        error_report("kvm: Xen support not enabled in qemu");
-        return -ENOTSUP;
-#endif
     }
 
     ret = kvm_get_supported_msrs(s);
-- 
2.42.0



* [PATCH v6 12/35] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (10 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 11/35] i386/kvm: refactor xen init into a new function Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 13/35] hw/i386: export a new function x86_bios_rom_reload Ani Sinha
                   ` (22 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S. Tsirkin, Marcel Apfelbaum
  Cc: Ani Sinha, kraxel, ani, qemu-devel

For confidential guests, the bios image must be reinitialized upon reset. This
is because bios memory is encrypted and hence once the old confidential
kvm context is destroyed, it cannot be decrypted. It needs to be reinitialized.
Towards that, this change refactors x86_bios_rom_init() code so that
parts of it can be called during confidential guest reset.
No functional change.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 hw/i386/x86-common.c | 50 ++++++++++++++++++++++++++++++++------------
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index de4cd7650a..c98abaf368 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -1020,17 +1020,11 @@ void x86_isa_bios_init(MemoryRegion *isa_bios, MemoryRegion *isa_memory,
     memory_region_set_readonly(isa_bios, read_only);
 }
 
-void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
-                       MemoryRegion *rom_memory, bool isapc_ram_fw)
+static int get_bios_size(X86MachineState *x86ms,
+                         const char *bios_name, char *filename)
 {
-    const char *bios_name;
-    char *filename;
     int bios_size;
-    ssize_t ret;
 
-    /* BIOS load */
-    bios_name = MACHINE(x86ms)->firmware ?: default_firmware;
-    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
     if (filename) {
         bios_size = get_image_size(filename, NULL);
     } else {
@@ -1040,6 +1034,21 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
         (bios_size % 65536) != 0) {
         goto bios_error;
     }
+
+    return bios_size;
+
+ bios_error:
+    fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
+    exit(1);
+}
+
+static void load_bios_from_file(X86MachineState *x86ms, const char *bios_name,
+                                char *filename, int bios_size,
+                                bool isapc_ram_fw)
+{
+    ssize_t ret;
+
+    /* BIOS load */
     if (machine_require_guest_memfd(MACHINE(x86ms))) {
         memory_region_init_ram_guest_memfd(&x86ms->bios, NULL, "pc.bios",
                                            bios_size, &error_fatal);
@@ -1068,7 +1077,26 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
             goto bios_error;
         }
     }
-    g_free(filename);
+
+    return;
+
+ bios_error:
+    fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
+    exit(1);
+}
+
+void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
+                       MemoryRegion *rom_memory, bool isapc_ram_fw)
+{
+    int bios_size;
+    const char *bios_name;
+    g_autofree char *filename = NULL;
+
+    bios_name = MACHINE(x86ms)->firmware ?: default_firmware;
+    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
+
+    bios_size = get_bios_size(x86ms, bios_name, filename);
+    load_bios_from_file(x86ms, bios_name, filename, bios_size, isapc_ram_fw);
 
     if (!machine_require_guest_memfd(MACHINE(x86ms))) {
         /* map the last 128KB of the BIOS in ISA space */
@@ -1081,8 +1109,4 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
                                 (uint32_t)(-bios_size),
                                 &x86ms->bios);
     return;
-
-bios_error:
-    fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
-    exit(1);
 }
-- 
2.42.0




* [PATCH v6 13/35] hw/i386: export a new function x86_bios_rom_reload
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (11 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 12/35] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 14/35] kvm/i386: reload firmware for confidential guest reset Ani Sinha
                   ` (21 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost
  Cc: Ani Sinha, kraxel, ani, Bernhard Beschow, qemu-devel

Confidential guests must reload their bios rom upon reset. This is because
bios memory is encrypted and upon reset, the contents of the old bios memory
are lost and cannot be re-used. To this end, export a new x86 function
x86_bios_rom_reload() to reload the bios again. This function will be used in
the subsequent patches.

Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
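Note (not part of the commit): a worked sketch of the two pieces of arithmetic
the reload path depends on. The multiple-of-64KiB check and the
"0x100000000ULL - bios_size" base address both appear in the diffs of this
series; the "size > 0" half of the check is an assumption, since the relevant
line is elided from the hunk. The toy_* helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A firmware image is accepted only if it is a non-empty multiple of 64 KiB. */
static bool toy_bios_size_ok(int64_t bios_size)
{
    return bios_size > 0 && (bios_size % 65536) == 0;
}

/* The image is placed so that it ends exactly at the 4 GiB boundary. */
static uint64_t toy_bios_base(uint64_t bios_size)
{
    return 0x100000000ULL - bios_size;
}
```

For example, a 256 KiB image would be placed at 0xFFFC0000 and a 2 MiB image
at 0xFFE00000.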
 hw/i386/x86-common.c  | 21 +++++++++++++++++++++
 include/hw/i386/x86.h |  1 +
 2 files changed, 22 insertions(+)

diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index c98abaf368..a420112666 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -1085,6 +1085,27 @@ static void load_bios_from_file(X86MachineState *x86ms, const char *bios_name,
     exit(1);
 }
 
+void x86_bios_rom_reload(X86MachineState *x86ms)
+{
+    int bios_size;
+    const char *bios_name;
+    g_autofree char *filename = NULL;
+
+    if (memory_region_size(&x86ms->bios) == 0) {
+        /* if -bios is not used */
+        return;
+    }
+
+    bios_name = MACHINE(x86ms)->firmware ?: "bios.bin";
+    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
+
+    bios_size = get_bios_size(x86ms, bios_name, filename);
+
+    void *ptr = memory_region_get_ram_ptr(&x86ms->bios);
+    load_image_size(filename, ptr, bios_size);
+    x86_firmware_configure(0x100000000ULL - bios_size, ptr, bios_size);
+}
+
 void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
                        MemoryRegion *rom_memory, bool isapc_ram_fw)
 {
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 23be627437..a85a5600ce 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -125,6 +125,7 @@ void x86_isa_bios_init(MemoryRegion *isa_bios, MemoryRegion *isa_memory,
                        MemoryRegion *bios, bool read_only);
 void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
                        MemoryRegion *rom_memory, bool isapc_ram_fw);
+void x86_bios_rom_reload(X86MachineState *x86ms);
 
 void x86_load_linux(X86MachineState *x86ms,
                     FWCfgState *fw_cfg,
-- 
2.42.0




* [PATCH v6 14/35] kvm/i386: reload firmware for confidential guest reset
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (12 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 13/35] hw/i386: export a new function x86_bios_rom_reload Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 15/35] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset Ani Sinha
                   ` (20 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

When IGVM is not being used by the confidential guest, the guest firmware has
to be reloaded explicitly into memory. This is because the memory into
which the firmware was loaded before reset was encrypted and is thus lost
upon reset. When IGVM is used, it is expected that the IGVM will contain the
guest firmware and the execution of the IGVM directives will set up the guest
firmware memory.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 target/i386/kvm/kvm.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index feb3f3cf3c..5c8ec77212 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3416,7 +3416,14 @@ int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
 
     if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
         X86MachineState *x86ms = X86_MACHINE(ms);
-
+        /*
+         * For confidential guests, reload bios ROM if IGVM is not specified.
+         * If an IGVM file is specified then the firmware must be provided
+         * in the IGVM file.
+         */
+        if (ms->cgs && !x86ms->igvm) {
+            x86_bios_rom_reload(x86ms);
+        }
         if (x86_machine_is_smm_enabled(x86ms)) {
             memory_listener_register(&smram_listener.listener,
                                      &smram_address_space);
-- 
2.42.0



* [PATCH v6 15/35] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (13 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 14/35] kvm/i386: reload firmware for confidential guest reset Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 16/35] i386/tdx: refactor TDX firmware memory initialization code into a new function Ani Sinha
                   ` (19 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

Confidential guests need to generate a new KVM file descriptor upon virtual
machine reset. Existing VCPUs need to be reattached to this new
KVM VM file descriptor. As a part of this, new VCPU file descriptors against
this new KVM VM file descriptor need to be created and re-initialized.
Resources allocated against the old VCPU fds need to be released. This change
makes this happen.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
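Note (not part of the commit): a toy model of the per-vCPU ordering described
above: release what was tied to the old vCPU fd, then create a new fd against
the new VM and attach it to the same vCPU. No real KVM ioctls are issued;
ToyVm, ToyVcpu and the fake fd numbers are hypothetical and only illustrate
the sequence and the early return on failure.

```c
#include <stddef.h>

typedef struct ToyVcpu {
    unsigned long id;
    int fd;          /* -1 when no fd is open */
} ToyVcpu;

typedef struct ToyVm {
    int next_fd;     /* next fake fd number to hand out */
    int closed;      /* how many old vCPU fds were released */
} ToyVm;

/* Stand-in for KVM_CREATE_VCPU; a real call could return a negative value. */
static int toy_create_vcpu(ToyVm *vm, unsigned long id)
{
    (void)id;
    return vm->next_fd++;
}

/* Rebind every existing vCPU to the new VM, keeping its id unchanged. */
static int toy_rebind_vcpus(ToyVm *vm, ToyVcpu *cpus, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].fd >= 0) {
            /* Real code would close() the fd and unmap its regions here. */
            vm->closed++;
            cpus[i].fd = -1;
        }

        int fd = toy_create_vcpu(vm, cpus[i].id);
        if (fd < 0) {
            return fd;
        }
        cpus[i].fd = fd;
    }
    return 0;
}
```

The vCPU objects themselves survive the reset; only the kernel-side handles
behind them are replaced.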
 accel/kvm/kvm-all.c    | 215 +++++++++++++++++++++++++++++++++--------
 accel/kvm/trace-events |   1 +
 2 files changed, 174 insertions(+), 42 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index d244156f6f..a347a71a2e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -127,6 +127,10 @@ static NotifierList kvm_irqchip_change_notifiers =
 static NotifierWithReturnList register_vmfd_changed_notifiers =
     NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_changed_notifiers);
 
+static int map_kvm_run(KVMState *s, CPUState *cpu, Error **errp);
+static int map_kvm_dirty_gfns(KVMState *s, CPUState *cpu, Error **errp);
+static int vcpu_unmap_regions(KVMState *s, CPUState *cpu);
+
 struct KVMResampleFd {
     int gsi;
     EventNotifier *resample_event;
@@ -420,6 +424,90 @@ err:
     return ret;
 }
 
+static void kvm_create_vcpu_internal(CPUState *cpu, KVMState *s, int kvm_fd)
+{
+    cpu->kvm_fd = kvm_fd;
+    cpu->kvm_state = s;
+    if (!s->guest_state_protected) {
+        cpu->vcpu_dirty = true;
+    }
+    cpu->dirty_pages = 0;
+    cpu->throttle_us_per_full = 0;
+
+    return;
+}
+
+static int kvm_rebind_vcpus(Error **errp)
+{
+    CPUState *cpu;
+    unsigned long vcpu_id;
+    KVMState *s = kvm_state;
+    int kvm_fd, ret = 0;
+
+    CPU_FOREACH(cpu) {
+        vcpu_id = kvm_arch_vcpu_id(cpu);
+
+        if (cpu->kvm_fd) {
+            close(cpu->kvm_fd);
+        }
+
+        ret = kvm_arch_destroy_vcpu(cpu);
+        if (ret < 0) {
+            goto err;
+        }
+
+        if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
+            s->coalesced_mmio_ring = NULL;
+        }
+
+        ret = vcpu_unmap_regions(s, cpu);
+        if (ret < 0) {
+            goto err;
+        }
+
+        ret = kvm_arch_pre_create_vcpu(cpu, errp);
+        if (ret < 0) {
+            goto err;
+        }
+
+        kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
+        if (kvm_fd < 0) {
+            error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu (%s)",
+                         vcpu_id, strerror(-kvm_fd));
+            return kvm_fd;
+        }
+
+        kvm_create_vcpu_internal(cpu, s, kvm_fd);
+
+        ret = map_kvm_run(s, cpu, errp);
+        if (ret < 0) {
+            goto err;
+        }
+
+        if (s->kvm_dirty_ring_size) {
+            ret = map_kvm_dirty_gfns(s, cpu, errp);
+            if (ret < 0) {
+                goto err;
+            }
+        }
+
+        ret = kvm_arch_init_vcpu(cpu);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret,
+                             "kvm_init_vcpu: kvm_arch_init_vcpu failed (%lu)",
+                             vcpu_id);
+        }
+
+        close(cpu->kvm_vcpu_stats_fd);
+        cpu->kvm_vcpu_stats_fd = kvm_vcpu_ioctl(cpu, KVM_GET_STATS_FD, NULL);
+        kvm_init_cpu_signals(cpu);
+    }
+    trace_kvm_rebind_vcpus();
+
+ err:
+    return ret;
+}
+
 static void kvm_park_vcpu(CPUState *cpu)
 {
     struct KVMParkedVcpu *vcpu;
@@ -483,13 +571,7 @@ static int kvm_create_vcpu(CPUState *cpu)
         }
     }
 
-    cpu->kvm_fd = kvm_fd;
-    cpu->kvm_state = s;
-    if (!s->guest_state_protected) {
-        cpu->vcpu_dirty = true;
-    }
-    cpu->dirty_pages = 0;
-    cpu->throttle_us_per_full = 0;
+    kvm_create_vcpu_internal(cpu, s, kvm_fd);
 
     trace_kvm_create_vcpu(cpu->cpu_index, vcpu_id, kvm_fd);
 
@@ -508,19 +590,11 @@ int kvm_create_and_park_vcpu(CPUState *cpu)
     return ret;
 }
 
-static int do_kvm_destroy_vcpu(CPUState *cpu)
+static int vcpu_unmap_regions(KVMState *s, CPUState *cpu)
 {
-    KVMState *s = kvm_state;
     int mmap_size;
     int ret = 0;
 
-    trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
-
-    ret = kvm_arch_destroy_vcpu(cpu);
-    if (ret < 0) {
-        goto err;
-    }
-
     mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
     if (mmap_size < 0) {
         ret = mmap_size;
@@ -548,39 +622,47 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
         cpu->kvm_dirty_gfns = NULL;
     }
 
-    kvm_park_vcpu(cpu);
-err:
+ err:
     return ret;
 }
 
-void kvm_destroy_vcpu(CPUState *cpu)
-{
-    if (do_kvm_destroy_vcpu(cpu) < 0) {
-        error_report("kvm_destroy_vcpu failed");
-        exit(EXIT_FAILURE);
-    }
-}
-
-int kvm_init_vcpu(CPUState *cpu, Error **errp)
+static int do_kvm_destroy_vcpu(CPUState *cpu)
 {
     KVMState *s = kvm_state;
-    int mmap_size;
-    int ret;
+    int ret = 0;
 
-    trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+    trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
-    ret = kvm_arch_pre_create_vcpu(cpu, errp);
+    ret = kvm_arch_destroy_vcpu(cpu);
     if (ret < 0) {
         goto err;
     }
 
-    ret = kvm_create_vcpu(cpu);
+    /* If I am the CPU that created coalesced_mmio_ring, then discard it */
+    if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
+        s->coalesced_mmio_ring = NULL;
+    }
+
+    ret = vcpu_unmap_regions(s, cpu);
     if (ret < 0) {
-        error_setg_errno(errp, -ret,
-                         "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
-                         kvm_arch_vcpu_id(cpu));
         goto err;
     }
+    kvm_park_vcpu(cpu);
+err:
+    return ret;
+}
+
+void kvm_destroy_vcpu(CPUState *cpu)
+{
+    if (do_kvm_destroy_vcpu(cpu) < 0) {
+        error_report("kvm_destroy_vcpu failed");
+        exit(EXIT_FAILURE);
+    }
+}
+
+static int map_kvm_run(KVMState *s, CPUState *cpu, Error **errp)
+{
+    int mmap_size, ret = 0;
 
     mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
     if (mmap_size < 0) {
@@ -605,14 +687,53 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
             (void *)cpu->kvm_run + s->coalesced_mmio * PAGE_SIZE;
     }
 
+ err:
+    return ret;
+}
+
+static int map_kvm_dirty_gfns(KVMState *s, CPUState *cpu, Error **errp)
+{
+    int ret = 0;
+    /* Use MAP_SHARED to share pages with the kernel */
+    cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_bytes,
+                               PROT_READ | PROT_WRITE, MAP_SHARED,
+                               cpu->kvm_fd,
+                               PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
+    if (cpu->kvm_dirty_gfns == MAP_FAILED) {
+        ret = -errno;
+    }
+
+    return ret;
+}
+
+int kvm_init_vcpu(CPUState *cpu, Error **errp)
+{
+    KVMState *s = kvm_state;
+    int ret;
+
+    trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+
+    ret = kvm_arch_pre_create_vcpu(cpu, errp);
+    if (ret < 0) {
+        goto err;
+    }
+
+    ret = kvm_create_vcpu(cpu);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
+                         kvm_arch_vcpu_id(cpu));
+        goto err;
+    }
+
+    ret = map_kvm_run(s, cpu, errp);
+    if (ret < 0) {
+        goto err;
+    }
+
     if (s->kvm_dirty_ring_size) {
-        /* Use MAP_SHARED to share pages with the kernel */
-        cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_bytes,
-                                   PROT_READ | PROT_WRITE, MAP_SHARED,
-                                   cpu->kvm_fd,
-                                   PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
-        if (cpu->kvm_dirty_gfns == MAP_FAILED) {
-            ret = -errno;
+        ret = map_kvm_dirty_gfns(s, cpu, errp);
+        if (ret < 0) {
             goto err;
         }
     }
@@ -2710,6 +2831,16 @@ static int kvm_reset_vmfd(MachineState *ms)
     }
     assert(!err);
 
+    /*
+     * Rebind the existing VCPUs to the new KVM VM fd.
+     * This can only be done after kvm_arch_on_vmfd_change().
+     */
+    ret = kvm_rebind_vcpus(&err);
+    if (ret < 0) {
+        return ret;
+    }
+    assert(!err);
+
     /* these can be only called after ram_block_rebind() */
     memory_listener_register(&kml->listener, &address_space_memory);
     memory_listener_register(&kvm_io_listener, &address_space_io);
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index e4beda0148..4a8921c632 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -15,6 +15,7 @@ kvm_park_vcpu(int cpu_index, unsigned long arch_cpu_id) "index: %d id: %lu"
 kvm_unpark_vcpu(unsigned long arch_cpu_id, const char *msg) "id: %lu %s"
 kvm_irqchip_commit_routes(void) ""
 kvm_reset_vmfd(void) ""
+kvm_rebind_vcpus(void) ""
 kvm_irqchip_add_msi_route(char *name, int vector, int virq) "dev %s vector %d virq %d"
 kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
 kvm_irqchip_release_virq(int virq) "virq %d"
-- 
2.42.0



* [PATCH v6 16/35] i386/tdx: refactor TDX firmware memory initialization code into a new function
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (14 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 15/35] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 17/35] i386/tdx: finalize TDX guest state upon reset Ani Sinha
                   ` (18 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

All firmware memory initialization code is moved into a new helper function,
tdx_init_fw_mem_region(). No functional change.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
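Note (not part of the commit): the helper keeps the existing loop that
retries the ioctl while it returns -EAGAIN or -EINTR. A minimal sketch of
that retry pattern on its own follows. The `toy_*` names and the callback
type are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical operation type: returns 0 or a negative errno value. */
typedef int (*toy_op_fn)(void *opaque);

/*
 * Call op until it returns something other than -EAGAIN or -EINTR.
 * Success and hard errors are both passed back to the caller unchanged.
 */
static int toy_retry_transient(toy_op_fn op, void *opaque)
{
    int r;

    assert(op != NULL);

    do {
        r = op(opaque);
    } while (r == -EAGAIN || r == -EINTR);

    return r;
}

/* Example op: fails with transient errors until *remaining reaches 0. */
static int toy_flaky_op(void *opaque)
{
    int *remaining = opaque;

    if (*remaining > 0) {
        /* Alternate between the two transient errors. */
        return (*remaining)-- % 2 ? -EINTR : -EAGAIN;
    }
    return 0;
}

/* Example op: always fails with a non-transient error. */
static int toy_hard_failure(void *opaque)
{
    (void)opaque;
    return -EINVAL;
}
```

The loop has no retry limit, which matches the shape of the loop in the
diff; a bounded variant would need an extra counter.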
 target/i386/kvm/tdx.c | 73 ++++++++++++++++++++++++-------------------
 1 file changed, 40 insertions(+), 33 deletions(-)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index a3e81e1c0c..fd8e3de969 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -295,14 +295,51 @@ static void tdx_post_init_vcpus(void)
     }
 }
 
-static void tdx_finalize_vm(Notifier *notifier, void *unused)
+static void tdx_init_fw_mem_region(void)
 {
     TdxFirmware *tdvf = &tdx_guest->tdvf;
     TdxFirmwareEntry *entry;
-    RAMBlock *ram_block;
     Error *local_err = NULL;
     int r;
 
+    for_each_tdx_fw_entry(tdvf, entry) {
+        struct kvm_tdx_init_mem_region region;
+        uint32_t flags;
+
+        region = (struct kvm_tdx_init_mem_region) {
+            .source_addr = (uintptr_t)entry->mem_ptr,
+            .gpa = entry->address,
+            .nr_pages = entry->size >> 12,
+        };
+
+        flags = entry->attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND ?
+                KVM_TDX_MEASURE_MEMORY_REGION : 0;
+
+        do {
+            error_free(local_err);
+            local_err = NULL;
+            r = tdx_vcpu_ioctl(first_cpu, KVM_TDX_INIT_MEM_REGION, flags,
+                               &region, &local_err);
+        } while (r == -EAGAIN || r == -EINTR);
+        if (r < 0) {
+            error_report_err(local_err);
+            exit(1);
+        }
+
+        if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
+            entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
+            qemu_ram_munmap(-1, entry->mem_ptr, entry->size);
+            entry->mem_ptr = NULL;
+        }
+    }
+}
+
+static void tdx_finalize_vm(Notifier *notifier, void *unused)
+{
+    TdxFirmware *tdvf = &tdx_guest->tdvf;
+    TdxFirmwareEntry *entry;
+    RAMBlock *ram_block;
+
     tdx_init_ram_entries();
 
     for_each_tdx_fw_entry(tdvf, entry) {
@@ -339,37 +376,7 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
     tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
 
     tdx_post_init_vcpus();
-
-    for_each_tdx_fw_entry(tdvf, entry) {
-        struct kvm_tdx_init_mem_region region;
-        uint32_t flags;
-
-        region = (struct kvm_tdx_init_mem_region) {
-            .source_addr = (uintptr_t)entry->mem_ptr,
-            .gpa = entry->address,
-            .nr_pages = entry->size >> 12,
-        };
-
-        flags = entry->attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND ?
-                KVM_TDX_MEASURE_MEMORY_REGION : 0;
-
-        do {
-            error_free(local_err);
-            local_err = NULL;
-            r = tdx_vcpu_ioctl(first_cpu, KVM_TDX_INIT_MEM_REGION, flags,
-                               &region, &local_err);
-        } while (r == -EAGAIN || r == -EINTR);
-        if (r < 0) {
-            error_report_err(local_err);
-            exit(1);
-        }
-
-        if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
-            entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
-            qemu_ram_munmap(-1, entry->mem_ptr, entry->size);
-            entry->mem_ptr = NULL;
-        }
-    }
+    tdx_init_fw_mem_region();
 
     /*
      * TDVF image has been copied into private region above via
-- 
2.42.0



* [PATCH v6 17/35] i386/tdx: finalize TDX guest state upon reset
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (15 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 16/35] i386/tdx: refactor TDX firmware memory initialization code into a new function Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 18/35] i386/tdx: add a pre-vmfd change notifier to reset tdx state Ani Sinha
                   ` (17 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

When the confidential virtual machine KVM file descriptor changes due to the
guest reset, some TDX specific setup steps need to be done again. This
includes finalizing the initial guest launch state again. This change
re-executes some parts of the TDX setup during the device reset phase using a
resettable interface. This finalizes the guest launch state again and locks
it in. The machine done notifier that was previously used is no longer needed
as the same code is now executed as a part of VM reset.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
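Note (not part of the commit): the reason for hooking the 'exit' phase can
be shown with a toy model of a phased reset, where each phase runs across
all objects before the next phase starts. This is a sketch under that
assumption only. The `Toy*` types and `toy_*` functions are hypothetical and
are not the QEMU Resettable API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyResettable ToyResettable;

/* Hypothetical per-object phase callbacks; any of them may be NULL. */
struct ToyResettable {
    void (*enter)(ToyResettable *obj);
    void (*hold)(ToyResettable *obj);
    void (*exit)(ToyResettable *obj);
    int hold_seq;   /* step at which hold ran, 0 if it never ran */
    int exit_seq;   /* step at which exit ran, 0 if it never ran */
};

static int toy_seq;     /* global step counter */

/* Models a legacy device reset that runs in the hold phase. */
static void toy_legacy_hold(ToyResettable *obj)
{
    obj->hold_seq = ++toy_seq;
}

/* Models a confidential guest handler that runs in the exit phase. */
static void toy_cgs_exit(ToyResettable *obj)
{
    obj->exit_seq = ++toy_seq;
}

/* Run each phase across all objects before starting the next phase. */
static void toy_reset_all(ToyResettable **objs, size_t n)
{
    size_t i;

    assert(objs != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (objs[i]->enter) {
            objs[i]->enter(objs[i]);
        }
    }
    for (i = 0; i < n; i++) {
        if (objs[i]->hold) {
            objs[i]->hold(objs[i]);
        }
    }
    for (i = 0; i < n; i++) {
        if (objs[i]->exit) {
            objs[i]->exit(objs[i]);
        }
    }
}
```

Even when the confidential guest object is registered first, its exit
callback runs after every hold callback in this model.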
 target/i386/kvm/tdx.c        | 38 +++++++++++++++++++++++++++++++-----
 target/i386/kvm/tdx.h        |  1 +
 target/i386/kvm/trace-events |  3 +++
 3 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index fd8e3de969..37e91d95e1 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -19,6 +19,7 @@
 #include "crypto/hash.h"
 #include "system/kvm_int.h"
 #include "system/runstate.h"
+#include "system/reset.h"
 #include "system/system.h"
 #include "system/ramblock.h"
 #include "system/address-spaces.h"
@@ -38,6 +39,7 @@
 #include "kvm_i386.h"
 #include "tdx.h"
 #include "tdx-quote-generator.h"
+#include "trace.h"
 
 #include "standard-headers/asm-x86/kvm_para.h"
 
@@ -389,9 +391,19 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
     CONFIDENTIAL_GUEST_SUPPORT(tdx_guest)->ready = true;
 }
 
-static Notifier tdx_machine_done_notify = {
-    .notify = tdx_finalize_vm,
-};
+static void tdx_handle_reset(Object *obj, ResetType type)
+{
+    if (!runstate_is_running() && !phase_check(PHASE_MACHINE_READY)) {
+        return;
+    }
+
+    if (!kvm_enable_hypercall(BIT_ULL(KVM_HC_MAP_GPA_RANGE))) {
+        error_setg(&error_fatal, "KVM_HC_MAP_GPA_RANGE not enabled for guest");
+    }
+
+    tdx_finalize_vm(NULL, NULL);
+    trace_tdx_handle_reset();
+}
 
 /*
  * Some CPUID bits change from fixed1 to configurable bits when TDX module
@@ -738,8 +750,6 @@ static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
      */
     kvm_readonly_mem_allowed = false;
 
-    qemu_add_machine_init_done_notifier(&tdx_machine_done_notify);
-
     tdx_guest = tdx;
     return 0;
 }
@@ -1505,6 +1515,7 @@ OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
                                    TDX_GUEST,
                                    X86_CONFIDENTIAL_GUEST,
                                    { TYPE_USER_CREATABLE },
+                                   { TYPE_RESETTABLE_INTERFACE },
                                    { NULL })
 
 static void tdx_guest_init(Object *obj)
@@ -1538,16 +1549,24 @@ static void tdx_guest_init(Object *obj)
 
     tdx->event_notify_vector = -1;
     tdx->event_notify_apicid = -1;
+    qemu_register_resettable(obj);
 }
 
 static void tdx_guest_finalize(Object *obj)
 {
 }
 
+static ResettableState *tdx_reset_state(Object *obj)
+{
+    TdxGuest *tdx = TDX_GUEST(obj);
+    return &tdx->reset_state;
+}
+
 static void tdx_guest_class_init(ObjectClass *oc, const void *data)
 {
     ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
     X86ConfidentialGuestClass *x86_klass = X86_CONFIDENTIAL_GUEST_CLASS(oc);
+    ResettableClass *rc = RESETTABLE_CLASS(oc);
 
     klass->kvm_init = tdx_kvm_init;
     klass->can_rebuild_guest_state = true;
@@ -1555,4 +1574,13 @@ static void tdx_guest_class_init(ObjectClass *oc, const void *data)
     x86_klass->cpu_instance_init = tdx_cpu_instance_init;
     x86_klass->adjust_cpuid_features = tdx_adjust_cpuid_features;
     x86_klass->check_features = tdx_check_features;
+
+    /*
+     * the exit phase makes sure TDX handles reset after all legacy resets
+     * have taken place (in the hold phase) and IGVM has also properly
+     * set up the boot state.
+     */
+    rc->phases.exit = tdx_handle_reset;
+    rc->get_state = tdx_reset_state;
+
 }
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 1c38faf983..264fbe530c 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -70,6 +70,7 @@ typedef struct TdxGuest {
 
     uint32_t event_notify_vector;
     uint32_t event_notify_apicid;
+    ResettableState reset_state;
 } TdxGuest;
 
 #ifdef CONFIG_TDX
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index 2d213c9f9b..a386234571 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -14,3 +14,6 @@ kvm_xen_soft_reset(void) ""
 kvm_xen_set_shared_info(uint64_t gfn) "shared info at gfn 0x%" PRIx64
 kvm_xen_set_vcpu_attr(int cpu, int type, uint64_t gpa) "vcpu attr cpu %d type %d gpa 0x%" PRIx64
 kvm_xen_set_vcpu_callback(int cpu, int vector) "callback vcpu %d vector %d"
+
+# tdx.c
+tdx_handle_reset(void) ""
-- 
2.42.0



* [PATCH v6 18/35] i386/tdx: add a pre-vmfd change notifier to reset tdx state
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (16 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 17/35] i386/tdx: finalize TDX guest state upon reset Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 19/35] i386/sev: add migration blockers only once Ani Sinha
                   ` (16 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

During reset, when the VM file descriptor is changed, the TDX state needs to be
re-initialized. A notifier callback is implemented to reset the old
state and free memory before the new state is initialized post VM file
descriptor change.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
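Note (not part of the commit): a minimal sketch of a change callback that is
invoked both before and after the fd swap but only acts in the 'pre' phase.
The types and field names below are hypothetical stand-ins chosen for the
illustration, not the structures used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical payload passed to the callback on each notification. */
typedef struct ToyVmfdChange {
    bool pre;       /* true before the fd is replaced, false after */
} ToyVmfdChange;

/* Hypothetical cached state that must be rebuilt for the new VM. */
typedef struct ToyGuestState {
    bool initialized;
    unsigned nr_entries;
    int *entries;
} ToyGuestState;

/* Drop cached state in the pre phase only; the post phase is a no-op. */
static int toy_on_vmfd_change(ToyGuestState *gs, const ToyVmfdChange *chg)
{
    assert(gs != NULL && chg != NULL);

    if (!chg->pre) {
        return 0;
    }

    gs->initialized = false;

    /* Free first, then clear, so the allocation is never leaked. */
    free(gs->entries);
    gs->entries = NULL;
    gs->nr_entries = 0;

    return 0;
}
```

Clearing the pointer after freeing it also makes a second 'pre'
notification harmless in this model.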
 target/i386/kvm/tdx.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 37e91d95e1..4cae99c281 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -405,6 +405,36 @@ static void tdx_handle_reset(Object *obj, ResetType type)
     trace_tdx_handle_reset();
 }
 
+/* TDX guest reset will require us to reinitialize some of tdx guest state. */
+static int set_tdx_vm_uninitialized(NotifierWithReturn *notifier,
+                                    void *data, Error** errp)
+{
+    TdxFirmware *fw = &tdx_guest->tdvf;
+
+    if (!((VmfdChangeNotifier *)data)->pre) {
+        return 0;
+    }
+
+    if (tdx_guest->initialized) {
+        tdx_guest->initialized = false;
+    }
+
+    g_free(tdx_guest->ram_entries);
+
+    /*
+     * the firmware entries will be parsed again, see
+     * x86_firmware_configure() -> tdx_parse_tdvf()
+     */
+    fw->nr_entries = 0;
+    g_free(fw->entries);
+
+    return 0;
+}
+
+static NotifierWithReturn tdx_vmfd_change_notifier = {
+    .notify = set_tdx_vm_uninitialized,
+};
+
 /*
  * Some CPUID bits change from fixed1 to configurable bits when TDX module
  * supports TDX_FEATURES0.VE_REDUCTION. e.g., MCA/MCE/MTRR/CORE_CAPABILITY.
@@ -1549,6 +1579,7 @@ static void tdx_guest_init(Object *obj)
 
     tdx->event_notify_vector = -1;
     tdx->event_notify_apicid = -1;
+    kvm_vmfd_add_change_notifier(&tdx_vmfd_change_notifier);
     qemu_register_resettable(obj);
 }
 
-- 
2.42.0



* [PATCH v6 19/35] i386/sev: add migration blockers only once
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (17 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 18/35] i386/tdx: add a pre-vmfd change notifier to reset tdx state Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 20/35] i386/sev: add notifiers " Ani Sinha
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Zhao Liu
  Cc: Ani Sinha, kraxel, ani, Prasad Pandit, kvm, qemu-devel

sev_launch_finish() and sev_snp_launch_finish() can be called multiple times
when the confidential guest is being reset/rebooted. The migration
blockers should be added only once, not once per invocation. This change
makes sure of that by adding the migration blockers in the vm state change
handler when the vm transitions to the running state. Subsequent reboots do
not change the state of the vm.

Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
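Note (not part of the commit): a toy model of the intended behaviour, where
the launch is finalized on every reset but the blocker is added only on the
first transition to running. All `toy_*` and `Toy*` names are hypothetical
and the counters only exist to make the effect observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOY_STATE_UNINIT, TOY_STATE_RUNNING } ToyState;

typedef struct ToyGuest {
    ToyState state;
    int launch_finish_calls;
    int blockers_added;
} ToyGuest;

/* Models finalizing the launch; may run once per reset. */
static void toy_launch_finish(ToyGuest *g)
{
    g->launch_finish_calls++;
    g->state = TOY_STATE_RUNNING;
}

/* Models a VM state change handler: acts only on the first transition. */
static void toy_vm_state_change(ToyGuest *g, bool running)
{
    assert(g != NULL);

    if (running && g->state != TOY_STATE_RUNNING) {
        toy_launch_finish(g);
        g->blockers_added++;
    }
}

/* Models the reset path: the launch is finalized again, no blocker added. */
static void toy_reset(ToyGuest *g)
{
    assert(g != NULL);

    g->state = TOY_STATE_UNINIT;
    toy_launch_finish(g);
}
```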
 target/i386/sev.c | 20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 66e38ca32e..260d8ef88b 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1421,11 +1421,6 @@ sev_launch_finish(SevCommonState *sev_common)
     }
 
     sev_set_guest_state(sev_common, SEV_STATE_RUNNING);
-
-    /* add migration blocker */
-    error_setg(&sev_mig_blocker,
-               "SEV: Migration is not implemented");
-    migrate_add_blocker(&sev_mig_blocker, &error_fatal);
 }
 
 static int snp_launch_update_data(uint64_t gpa, void *hva, size_t len,
@@ -1608,7 +1603,6 @@ static void
 sev_snp_launch_finish(SevCommonState *sev_common)
 {
     int ret, error;
-    Error *local_err = NULL;
     OvmfSevMetadata *metadata;
     SevLaunchUpdateData *data;
     SevSnpGuestState *sev_snp = SEV_SNP_GUEST(sev_common);
@@ -1655,15 +1649,6 @@ sev_snp_launch_finish(SevCommonState *sev_common)
 
     kvm_mark_guest_state_protected();
     sev_set_guest_state(sev_common, SEV_STATE_RUNNING);
-
-    /* add migration blocker */
-    error_setg(&sev_mig_blocker,
-               "SEV-SNP: Migration is not implemented");
-    ret = migrate_add_blocker(&sev_mig_blocker, &local_err);
-    if (local_err) {
-        error_report_err(local_err);
-        exit(1);
-    }
 }
 
 
@@ -1676,6 +1661,11 @@ sev_vm_state_change(void *opaque, bool running, RunState state)
     if (running) {
         if (!sev_check_state(sev_common, SEV_STATE_RUNNING)) {
             klass->launch_finish(sev_common);
+
+            /* add migration blocker */
+            error_setg(&sev_mig_blocker,
+                       "SEV: Migration is not implemented");
+            migrate_add_blocker(&sev_mig_blocker, &error_fatal);
         }
     }
 }
-- 
2.42.0



* [PATCH v6 20/35] i386/sev: add notifiers only once
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (18 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 19/35] i386/sev: add migration blockers only once Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 21/35] i386/sev: free existing launch update data and kernel hashes data on init Ani Sinha
                   ` (14 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Zhao Liu, Marcelo Tosatti
  Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

The various notifiers that are used need to be installed only once, not on
every initialization. This includes the vm state change notifier and others.
This change uses the 'cgs->ready' flag to install the notifiers only once,
the first time.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 target/i386/sev.c | 36 +++++++++++++++++++-----------------
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 260d8ef88b..647f4bf63d 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1920,8 +1920,9 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
         return -1;
     }
 
-    qemu_add_vm_change_state_handler(sev_vm_state_change, sev_common);
-
+    if (!cgs->ready) {
+        qemu_add_vm_change_state_handler(sev_vm_state_change, sev_common);
+    }
     cgs->ready = true;
 
     return 0;
@@ -1943,22 +1944,23 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
         return -1;
     }
 
-    /*
-     * SEV uses these notifiers to register/pin pages prior to guest use,
-     * but SNP relies on guest_memfd for private pages, which has its
-     * own internal mechanisms for registering/pinning private memory.
-     */
-    ram_block_notifier_add(&sev_ram_notifier);
-
-    /*
-     * The machine done notify event is used for SEV guests to get the
-     * measurement of the encrypted images. When SEV-SNP is enabled, the
-     * measurement is part of the guest attestation process where it can
-     * be collected without any reliance on the VMM. So skip registering
-     * the notifier for SNP in favor of using guest attestation instead.
-     */
-    qemu_add_machine_init_done_notifier(&sev_machine_done_notify);
+    if (!cgs->ready) {
+        /*
+         * SEV uses these notifiers to register/pin pages prior to guest use,
+         * but SNP relies on guest_memfd for private pages, which has its
+         * own internal mechanisms for registering/pinning private memory.
+         */
+        ram_block_notifier_add(&sev_ram_notifier);
 
+        /*
+         * The machine done notify event is used for SEV guests to get the
+         * measurement of the encrypted images. When SEV-SNP is enabled, the
+         * measurement is part of the guest attestation process where it can
+         * be collected without any reliance on the VMM. So skip registering
+         * the notifier for SNP in favor of using guest attestation instead.
+         */
+        qemu_add_machine_init_done_notifier(&sev_machine_done_notify);
+    }
     return 0;
 }
 
-- 
2.42.0



* [PATCH v6 21/35] i386/sev: free existing launch update data and kernel hashes data on init
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (19 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 20/35] i386/sev: add notifiers " Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 22/35] i386/sev: add support for confidential guest reset Ani Sinha
                   ` (13 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Zhao Liu, Marcelo Tosatti
  Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

If there is existing launch update data and kernel hashes data, they need to be
freed when initialization code is executed. This is important for resettable
confidential guests where the initialization happens once every reset.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
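Note (not part of the commit): one way to release list elements on
re-initialization is to drain the list, unlinking each element before it is
freed so that the head ends up empty. The sketch below uses a hand-rolled
singly linked list with hypothetical `Toy*` names rather than the QTAILQ
macros, purely to stay self-contained.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyNode {
    struct ToyNode *next;
    int payload;
} ToyNode;

typedef struct ToyList {
    ToyNode *head;
} ToyList;

/* Push a new node on the front; returns 0, or -1 if allocation fails. */
static int toy_list_push(ToyList *list, int payload)
{
    ToyNode *node;

    assert(list != NULL);

    node = malloc(sizeof(*node));
    if (!node) {
        return -1;
    }
    node->payload = payload;
    node->next = list->head;
    list->head = node;
    return 0;
}

/* Free every node and leave the list empty; returns the number freed. */
static int toy_list_drain(ToyList *list)
{
    int freed = 0;

    assert(list != NULL);

    while (list->head) {
        ToyNode *node = list->head;

        list->head = node->next;    /* unlink before freeing */
        free(node);
        freed++;
    }
    return freed;
}
```

Draining an already empty list is a no-op here, so the same call is safe on
first initialization and on every later reset.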
 target/i386/sev.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 647f4bf63d..b3893e431c 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1773,6 +1773,7 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
     uint32_t ebx;
     uint32_t host_cbitpos;
     struct sev_user_data_status status = {};
+    SevLaunchUpdateData *data, *next_elm;
     SevCommonState *sev_common = SEV_COMMON(cgs);
     SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(cgs);
     X86ConfidentialGuestClass *x86_klass =
@@ -1780,6 +1781,11 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 
     sev_common->state = SEV_STATE_UNINIT;
 
+    /* free existing launch update data if any */
+    QTAILQ_FOREACH_SAFE(data, &launch_update, next, next_elm) {
+        g_free(data);
+    }
+
     host_cpuid(0x8000001F, 0, NULL, &ebx, NULL, NULL);
     host_cbitpos = ebx & 0x3f;
 
@@ -1968,6 +1974,8 @@ static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
     X86MachineState *x86ms = X86_MACHINE(ms);
+    SevCommonState *sev_common = SEV_COMMON(cgs);
+    SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(sev_common);
 
     if (x86ms->smm == ON_OFF_AUTO_AUTO) {
         x86ms->smm = ON_OFF_AUTO_OFF;
@@ -1976,6 +1984,10 @@ static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
         return -1;
     }
 
+    /* free existing kernel hashes data if any */
+    g_free(sev_snp_guest->kernel_hashes_data);
+    sev_snp_guest->kernel_hashes_data = NULL;
+
     return 0;
 }
 
-- 
2.42.0



* [PATCH v6 22/35] i386/sev: add support for confidential guest reset
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (20 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 21/35] i386/sev: free existing launch update data and kernel hashes data on init Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 23/35] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors Ani Sinha
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Zhao Liu
  Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

When the KVM VM file descriptor changes as a part of the confidential guest
reset mechanism, it is necessary to create a new confidential guest context and
re-encrypt the VM memory. This happens for SEV-ES and SEV-SNP virtual machines
as a part of SEV_LAUNCH_FINISH, SEV_SNP_LAUNCH_FINISH operations.

A new resettable interface for the SEV module has been added. A new reset
callback for the reset 'exit' phase has been implemented to perform the above
operations when the VM file descriptor has changed during VM reset.

Tracepoints have also been added for tracing purposes.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 target/i386/sev.c        | 58 ++++++++++++++++++++++++++++++++++++++++
 target/i386/trace-events |  1 +
 2 files changed, 59 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index b3893e431c..549e624176 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -30,8 +30,10 @@
 #include "system/kvm.h"
 #include "kvm/kvm_i386.h"
 #include "sev.h"
+#include "system/cpus.h"
 #include "system/system.h"
 #include "system/runstate.h"
+#include "system/reset.h"
 #include "trace.h"
 #include "migration/blocker.h"
 #include "qom/object.h"
@@ -86,6 +88,10 @@ typedef struct QEMU_PACKED PaddedSevHashTable {
     uint8_t padding[ROUND_UP(sizeof(SevHashTable), 16) - sizeof(SevHashTable)];
 } PaddedSevHashTable;
 
+static void sev_handle_reset(Object *obj, ResetType type);
+
+SevKernelLoaderContext sev_load_ctx = {};
+
 QEMU_BUILD_BUG_ON(sizeof(PaddedSevHashTable) % 16 != 0);
 
 #define SEV_INFO_BLOCK_GUID     "00f771de-1a7e-4fcb-890e-68c77e2fb44e"
@@ -129,6 +135,7 @@ struct SevCommonState {
     uint8_t build_id;
     int sev_fd;
     SevState state;
+    ResettableState reset_state;
 
     QTAILQ_HEAD(, SevLaunchVmsa) launch_vmsa;
 };
@@ -1666,6 +1673,11 @@ sev_vm_state_change(void *opaque, bool running, RunState state)
             error_setg(&sev_mig_blocker,
                        "SEV: Migration is not implemented");
             migrate_add_blocker(&sev_mig_blocker, &error_fatal);
+            /*
+             * mark SEV guest as resettable so that we can reinitialize
+             * SEV upon reset.
+             */
+            qemu_register_resettable(OBJECT(sev_common));
         }
     }
 }
@@ -1991,6 +2003,41 @@ static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
     return 0;
 }
 
+/*
+ * handle sev vm reset
+ */
+static void sev_handle_reset(Object *obj, ResetType type)
+{
+    SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
+    SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(sev_common);
+
+    if (!sev_common) {
+        return;
+    }
+
+    if (!runstate_is_running()) {
+        return;
+    }
+
+    sev_add_kernel_loader_hashes(&sev_load_ctx, &error_fatal);
+    if (sev_es_enabled() && !sev_snp_enabled()) {
+        sev_launch_get_measure(NULL, NULL);
+    }
+    if (!sev_check_state(sev_common, SEV_STATE_RUNNING)) {
+        /* this calls sev_snp_launch_finish() etc */
+        klass->launch_finish(sev_common);
+    }
+
+    trace_sev_handle_reset();
+    return;
+}
+
+static ResettableState *sev_reset_state(Object *obj)
+{
+    SevCommonState *sev_common = SEV_COMMON(obj);
+    return &sev_common->reset_state;
+}
+
 int
 sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp)
 {
@@ -2469,6 +2516,8 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
         return false;
     }
 
+    /* save the context here so that it can be re-used when vm is reset */
+    memcpy(&sev_load_ctx, ctx, sizeof(*ctx));
     return klass->build_kernel_loader_hashes(sev_common, area, ctx, errp);
 }
 
@@ -2729,8 +2778,16 @@ static void
 sev_common_class_init(ObjectClass *oc, const void *data)
 {
     ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
+    ResettableClass *rc = RESETTABLE_CLASS(oc);
 
     klass->kvm_init = sev_common_kvm_init;
+    /*
+     * the exit phase makes sure sev handles reset after all legacy resets
+     * have taken place (in the hold phase) and IGVM has also properly
+     * set up the boot state.
+     */
+    rc->phases.exit = sev_handle_reset;
+    rc->get_state = sev_reset_state;
 
     object_class_property_add_str(oc, "sev-device",
                                   sev_common_get_sev_device,
@@ -2780,6 +2837,7 @@ static const TypeInfo sev_common_info = {
     .abstract = true,
     .interfaces = (const InterfaceInfo[]) {
         { TYPE_USER_CREATABLE },
+        { TYPE_RESETTABLE_INTERFACE },
         { }
     }
 };
diff --git a/target/i386/trace-events b/target/i386/trace-events
index 51301673f0..b320f655ee 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -14,3 +14,4 @@ kvm_sev_attestation_report(const char *mnonce, const char *data) "mnonce %s data
 kvm_sev_snp_launch_start(uint64_t policy, char *gosvw) "policy 0x%" PRIx64 " gosvw %s"
 kvm_sev_snp_launch_update(uint64_t src, uint64_t gpa, uint64_t len, const char *type) "src 0x%" PRIx64 " gpa 0x%" PRIx64 " len 0x%" PRIx64 " (%s page)"
 kvm_sev_snp_launch_finish(char *id_block, char *id_auth, char *host_data) "id_block %s id_auth %s host_data %s"
+sev_handle_reset(void) ""
-- 
2.42.0



* [PATCH v6 23/35] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (21 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 22/35] i386/sev: add support for confidential guest reset Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-27  7:02   ` Cédric Le Goater
  2026-02-25  3:49 ` [PATCH v6 24/35] kvm/i8254: refactor pit initialization into a helper Ani Sinha
                   ` (11 subsequent siblings)
  34 siblings, 1 reply; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Alex Williamson, Cédric Le Goater
  Cc: Ani Sinha, kraxel, pbonzini, ani, qemu-devel

Normally the vfio pseudo device file descriptor lives for the life of the VM.
However, when the kvm VM file descriptor changes, a new file descriptor
for the pseudo device needs to be generated against the new kvm VM descriptor.
Other existing vfio descriptors need to be reattached to the new pseudo device
descriptor. This change performs the above steps.

Tested-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
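Reader's note, not part of the commit: a minimal standalone sketch of the
bookkeeping described above. Attached fds are remembered in a list so that they
can be replayed onto a freshly created pseudo device. PseudoDev, track_fd,
untrack_fd and rebind_all are hypothetical stand-ins; the real code uses QLIST
and the KVM_SET_DEVICE_ATTR ioctl instead of an in-memory array.

```c
#include <stdlib.h>

#define MAX_ATTACHED 8

/* Hypothetical stand-in for the KVM VFIO pseudo device. */
typedef struct PseudoDev {
    int attached[MAX_ATTACHED];
    int n_attached;
} PseudoDev;

typedef struct TrackedFd {
    int fd;
    struct TrackedFd *next;
} TrackedFd;

static TrackedFd *tracked;

/* Remember an fd that has been attached to the current pseudo device. */
static void track_fd(int fd)
{
    TrackedFd *t = malloc(sizeof(*t));

    if (!t) {
        abort();
    }
    t->fd = fd;
    t->next = tracked;
    tracked = t;
}

/* Forget an fd once it has been detached. */
static void untrack_fd(int fd)
{
    TrackedFd **pp;

    for (pp = &tracked; *pp; pp = &(*pp)->next) {
        if ((*pp)->fd == fd) {
            TrackedFd *dead = *pp;

            *pp = dead->next;
            free(dead);
            return;
        }
    }
}

/* Attach every remembered fd to a freshly created pseudo device. */
static int rebind_all(PseudoDev *fresh)
{
    TrackedFd *t;

    fresh->n_attached = 0;
    for (t = tracked; t; t = t->next) {
        if (fresh->n_attached == MAX_ATTACHED) {
            return -1;
        }
        fresh->attached[fresh->n_attached++] = t->fd;
    }
    return 0;
}
```

The list is the only state that survives the change of VM descriptor, which is
why the add and delete paths in the patch both have to update it.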
 hw/vfio/helpers.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index f68f8165d0..e2bedd15ec 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -116,6 +116,89 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
  * we'll re-use it should another vfio device be attached before then.
  */
 int vfio_kvm_device_fd = -1;
+
+/*
+ * Confidential virtual machines:
+ * During reset of confidential vms, the kvm vm file descriptor changes.
+ * In this case, the old vfio kvm file descriptor is
+ * closed and a new descriptor is created against the new kvm vm file
+ * descriptor.
+ */
+
+typedef struct VFIODeviceFd {
+    int fd;
+    QLIST_ENTRY(VFIODeviceFd) node;
+} VFIODeviceFd;
+
+static QLIST_HEAD(, VFIODeviceFd) vfio_device_fds =
+    QLIST_HEAD_INITIALIZER(vfio_device_fds);
+
+static void vfio_device_fd_list_add(int fd)
+{
+    VFIODeviceFd *file_fd;
+    file_fd = g_malloc0(sizeof(*file_fd));
+    file_fd->fd = fd;
+    QLIST_INSERT_HEAD(&vfio_device_fds, file_fd, node);
+}
+
+static void vfio_device_fd_list_remove(int fd)
+{
+    VFIODeviceFd *file_fd, *next;
+
+    QLIST_FOREACH_SAFE(file_fd, &vfio_device_fds, node, next) {
+        if (file_fd->fd == fd) {
+            QLIST_REMOVE(file_fd, node);
+            g_free(file_fd);
+            break;
+        }
+    }
+}
+
+static int vfio_device_fd_rebind(NotifierWithReturn *notifier, void *data,
+                                  Error **errp)
+{
+    VFIODeviceFd *file_fd;
+    int ret = 0;
+    struct kvm_device_attr attr = {
+        .group = KVM_DEV_VFIO_FILE,
+        .attr = KVM_DEV_VFIO_FILE_ADD,
+    };
+    struct kvm_create_device cd = {
+        .type = KVM_DEV_TYPE_VFIO,
+    };
+
+    /* we are not interested in pre vmfd change notification */
+    if (((VmfdChangeNotifier *)data)->pre) {
+        return 0;
+    }
+
+    if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
+        error_setg_errno(errp, errno, "Failed to create KVM VFIO device");
+        return -errno;
+    }
+
+    if (vfio_kvm_device_fd != -1) {
+        close(vfio_kvm_device_fd);
+    }
+
+    vfio_kvm_device_fd = cd.fd;
+
+    QLIST_FOREACH(file_fd, &vfio_device_fds, node) {
+        attr.addr = (uint64_t)(unsigned long)&file_fd->fd;
+        if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+            error_setg_errno(errp, errno,
+                             "Failed to add fd %d to KVM VFIO device",
+                             file_fd->fd);
+            ret = -errno;
+        }
+    }
+    return ret;
+}
+
+static struct NotifierWithReturn vfio_vmfd_change_notifier = {
+    .notify = vfio_device_fd_rebind,
+};
+
 #endif
 
 void vfio_kvm_device_close(void)
@@ -153,6 +236,11 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
         }
 
         vfio_kvm_device_fd = cd.fd;
+        /*
+         * If the vm file descriptor changes, add a notifier so that we can
+         * re-create the vfio_kvm_device_fd.
+         */
+        kvm_vmfd_add_change_notifier(&vfio_vmfd_change_notifier);
     }
 
     if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
@@ -160,6 +248,8 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
                          fd);
         return -errno;
     }
+
+    vfio_device_fd_list_add(fd);
 #endif
     return 0;
 }
@@ -183,6 +273,8 @@ int vfio_kvm_device_del_fd(int fd, Error **errp)
                          "Failed to remove fd %d from KVM VFIO device", fd);
         return -errno;
     }
+
+    vfio_device_fd_list_remove(fd);
 #endif
     return 0;
 }
-- 
2.42.0




* [PATCH v6 24/35] kvm/i8254: refactor pit initialization into a helper
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (22 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 23/35] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 25/35] kvm/i8254: add support for confidential guest reset Ani Sinha
                   ` (10 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost
  Cc: Ani Sinha, kraxel, ani, qemu-devel

The initialization code will be used again by the VM file descriptor change
notifier callback in a subsequent change, so refactor the common code into a
new helper function.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 hw/i386/kvm/i8254.c | 68 +++++++++++++++++++++++++--------------------
 1 file changed, 38 insertions(+), 30 deletions(-)

diff --git a/hw/i386/kvm/i8254.c b/hw/i386/kvm/i8254.c
index 81e742f866..255047458a 100644
--- a/hw/i386/kvm/i8254.c
+++ b/hw/i386/kvm/i8254.c
@@ -60,6 +60,43 @@ struct KVMPITClass {
     DeviceRealize parent_realize;
 };
 
+static void do_pit_initialize(KVMPITState *s, Error **errp)
+{
+    struct kvm_pit_config config = {
+        .flags = 0,
+    };
+    int ret;
+
+    ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_PIT2, &config);
+    if (ret < 0) {
+        error_setg(errp, "Create kernel PIC irqchip failed: %s",
+                   strerror(-ret));
+        return;
+    }
+    switch (s->lost_tick_policy) {
+    case LOST_TICK_POLICY_DELAY:
+        break; /* enabled by default */
+    case LOST_TICK_POLICY_DISCARD:
+        if (kvm_check_extension(kvm_state, KVM_CAP_REINJECT_CONTROL)) {
+            struct kvm_reinject_control control = { .pit_reinject = 0 };
+
+            ret = kvm_vm_ioctl(kvm_state, KVM_REINJECT_CONTROL, &control);
+            if (ret < 0) {
+                error_setg(errp,
+                           "Can't disable in-kernel PIT reinjection: %s",
+                           strerror(-ret));
+                return;
+            }
+        }
+        break;
+    default:
+        error_setg(errp, "Lost tick policy not supported.");
+        return;
+    }
+
+    return;
+}
+
 static void kvm_pit_update_clock_offset(KVMPITState *s)
 {
     int64_t offset, clock_offset;
@@ -241,42 +278,13 @@ static void kvm_pit_realizefn(DeviceState *dev, Error **errp)
     PITCommonState *pit = PIT_COMMON(dev);
     KVMPITClass *kpc = KVM_PIT_GET_CLASS(dev);
     KVMPITState *s = KVM_PIT(pit);
-    struct kvm_pit_config config = {
-        .flags = 0,
-    };
-    int ret;
 
     if (!kvm_check_extension(kvm_state, KVM_CAP_PIT_STATE2) ||
         !kvm_check_extension(kvm_state, KVM_CAP_PIT2)) {
         error_setg(errp, "In-kernel PIT not available");
     }
 
-    ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_PIT2, &config);
-    if (ret < 0) {
-        error_setg(errp, "Create kernel PIC irqchip failed: %s",
-                   strerror(-ret));
-        return;
-    }
-    switch (s->lost_tick_policy) {
-    case LOST_TICK_POLICY_DELAY:
-        break; /* enabled by default */
-    case LOST_TICK_POLICY_DISCARD:
-        if (kvm_check_extension(kvm_state, KVM_CAP_REINJECT_CONTROL)) {
-            struct kvm_reinject_control control = { .pit_reinject = 0 };
-
-            ret = kvm_vm_ioctl(kvm_state, KVM_REINJECT_CONTROL, &control);
-            if (ret < 0) {
-                error_setg(errp,
-                           "Can't disable in-kernel PIT reinjection: %s",
-                           strerror(-ret));
-                return;
-            }
-        }
-        break;
-    default:
-        error_setg(errp, "Lost tick policy not supported.");
-        return;
-    }
+    do_pit_initialize(s, errp);
 
     memory_region_init_io(&pit->ioports, OBJECT(dev), NULL, NULL, "kvm-pit", 4);
 
-- 
2.42.0




* [PATCH v6 25/35] kvm/i8254: add support for confidential guest reset
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (23 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 24/35] kvm/i8254: refactor pit initialization into a helper Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 26/35] kvm/hyperv: add synic feature to CPU only if its not enabled Ani Sinha
                   ` (9 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost
  Cc: Ani Sinha, kraxel, ani, qemu-devel

A confidential guest reset involves closing the old virtual machine KVM file
descriptor and opening a new one. Since it is a new KVM fd, the PIT needs to be
re-initialized. This is done with the help of a notifier which is invoked
upon KVM vm file descriptor change during the confidential guest reset process.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
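Reader's note, not part of the commit: a minimal standalone sketch of the
callback shape used here. The notifier is embedded in the device state, the
callback recovers its owner from the notifier pointer, ignores the "pre"
notification and re-runs initialization on the "post" one. Notifier, FdChange,
Timer and CONTAINER_OF are simplified, hypothetical versions of the QEMU types
and macro.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified notifier type. */
typedef struct Notifier Notifier;
struct Notifier {
    int (*notify)(Notifier *n, void *data);
};

typedef struct FdChange {
    bool pre; /* true before the old fd is closed, false afterwards */
} FdChange;

typedef struct Timer {
    int init_count;       /* number of times kernel state was created */
    Notifier fd_notifier; /* embedded so the callback can find its owner */
} Timer;

#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the refactored initialization helper. */
static int timer_init_kernel_state(Timer *t)
{
    t->init_count++;
    return 0;
}

static int timer_fd_changed(Notifier *n, void *data)
{
    Timer *t = CONTAINER_OF(n, Timer, fd_notifier);

    /* Nothing to do before the descriptor has actually changed. */
    if (((FdChange *)data)->pre) {
        return 0;
    }
    return timer_init_kernel_state(t);
}
```

Unlike the callback in this patch, which returns 0 unconditionally, the sketch
hands the helper's status back to the caller.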
 hw/i386/kvm/i8254.c      | 23 +++++++++++++++++++++++
 hw/i386/kvm/trace-events |  1 +
 2 files changed, 24 insertions(+)

diff --git a/hw/i386/kvm/i8254.c b/hw/i386/kvm/i8254.c
index 255047458a..70e8fd83cd 100644
--- a/hw/i386/kvm/i8254.c
+++ b/hw/i386/kvm/i8254.c
@@ -35,6 +35,7 @@
 #include "hw/core/qdev-properties-system.h"
 #include "system/kvm.h"
 #include "target/i386/kvm/kvm_i386.h"
+#include "trace.h"
 #include "qom/object.h"
 
 #define KVM_PIT_REINJECT_BIT 0
@@ -52,6 +53,8 @@ struct KVMPITState {
     LostTickPolicy lost_tick_policy;
     bool vm_stopped;
     int64_t kernel_clock_offset;
+
+    NotifierWithReturn kvmpit_vmfd_change_notifier;
 };
 
 struct KVMPITClass {
@@ -203,6 +206,23 @@ static void kvm_pit_put(PITCommonState *pit)
     }
 }
 
+static int kvmpit_post_vmfd_change(NotifierWithReturn *notifier,
+                                   void *data, Error** errp)
+{
+    KVMPITState *s = container_of(notifier, KVMPITState,
+                                  kvmpit_vmfd_change_notifier);
+
+    /* we are not interested in pre vmfd change notification */
+    if (((VmfdChangeNotifier *)data)->pre) {
+        return 0;
+    }
+
+    do_pit_initialize(s, errp);
+
+    trace_kvmpit_post_vmfd_change();
+    return 0;
+}
+
 static void kvm_pit_set_gate(PITCommonState *s, PITChannelState *sc, int val)
 {
     kvm_pit_get(s);
@@ -292,6 +312,9 @@ static void kvm_pit_realizefn(DeviceState *dev, Error **errp)
 
     qemu_add_vm_change_state_handler(kvm_pit_vm_state_change, s);
 
+    s->kvmpit_vmfd_change_notifier.notify = kvmpit_post_vmfd_change;
+    kvm_vmfd_add_change_notifier(&s->kvmpit_vmfd_change_notifier);
+
     kpc->parent_realize(dev, errp);
 }
 
diff --git a/hw/i386/kvm/trace-events b/hw/i386/kvm/trace-events
index 67bf7f174e..33680ff82b 100644
--- a/hw/i386/kvm/trace-events
+++ b/hw/i386/kvm/trace-events
@@ -20,3 +20,4 @@ xenstore_reset_watches(void) ""
 xenstore_watch_event(const char *path, const char *token) "path %s token %s"
 xen_primary_console_create(void) ""
 xen_primary_console_reset(int port) "port %u"
+kvmpit_post_vmfd_change(void) ""
-- 
2.42.0




* [PATCH v6 26/35] kvm/hyperv: add synic feature to CPU only if its not enabled
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (24 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 25/35] kvm/i8254: add support for confidential guest reset Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 27/35] hw/hyperv/vmbus: add support for confidential guest reset Ani Sinha
                   ` (8 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

We need to make sure that the synic CPU feature is not already enabled. If it
is, trying to enable it again will abort with the following error:

Unexpected error in object_property_try_add() at ../qom/object.c:1268:
qemu-system-x86_64: attempt to add duplicate property 'synic' to object (type 'host-x86_64-cpu')

So enable synic only if it is not already enabled.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
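Reader's note, not part of the commit: a minimal standalone sketch of the guard
this patch adds. An "add" operation that rejects duplicates is wrapped in an
existence check so that running the vcpu initialization a second time is
harmless. The property table and the names prop_exists, prop_add and
prop_ensure are hypothetical; they only model the duplicate-property failure
quoted above.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_PROPS 4

/* Hypothetical property table that rejects duplicate names. */
static const char *props[MAX_PROPS];
static int n_props;

static bool prop_exists(const char *name)
{
    int i;

    for (i = 0; i < n_props; i++) {
        if (strcmp(props[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Returns 0 on success, -1 on a duplicate name or a full table. */
static int prop_add(const char *name)
{
    if (prop_exists(name) || n_props == MAX_PROPS) {
        return -1;
    }
    props[n_props++] = name;
    return 0;
}

/* Idempotent wrapper: adding an existing property is a no-op. */
static int prop_ensure(const char *name)
{
    return prop_exists(name) ? 0 : prop_add(name);
}
```

Calling prop_ensure() twice with the same name succeeds both times and leaves
a single entry, whereas a second direct prop_add() fails.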
 target/i386/kvm/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5c8ec77212..ff5dc5b02a 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1761,7 +1761,7 @@ static int hyperv_init_vcpu(X86CPU *cpu)
             return ret;
         }
 
-        if (!cpu->hyperv_synic_kvm_only) {
+        if (!cpu->hyperv_synic_kvm_only && !hyperv_is_synic_enabled()) {
             ret = hyperv_x86_synic_add(cpu);
             if (ret < 0) {
                 error_report("failed to create HyperV SynIC: %s",
-- 
2.42.0



* [PATCH v6 27/35] hw/hyperv/vmbus: add support for confidential guest reset
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (25 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 26/35] kvm/hyperv: add synic feature to CPU only if its not enabled Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 28/35] kvm/xen-emu: re-initialize capabilities during " Ani Sinha
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Maciej S. Szmigiero; +Cc: Ani Sinha, kraxel, pbonzini, ani, qemu-devel

On confidential guests, when the KVM virtual machine file descriptor changes as
a part of the reset process, event file descriptors need to be reassociated
with the new KVM VM file descriptor. This is achieved with the help of a
callback handler that gets called when the KVM VM file descriptor changes
during the confidential guest reset process.

This patch has been tested on a non-confidential platform only.

Acked-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
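Reader's note, not part of the commit: a minimal standalone sketch of
re-registration that tolerates an already-present handler. set_handler and
reregister are hypothetical names; set_handler only imitates a registration
call that fails with -EEXIST when the connection id is already taken.

```c
#include <errno.h>

#define MAX_HANDLERS 4

/* Hypothetical registry of connection ids that already have a handler. */
static int handler_ids[MAX_HANDLERS];
static int n_handlers;

/* Returns 0 on success, -EEXIST if the id is taken, -ENOSPC if full. */
static int set_handler(int id)
{
    int i;

    for (i = 0; i < n_handlers; i++) {
        if (handler_ids[i] == id) {
            return -EEXIST;
        }
    }
    if (n_handlers == MAX_HANDLERS) {
        return -ENOSPC;
    }
    handler_ids[n_handlers++] = id;
    return 0;
}

/* Re-register after an fd change; an existing handler is not an error. */
static int reregister(int id)
{
    int ret = set_handler(id);

    if (ret == -EEXIST) {
        return 0;
    }
    return ret;
}
```

The handler in this patch skips error_setg() for -EEXIST but still returns the
value unchanged; the sketch maps it to 0 so the caller sees success.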
 hw/hyperv/trace-events |  1 +
 hw/hyperv/vmbus.c      | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/hw/hyperv/trace-events b/hw/hyperv/trace-events
index 7963c215b1..d8c96f18e9 100644
--- a/hw/hyperv/trace-events
+++ b/hw/hyperv/trace-events
@@ -16,6 +16,7 @@ vmbus_gpadl_torndown(uint32_t gpadl_id) "gpadl #%d"
 vmbus_open_channel(uint32_t chan_id, uint32_t gpadl_id, uint32_t target_vp) "channel #%d gpadl #%d target vp %d"
 vmbus_channel_open(uint32_t chan_id, uint32_t status) "channel #%d status %d"
 vmbus_close_channel(uint32_t chan_id) "channel #%d"
+vmbus_handle_vmfd_change(void) ""
 
 # hv-balloon
 hv_balloon_state_change(const char *tostr) "-> %s"
diff --git a/hw/hyperv/vmbus.c b/hw/hyperv/vmbus.c
index c5bab5d245..64abe4c4c1 100644
--- a/hw/hyperv/vmbus.c
+++ b/hw/hyperv/vmbus.c
@@ -20,6 +20,7 @@
 #include "hw/hyperv/vmbus-bridge.h"
 #include "hw/core/sysbus.h"
 #include "exec/cpu-common.h"
+#include "system/kvm.h"
 #include "exec/target_page.h"
 #include "trace.h"
 
@@ -248,6 +249,12 @@ struct VMBus {
      * interrupt page
      */
     EventNotifier notifier;
+
+    /*
+     * Notifier to inform when vmfd is changed as a part of confidential guest
+     * reset mechanism.
+     */
+    NotifierWithReturn vmbus_vmfd_change_notifier;
 };
 
 static bool gpadl_full(VMBusGpadl *gpadl)
@@ -2347,6 +2354,33 @@ static void vmbus_dev_unrealize(DeviceState *dev)
     free_channels(vdev);
 }
 
+/*
+ * If the KVM fd changes because of VM reset in confidential guests,
+ * reassociate event fd with the new KVM fd.
+ */
+static int vmbus_handle_vmfd_change(NotifierWithReturn *notifier,
+                                    void *data, Error** errp)
+{
+    VMBus *vmbus = container_of(notifier, VMBus,
+                                vmbus_vmfd_change_notifier);
+    int ret = 0;
+
+    /* we are not interested in pre vmfd change notification */
+    if (((VmfdChangeNotifier *)data)->pre) {
+        return 0;
+    }
+
+    ret = hyperv_set_event_flag_handler(VMBUS_EVENT_CONNECTION_ID,
+                                            &vmbus->notifier);
+    /* if we are only using userland event handler, it may already exist */
+    if (ret != 0 && ret != -EEXIST) {
+        error_setg(errp, "hyperv set event handler failed with %d", ret);
+    }
+
+    trace_vmbus_handle_vmfd_change();
+    return ret;
+}
+
 static const Property vmbus_dev_props[] = {
     DEFINE_PROP_UUID("instanceid", VMBusDevice, instanceid),
 };
@@ -2429,6 +2463,9 @@ static void vmbus_realize(BusState *bus, Error **errp)
         goto clear_event_notifier;
     }
 
+    vmbus->vmbus_vmfd_change_notifier.notify = vmbus_handle_vmfd_change;
+    kvm_vmfd_add_change_notifier(&vmbus->vmbus_vmfd_change_notifier);
+
     return;
 
 clear_event_notifier:
-- 
2.42.0




* [PATCH v6 28/35] kvm/xen-emu: re-initialize capabilities during confidential guest reset
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (26 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 27/35] hw/hyperv/vmbus: add support for confidential guest reset Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 29/35] ppc/openpic: create a new openpic device and reattach mem region on coco reset Ani Sinha
                   ` (6 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: David Woodhouse, Paul Durrant, Paolo Bonzini, Marcelo Tosatti
  Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

On confidential guests, the KVM virtual machine file descriptor changes as a
part of the guest reset process. Xen capabilities need to be re-initialized in
KVM against the new file descriptor.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
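Reader's note, not part of the commit: a minimal standalone sketch of the split
made here. The part of initialization that talks to the VM handle is separated
from the one-time setup, and the argument it needs is saved so that it can be
replayed against a new handle. Vm, configure_caps, full_init and on_vm_replaced
are hypothetical names, and the capability value 0x7 is arbitrary.

```c
#include <stdbool.h>

/* Hypothetical VM handle. */
typedef struct Vm {
    int caps;
} Vm;

static int saved_msr;
static bool one_time_done;
static int one_time_runs;

/* Repeatable part: must be redone against every new VM handle. */
static int configure_caps(Vm *vm, int msr)
{
    if (msr <= 0) {
        return -1;
    }
    vm->caps = 0x7;
    return vm->caps;
}

/* Full initialization: repeatable part plus one-time setup. */
static int full_init(Vm *vm, int msr)
{
    int caps = configure_caps(vm, msr);

    if (caps < 0) {
        return caps;
    }
    saved_msr = msr; /* remembered for replay after the VM is replaced */
    if (!one_time_done) {
        one_time_done = true;
        one_time_runs++;
    }
    return 0;
}

/* Called after the VM handle has been replaced: replay only the caps. */
static int on_vm_replaced(Vm *fresh)
{
    int caps = configure_caps(fresh, saved_msr);

    return caps < 0 ? caps : 0;
}
```

Replaying through on_vm_replaced() configures the new handle without running
the one-time setup again.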
 target/i386/kvm/xen-emu.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 52de019834..29364a9279 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -44,9 +44,12 @@
 
 #include "xen-compat.h"
 
+NotifierWithReturn xen_vmfd_change_notifier;
+static uint32_t xen_msr;
 static void xen_vcpu_singleshot_timer_event(void *opaque);
 static void xen_vcpu_periodic_timer_event(void *opaque);
 static int vcpuop_stop_singleshot_timer(CPUState *cs);
+static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr);
 
 #ifdef TARGET_X86_64
 #define hypercall_compat32(longmode) (!(longmode))
@@ -54,6 +57,23 @@ static int vcpuop_stop_singleshot_timer(CPUState *cs);
 #define hypercall_compat32(longmode) (false)
 #endif
 
+static int xen_handle_vmfd_change(NotifierWithReturn *n,
+                                  void *data, Error** errp)
+{
+    int ret;
+
+    /* we are not interested in pre vmfd change notification */
+    if (((VmfdChangeNotifier *)data)->pre) {
+        return 0;
+    }
+
+    ret = do_initialize_xen_caps(kvm_state, xen_msr);
+    if (ret < 0) {
+        return ret;
+    }
+    return 0;
+}
+
 static bool kvm_gva_to_gpa(CPUState *cs, uint64_t gva, uint64_t *gpa,
                            size_t *len, bool is_write)
 {
@@ -111,7 +131,7 @@ static inline int kvm_copy_to_gva(CPUState *cs, uint64_t gva, void *buf,
     return kvm_gva_rw(cs, gva, buf, sz, true);
 }
 
-int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
+static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr)
 {
     const int required_caps = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
         KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL | KVM_XEN_HVM_CONFIG_SHARED_INFO;
@@ -143,6 +163,19 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
                      strerror(-ret));
         return ret;
     }
+    return xen_caps;
+}
+
+int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
+{
+    int xen_caps;
+
+    xen_caps = do_initialize_xen_caps(s, hypercall_msr);
+    if (xen_caps < 0) {
+        return xen_caps;
+    }
+
+    xen_msr = hypercall_msr;
 
     /* If called a second time, don't repeat the rest of the setup. */
     if (s->xen_caps) {
@@ -185,6 +218,9 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
     xen_primary_console_reset();
     xen_xenstore_reset();
 
+    xen_vmfd_change_notifier.notify = xen_handle_vmfd_change;
+    kvm_vmfd_add_change_notifier(&xen_vmfd_change_notifier);
+
     return 0;
 }
 
-- 
2.42.0



* [PATCH v6 29/35] ppc/openpic: create a new openpic device and reattach mem region on coco reset
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (27 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 28/35] kvm/xen-emu: re-initialize capabilities during " Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 30/35] kvm/vcpu: add notifiers to inform vcpu file descriptor change Ani Sinha
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Bernhard Beschow; +Cc: Ani Sinha, kraxel, pbonzini, ani, qemu-ppc, qemu-devel

For confidential guests during the reset process, the old KVM VM file
descriptor is closed and a new one is created. When a new file descriptor is
created, a new openpic device needs to be created against this new KVM VM file
descriptor as well. Additionally, the existing memory region needs to be
reattached to this new openpic device, and the proper CPU attributes need to be
set to associate each CPU with the new file descriptor. This change makes this
happen with the help of a callback handler
that gets called when the KVM VM file descriptor changes as a part of the
confidential guest reset process.

Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 hw/intc/openpic_kvm.c | 112 +++++++++++++++++++++++++++++++++---------
 1 file changed, 88 insertions(+), 24 deletions(-)

diff --git a/hw/intc/openpic_kvm.c b/hw/intc/openpic_kvm.c
index fbf0bdbe07..b099da20eb 100644
--- a/hw/intc/openpic_kvm.c
+++ b/hw/intc/openpic_kvm.c
@@ -49,6 +49,7 @@ struct KVMOpenPICState {
     uint32_t fd;
     uint32_t model;
     hwaddr mapped;
+    NotifierWithReturn vmfd_change_notifier;
 };
 
 static void kvm_openpic_set_irq(void *opaque, int n_IRQ, int level)
@@ -114,6 +115,88 @@ static const MemoryRegionOps kvm_openpic_mem_ops = {
     },
 };
 
+static int kvm_openpic_setup(KVMOpenPICState *opp, Error **errp)
+{
+    int kvm_openpic_model;
+    struct kvm_create_device cd = {0};
+    KVMState *s = kvm_state;
+    int ret;
+
+    switch (opp->model) {
+    case OPENPIC_MODEL_FSL_MPIC_20:
+        kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_20;
+        break;
+
+    case OPENPIC_MODEL_FSL_MPIC_42:
+        kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_42;
+        break;
+
+    default:
+        error_setg(errp, "Unsupported OpenPIC model %" PRIu32, opp->model);
+        return -1;
+    }
+
+    cd.type = kvm_openpic_model;
+    ret = kvm_vm_ioctl(s, KVM_CREATE_DEVICE, &cd);
+    if (ret < 0) {
+        error_setg(errp, "Can't create device %d: %s",
+                   cd.type, strerror(errno));
+        return -1;
+    }
+    opp->fd = cd.fd;
+
+    return 0;
+}
+
+static int kvm_openpic_handle_vmfd_change(NotifierWithReturn *notifier,
+                                          void *data, Error **errp)
+{
+    KVMOpenPICState *opp = container_of(notifier, KVMOpenPICState,
+                                        vmfd_change_notifier);
+    uint64_t reg_base;
+    struct kvm_device_attr attr;
+    CPUState *cs;
+    int ret;
+
+    /* we are not interested in pre vmfd change notification */
+    if (((VmfdChangeNotifier *)data)->pre) {
+        return 0;
+    }
+
+    /* close the old descriptor */
+    close(opp->fd);
+
+    if (kvm_openpic_setup(opp, errp) < 0) {
+        return -1;
+    }
+
+    if (!opp->mapped) {
+        return 0;
+    }
+
+    reg_base = opp->mapped;
+    attr.group = KVM_DEV_MPIC_GRP_MISC;
+    attr.attr = KVM_DEV_MPIC_BASE_ADDR;
+    attr.addr = (uint64_t)(unsigned long)&reg_base;
+
+    ret = ioctl(opp->fd, KVM_SET_DEVICE_ATTR, &attr);
+    if (ret < 0) {
+        error_setg(errp, "%s: %s %" PRIx64, __func__,
+                   strerror(errno), reg_base);
+        return -1;
+    }
+
+    CPU_FOREACH(cs) {
+        ret = kvm_vcpu_enable_cap(cs, KVM_CAP_IRQ_MPIC, 0, opp->fd,
+                                   kvm_arch_vcpu_id(cs));
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 static void kvm_openpic_region_add(MemoryListener *listener,
                                    MemoryRegionSection *section)
 {
@@ -197,36 +280,14 @@ static void kvm_openpic_realize(DeviceState *dev, Error **errp)
     SysBusDevice *d = SYS_BUS_DEVICE(dev);
     KVMOpenPICState *opp = KVM_OPENPIC(dev);
     KVMState *s = kvm_state;
-    int kvm_openpic_model;
-    struct kvm_create_device cd = {0};
-    int ret, i;
+    int i;
 
     if (!kvm_check_extension(s, KVM_CAP_DEVICE_CTRL)) {
         error_setg(errp, "Kernel is lacking Device Control API");
         return;
     }
 
-    switch (opp->model) {
-    case OPENPIC_MODEL_FSL_MPIC_20:
-        kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_20;
-        break;
-
-    case OPENPIC_MODEL_FSL_MPIC_42:
-        kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_42;
-        break;
-
-    default:
-        error_setg(errp, "Unsupported OpenPIC model %" PRIu32, opp->model);
-        return;
-    }
-
-    cd.type = kvm_openpic_model;
-    ret = kvm_vm_ioctl(s, KVM_CREATE_DEVICE, &cd);
-    if (ret < 0) {
-        error_setg_errno(errp, errno, "Can't create device %d", cd.type);
-        return;
-    }
-    opp->fd = cd.fd;
+    kvm_openpic_setup(opp, errp);
 
     sysbus_init_mmio(d, &opp->mem);
     qdev_init_gpio_in(dev, kvm_openpic_set_irq, OPENPIC_MAX_IRQ);
@@ -235,6 +296,9 @@ static void kvm_openpic_realize(DeviceState *dev, Error **errp)
     opp->mem_listener.region_del = kvm_openpic_region_del;
     opp->mem_listener.name = "openpic-kvm";
     memory_listener_register(&opp->mem_listener, &address_space_memory);
+    opp->vmfd_change_notifier.notify =
+        kvm_openpic_handle_vmfd_change;
+    kvm_vmfd_add_change_notifier(&opp->vmfd_change_notifier);
 
     /* indicate pic capabilities */
     msi_nonbroken = true;
-- 
2.42.0




* [PATCH v6 30/35] kvm/vcpu: add notifiers to inform vcpu file descriptor change
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (28 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 29/35] ppc/openpic: create a new openpic device and reattach mem region on coco reset Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 31/35] kvm/clock: add support for confidential guest reset Ani Sinha
                   ` (4 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Ani Sinha, kraxel, ani, kvm, qemu-devel

When new vcpu file descriptors are created and bound to the new kvm file
descriptor as a part of the confidential guest reset mechanism, various
subsystems need to know about it. This change adds a notifier list so that
subsystems can register handlers and take appropriate actions when the vcpu
fds change.
Subsequent changes will register specific handlers to this notifier.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
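Not part of the patch: a minimal, self-contained sketch of the
notifier-with-return pattern this change relies on. Every name below
(SketchNotifier, sketch_list_add, sketch_list_notify and the two handlers)
is hypothetical and is not QEMU's NotifierWithReturn API. The sketch
assumes that handlers run in list order and that the first non-zero return
value stops the walk and is passed back to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a notifier-with-return list. */
typedef struct SketchNotifier SketchNotifier;
struct SketchNotifier {
    int (*notify)(SketchNotifier *n, void *data);
    SketchNotifier *next;
};

typedef struct SketchNotifierList {
    SketchNotifier *head;
} SketchNotifierList;

/* Register a handler; new entries are placed at the head of the list. */
static void sketch_list_add(SketchNotifierList *list, SketchNotifier *n)
{
    assert(n->notify != NULL);
    n->next = list->head;
    list->head = n;
}

/* Call every handler until one of them reports a non-zero result. */
static int sketch_list_notify(SketchNotifierList *list, void *data)
{
    for (SketchNotifier *n = list->head; n; n = n->next) {
        int ret = n->notify(n, data);
        if (ret != 0) {
            return ret;
        }
    }
    return 0;
}

/* Example handler: counts how often the "vcpu fd changed" event fired. */
static int sketch_count_handler(SketchNotifier *n, void *data)
{
    (void)n;
    *(int *)data += 1;
    return 0;
}

/* Example handler that always fails, to show early termination. */
static int sketch_failing_handler(SketchNotifier *n, void *data)
{
    (void)n;
    (void)data;
    return -1;
}
```

With two counting handlers registered, one notify call increments the
counter twice. Once the failing handler is added at the head, notify
returns its error before any counting handler runs.
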
 accel/kvm/kvm-all.c    | 26 ++++++++++++++++++++++++++
 accel/stubs/kvm-stub.c | 10 ++++++++++
 include/system/kvm.h   | 17 +++++++++++++++++
 3 files changed, 53 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index a347a71a2e..a1f910e9df 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -127,6 +127,9 @@ static NotifierList kvm_irqchip_change_notifiers =
 static NotifierWithReturnList register_vmfd_changed_notifiers =
     NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_changed_notifiers);
 
+static NotifierWithReturnList register_vcpufd_changed_notifiers =
+    NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vcpufd_changed_notifiers);
+
 static int map_kvm_run(KVMState *s, CPUState *cpu, Error **errp);
 static int map_kvm_dirty_gfns(KVMState *s, CPUState *cpu, Error **errp);
 static int vcpu_unmap_regions(KVMState *s, CPUState *cpu);
@@ -2314,6 +2317,22 @@ static int kvm_vmfd_change_notify(Error **errp)
                                             &vmfd_notifier, errp);
 }
 
+void kvm_vcpufd_add_change_notifier(NotifierWithReturn *n)
+{
+    notifier_with_return_list_add(&register_vcpufd_changed_notifiers, n);
+}
+
+void kvm_vcpufd_remove_change_notifier(NotifierWithReturn *n)
+{
+    notifier_with_return_remove(n);
+}
+
+static int kvm_vcpufd_change_notify(Error **errp)
+{
+    return notifier_with_return_list_notify(&register_vcpufd_changed_notifiers,
+                                            &vmfd_notifier, errp);
+}
+
 int kvm_irqchip_get_virq(KVMState *s)
 {
     int next_virq;
@@ -2841,6 +2860,13 @@ static int kvm_reset_vmfd(MachineState *ms)
     }
     assert(!err);
 
+    /* notify everyone that vcpu fd has changed. */
+    ret = kvm_vcpufd_change_notify(&err);
+    if (ret < 0) {
+        return ret;
+    }
+    assert(!err);
+
     /* these can be only called after ram_block_rebind() */
     memory_listener_register(&kml->listener, &address_space_memory);
     memory_listener_register(&kvm_io_listener, &address_space_io);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index a6e8a6e16c..c4617caac6 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -87,6 +87,16 @@ void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n)
 {
 }
 
+void kvm_vcpufd_add_change_notifier(NotifierWithReturn *n)
+{
+    return;
+}
+
+void kvm_vcpufd_remove_change_notifier(NotifierWithReturn *n)
+{
+    return;
+}
+
 int kvm_irqchip_add_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
                                        EventNotifier *rn, int virq)
 {
diff --git a/include/system/kvm.h b/include/system/kvm.h
index fbe23608a1..4b0e1b4ab1 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -590,4 +590,21 @@ void kvm_vmfd_add_change_notifier(NotifierWithReturn *n);
  */
 void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n);
 
+/**
+ * kvm_vcpufd_add_change_notifier - register a notifier to get notified when
+ * a KVM vcpu file descriptor changes as a part of the confidential guest
+ * "reset" process. Various subsystems should use this mechanism to take
+ * actions such as re-issuing vcpu ioctls as a part of setting up vcpu
+ * features.
+ * @n: notifier with return value.
+ */
+void kvm_vcpufd_add_change_notifier(NotifierWithReturn *n);
+
+/**
+ * kvm_vcpufd_remove_change_notifier - de-register a notifier previously
+ * registered with kvm_vcpufd_add_change_notifier call.
+ * @n: notifier that was previously registered.
+ */
+void kvm_vcpufd_remove_change_notifier(NotifierWithReturn *n);
+
 #endif
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v6 31/35] kvm/clock: add support for confidential guest reset
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (29 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 30/35] kvm/vcpu: add notifiers to inform vcpu file descriptor change Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 32/35] hw/machine: introduce machine specific option 'x-change-vmfd-on-reset' Ani Sinha
                   ` (3 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost
  Cc: Ani Sinha, kraxel, ani, qemu-devel

Confidential guests change the KVM VM file descriptor upon reset and also
create new VCPU file descriptors against the new KVM VM file descriptor. We
need to save the clock state from KVM before the KVM VM file descriptor
changes and restore it afterwards. Also, after the VCPU file descriptors have
changed, we must call KVM_KVMCLOCK_CTRL on each VCPU file descriptor to inform
KVM that the VCPU is in a paused state.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 hw/i386/kvm/clock.c | 59 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index aba6842a22..10d34254f0 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -50,6 +50,9 @@ struct KVMClockState {
     /* whether the 'clock' value was obtained in a host with
      * reliable KVM_GET_CLOCK */
     bool clock_is_reliable;
+
+    NotifierWithReturn kvmclock_vcpufd_change_notifier;
+    NotifierWithReturn kvmclock_vmfd_change_notifier;
 };
 
 struct pvclock_vcpu_time_info {
@@ -63,6 +66,9 @@ struct pvclock_vcpu_time_info {
     uint8_t    pad[2];
 } __attribute__((__packed__)); /* 32 bytes */
 
+static int kvmclock_set_clock(NotifierWithReturn *notifier,
+                              void *data, Error** errp);
+
 static uint64_t kvmclock_current_nsec(KVMClockState *s)
 {
     CPUState *cpu = first_cpu;
@@ -219,6 +225,54 @@ static void kvmclock_vm_state_change(void *opaque, bool running,
     }
 }
 
+static int kvmclock_save_clock(NotifierWithReturn *notifier,
+                               void *data, Error** errp)
+{
+    if (!((VmfdChangeNotifier *)data)->pre) {
+        return 0;
+    }
+    KVMClockState *s = container_of(notifier, KVMClockState,
+                                    kvmclock_vmfd_change_notifier);
+    kvm_update_clock(s);
+    return 0;
+}
+
+static int kvmclock_set_clock(NotifierWithReturn *notifier,
+                              void *data, Error** errp)
+{
+    struct kvm_clock_data clock_data = {};
+    CPUState *cpu;
+    int ret;
+    KVMClockState *s = container_of(notifier, KVMClockState,
+                                    kvmclock_vcpufd_change_notifier);
+    int cap_clock_ctrl = kvm_check_extension(kvm_state, KVM_CAP_KVMCLOCK_CTRL);
+
+    if (!s->clock_is_reliable) {
+        uint64_t pvclock_via_mem = kvmclock_current_nsec(s);
+        /* saved clock value before vmfd change is not reliable */
+        if (pvclock_via_mem) {
+            s->clock = pvclock_via_mem;
+        }
+    }
+
+    clock_data.clock = s->clock;
+    ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &clock_data);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_SET_CLOCK failed: %s\n", strerror(-ret));
+        abort();
+    }
+
+    if (!cap_clock_ctrl) {
+        return 0;
+    }
+    CPU_FOREACH(cpu) {
+        run_on_cpu(cpu, do_kvmclock_ctrl, RUN_ON_CPU_NULL);
+    }
+
+    return 0;
+}
+
+
 static void kvmclock_realize(DeviceState *dev, Error **errp)
 {
     KVMClockState *s = KVM_CLOCK(dev);
@@ -230,7 +284,12 @@ static void kvmclock_realize(DeviceState *dev, Error **errp)
 
     kvm_update_clock(s);
 
+    s->kvmclock_vcpufd_change_notifier.notify = kvmclock_set_clock;
+    s->kvmclock_vmfd_change_notifier.notify = kvmclock_save_clock;
+
     qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
+    kvm_vcpufd_add_change_notifier(&s->kvmclock_vcpufd_change_notifier);
+    kvm_vmfd_add_change_notifier(&s->kvmclock_vmfd_change_notifier);
 }
 
 static bool kvmclock_clock_is_reliable_needed(void *opaque)
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v6 32/35] hw/machine: introduce machine specific option 'x-change-vmfd-on-reset'
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (30 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 31/35] kvm/clock: add support for confidential guest reset Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 33/35] tests/functional/x86_64: add functional test to exercise vm fd change on reset Ani Sinha
                   ` (2 subsequent siblings)
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Paolo Bonzini
  Cc: Ani Sinha, kraxel, ani, qemu-devel

A new machine specific option 'x-change-vmfd-on-reset' is introduced for
debugging and testing only (hence the 'x-' prefix). When enabled, this option
forces the KVM VM file descriptor to be changed upon guest reset, as happens
for confidential guests. This can be used to exercise the code changes that
are specific to confidential guests on non-confidential guests as well
(except changes that require hardware support for confidential guests).
The next patch adds a functional test that uses this new parameter to test
the VM file descriptor changes.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
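Not part of the patch: a small sketch of the condition that decides whether
the guest is rebuilt on reset after this change. The enum and function
names are hypothetical stand-ins for ShutdownCause,
current_machine->new_accel_vmfd_on_reset and cpus_are_resettable(); only
the boolean structure mirrors the runstate.c hunk below.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shutdown causes relevant here. */
typedef enum SketchCause {
    SKETCH_CAUSE_GUEST_RESET,
    SKETCH_CAUSE_HOST_QMP_SYSTEM_RESET,
    SKETCH_CAUSE_OTHER,
} SketchCause;

/*
 * Rebuild the guest only for a guest- or QMP-initiated reset, and only
 * if either the test option forces a new VM fd or the CPUs cannot be
 * reset in place (the confidential guest case).
 */
static bool sketch_should_rebuild_guest(SketchCause reason,
                                        bool force_new_vmfd,
                                        bool cpus_resettable)
{
    bool is_reset = reason == SKETCH_CAUSE_GUEST_RESET ||
                    reason == SKETCH_CAUSE_HOST_QMP_SYSTEM_RESET;

    return is_reset && (force_new_vmfd || !cpus_resettable);
}
```

For a regular guest with the option left off, the predicate is false and
the usual reset path is taken. Turning the option on makes a resettable
guest follow the same rebuild path as a confidential one.
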
 hw/core/machine.c        | 22 ++++++++++++++++++++++
 include/hw/core/boards.h |  6 ++++++
 system/runstate.c        |  6 +++---
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index d4ef620c17..eae1f6be8d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -435,6 +435,21 @@ static void machine_set_dump_guest_core(Object *obj, bool value, Error **errp)
     ms->dump_guest_core = value;
 }
 
+static bool machine_get_new_accel_vmfd_on_reset(Object *obj, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    return ms->new_accel_vmfd_on_reset;
+}
+
+static void machine_set_new_accel_vmfd_on_reset(Object *obj,
+                                                bool value, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    ms->new_accel_vmfd_on_reset = value;
+}
+
 static bool machine_get_mem_merge(Object *obj, Error **errp)
 {
     MachineState *ms = MACHINE(obj);
@@ -1183,6 +1198,13 @@ static void machine_class_init(ObjectClass *oc, const void *data)
     object_class_property_set_description(oc, "dump-guest-core",
         "Include guest memory in a core dump");
 
+    object_class_property_add_bool(oc, "x-change-vmfd-on-reset",
+        machine_get_new_accel_vmfd_on_reset,
+        machine_set_new_accel_vmfd_on_reset);
+    object_class_property_set_description(oc, "x-change-vmfd-on-reset",
+        "Set on/off to enable/disable generating new accelerator guest handle "
+         "on guest reset. Default: off (used only for testing/debugging).");
+
     object_class_property_add_bool(oc, "mem-merge",
         machine_get_mem_merge, machine_set_mem_merge);
     object_class_property_set_description(oc, "mem-merge",
diff --git a/include/hw/core/boards.h b/include/hw/core/boards.h
index edbe8d03e5..12b2149378 100644
--- a/include/hw/core/boards.h
+++ b/include/hw/core/boards.h
@@ -448,6 +448,12 @@ struct MachineState {
     struct NVDIMMState *nvdimms_state;
     struct NumaState *numa_state;
     bool acpi_spcr_enabled;
+    /*
+     * Whether to change virtual machine accelerator handle upon
+     * reset or not. Used only for debugging and testing purposes.
+     * Set to false by default for all regular use.
+     */
+    bool new_accel_vmfd_on_reset;
 };
 
 /*
diff --git a/system/runstate.c b/system/runstate.c
index e7b50e6a3b..eca722b43c 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -526,9 +526,9 @@ void qemu_system_reset(ShutdownCause reason)
         type = RESET_TYPE_COLD;
     }
 
-    if (!cpus_are_resettable() &&
-        (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
-         reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
+    if ((reason == SHUTDOWN_CAUSE_GUEST_RESET ||
+         reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET) &&
+        (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
         if (ac->rebuild_guest) {
             ret = ac->rebuild_guest(current_machine);
             if (ret < 0) {
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v6 33/35] tests/functional/x86_64: add functional test to exercise vm fd change on reset
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (31 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 32/35] hw/machine: introduce machine specific option 'x-change-vmfd-on-reset' Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 34/35] qom: add 'confidential-guest-reset' property for x86 confidential vms Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker Ani Sinha
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Zhao Liu, Ani Sinha; +Cc: kraxel, ani, qemu-devel

A new functional test is added that exercises the code changes related to
closing the old KVM VM file descriptor and opening a new one upon VM reset.
This normally happens when confidential guests are reset; for
non-confidential guests, the machine specific debug/test parameter
'x-change-vmfd-on-reset' is used to enable this behavior.
Code changes specific to re-initialisation of the SEV-ES, SEV-SNP and TDX
platforms are not exercised in this test, as they require hardware that
supports running confidential guests.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 MAINTAINERS                                  |   1 +
 tests/functional/x86_64/meson.build          |   1 +
 tests/functional/x86_64/test_rebuild_vmfd.py | 136 +++++++++++++++++++
 3 files changed, 138 insertions(+)
 create mode 100755 tests/functional/x86_64/test_rebuild_vmfd.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 6377ff5898..726aa35ff2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -157,6 +157,7 @@ M: Ani Sinha <anisinha@redhat.com>
 M: Paolo Bonzini <pbonzini@redhat.com>
 S: Maintained
 F: stubs/kvm.c
+F: tests/functional/x86_64/test_rebuild_vmfd.py
 
 Guest CPU cores (TCG)
 ---------------------
diff --git a/tests/functional/x86_64/meson.build b/tests/functional/x86_64/meson.build
index beab4f304b..05e4914c77 100644
--- a/tests/functional/x86_64/meson.build
+++ b/tests/functional/x86_64/meson.build
@@ -37,4 +37,5 @@ tests_x86_64_system_thorough = [
   'vhost_user_bridge',
   'virtio_balloon',
   'virtio_gpu',
+  'rebuild_vmfd',
 ]
diff --git a/tests/functional/x86_64/test_rebuild_vmfd.py b/tests/functional/x86_64/test_rebuild_vmfd.py
new file mode 100755
index 0000000000..5a8e5fd89b
--- /dev/null
+++ b/tests/functional/x86_64/test_rebuild_vmfd.py
@@ -0,0 +1,136 @@
+#!/usr/bin/env python3
+#
+# Functional tests exercising guest KVM file descriptor change on reset.
+#
+# Copyright © 2026 Red Hat, Inc.
+#
+# Author:
+#  Ani Sinha <anisinha@redhat.com>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import os
+from qemu.machine import machine
+
+from qemu_test import QemuSystemTest, Asset, exec_command_and_wait_for_pattern
+from qemu_test import wait_for_console_pattern
+
+class KVMGuest(QemuSystemTest):
+
+    # ASSET UKI was generated using
+    # https://gitlab.com/kraxel/edk2-tests/-/blob/unittest/tools/make-supermin.sh
+    ASSET_UKI = Asset('https://gitlab.com/anisinha/misc-artifacts/'
+                      '-/raw/main/uki.x86-64.efi?ref_type=heads',
+                      'e0f806bd1fa24111312e1fe849d2ee69808d4343930a5'
+                      'dc8c1688da17c65f576')
+    # ASSET_OVMF comes from /usr/share/edk2/ovmf/OVMF.stateless.fd of a
+    # Fedora 43 distribution which in turn comes from the
+    # edk2-ovmf-20251119-3.fc43.noarch rpm of that distribution.
+    ASSET_OVMF = Asset('https://gitlab.com/anisinha/misc-artifacts/'
+                       '-/raw/main/OVMF.stateless.fd?ref_type=heads',
+                       '58a4275aafa8774bd6b1540adceae4ea434b8db75b476'
+                       '11839ff47be88cfcf22')
+
+    def common_vm_setup(self, kvm_args=None, cpu_args=None):
+        self.set_machine('q35')
+        self.require_accelerator("kvm")
+
+        self.vm.set_console()
+        if kvm_args:
+            self.vm.add_args("-accel", "kvm,%s" %kvm_args)
+        else:
+            self.vm.add_args("-accel", "kvm")
+        self.vm.add_args("-smp", "2")
+        if cpu_args:
+            self.vm.add_args("-cpu", "host,%s" %cpu_args)
+        else:
+            self.vm.add_args("-cpu", "host")
+        self.vm.add_args("-m", "2G")
+        self.vm.add_args("-nographic", "-nodefaults")
+
+
+        self.uki_path = self.ASSET_UKI.fetch()
+        self.ovmf_path = self.ASSET_OVMF.fetch()
+
+        self.vm.add_args('-kernel', self.uki_path)
+        self.vm.add_args("-bios", self.ovmf_path)
+        # enable KVM VMFD change on reset for a non-coco VM
+        self.vm.add_args("-machine", "q35,x-change-vmfd-on-reset=on")
+
+        # enable tracing of basic vmfd change function
+        self.vm.add_args("--trace", "kvm_reset_vmfd")
+
+    def launch_vm(self):
+        try:
+            self.vm.launch()
+        except machine.VMLaunchFailure as e:
+            if "Xen HVM guest support not present" in e.output:
+                self.skipTest("KVM Xen support is not present "
+                              "(need v5.12+ kernel with CONFIG_KVM_XEN)")
+            elif "Property 'kvm-accel.xen-version' not found" in e.output:
+                self.skipTest("QEMU not built with CONFIG_XEN_EMU support")
+            else:
+                raise e
+
+        self.log.info('VM launched')
+        console_pattern = 'bash-5.1#'
+        wait_for_console_pattern(self, console_pattern)
+        self.log.info('VM ready with a bash prompt')
+
+    def vm_console_reset(self):
+        exec_command_and_wait_for_pattern(self, '/usr/sbin/reboot -f',
+                                          'reboot: machine restart')
+        console_pattern = '# --- Hello world ---'
+        wait_for_console_pattern(self, console_pattern)
+        self.vm.shutdown()
+
+    def vm_qmp_reset(self):
+        self.vm.qmp('system_reset')
+        console_pattern = '# --- Hello world ---'
+        wait_for_console_pattern(self, console_pattern)
+        self.vm.shutdown()
+
+    def check_logs(self):
+        self.assertRegex(self.vm.get_log(),
+                         r'kvm_reset_vmfd')
+        self.assertRegex(self.vm.get_log(),
+                         r'virtual machine state has been rebuilt')
+
+    def test_reset_console(self):
+        self.common_vm_setup()
+        self.launch_vm()
+        self.vm_console_reset()
+        self.check_logs()
+
+    def test_reset_qmp(self):
+        self.common_vm_setup()
+        self.launch_vm()
+        self.vm_qmp_reset()
+        self.check_logs()
+
+    def test_reset_kvmpit(self):
+        self.common_vm_setup()
+        self.vm.add_args("--trace", "kvmpit_post_vmfd_change")
+        self.launch_vm()
+        self.vm_console_reset()
+        self.assertRegex(self.vm.get_log(),
+                         r'kvmpit_post_vmfd_change')
+
+    def test_reset_xen_emulation(self):
+        self.common_vm_setup("xen-version=0x4000a,kernel-irqchip=split")
+        self.launch_vm()
+        self.vm_console_reset()
+        self.check_logs()
+
+    def test_reset_hyperv_vmbus(self):
+        self.common_vm_setup(None, "hv-syndbg,hv-relaxed,hv_time,hv-synic,"
+                             "hv-vpindex,hv-runtime,hv-stimer")
+        self.vm.add_args("-device", "vmbus-bridge,irq=15")
+        self.vm.add_args("-trace", "vmbus_handle_vmfd_change")
+        self.launch_vm()
+        self.vm_console_reset()
+        self.assertRegex(self.vm.get_log(),
+                         r'vmbus_handle_vmfd_change')
+
+if __name__ == '__main__':
+    QemuSystemTest.main()
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v6 34/35] qom: add 'confidential-guest-reset' property for x86 confidential vms
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (32 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 33/35] tests/functional/x86_64: add functional test to exercise vm fd change on reset Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  3:49 ` [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker Ani Sinha
  34 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Eric Blake, Markus Armbruster
  Cc: Ani Sinha, kraxel, ani, qemu-devel

Through the new 'confidential-guest-reset' property, the control plane can
detect whether the hypervisor supports x86 confidential guest resets. Older
hypervisors that do not support resets will not have this property populated.

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 qapi/qom.json | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 6f5c9de0f0..c653248f85 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -1009,13 +1009,19 @@
 #     designated guest firmware page for measured boot with -kernel
 #     (default: false) (since 6.2)
 #
+# Features:
+#
+# @confidential-guest-reset: If present, the hypervisor supports
+#     confidential guest resets (since 11.0).
+#
 # Since: 9.1
 ##
 { 'struct': 'SevCommonProperties',
   'data': { '*sev-device': 'str',
             '*cbitpos': 'uint32',
             'reduced-phys-bits': 'uint32',
-            '*kernel-hashes': 'bool' } }
+            '*kernel-hashes': 'bool' },
+  'features': ['confidential-guest-reset']}
 
 ##
 # @SevGuestProperties:
@@ -1136,6 +1142,11 @@
 #     it, the guest will not be able to get a TD quote for
 #     attestation.
 #
+# Features:
+#
+# @confidential-guest-reset: If present, the hypervisor supports
+#     confidential guest resets (since 11.0).
+#
 # Since: 10.1
 ##
 { 'struct': 'TdxGuestProperties',
@@ -1144,7 +1155,8 @@
             '*mrconfigid': 'str',
             '*mrowner': 'str',
             '*mrownerconfig': 'str',
-            '*quote-generation-socket': 'SocketAddress' } }
+            '*quote-generation-socket': 'SocketAddress' },
+   'features': ['confidential-guest-reset']}
 
 ##
 # @ThreadContextProperties:
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
                   ` (33 preceding siblings ...)
  2026-02-25  3:49 ` [PATCH v6 34/35] qom: add 'confidential-guest-reset' property for x86 confidential vms Ani Sinha
@ 2026-02-25  3:49 ` Ani Sinha
  2026-02-25  6:05   ` Prasad Pandit
  2026-02-25 17:29   ` Peter Xu
  34 siblings, 2 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  3:49 UTC (permalink / raw)
  To: Peter Xu, Fabiano Rosas
  Cc: Ani Sinha, kraxel, pbonzini, ani, Prasad Pandit, qemu-devel

Currently the code that adds a migration blocker does not check if the same
blocker already exists. Return an EEXIST error code if there is an attempt to
add the same migration blocker again. This way the same migration blocker will
not get added twice.

Suggested-by: Prasad Pandit <pjp@fedoraproject.org>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 migration/migration.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index a5b0465ed3..1eb75fb7fb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1702,6 +1702,10 @@ static int add_blockers(Error **reasonp, unsigned modes, Error **errp)
 {
     for (MigMode mode = 0; mode < MIG_MODE__MAX; mode++) {
         if (modes & BIT(mode)) {
+            if (g_slist_index(migration_blockers[mode],
+                                 *reasonp) >= 0) {
+                return -EEXIST;
+            }
             migration_blockers[mode] = g_slist_prepend(migration_blockers[mode],
                                                        *reasonp);
         }
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-25  3:49 ` [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker Ani Sinha
@ 2026-02-25  6:05   ` Prasad Pandit
  2026-02-25  9:07     ` Daniel P. Berrangé
  2026-02-25 17:29   ` Peter Xu
  1 sibling, 1 reply; 55+ messages in thread
From: Prasad Pandit @ 2026-02-25  6:05 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Peter Xu, Fabiano Rosas, kraxel, pbonzini, ani, Prasad Pandit,
	qemu-devel

On Wed, 25 Feb 2026 at 09:23, Ani Sinha <anisinha@redhat.com> wrote:
> Currently the code that adds a migration blocker does not check if the same
> blocker already exists. Return an EEXIST error code if there is an attempt to
> add the same migration blocker again. This way the same migration blocker will
> not get added twice.
>
> Suggested-by: Prasad Pandit <pjp@fedoraproject.org>
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> ---
>  migration/migration.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index a5b0465ed3..1eb75fb7fb 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1702,6 +1702,10 @@ static int add_blockers(Error **reasonp, unsigned modes, Error **errp)
>  {
>      for (MigMode mode = 0; mode < MIG_MODE__MAX; mode++) {
>          if (modes & BIT(mode)) {
> +            if (g_slist_index(migration_blockers[mode],
> +                                 *reasonp) >= 0) {
> +                return -EEXIST;
> +            }
>              migration_blockers[mode] = g_slist_prepend(migration_blockers[mode],
>                                                         *reasonp);
>          }

* Looks okay.
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>

Thank you.
---
  - Prasad



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-25  6:05   ` Prasad Pandit
@ 2026-02-25  9:07     ` Daniel P. Berrangé
  2026-02-25  9:32       ` Markus Armbruster
  2026-02-25  9:34       ` Ani Sinha
  0 siblings, 2 replies; 55+ messages in thread
From: Daniel P. Berrangé @ 2026-02-25  9:07 UTC (permalink / raw)
  To: Prasad Pandit
  Cc: Ani Sinha, Peter Xu, Fabiano Rosas, kraxel, pbonzini, ani,
	Prasad Pandit, qemu-devel

On Wed, Feb 25, 2026 at 11:35:15AM +0530, Prasad Pandit wrote:
> On Wed, 25 Feb 2026 at 09:23, Ani Sinha <anisinha@redhat.com> wrote:
> > Currently the code that adds a migration blocker does not check if the same
> > blocker already exists. Return an EEXIST error code if there is an attempt to
> > add the same migration blocker again. This way the same migration blocker will
> > not get added twice.
> >
> > Suggested-by: Prasad Pandit <pjp@fedoraproject.org>
> > Signed-off-by: Ani Sinha <anisinha@redhat.com>
> > ---
> >  migration/migration.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index a5b0465ed3..1eb75fb7fb 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1702,6 +1702,10 @@ static int add_blockers(Error **reasonp, unsigned modes, Error **errp)

This method has an  '**errp' parameter.....

> >  {
> >      for (MigMode mode = 0; mode < MIG_MODE__MAX; mode++) {
> >          if (modes & BIT(mode)) {
> > +            if (g_slist_index(migration_blockers[mode],
> > +                                 *reasonp) >= 0) {
> > +                return -EEXIST;

.... so using  -errno for return values is not appropriate - it must
set 'errp' and return -1.

> > +            }
> >              migration_blockers[mode] = g_slist_prepend(migration_blockers[mode],
> >                                                         *reasonp);
> >          }
> 
> * Looks okay.
> Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
> 
> Thank you.
> ---
>   - Prasad
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-25  9:07     ` Daniel P. Berrangé
@ 2026-02-25  9:32       ` Markus Armbruster
  2026-02-25  9:45         ` Ani Sinha
  2026-02-25 10:04         ` Daniel P. Berrangé
  2026-02-25  9:34       ` Ani Sinha
  1 sibling, 2 replies; 55+ messages in thread
From: Markus Armbruster @ 2026-02-25  9:32 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Prasad Pandit, Ani Sinha, Peter Xu, Fabiano Rosas, kraxel,
	pbonzini, ani, Prasad Pandit, qemu-devel

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Feb 25, 2026 at 11:35:15AM +0530, Prasad Pandit wrote:
>> On Wed, 25 Feb 2026 at 09:23, Ani Sinha <anisinha@redhat.com> wrote:
>> > Currently the code that adds a migration blocker does not check if the same
>> > blocker already exists. Return an EEXIST error code if there is an attempt to
>> > add the same migration blocker again. This way the same migration blocker will
>> > not get added twice.
>> >
>> > Suggested-by: Prasad Pandit <pjp@fedoraproject.org>
>> > Signed-off-by: Ani Sinha <anisinha@redhat.com>
>> > ---
>> >  migration/migration.c | 4 ++++
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/migration/migration.c b/migration/migration.c
>> > index a5b0465ed3..1eb75fb7fb 100644
>> > --- a/migration/migration.c
>> > +++ b/migration/migration.c
>> > @@ -1702,6 +1702,10 @@ static int add_blockers(Error **reasonp, unsigned modes, Error **errp)
>
> This method has an  '**errp' parameter.....
>
>> >  {
>> >      for (MigMode mode = 0; mode < MIG_MODE__MAX; mode++) {
>> >          if (modes & BIT(mode)) {
>> > +            if (g_slist_index(migration_blockers[mode],
>> > +                                 *reasonp) >= 0) {
>> > +                return -EEXIST;
>
> .... so using  -errno for return values is not appropriate - it must
> set 'errp' and return -1.

Yes, it must set @errp on failure.  Returning -1 would be wrong,
because:

    int migrate_add_blocker_modes(Error **reasonp, unsigned modes, Error **errp)
    {
        if (is_only_migratable(reasonp, modes, errp)) {
            return -EACCES;
        } else if (is_busy(reasonp, errp)) {
            return -EBUSY;
        }
        return add_blockers(reasonp, modes, errp);
    }

If callers don't actually care for error codes, the entire nest of
functions could be simplified to return 0/-1 or true/false.
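
As a rough illustration of the convention described above (set the error
object on failure and still return a negative errno value), here is a
minimal self-contained sketch. The `Error` struct, `demo_error_setg()` and
`demo_add_blocker()` are simplified, hypothetical stand-ins and not QEMU's
real Error API or the actual add_blockers() code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical stand-in for QEMU's Error object. */
typedef struct Error {
    char msg[128];
} Error;

/* Stand-in for error_setg(): fill *errp only if the caller wants details. */
static void demo_error_setg(Error **errp, const char *msg)
{
    static Error storage;   /* one static slot is enough for a sketch */

    if (errp && !*errp) {
        snprintf(storage.msg, sizeof(storage.msg), "%s", msg);
        *errp = &storage;
    }
}

#define DEMO_MAX_BLOCKERS 8
static const Error *demo_blockers[DEMO_MAX_BLOCKERS];
static size_t demo_nr_blockers;

/* Returns 0 on success; on failure sets *errp and returns -errno. */
static int demo_add_blocker(const Error *reason, Error **errp)
{
    for (size_t i = 0; i < demo_nr_blockers; i++) {
        if (demo_blockers[i] == reason) {
            demo_error_setg(errp, "migration blocker already added");
            return -EEXIST;
        }
    }
    if (demo_nr_blockers == DEMO_MAX_BLOCKERS) {
        demo_error_setg(errp, "too many migration blockers");
        return -ENOSPC;
    }
    demo_blockers[demo_nr_blockers++] = reason;
    return 0;
}
```

The point of the sketch is only that both channels agree: a caller that
checks the return value sees -EEXIST, and a caller that checks the error
object sees a message.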

>> > +            }
>> >              migration_blockers[mode] = g_slist_prepend(migration_blockers[mode],
>> >                                                         *reasonp);
>> >          }
>> 
>> * Looks okay.
>> Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
>> 
>> Thank you.
>> ---
>>   - Prasad
>> 
>> 
>
> With regards,
> Daniel




* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-25  9:07     ` Daniel P. Berrangé
  2026-02-25  9:32       ` Markus Armbruster
@ 2026-02-25  9:34       ` Ani Sinha
  2026-02-25  9:41         ` Daniel P. Berrangé
  1 sibling, 1 reply; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  9:34 UTC (permalink / raw)
  To: Daniel Berrange
  Cc: Prasad Pandit, Peter Xu, Fabiano Rosas, Gerd Hoffmann,
	Paolo Bonzini, Ani Sinha, Prasad Pandit, qemu-devel



> On 25 Feb 2026, at 2:37 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> On Wed, Feb 25, 2026 at 11:35:15AM +0530, Prasad Pandit wrote:
>> On Wed, 25 Feb 2026 at 09:23, Ani Sinha <anisinha@redhat.com> wrote:
>>> Currently the code that adds a migration blocker does not check if the same
>>> blocker already exists. Return an EEXIST error code if there is an attempt to
>>> add the same migration blocker again. This way the same migration blocker will
>>> not get added twice.
>>> 
>>> Suggested-by: Prasad Pandit <pjp@fedoraproject.org>
>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>> ---
>>> migration/migration.c | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>> 
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index a5b0465ed3..1eb75fb7fb 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -1702,6 +1702,10 @@ static int add_blockers(Error **reasonp, unsigned modes, Error **errp)
> 
> This method has an  '**errp' parameter.....
> 
>>> {
>>>     for (MigMode mode = 0; mode < MIG_MODE__MAX; mode++) {
>>>         if (modes & BIT(mode)) {
>>> +            if (g_slist_index(migration_blockers[mode],
>>> +                                 *reasonp) >= 0) {
>>> +                return -EEXIST;
> 
> .... so using  -errno for return values is not appropriate - it must
> set 'errp' and return -1.

I see functions migrate_add_blocker_modes() or migrate_add_blocker_internal()
that do not do this. This is why I thought it was ok to just return -errno here.


> 
>>> +            }
>>>             migration_blockers[mode] = g_slist_prepend(migration_blockers[mode],
>>>                                                        *reasonp);
>>>         }
>> 
>> * Looks okay.
>> Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
>> 
>> Thank you.
>> ---
>>  - Prasad
>> 
>> 
> 
> With regards,
> Daniel





* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-25  9:34       ` Ani Sinha
@ 2026-02-25  9:41         ` Daniel P. Berrangé
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel P. Berrangé @ 2026-02-25  9:41 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Prasad Pandit, Peter Xu, Fabiano Rosas, Gerd Hoffmann,
	Paolo Bonzini, Ani Sinha, Prasad Pandit, qemu-devel

On Wed, Feb 25, 2026 at 03:04:27PM +0530, Ani Sinha wrote:
> 
> 
> > On 25 Feb 2026, at 2:37 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > 
> > On Wed, Feb 25, 2026 at 11:35:15AM +0530, Prasad Pandit wrote:
> >> On Wed, 25 Feb 2026 at 09:23, Ani Sinha <anisinha@redhat.com> wrote:
> >>> Currently the code that adds a migration blocker does not check if the same
> >>> blocker already exists. Return an EEXIST error code if there is an attempt to
> >>> add the same migration blocker again. This way the same migration blocker will
> >>> not get added twice.
> >>> 
> >>> Suggested-by: Prasad Pandit <pjp@fedoraproject.org>
> >>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> >>> ---
> >>> migration/migration.c | 4 ++++
> >>> 1 file changed, 4 insertions(+)
> >>> 
> >>> diff --git a/migration/migration.c b/migration/migration.c
> >>> index a5b0465ed3..1eb75fb7fb 100644
> >>> --- a/migration/migration.c
> >>> +++ b/migration/migration.c
> >>> @@ -1702,6 +1702,10 @@ static int add_blockers(Error **reasonp, unsigned modes, Error **errp)
> > 
> > This method has an  '**errp' parameter.....
> > 
> >>> {
> >>>     for (MigMode mode = 0; mode < MIG_MODE__MAX; mode++) {
> >>>         if (modes & BIT(mode)) {
> >>> +            if (g_slist_index(migration_blockers[mode],
> >>> +                                 *reasonp) >= 0) {
> >>> +                return -EEXIST;
> > 
> > .... so using  -errno for return values is not appropriate - it must
> > set 'errp' and return -1.
> 
> I see functions migrate_add_blocker_modes() or migrate_add_blocker_internal()
> that do not do this. This is why I thought it was ok to just return -errno here.

Yes they all set 'errp', but indirectly:

    if (is_only_migratable(reasonp, modes, errp)) {
        return -EACCES;
    } else if (is_busy(reasonp, errp)) {
        return -EBUSY;
    }

In this case 'is_busy' and 'is_only_migratable' will be setting 'errp'.
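
The indirect pattern described here can be sketched as follows: a predicate
helper both reports the condition and fills in the error object, so the
caller only has to translate the result into an errno value. All names below
(`demo_is_busy()`, `demo_add_blocker_checked()`, the `Error` struct) are
hypothetical stand-ins for illustration, not the real QEMU functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for QEMU's Error object. */
typedef struct Error {
    const char *msg;
} Error;

static void demo_error_set(Error **errp, const char *msg)
{
    static Error storage;   /* one static slot is enough for a sketch */

    if (errp && !*errp) {
        storage.msg = msg;
        *errp = &storage;
    }
}

/* Predicate in the style of is_busy(): returns true and sets *errp. */
static bool demo_is_busy(bool migration_running, Error **errp)
{
    if (migration_running) {
        demo_error_set(errp, "cannot add blocker: migration in progress");
        return true;
    }
    return false;
}

/* The caller never touches errp itself; the helper has already set it. */
static int demo_add_blocker_checked(bool migration_running, Error **errp)
{
    if (demo_is_busy(migration_running, errp)) {
        return -EBUSY;
    }
    return 0;
}
```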


With regards,
Daniel




* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-25  9:32       ` Markus Armbruster
@ 2026-02-25  9:45         ` Ani Sinha
  2026-02-25 10:04         ` Daniel P. Berrangé
  1 sibling, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-25  9:45 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Daniel Berrange, Prasad Pandit, Peter Xu, Fabiano Rosas,
	Gerd Hoffmann, Paolo Bonzini, Ani Sinha, Prasad Pandit,
	qemu-devel



> On 25 Feb 2026, at 3:02 PM, Markus Armbruster <armbru@redhat.com> wrote:
> 
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
>> On Wed, Feb 25, 2026 at 11:35:15AM +0530, Prasad Pandit wrote:
>>> On Wed, 25 Feb 2026 at 09:23, Ani Sinha <anisinha@redhat.com> wrote:
>>>> Currently the code that adds a migration blocker does not check if the same
>>>> blocker already exists. Return an EEXIST error code if there is an attempt to
>>>> add the same migration blocker again. This way the same migration blocker will
>>>> not get added twice.
>>>> 
>>>> Suggested-by: Prasad Pandit <pjp@fedoraproject.org>
>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>> ---
>>>> migration/migration.c | 4 ++++
>>>> 1 file changed, 4 insertions(+)
>>>> 
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index a5b0465ed3..1eb75fb7fb 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -1702,6 +1702,10 @@ static int add_blockers(Error **reasonp, unsigned modes, Error **errp)
>> 
>> This method has an  '**errp' parameter.....
>> 
>>>> {
>>>>     for (MigMode mode = 0; mode < MIG_MODE__MAX; mode++) {
>>>>         if (modes & BIT(mode)) {
>>>> +            if (g_slist_index(migration_blockers[mode],
>>>> +                                 *reasonp) >= 0) {
>>>> +                return -EEXIST;
>> 
>> .... so using  -errno for return values is not appropriate - it must
>> set 'errp' and return -1.
> 
> Yes, it must set @errp on failure.  Returning -1 would be wrong,

Ok so just

diff --git a/migration/migration.c b/migration/migration.c
index 1eb75fb7fb..dc75d2eeb6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1704,6 +1704,8 @@ static int add_blockers(Error **reasonp, unsigned modes, Error **errp)
         if (modes & BIT(mode)) {
             if (g_slist_index(migration_blockers[mode],
                                  *reasonp) >= 0) {
+                error_propagate_prepend(errp, *reasonp,
+                                        "migration blocker already added ");
                 return -EEXIST;
             }
             migration_blockers[mode] = g_slist_prepend(migration_blockers[mode],


> because:
> 
>    int migrate_add_blocker_modes(Error **reasonp, unsigned modes, Error **errp)
>    {
>        if (is_only_migratable(reasonp, modes, errp)) {
>            return -EACCES;
>        } else if (is_busy(reasonp, errp)) {
>            return -EBUSY;
>        }
>        return add_blockers(reasonp, modes, errp);
>    }
> 
> If callers don't actually care for error codes, the entire nest of
> functions could be simplified to return 0/-1 or true/false.
> 
>>>> +            }
>>>>             migration_blockers[mode] = g_slist_prepend(migration_blockers[mode],
>>>>                                                        *reasonp);
>>>>         }
>>> 
>>> * Looks okay.
>>> Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
>>> 
>>> Thank you.
>>> ---
>>>  - Prasad
>>> 
>>> 
>> 
>> With regards,
>> Daniel





* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-25  9:32       ` Markus Armbruster
  2026-02-25  9:45         ` Ani Sinha
@ 2026-02-25 10:04         ` Daniel P. Berrangé
  1 sibling, 0 replies; 55+ messages in thread
From: Daniel P. Berrangé @ 2026-02-25 10:04 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Prasad Pandit, Ani Sinha, Peter Xu, Fabiano Rosas, kraxel,
	pbonzini, ani, Prasad Pandit, qemu-devel

On Wed, Feb 25, 2026 at 10:32:51AM +0100, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Wed, Feb 25, 2026 at 11:35:15AM +0530, Prasad Pandit wrote:
> >> On Wed, 25 Feb 2026 at 09:23, Ani Sinha <anisinha@redhat.com> wrote:
> >> > Currently the code that adds a migration blocker does not check if the same
> >> > blocker already exists. Return an EEXIST error code if there is an attempt to
> >> > add the same migration blocker again. This way the same migration blocker will
> >> > not get added twice.
> >> >
> >> > Suggested-by: Prasad Pandit <pjp@fedoraproject.org>
> >> > Signed-off-by: Ani Sinha <anisinha@redhat.com>
> >> > ---
> >> >  migration/migration.c | 4 ++++
> >> >  1 file changed, 4 insertions(+)
> >> >
> >> > diff --git a/migration/migration.c b/migration/migration.c
> >> > index a5b0465ed3..1eb75fb7fb 100644
> >> > --- a/migration/migration.c
> >> > +++ b/migration/migration.c
> >> > @@ -1702,6 +1702,10 @@ static int add_blockers(Error **reasonp, unsigned modes, Error **errp)
> >
> > This method has an  '**errp' parameter.....
> >
> >> >  {
> >> >      for (MigMode mode = 0; mode < MIG_MODE__MAX; mode++) {
> >> >          if (modes & BIT(mode)) {
> >> > +            if (g_slist_index(migration_blockers[mode],
> >> > +                                 *reasonp) >= 0) {
> >> > +                return -EEXIST;
> >
> > .... so using  -errno for return values is not appropriate - it must
> > set 'errp' and return -1.
> 
> Yes, it must set @errp on failure.  Returning -1 would be wrong,
> because:
> 
>     int migrate_add_blocker_modes(Error **reasonp, unsigned modes, Error **errp)
>     {
>         if (is_only_migratable(reasonp, modes, errp)) {
>             return -EACCES;
>         } else if (is_busy(reasonp, errp)) {
>             return -EBUSY;
>         }
>         return add_blockers(reasonp, modes, errp);
>     }
> 
> If callers don't actually care for error codes, the entire nest of
> functions could be simplified to return 0/-1 or true/false.

AFAICT there is only one caller that cares:

spapr_mce_req_event tries to block migration while handling machine
check exceptions, and if seeing EBUSY it will report a warning
message.  I'm not convinced it should be checking for EBUSY at all.

If -only-migratable is set, it seems appropriate that it report
the same warning message, as it is unable to block the migration.


With regards,
Daniel




* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-25  3:49 ` [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker Ani Sinha
  2026-02-25  6:05   ` Prasad Pandit
@ 2026-02-25 17:29   ` Peter Xu
  2026-02-26  3:46     ` Ani Sinha
  1 sibling, 1 reply; 55+ messages in thread
From: Peter Xu @ 2026-02-25 17:29 UTC (permalink / raw)
  To: Ani Sinha; +Cc: Fabiano Rosas, kraxel, pbonzini, ani, Prasad Pandit, qemu-devel

Hi, Ani,

On Wed, Feb 25, 2026 at 09:19:40AM +0530, Ani Sinha wrote:
> Currently the code that adds a migration blocker does not check if the same
> blocker already exists. Return an EEXIST error code if there is an attempt to
> add the same migration blocker again. This way the same migration blocker will
> not get added twice.

Could you help explain why it will inject two identical errors in the first
place, and why the caller cannot make sure it won't be injected twice?

Thanks,

> 
> Suggested-by: Prasad Pandit <pjp@fedoraproject.org>
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> ---
>  migration/migration.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index a5b0465ed3..1eb75fb7fb 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1702,6 +1702,10 @@ static int add_blockers(Error **reasonp, unsigned modes, Error **errp)
>  {
>      for (MigMode mode = 0; mode < MIG_MODE__MAX; mode++) {
>          if (modes & BIT(mode)) {
> +            if (g_slist_index(migration_blockers[mode],
> +                                 *reasonp) >= 0) {
> +                return -EEXIST;
> +            }
>              migration_blockers[mode] = g_slist_prepend(migration_blockers[mode],
>                                                         *reasonp);
>          }
> -- 
> 2.42.0
> 

-- 
Peter Xu




* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-25 17:29   ` Peter Xu
@ 2026-02-26  3:46     ` Ani Sinha
  2026-02-26 13:08       ` Peter Xu
  2026-02-26 17:23       ` Paolo Bonzini
  0 siblings, 2 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-26  3:46 UTC (permalink / raw)
  To: Peter Xu
  Cc: Fabiano Rosas, Gerd Hoffmann, Paolo Bonzini, Ani Sinha,
	Prasad Pandit, qemu-devel



> On 25 Feb 2026, at 10:59 PM, Peter Xu <peterx@redhat.com> wrote:
> 
> Hi, Ani,
> 
> On Wed, Feb 25, 2026 at 09:19:40AM +0530, Ani Sinha wrote:
>> Currently the code that adds a migration blocker does not check if the same
>> blocker already exists. Return an EEXIST error code if there is an attempt to
>> add the same migration blocker again. This way the same migration blocker will
>> not get added twice.
> 
> Could you help explain why it will inject two identical errors in the first
> place, and why the caller cannot make sure it won't be injected twice?

Likely due to a bug in the code. For example, if the init function that
adds the blocker is called again and the caller does not handle the
second init call properly.  This came up as part of the coco reset work,
where migration blockers are added in init methods. They need not be
added again when the init methods are called again during the reset
process. The caller can handle it, of course, but adding a check further
down the call stack makes things more robust.


> 
> Thanks,
> 
>> 
>> Suggested-by: Prasad Pandit <pjp@fedoraproject.org>
>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>> ---
>> migration/migration.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index a5b0465ed3..1eb75fb7fb 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1702,6 +1702,10 @@ static int add_blockers(Error **reasonp, unsigned modes, Error **errp)
>> {
>>     for (MigMode mode = 0; mode < MIG_MODE__MAX; mode++) {
>>         if (modes & BIT(mode)) {
>> +            if (g_slist_index(migration_blockers[mode],
>> +                                 *reasonp) >= 0) {
>> +                return -EEXIST;
>> +            }
>>             migration_blockers[mode] = g_slist_prepend(migration_blockers[mode],
>>                                                        *reasonp);
>>         }
>> -- 
>> 2.42.0
>> 
> 
> -- 
> Peter Xu





* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-26  3:46     ` Ani Sinha
@ 2026-02-26 13:08       ` Peter Xu
  2026-02-26 16:14         ` Ani Sinha
  2026-02-26 17:23       ` Paolo Bonzini
  1 sibling, 1 reply; 55+ messages in thread
From: Peter Xu @ 2026-02-26 13:08 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Fabiano Rosas, Gerd Hoffmann, Paolo Bonzini, Ani Sinha,
	Prasad Pandit, qemu-devel

On Thu, Feb 26, 2026 at 09:16:43AM +0530, Ani Sinha wrote:
> 
> 
> > On 25 Feb 2026, at 10:59 PM, Peter Xu <peterx@redhat.com> wrote:
> > 
> > Hi, Ani,
> > 
> > On Wed, Feb 25, 2026 at 09:19:40AM +0530, Ani Sinha wrote:
> >> Currently the code that adds a migration blocker does not check if the same
> >> blocker already exists. Return an EEXIST error code if there is an attempt to
> >> add the same migration blocker again. This way the same migration blocker will
> >> not get added twice.
> > 
> > Could you help explain why it will inject two identical errors in the first
> > place, and why the caller cannot make sure it won't be injected twice?
> 
> Likely due to a bug in the code. For example if the init function that
> adds the blocker is called again and the caller does not handle the
> second init call properly.  This came up as a part of the coco reset work
> where migration blockers are added in init methods. They need not be
> added again when init methods are again called during the reset
> process. The caller can handle it of course but adding a check further
> down the call stack makes things more robust.

IMHO if we want to make it more robust, we shouldn't return an error
because the caller may not always check for errors.

Would an assertion suit better here?  Because migration blockers are not
something the user can manipulate, so it's a solid bug to fix when
triggered.
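
A minimal sketch of the assertion variant suggested above, assuming a
duplicate blocker can only come from a programming error. The fixed-size
array and the `demo_*` names are hypothetical and only stand in for the
real per-mode GSList of blockers.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_BLOCKERS 8
static const void *demo_blockers[DEMO_MAX_BLOCKERS];
static size_t demo_nr_blockers;

/* Returns the position of @reason in the list, or -1 if it is absent. */
static int demo_blocker_index(const void *reason)
{
    for (size_t i = 0; i < demo_nr_blockers; i++) {
        if (demo_blockers[i] == reason) {
            return (int)i;
        }
    }
    return -1;
}

static void demo_add_blocker_asserting(const void *reason)
{
    /* A duplicate is treated as a bug in the caller, not a runtime error. */
    assert(demo_blocker_index(reason) < 0);
    assert(demo_nr_blockers < DEMO_MAX_BLOCKERS);
    demo_blockers[demo_nr_blockers++] = reason;
}
```

With this shape there is no error code for a caller to forget to check.
Note that in this plain C sketch the checks disappear if NDEBUG is defined.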

Thanks,

-- 
Peter Xu




* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-26 13:08       ` Peter Xu
@ 2026-02-26 16:14         ` Ani Sinha
  2026-03-02 11:28           ` Prasad Pandit
  0 siblings, 1 reply; 55+ messages in thread
From: Ani Sinha @ 2026-02-26 16:14 UTC (permalink / raw)
  To: Peter Xu
  Cc: Fabiano Rosas, Gerd Hoffmann, Paolo Bonzini, Ani Sinha,
	Prasad Pandit, qemu-devel



> On 26 Feb 2026, at 6:38 PM, Peter Xu <peterx@redhat.com> wrote:
> 
> On Thu, Feb 26, 2026 at 09:16:43AM +0530, Ani Sinha wrote:
>> 
>> 
>>> On 25 Feb 2026, at 10:59 PM, Peter Xu <peterx@redhat.com> wrote:
>>> 
>>> Hi, Ani,
>>> 
>>> On Wed, Feb 25, 2026 at 09:19:40AM +0530, Ani Sinha wrote:
>>>> Currently the code that adds a migration blocker does not check if the same
>>>> blocker already exists. Return an EEXIST error code if there is an attempt to
>>>> add the same migration blocker again. This way the same migration blocker will
>>>> not get added twice.
>>> 
>>> Could you help explain why it will inject two identical errors in the first
>>> place, and why the caller cannot make sure it won't be injected twice?
>> 
>> Likely due to a bug in the code. For example if the init function that
>> adds the blocker is called again and the caller does not handle the
>> second init call properly.  This came up as a part of the coco reset work
>> where migration blockers are added in init methods. They need not be
>> added again when init methods are again called during the reset
>> process. The caller can handle it of course but adding a check further
>> down the call stack makes things more robust.
> 
> IMHO if we want to make it more robust, we shouldn't return an error
> because the caller may not always check for errors.
> 
> Would an assertion suit better here?  Because migration blockers are not
> something the user can manipulate, so it's a solid bug to fix when
> triggered.

If Prasad agrees, I will send something.

> 
> Thanks,
> 
> -- 
> Peter Xu





* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-26  3:46     ` Ani Sinha
  2026-02-26 13:08       ` Peter Xu
@ 2026-02-26 17:23       ` Paolo Bonzini
  2026-02-27  3:19         ` Ani Sinha
  1 sibling, 1 reply; 55+ messages in thread
From: Paolo Bonzini @ 2026-02-26 17:23 UTC (permalink / raw)
  To: Ani Sinha, Peter Xu
  Cc: Fabiano Rosas, Gerd Hoffmann, Ani Sinha, Prasad Pandit,
	qemu-devel

On 2/26/26 04:46, Ani Sinha wrote:
> 
> 
>> On 25 Feb 2026, at 10:59 PM, Peter Xu <peterx@redhat.com> wrote:
>> 
>> Hi, Ani,
>> 
>> On Wed, Feb 25, 2026 at 09:19:40AM +0530, Ani Sinha wrote:
>>> Currently the code that adds a migration blocker does not check
>>> if the same blocker already exists. Return an EEXIST error code
>>> if there is an attempt to add the same migration blocker again.
>>> This way the same migration blocker will not get added twice.
>> 
>> Could you help explain why it will inject two identical errors in
>> the first place, and why the caller cannot make sure it won't be
>> injected twice?
> 
> Likely due to a bug in the code. For example if the init function
> that adds the blocker is called again and the caller does not handle
> the second init call properly. This came up as a part of the coco
> reset work where migration blockers are added in init methods. They
> need not be added again when init methods are again called during
> the reset process. The caller can handle it of course but adding a
> check further down the call stack makes things more robust.

Since this is the last patch, is it okay to remove it at least for now?
Can the situation actually happen?

If not and it's a programming error, even an abort() is okay.

Paolo




* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-26 17:23       ` Paolo Bonzini
@ 2026-02-27  3:19         ` Ani Sinha
  0 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-27  3:19 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Xu, Fabiano Rosas, Gerd Hoffmann, Ani Sinha, Prasad Pandit,
	qemu-devel



> On 26 Feb 2026, at 10:53 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 2/26/26 04:46, Ani Sinha wrote:
>>> On 25 Feb 2026, at 10:59 PM, Peter Xu <peterx@redhat.com> wrote:
>>> Hi, Ani,
>>> On Wed, Feb 25, 2026 at 09:19:40AM +0530, Ani Sinha wrote:
>>>> Currently the code that adds a migration blocker does not check
>>>> if the same blocker already exists. Return an EEXIST error code
>>>> if there is an attempt to add the same migration blocker again.
>>>> This way the same migration blocker will not get added twice.
>>> Could you help explain why it will inject two identical errors in
>>> the first place, and why the caller cannot make sure it won't be
>>> injected twice?
>> Likely due to a bug in the code. For example if the init function
>> that adds the blocker is called again and the caller does not handle
>> the second init call properly. This came up as a part of the coco
>> reset work where migration blockers are added in init methods. They
>> need not be added again when init methods are again called during
>> the reset process. The caller can handle it of course but adding a
>> check further down the call stack makes things more robust.
> Since this is the last patch, is it okay to remove it at least for now? Can the situation actually happen?
> 

Yes, you can drop this patch for now. Once we reach consensus on whether and what needs to be done, I can send another fix.

> If not and it's a programming error, even an abort() is okay.




* Re: [PATCH v6 23/35] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
  2026-02-25  3:49 ` [PATCH v6 23/35] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors Ani Sinha
@ 2026-02-27  7:02   ` Cédric Le Goater
  2026-02-27  7:28     ` Ani Sinha
  0 siblings, 1 reply; 55+ messages in thread
From: Cédric Le Goater @ 2026-02-27  7:02 UTC (permalink / raw)
  To: Ani Sinha, Alex Williamson; +Cc: kraxel, pbonzini, ani, qemu-devel

On 2/25/26 04:49, Ani Sinha wrote:
> Normally the vfio pseudo device file descriptor lives for the life of the VM.
> However, when the kvm VM file descriptor changes, a new file descriptor
> for the pseudo device needs to be generated against the new kvm VM descriptor.
> Other existing vfio descriptors needs to be reattached to the new pseudo device
> descriptor. This change performs the above steps.
> 
> Tested-by: Cédric Le Goater <clg@redhat.com>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> ---
>   hw/vfio/helpers.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 92 insertions(+)
> 
> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> index f68f8165d0..e2bedd15ec 100644
> --- a/hw/vfio/helpers.c
> +++ b/hw/vfio/helpers.c
> @@ -116,6 +116,89 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
>    * we'll re-use it should another vfio device be attached before then.
>    */
>   int vfio_kvm_device_fd = -1;
> +
> +/*
> + * Confidential virtual machines:
> + * During reset of confidential vms, the kvm vm file descriptor changes.
> + * In this case, the old vfio kvm file descriptor is
> + * closed and a new descriptor is created against the new kvm vm file
> + * descriptor.
> + */
> +
> +typedef struct VFIODeviceFd {
> +    int fd;
> +    QLIST_ENTRY(VFIODeviceFd) node;
> +} VFIODeviceFd;
> +
> +static QLIST_HEAD(, VFIODeviceFd) vfio_device_fds =
> +    QLIST_HEAD_INITIALIZER(vfio_device_fds);
> +
> +static void vfio_device_fd_list_add(int fd)
> +{
> +    VFIODeviceFd *file_fd;
> +    file_fd = g_malloc0(sizeof(*file_fd));
> +    file_fd->fd = fd;
> +    QLIST_INSERT_HEAD(&vfio_device_fds, file_fd, node);
> +}
> +
> +static void vfio_device_fd_list_remove(int fd)
> +{
> +    VFIODeviceFd *file_fd, *next;
> +
> +    QLIST_FOREACH_SAFE(file_fd, &vfio_device_fds, node, next) {
> +        if (file_fd->fd == fd) {
> +            QLIST_REMOVE(file_fd, node);
> +            g_free(file_fd);
> +            break;
> +        }
> +    }
> +}
> +
> +static int vfio_device_fd_rebind(NotifierWithReturn *notifier, void *data,
> +                                  Error **errp)
> +{
> +    VFIODeviceFd *file_fd;
> +    int ret = 0;
> +    struct kvm_device_attr attr = {
> +        .group = KVM_DEV_VFIO_FILE,
> +        .attr = KVM_DEV_VFIO_FILE_ADD,
> +    };
> +    struct kvm_create_device cd = {
> +        .type = KVM_DEV_TYPE_VFIO,
> +    };
> +
> +    /* we are not interested in pre vmfd change notification */
> +    if (((VmfdChangeNotifier *)data)->pre) {
> +        return 0;
> +    }
> +
> +    if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
> +        error_setg_errno(errp, errno, "Failed to create KVM VFIO device");
> +        return -errno;
> +    }
> +
> +    if (vfio_kvm_device_fd != -1) {
> +        close(vfio_kvm_device_fd);
> +    }
> +
> +    vfio_kvm_device_fd = cd.fd;
> +
> +    QLIST_FOREACH(file_fd, &vfio_device_fds, node) {
> +        attr.addr = (uint64_t)(unsigned long)&file_fd->fd;
> +        if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
> +            error_setg_errno(errp, errno,
> +                             "Failed to add fd %d to KVM VFIO device",
> +                             file_fd->fd);
> +            ret = -errno;

In case you resend, maybe change this part to return -errno at the first
error. That seems best.

> +        }
> +    }
> +    return ret;

And return 0 here. Which means 'ret' is now unused.

Thanks,

C.


> +}
> +
> +static struct NotifierWithReturn vfio_vmfd_change_notifier = {
> +    .notify = vfio_device_fd_rebind,
> +};
> +
>   #endif
>   
>   void vfio_kvm_device_close(void)
> @@ -153,6 +236,11 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
>           }
>   
>           vfio_kvm_device_fd = cd.fd;
> +        /*
> +         * If the vm file descriptor changes, add a notifier so that we can
> +         * re-create the vfio_kvm_device_fd.
> +         */
> +        kvm_vmfd_add_change_notifier(&vfio_vmfd_change_notifier);
>       }
>   
>       if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
> @@ -160,6 +248,8 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
>                            fd);
>           return -errno;
>       }
> +
> +    vfio_device_fd_list_add(fd);
>   #endif
>       return 0;
>   }
> @@ -183,6 +273,8 @@ int vfio_kvm_device_del_fd(int fd, Error **errp)
>                            "Failed to remove fd %d from KVM VFIO device", fd);
>           return -errno;
>       }
> +
> +    vfio_device_fd_list_remove(fd);
>   #endif
>       return 0;
>   }
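
The control flow suggested in the review comments above (return -errno at
the first failure, return 0 after the loop, no `ret` variable) could be
sketched like this. The `demo_attach_fn` callback is a hypothetical stand-in
for the KVM_SET_DEVICE_ATTR ioctl, and the array replaces the QLIST of
device fds; none of these names come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback standing in for the KVM_SET_DEVICE_ATTR ioctl. */
typedef int (*demo_attach_fn)(int fd);

/* Re-attach every fd; stop and report at the first failure. */
static int demo_rebind_all(const int *fds, size_t nr_fds,
                           demo_attach_fn attach)
{
    for (size_t i = 0; i < nr_fds; i++) {
        if (attach(fds[i]) < 0) {
            /* attach() failed and set errno, like a failing ioctl() */
            return -errno;
        }
    }
    return 0;
}

/* Test double: counts calls and fails with EBADF for fd 7 only. */
static int demo_attach_calls;

static int demo_attach_fail_on_7(int fd)
{
    demo_attach_calls++;
    if (fd == 7) {
        errno = EBADF;
        return -1;
    }
    return 0;
}
```

Compared with accumulating into `ret`, the early return reports the first
errno rather than the last one and does not keep attaching fds after a
failure.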




* Re: [PATCH v6 23/35] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
  2026-02-27  7:02   ` Cédric Le Goater
@ 2026-02-27  7:28     ` Ani Sinha
  0 siblings, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-02-27  7:28 UTC (permalink / raw)
  To: Cedric Le Goater
  Cc: Alex Williamson, Gerd Hoffmann, Paolo Bonzini, Ani Sinha,
	qemu-devel



> On 27 Feb 2026, at 12:32 PM, Cédric Le Goater <clg@redhat.com> wrote:
> 
> On 2/25/26 04:49, Ani Sinha wrote:
>> Normally the vfio pseudo device file descriptor lives for the life of the VM.
>> However, when the kvm VM file descriptor changes, a new file descriptor
>> for the pseudo device needs to be generated against the new kvm VM descriptor.
>> Other existing vfio descriptors needs to be reattached to the new pseudo device
>> descriptor. This change performs the above steps.
>> Tested-by: Cédric Le Goater <clg@redhat.com>
>> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>> ---
>>  hw/vfio/helpers.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 92 insertions(+)
>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>> index f68f8165d0..e2bedd15ec 100644
>> --- a/hw/vfio/helpers.c
>> +++ b/hw/vfio/helpers.c
>> @@ -116,6 +116,89 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
>>   * we'll re-use it should another vfio device be attached before then.
>>   */
>>  int vfio_kvm_device_fd = -1;
>> +
>> +/*
>> + * Confidential virtual machines:
>> + * During reset of confidential vms, the kvm vm file descriptor changes.
>> + * In this case, the old vfio kvm file descriptor is
>> + * closed and a new descriptor is created against the new kvm vm file
>> + * descriptor.
>> + */
>> +
>> +typedef struct VFIODeviceFd {
>> +    int fd;
>> +    QLIST_ENTRY(VFIODeviceFd) node;
>> +} VFIODeviceFd;
>> +
>> +static QLIST_HEAD(, VFIODeviceFd) vfio_device_fds =
>> +    QLIST_HEAD_INITIALIZER(vfio_device_fds);
>> +
>> +static void vfio_device_fd_list_add(int fd)
>> +{
>> +    VFIODeviceFd *file_fd;
>> +    file_fd = g_malloc0(sizeof(*file_fd));
>> +    file_fd->fd = fd;
>> +    QLIST_INSERT_HEAD(&vfio_device_fds, file_fd, node);
>> +}
>> +
>> +static void vfio_device_fd_list_remove(int fd)
>> +{
>> +    VFIODeviceFd *file_fd, *next;
>> +
>> +    QLIST_FOREACH_SAFE(file_fd, &vfio_device_fds, node, next) {
>> +        if (file_fd->fd == fd) {
>> +            QLIST_REMOVE(file_fd, node);
>> +            g_free(file_fd);
>> +            break;
>> +        }
>> +    }
>> +}
>> +
>> +static int vfio_device_fd_rebind(NotifierWithReturn *notifier, void *data,
>> +                                  Error **errp)
>> +{
>> +    VFIODeviceFd *file_fd;
>> +    int ret = 0;
>> +    struct kvm_device_attr attr = {
>> +        .group = KVM_DEV_VFIO_FILE,
>> +        .attr = KVM_DEV_VFIO_FILE_ADD,
>> +    };
>> +    struct kvm_create_device cd = {
>> +        .type = KVM_DEV_TYPE_VFIO,
>> +    };
>> +
>> +    /* we are not interested in pre vmfd change notification */
>> +    if (((VmfdChangeNotifier *)data)->pre) {
>> +        return 0;
>> +    }
>> +
>> +    if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
>> +        error_setg_errno(errp, errno, "Failed to create KVM VFIO device");
>> +        return -errno;
>> +    }
>> +
>> +    if (vfio_kvm_device_fd != -1) {
>> +        close(vfio_kvm_device_fd);
>> +    }
>> +
>> +    vfio_kvm_device_fd = cd.fd;
>> +
>> +    QLIST_FOREACH(file_fd, &vfio_device_fds, node) {
>> +        attr.addr = (uint64_t)(unsigned long)&file_fd->fd;
>> +        if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
>> +            error_setg_errno(errp, errno,
>> +                             "Failed to add fd %d to KVM VFIO device",
>> +                             file_fd->fd);
>> +            ret = -errno;
> 
> In case you resend, maybe change this part to return -errno at the first
> error. That seems best.
> 
>> +        }
>> +    }
>> +    return ret;
> 
> And return 0 here, which means 'ret' is then unused.

Yes, good idea. I have just sent a v7 of this patch only, which I see you have already seen.
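
A minimal sketch of the loop shape suggested above, not the v7 patch itself. It assumes a callback in place of the real KVM_SET_DEVICE_ATTR ioctl so that it can be built on its own; rebind_all, attach_fn and attach_reject_negative are hypothetical names that do not appear in the patch. The point is only that the loop returns -errno on the first failure and returns 0 at the end, so no 'ret' accumulator is needed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the ioctl that attaches one fd to the pseudo
 * device: returns 0 on success, or -1 with errno set on failure.
 */
typedef int (*attach_fn)(int device_fd, int fd);

/* Illustration only: counts calls and rejects negative descriptors. */
static int attach_calls;

static int attach_reject_negative(int device_fd, int fd)
{
    (void)device_fd;
    attach_calls++;
    if (fd < 0) {
        errno = EBADF;
        return -1;
    }
    return 0;
}

/*
 * Re-attach every tracked fd to the new pseudo device fd.
 * Stops at the first failure and returns -errno for it; returns 0 only
 * when every fd has been attached.
 */
static int rebind_all(int device_fd, const int *fds, size_t n,
                      attach_fn attach, int *failed_fd)
{
    assert(attach != NULL);

    for (size_t i = 0; i < n; i++) {
        if (attach(device_fd, fds[i])) {
            int err = errno;    /* save before later calls can change it */

            if (failed_fd) {
                *failed_fd = fds[i];
            }
            return -err;
        }
    }
    return 0;
}
```

With this shape, at most one error is reported per call, and the fds after the failing one are left untouched.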


> 
> Thanks,
> 
> C.
> 
> 
>> +}
>> +
>> +static struct NotifierWithReturn vfio_vmfd_change_notifier = {
>> +    .notify = vfio_device_fd_rebind,
>> +};
>> +
>>  #endif
>>    void vfio_kvm_device_close(void)
>> @@ -153,6 +236,11 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
>>          }
>>            vfio_kvm_device_fd = cd.fd;
>> +        /*
>> +         * If the vm file descriptor changes, add a notifier so that we can
>> +         * re-create the vfio_kvm_device_fd.
>> +         */
>> +        kvm_vmfd_add_change_notifier(&vfio_vmfd_change_notifier);
>>      }
>>        if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
>> @@ -160,6 +248,8 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
>>                           fd);
>>          return -errno;
>>      }
>> +
>> +    vfio_device_fd_list_add(fd);
>>  #endif
>>      return 0;
>>  }
>> @@ -183,6 +273,8 @@ int vfio_kvm_device_del_fd(int fd, Error **errp)
>>                           "Failed to remove fd %d from KVM VFIO device", fd);
>>          return -errno;
>>      }
>> +
>> +    vfio_device_fd_list_remove(fd);
>>  #endif
>>      return 0;
>>  }
> 




* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-02-26 16:14         ` Ani Sinha
@ 2026-03-02 11:28           ` Prasad Pandit
  2026-03-02 20:01             ` Peter Xu
  0 siblings, 1 reply; 55+ messages in thread
From: Prasad Pandit @ 2026-03-02 11:28 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Peter Xu, Fabiano Rosas, Gerd Hoffmann, Paolo Bonzini, Ani Sinha,
	Prasad Pandit, qemu-devel

Hello all,

On Thu, 26 Feb 2026 at 21:46, Ani Sinha <anisinha@redhat.com> wrote:
> >>> On Wed, Feb 25, 2026 at 09:19:40AM +0530, Ani Sinha wrote:
> >>>> Currently the code that adds a migration blocker does not check if the same
> >>>> blocker already exists. Return an EEXIST error code if there is an attempt to
> >>>> add the same migration blocker again. This way the same migration blocker will
> >>>> not get added twice.
> >>>
> >>> Could you help explain why it will inject two identical errors in the first
> >>> place, and why the caller cannot make sure it won't be injected twice?
> >>
> >> Likely due to a bug in the code. For example if the init function that
> >> adds the blocker is called again and the caller does not handle the
> >> second init call properly.  This came up as a part of the coco reset work
> >> where migration blockers are added in init methods. They need not be
> >> added again when init methods are again called during the reset
> >> process. The caller can handle it of course but adding a check further
> >> down the call stack makes things more robust.
> >
> > IMHO if we want to make it more robust, we shouldn't return an error
> > because the caller may not always check for errors.
> >
> > Would an assertion suit this better?  Because migration blockers are not
> > something the user can manipulate, so it's a solid bug to fix when
> > triggered.
>
> If Prasad agrees, I will send something.

* In the majority of places I see constructs like:
===
        if (migrate_add_blocker(&g->migration_blocker, errp) < 0) {
OR
        ret = migrate_add_blocker(&hv_no_nonarch_cs_mig_blocker, &local_err);
        if (ret < 0) {
            error_report_err(local_err);
===

* So setting **errp and returning a negative value is consistent with
that. Aborting (assert(3)) for trying to add a duplicate blocker
object seems like a harsh punishment, considering that'll happen at
run time and the user won't be able to do much then.
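
For illustration, the error-return behaviour argued for here can be modelled in a few lines. This is a sketch under stated assumptions, not the migration code: add_blocker, blockers and MAX_BLOCKERS are hypothetical, a blocker is identified only by its address, and the Error **errp reporting is left out.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_BLOCKERS 8   /* arbitrary capacity for this model */

/* Hypothetical registry: each blocker is identified by its address. */
static const void *blockers[MAX_BLOCKERS];
static size_t n_blockers;

/*
 * Register a blocker.
 * Returns 0 on success, -EEXIST if the same blocker is already
 * registered, and -ENOSPC if the registry is full.
 */
static int add_blocker(const void *reason)
{
    for (size_t i = 0; i < n_blockers; i++) {
        if (blockers[i] == reason) {
            return -EEXIST;
        }
    }
    if (n_blockers == MAX_BLOCKERS) {
        return -ENOSPC;
    }
    blockers[n_blockers++] = reason;
    return 0;
}
```

A caller that already tests for a negative return, as in the constructs quoted above, would see the duplicate as one more failure case.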

Thank you.
---
  - Prasad




* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-03-02 11:28           ` Prasad Pandit
@ 2026-03-02 20:01             ` Peter Xu
  2026-03-03 10:51               ` Ani Sinha
  2026-03-09  4:29               ` Ani Sinha
  0 siblings, 2 replies; 55+ messages in thread
From: Peter Xu @ 2026-03-02 20:01 UTC (permalink / raw)
  To: Prasad Pandit
  Cc: Ani Sinha, Fabiano Rosas, Gerd Hoffmann, Paolo Bonzini, Ani Sinha,
	Prasad Pandit, qemu-devel

On Mon, Mar 02, 2026 at 04:58:50PM +0530, Prasad Pandit wrote:
> Hello all,
> 
> On Thu, 26 Feb 2026 at 21:46, Ani Sinha <anisinha@redhat.com> wrote:
> > >>> On Wed, Feb 25, 2026 at 09:19:40AM +0530, Ani Sinha wrote:
> > >>>> Currently the code that adds a migration blocker does not check if the same
> > >>>> blocker already exists. Return an EEXIST error code if there is an attempt to
> > >>>> add the same migration blocker again. This way the same migration blocker will
> > >>>> not get added twice.
> > >>>
> > >>> Could you help explain why it will inject two identical errors in the first
> > >>> place, and why the caller cannot make sure it won't be injected twice?
> > >>
> > >> Likely due to a bug in the code. For example if the init function that
> > >> adds the blocker is called again and the caller does not handle the
> > >> second init call properly.  This came up as a part of the coco reset work
> > >> where migration blockers are added in init methods. They need not be
> > >> added again when init methods are again called during the reset
> > >> process. The caller can handle it of course but adding a check further
> > >> down the call stack makes things more robust.
> > >
> > > IMHO if we want to make it more robust, we shouldn't return an error
> > > because the caller may not always check for errors.
> > >
> > > Would an assertion suit this better?  Because migration blockers are not
> > > something the user can manipulate, so it's a solid bug to fix when
> > > triggered.
> >
> > If Prasad agrees, I will send something.
> 
> * In the majority of places I see constructs like:
> ===
>         if (migrate_add_blocker(&g->migration_blocker, errp) < 0) {
> OR
>         ret = migrate_add_blocker(&hv_no_nonarch_cs_mig_blocker, &local_err);
>         if (ret < 0) {
>             error_report_err(local_err);
> ===
> 
> * So setting **errp and returning a negative value is consistent with
> that. Aborting (assert(3)) for trying to add a duplicate blocker
> object seems like a harsh punishment, considering that'll happen at
> run time and the user won't be able to do much then.

If it's a programming error, then it shouldn't happen at runtime.  The
current failure paths of this API are not meant for programming errors.
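
As a rough sketch of the assertion approach, assuming a toy registry rather than the real migration code (blocker_list, blocker_present and add_blocker_asserting are hypothetical names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCKER_SLOTS 8   /* arbitrary capacity for this model */

/* Hypothetical registry: each blocker is identified by its address. */
static const void *blocker_list[BLOCKER_SLOTS];
static size_t blocker_count;

/* Report whether this exact blocker is already registered. */
static bool blocker_present(const void *reason)
{
    for (size_t i = 0; i < blocker_count; i++) {
        if (blocker_list[i] == reason) {
            return true;
        }
    }
    return false;
}

/*
 * Register a blocker. Adding the same blocker twice is treated as a
 * programming error, so it trips an assertion instead of returning an
 * error code that a caller might ignore.
 */
static void add_blocker_asserting(const void *reason)
{
    assert(!blocker_present(reason));
    assert(blocker_count < BLOCKER_SLOTS);
    blocker_list[blocker_count++] = reason;
}
```

Note that assert() is compiled out when NDEBUG is defined, so in such a build this check would not fire.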

If this patch isn't required, IMHO we can at least drop it in this series
and make it separate.

Thanks,

-- 
Peter Xu




* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-03-02 20:01             ` Peter Xu
@ 2026-03-03 10:51               ` Ani Sinha
  2026-03-09  4:29               ` Ani Sinha
  1 sibling, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-03-03 10:51 UTC (permalink / raw)
  To: Peter Xu
  Cc: Prasad Pandit, Fabiano Rosas, Gerd Hoffmann, Paolo Bonzini,
	Ani Sinha, Prasad Pandit, qemu-devel

On Mon, Mar 2, 2026 at 3:01 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Mar 02, 2026 at 04:58:50PM +0530, Prasad Pandit wrote:
> > Hello all,
> >
> > On Thu, 26 Feb 2026 at 21:46, Ani Sinha <anisinha@redhat.com> wrote:
> > > >>> On Wed, Feb 25, 2026 at 09:19:40AM +0530, Ani Sinha wrote:
> > > >>>> Currently the code that adds a migration blocker does not check if the same
> > > >>>> blocker already exists. Return an EEXIST error code if there is an attempt to
> > > >>>> add the same migration blocker again. This way the same migration blocker will
> > > >>>> not get added twice.
> > > >>>
> > > >>> Could you help explain why it will inject two identical errors in the first
> > > >>> place, and why the caller cannot make sure it won't be injected twice?
> > > >>
> > > >> Likely due to a bug in the code. For example if the init function that
> > > >> adds the blocker is called again and the caller does not handle the
> > > >> second init call properly.  This came up as a part of the coco reset work
> > > >> where migration blockers are added in init methods. They need not be
> > > >> added again when init methods are again called during the reset
> > > >> process. The caller can handle it of course but adding a check further
> > > >> down the call stack makes things more robust.
> > > >
> > > > IMHO if we want to make it more robust, we shouldn't return an error
> > > > because the caller may not always check for errors.
> > > >
> > > > Would an assertion suit this better?  Because migration blockers are not
> > > > something the user can manipulate, so it's a solid bug to fix when
> > > > triggered.
> > >
> > > If Prasad agrees, I will send something.
> >
> > * In the majority of places I see constructs like:
> > ===
> >         if (migrate_add_blocker(&g->migration_blocker, errp) < 0) {
> > OR
> >         ret = migrate_add_blocker(&hv_no_nonarch_cs_mig_blocker, &local_err);
> >         if (ret < 0) {
> >             error_report_err(local_err);
> > ===
> >
> > * So setting **errp and returning a negative value is consistent with
> > that. Aborting (assert(3)) for trying to add a duplicate blocker
> > object seems like a harsh punishment, considering that'll happen at
> > run time and the user won't be able to do much then.
>
> If it's a programming error, then it shouldn't happen at runtime.  The
> current failure paths of this API are not meant for programming errors.
>
> If this patch isn't required, IMHO we can at least drop it in this series
> and make it separate.
>

Yes, this patch is dropped.

> Thanks,
>
> --
> Peter Xu
>




* Re: [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker
  2026-03-02 20:01             ` Peter Xu
  2026-03-03 10:51               ` Ani Sinha
@ 2026-03-09  4:29               ` Ani Sinha
  1 sibling, 0 replies; 55+ messages in thread
From: Ani Sinha @ 2026-03-09  4:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: Prasad Pandit, Fabiano Rosas, Gerd Hoffmann, Paolo Bonzini,
	Ani Sinha, Prasad Pandit, qemu-devel



> On 3 Mar 2026, at 1:31 AM, Peter Xu <peterx@redhat.com> wrote:
> 
> On Mon, Mar 02, 2026 at 04:58:50PM +0530, Prasad Pandit wrote:
>> Hello all,
>> 
>> On Thu, 26 Feb 2026 at 21:46, Ani Sinha <anisinha@redhat.com> wrote:
>>>>>> On Wed, Feb 25, 2026 at 09:19:40AM +0530, Ani Sinha wrote:
>>>>>>> Currently the code that adds a migration blocker does not check if the same
>>>>>>> blocker already exists. Return an EEXIST error code if there is an attempt to
>>>>>>> add the same migration blocker again. This way the same migration blocker will
>>>>>>> not get added twice.
>>>>>> 
>>>>>> Could you help explain why it will inject two identical errors in the first
>>>>>> place, and why the caller cannot make sure it won't be injected twice?
>>>>> 
>>>>> Likely due to a bug in the code. For example if the init function that
>>>>> adds the blocker is called again and the caller does not handle the
>>>>> second init call properly.  This came up as a part of the coco reset work
>>>>> where migration blockers are added in init methods. They need not be
>>>>> added again when init methods are again called during the reset
>>>>> process. The caller can handle it of course but adding a check further
>>>>> down the call stack makes things more robust.
>>>> 
>>>> IMHO if we want to make it more robust, we shouldn't return an error
>>>> because the caller may not always check for errors.
>>>> 
>>>> Would an assertion suit this better?  Because migration blockers are not
>>>> something the user can manipulate, so it's a solid bug to fix when
>>>> triggered.
>>> 
>>> If Prasad agrees, I will send something.
>> 
>> * In the majority of places I see constructs like:
>> ===
>>        if (migrate_add_blocker(&g->migration_blocker, errp) < 0) {
>> OR
>>        ret = migrate_add_blocker(&hv_no_nonarch_cs_mig_blocker, &local_err);
>>        if (ret < 0) {
>>            error_report_err(local_err);
>> ===
>> 
>> * So setting **errp and returning a negative value is consistent with
>> that. Aborting (assert(3)) for trying to add a duplicate blocker
>> object seems like a harsh punishment, considering that'll happen at
>> run time and the user won't be able to do much then.
> 
> If it's a programming error, then it shouldn't happen at runtime.  The
> current failure paths of this API are not meant for programming errors.

I have sent a patch that adds an assertion:
https://mail.gnu.org/archive/html/qemu-devel/2026-03/msg02602.html


> 
> If this patch isn't required, IMHO we can at least drop it in this series
> and make it separate.
> 
> Thanks,
> 
> -- 
> Peter Xu





end of thread (~2026-03-09  4:30 UTC)

Thread overview: 55+ messages
2026-02-25  3:49 [PATCH v6 00/35] Introduce support for confidential guest reset (x86) Ani Sinha
2026-02-25  3:49 ` [PATCH v6 01/35] i386/kvm: avoid installing duplicate msr entries in msr_handlers Ani Sinha
2026-02-25  3:49 ` [PATCH v6 02/35] accel/kvm: add confidential class member to indicate guest rebuild capability Ani Sinha
2026-02-25  3:49 ` [PATCH v6 03/35] hw/accel: add a per-accelerator callback to change VM accelerator handle Ani Sinha
2026-02-25  3:49 ` [PATCH v6 04/35] system/physmem: add helper to reattach existing memory after KVM VM fd change Ani Sinha
2026-02-25  3:49 ` [PATCH v6 05/35] accel/kvm: add changes required to support KVM VM file descriptor change Ani Sinha
2026-02-25  3:49 ` [PATCH v6 06/35] accel/kvm: mark guest state as unprotected after vm " Ani Sinha
2026-02-25  3:49 ` [PATCH v6 07/35] accel/kvm: add a notifier to indicate KVM VM file descriptor has changed Ani Sinha
2026-02-25  3:49 ` [PATCH v6 08/35] accel/kvm: notify when KVM VM file fd is about to be changed Ani Sinha
2026-02-25  3:49 ` [PATCH v6 09/35] i386/kvm: unregister smram listeners prior to vm file descriptor change Ani Sinha
2026-02-25  3:49 ` [PATCH v6 10/35] kvm/i386: implement architecture support for kvm " Ani Sinha
2026-02-25  3:49 ` [PATCH v6 11/35] i386/kvm: refactor xen init into a new function Ani Sinha
2026-02-25  3:49 ` [PATCH v6 12/35] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset Ani Sinha
2026-02-25  3:49 ` [PATCH v6 13/35] hw/i386: export a new function x86_bios_rom_reload Ani Sinha
2026-02-25  3:49 ` [PATCH v6 14/35] kvm/i386: reload firmware for confidential guest reset Ani Sinha
2026-02-25  3:49 ` [PATCH v6 15/35] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset Ani Sinha
2026-02-25  3:49 ` [PATCH v6 16/35] i386/tdx: refactor TDX firmware memory initialization code into a new function Ani Sinha
2026-02-25  3:49 ` [PATCH v6 17/35] i386/tdx: finalize TDX guest state upon reset Ani Sinha
2026-02-25  3:49 ` [PATCH v6 18/35] i386/tdx: add a pre-vmfd change notifier to reset tdx state Ani Sinha
2026-02-25  3:49 ` [PATCH v6 19/35] i386/sev: add migration blockers only once Ani Sinha
2026-02-25  3:49 ` [PATCH v6 20/35] i386/sev: add notifiers " Ani Sinha
2026-02-25  3:49 ` [PATCH v6 21/35] i386/sev: free existing launch update data and kernel hashes data on init Ani Sinha
2026-02-25  3:49 ` [PATCH v6 22/35] i386/sev: add support for confidential guest reset Ani Sinha
2026-02-25  3:49 ` [PATCH v6 23/35] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors Ani Sinha
2026-02-27  7:02   ` Cédric Le Goater
2026-02-27  7:28     ` Ani Sinha
2026-02-25  3:49 ` [PATCH v6 24/35] kvm/i8254: refactor pit initialization into a helper Ani Sinha
2026-02-25  3:49 ` [PATCH v6 25/35] kvm/i8254: add support for confidential guest reset Ani Sinha
2026-02-25  3:49 ` [PATCH v6 26/35] kvm/hyperv: add synic feature to CPU only if its not enabled Ani Sinha
2026-02-25  3:49 ` [PATCH v6 27/35] hw/hyperv/vmbus: add support for confidential guest reset Ani Sinha
2026-02-25  3:49 ` [PATCH v6 28/35] kvm/xen-emu: re-initialize capabilities during " Ani Sinha
2026-02-25  3:49 ` [PATCH v6 29/35] ppc/openpic: create a new openpic device and reattach mem region on coco reset Ani Sinha
2026-02-25  3:49 ` [PATCH v6 30/35] kvm/vcpu: add notifiers to inform vcpu file descriptor change Ani Sinha
2026-02-25  3:49 ` [PATCH v6 31/35] kvm/clock: add support for confidential guest reset Ani Sinha
2026-02-25  3:49 ` [PATCH v6 32/35] hw/machine: introduce machine specific option 'x-change-vmfd-on-reset' Ani Sinha
2026-02-25  3:49 ` [PATCH v6 33/35] tests/functional/x86_64: add functional test to exercise vm fd change on reset Ani Sinha
2026-02-25  3:49 ` [PATCH v6 34/35] qom: add 'confidential-guest-reset' property for x86 confidential vms Ani Sinha
2026-02-25  3:49 ` [PATCH v6 35/35] migration: return EEXIST when trying to add the same migration blocker Ani Sinha
2026-02-25  6:05   ` Prasad Pandit
2026-02-25  9:07     ` Daniel P. Berrangé
2026-02-25  9:32       ` Markus Armbruster
2026-02-25  9:45         ` Ani Sinha
2026-02-25 10:04         ` Daniel P. Berrangé
2026-02-25  9:34       ` Ani Sinha
2026-02-25  9:41         ` Daniel P. Berrangé
2026-02-25 17:29   ` Peter Xu
2026-02-26  3:46     ` Ani Sinha
2026-02-26 13:08       ` Peter Xu
2026-02-26 16:14         ` Ani Sinha
2026-03-02 11:28           ` Prasad Pandit
2026-03-02 20:01             ` Peter Xu
2026-03-03 10:51               ` Ani Sinha
2026-03-09  4:29               ` Ani Sinha
2026-02-26 17:23       ` Paolo Bonzini
2026-02-27  3:19         ` Ani Sinha
