* [PATCH v5 00/34] Introduce support for confidential guest reset (x86)
From: Ani Sinha @ 2026-02-18 11:41 UTC
Cc: Ani Sinha, kraxel, qemu-devel, pbonzini, vkuznets, graf
This change introduces support for confidential guests
(SEV-ES, SEV-SNP and TDX) to reset/reboot just like other non-confidential
guests. Currently, a reboot initiated from within a confidential guest results
in termination of QEMU, as the CPUs are not resettable. Because the
initial state of the guest, including its private memory, is locked and encrypted,
the contents of that memory will not be accessible after reset. Hence a new
KVM VM file descriptor must be opened to create a new confidential VM context,
and the old one closed. All KVM VM-specific ioctls must be issued again. New
vCPU file descriptors must be created against the new KVM VM fd, and most vCPU
ioctls must be issued again as well.
This change closes the old KVM VM fd and creates a new one. After
the new fd is opened, all generic and architecture-specific ioctl calls
are issued again. Notifiers are added to inform subsystems that:
- The KVM VM fd is about to change, so any state syncing from KVM to QEMU
should be done now if required.
- The KVM VM fd has changed, so ioctl calls against the new fd have to be
performed again.
- New vCPU fds have been created, so vCPU ioctls must be issued again
where required.
Specific subsystems use these notifiers to re-issue ioctl calls where required.
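The ordering of the three notifications above can be illustrated with a toy model. This is a minimal sketch, not QEMU code. ToySubsystem, ResetEvent, toy_event() and toy_reset() are hypothetical names, and the "fd" is just an integer. The model records which VM fd a subsystem sees at each event, which shows that the sync-out step runs while the old fd is still current.

```c
#include <assert.h>

/* Hypothetical events mirroring the three notifications described above. */
typedef enum {
    VMFD_ABOUT_TO_CHANGE,   /* sync state out of KVM before the old fd goes away */
    VMFD_CHANGED,           /* replay VM-level ioctls on the new VM fd */
    VCPU_FDS_CHANGED,       /* replay vCPU-level ioctls on the new vCPU fds */
} ResetEvent;

/* Hypothetical subsystem that logs what it observes at each event. */
typedef struct {
    int vmfd;               /* stand-in for the current KVM VM fd */
    ResetEvent log[3];      /* events in delivery order */
    int seen_fd[3];         /* VM fd visible when each event arrived */
    int n;                  /* number of logged events */
} ToySubsystem;

static void toy_event(ToySubsystem *s, ResetEvent ev)
{
    assert(s->n < 3);
    s->log[s->n] = ev;
    s->seen_fd[s->n] = s->vmfd;
    s->n++;
}

/* Toy reset: announce the change, swap the fd, then announce the results. */
static int toy_reset(ToySubsystem *s, int new_fd)
{
    toy_event(s, VMFD_ABOUT_TO_CHANGE);
    s->vmfd = new_fd;                  /* old VM context closed, new one created */
    toy_event(s, VMFD_CHANGED);
    toy_event(s, VCPU_FDS_CHANGED);
    return s->vmfd;
}
```

In this model, a subsystem that needs to save state reads it at VMFD_ABOUT_TO_CHANGE, when seen_fd still holds the old value.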
Changes are made to SEV and TDX modules to reinitialize the confidential guest
state and seal it again. Along the way, some bug fixes are made so that some
initialization functions can be called again. Some refactoring of existing
code is done so that both init and reset paths can use them.
Tested on TDX, SEV-ES and SEV-SNP. Tested on non-coco hardware. Tested with Xen
emulation enabled.
All changes can be browsed here:
https://gitlab.com/anisinha/qemu/-/compare/master...coco-reboot-v5?from_project_id=11167699
Hoping to make v5 the final revision on this work.
CI pipeline passes:
https://gitlab.com/anisinha/qemu/-/pipelines/2329137956
The newly added functional test passes:
$ ./build/run tests/functional/x86_64/test_vmfd_change_reboot.py
TAP version 13
ok 1 test_vmfd_change_reboot.KVMGuest.test_reset_console
ok 2 test_vmfd_change_reboot.KVMGuest.test_reset_hyperv_vmbus
ok 3 test_vmfd_change_reboot.KVMGuest.test_reset_kvmpit
ok 4 test_vmfd_change_reboot.KVMGuest.test_reset_qmp
ok 5 test_vmfd_change_reboot.KVMGuest.test_reset_xen_emulation
1..5
Please review and test.
CC: qemu-devel@nongnu.org
CC: pbonzini@redhat.com
CC: kraxel@redhat.com
CC: vkuznets@redhat.com
CC: graf@amazon.com
Changelog:
v5:
- suggestions from v4 added.
- xen code refactoring separated into a new patch.
- minor fixes.
- tags added.
- rebased.
v4:
- Fixed reset on non-coco with qmp "system_reset" command.
- Numerous misc fixes.
- addressed review comments from v3.
- dropped three patches that are not required.
- Added more functional tests including one vmbus test.
- added noop callbacks to stubs/kvm removing them from arch specific headers.
- Tags added.
- Rebased.
v3:
- Combined pre and post vmfd change notifier into one.
- rename kvm_arch_vmfd_change_ops() -> kvm_arch_on_vmfd_change()
- reuse kvm_arch_init() code in kvm_arch_on_vmfd_change()
- moved migration blockers and notifiers to a more appropriate place.
- fixed Xen emulation.
- fixed SEV-ES reset.
- fixed/reorganized reset code in system/runstate.c
- can_rebuild_guest_state is now a boolean not a callback.
- misc fixes.
- added a functional test for Xen emulation with vmfd change.
- rebased.
v2:
- Bugfixes.
- Added a new machine option so that we can exercise most of the non-coco changes
related to reboot on non-coco platforms.
- Added a new functional test. Currently it's skipped in the CI pipeline as KVM is not
enabled (no /dev/kvm in the container) for QEMU CI tests. It can be run manually and it
passes on systems where KVM is enabled.
- Addressed comments from v1 regarding code refactoring and code simplification by
removal of redundant code, and moved code around
so that notifiers and migration blockers are added in only one place.
- Added some tracepoints for future debugging on newly added functions.
- Rebased.
Ani Sinha (34):
i386/kvm: avoid installing duplicate msr entries in msr_handlers
accel/kvm: add confidential class member to indicate guest rebuild
capability
hw/accel: add a per-accelerator callback to change VM accelerator
handle
system/physmem: add helper to reattach existing memory after KVM VM fd
change
accel/kvm: add changes required to support KVM VM file descriptor
change
accel/kvm: add a notifier to indicate KVM VM file descriptor has
changed
accel/kvm: notify when KVM VM file fd is about to be changed
i386/kvm: unregister smram listeners prior to vm file descriptor
change
kvm/i386: implement architecture support for kvm file descriptor
change
i386/kvm: refactor xen init into a new function
hw/i386: refactor x86_bios_rom_init for reuse in confidential guest
reset
hw/i386: export a new function x86_bios_rom_reload
kvm/i386: reload firmware for confidential guest reset
accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon
reset
i386/tdx: refactor TDX firmware memory initialization code into a new
function
i386/tdx: finalize TDX guest state upon reset
i386/tdx: add a pre-vmfd change notifier to reset tdx state
i386/sev: add migration blockers only once
i386/sev: add notifiers only once
i386/sev: free existing launch update data and kernel hashes data on
init
i386/sev: add support for confidential guest reset
hw/vfio: generate new file fd for pseudo device and rebind existing
descriptors
kvm/i8254: refactor pit initialization into a helper
kvm/i8254: add support for confidential guest reset
kvm/hyperv: add synic feature to CPU only if its not enabled
hw/hyperv/vmbus: add support for confidential guest reset
kvm/xen-emu: re-initialize capabilities during confidential guest
reset
ppc/openpic: create a new openpic device and reattach mem region on
coco reset
kvm/vcpu: add notifiers to inform vcpu file descriptor change
kvm/clock: add support for confidential guest reset
hw/machine: introduce machine specific option 'x-change-vmfd-on-reset'
tests/functional/x86_64: add functional test to exercise vm fd change
on reset
qom: add 'confidential-guest-reset' property for x86 confidential vms
migration: return EEXIST when trying to add the same migration blocker
MAINTAINERS | 7 +
accel/kvm/kvm-all.c | 360 ++++++++++++++++---
accel/kvm/trace-events | 2 +
accel/stubs/kvm-stub.c | 18 +
hw/core/machine.c | 22 ++
hw/hyperv/trace-events | 1 +
hw/hyperv/vmbus.c | 37 ++
hw/i386/kvm/clock.c | 59 +++
hw/i386/kvm/i8254.c | 91 +++--
hw/i386/kvm/trace-events | 1 +
hw/i386/x86-common.c | 71 +++-
hw/intc/openpic_kvm.c | 112 ++++--
hw/vfio/helpers.c | 92 +++++
include/accel/accel-ops.h | 2 +
include/hw/core/boards.h | 6 +
include/hw/i386/x86.h | 1 +
include/system/confidential-guest-support.h | 20 ++
include/system/kvm.h | 43 +++
include/system/physmem.h | 1 +
migration/migration.c | 4 +
qapi/qom.json | 16 +-
stubs/kvm.c | 22 ++
stubs/meson.build | 1 +
system/physmem.c | 28 ++
system/runstate.c | 44 ++-
target/i386/kvm/kvm.c | 136 +++++--
target/i386/kvm/tdx.c | 141 ++++++--
target/i386/kvm/tdx.h | 1 +
target/i386/kvm/trace-events | 4 +
target/i386/kvm/xen-emu.c | 50 ++-
target/i386/sev.c | 127 +++++--
target/i386/trace-events | 1 +
tests/functional/x86_64/meson.build | 1 +
tests/functional/x86_64/test_rebuild_vmfd.py | 136 +++++++
34 files changed, 1438 insertions(+), 220 deletions(-)
create mode 100644 stubs/kvm.c
create mode 100755 tests/functional/x86_64/test_rebuild_vmfd.py
--
2.42.0
* [PATCH v5 01/34] i386/kvm: avoid installing duplicate msr entries in msr_handlers
From: Ani Sinha @ 2026-02-18 11:41 UTC
To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, kvm, qemu-devel
kvm_filter_msr() does not check whether an MSR entry is already present in the
msr_handlers table and installs a new handler unconditionally. Calling the
function again with the same MSR results in duplicate entries in the table,
and repeated calls fill up the table needlessly. Fix that.
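The deduplication rule can be modelled outside QEMU. The sketch below is a hypothetical, simplified stand-in for kvm_filter_msr(). It reuses the slot of an already-registered MSR, otherwise takes the first free slot (msr == 0), and returns -EINVAL when the table is full. It does not model kvm_install_msr_filters() or the rollback on failure. TOY_MAX_HANDLERS, ToyMsrHandler and toy_filter_msr() are illustrative names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_HANDLERS 4   /* hypothetical table size */

typedef struct {
    uint32_t msr;            /* 0 marks a free slot, as in msr_handlers */
    int handler_id;          /* stand-in for the rdmsr/wrmsr pointers */
} ToyMsrHandler;

static ToyMsrHandler toy_handlers[TOY_MAX_HANDLERS];

/* Returns the slot index used for @msr, or -EINVAL if the table is full. */
static int toy_filter_msr(uint32_t msr, int handler_id)
{
    int i;

    for (i = 0; i < TOY_MAX_HANDLERS; i++) {
        if (toy_handlers[i].msr == msr) {
            return i;                  /* already installed: reuse the slot */
        }
        if (!toy_handlers[i].msr) {
            toy_handlers[i] = (ToyMsrHandler) {
                .msr = msr,
                .handler_id = handler_id,
            };
            return i;                  /* first free slot */
        }
    }
    return -EINVAL;                    /* no match and no free slot */
}
```

Because slots are filled front to back, a previously registered MSR is always found before the first free slot, so a repeated call never creates a second entry.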
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/kvm.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9f1a4d4cbb..6d823a7991 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -6278,27 +6278,33 @@ static int kvm_install_msr_filters(KVMState *s)
static int kvm_filter_msr(KVMState *s, uint32_t msr, QEMURDMSRHandler *rdmsr,
QEMUWRMSRHandler *wrmsr)
{
- int i, ret;
+ int i, ret = 0;
for (i = 0; i < ARRAY_SIZE(msr_handlers); i++) {
- if (!msr_handlers[i].msr) {
+ if (msr_handlers[i].msr == msr) {
+ break;
+ } else if (!msr_handlers[i].msr) {
msr_handlers[i] = (KVMMSRHandlers) {
.msr = msr,
.rdmsr = rdmsr,
.wrmsr = wrmsr,
};
+ break;
+ }
+ }
- ret = kvm_install_msr_filters(s);
- if (ret) {
- msr_handlers[i] = (KVMMSRHandlers) { };
- return ret;
- }
+ if (i == ARRAY_SIZE(msr_handlers)) {
+ ret = -EINVAL;
+ goto end;
+ }
- return 0;
- }
+ ret = kvm_install_msr_filters(s);
+ if (ret) {
+ msr_handlers[i] = (KVMMSRHandlers) { };
}
- return -EINVAL;
+ end:
+ return ret;
}
static int kvm_handle_rdmsr(X86CPU *cpu, struct kvm_run *run)
--
2.42.0
* [PATCH v5 02/34] accel/kvm: add confidential class member to indicate guest rebuild capability
From: Ani Sinha @ 2026-02-18 11:41 UTC
To: Paolo Bonzini, Marcelo Tosatti, Zhao Liu
Cc: Ani Sinha, kraxel, qemu-devel, kvm
As part of the confidential guest reset process, the existing encrypted guest
state must be made mutable since it will be discarded after reset. A new
encrypted and locked guest state must be established after the reset. To this
end, a new boolean member is added to each confidential guest support class
(e.g. tdx or sev-snp) to indicate whether it is possible to
rebuild the guest state:
bool can_rebuild_guest_state;
This is true if rebuilding guest state is possible, false otherwise.
A KVM-based confidential guest reset is only possible when
the existing state is locked but the guest state can be rebuilt.
Otherwise, the guest is not resettable.
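The decision made in qemu_system_reset_request() after this patch can be summarized as a small truth table. ToyCgs, ToyCgsClass and the toy_* functions are hypothetical stand-ins for ConfidentialGuestSupport, its class, and confidential_guest_can_rebuild_state(). This is a sketch of the logic, not the QEMU code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for ConfidentialGuestSupportClass and its instance. */
typedef struct {
    bool can_rebuild_guest_state;
} ToyCgsClass;

typedef struct {
    const ToyCgsClass *klass;
} ToyCgs;

static bool toy_can_rebuild_state(const ToyCgs *cgs)
{
    if (!cgs) {
        return true;               /* non-confidential guest */
    }
    return cgs->klass->can_rebuild_guest_state;
}

/* Terminate on reboot only if the CPUs cannot be reset AND state cannot be rebuilt. */
static bool toy_reset_terminates(bool cpus_resettable, const ToyCgs *cgs)
{
    return !cpus_resettable && !toy_can_rebuild_state(cgs);
}
```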
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
include/system/confidential-guest-support.h | 20 ++++++++++++++++++++
system/runstate.c | 6 +++---
target/i386/kvm/tdx.c | 1 +
target/i386/sev.c | 1 +
4 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/include/system/confidential-guest-support.h b/include/system/confidential-guest-support.h
index 0cc8b26e64..5dca717308 100644
--- a/include/system/confidential-guest-support.h
+++ b/include/system/confidential-guest-support.h
@@ -152,6 +152,11 @@ typedef struct ConfidentialGuestSupportClass {
*/
int (*get_mem_map_entry)(int index, ConfidentialGuestMemoryMapEntry *entry,
Error **errp);
+
+ /*
+ * is it possible to rebuild the guest state?
+ */
+ bool can_rebuild_guest_state;
} ConfidentialGuestSupportClass;
static inline int confidential_guest_kvm_init(ConfidentialGuestSupport *cgs,
@@ -167,6 +172,21 @@ static inline int confidential_guest_kvm_init(ConfidentialGuestSupport *cgs,
return 0;
}
+static inline bool
+confidential_guest_can_rebuild_state(ConfidentialGuestSupport *cgs)
+{
+ ConfidentialGuestSupportClass *klass;
+
+ if (!cgs) {
+ /* non-confidential guests */
+ return true;
+ }
+
+ klass = CONFIDENTIAL_GUEST_SUPPORT_GET_CLASS(cgs);
+ return klass->can_rebuild_guest_state;
+
+}
+
static inline int confidential_guest_kvm_reset(ConfidentialGuestSupport *cgs,
Error **errp)
{
diff --git a/system/runstate.c b/system/runstate.c
index d091a2bddd..13f32bed8c 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -57,6 +57,7 @@
#include "system/reset.h"
#include "system/runstate.h"
#include "system/runstate-action.h"
+#include "system/confidential-guest-support.h"
#include "system/system.h"
#include "system/tpm.h"
#include "trace.h"
@@ -543,8 +544,6 @@ void qemu_system_reset(ShutdownCause reason)
*/
if (cpus_are_resettable()) {
cpu_synchronize_all_post_reset();
- } else {
- assert(runstate_check(RUN_STATE_PRELAUNCH));
}
vm_set_suspended(false);
@@ -697,7 +696,8 @@ void qemu_system_reset_request(ShutdownCause reason)
if (reboot_action == REBOOT_ACTION_SHUTDOWN &&
reason != SHUTDOWN_CAUSE_SUBSYSTEM_RESET) {
shutdown_requested = reason;
- } else if (!cpus_are_resettable()) {
+ } else if (!cpus_are_resettable() &&
+ !confidential_guest_can_rebuild_state(current_machine->cgs)) {
error_report("cpus are not resettable, terminating");
shutdown_requested = reason;
} else {
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 0161985768..a3e81e1c0c 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -1543,6 +1543,7 @@ static void tdx_guest_class_init(ObjectClass *oc, const void *data)
X86ConfidentialGuestClass *x86_klass = X86_CONFIDENTIAL_GUEST_CLASS(oc);
klass->kvm_init = tdx_kvm_init;
+ klass->can_rebuild_guest_state = true;
x86_klass->kvm_type = tdx_kvm_type;
x86_klass->cpu_instance_init = tdx_cpu_instance_init;
x86_klass->adjust_cpuid_features = tdx_adjust_cpuid_features;
diff --git a/target/i386/sev.c b/target/i386/sev.c
index acdcb9c4e6..66e38ca32e 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -2760,6 +2760,7 @@ sev_common_instance_init(Object *obj)
cgs->set_guest_state = cgs_set_guest_state;
cgs->get_mem_map_entry = cgs_get_mem_map_entry;
cgs->set_guest_policy = cgs_set_guest_policy;
+ cgs->can_rebuild_guest_state = true;
QTAILQ_INIT(&sev_common->launch_vmsa);
}
--
2.42.0
* [PATCH v5 03/34] hw/accel: add a per-accelerator callback to change VM accelerator handle
From: Ani Sinha @ 2026-02-18 11:41 UTC
To: Richard Henderson, Paolo Bonzini, Philippe Mathieu-Daudé
Cc: Ani Sinha, kraxel, qemu-devel
When a confidential virtual machine is reset, a new guest context in the
accelerator must be generated after reset. Therefore, the old accelerator guest
file handle must be closed and a new one created. To this end, a per-accelerator
callback, "rebuild_guest", is introduced that is called when a confidential
guest is reset. Subsequent patches will introduce a specific implementation of
this callback for the KVM accelerator.
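Here is a minimal sketch of how an optional per-accelerator hook can be wired into a reset path, assuming a heavily reduced class structure. ToyAccelClass, ToyMachine, toy_kvm_rebuild() and toy_system_reset() are hypothetical. Unlike qemu_system_reset(), the sketch ignores the shutdown reason and reports errors through its return value instead of stopping the VM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct ToyMachine {
    int vmfd;                                /* stand-in for the accelerator handle */
} ToyMachine;

/* Hypothetical reduced AccelClass: only the optional hook matters here. */
typedef struct {
    const char *name;
    int (*rebuild_guest)(ToyMachine *ms);    /* NULL: cannot rebuild guest state */
} ToyAccelClass;

static int toy_kvm_rebuild(ToyMachine *ms)
{
    ms->vmfd++;                              /* pretend a fresh handle replaced the old one */
    return 0;
}

/* Returns 1 if state was rebuilt, 0 for an ordinary reset, <0 on error. */
static int toy_system_reset(const ToyAccelClass *ac, ToyMachine *ms,
                            int cpus_resettable)
{
    int ret;

    if (cpus_resettable) {
        return 0;                            /* normal reset path, no rebuild */
    }
    if (!ac->rebuild_guest) {
        return -EOPNOTSUPP;                  /* accelerator cannot reset this guest */
    }
    ret = ac->rebuild_guest(ms);
    return ret < 0 ? ret : 1;
}
```

Keeping the hook as a nullable function pointer means accelerators that never run confidential guests need no code changes.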
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
include/accel/accel-ops.h | 2 ++
system/runstate.c | 38 +++++++++++++++++++++++++++++++++++++-
2 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/include/accel/accel-ops.h b/include/accel/accel-ops.h
index 23a8c246e1..f46492e3fe 100644
--- a/include/accel/accel-ops.h
+++ b/include/accel/accel-ops.h
@@ -23,6 +23,8 @@ struct AccelClass {
AccelOpsClass *ops;
int (*init_machine)(AccelState *as, MachineState *ms);
+ /* used mainly by confidential guests to rebuild guest state upon reset */
+ int (*rebuild_guest)(MachineState *ms);
bool (*cpu_common_realize)(CPUState *cpu, Error **errp);
void (*cpu_common_unrealize)(CPUState *cpu);
/* get_stats: Append statistics to @buf */
diff --git a/system/runstate.c b/system/runstate.c
index 13f32bed8c..e7b50e6a3b 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -42,6 +42,7 @@
#include "qapi/qapi-commands-run-state.h"
#include "qapi/qapi-events-run-state.h"
#include "qemu/accel.h"
+#include "accel/accel-ops.h"
#include "qemu/error-report.h"
#include "qemu/job.h"
#include "qemu/log.h"
@@ -509,6 +510,9 @@ void qemu_system_reset(ShutdownCause reason)
{
MachineClass *mc;
ResetType type;
+ AccelClass *ac = ACCEL_GET_CLASS(current_accel());
+ bool guest_state_rebuilt = false;
+ int ret;
mc = current_machine ? MACHINE_GET_CLASS(current_machine) : NULL;
@@ -521,6 +525,29 @@ void qemu_system_reset(ShutdownCause reason)
default:
type = RESET_TYPE_COLD;
}
+
+ if (!cpus_are_resettable() &&
+ (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
+ reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
+ if (ac->rebuild_guest) {
+ ret = ac->rebuild_guest(current_machine);
+ if (ret < 0) {
+ error_report("unable to rebuild guest: %s(%d)",
+ strerror(-ret), ret);
+ vm_stop(RUN_STATE_INTERNAL_ERROR);
+ } else {
+ info_report("virtual machine state has been rebuilt with new "
+ "guest file handle.");
+ guest_state_rebuilt = true;
+ }
+ } else if (!cpus_are_resettable()) {
+ error_report("accelerator does not support reset!");
+ } else {
+ error_report("accelerator does not support rebuilding guest state,"
+ " proceeding with normal reset!");
+ }
+ }
+
if (mc && mc->reset) {
mc->reset(current_machine, type);
} else {
@@ -543,7 +570,16 @@ void qemu_system_reset(ShutdownCause reason)
* it does _more_ than cpu_synchronize_all_post_reset().
*/
if (cpus_are_resettable()) {
- cpu_synchronize_all_post_reset();
+ if (guest_state_rebuilt) {
+ /*
+ * If guest state has been rebuilt, then we
+ * need to sync full cpu state for non confidential guests post
+ * reset.
+ */
+ cpu_synchronize_all_post_init();
+ } else {
+ cpu_synchronize_all_post_reset();
+ }
}
vm_set_suspended(false);
--
2.42.0
* [PATCH v5 04/34] system/physmem: add helper to reattach existing memory after KVM VM fd change
From: Ani Sinha @ 2026-02-18 11:41 UTC
To: Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé
Cc: Ani Sinha, kraxel, qemu-devel
After the guest KVM VM file descriptor has changed as part of the confidential
guest reset process, existing memory needs to be reattached to
the new file descriptor. This change adds a helper function, ram_block_rebind(),
for this purpose. The next patch will make use of this function.
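The rebind loop can be sketched with fake file descriptors so that it runs without KVM. This sketch rests on several assumptions. toy_create_guest_memfd() stands in for kvm_create_guest_memfd(), integers stand in for real fds, and the ramlist locking done by ram_block_rebind() is omitted. ToyRamBlock and the other toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced RAMBlock. */
typedef struct {
    int is_guest_memfd;     /* stand-in for (flags & RAM_GUEST_MEMFD) */
    size_t max_length;
    int guest_memfd;        /* -1 when not allocated */
} ToyRamBlock;

static int toy_next_fd = 100;
static int toy_closed;      /* how many fake fds were "closed" */

/* Hypothetical replacement for kvm_create_guest_memfd(); fails for size 0. */
static int toy_create_guest_memfd(size_t size)
{
    return size ? toy_next_fd++ : -1;
}

static void toy_close(int fd)
{
    (void)fd;
    toy_closed++;
}

/* Replace every guest_memfd, stopping at the first failure like the patch does. */
static int toy_ram_block_rebind(ToyRamBlock *blocks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ToyRamBlock *b = &blocks[i];

        if (!b->is_guest_memfd) {
            continue;                        /* ordinary RAM is left untouched */
        }
        if (b->guest_memfd >= 0) {
            toy_close(b->guest_memfd);       /* drop the fd tied to the old VM */
        }
        b->guest_memfd = toy_create_guest_memfd(b->max_length);
        if (b->guest_memfd < 0) {
            return -1;
        }
    }
    return 0;
}
```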
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
include/system/physmem.h | 1 +
system/physmem.c | 28 ++++++++++++++++++++++++++++
2 files changed, 29 insertions(+)
diff --git a/include/system/physmem.h b/include/system/physmem.h
index 7bb7d3e154..da91b77bd9 100644
--- a/include/system/physmem.h
+++ b/include/system/physmem.h
@@ -51,5 +51,6 @@ physical_memory_snapshot_and_clear_dirty(MemoryRegion *mr, hwaddr offset,
bool physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap,
ram_addr_t start,
ram_addr_t length);
+int ram_block_rebind(Error **errp);
#endif
diff --git a/system/physmem.c b/system/physmem.c
index 2fb0c25c93..e5ff26acec 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2826,6 +2826,34 @@ found:
return block;
}
+/*
+ * Creates new guest memfd for the ramblocks and closes the
+ * existing memfd.
+ */
+int ram_block_rebind(Error **errp)
+{
+ RAMBlock *block;
+
+ qemu_mutex_lock_ramlist();
+
+ RAMBLOCK_FOREACH(block) {
+ if (block->flags & RAM_GUEST_MEMFD) {
+ if (block->guest_memfd >= 0) {
+ close(block->guest_memfd);
+ }
+ block->guest_memfd = kvm_create_guest_memfd(block->max_length,
+ 0, errp);
+ if (block->guest_memfd < 0) {
+ qemu_mutex_unlock_ramlist();
+ return -1;
+ }
+
+ }
+ }
+ qemu_mutex_unlock_ramlist();
+ return 0;
+}
+
/*
* Finds the named RAMBlock
*
--
2.42.0
* [PATCH v5 05/34] accel/kvm: add changes required to support KVM VM file descriptor change
From: Ani Sinha @ 2026-02-18 11:41 UTC
To: Paolo Bonzini, Ani Sinha, Marcelo Tosatti; +Cc: kraxel, qemu-devel, kvm
This change adds common KVM support to handle a KVM VM file descriptor
change. The KVM VM file descriptor can change as part of the confidential guest
reset mechanism. A new per-architecture API, kvm_arch_on_vmfd_change(), is
added in order to implement the architecture-specific changes required to
support it. A subsequent patch will add the x86-specific implementation of
kvm_arch_on_vmfd_change(), as currently only x86 supports
confidential guest reset.
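The order of operations in kvm_reset_vmfd() matters. In particular, the memory listeners may only be re-registered after ram_block_rebind(). The sketch below models only that ordering, as a list of named steps that stops at the first failure. The step names and toy_reset_vmfd() are hypothetical. Every step, including the conditional irqchip creation, is treated as unconditional here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical step names following the order used by kvm_reset_vmfd(). */
static const char *const toy_steps[] = {
    "drop-listeners",        /* unregister memory/IO listeners bound to the old fd */
    "close-old-vmfd",
    "create-vm",             /* same machine type as the original VM */
    "setup-dirty-ring",
    "rebind-ram",            /* new guest_memfd for each guest-memfd RAMBlock */
    "arch-on-vmfd-change",   /* kvm_arch_on_vmfd_change() */
    "create-irqchip",        /* conditional in the patch, always run here */
    "add-listeners",         /* only valid after rebind-ram */
    "commit-irq-routes",
};

/* Appends step names to @out in order; the step at index @fail_at fails. */
static int toy_reset_vmfd(int fail_at, char *out, size_t outlen)
{
    size_t n = sizeof(toy_steps) / sizeof(toy_steps[0]);

    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t used = strlen(out);

        snprintf(out + used, outlen - used, "%s%s",
                 used ? " " : "", toy_steps[i]);
        if ((int)i == fail_at) {
            return -1;       /* later steps never run */
        }
    }
    return 0;
}
```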
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
MAINTAINERS | 6 ++++
accel/kvm/kvm-all.c | 80 ++++++++++++++++++++++++++++++++++++++++--
accel/kvm/trace-events | 1 +
include/system/kvm.h | 3 ++
stubs/kvm.c | 22 ++++++++++++
stubs/meson.build | 1 +
target/i386/kvm/kvm.c | 10 ++++++
7 files changed, 120 insertions(+), 3 deletions(-)
create mode 100644 stubs/kvm.c
diff --git a/MAINTAINERS b/MAINTAINERS
index d3aa6d6732..b0eb77c08f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -152,6 +152,12 @@ F: tools/i386/
F: tests/functional/i386/
F: tests/functional/x86_64/
+X86 VM file descriptor change on reset test
+M: Ani Sinha <anisinha@redhat.com>
+M: Paolo Bonzini <pbonzini@redhat.com>
+S: Maintained
+F: stubs/kvm.c
+
Guest CPU cores (TCG)
---------------------
Overall TCG CPUs
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0d8b0c4347..14729666a0 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2415,11 +2415,9 @@ void kvm_irqchip_set_qemuirq_gsi(KVMState *s, qemu_irq irq, int gsi)
g_hash_table_insert(s->gsimap, irq, GINT_TO_POINTER(gsi));
}
-static void kvm_irqchip_create(KVMState *s)
+static void do_kvm_irqchip_create(KVMState *s)
{
int ret;
-
- assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
if (kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
;
} else if (kvm_check_extension(s, KVM_CAP_S390_IRQCHIP)) {
@@ -2452,7 +2450,13 @@ static void kvm_irqchip_create(KVMState *s)
fprintf(stderr, "Create kernel irqchip failed: %s\n", strerror(-ret));
exit(1);
}
+}
+static void kvm_irqchip_create(KVMState *s)
+{
+ assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
+
+ do_kvm_irqchip_create(s);
kvm_kernel_irqchip = true;
/* If we have an in-kernel IRQ chip then we must have asynchronous
* interrupt delivery (though the reverse is not necessarily true)
@@ -2607,6 +2611,75 @@ static int kvm_setup_dirty_ring(KVMState *s)
return 0;
}
+static int kvm_reset_vmfd(MachineState *ms)
+{
+ KVMState *s;
+ KVMMemoryListener *kml;
+ int ret = 0, type;
+ Error *err = NULL;
+
+ /*
+ * bail if the current architecture does not support VM file
+ * descriptor change.
+ */
+ if (!kvm_arch_supports_vmfd_change()) {
+ error_report("This target architecture does not support KVM VM "
+ "file descriptor change.");
+ return -EOPNOTSUPP;
+ }
+
+ s = KVM_STATE(ms->accelerator);
+ kml = &s->memory_listener;
+
+ memory_listener_unregister(&kml->listener);
+ memory_listener_unregister(&kvm_io_listener);
+
+ if (s->vmfd >= 0) {
+ close(s->vmfd);
+ }
+
+ type = find_kvm_machine_type(ms);
+ if (type < 0) {
+ return -EINVAL;
+ }
+
+ ret = do_kvm_create_vm(s, type);
+ if (ret < 0) {
+ return ret;
+ }
+
+ s->vmfd = ret;
+
+ kvm_setup_dirty_ring(s);
+
+ /* rebind memory to new vm fd */
+ ret = ram_block_rebind(&err);
+ if (ret < 0) {
+ return ret;
+ }
+ assert(!err);
+
+ ret = kvm_arch_on_vmfd_change(ms, s);
+ if (ret < 0) {
+ return ret;
+ }
+
+ if (s->kernel_irqchip_allowed) {
+ do_kvm_irqchip_create(s);
+ }
+
+ /* these can be only called after ram_block_rebind() */
+ memory_listener_register(&kml->listener, &address_space_memory);
+ memory_listener_register(&kvm_io_listener, &address_space_io);
+
+ /*
+ * kvm fd has changed. Commit the irq routes to KVM once more.
+ */
+ kvm_irqchip_commit_routes(s);
+ trace_kvm_reset_vmfd();
+ return ret;
+}
+
static int kvm_init(AccelState *as, MachineState *ms)
{
MachineClass *mc = MACHINE_GET_CLASS(ms);
@@ -4015,6 +4088,7 @@ static void kvm_accel_class_init(ObjectClass *oc, const void *data)
AccelClass *ac = ACCEL_CLASS(oc);
ac->name = "KVM";
ac->init_machine = kvm_init;
+ ac->rebuild_guest = kvm_reset_vmfd;
ac->has_memory = kvm_accel_has_memory;
ac->allowed = &kvm_allowed;
ac->gdbstub_supported_sstep_flags = kvm_gdbstub_sstep_flags;
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index e43d18a869..e4beda0148 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -14,6 +14,7 @@ kvm_destroy_vcpu(int cpu_index, unsigned long arch_cpu_id) "index: %d id: %lu"
kvm_park_vcpu(int cpu_index, unsigned long arch_cpu_id) "index: %d id: %lu"
kvm_unpark_vcpu(unsigned long arch_cpu_id, const char *msg) "id: %lu %s"
kvm_irqchip_commit_routes(void) ""
+kvm_reset_vmfd(void) ""
kvm_irqchip_add_msi_route(char *name, int vector, int virq) "dev %s vector %d virq %d"
kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
kvm_irqchip_release_virq(int virq) "virq %d"
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 8f9eecf044..5fc7251fd9 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -456,6 +456,9 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram_addr,
#endif /* COMPILING_PER_TARGET */
+bool kvm_arch_supports_vmfd_change(void);
+int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s);
+
void kvm_cpu_synchronize_state(CPUState *cpu);
void kvm_init_cpu_signals(CPUState *cpu);
diff --git a/stubs/kvm.c b/stubs/kvm.c
new file mode 100644
index 0000000000..2db61d89a7
--- /dev/null
+++ b/stubs/kvm.c
@@ -0,0 +1,22 @@
+/*
+ * kvm target arch specific stubs
+ *
+ * Copyright (c) 2026 Red Hat, Inc.
+ *
+ * Author:
+ * Ani Sinha <anisinha@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include "qemu/osdep.h"
+#include "system/kvm.h"
+
+int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
+{
+ abort();
+}
+
+bool kvm_arch_supports_vmfd_change(void)
+{
+ return false;
+}
diff --git a/stubs/meson.build b/stubs/meson.build
index 8a07059500..6ae478bacc 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -74,6 +74,7 @@ if have_system
if igvm.found()
stub_ss.add(files('igvm.c'))
endif
+ stub_ss.add(files('kvm.c'))
stub_ss.add(files('target-get-monitor-def.c'))
stub_ss.add(files('target-monitor-defs.c'))
stub_ss.add(files('win32-kbd-hook.c'))
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6d823a7991..a4e18734b1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3389,6 +3389,16 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
return 0;
}
+int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
+{
+ abort();
+}
+
+bool kvm_arch_supports_vmfd_change(void)
+{
+ return false;
+}
+
int kvm_arch_init(MachineState *ms, KVMState *s)
{
int ret;
--
2.42.0
* [PATCH v5 06/34] accel/kvm: add a notifier to indicate KVM VM file descriptor has changed
From: Ani Sinha @ 2026-02-18 11:41 UTC
To: Paolo Bonzini; +Cc: Ani Sinha, kraxel, kvm, qemu-devel
A notifier callback can be used by various subsystems to perform actions when
the KVM file descriptor for a virtual machine changes as part of the confidential
guest reset process. This change adds this notifier mechanism. Subsequent
patches will add specific notifier callback implementations for the various
subsystems that need to take action when the KVM VM file
descriptor changes.
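The notifier mechanism follows QEMU's NotifierWithReturn pattern. Each callback may fail, and the first failure is propagated to the caller. The following self-contained sketch reimplements that pattern with hypothetical ToyNotifier and toy_* names rather than QEMU's notifier_with_return_* API. It passes a ToyVmfdChange payload that mirrors VmfdChangeNotifier.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical minimal version of a notifier-with-return list. */
typedef struct ToyNotifier ToyNotifier;
struct ToyNotifier {
    int (*notify)(ToyNotifier *n, void *data);
    ToyNotifier *next;
};

typedef struct {
    int vmfd;                       /* mirrors VmfdChangeNotifier.vmfd */
} ToyVmfdChange;

static ToyNotifier *toy_list;

static void toy_add(ToyNotifier *n)
{
    n->next = toy_list;             /* insert at the head */
    toy_list = n;
}

static void toy_remove(ToyNotifier *n)
{
    for (ToyNotifier **p = &toy_list; *p; p = &(*p)->next) {
        if (*p == n) {
            *p = n->next;
            return;
        }
    }
}

/* Call each notifier in turn; stop and return the first non-zero result. */
static int toy_notify(void *data)
{
    for (ToyNotifier *n = toy_list; n; n = n->next) {
        int ret = n->notify(n, data);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

static int toy_seen_fd = -1;

/* Example subscriber: remembers the new VM fd, e.g. to recreate its own fds. */
static int toy_record_fd(ToyNotifier *n, void *data)
{
    (void)n;
    toy_seen_fd = ((ToyVmfdChange *)data)->vmfd;
    return 0;
}

/* Example subscriber that always fails, to show error propagation. */
static int toy_fail(ToyNotifier *n, void *data)
{
    (void)n;
    (void)data;
    return -EIO;
}
```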
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
accel/kvm/kvm-all.c | 30 ++++++++++++++++++++++++++++++
accel/stubs/kvm-stub.c | 8 ++++++++
include/system/kvm.h | 21 +++++++++++++++++++++
3 files changed, 59 insertions(+)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 14729666a0..b8a0685f7a 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -90,6 +90,7 @@ struct KVMParkedVcpu {
};
KVMState *kvm_state;
+VmfdChangeNotifier vmfd_notifier;
bool kvm_kernel_irqchip;
bool kvm_split_irqchip;
bool kvm_async_interrupts_allowed;
@@ -123,6 +124,9 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
static NotifierList kvm_irqchip_change_notifiers =
NOTIFIER_LIST_INITIALIZER(kvm_irqchip_change_notifiers);
+static NotifierWithReturnList register_vmfd_changed_notifiers =
+ NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_changed_notifiers);
+
struct KVMResampleFd {
int gsi;
EventNotifier *resample_event;
@@ -2173,6 +2177,22 @@ void kvm_irqchip_change_notify(void)
notifier_list_notify(&kvm_irqchip_change_notifiers, NULL);
}
+void kvm_vmfd_add_change_notifier(NotifierWithReturn *n)
+{
+ notifier_with_return_list_add(&register_vmfd_changed_notifiers, n);
+}
+
+void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n)
+{
+ notifier_with_return_remove(n);
+}
+
+static int kvm_vmfd_change_notify(Error **errp)
+{
+ return notifier_with_return_list_notify(&register_vmfd_changed_notifiers,
+ &vmfd_notifier, errp);
+}
+
int kvm_irqchip_get_virq(KVMState *s)
{
int next_virq;
@@ -2668,6 +2688,16 @@ static int kvm_reset_vmfd(MachineState *ms)
do_kvm_irqchip_create(s);
}
+ /*
+ * notify everyone that vmfd has changed.
+ */
+ vmfd_notifier.vmfd = s->vmfd;
+ ret = kvm_vmfd_change_notify(&err);
+ if (ret < 0) {
+ return ret;
+ }
+ assert(!err);
+
/* these can be only called after ram_block_rebind() */
memory_listener_register(&kml->listener, &address_space_memory);
memory_listener_register(&kvm_io_listener, &address_space_io);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 68cd33ba97..a6e8a6e16c 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -79,6 +79,14 @@ void kvm_irqchip_change_notify(void)
{
}
+void kvm_vmfd_add_change_notifier(NotifierWithReturn *n)
+{
+}
+
+void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n)
+{
+}
+
int kvm_irqchip_add_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
EventNotifier *rn, int virq)
{
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 5fc7251fd9..f11729f432 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -181,6 +181,7 @@ DECLARE_INSTANCE_CHECKER(KVMState, KVM_STATE,
extern KVMState *kvm_state;
typedef struct Notifier Notifier;
+typedef struct NotifierWithReturn NotifierWithReturn;
typedef struct KVMRouteChange {
KVMState *s;
@@ -567,4 +568,24 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size);
int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
+/* argument to vmfd change notifier */
+typedef struct VmfdChangeNotifier {
+ int vmfd;
+} VmfdChangeNotifier;
+
+/**
+ * kvm_vmfd_add_change_notifier - register a notifier to get notified when
+ * a KVM vm file descriptor changes as a part of the confidential guest "reset"
+ * process. Various subsystems should use this mechanism to take actions such
+ * as creating new fds against this new vm file descriptor.
+ * @n: notifier with return value.
+ */
+void kvm_vmfd_add_change_notifier(NotifierWithReturn *n);
+/**
+ * kvm_vmfd_remove_change_notifier - de-register a notifier previously
+ * registered with kvm_vmfd_add_change_notifier call.
+ * @n: notifier that was previously registered.
+ */
+void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n);
+
#endif
--
2.42.0
* [PATCH v5 07/34] accel/kvm: notify when KVM VM file fd is about to be changed
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (5 preceding siblings ...)
2026-02-18 11:41 ` [PATCH v5 06/34] accel/kvm: add a notifier to indicate KVM VM file descriptor has changed Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 08/34] i386/kvm: unregister smram listeners prior to vm file descriptor change Ani Sinha
` (26 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC
To: Paolo Bonzini; +Cc: Ani Sinha, kraxel, kvm, qemu-devel
Various subsystems might need to take some steps before the KVM file descriptor
for a virtual machine is changed. So a new boolean attribute, pre, is added to
the VmfdChangeNotifier structure (vmfd_notifier) that is passed to the notifier
callbacks. vmfd_notifier.pre is true for the pre-notification of a vmfd change
and false for the post-notification. Notifier callback implementations can
simply check ((VmfdChangeNotifier *)data)->pre and take pre-change or
post-change actions based on its value.
Subsequent patches will add callback implementations for specific components
that need this pre-notification.
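A minimal sketch of how a subscriber can branch on the pre flag. It does not use
QEMU's NotifierWithReturn API; VmfdChangeEvent, vmfd_subscribe(), vmfd_notify()
and demo_on_vmfd_change() are hypothetical stand-ins. Like
notifier_with_return_list_notify(), vmfd_notify() stops at the first callback
that returns a non-zero value, so a failing pre-change callback aborts the reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event passed to subscribers (mirrors VmfdChangeNotifier). */
typedef struct {
    int vmfd;   /* new VM fd; meaningful only when pre is false */
    bool pre;   /* true: fd is about to change; false: fd has changed */
} VmfdChangeEvent;

typedef int (*VmfdChangeFn)(void *opaque, const VmfdChangeEvent *ev);

typedef struct {
    VmfdChangeFn fn;
    void *opaque;
} VmfdSubscriber;

#define MAX_SUBSCRIBERS 8
static VmfdSubscriber subscribers[MAX_SUBSCRIBERS];
static size_t n_subscribers;

static void vmfd_subscribe(VmfdChangeFn fn, void *opaque)
{
    assert(fn != NULL && n_subscribers < MAX_SUBSCRIBERS);
    subscribers[n_subscribers++] = (VmfdSubscriber){ fn, opaque };
}

/* Call subscribers in order and stop at the first non-zero return. */
static int vmfd_notify(const VmfdChangeEvent *ev)
{
    int ret = 0;

    for (size_t i = 0; i < n_subscribers; i++) {
        ret = subscribers[i].fn(subscribers[i].opaque, ev);
        if (ret != 0) {
            break;
        }
    }
    return ret;
}

/* Example subscriber that acts in both phases. */
typedef struct {
    int torn_down;   /* pre-change actions taken */
    int rebuilt;     /* post-change actions taken */
    int vmfd;        /* last fd seen in the post phase */
} DemoSubsystem;

static int demo_on_vmfd_change(void *opaque, const VmfdChangeEvent *ev)
{
    DemoSubsystem *d = opaque;

    if (ev->pre) {
        d->torn_down++;          /* the old VM fd is still open here */
    } else {
        d->rebuilt++;            /* re-issue ioctls against ev->vmfd */
        d->vmfd = ev->vmfd;
    }
    return 0;
}
```

In kvm_reset_vmfd() the pre-notification is sent before close(s->vmfd), which is
why the sketch treats the old fd as still usable in the pre phase.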
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
accel/kvm/kvm-all.c | 9 +++++++++
include/system/kvm.h | 6 ++++--
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index b8a0685f7a..47589f92e2 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2654,6 +2654,13 @@ static int kvm_reset_vmfd(MachineState *ms)
memory_listener_unregister(&kml->listener);
memory_listener_unregister(&kvm_io_listener);
+ vmfd_notifier.pre = true;
+ ret = kvm_vmfd_change_notify(&err);
+ if (ret < 0) {
+ return ret;
+ }
+ assert(!err);
+
if (s->vmfd >= 0) {
close(s->vmfd);
}
@@ -2692,6 +2699,8 @@ static int kvm_reset_vmfd(MachineState *ms)
* notify everyone that vmfd has changed.
*/
vmfd_notifier.vmfd = s->vmfd;
+ vmfd_notifier.pre = false;
+
ret = kvm_vmfd_change_notify(&err);
if (ret < 0) {
return ret;
diff --git a/include/system/kvm.h b/include/system/kvm.h
index f11729f432..fbe23608a1 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -571,12 +571,14 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
/* argument to vmfd change notifier */
typedef struct VmfdChangeNotifier {
int vmfd;
+ bool pre;
} VmfdChangeNotifier;
/**
* kvm_vmfd_add_change_notifier - register a notifier to get notified when
- * a KVM vm file descriptor changes as a part of the confidential guest "reset"
- * process. Various subsystems should use this mechanism to take actions such
+ * a KVM vm file descriptor changes or is about to be changed as a part of the
+ * confidential guest "reset" process.
+ * Various subsystems should use this mechanism to take actions such
* as creating new fds against this new vm file descriptor.
* @n: notifier with return value.
*/
--
2.42.0
* [PATCH v5 08/34] i386/kvm: unregister smram listeners prior to vm file descriptor change
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (6 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 07/34] accel/kvm: notify when KVM VM file fd is about to be changed Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 09/34] kvm/i386: implement architecture support for kvm " Ani Sinha
` (25 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC
To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, kvm, qemu-devel
We will re-register the smram listeners after the VM file descriptor has changed.
They need to be unregistered first so that addresses and reference counters
remain correct.
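A minimal model of why the listener must be unregistered before it is
registered again. SpaceModel, ListenerModel and the helpers are hypothetical and
do not reproduce QEMU's memory API; the sketch simply assumes that registration
takes a reference, as the reference counters mentioned above suggest. For
brevity it folds both halves into one hook, whereas the series re-registers the
listener later, from kvm_arch_on_vmfd_change().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: registering a listener takes a reference on its space. */
typedef struct {
    int refs;                   /* references held by registered listeners */
} SpaceModel;

typedef struct {
    SpaceModel *space;          /* NULL while unregistered */
} ListenerModel;

static void listener_register(ListenerModel *l, SpaceModel *s)
{
    assert(l->space == NULL);   /* registering twice would leak a reference */
    s->refs++;
    l->space = s;
}

static void listener_unregister(ListenerModel *l)
{
    if (l->space == NULL) {
        return;                 /* already unregistered: nothing to drop */
    }
    assert(l->space->refs > 0);
    l->space->refs--;
    l->space = NULL;
}

/* vmfd change hook: drop the old registration before, re-add it after. */
static int smram_like_on_vmfd_change(ListenerModel *l, SpaceModel *s, bool pre)
{
    if (pre) {
        listener_unregister(l);
    } else {
        listener_register(l, s);
    }
    return 0;
}
```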
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/kvm.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a4e18734b1..83657fe832 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -112,6 +112,11 @@ typedef struct {
static void kvm_init_msrs(X86CPU *cpu);
static int kvm_filter_msr(KVMState *s, uint32_t msr, QEMURDMSRHandler *rdmsr,
QEMUWRMSRHandler *wrmsr);
+static int unregister_smram_listener(NotifierWithReturn *notifier,
+ void *data, Error **errp);
+static NotifierWithReturn kvm_vmfd_change_notifier = {
+ .notify = unregister_smram_listener,
+};
const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
KVM_CAP_INFO(SET_TSS_ADDR),
@@ -2885,6 +2890,17 @@ static void register_smram_listener(Notifier *n, void *unused)
}
}
+static int unregister_smram_listener(NotifierWithReturn *notifier,
+ void *data, Error **errp)
+{
+ if (!((VmfdChangeNotifier *)data)->pre) {
+ return 0;
+ }
+
+ memory_listener_unregister(&smram_listener.listener);
+ return 0;
+}
+
/* It should only be called in cpu's hotplug callback */
void kvm_smm_cpu_address_space_init(X86CPU *cpu)
{
@@ -3538,6 +3554,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
}
pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+ kvm_vmfd_add_change_notifier(&kvm_vmfd_change_notifier);
return 0;
}
--
2.42.0
* [PATCH v5 09/34] kvm/i386: implement architecture support for kvm file descriptor change
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (7 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 08/34] i386/kvm: unregister smram listeners prior to vm file descriptor change Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 10/34] i386/kvm: refactor xen init into a new function Ani Sinha
` (24 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC
To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, kvm, qemu-devel
When the KVM file descriptor changes as part of a confidential guest reset,
some architecture-specific setup, including SEV/SEV-SNP/TDX-specific setup,
needs to be redone. This is implemented in the kvm_arch_on_vmfd_change()
callback, which was introduced previously.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/kvm.c | 49 ++++++++++++++++++++++++++++--------
target/i386/kvm/trace-events | 1 +
2 files changed, 39 insertions(+), 11 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 83657fe832..8679e7d3fa 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3407,12 +3407,30 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
{
- abort();
+ int ret;
+
+ ret = kvm_arch_init(ms, s);
+ if (ret < 0) {
+ return ret;
+ }
+
+ if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
+ X86MachineState *x86ms = X86_MACHINE(ms);
+
+ if (x86_machine_is_smm_enabled(x86ms)) {
+ memory_listener_register(&smram_listener.listener,
+ &smram_address_space);
+ }
+ kvm_set_max_apic_id(x86ms->apic_id_limit);
+ }
+
+ trace_kvm_arch_on_vmfd_change();
+ return 0;
}
bool kvm_arch_supports_vmfd_change(void)
{
- return false;
+ return true;
}
int kvm_arch_init(MachineState *ms, KVMState *s)
@@ -3420,6 +3438,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
int ret;
struct utsname utsname;
Error *local_err = NULL;
+ static bool first = true;
/*
* Initialize confidential guest (SEV/TDX) context, if required
@@ -3489,16 +3508,17 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
return ret;
}
- /* Tell fw_cfg to notify the BIOS to reserve the range. */
- e820_add_entry(KVM_IDENTITY_BASE, 0x4000, E820_RESERVED);
-
+ if (first) {
+ /* Tell fw_cfg to notify the BIOS to reserve the range. */
+ e820_add_entry(KVM_IDENTITY_BASE, 0x4000, E820_RESERVED);
+ }
ret = kvm_vm_set_nr_mmu_pages(s);
if (ret < 0) {
return ret;
}
if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE) &&
- x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
+ x86_machine_is_smm_enabled(X86_MACHINE(ms)) && first) {
smram_machine_done.notify = register_smram_listener;
qemu_add_machine_init_done_notifier(&smram_machine_done);
}
@@ -3545,16 +3565,23 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
return ret;
}
- ret = kvm_msr_energy_thread_init(s, ms);
- if (ret < 0) {
- error_report("kvm : error RAPL feature requirement not met");
- return ret;
+ if (first) {
+ ret = kvm_msr_energy_thread_init(s, ms);
+ if (ret < 0) {
+ error_report("kvm : "
+ "error RAPL feature requirement not met");
+ return ret;
+ }
}
}
}
pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
- kvm_vmfd_add_change_notifier(&kvm_vmfd_change_notifier);
+
+ if (first) {
+ kvm_vmfd_add_change_notifier(&kvm_vmfd_change_notifier);
+ }
+ first = false;
return 0;
}
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index 74a6234ff7..2d213c9f9b 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -6,6 +6,7 @@ kvm_x86_add_msi_route(int virq) "Adding route entry for virq %d"
kvm_x86_remove_msi_route(int virq) "Removing route entry for virq %d"
kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
kvm_hc_map_gpa_range(uint64_t gpa, uint64_t size, uint64_t attributes, uint64_t flags) "gpa 0x%" PRIx64 " size 0x%" PRIx64 " attributes 0x%" PRIx64 " flags 0x%" PRIx64
+kvm_arch_on_vmfd_change(void) ""
# xen-emu.c
kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
--
2.42.0
* [PATCH v5 10/34] i386/kvm: refactor xen init into a new function
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (8 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 09/34] kvm/i386: implement architecture support for kvm " Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 11/34] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset Ani Sinha
` (23 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC
To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, kvm, qemu-devel
Cosmetic - no new functionality added. Xen initialisation code is refactored
into its own function.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/kvm.c | 31 +++++++++++++++++++------------
1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8679e7d3fa..feb3f3cf3c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3433,6 +3433,24 @@ bool kvm_arch_supports_vmfd_change(void)
return true;
}
+static int xen_init(MachineState *ms, KVMState *s)
+{
+#ifdef CONFIG_XEN_EMU
+ int ret = 0;
+ if (!object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE)) {
+ error_report("kvm: Xen support only available in PC machine");
+ return -ENOTSUP;
+ }
+ /* hyperv_enabled() doesn't work yet. */
+ uint32_t msr = XEN_HYPERCALL_MSR;
+ ret = kvm_xen_init(s, msr);
+ return ret;
+#else
+ error_report("kvm: Xen support not enabled in qemu");
+ return -ENOTSUP;
+#endif
+}
+
int kvm_arch_init(MachineState *ms, KVMState *s)
{
int ret;
@@ -3467,21 +3485,10 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
}
if (s->xen_version) {
-#ifdef CONFIG_XEN_EMU
- if (!object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE)) {
- error_report("kvm: Xen support only available in PC machine");
- return -ENOTSUP;
- }
- /* hyperv_enabled() doesn't work yet. */
- uint32_t msr = XEN_HYPERCALL_MSR;
- ret = kvm_xen_init(s, msr);
+ ret = xen_init(ms, s);
if (ret < 0) {
return ret;
}
-#else
- error_report("kvm: Xen support not enabled in qemu");
- return -ENOTSUP;
-#endif
}
ret = kvm_get_supported_msrs(s);
--
2.42.0
* [PATCH v5 11/34] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (9 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 10/34] i386/kvm: refactor xen init into a new function Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-20 15:01 ` Michael S. Tsirkin
2026-02-18 11:42 ` [PATCH v5 12/34] hw/i386: export a new function x86_bios_rom_reload Ani Sinha
` (22 subsequent siblings)
33 siblings, 1 reply; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC
To: Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini,
Richard Henderson, Eduardo Habkost
Cc: Ani Sinha, kraxel, qemu-devel
For confidential guests, the BIOS image must be reinitialized upon reset. This
is because BIOS memory is encrypted, so once the old confidential KVM context
is destroyed, its contents cannot be decrypted and must be reinitialized.
To make that possible, this change refactors x86_bios_rom_init() so that
parts of it can be called during confidential guest reset.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/i386/x86-common.c | 50 ++++++++++++++++++++++++++++++++------------
1 file changed, 37 insertions(+), 13 deletions(-)
diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index de4cd7650a..c98abaf368 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -1020,17 +1020,11 @@ void x86_isa_bios_init(MemoryRegion *isa_bios, MemoryRegion *isa_memory,
memory_region_set_readonly(isa_bios, read_only);
}
-void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
- MemoryRegion *rom_memory, bool isapc_ram_fw)
+static int get_bios_size(X86MachineState *x86ms,
+ const char *bios_name, char *filename)
{
- const char *bios_name;
- char *filename;
int bios_size;
- ssize_t ret;
- /* BIOS load */
- bios_name = MACHINE(x86ms)->firmware ?: default_firmware;
- filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
if (filename) {
bios_size = get_image_size(filename, NULL);
} else {
@@ -1040,6 +1034,21 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
(bios_size % 65536) != 0) {
goto bios_error;
}
+
+ return bios_size;
+
+ bios_error:
+ fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
+ exit(1);
+}
+
+static void load_bios_from_file(X86MachineState *x86ms, const char *bios_name,
+ char *filename, int bios_size,
+ bool isapc_ram_fw)
+{
+ ssize_t ret;
+
+ /* BIOS load */
if (machine_require_guest_memfd(MACHINE(x86ms))) {
memory_region_init_ram_guest_memfd(&x86ms->bios, NULL, "pc.bios",
bios_size, &error_fatal);
@@ -1068,7 +1077,26 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
goto bios_error;
}
}
- g_free(filename);
+
+ return;
+
+ bios_error:
+ fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
+ exit(1);
+}
+
+void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
+ MemoryRegion *rom_memory, bool isapc_ram_fw)
+{
+ int bios_size;
+ const char *bios_name;
+ g_autofree char *filename = NULL;
+
+ bios_name = MACHINE(x86ms)->firmware ?: default_firmware;
+ filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
+
+ bios_size = get_bios_size(x86ms, bios_name, filename);
+ load_bios_from_file(x86ms, bios_name, filename, bios_size, isapc_ram_fw);
if (!machine_require_guest_memfd(MACHINE(x86ms))) {
/* map the last 128KB of the BIOS in ISA space */
@@ -1081,8 +1109,4 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
(uint32_t)(-bios_size),
&x86ms->bios);
return;
-
-bios_error:
- fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
- exit(1);
}
--
2.42.0
* [PATCH v5 12/34] hw/i386: export a new function x86_bios_rom_reload
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (10 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 11/34] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 13/34] kvm/i386: reload firmware for confidential guest reset Ani Sinha
` (21 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC
To: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Michael S. Tsirkin, Marcel Apfelbaum
Cc: Ani Sinha, kraxel, Bernhard Beschow, qemu-devel
Confidential guests must reload their BIOS ROM upon reset. This is because
BIOS memory is encrypted, and upon reset the contents of the old BIOS memory
are lost and cannot be reused. To this end, export a new x86 function,
x86_bios_rom_reload(), that loads the BIOS again. This function will be used in
subsequent patches.
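x86_bios_rom_reload() writes the image into the existing region returned by
memory_region_get_ram_ptr() instead of allocating a new one, and returns early
when the region is empty. A minimal sketch of that in-place reload using plain
buffers; reload_into_region() is hypothetical, and its size-mismatch check is an
extra safeguard that the diff below does not perform (it does not check the
return value of load_image_size()).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical reload helper: copy a firmware image into an already
 * allocated region. Returns 0 on success, -1 if the image does not
 * exactly fill the region (e.g. the file changed on disk since boot).
 */
static int reload_into_region(unsigned char *region, size_t region_size,
                              const unsigned char *image, size_t image_size)
{
    assert(region != NULL && image != NULL);
    if (region_size == 0) {
        return 0;               /* nothing was loaded at boot; nothing to do */
    }
    if (image_size != region_size) {
        return -1;              /* refuse a partial or overflowing copy */
    }
    memcpy(region, image, image_size);
    return 0;
}
```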
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/i386/x86-common.c | 21 +++++++++++++++++++++
include/hw/i386/x86.h | 1 +
2 files changed, 22 insertions(+)
diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index c98abaf368..a420112666 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -1085,6 +1085,27 @@ static void load_bios_from_file(X86MachineState *x86ms, const char *bios_name,
exit(1);
}
+void x86_bios_rom_reload(X86MachineState *x86ms)
+{
+ int bios_size;
+ const char *bios_name;
+ g_autofree char *filename = NULL;
+
+ if (memory_region_size(&x86ms->bios) == 0) {
+ /* if -bios is not used */
+ return;
+ }
+
+ bios_name = MACHINE(x86ms)->firmware ?: "bios.bin";
+ filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
+
+ bios_size = get_bios_size(x86ms, bios_name, filename);
+
+ void *ptr = memory_region_get_ram_ptr(&x86ms->bios);
+ load_image_size(filename, ptr, bios_size);
+ x86_firmware_configure(0x100000000ULL - bios_size, ptr, bios_size);
+}
+
void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
MemoryRegion *rom_memory, bool isapc_ram_fw)
{
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 23be627437..a85a5600ce 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -125,6 +125,7 @@ void x86_isa_bios_init(MemoryRegion *isa_bios, MemoryRegion *isa_memory,
MemoryRegion *bios, bool read_only);
void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
MemoryRegion *rom_memory, bool isapc_ram_fw);
+void x86_bios_rom_reload(X86MachineState *x86ms);
void x86_load_linux(X86MachineState *x86ms,
FWCfgState *fw_cfg,
--
2.42.0
* [PATCH v5 13/34] kvm/i386: reload firmware for confidential guest reset
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (11 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 12/34] hw/i386: export a new function x86_bios_rom_reload Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 14/34] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset Ani Sinha
` (20 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC
To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, kvm, qemu-devel
When IGVM is not used by the confidential guest, the guest firmware has to be
reloaded explicitly into memory. This is because the memory into which the
firmware was loaded before reset was encrypted, and its contents are therefore
lost upon reset. When IGVM is used, the IGVM file is expected to contain the
guest firmware, and executing the IGVM directives will set up the guest
firmware memory.
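The rule implemented below (ms->cgs && !x86ms->igvm) can be written as a small
decision function. FwResetAction and fw_reset_action() are hypothetical names
used only to spell out the three cases.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    FW_NO_RELOAD,     /* not a confidential guest: nothing reloaded here */
    FW_RELOAD_BIOS,   /* confidential guest without IGVM: reload the ROM */
    FW_FROM_IGVM,     /* confidential guest with IGVM: IGVM rebuilds it */
} FwResetAction;

static FwResetAction fw_reset_action(bool has_cgs, bool has_igvm)
{
    if (!has_cgs) {
        return FW_NO_RELOAD;
    }
    return has_igvm ? FW_FROM_IGVM : FW_RELOAD_BIOS;
}
```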
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/kvm.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index feb3f3cf3c..5c8ec77212 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3416,7 +3416,14 @@ int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
X86MachineState *x86ms = X86_MACHINE(ms);
-
+ /*
+ * For confidential guests, reload bios ROM if IGVM is not specified.
+ * If an IGVM file is specified then the firmware must be provided
+ * in the IGVM file.
+ */
+ if (ms->cgs && !x86ms->igvm) {
+ x86_bios_rom_reload(x86ms);
+ }
if (x86_machine_is_smm_enabled(x86ms)) {
memory_listener_register(&smram_listener.listener,
&smram_address_space);
--
2.42.0
* [PATCH v5 14/34] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (12 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 13/34] kvm/i386: reload firmware for confidential guest reset Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 15/34] i386/tdx: refactor TDX firmware memory initialization code into a new function Ani Sinha
` (19 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC
To: Paolo Bonzini; +Cc: Ani Sinha, kraxel, kvm, qemu-devel
Confidential guests need to generate a new KVM VM file descriptor upon virtual
machine reset. Existing VCPUs need to be reattached to this new KVM VM file
descriptor. As part of this, new VCPU file descriptors need to be created
against the new KVM VM file descriptor and re-initialized, and resources
allocated against the old VCPU fds need to be released. This change
implements that.
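A minimal sketch of the per-vCPU rebind sequence: release what belongs to the
old VM, create a new vCPU fd against the new VM fd, then remap. VcpuModel,
fake_create_vcpu() and rebind_vcpus() are hypothetical; fd handling and mmap are
only simulated. The sketch tests fds with `>= 0` (0 is a valid descriptor) and
leaves the loop at the first failure so the error is not overwritten by later
vCPUs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int fd;           /* vCPU fd, -1 when closed */
    bool run_mapped;  /* stands in for the kvm_run mapping */
} VcpuModel;

static int next_fd = 100;    /* fake fd allocator */
static int fail_index = -1;  /* vCPU whose creation should fail, -1 for none */

/* Stand-in for KVM_CREATE_VCPU: returns a new fd or a negative errno. */
static int fake_create_vcpu(int index)
{
    return index == fail_index ? -ENOMEM : next_fd++;
}

static int rebind_vcpus(VcpuModel *cpus, size_t n)
{
    int ret = 0;

    assert(cpus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        VcpuModel *cpu = &cpus[i];

        /* 1. release resources tied to the old VM */
        if (cpu->fd >= 0) {
            cpu->fd = -1;            /* close(old fd) */
        }
        cpu->run_mapped = false;     /* munmap(old kvm_run) */

        /* 2. create a new vCPU fd against the new VM fd */
        ret = fake_create_vcpu((int)i);
        if (ret < 0) {
            goto err;                /* stop at the first failure */
        }
        cpu->fd = ret;

        /* 3. map kvm_run for the new fd (arch init would follow) */
        cpu->run_mapped = true;
        ret = 0;
    }
err:
    return ret;
}
```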
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
accel/kvm/kvm-all.c | 215 +++++++++++++++++++++++++++++++++--------
accel/kvm/trace-events | 1 +
2 files changed, 174 insertions(+), 42 deletions(-)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 47589f92e2..7be39111bb 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -127,6 +127,10 @@ static NotifierList kvm_irqchip_change_notifiers =
static NotifierWithReturnList register_vmfd_changed_notifiers =
NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_changed_notifiers);
+static int map_kvm_run(KVMState *s, CPUState *cpu, Error **errp);
+static int map_kvm_dirty_gfns(KVMState *s, CPUState *cpu, Error **errp);
+static int vcpu_unmap_regions(KVMState *s, CPUState *cpu);
+
struct KVMResampleFd {
int gsi;
EventNotifier *resample_event;
@@ -420,6 +424,90 @@ err:
return ret;
}
+static void kvm_create_vcpu_internal(CPUState *cpu, KVMState *s, int kvm_fd)
+{
+ cpu->kvm_fd = kvm_fd;
+ cpu->kvm_state = s;
+ if (!s->guest_state_protected) {
+ cpu->vcpu_dirty = true;
+ }
+ cpu->dirty_pages = 0;
+ cpu->throttle_us_per_full = 0;
+
+ return;
+}
+
+static int kvm_rebind_vcpus(Error **errp)
+{
+ CPUState *cpu;
+ unsigned long vcpu_id;
+ KVMState *s = kvm_state;
+ int kvm_fd, ret = 0;
+
+ CPU_FOREACH(cpu) {
+ vcpu_id = kvm_arch_vcpu_id(cpu);
+
+ if (cpu->kvm_fd) {
+ close(cpu->kvm_fd);
+ }
+
+ ret = kvm_arch_destroy_vcpu(cpu);
+ if (ret < 0) {
+ goto err;
+ }
+
+ if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
+ s->coalesced_mmio_ring = NULL;
+ }
+
+ ret = vcpu_unmap_regions(s, cpu);
+ if (ret < 0) {
+ goto err;
+ }
+
+ ret = kvm_arch_pre_create_vcpu(cpu, errp);
+ if (ret < 0) {
+ goto err;
+ }
+
+ kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
+ if (kvm_fd < 0) {
+ error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu (%s)",
+ vcpu_id, strerror(-kvm_fd));
+ return kvm_fd;
+ }
+
+ kvm_create_vcpu_internal(cpu, s, kvm_fd);
+
+ ret = map_kvm_run(s, cpu, errp);
+ if (ret < 0) {
+ goto err;
+ }
+
+ if (s->kvm_dirty_ring_size) {
+ ret = map_kvm_dirty_gfns(s, cpu, errp);
+ if (ret < 0) {
+ goto err;
+ }
+ }
+
+ ret = kvm_arch_init_vcpu(cpu);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "kvm_arch_init_vcpu failed (%lu)",
+ vcpu_id);
+ goto err;
+ }
+
+ close(cpu->kvm_vcpu_stats_fd);
+ cpu->kvm_vcpu_stats_fd = kvm_vcpu_ioctl(cpu, KVM_GET_STATS_FD, NULL);
+ kvm_init_cpu_signals(cpu);
+ }
+ trace_kvm_rebind_vcpus();
+
+ err:
+ return ret;
+}
+
static void kvm_park_vcpu(CPUState *cpu)
{
struct KVMParkedVcpu *vcpu;
@@ -483,13 +571,7 @@ static int kvm_create_vcpu(CPUState *cpu)
}
}
- cpu->kvm_fd = kvm_fd;
- cpu->kvm_state = s;
- if (!s->guest_state_protected) {
- cpu->vcpu_dirty = true;
- }
- cpu->dirty_pages = 0;
- cpu->throttle_us_per_full = 0;
+ kvm_create_vcpu_internal(cpu, s, kvm_fd);
trace_kvm_create_vcpu(cpu->cpu_index, vcpu_id, kvm_fd);
@@ -508,19 +590,11 @@ int kvm_create_and_park_vcpu(CPUState *cpu)
return ret;
}
-static int do_kvm_destroy_vcpu(CPUState *cpu)
+static int vcpu_unmap_regions(KVMState *s, CPUState *cpu)
{
- KVMState *s = kvm_state;
int mmap_size;
int ret = 0;
- trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
-
- ret = kvm_arch_destroy_vcpu(cpu);
- if (ret < 0) {
- goto err;
- }
-
mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
if (mmap_size < 0) {
ret = mmap_size;
@@ -548,39 +622,47 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
cpu->kvm_dirty_gfns = NULL;
}
- kvm_park_vcpu(cpu);
-err:
+ err:
return ret;
}
-void kvm_destroy_vcpu(CPUState *cpu)
-{
- if (do_kvm_destroy_vcpu(cpu) < 0) {
- error_report("kvm_destroy_vcpu failed");
- exit(EXIT_FAILURE);
- }
-}
-
-int kvm_init_vcpu(CPUState *cpu, Error **errp)
+static int do_kvm_destroy_vcpu(CPUState *cpu)
{
KVMState *s = kvm_state;
- int mmap_size;
- int ret;
+ int ret = 0;
- trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+ trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
- ret = kvm_arch_pre_create_vcpu(cpu, errp);
+ ret = kvm_arch_destroy_vcpu(cpu);
if (ret < 0) {
goto err;
}
- ret = kvm_create_vcpu(cpu);
+ /* If I am the CPU that created coalesced_mmio_ring, then discard it */
+ if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
+ s->coalesced_mmio_ring = NULL;
+ }
+
+ ret = vcpu_unmap_regions(s, cpu);
if (ret < 0) {
- error_setg_errno(errp, -ret,
- "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
- kvm_arch_vcpu_id(cpu));
goto err;
}
+ kvm_park_vcpu(cpu);
+err:
+ return ret;
+}
+
+void kvm_destroy_vcpu(CPUState *cpu)
+{
+ if (do_kvm_destroy_vcpu(cpu) < 0) {
+ error_report("kvm_destroy_vcpu failed");
+ exit(EXIT_FAILURE);
+ }
+}
+
+static int map_kvm_run(KVMState *s, CPUState *cpu, Error **errp)
+{
+ int mmap_size, ret = 0;
mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
if (mmap_size < 0) {
@@ -605,14 +687,53 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
(void *)cpu->kvm_run + s->coalesced_mmio * PAGE_SIZE;
}
+ err:
+ return ret;
+}
+
+static int map_kvm_dirty_gfns(KVMState *s, CPUState *cpu, Error **errp)
+{
+ int ret = 0;
+ /* Use MAP_SHARED to share pages with the kernel */
+ cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_bytes,
+ PROT_READ | PROT_WRITE, MAP_SHARED,
+ cpu->kvm_fd,
+ PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
+ if (cpu->kvm_dirty_gfns == MAP_FAILED) {
+ ret = -errno;
+ }
+
+ return ret;
+}
+
+int kvm_init_vcpu(CPUState *cpu, Error **errp)
+{
+ KVMState *s = kvm_state;
+ int ret;
+
+ trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+
+ ret = kvm_arch_pre_create_vcpu(cpu, errp);
+ if (ret < 0) {
+ goto err;
+ }
+
+ ret = kvm_create_vcpu(cpu);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret,
+ "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
+ kvm_arch_vcpu_id(cpu));
+ goto err;
+ }
+
+ ret = map_kvm_run(s, cpu, errp);
+ if (ret < 0) {
+ goto err;
+ }
+
if (s->kvm_dirty_ring_size) {
- /* Use MAP_SHARED to share pages with the kernel */
- cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_bytes,
- PROT_READ | PROT_WRITE, MAP_SHARED,
- cpu->kvm_fd,
- PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
- if (cpu->kvm_dirty_gfns == MAP_FAILED) {
- ret = -errno;
+ ret = map_kvm_dirty_gfns(s, cpu, errp);
+ if (ret < 0) {
goto err;
}
}
@@ -2707,6 +2828,16 @@ static int kvm_reset_vmfd(MachineState *ms)
}
assert(!err);
+ /*
+ * rebind new vcpu fds with the new kvm fds
+ * These can only be called after kvm_arch_on_vmfd_change()
+ */
+ ret = kvm_rebind_vcpus(&err);
+ if (ret < 0) {
+ return ret;
+ }
+ assert(!err);
+
/* these can be only called after ram_block_rebind() */
memory_listener_register(&kml->listener, &address_space_memory);
memory_listener_register(&kvm_io_listener, &address_space_io);
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index e4beda0148..4a8921c632 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -15,6 +15,7 @@ kvm_park_vcpu(int cpu_index, unsigned long arch_cpu_id) "index: %d id: %lu"
kvm_unpark_vcpu(unsigned long arch_cpu_id, const char *msg) "id: %lu %s"
kvm_irqchip_commit_routes(void) ""
kvm_reset_vmfd(void) ""
+kvm_rebind_vcpus(void) ""
kvm_irqchip_add_msi_route(char *name, int vector, int virq) "dev %s vector %d virq %d"
kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
kvm_irqchip_release_virq(int virq) "virq %d"
--
2.42.0
* [PATCH v5 15/34] i386/tdx: refactor TDX firmware memory initialization code into a new function
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (13 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 14/34] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 16/34] i386/tdx: finalize TDX guest state upon reset Ani Sinha
` (18 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC
To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, kvm, qemu-devel
A new helper function is introduced that refactors all firmware memory
initialization code into a separate function. No functional change.
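The extracted loop converts each firmware section's byte size to 4 KiB pages
(entry->size >> 12) and retries KVM_TDX_INIT_MEM_REGION while it returns -EAGAIN
or -EINTR. A minimal sketch of that retry pattern; init_region_fn,
init_region_retry() and the fake_init() test double are hypothetical and do not
issue any real ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TDX_PAGE_SHIFT 12   /* 4 KiB pages, matching entry->size >> 12 */

/* Hypothetical stand-in for the KVM_TDX_INIT_MEM_REGION ioctl. */
typedef int (*init_region_fn)(void *ctx, uint64_t gpa, uint64_t nr_pages);

/* Retry while the call reports a transient (-EAGAIN) or interrupted (-EINTR) state. */
static int init_region_retry(init_region_fn fn, void *ctx,
                             uint64_t gpa, uint64_t size)
{
    uint64_t nr_pages = size >> TDX_PAGE_SHIFT;
    int r;

    do {
        r = fn(ctx, gpa, nr_pages);
    } while (r == -EAGAIN || r == -EINTR);
    return r;
}

/* Test double: fails with -EAGAIN a set number of times, then succeeds. */
typedef struct {
    int busy_left;
    int calls;
    uint64_t last_pages;
} FakeCtx;

static int fake_init(void *ctx, uint64_t gpa, uint64_t nr_pages)
{
    FakeCtx *f = ctx;

    (void)gpa;
    f->calls++;
    f->last_pages = nr_pages;
    if (f->busy_left > 0) {
        f->busy_left--;
        return -EAGAIN;
    }
    return 0;
}
```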
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/tdx.c | 73 ++++++++++++++++++++++++-------------------
1 file changed, 40 insertions(+), 33 deletions(-)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index a3e81e1c0c..fd8e3de969 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -295,14 +295,51 @@ static void tdx_post_init_vcpus(void)
}
}
-static void tdx_finalize_vm(Notifier *notifier, void *unused)
+static void tdx_init_fw_mem_region(void)
{
TdxFirmware *tdvf = &tdx_guest->tdvf;
TdxFirmwareEntry *entry;
- RAMBlock *ram_block;
Error *local_err = NULL;
int r;
+ for_each_tdx_fw_entry(tdvf, entry) {
+ struct kvm_tdx_init_mem_region region;
+ uint32_t flags;
+
+ region = (struct kvm_tdx_init_mem_region) {
+ .source_addr = (uintptr_t)entry->mem_ptr,
+ .gpa = entry->address,
+ .nr_pages = entry->size >> 12,
+ };
+
+ flags = entry->attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND ?
+ KVM_TDX_MEASURE_MEMORY_REGION : 0;
+
+ do {
+ error_free(local_err);
+ local_err = NULL;
+ r = tdx_vcpu_ioctl(first_cpu, KVM_TDX_INIT_MEM_REGION, flags,
&region, &local_err);
+ } while (r == -EAGAIN || r == -EINTR);
+ if (r < 0) {
+ error_report_err(local_err);
+ exit(1);
+ }
+
+ if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
+ entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
+ qemu_ram_munmap(-1, entry->mem_ptr, entry->size);
+ entry->mem_ptr = NULL;
+ }
+ }
+}
+
+static void tdx_finalize_vm(Notifier *notifier, void *unused)
+{
+ TdxFirmware *tdvf = &tdx_guest->tdvf;
+ TdxFirmwareEntry *entry;
+ RAMBlock *ram_block;
+
tdx_init_ram_entries();
for_each_tdx_fw_entry(tdvf, entry) {
@@ -339,37 +376,7 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
tdx_post_init_vcpus();
-
- for_each_tdx_fw_entry(tdvf, entry) {
- struct kvm_tdx_init_mem_region region;
- uint32_t flags;
-
- region = (struct kvm_tdx_init_mem_region) {
- .source_addr = (uintptr_t)entry->mem_ptr,
- .gpa = entry->address,
- .nr_pages = entry->size >> 12,
- };
-
- flags = entry->attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND ?
- KVM_TDX_MEASURE_MEMORY_REGION : 0;
-
- do {
- error_free(local_err);
- local_err = NULL;
- r = tdx_vcpu_ioctl(first_cpu, KVM_TDX_INIT_MEM_REGION, flags,
&region, &local_err);
- } while (r == -EAGAIN || r == -EINTR);
- if (r < 0) {
- error_report_err(local_err);
- exit(1);
- }
-
- if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
- entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
- qemu_ram_munmap(-1, entry->mem_ptr, entry->size);
- entry->mem_ptr = NULL;
- }
- }
+ tdx_init_fw_mem_region();
/*
* TDVF image has been copied into private region above via
--
2.42.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v5 16/34] i386/tdx: finalize TDX guest state upon reset
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (14 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 15/34] i386/tdx: refactor TDX firmware memory initialization code into a new function Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 17/34] i386/tdx: add a pre-vmfd change notifier to reset tdx state Ani Sinha
` (17 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, kvm, qemu-devel
When the confidential virtual machine's KVM file descriptor changes due to a
guest reset, some TDX-specific setup steps need to be done again, including
finalizing the initial guest launch state. This change re-executes those parts
of the TDX setup during the device reset phase using the resettable interface,
which finalizes the guest launch state again and locks it in. The machine init
done notifier that was previously used is no longer needed, as the same code
now runs as part of VM reset.
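Hooking tdx_handle_reset() to the exit phase matters because a multi-phase reset runs the enter phase of every object, then every hold phase, then every exit phase. The toy model below uses hypothetical ResetObj and reset_all() names (a sketch, not QEMU's ResettableClass API) to show why an exit-phase handler can rely on state that other objects set up in their hold phase, regardless of registration order:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a three-phase reset; not QEMU's API. */
typedef struct ResetObj ResetObj;
typedef void (*PhaseFn)(ResetObj *obj);

struct ResetObj {
    PhaseFn enter, hold, exit;
};

static int boot_state_ready;  /* set by a "legacy" device during hold */
static int finalized;         /* set by the confidential guest during exit */

static void legacy_hold(ResetObj *obj)
{
    (void)obj;
    boot_state_ready = 1;
}

static void cgs_exit(ResetObj *obj)
{
    (void)obj;
    /* Every hold phase has already run, so the boot state is in place. */
    assert(boot_state_ready);
    finalized = 1;
}

/* Run each phase across all objects before moving on to the next phase. */
static void reset_all(ResetObj **objs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (objs[i]->enter) {
            objs[i]->enter(objs[i]);
        }
    }
    for (size_t i = 0; i < n; i++) {
        if (objs[i]->hold) {
            objs[i]->hold(objs[i]);
        }
    }
    for (size_t i = 0; i < n; i++) {
        if (objs[i]->exit) {
            objs[i]->exit(objs[i]);
        }
    }
}
```

Even when the confidential-guest object is registered before the legacy device, its exit handler still observes the state written during the legacy device's hold phase.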
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/tdx.c | 38 +++++++++++++++++++++++++++++++-----
target/i386/kvm/tdx.h | 1 +
target/i386/kvm/trace-events | 3 +++
3 files changed, 37 insertions(+), 5 deletions(-)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index fd8e3de969..37e91d95e1 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -19,6 +19,7 @@
#include "crypto/hash.h"
#include "system/kvm_int.h"
#include "system/runstate.h"
+#include "system/reset.h"
#include "system/system.h"
#include "system/ramblock.h"
#include "system/address-spaces.h"
@@ -38,6 +39,7 @@
#include "kvm_i386.h"
#include "tdx.h"
#include "tdx-quote-generator.h"
+#include "trace.h"
#include "standard-headers/asm-x86/kvm_para.h"
@@ -389,9 +391,19 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
CONFIDENTIAL_GUEST_SUPPORT(tdx_guest)->ready = true;
}
-static Notifier tdx_machine_done_notify = {
- .notify = tdx_finalize_vm,
-};
+static void tdx_handle_reset(Object *obj, ResetType type)
+{
+ if (!runstate_is_running() && !phase_check(PHASE_MACHINE_READY)) {
+ return;
+ }
+
+ if (!kvm_enable_hypercall(BIT_ULL(KVM_HC_MAP_GPA_RANGE))) {
+ error_setg(&error_fatal, "KVM_HC_MAP_GPA_RANGE not enabled for guest");
+ }
+
+ tdx_finalize_vm(NULL, NULL);
+ trace_tdx_handle_reset();
+}
/*
* Some CPUID bits change from fixed1 to configurable bits when TDX module
@@ -738,8 +750,6 @@ static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
*/
kvm_readonly_mem_allowed = false;
- qemu_add_machine_init_done_notifier(&tdx_machine_done_notify);
-
tdx_guest = tdx;
return 0;
}
@@ -1505,6 +1515,7 @@ OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
TDX_GUEST,
X86_CONFIDENTIAL_GUEST,
{ TYPE_USER_CREATABLE },
+ { TYPE_RESETTABLE_INTERFACE },
{ NULL })
static void tdx_guest_init(Object *obj)
@@ -1538,16 +1549,24 @@ static void tdx_guest_init(Object *obj)
tdx->event_notify_vector = -1;
tdx->event_notify_apicid = -1;
+ qemu_register_resettable(obj);
}
static void tdx_guest_finalize(Object *obj)
{
}
+static ResettableState *tdx_reset_state(Object *obj)
+{
+ TdxGuest *tdx = TDX_GUEST(obj);
+ return &tdx->reset_state;
+}
+
static void tdx_guest_class_init(ObjectClass *oc, const void *data)
{
ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
X86ConfidentialGuestClass *x86_klass = X86_CONFIDENTIAL_GUEST_CLASS(oc);
+ ResettableClass *rc = RESETTABLE_CLASS(oc);
klass->kvm_init = tdx_kvm_init;
klass->can_rebuild_guest_state = true;
@@ -1555,4 +1574,13 @@ static void tdx_guest_class_init(ObjectClass *oc, const void *data)
x86_klass->cpu_instance_init = tdx_cpu_instance_init;
x86_klass->adjust_cpuid_features = tdx_adjust_cpuid_features;
x86_klass->check_features = tdx_check_features;
+
+ /*
+ * the exit phase makes sure sev handles reset after all legacy resets
+ * have taken place (in the hold phase) and IGVM has also properly
+ * set up the boot state.
+ */
+ rc->phases.exit = tdx_handle_reset;
+ rc->get_state = tdx_reset_state;
+
}
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 1c38faf983..264fbe530c 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -70,6 +70,7 @@ typedef struct TdxGuest {
uint32_t event_notify_vector;
uint32_t event_notify_apicid;
+ ResettableState reset_state;
} TdxGuest;
#ifdef CONFIG_TDX
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index 2d213c9f9b..a386234571 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -14,3 +14,6 @@ kvm_xen_soft_reset(void) ""
kvm_xen_set_shared_info(uint64_t gfn) "shared info at gfn 0x%" PRIx64
kvm_xen_set_vcpu_attr(int cpu, int type, uint64_t gpa) "vcpu attr cpu %d type %d gpa 0x%" PRIx64
kvm_xen_set_vcpu_callback(int cpu, int vector) "callback vcpu %d vector %d"
+
+# tdx.c
+tdx_handle_reset(void) ""
--
2.42.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v5 17/34] i386/tdx: add a pre-vmfd change notifier to reset tdx state
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (15 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 16/34] i386/tdx: finalize TDX guest state upon reset Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 18/34] i386/sev: add migration blockers only once Ani Sinha
` (16 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, kvm, qemu-devel
During reset, when the VM file descriptor changes, the TDX state needs to be
re-initialized. A notifier callback is implemented that resets the old state
and frees memory before the VM file descriptor is changed, so that the new
state can be initialized afterwards.
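The notifier is invoked both before and after the file descriptor switch and acts only on the "pre" notification. A minimal sketch of that filtering, assuming hypothetical VmfdChange and GuestState types in place of VmfdChangeNotifier and TdxGuest; it releases each buffer before clearing the pointer and count, so nothing leaks or dangles when the state is rebuilt:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical simplified equivalents of the structures used in the patch. */
typedef struct {
    bool pre;               /* true before the VM fd changes, false after */
} VmfdChange;

typedef struct {
    bool initialized;
    int *ram_entries;       /* heap-allocated, rebuilt after the fd change */
    size_t nr_ram_entries;
} GuestState;

static int on_vmfd_change(GuestState *g, const VmfdChange *ev)
{
    if (!ev->pre) {
        return 0;           /* only act before the old VM fd goes away */
    }
    g->initialized = false;
    free(g->ram_entries);   /* release the buffer first ... */
    g->ram_entries = NULL;  /* ... then clear the pointer and the count */
    g->nr_ram_entries = 0;
    return 0;
}
```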
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/tdx.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 37e91d95e1..4cae99c281 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -405,6 +405,36 @@ static void tdx_handle_reset(Object *obj, ResetType type)
trace_tdx_handle_reset();
}
+/* TDX guest reset will require us to reinitialize some of tdx guest state. */
+static int set_tdx_vm_uninitialized(NotifierWithReturn *notifier,
+ void *data, Error** errp)
+{
+ TdxFirmware *fw = &tdx_guest->tdvf;
+
+ if (!((VmfdChangeNotifier *)data)->pre) {
+ return 0;
+ }
+
+ if (tdx_guest->initialized) {
+ tdx_guest->initialized = false;
+ }
+
+ g_free(tdx_guest->ram_entries);
+
+ /*
+ * the firmware entries will be parsed again, see
+ * x86_firmware_configure() -> tdx_parse_tdvf()
+ */
+ fw->entries = 0;
+ g_free(fw->entries);
+
+ return 0;
+}
+
+static NotifierWithReturn tdx_vmfd_change_notifier = {
+ .notify = set_tdx_vm_uninitialized,
+};
+
/*
* Some CPUID bits change from fixed1 to configurable bits when TDX module
* supports TDX_FEATURES0.VE_REDUCTION. e.g., MCA/MCE/MTRR/CORE_CAPABILITY.
@@ -1549,6 +1579,7 @@ static void tdx_guest_init(Object *obj)
tdx->event_notify_vector = -1;
tdx->event_notify_apicid = -1;
+ kvm_vmfd_add_change_notifier(&tdx_vmfd_change_notifier);
qemu_register_resettable(obj);
}
--
2.42.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v5 18/34] i386/sev: add migration blockers only once
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (16 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 17/34] i386/tdx: add a pre-vmfd change notifier to reset tdx state Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 19/34] i386/sev: add notifiers " Ani Sinha
` (15 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Paolo Bonzini, Zhao Liu, Marcelo Tosatti
Cc: Ani Sinha, kraxel, Prasad Pandit, kvm, qemu-devel
sev_launch_finish() and sev_snp_launch_finish() can be called multiple times
when the confidential guest is reset/rebooted, but the migration blocker must
not be added again on every invocation. This change makes sure the migration
blocker is added only once by moving it into the VM state change handler,
where it is added when the VM transitions to the running state. Subsequent
reboots do not change the run state of the VM.
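This works because the state change handler fires on run-state transitions, while a reboot re-runs the launch-finish step without leaving the running state. A minimal model using hypothetical counters and function names (not the QEMU functions themselves):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for SEV launch state and blockers. */
static int blockers_added;
static int launch_finish_calls;
static bool guest_in_running_state;   /* like SEV_STATE_RUNNING */

static void launch_finish(void)
{
    launch_finish_calls++;
    guest_in_running_state = true;
}

/* Invoked only when the VM run state changes (e.g. paused -> running). */
static void vm_state_change(bool running)
{
    if (running && !guest_in_running_state) {
        launch_finish();
        blockers_added++;   /* the migration blocker is added here only */
    }
}

/* A guest reboot re-runs launch_finish() without a run-state transition. */
static void guest_reset(void)
{
    guest_in_running_state = false;
    launch_finish();
}
```

Starting the VM once and rebooting it twice calls launch_finish() three times but adds the blocker once.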
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/sev.c | 20 +++++---------------
1 file changed, 5 insertions(+), 15 deletions(-)
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 66e38ca32e..260d8ef88b 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1421,11 +1421,6 @@ sev_launch_finish(SevCommonState *sev_common)
}
sev_set_guest_state(sev_common, SEV_STATE_RUNNING);
-
- /* add migration blocker */
- error_setg(&sev_mig_blocker,
- "SEV: Migration is not implemented");
- migrate_add_blocker(&sev_mig_blocker, &error_fatal);
}
static int snp_launch_update_data(uint64_t gpa, void *hva, size_t len,
@@ -1608,7 +1603,6 @@ static void
sev_snp_launch_finish(SevCommonState *sev_common)
{
int ret, error;
- Error *local_err = NULL;
OvmfSevMetadata *metadata;
SevLaunchUpdateData *data;
SevSnpGuestState *sev_snp = SEV_SNP_GUEST(sev_common);
@@ -1655,15 +1649,6 @@ sev_snp_launch_finish(SevCommonState *sev_common)
kvm_mark_guest_state_protected();
sev_set_guest_state(sev_common, SEV_STATE_RUNNING);
-
- /* add migration blocker */
- error_setg(&sev_mig_blocker,
- "SEV-SNP: Migration is not implemented");
- ret = migrate_add_blocker(&sev_mig_blocker, &local_err);
- if (local_err) {
- error_report_err(local_err);
- exit(1);
- }
}
@@ -1676,6 +1661,11 @@ sev_vm_state_change(void *opaque, bool running, RunState state)
if (running) {
if (!sev_check_state(sev_common, SEV_STATE_RUNNING)) {
klass->launch_finish(sev_common);
+
+ /* add migration blocker */
+ error_setg(&sev_mig_blocker,
+ "SEV: Migration is not implemented");
+ migrate_add_blocker(&sev_mig_blocker, &error_fatal);
}
}
}
--
2.42.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v5 19/34] i386/sev: add notifiers only once
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (17 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 18/34] i386/sev: add migration blockers only once Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 20/34] i386/sev: free existing launch update data and kernel hashes data on init Ani Sinha
` (14 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Paolo Bonzini, Zhao Liu, Marcelo Tosatti
Cc: Ani Sinha, kraxel, kvm, qemu-devel
The various notifiers in use, including the VM state change notifier, need to
be installed only once, not on every initialization. This change uses the
'cgs->ready' flag to install them only the first time.
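The idea is that the init function may run again on every reset, but the one-time registrations sit behind the flag that the first run sets. A minimal sketch with a hypothetical Cgs type standing in for ConfidentialGuestSupport and a counter standing in for the notifier list:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a notifier list that must not receive duplicates. */
#define MAX_NOTIFIERS 8
static int notifier_count;

static void add_notifier(void)
{
    assert(notifier_count < MAX_NOTIFIERS);
    notifier_count++;
}

typedef struct {
    bool ready;             /* stand-in for cgs->ready */
} Cgs;

/* May run at startup and again on every confidential guest reset. */
static int cgs_kvm_init(Cgs *cgs)
{
    if (!cgs->ready) {
        add_notifier();     /* first initialization only */
    }
    /* ... per-VM-fd setup that must be repeated would go here ... */
    cgs->ready = true;
    return 0;
}
```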
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/sev.c | 36 +++++++++++++++++++-----------------
1 file changed, 19 insertions(+), 17 deletions(-)
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 260d8ef88b..647f4bf63d 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1920,8 +1920,9 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
return -1;
}
- qemu_add_vm_change_state_handler(sev_vm_state_change, sev_common);
-
+ if (!cgs->ready) {
+ qemu_add_vm_change_state_handler(sev_vm_state_change, sev_common);
+ }
cgs->ready = true;
return 0;
@@ -1943,22 +1944,23 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
return -1;
}
- /*
- * SEV uses these notifiers to register/pin pages prior to guest use,
- * but SNP relies on guest_memfd for private pages, which has its
- * own internal mechanisms for registering/pinning private memory.
- */
- ram_block_notifier_add(&sev_ram_notifier);
-
- /*
- * The machine done notify event is used for SEV guests to get the
- * measurement of the encrypted images. When SEV-SNP is enabled, the
- * measurement is part of the guest attestation process where it can
- * be collected without any reliance on the VMM. So skip registering
- * the notifier for SNP in favor of using guest attestation instead.
- */
- qemu_add_machine_init_done_notifier(&sev_machine_done_notify);
+ if (!cgs->ready) {
+ /*
+ * SEV uses these notifiers to register/pin pages prior to guest use,
+ * but SNP relies on guest_memfd for private pages, which has its
+ * own internal mechanisms for registering/pinning private memory.
+ */
+ ram_block_notifier_add(&sev_ram_notifier);
+ /*
+ * The machine done notify event is used for SEV guests to get the
+ * measurement of the encrypted images. When SEV-SNP is enabled, the
+ * measurement is part of the guest attestation process where it can
+ * be collected without any reliance on the VMM. So skip registering
+ * the notifier for SNP in favor of using guest attestation instead.
+ */
+ qemu_add_machine_init_done_notifier(&sev_machine_done_notify);
+ }
return 0;
}
--
2.42.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v5 20/34] i386/sev: free existing launch update data and kernel hashes data on init
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (18 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 19/34] i386/sev: add notifiers " Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 21/34] i386/sev: add support for confidential guest reset Ani Sinha
` (13 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Paolo Bonzini, Zhao Liu, Marcelo Tosatti
Cc: Ani Sinha, kraxel, kvm, qemu-devel
Any existing launch update data and kernel hashes data must be freed when the
initialization code runs. This is important for resettable confidential
guests, where initialization happens again on every reset.
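Freeing list elements while walking the list needs the "safe" iteration form, which reads the next element before the current one is released. A minimal sketch using a hypothetical hand-rolled singly linked list instead of QEMU's QTAILQ macros; it also resets the head pointer so that the next initialization starts from an empty list:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked list standing in for the launch_update queue. */
typedef struct Node {
    struct Node *next;
} Node;

static void push(Node **head)
{
    Node *n = malloc(sizeof(*n));

    assert(n != NULL);
    n->next = *head;
    *head = n;
}

/* Free every node; fetch `next` before freeing, like QTAILQ_FOREACH_SAFE. */
static size_t free_all(Node **head)
{
    size_t freed = 0;
    Node *n = *head, *next;

    while (n) {
        next = n->next;
        free(n);
        n = next;
        freed++;
    }
    *head = NULL;   /* leave an empty list rather than a dangling head */
    return freed;
}
```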
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/sev.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 647f4bf63d..b3893e431c 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1773,6 +1773,7 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
uint32_t ebx;
uint32_t host_cbitpos;
struct sev_user_data_status status = {};
+ SevLaunchUpdateData *data, *next_elm;
SevCommonState *sev_common = SEV_COMMON(cgs);
SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(cgs);
X86ConfidentialGuestClass *x86_klass =
@@ -1780,6 +1781,11 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
sev_common->state = SEV_STATE_UNINIT;
+ /* free existing launch update data if any */
+ QTAILQ_FOREACH_SAFE(data, &launch_update, next, next_elm) {
+ g_free(data);
+ }
+
host_cpuid(0x8000001F, 0, NULL, &ebx, NULL, NULL);
host_cbitpos = ebx & 0x3f;
@@ -1968,6 +1974,8 @@ static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
{
MachineState *ms = MACHINE(qdev_get_machine());
X86MachineState *x86ms = X86_MACHINE(ms);
+ SevCommonState *sev_common = SEV_COMMON(cgs);
+ SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(sev_common);
if (x86ms->smm == ON_OFF_AUTO_AUTO) {
x86ms->smm = ON_OFF_AUTO_OFF;
@@ -1976,6 +1984,10 @@ static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
return -1;
}
+ /* free existing kernel hashes data if any */
+ g_free(sev_snp_guest->kernel_hashes_data);
+ sev_snp_guest->kernel_hashes_data = NULL;
+
return 0;
}
--
2.42.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v5 21/34] i386/sev: add support for confidential guest reset
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (19 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 20/34] i386/sev: free existing launch update data and kernel hashes data on init Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors Ani Sinha
` (12 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Paolo Bonzini, Zhao Liu, Marcelo Tosatti
Cc: Ani Sinha, kraxel, kvm, qemu-devel
When the KVM VM file descriptor changes as part of the confidential guest
reset mechanism, it is necessary to create a new confidential guest context and
re-encrypt the VM memory. For SEV-ES and SEV-SNP virtual machines this happens
as part of the SEV_LAUNCH_FINISH and SEV_SNP_LAUNCH_FINISH operations.
A new resettable interface has been added for the SEV module, with a reset
callback for the 'exit' phase that performs the above operations when the VM
file descriptor has changed during VM reset.
A tracepoint has also been added.
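Recomputing the kernel hashes on reset depends on a copy of the loader context saved during the first boot, because the original caller is not on the reset path. A minimal sketch with a hypothetical LoaderCtx type (not SevKernelLoaderContext). It skips the copy when the saved context itself is passed back in, since memcpy() on overlapping buffers (here identical ones) is undefined behavior in C. It is a shallow copy, so whatever the pointers reference must stay valid across resets:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of a kernel-loader context. */
typedef struct {
    const char *kernel;
    size_t kernel_size;
    const char *cmdline;
} LoaderCtx;

static LoaderCtx saved_ctx;     /* survives across resets */
static int hashes_built;

/* Called with the real context at boot; keeps a copy for later reuse. */
static void add_loader_hashes(const LoaderCtx *ctx)
{
    if (ctx != &saved_ctx) {
        memcpy(&saved_ctx, ctx, sizeof(*ctx));
    }
    /* ... hashes of kernel/cmdline would be computed from *ctx here ... */
    hashes_built++;
}

/* Reset path: replay the saved copy instead of the original argument. */
static void handle_reset(void)
{
    add_loader_hashes(&saved_ctx);
}
```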
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/sev.c | 58 ++++++++++++++++++++++++++++++++++++++++
target/i386/trace-events | 1 +
2 files changed, 59 insertions(+)
diff --git a/target/i386/sev.c b/target/i386/sev.c
index b3893e431c..549e624176 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -30,8 +30,10 @@
#include "system/kvm.h"
#include "kvm/kvm_i386.h"
#include "sev.h"
+#include "system/cpus.h"
#include "system/system.h"
#include "system/runstate.h"
+#include "system/reset.h"
#include "trace.h"
#include "migration/blocker.h"
#include "qom/object.h"
@@ -86,6 +88,10 @@ typedef struct QEMU_PACKED PaddedSevHashTable {
uint8_t padding[ROUND_UP(sizeof(SevHashTable), 16) - sizeof(SevHashTable)];
} PaddedSevHashTable;
+static void sev_handle_reset(Object *obj, ResetType type);
+
+SevKernelLoaderContext sev_load_ctx = {};
+
QEMU_BUILD_BUG_ON(sizeof(PaddedSevHashTable) % 16 != 0);
#define SEV_INFO_BLOCK_GUID "00f771de-1a7e-4fcb-890e-68c77e2fb44e"
@@ -129,6 +135,7 @@ struct SevCommonState {
uint8_t build_id;
int sev_fd;
SevState state;
+ ResettableState reset_state;
QTAILQ_HEAD(, SevLaunchVmsa) launch_vmsa;
};
@@ -1666,6 +1673,11 @@ sev_vm_state_change(void *opaque, bool running, RunState state)
error_setg(&sev_mig_blocker,
"SEV: Migration is not implemented");
migrate_add_blocker(&sev_mig_blocker, &error_fatal);
+ /*
+ * mark SEV guest as resettable so that we can reinitialize
+ * SEV upon reset.
+ */
+ qemu_register_resettable(OBJECT(sev_common));
}
}
}
@@ -1991,6 +2003,41 @@ static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
return 0;
}
+/*
+ * handle sev vm reset
+ */
+static void sev_handle_reset(Object *obj, ResetType type)
+{
+ SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
+ SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(sev_common);
+
+ if (!sev_common) {
+ return;
+ }
+
+ if (!runstate_is_running()) {
+ return;
+ }
+
+ sev_add_kernel_loader_hashes(&sev_load_ctx, &error_fatal);
+ if (sev_es_enabled() && !sev_snp_enabled()) {
+ sev_launch_get_measure(NULL, NULL);
+ }
+ if (!sev_check_state(sev_common, SEV_STATE_RUNNING)) {
+ /* this calls sev_snp_launch_finish() etc */
+ klass->launch_finish(sev_common);
+ }
+
+ trace_sev_handle_reset();
+ return;
+}
+
+static ResettableState *sev_reset_state(Object *obj)
+{
+ SevCommonState *sev_common = SEV_COMMON(obj);
+ return &sev_common->reset_state;
+}
+
int
sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp)
{
@@ -2469,6 +2516,8 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
return false;
}
+ /* save the context here so that it can be re-used when vm is reset */
+ memcpy(&sev_load_ctx, ctx, sizeof(*ctx));
return klass->build_kernel_loader_hashes(sev_common, area, ctx, errp);
}
@@ -2729,8 +2778,16 @@ static void
sev_common_class_init(ObjectClass *oc, const void *data)
{
ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
+ ResettableClass *rc = RESETTABLE_CLASS(oc);
klass->kvm_init = sev_common_kvm_init;
+ /*
+ * the exit phase makes sure sev handles reset after all legacy resets
+ * have taken place (in the hold phase) and IGVM has also properly
+ * set up the boot state.
+ */
+ rc->phases.exit = sev_handle_reset;
+ rc->get_state = sev_reset_state;
object_class_property_add_str(oc, "sev-device",
sev_common_get_sev_device,
@@ -2780,6 +2837,7 @@ static const TypeInfo sev_common_info = {
.abstract = true,
.interfaces = (const InterfaceInfo[]) {
{ TYPE_USER_CREATABLE },
+ { TYPE_RESETTABLE_INTERFACE },
{ }
}
};
diff --git a/target/i386/trace-events b/target/i386/trace-events
index 51301673f0..b320f655ee 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -14,3 +14,4 @@ kvm_sev_attestation_report(const char *mnonce, const char *data) "mnonce %s data
kvm_sev_snp_launch_start(uint64_t policy, char *gosvw) "policy 0x%" PRIx64 " gosvw %s"
kvm_sev_snp_launch_update(uint64_t src, uint64_t gpa, uint64_t len, const char *type) "src 0x%" PRIx64 " gpa 0x%" PRIx64 " len 0x%" PRIx64 " (%s page)"
kvm_sev_snp_launch_finish(char *id_block, char *id_auth, char *host_data) "id_block %s id_auth %s host_data %s"
+sev_handle_reset(void) ""
--
2.42.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (20 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 21/34] i386/sev: add support for confidential guest reset Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 14:07 ` Cédric Le Goater
2026-02-18 11:42 ` [PATCH v5 23/34] kvm/i8254: refactor pit initialization into a helper Ani Sinha
` (11 subsequent siblings)
33 siblings, 1 reply; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Alex Williamson, Cédric Le Goater; +Cc: Ani Sinha, kraxel, qemu-devel
Normally the VFIO pseudo device file descriptor lives for the life of the VM.
However, when the KVM VM file descriptor changes, a new file descriptor for the
pseudo device needs to be created against the new KVM VM descriptor, and the
existing VFIO file descriptors need to be re-attached to the new pseudo device.
This change performs these steps.
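The approach has two halves. Every fd added to the pseudo device is also recorded in a side list (and dropped from it on removal). After a VM fd change, a fresh pseudo device is created and every recorded fd is re-added. A minimal model with hypothetical PseudoDev and tracking helpers in place of the KVM device ioctls and the patch's QLIST:

```c
#include <assert.h>

/* Hypothetical model of the KVM VFIO pseudo device: records attached fds. */
#define MAX_FDS 16

typedef struct {
    int id;                 /* changes each time the device is recreated */
    int fds[MAX_FDS];
    int nfds;
} PseudoDev;

static int next_dev_id = 1;

static void dev_create(PseudoDev *d)
{
    d->id = next_dev_id++;
    d->nfds = 0;
}

static void dev_add_fd(PseudoDev *d, int fd)
{
    assert(d->nfds < MAX_FDS);
    d->fds[d->nfds++] = fd;
}

/* Side bookkeeping, playing the role of the vfio_device_fds list. */
static int tracked[MAX_FDS];
static int ntracked;

static void track_fd(int fd)
{
    assert(ntracked < MAX_FDS);
    tracked[ntracked++] = fd;
}

static void untrack_fd(int fd)
{
    for (int i = 0; i < ntracked; i++) {
        if (tracked[i] == fd) {
            tracked[i] = tracked[--ntracked];  /* order does not matter */
            return;
        }
    }
}

/* After the VM fd changes: new pseudo device, then re-add every tracked fd. */
static void rebind(PseudoDev *d)
{
    dev_create(d);
    for (int i = 0; i < ntracked; i++) {
        dev_add_fd(d, tracked[i]);
    }
}
```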
Tested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/vfio/helpers.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 92 insertions(+)
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index f68f8165d0..e2bedd15ec 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -116,6 +116,89 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
* we'll re-use it should another vfio device be attached before then.
*/
int vfio_kvm_device_fd = -1;
+
+/*
+ * Confidential virtual machines:
+ * During reset of confidential vms, the kvm vm file descriptor changes.
+ * In this case, the old vfio kvm file descriptor is
+ * closed and a new descriptor is created against the new kvm vm file
+ * descriptor.
+ */
+
+typedef struct VFIODeviceFd {
+ int fd;
+ QLIST_ENTRY(VFIODeviceFd) node;
+} VFIODeviceFd;
+
+static QLIST_HEAD(, VFIODeviceFd) vfio_device_fds =
+ QLIST_HEAD_INITIALIZER(vfio_device_fds);
+
+static void vfio_device_fd_list_add(int fd)
+{
+ VFIODeviceFd *file_fd;
+ file_fd = g_malloc0(sizeof(*file_fd));
+ file_fd->fd = fd;
+ QLIST_INSERT_HEAD(&vfio_device_fds, file_fd, node);
+}
+
+static void vfio_device_fd_list_remove(int fd)
+{
+ VFIODeviceFd *file_fd, *next;
+
+ QLIST_FOREACH_SAFE(file_fd, &vfio_device_fds, node, next) {
+ if (file_fd->fd == fd) {
+ QLIST_REMOVE(file_fd, node);
+ g_free(file_fd);
+ break;
+ }
+ }
+}
+
+static int vfio_device_fd_rebind(NotifierWithReturn *notifier, void *data,
+ Error **errp)
+{
+ VFIODeviceFd *file_fd;
+ int ret = 0;
+ struct kvm_device_attr attr = {
+ .group = KVM_DEV_VFIO_FILE,
+ .attr = KVM_DEV_VFIO_FILE_ADD,
+ };
+ struct kvm_create_device cd = {
+ .type = KVM_DEV_TYPE_VFIO,
+ };
+
+ /* we are not interested in pre vmfd change notification */
+ if (((VmfdChangeNotifier *)data)->pre) {
+ return 0;
+ }
+
+ if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
+ error_setg_errno(errp, errno, "Failed to create KVM VFIO device");
+ return -errno;
+ }
+
+ if (vfio_kvm_device_fd != -1) {
+ close(vfio_kvm_device_fd);
+ }
+
+ vfio_kvm_device_fd = cd.fd;
+
+ QLIST_FOREACH(file_fd, &vfio_device_fds, node) {
+ attr.addr = (uint64_t)(unsigned long)&file_fd->fd;
+ if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+ error_setg_errno(errp, errno,
+ "Failed to add fd %d to KVM VFIO device",
+ file_fd->fd);
+ ret = -errno;
+ }
+ }
+ return ret;
+}
+
+static struct NotifierWithReturn vfio_vmfd_change_notifier = {
+ .notify = vfio_device_fd_rebind,
+};
+
#endif
void vfio_kvm_device_close(void)
@@ -153,6 +236,11 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
}
vfio_kvm_device_fd = cd.fd;
+ /*
+ * If the vm file descriptor changes, add a notifier so that we can
+ * re-create the vfio_kvm_device_fd.
+ */
+ kvm_vmfd_add_change_notifier(&vfio_vmfd_change_notifier);
}
if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
@@ -160,6 +248,8 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
fd);
return -errno;
}
+
+ vfio_device_fd_list_add(fd);
#endif
return 0;
}
@@ -183,6 +273,8 @@ int vfio_kvm_device_del_fd(int fd, Error **errp)
"Failed to remove fd %d from KVM VFIO device", fd);
return -errno;
}
+
+ vfio_device_fd_list_remove(fd);
#endif
return 0;
}
--
2.42.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v5 23/34] kvm/i8254: refactor pit initialization into a helper
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (21 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 24/34] kvm/i8254: add support for confidential guest reset Ani Sinha
` (10 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Michael S. Tsirkin, Marcel Apfelbaum
Cc: Ani Sinha, kraxel, qemu-devel
The initialization code will be reused by the VM file descriptor change
notifier callback in a subsequent change, so refactor the common code into a
new helper function.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/i386/kvm/i8254.c | 68 +++++++++++++++++++++++++--------------------
1 file changed, 38 insertions(+), 30 deletions(-)
diff --git a/hw/i386/kvm/i8254.c b/hw/i386/kvm/i8254.c
index 81e742f866..255047458a 100644
--- a/hw/i386/kvm/i8254.c
+++ b/hw/i386/kvm/i8254.c
@@ -60,6 +60,43 @@ struct KVMPITClass {
DeviceRealize parent_realize;
};
+static void do_pit_initialize(KVMPITState *s, Error **errp)
+{
+ struct kvm_pit_config config = {
+ .flags = 0,
+ };
+ int ret;
+
+ ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_PIT2, &config);
+ if (ret < 0) {
+ error_setg(errp, "Create kernel PIC irqchip failed: %s",
+ strerror(-ret));
+ return;
+ }
+ switch (s->lost_tick_policy) {
+ case LOST_TICK_POLICY_DELAY:
+ break; /* enabled by default */
+ case LOST_TICK_POLICY_DISCARD:
+ if (kvm_check_extension(kvm_state, KVM_CAP_REINJECT_CONTROL)) {
+ struct kvm_reinject_control control = { .pit_reinject = 0 };
+
+ ret = kvm_vm_ioctl(kvm_state, KVM_REINJECT_CONTROL, &control);
+ if (ret < 0) {
+ error_setg(errp,
+ "Can't disable in-kernel PIT reinjection: %s",
+ strerror(-ret));
+ return;
+ }
+ }
+ break;
+ default:
+ error_setg(errp, "Lost tick policy not supported.");
+ return;
+ }
+
+ return;
+}
+
static void kvm_pit_update_clock_offset(KVMPITState *s)
{
int64_t offset, clock_offset;
@@ -241,42 +278,13 @@ static void kvm_pit_realizefn(DeviceState *dev, Error **errp)
PITCommonState *pit = PIT_COMMON(dev);
KVMPITClass *kpc = KVM_PIT_GET_CLASS(dev);
KVMPITState *s = KVM_PIT(pit);
- struct kvm_pit_config config = {
- .flags = 0,
- };
- int ret;
if (!kvm_check_extension(kvm_state, KVM_CAP_PIT_STATE2) ||
!kvm_check_extension(kvm_state, KVM_CAP_PIT2)) {
error_setg(errp, "In-kernel PIT not available");
}
- ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_PIT2, &config);
- if (ret < 0) {
- error_setg(errp, "Create kernel PIC irqchip failed: %s",
- strerror(-ret));
- return;
- }
- switch (s->lost_tick_policy) {
- case LOST_TICK_POLICY_DELAY:
- break; /* enabled by default */
- case LOST_TICK_POLICY_DISCARD:
- if (kvm_check_extension(kvm_state, KVM_CAP_REINJECT_CONTROL)) {
- struct kvm_reinject_control control = { .pit_reinject = 0 };
-
- ret = kvm_vm_ioctl(kvm_state, KVM_REINJECT_CONTROL, &control);
- if (ret < 0) {
- error_setg(errp,
- "Can't disable in-kernel PIT reinjection: %s",
- strerror(-ret));
- return;
- }
- }
- break;
- default:
- error_setg(errp, "Lost tick policy not supported.");
- return;
- }
+ do_pit_initialize(s, errp);
memory_region_init_io(&pit->ioports, OBJECT(dev), NULL, NULL, "kvm-pit", 4);
--
2.42.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v5 24/34] kvm/i8254: add support for confidential guest reset
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (22 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 23/34] kvm/i8254: refactor pit initialization into a helper Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 25/34] kvm/hyperv: add synic feature to CPU only if its not enabled Ani Sinha
` (9 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Michael S. Tsirkin, Marcel Apfelbaum
Cc: Ani Sinha, kraxel, qemu-devel
A confidential guest reset involves closing the old virtual machine KVM file
descriptor and opening a new one. Since it is a new KVM fd, the PIT needs to be
re-initialized. This is done with the help of a notifier that is invoked upon
the KVM VM file descriptor change during the confidential guest reset process.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/i386/kvm/i8254.c | 23 +++++++++++++++++++++++
hw/i386/kvm/trace-events | 1 +
2 files changed, 24 insertions(+)
diff --git a/hw/i386/kvm/i8254.c b/hw/i386/kvm/i8254.c
index 255047458a..70e8fd83cd 100644
--- a/hw/i386/kvm/i8254.c
+++ b/hw/i386/kvm/i8254.c
@@ -35,6 +35,7 @@
#include "hw/core/qdev-properties-system.h"
#include "system/kvm.h"
#include "target/i386/kvm/kvm_i386.h"
+#include "trace.h"
#include "qom/object.h"
#define KVM_PIT_REINJECT_BIT 0
@@ -52,6 +53,8 @@ struct KVMPITState {
LostTickPolicy lost_tick_policy;
bool vm_stopped;
int64_t kernel_clock_offset;
+
+ NotifierWithReturn kvmpit_vmfd_change_notifier;
};
struct KVMPITClass {
@@ -203,6 +206,23 @@ static void kvm_pit_put(PITCommonState *pit)
}
}
+static int kvmpit_post_vmfd_change(NotifierWithReturn *notifier,
+ void *data, Error **errp)
+{
+ KVMPITState *s = container_of(notifier, KVMPITState,
+ kvmpit_vmfd_change_notifier);
+
+ /* we are not interested in pre vmfd change notification */
+ if (((VmfdChangeNotifier *)data)->pre) {
+ return 0;
+ }
+
+ do_pit_initialize(s, errp);
+
+ trace_kvmpit_post_vmfd_change();
+ return 0;
+}
+
static void kvm_pit_set_gate(PITCommonState *s, PITChannelState *sc, int val)
{
kvm_pit_get(s);
@@ -292,6 +312,9 @@ static void kvm_pit_realizefn(DeviceState *dev, Error **errp)
qemu_add_vm_change_state_handler(kvm_pit_vm_state_change, s);
+ s->kvmpit_vmfd_change_notifier.notify = kvmpit_post_vmfd_change;
+ kvm_vmfd_add_change_notifier(&s->kvmpit_vmfd_change_notifier);
+
kpc->parent_realize(dev, errp);
}
diff --git a/hw/i386/kvm/trace-events b/hw/i386/kvm/trace-events
index 67bf7f174e..33680ff82b 100644
--- a/hw/i386/kvm/trace-events
+++ b/hw/i386/kvm/trace-events
@@ -20,3 +20,4 @@ xenstore_reset_watches(void) ""
xenstore_watch_event(const char *path, const char *token) "path %s token %s"
xen_primary_console_create(void) ""
xen_primary_console_reset(int port) "port %u"
+kvmpit_post_vmfd_change(void) ""
--
2.42.0
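A minimal sketch of the pattern `kvmpit_post_vmfd_change()` follows: the notifier is embedded in the device state, `container_of()` recovers the device from the notifier pointer, and the "pre" notification is ignored so that re-initialization only happens once the new VM fd exists. `Notifier`, `VmfdEvent`, `FakePit` and `fake_pit_realize()` are hypothetical, simplified stand-ins rather than QEMU's `NotifierWithReturn`/`VmfdChangeNotifier` API, and incrementing `reinit_count` stands in for calling `do_pit_initialize()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for QEMU's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified notifier type (not QEMU's NotifierWithReturn). */
typedef struct Notifier Notifier;
struct Notifier {
    int (*notify)(Notifier *n, void *data);
};

/* Hypothetical event payload: true before the fd changes, false after. */
typedef struct {
    bool pre;
} VmfdEvent;

/* Hypothetical device state with the notifier embedded in it. */
typedef struct {
    int reinit_count;           /* stands in for do_pit_initialize() */
    Notifier vmfd_change_notifier;
} FakePit;

static int pit_post_vmfd_change(Notifier *n, void *data)
{
    /* Recover the owning device from the embedded notifier. */
    FakePit *s = container_of(n, FakePit, vmfd_change_notifier);

    /* Nothing to do until the new VM fd exists. */
    if (((VmfdEvent *)data)->pre) {
        return 0;
    }
    s->reinit_count++;          /* re-create the in-kernel PIT here */
    return 0;
}

static void fake_pit_realize(FakePit *s)
{
    s->reinit_count = 0;
    s->vmfd_change_notifier.notify = pit_post_vmfd_change;
}
```

Calling `pit.vmfd_change_notifier.notify()` with a `pre = true` event leaves `reinit_count` at 0; a following `pre = false` event bumps it to 1.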
* [PATCH v5 25/34] kvm/hyperv: add synic feature to CPU only if its not enabled
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (23 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 24/34] kvm/i8254: add support for confidential guest reset Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 26/34] hw/hyperv/vmbus: add support for confidential guest reset Ani Sinha
` (8 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kraxel, kvm, qemu-devel
We need to make sure that the synic CPU feature is not already enabled. If it is,
trying to enable it again will result in the following fatal error:
Unexpected error in object_property_try_add() at ../qom/object.c:1268:
qemu-system-x86_64: attempt to add duplicate property 'synic' to object (type 'host-x86_64-cpu')
So enable synic only if it's not enabled already.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/kvm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5c8ec77212..ff5dc5b02a 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1761,7 +1761,7 @@ static int hyperv_init_vcpu(X86CPU *cpu)
return ret;
}
- if (!cpu->hyperv_synic_kvm_only) {
+ if (!cpu->hyperv_synic_kvm_only && !hyperv_is_synic_enabled()) {
ret = hyperv_x86_synic_add(cpu);
if (ret < 0) {
error_report("failed to create HyperV SynIC: %s",
--
2.42.0
* [PATCH v5 26/34] hw/hyperv/vmbus: add support for confidential guest reset
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (24 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 25/34] kvm/hyperv: add synic feature to CPU only if its not enabled Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-19 18:34 ` Maciej S. Szmigiero
2026-02-18 11:42 ` [PATCH v5 27/34] kvm/xen-emu: re-initialize capabilities during " Ani Sinha
` (7 subsequent siblings)
33 siblings, 1 reply; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Maciej S. Szmigiero; +Cc: Ani Sinha, kraxel, qemu-devel
On confidential guests, when the KVM virtual machine file descriptor changes as
part of the reset process, event file descriptors need to be reassociated
with the new KVM VM file descriptor. This is achieved with the help of a
callback handler that gets called when the KVM VM file descriptor changes during
the confidential guest reset process.
This patch has been tested on non-confidential platforms only.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/hyperv/trace-events | 1 +
hw/hyperv/vmbus.c | 37 +++++++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/hw/hyperv/trace-events b/hw/hyperv/trace-events
index 7963c215b1..d8c96f18e9 100644
--- a/hw/hyperv/trace-events
+++ b/hw/hyperv/trace-events
@@ -16,6 +16,7 @@ vmbus_gpadl_torndown(uint32_t gpadl_id) "gpadl #%d"
vmbus_open_channel(uint32_t chan_id, uint32_t gpadl_id, uint32_t target_vp) "channel #%d gpadl #%d target vp %d"
vmbus_channel_open(uint32_t chan_id, uint32_t status) "channel #%d status %d"
vmbus_close_channel(uint32_t chan_id) "channel #%d"
+vmbus_handle_vmfd_change(void) ""
# hv-balloon
hv_balloon_state_change(const char *tostr) "-> %s"
diff --git a/hw/hyperv/vmbus.c b/hw/hyperv/vmbus.c
index c5bab5d245..64abe4c4c1 100644
--- a/hw/hyperv/vmbus.c
+++ b/hw/hyperv/vmbus.c
@@ -20,6 +20,7 @@
#include "hw/hyperv/vmbus-bridge.h"
#include "hw/core/sysbus.h"
#include "exec/cpu-common.h"
+#include "system/kvm.h"
#include "exec/target_page.h"
#include "trace.h"
@@ -248,6 +249,12 @@ struct VMBus {
* interrupt page
*/
EventNotifier notifier;
+
+ /*
+ * Notifier to inform when vmfd is changed as a part of confidential guest
+ * reset mechanism.
+ */
+ NotifierWithReturn vmbus_vmfd_change_notifier;
};
static bool gpadl_full(VMBusGpadl *gpadl)
@@ -2347,6 +2354,33 @@ static void vmbus_dev_unrealize(DeviceState *dev)
free_channels(vdev);
}
+/*
+ * If the KVM fd changes because of VM reset in confidential guests,
+ * reassociate event fd with the new KVM fd.
+ */
+static int vmbus_handle_vmfd_change(NotifierWithReturn *notifier,
+ void *data, Error **errp)
+{
+ VMBus *vmbus = container_of(notifier, VMBus,
+ vmbus_vmfd_change_notifier);
+ int ret = 0;
+
+ /* we are not interested in pre vmfd change notification */
+ if (((VmfdChangeNotifier *)data)->pre) {
+ return 0;
+ }
+
+ ret = hyperv_set_event_flag_handler(VMBUS_EVENT_CONNECTION_ID,
+ &vmbus->notifier);
+ /* if we are only using userland event handler, it may already exist */
+ if (ret != 0 && ret != -EEXIST) {
+ error_setg(errp, "hyperv set event handler failed with %d", ret);
+ }
+
+ trace_vmbus_handle_vmfd_change();
+ return ret == -EEXIST ? 0 : ret;
+}
+
static const Property vmbus_dev_props[] = {
DEFINE_PROP_UUID("instanceid", VMBusDevice, instanceid),
};
@@ -2429,6 +2463,9 @@ static void vmbus_realize(BusState *bus, Error **errp)
goto clear_event_notifier;
}
+ vmbus->vmbus_vmfd_change_notifier.notify = vmbus_handle_vmfd_change;
+ kvm_vmfd_add_change_notifier(&vmbus->vmbus_vmfd_change_notifier);
+
return;
clear_event_notifier:
--
2.42.0
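The comment in `vmbus_handle_vmfd_change()` treats `-EEXIST` as acceptable, since a handler registered only in userland may survive the VM fd change. The sketch below shows one way to express that tolerance, assuming the notifier chain stops at the first non-zero return value: the benign `-EEXIST` is normalized to 0 so later handlers still run, while any other error is propagated. `set_event_flag_handler()` and `rebind_event_handler()` are hypothetical stand-ins, not the real `hyperv_set_event_flag_handler()`.

```c
#include <assert.h>
#include <errno.h>

#define MAX_CONN 4

/* Hypothetical handler table, one slot per connection id. */
static void *handlers[MAX_CONN];

/*
 * Hypothetical registration call: fails with -EEXIST if a handler for
 * this connection id is already installed.
 */
static int set_event_flag_handler(unsigned conn_id, void *handler)
{
    if (conn_id >= MAX_CONN) {
        return -EINVAL;
    }
    if (handlers[conn_id] && handler) {
        return -EEXIST;
    }
    handlers[conn_id] = handler;
    return 0;
}

/*
 * Re-register after the VM fd changes. -EEXIST means the handler is
 * already in place, which is fine, so report success; other errors
 * are returned unchanged.
 */
static int rebind_event_handler(unsigned conn_id, void *handler)
{
    int ret = set_event_flag_handler(conn_id, handler);

    return ret == -EEXIST ? 0 : ret;
}
```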
* [PATCH v5 27/34] kvm/xen-emu: re-initialize capabilities during confidential guest reset
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (25 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 26/34] hw/hyperv/vmbus: add support for confidential guest reset Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-19 9:39 ` Paul Durrant
2026-02-18 11:42 ` [PATCH v5 28/34] ppc/openpic: create a new openpic device and reattach mem region on coco reset Ani Sinha
` (6 subsequent siblings)
33 siblings, 1 reply; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: David Woodhouse, Paul Durrant, Paolo Bonzini, Marcelo Tosatti
Cc: Ani Sinha, kraxel, kvm, qemu-devel
On confidential guests, the KVM virtual machine file descriptor changes as
part of the guest reset process. Xen capabilities need to be re-initialized in
KVM against the new file descriptor.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/xen-emu.c | 50 +++++++++++++++++++++++++++++++++++++--
1 file changed, 48 insertions(+), 2 deletions(-)
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 52de019834..69527145eb 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -44,9 +44,12 @@
#include "xen-compat.h"
+static NotifierWithReturn xen_vmfd_change_notifier;
+static bool hyperv_enabled;
static void xen_vcpu_singleshot_timer_event(void *opaque);
static void xen_vcpu_periodic_timer_event(void *opaque);
static int vcpuop_stop_singleshot_timer(CPUState *cs);
+static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr);
#ifdef TARGET_X86_64
#define hypercall_compat32(longmode) (!(longmode))
@@ -54,6 +57,30 @@ static int vcpuop_stop_singleshot_timer(CPUState *cs);
#define hypercall_compat32(longmode) (false)
#endif
+static int xen_handle_vmfd_change(NotifierWithReturn *n,
+ void *data, Error **errp)
+{
+ int ret;
+
+ /* we are not interested in pre vmfd change notification */
+ if (((VmfdChangeNotifier *)data)->pre) {
+ return 0;
+ }
+
+ ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR);
+ if (ret < 0) {
+ return ret;
+ }
+
+ if (hyperv_enabled) {
+ ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR_HYPERV);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+ return 0;
+}
+
static bool kvm_gva_to_gpa(CPUState *cs, uint64_t gva, uint64_t *gpa,
size_t *len, bool is_write)
{
@@ -111,15 +138,16 @@ static inline int kvm_copy_to_gva(CPUState *cs, uint64_t gva, void *buf,
return kvm_gva_rw(cs, gva, buf, sz, true);
}
-int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
+static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr)
{
+ int xen_caps, ret;
const int required_caps = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL | KVM_XEN_HVM_CONFIG_SHARED_INFO;
+
struct kvm_xen_hvm_config cfg = {
.msr = hypercall_msr,
.flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL,
};
- int xen_caps, ret;
xen_caps = kvm_check_extension(s, KVM_CAP_XEN_HVM);
if (required_caps & ~xen_caps) {
@@ -143,6 +171,21 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
strerror(-ret));
return ret;
}
+ return xen_caps;
+}
+
+int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
+{
+ int xen_caps;
+
+ xen_caps = do_initialize_xen_caps(s, hypercall_msr);
+ if (xen_caps < 0) {
+ return xen_caps;
+ }
+
+ if (!hyperv_enabled && (hypercall_msr == XEN_HYPERCALL_MSR_HYPERV)) {
+ hyperv_enabled = true;
+ }
/* If called a second time, don't repeat the rest of the setup. */
if (s->xen_caps) {
@@ -185,6 +228,9 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
xen_primary_console_reset();
xen_xenstore_reset();
+ xen_vmfd_change_notifier.notify = xen_handle_vmfd_change;
+ kvm_vmfd_add_change_notifier(&xen_vmfd_change_notifier);
+
return 0;
}
--
2.42.0
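Because `kvm_xen_init()` skips the rest of its setup on a second call, the patch records whether the Hyper-V hypercall MSR variant was ever requested and, after the fd change, replays both configurations against the new VM fd, base MSR first. A simplified sketch of that record-and-replay idea follows; `init_caps()`, `xen_init()` and `xen_replay_after_vmfd_change()` are hypothetical, and the two MSR constants are illustrative placeholders rather than values taken from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative placeholder values for the two hypercall MSR variants. */
#define HYPERCALL_MSR        0x40000000u
#define HYPERCALL_MSR_HYPERV 0x40000200u

/* Hypothetical record of what was last programmed into the VM fd. */
static uint32_t configured_msr;
static int config_calls;
static bool hyperv_seen;

/* Stand-in for do_initialize_xen_caps(): program one hypercall MSR. */
static int init_caps(uint32_t msr)
{
    configured_msr = msr;
    config_calls++;
    return 0;
}

/* Initial setup: remember whether the Hyper-V variant was requested. */
static int xen_init(uint32_t msr)
{
    int ret = init_caps(msr);

    if (ret < 0) {
        return ret;
    }
    if (msr == HYPERCALL_MSR_HYPERV) {
        hyperv_seen = true;
    }
    return 0;
}

/* After the VM fd changes, replay the same sequence on the new fd. */
static int xen_replay_after_vmfd_change(void)
{
    int ret = init_caps(HYPERCALL_MSR);

    if (ret < 0 || !hyperv_seen) {
        return ret;
    }
    return init_caps(HYPERCALL_MSR_HYPERV);
}
```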
* [PATCH v5 28/34] ppc/openpic: create a new openpic device and reattach mem region on coco reset
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (26 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 27/34] kvm/xen-emu: re-initialize capabilities during " Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 29/34] kvm/vcpu: add notifiers to inform vcpu file descriptor change Ani Sinha
` (5 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Bernhard Beschow; +Cc: Ani Sinha, kraxel, qemu-ppc, qemu-devel
For confidential guests, the old KVM VM file descriptor is closed during the
reset process and a new one is created. When a new file descriptor is
created, a new openpic device needs to be created against this new KVM VM file
descriptor as well. Additionally, the existing memory region needs to be
reattached to the new openpic device (by setting its base address again), and
KVM_CAP_IRQ_MPIC needs to be re-enabled on each vCPU with the new device file
descriptor. This change makes this happen with the help of a callback handler
that gets called when the KVM VM file descriptor changes as part of the
confidential guest reset process.
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/intc/openpic_kvm.c | 112 +++++++++++++++++++++++++++++++++---------
1 file changed, 88 insertions(+), 24 deletions(-)
diff --git a/hw/intc/openpic_kvm.c b/hw/intc/openpic_kvm.c
index fbf0bdbe07..b099da20eb 100644
--- a/hw/intc/openpic_kvm.c
+++ b/hw/intc/openpic_kvm.c
@@ -49,6 +49,7 @@ struct KVMOpenPICState {
uint32_t fd;
uint32_t model;
hwaddr mapped;
+ NotifierWithReturn vmfd_change_notifier;
};
static void kvm_openpic_set_irq(void *opaque, int n_IRQ, int level)
@@ -114,6 +115,88 @@ static const MemoryRegionOps kvm_openpic_mem_ops = {
},
};
+static int kvm_openpic_setup(KVMOpenPICState *opp, Error **errp)
+{
+ int kvm_openpic_model;
+ struct kvm_create_device cd = {0};
+ KVMState *s = kvm_state;
+ int ret;
+
+ switch (opp->model) {
+ case OPENPIC_MODEL_FSL_MPIC_20:
+ kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_20;
+ break;
+
+ case OPENPIC_MODEL_FSL_MPIC_42:
+ kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_42;
+ break;
+
+ default:
+ error_setg(errp, "Unsupported OpenPIC model %" PRIu32, opp->model);
+ return -1;
+ }
+
+ cd.type = kvm_openpic_model;
+ ret = kvm_vm_ioctl(s, KVM_CREATE_DEVICE, &cd);
+ if (ret < 0) {
+ error_setg_errno(errp, errno, "Can't create device %d",
+ cd.type);
+ return -1;
+ }
+ opp->fd = cd.fd;
+
+ return 0;
+}
+
+static int kvm_openpic_handle_vmfd_change(NotifierWithReturn *notifier,
+ void *data, Error **errp)
+{
+ KVMOpenPICState *opp = container_of(notifier, KVMOpenPICState,
+ vmfd_change_notifier);
+ uint64_t reg_base;
+ struct kvm_device_attr attr;
+ CPUState *cs;
+ int ret;
+
+ /* we are not interested in pre vmfd change notification */
+ if (((VmfdChangeNotifier *)data)->pre) {
+ return 0;
+ }
+
+ /* close the old descriptor */
+ close(opp->fd);
+
+ if (kvm_openpic_setup(opp, errp) < 0) {
+ return -1;
+ }
+
+ if (!opp->mapped) {
+ return 0;
+ }
+
+ reg_base = opp->mapped;
+ attr.group = KVM_DEV_MPIC_GRP_MISC;
+ attr.attr = KVM_DEV_MPIC_BASE_ADDR;
+ attr.addr = (uint64_t)(unsigned long)&reg_base;
+
+ ret = ioctl(opp->fd, KVM_SET_DEVICE_ATTR, &attr);
+ if (ret < 0) {
+ error_setg(errp, "%s: %s %" PRIx64, __func__,
+ strerror(errno), reg_base);
+ return -1;
+ }
+
+ CPU_FOREACH(cs) {
+ ret = kvm_vcpu_enable_cap(cs, KVM_CAP_IRQ_MPIC, 0, opp->fd,
+ kvm_arch_vcpu_id(cs));
+ if (ret < 0) {
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
static void kvm_openpic_region_add(MemoryListener *listener,
MemoryRegionSection *section)
{
@@ -197,36 +280,14 @@ static void kvm_openpic_realize(DeviceState *dev, Error **errp)
SysBusDevice *d = SYS_BUS_DEVICE(dev);
KVMOpenPICState *opp = KVM_OPENPIC(dev);
KVMState *s = kvm_state;
- int kvm_openpic_model;
- struct kvm_create_device cd = {0};
- int ret, i;
+ int i;
if (!kvm_check_extension(s, KVM_CAP_DEVICE_CTRL)) {
error_setg(errp, "Kernel is lacking Device Control API");
return;
}
- switch (opp->model) {
- case OPENPIC_MODEL_FSL_MPIC_20:
- kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_20;
- break;
-
- case OPENPIC_MODEL_FSL_MPIC_42:
- kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_42;
- break;
-
- default:
- error_setg(errp, "Unsupported OpenPIC model %" PRIu32, opp->model);
- return;
- }
-
- cd.type = kvm_openpic_model;
- ret = kvm_vm_ioctl(s, KVM_CREATE_DEVICE, &cd);
- if (ret < 0) {
- error_setg_errno(errp, errno, "Can't create device %d", cd.type);
- return;
- }
- opp->fd = cd.fd;
+ kvm_openpic_setup(opp, errp);
sysbus_init_mmio(d, &opp->mem);
qdev_init_gpio_in(dev, kvm_openpic_set_irq, OPENPIC_MAX_IRQ);
@@ -235,6 +296,9 @@ static void kvm_openpic_realize(DeviceState *dev, Error **errp)
opp->mem_listener.region_del = kvm_openpic_region_del;
opp->mem_listener.name = "openpic-kvm";
memory_listener_register(&opp->mem_listener, &address_space_memory);
+ opp->vmfd_change_notifier.notify =
+ kvm_openpic_handle_vmfd_change;
+ kvm_vmfd_add_change_notifier(&opp->vmfd_change_notifier);
/* indicate pic capabilities */
msi_nonbroken = true;
--
2.42.0
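The vmfd change handler in this patch boils down to three steps: close the stale device fd, create a new in-kernel device against the new VM fd, and re-apply state that only the old device knew about (the MPIC base address, if the region is already mapped). A hypothetical sketch of that sequence with a fake "kernel" follows; `fake_create_device()`, `fake_close()`, `fake_set_base_addr()` and `FakeOpenPic` are illustrative only and do not model `KVM_CREATE_DEVICE` or `KVM_SET_DEVICE_ATTR` in detail. Re-enabling `KVM_CAP_IRQ_MPIC` on each vCPU is left out.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FDS 64

/* Hypothetical fake kernel: hands out device fds and records attributes. */
static int next_fd = 10;
static int open_fds;
static uint64_t base_addr_of_fd[MAX_FDS];

static int fake_create_device(void)
{
    open_fds++;
    return next_fd++;
}

static void fake_close(int fd)
{
    (void)fd;
    open_fds--;
}

static int fake_set_base_addr(int fd, uint64_t base)
{
    if (fd < 0 || fd >= MAX_FDS) {
        return -1;
    }
    base_addr_of_fd[fd] = base;
    return 0;
}

/* Hypothetical device state. */
typedef struct {
    int fd;
    uint64_t mapped;        /* 0 means the MMIO region is not mapped */
} FakeOpenPic;

/* Post-change handler: drop the stale fd, create a new device, restore state. */
static int openpic_rebind(FakeOpenPic *opp)
{
    fake_close(opp->fd);
    opp->fd = fake_create_device();
    if (!opp->mapped) {
        return 0;           /* nothing mapped yet, nothing to restore */
    }
    return fake_set_base_addr(opp->fd, opp->mapped);
}
```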
* [PATCH v5 29/34] kvm/vcpu: add notifiers to inform vcpu file descriptor change
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (27 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 28/34] ppc/openpic: create a new openpic device and reattach mem region on coco reset Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 30/34] kvm/clock: add support for confidential guest reset Ani Sinha
` (4 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Ani Sinha, kraxel, kvm, qemu-devel
When new vcpu file descriptors are created and bound to the new KVM file
descriptor as part of the confidential guest reset mechanism, various
subsystems need to know about it. This change adds notifiers so that
subsystems can take appropriate actions when vcpu fds change, by registering
their handlers with this notifier.
Subsequent changes will register specific handlers with this notifier.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
accel/kvm/kvm-all.c | 26 ++++++++++++++++++++++++++
accel/stubs/kvm-stub.c | 10 ++++++++++
include/system/kvm.h | 17 +++++++++++++++++
3 files changed, 53 insertions(+)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 7be39111bb..d7ea60f582 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -127,6 +127,9 @@ static NotifierList kvm_irqchip_change_notifiers =
static NotifierWithReturnList register_vmfd_changed_notifiers =
NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_changed_notifiers);
+static NotifierWithReturnList register_vcpufd_changed_notifiers =
+ NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vcpufd_changed_notifiers);
+
static int map_kvm_run(KVMState *s, CPUState *cpu, Error **errp);
static int map_kvm_dirty_gfns(KVMState *s, CPUState *cpu, Error **errp);
static int vcpu_unmap_regions(KVMState *s, CPUState *cpu);
@@ -2314,6 +2317,22 @@ static int kvm_vmfd_change_notify(Error **errp)
&vmfd_notifier, errp);
}
+void kvm_vcpufd_add_change_notifier(NotifierWithReturn *n)
+{
+ notifier_with_return_list_add(&register_vcpufd_changed_notifiers, n);
+}
+
+void kvm_vcpufd_remove_change_notifier(NotifierWithReturn *n)
+{
+ notifier_with_return_remove(n);
+}
+
+static int kvm_vcpufd_change_notify(Error **errp)
+{
+ return notifier_with_return_list_notify(&register_vcpufd_changed_notifiers,
+ &vmfd_notifier, errp);
+}
+
int kvm_irqchip_get_virq(KVMState *s)
{
int next_virq;
@@ -2838,6 +2857,13 @@ static int kvm_reset_vmfd(MachineState *ms)
}
assert(!err);
+ /* notify everyone that vcpu fd has changed. */
+ ret = kvm_vcpufd_change_notify(&err);
+ if (ret < 0) {
+ return ret;
+ }
+ assert(!err);
+
/* these can be only called after ram_block_rebind() */
memory_listener_register(&kml->listener, &address_space_memory);
memory_listener_register(&kvm_io_listener, &address_space_io);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index a6e8a6e16c..c4617caac6 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -87,6 +87,16 @@ void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n)
{
}
+void kvm_vcpufd_add_change_notifier(NotifierWithReturn *n)
+{
+ return;
+}
+
+void kvm_vcpufd_remove_change_notifier(NotifierWithReturn *n)
+{
+ return;
+}
+
int kvm_irqchip_add_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
EventNotifier *rn, int virq)
{
diff --git a/include/system/kvm.h b/include/system/kvm.h
index fbe23608a1..4b0e1b4ab1 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -590,4 +590,21 @@ void kvm_vmfd_add_change_notifier(NotifierWithReturn *n);
*/
void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n);
+/**
+ * kvm_vcpufd_add_change_notifier - register a notifier to get notified when
+ * KVM vcpu file descriptors change as a part of the confidential guest
+ * "reset" process. Various subsystems should use this mechanism to take
+ * actions such as re-issuing vcpu ioctls as a part of setting up vcpu
+ * features.
+ * @n: notifier with return value.
+ */
+void kvm_vcpufd_add_change_notifier(NotifierWithReturn *n);
+
+/**
+ * kvm_vcpufd_remove_change_notifier - de-register a notifier previously
+ * registered with kvm_vcpufd_add_change_notifier call.
+ * @n: notifier that was previously registered.
+ */
+void kvm_vcpufd_remove_change_notifier(NotifierWithReturn *n);
+
#endif
--
2.42.0
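`kvm_reset_vmfd()` above aborts the reset as soon as `kvm_vcpufd_change_notify()` returns a negative value, so every registered handler should return 0 for events it does not care about. The sketch below is a simplified, hypothetical notifier list modelled loosely on QEMU's `NotifierWithReturnList`; in this sketch traversal stops at, and returns, the first non-zero handler result. It is not QEMU code, and `ok_handler()`/`failing_handler()` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked notifier list. */
typedef struct Notifier Notifier;
struct Notifier {
    int (*notify)(Notifier *n, void *data);
    Notifier *next;
};

typedef struct {
    Notifier *head;
} NotifierList;

/* Prepend the new notifier (newest runs first in this sketch). */
static void notifier_list_add(NotifierList *list, Notifier *n)
{
    n->next = list->head;
    list->head = n;
}

/* Call each handler in turn; stop at, and return, the first non-zero result. */
static int notifier_list_notify(NotifierList *list, void *data)
{
    int ret = 0;

    for (Notifier *n = list->head; n; n = n->next) {
        ret = n->notify(n, data);
        if (ret != 0) {
            break;
        }
    }
    return ret;
}

/* Example handlers: count invocations, optionally fail. */
static int calls;

static int ok_handler(Notifier *n, void *data)
{
    (void)n;
    (void)data;
    calls++;
    return 0;
}

static int failing_handler(Notifier *n, void *data)
{
    (void)n;
    (void)data;
    calls++;
    return -1;
}
```

With three handlers registered in the order ok, failing, ok, the last-added one runs first, the failing one stops the walk, and the first-added handler is never called.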
* [PATCH v5 30/34] kvm/clock: add support for confidential guest reset
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (28 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 29/34] kvm/vcpu: add notifiers to inform vcpu file descriptor change Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 31/34] hw/machine: introduce machine specific option 'x-change-vmfd-on-reset' Ani Sinha
` (3 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini,
Richard Henderson, Eduardo Habkost
Cc: Ani Sinha, kraxel, qemu-devel
Confidential guests change the KVM VM file descriptor upon reset and also create
new VCPU file descriptors against the new KVM VM file descriptor. We need to
save the clock state from KVM before the KVM VM file descriptor changes and
restore it afterwards. Also, after the VCPU file descriptors have changed, we must
call KVM_KVMCLOCK_CTRL on each VCPU file descriptor to inform KVM that the VCPU
is in a paused state.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/i386/kvm/clock.c | 59 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 59 insertions(+)
diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index aba6842a22..10d34254f0 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -50,6 +50,9 @@ struct KVMClockState {
/* whether the 'clock' value was obtained in a host with
* reliable KVM_GET_CLOCK */
bool clock_is_reliable;
+
+ NotifierWithReturn kvmclock_vcpufd_change_notifier;
+ NotifierWithReturn kvmclock_vmfd_change_notifier;
};
struct pvclock_vcpu_time_info {
@@ -63,6 +66,9 @@ struct pvclock_vcpu_time_info {
uint8_t pad[2];
} __attribute__((__packed__)); /* 32 bytes */
+static int kvmclock_set_clock(NotifierWithReturn *notifier,
+ void *data, Error **errp);
+
static uint64_t kvmclock_current_nsec(KVMClockState *s)
{
CPUState *cpu = first_cpu;
@@ -219,6 +225,54 @@ static void kvmclock_vm_state_change(void *opaque, bool running,
}
}
+static int kvmclock_save_clock(NotifierWithReturn *notifier,
+ void *data, Error **errp)
+{
+ KVMClockState *s = container_of(notifier, KVMClockState,
+ kvmclock_vmfd_change_notifier);
+ if (!((VmfdChangeNotifier *)data)->pre) {
+ return 0;
+ }
+ kvm_update_clock(s);
+ return 0;
+}
+
+static int kvmclock_set_clock(NotifierWithReturn *notifier,
+ void *data, Error **errp)
+{
+ struct kvm_clock_data clock_data = {};
+ CPUState *cpu;
+ int ret;
+ KVMClockState *s = container_of(notifier, KVMClockState,
+ kvmclock_vcpufd_change_notifier);
+ int cap_clock_ctrl = kvm_check_extension(kvm_state, KVM_CAP_KVMCLOCK_CTRL);
+
+ if (!s->clock_is_reliable) {
+ uint64_t pvclock_via_mem = kvmclock_current_nsec(s);
+ /* saved clock value before vmfd change is not reliable */
+ if (pvclock_via_mem) {
+ s->clock = pvclock_via_mem;
+ }
+ }
+
+ clock_data.clock = s->clock;
+ ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &clock_data);
+ if (ret < 0) {
+ fprintf(stderr, "KVM_SET_CLOCK failed: %s\n", strerror(-ret));
+ abort();
+ }
+
+ if (!cap_clock_ctrl) {
+ return 0;
+ }
+ CPU_FOREACH(cpu) {
+ run_on_cpu(cpu, do_kvmclock_ctrl, RUN_ON_CPU_NULL);
+ }
+
+ return 0;
+}
+
+
static void kvmclock_realize(DeviceState *dev, Error **errp)
{
KVMClockState *s = KVM_CLOCK(dev);
@@ -230,7 +284,12 @@ static void kvmclock_realize(DeviceState *dev, Error **errp)
kvm_update_clock(s);
+ s->kvmclock_vcpufd_change_notifier.notify = kvmclock_set_clock;
+ s->kvmclock_vmfd_change_notifier.notify = kvmclock_save_clock;
+
qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
+ kvm_vcpufd_add_change_notifier(&s->kvmclock_vcpufd_change_notifier);
+ kvm_vmfd_add_change_notifier(&s->kvmclock_vmfd_change_notifier);
}
static bool kvmclock_clock_is_reliable_needed(void *opaque)
--
2.42.0
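The clock handling is split across the two notifiers: the "pre" VM fd notification snapshots the clock from the old VM, and the vCPU fd notification writes it into the new VM with `KVM_SET_CLOCK` and, if `KVM_CAP_KVMCLOCK_CTRL` is available, issues `KVM_KVMCLOCK_CTRL` on every vCPU. A hypothetical sketch of that ordering with in-memory stand-ins (`FakeVm`, `FakeKvmClock`, `clock_save()`, `clock_restore()`) follows; counting `clock_ctrl_calls` stands in for the per-vCPU ioctl, and the pvclock fallback mirrors the `clock_is_reliable` check in `kvmclock_set_clock()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VM clock that is lost when the VM fd is replaced. */
typedef struct {
    uint64_t kernel_clock;      /* value KVM_GET_CLOCK would report */
} FakeVm;

typedef struct {
    uint64_t clock;             /* last saved clock value */
    bool clock_is_reliable;
    int clock_ctrl_calls;       /* stands in for KVM_KVMCLOCK_CTRL per vCPU */
} FakeKvmClock;

/* "pre" VM fd notification: snapshot the clock from the old VM. */
static void clock_save(FakeKvmClock *s, const FakeVm *old_vm)
{
    s->clock = old_vm->kernel_clock;
}

/* vCPU fd notification: program the new VM, then flag vCPUs as paused. */
static void clock_restore(FakeKvmClock *s, FakeVm *new_vm,
                          uint64_t pvclock_ns, int nr_vcpus,
                          bool have_clock_ctrl)
{
    /* An unreliable saved value is replaced by a non-zero pvclock reading. */
    if (!s->clock_is_reliable && pvclock_ns) {
        s->clock = pvclock_ns;
    }
    new_vm->kernel_clock = s->clock;        /* KVM_SET_CLOCK on new VM fd */

    if (!have_clock_ctrl) {
        return;
    }
    for (int i = 0; i < nr_vcpus; i++) {
        s->clock_ctrl_calls++;              /* KVM_KVMCLOCK_CTRL on vCPU i */
    }
}
```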
* [PATCH v5 31/34] hw/machine: introduce machine specific option 'x-change-vmfd-on-reset'
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (29 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 30/34] kvm/clock: add support for confidential guest reset Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 32/34] tests/functional/x86_64: add functional test to exercise vm fd change on reset Ani Sinha
` (2 subsequent siblings)
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
Yanan Wang, Zhao Liu, Paolo Bonzini
Cc: Ani Sinha, kraxel, qemu-devel
A new machine-specific option 'x-change-vmfd-on-reset' is introduced for
debugging and testing only (hence the 'x-' prefix). When enabled, this option
forces the KVM VM file descriptor to be changed upon guest reset, as is done
for confidential guests. This can be used to exercise the code changes that
are specific to confidential guests on non-confidential guests as well (except
changes that require hardware support for confidential guests).
A new functional test, added in the next patch, uses this new
parameter to test the VM file descriptor changes.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/core/machine.c | 22 ++++++++++++++++++++++
include/hw/core/boards.h | 6 ++++++
system/runstate.c | 6 +++---
3 files changed, 31 insertions(+), 3 deletions(-)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index d4ef620c17..eae1f6be8d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -435,6 +435,21 @@ static void machine_set_dump_guest_core(Object *obj, bool value, Error **errp)
ms->dump_guest_core = value;
}
+static bool machine_get_new_accel_vmfd_on_reset(Object *obj, Error **errp)
+{
+ MachineState *ms = MACHINE(obj);
+
+ return ms->new_accel_vmfd_on_reset;
+}
+
+static void machine_set_new_accel_vmfd_on_reset(Object *obj,
+ bool value, Error **errp)
+{
+ MachineState *ms = MACHINE(obj);
+
+ ms->new_accel_vmfd_on_reset = value;
+}
+
static bool machine_get_mem_merge(Object *obj, Error **errp)
{
MachineState *ms = MACHINE(obj);
@@ -1183,6 +1198,13 @@ static void machine_class_init(ObjectClass *oc, const void *data)
object_class_property_set_description(oc, "dump-guest-core",
"Include guest memory in a core dump");
+ object_class_property_add_bool(oc, "x-change-vmfd-on-reset",
+ machine_get_new_accel_vmfd_on_reset,
+ machine_set_new_accel_vmfd_on_reset);
+ object_class_property_set_description(oc, "x-change-vmfd-on-reset",
+ "Set on/off to enable/disable generating new accelerator guest handle "
+ "on guest reset. Default: off (used only for testing/debugging).");
+
object_class_property_add_bool(oc, "mem-merge",
machine_get_mem_merge, machine_set_mem_merge);
object_class_property_set_description(oc, "mem-merge",
diff --git a/include/hw/core/boards.h b/include/hw/core/boards.h
index edbe8d03e5..12b2149378 100644
--- a/include/hw/core/boards.h
+++ b/include/hw/core/boards.h
@@ -448,6 +448,12 @@ struct MachineState {
struct NVDIMMState *nvdimms_state;
struct NumaState *numa_state;
bool acpi_spcr_enabled;
+ /*
+ * Whether to change virtual machine accelerator handle upon
+ * reset or not. Used only for debugging and testing purposes.
+ * Set to false by default for all regular use.
+ */
+ bool new_accel_vmfd_on_reset;
};
/*
diff --git a/system/runstate.c b/system/runstate.c
index e7b50e6a3b..eca722b43c 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -526,9 +526,9 @@ void qemu_system_reset(ShutdownCause reason)
type = RESET_TYPE_COLD;
}
- if (!cpus_are_resettable() &&
- (reason == SHUTDOWN_CAUSE_GUEST_RESET ||
- reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET)) {
+ if ((reason == SHUTDOWN_CAUSE_GUEST_RESET ||
+ reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET) &&
+ (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
if (ac->rebuild_guest) {
ret = ac->rebuild_guest(current_machine);
if (ret < 0) {
--
2.42.0
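The reordered condition in `qemu_system_reset()` can be read as a small predicate: only guest- or QMP-initiated resets qualify, and then the guest is rebuilt either because the test knob is set or because the CPUs cannot be reset in place. The enum below is a hypothetical subset of `ShutdownCause`, and `should_rebuild_guest()` is illustrative, not a QEMU function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of QEMU's ShutdownCause values. */
typedef enum {
    CAUSE_GUEST_RESET,
    CAUSE_HOST_QMP_SYSTEM_RESET,
    CAUSE_GUEST_SHUTDOWN,
} Cause;

/*
 * Mirrors the new condition: rebuild (new VM fd) only for guest- or
 * QMP-initiated resets, and only if the test knob is set or the CPUs
 * cannot be reset in place.
 */
static bool should_rebuild_guest(Cause reason, bool change_vmfd_on_reset,
                                 bool cpus_resettable)
{
    bool reset_request = reason == CAUSE_GUEST_RESET ||
                         reason == CAUSE_HOST_QMP_SYSTEM_RESET;

    return reset_request && (change_vmfd_on_reset || !cpus_resettable);
}
```

For example, a guest reboot of a non-confidential VM started with `-machine q35,x-change-vmfd-on-reset=on` takes the rebuild path, while the same reboot without the option does not.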
* [PATCH v5 32/34] tests/functional/x86_64: add functional test to exercise vm fd change on reset
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (30 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 31/34] hw/machine: introduce machine specific option 'x-change-vmfd-on-reset' Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-18 11:42 ` [PATCH v5 33/34] qom: add 'confidential-guest-reset' property for x86 confidential vms Ani Sinha
2026-02-18 11:42 ` [PATCH v5 34/34] migration: return EEXIST when trying to add the same migration blocker Ani Sinha
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Paolo Bonzini, Zhao Liu, Ani Sinha; +Cc: kraxel, qemu-devel
A new functional test is added that exercises the code changes related to
closing the old KVM VM file descriptor and opening a new one upon VM reset.
This normally happens only when confidential guests are reset; for
non-confidential guests, we use a special machine-specific debug/test parameter
'x-change-vmfd-on-reset' to enable this behavior.
The re-initialisation code specific to SEV-ES, SEV-SNP and TDX platforms is
not exercised by this test, as it requires hardware that supports running
confidential guests.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
MAINTAINERS | 1 +
tests/functional/x86_64/meson.build | 1 +
tests/functional/x86_64/test_rebuild_vmfd.py | 136 +++++++++++++++++++
3 files changed, 138 insertions(+)
create mode 100755 tests/functional/x86_64/test_rebuild_vmfd.py
diff --git a/MAINTAINERS b/MAINTAINERS
index b0eb77c08f..de74d568e9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -157,6 +157,7 @@ M: Ani Sinha <anisinha@redhat.com>
M: Paolo Bonzini <pbonzini@redhat.com>
S: Maintained
F: stubs/kvm.c
+F: tests/functional/x86_64/test_rebuild_vmfd.py
Guest CPU cores (TCG)
---------------------
diff --git a/tests/functional/x86_64/meson.build b/tests/functional/x86_64/meson.build
index f78eec5e6c..c6553d922d 100644
--- a/tests/functional/x86_64/meson.build
+++ b/tests/functional/x86_64/meson.build
@@ -36,4 +36,5 @@ tests_x86_64_system_thorough = [
'vfio_user_client',
'virtio_balloon',
'virtio_gpu',
+ 'rebuild_vmfd',
]
diff --git a/tests/functional/x86_64/test_rebuild_vmfd.py b/tests/functional/x86_64/test_rebuild_vmfd.py
new file mode 100755
index 0000000000..5a8e5fd89b
--- /dev/null
+++ b/tests/functional/x86_64/test_rebuild_vmfd.py
@@ -0,0 +1,136 @@
+#!/usr/bin/env python3
+#
+# Functional tests exercising guest KVM file descriptor change on reset.
+#
+# Copyright © 2026 Red Hat, Inc.
+#
+# Author:
+# Ani Sinha <anisinha@redhat.com>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import os
+from qemu.machine import machine
+
+from qemu_test import QemuSystemTest, Asset, exec_command_and_wait_for_pattern
+from qemu_test import wait_for_console_pattern
+
+class KVMGuest(QemuSystemTest):
+
+ # ASSET_UKI was generated using
+ # https://gitlab.com/kraxel/edk2-tests/-/blob/unittest/tools/make-supermin.sh
+ ASSET_UKI = Asset('https://gitlab.com/anisinha/misc-artifacts/'
+ '-/raw/main/uki.x86-64.efi?ref_type=heads',
+ 'e0f806bd1fa24111312e1fe849d2ee69808d4343930a5'
+ 'dc8c1688da17c65f576')
+ # ASSET_OVMF comes from /usr/share/edk2/ovmf/OVMF.stateless.fd of a
+ # Fedora 43 distribution, which in turn comes from the
+ # edk2-ovmf-20251119-3.fc43.noarch RPM of that distribution.
+ ASSET_OVMF = Asset('https://gitlab.com/anisinha/misc-artifacts/'
+ '-/raw/main/OVMF.stateless.fd?ref_type=heads',
+ '58a4275aafa8774bd6b1540adceae4ea434b8db75b476'
+ '11839ff47be88cfcf22')
+
+ def common_vm_setup(self, kvm_args=None, cpu_args=None):
+ self.set_machine('q35')
+ self.require_accelerator("kvm")
+
+ self.vm.set_console()
+ if kvm_args:
+ self.vm.add_args("-accel", "kvm,%s" % kvm_args)
+ else:
+ self.vm.add_args("-accel", "kvm")
+ self.vm.add_args("-smp", "2")
+ if cpu_args:
+ self.vm.add_args("-cpu", "host,%s" % cpu_args)
+ else:
+ self.vm.add_args("-cpu", "host")
+ self.vm.add_args("-m", "2G")
+ self.vm.add_args("-nographic", "-nodefaults")
+
+
+ self.uki_path = self.ASSET_UKI.fetch()
+ self.ovmf_path = self.ASSET_OVMF.fetch()
+
+ self.vm.add_args('-kernel', self.uki_path)
+ self.vm.add_args("-bios", self.ovmf_path)
+ # enable KVM VMFD change on reset for a non-coco VM
+ self.vm.add_args("-machine", "q35,x-change-vmfd-on-reset=on")
+
+ # enable tracing of basic vmfd change function
+ self.vm.add_args("--trace", "kvm_reset_vmfd")
+
+ def launch_vm(self):
+ try:
+ self.vm.launch()
+ except machine.VMLaunchFailure as e:
+ if "Xen HVM guest support not present" in e.output:
+ self.skipTest("KVM Xen support is not present "
+ "(need v5.12+ kernel with CONFIG_KVM_XEN)")
+ elif "Property 'kvm-accel.xen-version' not found" in e.output:
+ self.skipTest("QEMU not built with CONFIG_XEN_EMU support")
+ else:
+ raise e
+
+ self.log.info('VM launched')
+ console_pattern = 'bash-5.1#'
+ wait_for_console_pattern(self, console_pattern)
+ self.log.info('VM ready with a bash prompt')
+
+ def vm_console_reset(self):
+ exec_command_and_wait_for_pattern(self, '/usr/sbin/reboot -f',
+ 'reboot: machine restart')
+ console_pattern = '# --- Hello world ---'
+ wait_for_console_pattern(self, console_pattern)
+ self.vm.shutdown()
+
+ def vm_qmp_reset(self):
+ self.vm.qmp('system_reset')
+ console_pattern = '# --- Hello world ---'
+ wait_for_console_pattern(self, console_pattern)
+ self.vm.shutdown()
+
+ def check_logs(self):
+ self.assertRegex(self.vm.get_log(),
+ r'kvm_reset_vmfd')
+ self.assertRegex(self.vm.get_log(),
+ r'virtual machine state has been rebuilt')
+
+ def test_reset_console(self):
+ self.common_vm_setup()
+ self.launch_vm()
+ self.vm_console_reset()
+ self.check_logs()
+
+ def test_reset_qmp(self):
+ self.common_vm_setup()
+ self.launch_vm()
+ self.vm_qmp_reset()
+ self.check_logs()
+
+ def test_reset_kvmpit(self):
+ self.common_vm_setup()
+ self.vm.add_args("--trace", "kvmpit_post_vmfd_change")
+ self.launch_vm()
+ self.vm_console_reset()
+ self.assertRegex(self.vm.get_log(),
+ r'kvmpit_post_vmfd_change')
+
+ def test_reset_xen_emulation(self):
+ self.common_vm_setup("xen-version=0x4000a,kernel-irqchip=split")
+ self.launch_vm()
+ self.vm_console_reset()
+ self.check_logs()
+
+ def test_reset_hyperv_vmbus(self):
+ self.common_vm_setup(None, "hv-syndbg,hv-relaxed,hv_time,hv-synic,"
+ "hv-vpindex,hv-runtime,hv-stimer")
+ self.vm.add_args("-device", "vmbus-bridge,irq=15")
+ self.vm.add_args("-trace", "vmbus_handle_vmfd_change")
+ self.launch_vm()
+ self.vm_console_reset()
+ self.assertRegex(self.vm.get_log(),
+ r'vmbus_handle_vmfd_change')
+
+if __name__ == '__main__':
+ QemuSystemTest.main()
--
2.42.0
* [PATCH v5 33/34] qom: add 'confidential-guest-reset' property for x86 confidential vms
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (31 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 32/34] tests/functional/x86_64: add functional test to exercise vm fd change on reset Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
2026-02-19 8:55 ` Markus Armbruster
2026-02-23 10:08 ` Daniel P. Berrangé
2026-02-18 11:42 ` [PATCH v5 34/34] migration: return EEXIST when trying to add the same migration blocker Ani Sinha
33 siblings, 2 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
Eric Blake, Markus Armbruster
Cc: Ani Sinha, kraxel, qemu-devel
Through the new 'confidential-guest-reset' feature on the SEV and TDX guest
property types, the control plane can detect whether the hypervisor supports
x86 confidential guest resets. Older hypervisors that do not support resets
will not advertise this feature.
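As a rough sketch of how a management application might consume this, the
Python below scans the decoded result of the QMP 'query-qmp-schema' command for
the feature name. It assumes the result is a list of SchemaInfo dicts with an
optional 'features' list; the sample schema excerpts are made up for
illustration. Scanning every entity, rather than looking up one type by name,
avoids relying on type names, which introspection may not expose under their
source names.

```python
# Sketch: detect the 'confidential-guest-reset' QAPI feature in the decoded
# output of 'query-qmp-schema' (assumed: a list of SchemaInfo dicts).

FEATURE = "confidential-guest-reset"


def supports_coco_reset(schema):
    """Return True if any schema entity lists the reset feature."""
    for entity in schema:
        # 'features' is optional in SchemaInfo, so default to an empty list.
        if FEATURE in entity.get("features", []):
            return True
    return False


# Made-up, heavily trimmed introspection excerpts for illustration only.
new_qemu = [
    {"name": "query-status", "meta-type": "command"},
    {"name": "SevCommonProperties", "meta-type": "object",
     "members": [], "features": [FEATURE]},
]
old_qemu = [
    {"name": "SevCommonProperties", "meta-type": "object", "members": []},
]

print(supports_coco_reset(new_qemu))  # True
print(supports_coco_reset(old_qemu))  # False
```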
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
qapi/qom.json | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/qapi/qom.json b/qapi/qom.json
index 6f5c9de0f0..c653248f85 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -1009,13 +1009,19 @@
# designated guest firmware page for measured boot with -kernel
# (default: false) (since 6.2)
#
+# Features:
+#
+# @confidential-guest-reset: If present, the hypervisor supports
+# confidential guest resets (since 11.0).
+#
# Since: 9.1
##
{ 'struct': 'SevCommonProperties',
'data': { '*sev-device': 'str',
'*cbitpos': 'uint32',
'reduced-phys-bits': 'uint32',
- '*kernel-hashes': 'bool' } }
+ '*kernel-hashes': 'bool' },
+ 'features': ['confidential-guest-reset']}
##
# @SevGuestProperties:
@@ -1136,6 +1142,11 @@
# it, the guest will not be able to get a TD quote for
# attestation.
#
+# Features:
+#
+# @confidential-guest-reset: If present, the hypervisor supports
+# confidential guest resets (since 11.0).
+#
# Since: 10.1
##
{ 'struct': 'TdxGuestProperties',
@@ -1144,7 +1155,8 @@
'*mrconfigid': 'str',
'*mrowner': 'str',
'*mrownerconfig': 'str',
- '*quote-generation-socket': 'SocketAddress' } }
+ '*quote-generation-socket': 'SocketAddress' },
+ 'features': ['confidential-guest-reset']}
##
# @ThreadContextProperties:
--
2.42.0
* [PATCH v5 34/34] migration: return EEXIST when trying to add the same migration blocker
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
` (32 preceding siblings ...)
2026-02-18 11:42 ` [PATCH v5 33/34] qom: add 'confidential-guest-reset' property for x86 confidential vms Ani Sinha
@ 2026-02-18 11:42 ` Ani Sinha
33 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 11:42 UTC (permalink / raw)
To: Peter Xu, Fabiano Rosas; +Cc: Ani Sinha, kraxel, Prasad Pandit, qemu-devel
Currently, the code that adds a migration blocker does not check whether the
same blocker has already been added. Return an EEXIST error code on an attempt
to add the same migration blocker again, so that the same blocker is never
added twice.
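A minimal Python model of the duplicate check, assuming a hypothetical
MIG_MODE_MAX of 3 and plain lists in place of GSList. As with g_slist_index(),
membership is tested by identity. One deliberate difference from the C change:
this sketch checks every selected mode before adding to any list, so a rejected
call leaves no partial state, whereas returning from inside the loop can leave
the blocker already added to earlier modes.

```python
import errno

# Hypothetical number of migration modes; the real MigMode enum may differ.
MIG_MODE_MAX = 3

# One blocker list per migration mode, standing in for the per-mode
# GSList array in migration/migration.c.
migration_blockers = [[] for _ in range(MIG_MODE_MAX)]


def add_blockers(reason, modes):
    """Add 'reason' for every mode whose bit is set in 'modes'.

    Returns 0 on success, or -errno.EEXIST if the same object is already
    registered for any selected mode; nothing is added in that case.
    """
    selected = [m for m in range(MIG_MODE_MAX) if modes & (1 << m)]
    # First pass: reject duplicates (identity check, like comparing
    # pointers with g_slist_index()).
    for mode in selected:
        if any(b is reason for b in migration_blockers[mode]):
            return -errno.EEXIST
    # Second pass: prepend, mirroring g_slist_prepend().
    for mode in selected:
        migration_blockers[mode].insert(0, reason)
    return 0


blocker = object()  # stands in for an Error * reason
print(add_blockers(blocker, 0b011))                   # 0
print(add_blockers(blocker, 0b001) == -errno.EEXIST)  # True
```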
Suggested-by: Prasad Pandit <pjp@fedoraproject.org>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
migration/migration.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/migration/migration.c b/migration/migration.c
index b103a82fc0..495664e01a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1696,6 +1696,10 @@ static int add_blockers(Error **reasonp, unsigned modes, Error **errp)
{
for (MigMode mode = 0; mode < MIG_MODE__MAX; mode++) {
if (modes & BIT(mode)) {
+ if (g_slist_index(migration_blockers[mode],
+ *reasonp) >= 0) {
+ return -EEXIST;
+ }
migration_blockers[mode] = g_slist_prepend(migration_blockers[mode],
*reasonp);
}
--
2.42.0
* Re: [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
2026-02-18 11:42 ` [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors Ani Sinha
@ 2026-02-18 14:07 ` Cédric Le Goater
2026-02-18 15:07 ` Ani Sinha
0 siblings, 1 reply; 60+ messages in thread
From: Cédric Le Goater @ 2026-02-18 14:07 UTC (permalink / raw)
To: Ani Sinha, Alex Williamson; +Cc: kraxel, qemu-devel
On 2/18/26 12:42, Ani Sinha wrote:
> Normally the vfio pseudo device file descriptor lives for the life of the VM.
> However, when the kvm VM file descriptor changes, a new file descriptor
> for the pseudo device needs to be generated against the new kvm VM descriptor.
> Other existing vfio descriptors need to be reattached to the new pseudo device
> descriptor. This change performs the above steps.
>
> Tested-by: Cédric Le Goater <clg@redhat.com>
There is a regression since the last version.
'reboot' from the guest and command 'system_reset' from the QEMU
monitor now generate these outputs:
qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
...
and QEMU exits after a while.
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
Anyhow this patch looks good.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> hw/vfio/helpers.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 92 insertions(+)
>
> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> index f68f8165d0..e2bedd15ec 100644
> --- a/hw/vfio/helpers.c
> +++ b/hw/vfio/helpers.c
> @@ -116,6 +116,89 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
> * we'll re-use it should another vfio device be attached before then.
> */
> int vfio_kvm_device_fd = -1;
> +
> +/*
> + * Confidential virtual machines:
> + * During reset of confidential vms, the kvm vm file descriptor changes.
> + * In this case, the old vfio kvm file descriptor is
> + * closed and a new descriptor is created against the new kvm vm file
> + * descriptor.
> + */
> +
> +typedef struct VFIODeviceFd {
> + int fd;
> + QLIST_ENTRY(VFIODeviceFd) node;
> +} VFIODeviceFd;
> +
> +static QLIST_HEAD(, VFIODeviceFd) vfio_device_fds =
> + QLIST_HEAD_INITIALIZER(vfio_device_fds);
> +
> +static void vfio_device_fd_list_add(int fd)
> +{
> + VFIODeviceFd *file_fd;
> + file_fd = g_malloc0(sizeof(*file_fd));
> + file_fd->fd = fd;
> + QLIST_INSERT_HEAD(&vfio_device_fds, file_fd, node);
> +}
> +
> +static void vfio_device_fd_list_remove(int fd)
> +{
> + VFIODeviceFd *file_fd, *next;
> +
> + QLIST_FOREACH_SAFE(file_fd, &vfio_device_fds, node, next) {
> + if (file_fd->fd == fd) {
> + QLIST_REMOVE(file_fd, node);
> + g_free(file_fd);
> + break;
> + }
> + }
> +}
> +
> +static int vfio_device_fd_rebind(NotifierWithReturn *notifier, void *data,
> + Error **errp)
> +{
> + VFIODeviceFd *file_fd;
> + int ret = 0;
> + struct kvm_device_attr attr = {
> + .group = KVM_DEV_VFIO_FILE,
> + .attr = KVM_DEV_VFIO_FILE_ADD,
> + };
> + struct kvm_create_device cd = {
> + .type = KVM_DEV_TYPE_VFIO,
> + };
> +
> + /* we are not interested in pre vmfd change notification */
> + if (((VmfdChangeNotifier *)data)->pre) {
> + return 0;
> + }
> +
> + if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
> + error_setg_errno(errp, errno, "Failed to create KVM VFIO device");
> + return -errno;
> + }
> +
> + if (vfio_kvm_device_fd != -1) {
> + close(vfio_kvm_device_fd);
> + }
> +
> + vfio_kvm_device_fd = cd.fd;
> +
> + QLIST_FOREACH(file_fd, &vfio_device_fds, node) {
> + attr.addr = (uint64_t)(unsigned long)&file_fd->fd;
> + if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
> + error_setg_errno(errp, errno,
> + "Failed to add fd %d to KVM VFIO device",
> + file_fd->fd);
> + ret = -errno;
> + }
> + }
> + return ret;
> +}
> +
> +static struct NotifierWithReturn vfio_vmfd_change_notifier = {
> + .notify = vfio_device_fd_rebind,
> +};
> +
> #endif
>
> void vfio_kvm_device_close(void)
> @@ -153,6 +236,11 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
> }
>
> vfio_kvm_device_fd = cd.fd;
> + /*
> + * If the vm file descriptor changes, add a notifier so that we can
> + * re-create the vfio_kvm_device_fd.
> + */
> + kvm_vmfd_add_change_notifier(&vfio_vmfd_change_notifier);
> }
>
> if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
> @@ -160,6 +248,8 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
> fd);
> return -errno;
> }
> +
> + vfio_device_fd_list_add(fd);
> #endif
> return 0;
> }
> @@ -183,6 +273,8 @@ int vfio_kvm_device_del_fd(int fd, Error **errp)
> "Failed to remove fd %d from KVM VFIO device", fd);
> return -errno;
> }
> +
> + vfio_device_fd_list_remove(fd);
> #endif
> return 0;
> }
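The bookkeeping described in the commit message can be modeled in a few lines
of Python. This is an illustrative sketch only: FakeKvmVfioDevice and
VfioKvmState are hypothetical stand-ins for the KVM VFIO pseudo device and the
state kept in hw/vfio/helpers.c, and the real code issues KVM_CREATE_DEVICE and
KVM_SET_DEVICE_ATTR ioctls instead of calling methods. The point is the
ordering: ignore the pre-change notification, create the new device against
the new VM fd, close the old one, then re-attach every tracked VFIO fd.

```python
class FakeKvmVfioDevice:
    """Hypothetical stand-in for the KVM VFIO pseudo device."""

    _next_fd = 100  # made-up descriptor numbers

    def __init__(self, vm_fd):
        self.vm_fd = vm_fd              # VM fd the device was created on
        self.fd = FakeKvmVfioDevice._next_fd
        FakeKvmVfioDevice._next_fd += 1
        self.attached = []              # VFIO fds added to this device
        self.closed = False

    def add_file(self, vfio_fd):
        self.attached.append(vfio_fd)

    def remove_file(self, vfio_fd):
        self.attached.remove(vfio_fd)

    def close(self):
        self.closed = True


class VfioKvmState:
    """Tracks the pseudo device plus every VFIO fd attached to it."""

    def __init__(self, vm_fd):
        self.vm_fd = vm_fd
        self.device = None
        self.tracked_fds = []           # plays the role of vfio_device_fds

    def add_fd(self, vfio_fd):
        if self.device is None:         # create lazily on first use
            self.device = FakeKvmVfioDevice(self.vm_fd)
        self.device.add_file(vfio_fd)
        self.tracked_fds.append(vfio_fd)

    def del_fd(self, vfio_fd):
        self.device.remove_file(vfio_fd)
        self.tracked_fds.remove(vfio_fd)

    def on_vmfd_change(self, new_vm_fd, pre):
        """VM fd change notifier: only the post-change call does work."""
        if pre:
            return
        self.vm_fd = new_vm_fd
        new_device = FakeKvmVfioDevice(new_vm_fd)  # create on the new VM fd
        if self.device is not None:
            self.device.close()                    # drop the stale device
        self.device = new_device
        for vfio_fd in self.tracked_fds:           # re-attach every fd
            self.device.add_file(vfio_fd)


state = VfioKvmState(vm_fd=3)
state.add_fd(10)
state.add_fd(11)
state.on_vmfd_change(7, pre=False)
print(state.device.vm_fd, state.device.attached)  # 7 [10, 11]
```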
* Re: [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
2026-02-18 14:07 ` Cédric Le Goater
@ 2026-02-18 15:07 ` Ani Sinha
2026-02-18 15:30 ` Cédric Le Goater
0 siblings, 1 reply; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 15:07 UTC (permalink / raw)
To: Cedric Le Goater; +Cc: Alex Williamson, Gerd Hoffmann, qemu-devel
> On 18 Feb 2026, at 7:37 PM, Cédric Le Goater <clg@redhat.com> wrote:
>
> On 2/18/26 12:42, Ani Sinha wrote:
>> Normally the vfio pseudo device file descriptor lives for the life of the VM.
>> However, when the kvm VM file descriptor changes, a new file descriptor
>> for the pseudo device needs to be generated against the new kvm VM descriptor.
>> Other existing vfio descriptors need to be reattached to the new pseudo device
>> descriptor. This change performs the above steps.
>> Tested-by: Cédric Le Goater <clg@redhat.com>
>
> There is a regression since the last version.
>
>
> 'reboot' from the guest and command 'system_reset' from the QEMU
> monitor now generate these outputs:
>
> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
> ...
>
> and QEMU exits after a while.
I have only seen this with SEV-ES guests that have more than one vCPU, never with TDX or SEV-SNP (single or multiple vCPUs).
On which host/guest type did you see this?
>
>
>
>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>
> Anyhow this patch looks good.
>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>
> Thanks,
>
> C.
>
>
* Re: [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
2026-02-18 15:07 ` Ani Sinha
@ 2026-02-18 15:30 ` Cédric Le Goater
2026-02-18 16:06 ` Ani Sinha
2026-02-18 17:33 ` Ani Sinha
0 siblings, 2 replies; 60+ messages in thread
From: Cédric Le Goater @ 2026-02-18 15:30 UTC (permalink / raw)
To: Ani Sinha; +Cc: Alex Williamson, Gerd Hoffmann, qemu-devel
On 2/18/26 16:07, Ani Sinha wrote:
>
>
>> On 18 Feb 2026, at 7:37 PM, Cédric Le Goater <clg@redhat.com> wrote:
>>
>> On 2/18/26 12:42, Ani Sinha wrote:
>>> Normally the vfio pseudo device file descriptor lives for the life of the VM.
>>> However, when the kvm VM file descriptor changes, a new file descriptor
>>> for the pseudo device needs to be generated against the new kvm VM descriptor.
>>> Other existing vfio descriptors need to be reattached to the new pseudo device
>>> descriptor. This change performs the above steps.
>>> Tested-by: Cédric Le Goater <clg@redhat.com>
>>
>> There is a regression since the last version.
>>
>>
>> 'reboot' from the guest and command 'system_reset' from the QEMU
>> monitor now generate these outputs:
>>
>> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
>> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
>> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
>> ...
>>
>> and QEMU exits after a while.
>
> I have only seen this with SEV-ES guests that have more than one vCPU, never with TDX or SEV-SNP (single or multiple vCPUs).
> On which host/guest type did you see this?
SEV-SNP on a RHEL 9 host, with the same guest I used before. The host says:
[1816531.409591] kvm_amd: SEV-ES guest requested termination: 0x0:0x0
Thanks,
C.
>
>>
>>
>>
>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>
>> Anyhow this patch looks good.
>>
>> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>>
>> Thanks,
>>
>> C.
>>
>>
>
* Re: [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
2026-02-18 15:30 ` Cédric Le Goater
@ 2026-02-18 16:06 ` Ani Sinha
2026-02-18 16:09 ` Cédric Le Goater
2026-02-18 17:33 ` Ani Sinha
1 sibling, 1 reply; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 16:06 UTC (permalink / raw)
To: Cédric Le Goater; +Cc: Alex Williamson, Gerd Hoffmann, qemu-devel
On Wed, Feb 18, 2026 at 9:00 PM Cédric Le Goater <clg@redhat.com> wrote:
>
> On 2/18/26 16:07, Ani Sinha wrote:
> >
> >
> >> On 18 Feb 2026, at 7:37 PM, Cédric Le Goater <clg@redhat.com> wrote:
> >>
> >> On 2/18/26 12:42, Ani Sinha wrote:
> >>> Normally the vfio pseudo device file descriptor lives for the life of the VM.
> >>> However, when the kvm VM file descriptor changes, a new file descriptor
> >>> for the pseudo device needs to be generated against the new kvm VM descriptor.
> >>> Other existing vfio descriptors need to be reattached to the new pseudo device
> >>> descriptor. This change performs the above steps.
> >>> Tested-by: Cédric Le Goater <clg@redhat.com>
> >>
> >> There is a regression since the last version.
> >>
> >>
> >> 'reboot' from the guest and command 'system_reset' from the QEMU
> >> monitor now generate these outputs:
> >>
> >> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
> >> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
> >> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
> >> ...
> >>
> >> and QEMU exits after a while.
> >
> > I have only seen this with SEV-ES guests that have more than one vCPU, never with TDX or SEV-SNP (single or multiple vCPUs).
> > On which host/guest type did you see this?
>
> SEV-SNP on a RHEL 9 host, with the same guest I used before. The host says:
>
> [1816531.409591] kvm_amd: SEV-ES guest requested termination: 0x0:0x0
OK, so the guest is SEV-ES. Most likely you are also using more than one vCPU.
Try with one vCPU and/or with SEV-SNP enabled.
* Re: [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
2026-02-18 16:06 ` Ani Sinha
@ 2026-02-18 16:09 ` Cédric Le Goater
0 siblings, 0 replies; 60+ messages in thread
From: Cédric Le Goater @ 2026-02-18 16:09 UTC (permalink / raw)
To: Ani Sinha; +Cc: Alex Williamson, Gerd Hoffmann, qemu-devel
On 2/18/26 17:06, Ani Sinha wrote:
> On Wed, Feb 18, 2026 at 9:00 PM Cédric Le Goater <clg@redhat.com> wrote:
>>
>> On 2/18/26 16:07, Ani Sinha wrote:
>>>
>>>
>>>> On 18 Feb 2026, at 7:37 PM, Cédric Le Goater <clg@redhat.com> wrote:
>>>>
>>>> On 2/18/26 12:42, Ani Sinha wrote:
>>>>> Normally the vfio pseudo device file descriptor lives for the life of the VM.
>>>>> However, when the kvm VM file descriptor changes, a new file descriptor
>>>>> for the pseudo device needs to be generated against the new kvm VM descriptor.
>>>>> Other existing vfio descriptors need to be reattached to the new pseudo device
>>>>> descriptor. This change performs the above steps.
>>>>> Tested-by: Cédric Le Goater <clg@redhat.com>
>>>>
>>>> There is a regression since the last version.
>>>>
>>>>
>>>> 'reboot' from the guest and command 'system_reset' from the QEMU
>>>> monitor now generate these outputs:
>>>>
>>>> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
>>>> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
>>>> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
>>>> ...
>>>>
>>>> and QEMU exits after a while.
>>>
>>> I have only seen this with SEV-ES guests that have more than one vCPU, never with TDX or SEV-SNP (single or multiple vCPUs).
>>> On which host/guest type did you see this?
>>
>> SEV-SNP on a RHEL 9 host, with the same guest I used before. The host says:
>>
>> [1816531.409591] kvm_amd: SEV-ES guest requested termination: 0x0:0x0
>
> OK, so the guest is SEV-ES. Most likely you are also using more than one vCPU.
> Try with one vCPU and/or with SEV-SNP enabled.
>
The guest is SEV-SNP and has 2 vCPUs.
[root@vm12 ~]# dmesg | grep SEV
[ 0.544608] Memory Encryption Features active: AMD SEV SEV-ES SEV-SNP
[ 0.545580] SEV: Status: SEV SEV-ES SEV-SNP
[ 0.663592] SEV: APIC: wakeup_secondary_cpu() replaced with wakeup_cpu_via_vmgexit()
[ 0.719630] SEV: Using SNP CPUID table, 32 entries present.
[ 0.719636] SEV: SNP running at VMPL0.
[ 1.043748] SEV: SNP guest platform devices initialized.
[ 4.530555] sev-guest sev-guest: Initialized SEV guest driver (using vmpck_id 0)
[root@vm12 ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Vendor ID: AuthenticAMD
BIOS Vendor ID: QEMU
Model name: AMD EPYC-v4 Processor
BIOS Model name: pc-q35-11.0
....
* Re: [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
2026-02-18 15:30 ` Cédric Le Goater
2026-02-18 16:06 ` Ani Sinha
@ 2026-02-18 17:33 ` Ani Sinha
2026-02-18 17:39 ` Cédric Le Goater
2026-02-23 11:56 ` Gerd Hoffmann
1 sibling, 2 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-18 17:33 UTC (permalink / raw)
To: Cédric Le Goater
Cc: Alex Williamson, Gerd Hoffmann, qemu-devel, Paolo Bonzini
On Wed, Feb 18, 2026 at 9:00 PM Cédric Le Goater <clg@redhat.com> wrote:
>
> On 2/18/26 16:07, Ani Sinha wrote:
> >
> >
> >> On 18 Feb 2026, at 7:37 PM, Cédric Le Goater <clg@redhat.com> wrote:
> >>
> >> On 2/18/26 12:42, Ani Sinha wrote:
> >>> Normally the vfio pseudo device file descriptor lives for the life of the VM.
> >>> However, when the kvm VM file descriptor changes, a new file descriptor
> >>> for the pseudo device needs to be generated against the new kvm VM descriptor.
> >>> Other existing vfio descriptors need to be reattached to the new pseudo device
> >>> descriptor. This change performs the above steps.
> >>> Tested-by: Cédric Le Goater <clg@redhat.com>
> >>
> >> There is a regression since the last version.
> >>
> >>
> >> 'reboot' from the guest and command 'system_reset' from the QEMU
> >> monitor now generate these outputs:
> >>
> >> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
> >> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
> >> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
> >> ...
> >>
> >> and QEMU exits after a while.
> >
> > I have only seen this with SEV-ES guests that have more than one vCPU, never with TDX or SEV-SNP (single or multiple vCPUs).
> > On which host/guest type did you see this?
>
> SEV-SNP on a RHEL 9 host, with the same guest I used before. The host says:
>
> [1816531.409591] kvm_amd: SEV-ES guest requested termination: 0x0:0x0
Strange! I am not sure why KVM thinks it's SEV-ES. I did all my
SEV-SNP and TDX testing on a Fedora 43 (fc43) host, and used a Fedora 42 (fc42)
host for SEV-ES. I have not seen this kind of guest termination with SEV-SNP or
TDX on that host. I am sure there are some differences between the RHEL 9
host kernel and the fc43 kernel.
>
> Thanks,
>
> C.
>
>
>
> >
> >>
> >>
> >>
> >>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> >>
> >> Anyhow this patch looks good.
> >>
> >> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> >>
> >> Thanks,
> >>
> >> C.
> >>
> >>> ---
> >>> hw/vfio/helpers.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++
> >>> 1 file changed, 92 insertions(+)
> >>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> >>> index f68f8165d0..e2bedd15ec 100644
> >>> --- a/hw/vfio/helpers.c
> >>> +++ b/hw/vfio/helpers.c
> >>> @@ -116,6 +116,89 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
> >>> * we'll re-use it should another vfio device be attached before then.
> >>> */
> >>> int vfio_kvm_device_fd = -1;
> >>> +
> >>> +/*
> >>> + * Confidential virtual machines:
> >>> + * During reset of confidential vms, the kvm vm file descriptor changes.
> >>> + * In this case, the old vfio kvm file descriptor is
> >>> + * closed and a new descriptor is created against the new kvm vm file
> >>> + * descriptor.
> >>> + */
> >>> +
> >>> +typedef struct VFIODeviceFd {
> >>> + int fd;
> >>> + QLIST_ENTRY(VFIODeviceFd) node;
> >>> +} VFIODeviceFd;
> >>> +
> >>> +static QLIST_HEAD(, VFIODeviceFd) vfio_device_fds =
> >>> + QLIST_HEAD_INITIALIZER(vfio_device_fds);
> >>> +
> >>> +static void vfio_device_fd_list_add(int fd)
> >>> +{
> >>> + VFIODeviceFd *file_fd;
> >>> + file_fd = g_malloc0(sizeof(*file_fd));
> >>> + file_fd->fd = fd;
> >>> + QLIST_INSERT_HEAD(&vfio_device_fds, file_fd, node);
> >>> +}
> >>> +
> >>> +static void vfio_device_fd_list_remove(int fd)
> >>> +{
> >>> + VFIODeviceFd *file_fd, *next;
> >>> +
> >>> + QLIST_FOREACH_SAFE(file_fd, &vfio_device_fds, node, next) {
> >>> + if (file_fd->fd == fd) {
> >>> + QLIST_REMOVE(file_fd, node);
> >>> + g_free(file_fd);
> >>> + break;
> >>> + }
> >>> + }
> >>> +}
> >>> +
> >>> +static int vfio_device_fd_rebind(NotifierWithReturn *notifier, void *data,
> >>> + Error **errp)
> >>> +{
> >>> + VFIODeviceFd *file_fd;
> >>> + int ret = 0;
> >>> + struct kvm_device_attr attr = {
> >>> + .group = KVM_DEV_VFIO_FILE,
> >>> + .attr = KVM_DEV_VFIO_FILE_ADD,
> >>> + };
> >>> + struct kvm_create_device cd = {
> >>> + .type = KVM_DEV_TYPE_VFIO,
> >>> + };
> >>> +
> >>> + /* we are not interested in pre vmfd change notification */
> >>> + if (((VmfdChangeNotifier *)data)->pre) {
> >>> + return 0;
> >>> + }
> >>> +
> >>> + if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
> >>> + error_setg_errno(errp, errno, "Failed to create KVM VFIO device");
> >>> + return -errno;
> >>> + }
> >>> +
> >>> + if (vfio_kvm_device_fd != -1) {
> >>> + close(vfio_kvm_device_fd);
> >>> + }
> >>> +
> >>> + vfio_kvm_device_fd = cd.fd;
> >>> +
> >>> + QLIST_FOREACH(file_fd, &vfio_device_fds, node) {
> >>> + attr.addr = (uint64_t)(unsigned long)&file_fd->fd;
> >>> + if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
> >>> + error_setg_errno(errp, errno,
> >>> + "Failed to add fd %d to KVM VFIO device",
> >>> + file_fd->fd);
> >>> + ret = -errno;
> >>> + }
> >>> + }
> >>> + return ret;
> >>> +}
> >>> +
> >>> +static struct NotifierWithReturn vfio_vmfd_change_notifier = {
> >>> + .notify = vfio_device_fd_rebind,
> >>> +};
> >>> +
> >>> #endif
> >>> void vfio_kvm_device_close(void)
> >>> @@ -153,6 +236,11 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
> >>> }
> >>> vfio_kvm_device_fd = cd.fd;
> >>> + /*
> >>> + * If the vm file descriptor changes, add a notifier so that we can
> >>> + * re-create the vfio_kvm_device_fd.
> >>> + */
> >>> + kvm_vmfd_add_change_notifier(&vfio_vmfd_change_notifier);
> >>> }
> >>> if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
> >>> @@ -160,6 +248,8 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
> >>> fd);
> >>> return -errno;
> >>> }
> >>> +
> >>> + vfio_device_fd_list_add(fd);
> >>> #endif
> >>> return 0;
> >>> }
> >>> @@ -183,6 +273,8 @@ int vfio_kvm_device_del_fd(int fd, Error **errp)
> >>> "Failed to remove fd %d from KVM VFIO device", fd);
> >>> return -errno;
> >>> }
> >>> +
> >>> + vfio_device_fd_list_remove(fd);
> >>> #endif
> >>> return 0;
> >>> }
> >>
> >
>
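The rebind step in the patch above boils down to this: keep a list of every fd handed to the KVM VFIO pseudo device, and when the VM fd changes, create a new pseudo device, close the old one and re-add each tracked fd. The standalone C sketch below models that flow with a plain linked list; FdTracker, KvmHooks and the create/attach/close_dev callbacks are hypothetical names standing in for KVM_CREATE_DEVICE, KVM_SET_DEVICE_ATTR with KVM_DEV_VFIO_FILE_ADD, and close(). It is not QEMU code. One design choice worth noting: QEMU's error_setg() family asserts that *errp is still unset, so a loop that calls error_setg_errno() again after a second failing fd would hit that assertion when errp is non-NULL. This sketch stops at the first failing fd and reports it once.

```c
#include <stdlib.h>

/* One tracked VFIO file descriptor (toy version of VFIODeviceFd). */
typedef struct FdNode {
    int fd;
    struct FdNode *next;
} FdNode;

typedef struct {
    FdNode *head;
    int device_fd; /* KVM VFIO pseudo device fd, -1 if none */
} FdTracker;

/* Hypothetical hooks standing in for the real KVM ioctls. */
typedef struct {
    int (*create)(void *ctx);                  /* new device fd, or -1 */
    int (*attach)(void *ctx, int dev, int fd); /* 0 on success */
    void (*close_dev)(void *ctx, int dev);     /* release old device fd */
    void *ctx;
} KvmHooks;

static void tracker_add(FdTracker *t, int fd)
{
    FdNode *n = calloc(1, sizeof(*n));

    if (!n) {
        abort();
    }
    n->fd = fd;
    n->next = t->head;
    t->head = n;
}

static void tracker_remove(FdTracker *t, int fd)
{
    for (FdNode **pp = &t->head; *pp; pp = &(*pp)->next) {
        if ((*pp)->fd == fd) {
            FdNode *victim = *pp;
            *pp = victim->next;
            free(victim);
            return;
        }
    }
}

/*
 * Run after the VM fd changed: create a new pseudo device, drop the old
 * one and re-add every tracked fd. Returns 0, or -1 with *failed_fd set
 * to the first fd that could not be re-added (-1 if creation failed).
 */
static int tracker_rebind(FdTracker *t, const KvmHooks *h, int *failed_fd)
{
    int new_dev = h->create(h->ctx);

    *failed_fd = -1;
    if (new_dev < 0) {
        return -1; /* old device fd is left untouched */
    }
    if (t->device_fd != -1) {
        h->close_dev(h->ctx, t->device_fd);
    }
    t->device_fd = new_dev;

    for (FdNode *n = t->head; n; n = n->next) {
        if (h->attach(h->ctx, new_dev, n->fd) != 0) {
            *failed_fd = n->fd; /* report once, then stop */
            return -1;
        }
    }
    return 0;
}
```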
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
2026-02-18 17:33 ` Ani Sinha
@ 2026-02-18 17:39 ` Cédric Le Goater
2026-02-19 5:30 ` Ani Sinha
2026-02-23 11:56 ` Gerd Hoffmann
1 sibling, 1 reply; 60+ messages in thread
From: Cédric Le Goater @ 2026-02-18 17:39 UTC (permalink / raw)
To: Ani Sinha; +Cc: Alex Williamson, Gerd Hoffmann, qemu-devel, Paolo Bonzini
On 2/18/26 18:33, Ani Sinha wrote:
> On Wed, Feb 18, 2026 at 9:00 PM Cédric Le Goater <clg@redhat.com> wrote:
>>
>> On 2/18/26 16:07, Ani Sinha wrote:
>>>
>>>
>>>> On 18 Feb 2026, at 7:37 PM, Cédric Le Goater <clg@redhat.com> wrote:
>>>>
>>>> On 2/18/26 12:42, Ani Sinha wrote:
>>>>> Normally the vfio pseudo device file descriptor lives for the life of the VM.
>>>>> However, when the kvm VM file descriptor changes, a new file descriptor
>>>>> for the pseudo device needs to be generated against the new kvm VM descriptor.
>>>>> Other existing vfio descriptors needs to be reattached to the new pseudo device
>>>>> descriptor. This change performs the above steps.
>>>>> Tested-by: Cédric Le Goater <clg@redhat.com>
>>>>
>>>> There is a regression since last version.
>>>>
>>>>
>>>> 'reboot' from the guest and command 'system_reset' from the QEMU
>>>> monitor now generate these outputs:
>>>>
>>>> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
>>>> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
>>>> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
>>>> ...
>>>>
>>>> and QEMU exits after a while.
>>>
>>> I have only seen this in SEV-ES with more than one vcpus. Never with TDX or SEV-SNP (single or multiple cpus).
>>> On which host/guest type did you see this?
>>
>> SEV-SNP on a RHEL9 host. Same guest I used before and host says :
>>
>> [1816531.409591] kvm_amd: SEV-ES guest requested termination: 0x0:0x0
>
> Strange! I am not sure why KVM thinks it's SEV-ES. I have done all my
> SEV-SNP and TDX testing on a fc43 host and for SEV-ES I used a fc42
> host. I have not seen this kind of guest termination on SEV-SNP or TDX
> on that host. I am sure there are some differences between the RHEL9
> host kernel and fc43 kernel.
The same issue occurs with a RHEL10 kernel.
C.
* Re: [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
2026-02-18 17:39 ` Cédric Le Goater
@ 2026-02-19 5:30 ` Ani Sinha
2026-02-19 8:01 ` Ani Sinha
0 siblings, 1 reply; 60+ messages in thread
From: Ani Sinha @ 2026-02-19 5:30 UTC (permalink / raw)
To: Cedric Le Goater
Cc: Alex Williamson, Gerd Hoffmann, qemu-devel, Paolo Bonzini
> On 18 Feb 2026, at 11:09 PM, Cédric Le Goater <clg@redhat.com> wrote:
>
> On 2/18/26 18:33, Ani Sinha wrote:
>> On Wed, Feb 18, 2026 at 9:00 PM Cédric Le Goater <clg@redhat.com> wrote:
>>>
>>> On 2/18/26 16:07, Ani Sinha wrote:
>>>>
>>>>
>>>>> On 18 Feb 2026, at 7:37 PM, Cédric Le Goater <clg@redhat.com> wrote:
>>>>>
>>>>> On 2/18/26 12:42, Ani Sinha wrote:
>>>>>> Normally the vfio pseudo device file descriptor lives for the life of the VM.
>>>>>> However, when the kvm VM file descriptor changes, a new file descriptor
>>>>>> for the pseudo device needs to be generated against the new kvm VM descriptor.
>>>>>> Other existing vfio descriptors needs to be reattached to the new pseudo device
>>>>>> descriptor. This change performs the above steps.
>>>>>> Tested-by: Cédric Le Goater <clg@redhat.com>
>>>>>
>>>>> There is a regression since last version.
>>>>>
>>>>>
>>>>> 'reboot' from the guest and command 'system_reset' from the QEMU
>>>>> monitor now generate these outputs:
>>>>>
>>>>> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
>>>>> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
>>>>> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
>>>>> ...
>>>>>
>>>>> and QEMU exits after a while.
>>>>
>>>> I have only seen this in SEV-ES with more than one vcpus. Never with TDX or SEV-SNP (single or multiple cpus).
>>>> On which host/guest type did you see this?
>>>
>>> SEV-SNP on a RHEL9 host. Same guest I used before and host says :
>>>
>>> [1816531.409591] kvm_amd: SEV-ES guest requested termination: 0x0:0x0
>> Strange! I am not sure why KVM thinks it's SEV-ES. I have done all my
>> SEV-SNP and TDX testing on a fc43 host and for SEV-ES I used a fc42
>> host. I have not seen this kind of guest termination on SEV-SNP or TDX
>> on that host. I am sure there are some differences between the RHEL9
>> host kernel and fc43 kernel.
>
> The same issue occurs with a rhel10 kernel.
OK, the good news is that I can reproduce this issue on the latest RHEL 9 host, but not on an fc43 host.
The issue happens only when more than one vCPU is created for the guest. With a single vCPU it seems to be fine.
I tried a minimal guest with only a kernel passed as a UKI, as well as a full-blown fc43 guest. Both show the same issue with more than one vCPU thread.
There is indeed a regression between versions. My v3, which I posted just before FOSDEM, seems good and does not show this behaviour on the same RHEL 9 host.
Both v4 and v5 are affected, so the regression is somewhere in my changes, not in host kernel changes.
This shows a little bit more information:
bash-5.1# /usr/sbin/reboot -f
[ 43.920835] ACPI: PM: Preparing to enter system sleep state S5
[ 43.922988] reboot: Restarting system
[ 43.924242] reboot: machine restart
qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00800f12
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000b004 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=1
ES =0000 00000000 0000ffff 00009300
CS =f000 00800000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
It is likely that the changes for fixing the non-coco reset case with QMP “system_reset” introduced this regression. I will have to audit the changes between v3 and v4.
* Re: [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
2026-02-19 5:30 ` Ani Sinha
@ 2026-02-19 8:01 ` Ani Sinha
2026-02-19 8:14 ` Cédric Le Goater
0 siblings, 1 reply; 60+ messages in thread
From: Ani Sinha @ 2026-02-19 8:01 UTC (permalink / raw)
To: Cedric Le Goater
Cc: Alex Williamson, Gerd Hoffmann, qemu-devel, Paolo Bonzini
Alright, fixed this. Basically, I had dropped
https://gitlab.com/anisinha/qemu/-/commit/b413141cd8a5b599214153a5e37b485443885718
but it is needed. I also need to call cpu_synchronize_all_post_init()
for the coco case. This also sets vcpu->dirty = false, which the
subsequent code paths expect.
Please try this patch on top of my v5. It should fix the issue. I tested
it on a RHEL 9 host and an fc43 SEV-SNP host, with one or more vCPUs and two
different guests, and all looks good.
From 26db44eba8c727160dd9e97c9d5582a0ddc5884d Mon Sep 17 00:00:00 2001
From: Ani Sinha <anisinha@redhat.com>
Date: Thu, 19 Feb 2026 13:12:02 +0530
Subject: [PATCH] Call cpu_synchronize_all_post_init for coco as well
Fixes issue reported by Cedric
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
accel/kvm/kvm-all.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index d7ea60f582..0610bf8434 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2816,7 +2816,7 @@ static int kvm_reset_vmfd(MachineState *ms)
}
s->vmfd = ret;
-
+ kvm_state->guest_state_protected = false;
kvm_setup_dirty_ring(s);
/* rebind memory to new vm fd */
@@ -2872,6 +2872,9 @@ static int kvm_reset_vmfd(MachineState *ms)
* kvm fd has changed. Commit the irq routes to KVM once more.
*/
kvm_irqchip_commit_routes(s);
+ if (ms->cgs) {
+ cpu_synchronize_all_post_init();
+ }
trace_kvm_reset_vmfd();
return ret;
}
--
2.49.0
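Why the two hunks belong together can be illustrated with a toy model. It assumes, as in recent QEMU, that the post-init register push (kvm_cpu_synchronize_post_init()) is skipped while kvm_state->guest_state_protected is set. Under that assumption, if the flag left over from the old, sealed VM were not cleared first, calling cpu_synchronize_all_post_init() would leave every vCPU dirty. ToyKVM, ToyVCPU and the toy_* functions are hypothetical stand-ins, not QEMU's types or functions.

```c
#include <stdbool.h>

/*
 * Toy model only: hypothetical stand-ins for KVMState and CPUState.
 * "dirty" plays the role of the vCPU dirty flag: QEMU's copy of the
 * registers has not been pushed into KVM yet.
 */
typedef struct {
    int vmfd;
    bool guest_state_protected;
} ToyKVM;

typedef struct {
    bool dirty;
} ToyVCPU;

/*
 * Models cpu_synchronize_all_post_init(). Assumption: like the KVM
 * post-init hook, it does nothing while the guest state is protected.
 */
static void toy_sync_all_post_init(const ToyKVM *s, ToyVCPU *cpus, int n)
{
    if (s->guest_state_protected) {
        return; /* push skipped, vCPUs stay dirty */
    }
    for (int i = 0; i < n; i++) {
        /* the real code would put the full register state into KVM here */
        cpus[i].dirty = false;
    }
}

/* Reset with a new VM fd; the two flags mirror the two hunks of the fix. */
static void toy_reset_vmfd(ToyKVM *s, ToyVCPU *cpus, int n, int new_fd,
                           bool is_cgs, bool clear_protected)
{
    s->vmfd = new_fd;
    if (clear_protected) {
        s->guest_state_protected = false; /* the new VM is not sealed yet */
    }
    for (int i = 0; i < n; i++) {
        cpus[i].dirty = true; /* toy assumption: new vCPU fds start dirty */
    }
    /* memory rebinding, irq routes, notifiers etc. would run here */
    if (is_cgs) {
        toy_sync_all_post_init(s, cpus, n);
    }
}

static bool toy_all_clean(const ToyVCPU *cpus, int n)
{
    for (int i = 0; i < n; i++) {
        if (cpus[i].dirty) {
            return false;
        }
    }
    return true;
}
```

In this model, dropping either hunk (not clearing the flag, or not calling the post-init sync) leaves the toy vCPUs dirty after the reset.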
* Re: [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
2026-02-19 8:01 ` Ani Sinha
@ 2026-02-19 8:14 ` Cédric Le Goater
2026-02-19 9:06 ` Ani Sinha
0 siblings, 1 reply; 60+ messages in thread
From: Cédric Le Goater @ 2026-02-19 8:14 UTC (permalink / raw)
To: Ani Sinha; +Cc: Alex Williamson, Gerd Hoffmann, qemu-devel, Paolo Bonzini
> please try this patch on top of my v5. It should fix the issue. I tested
> on a RHEL 9 host and a fc43 SEV-SNP host, 1 or multiple vcpus, two
> different guests and all looks good.
>
> From 26db44eba8c727160dd9e97c9d5582a0ddc5884d Mon Sep 17 00:00:00 2001
> From: Ani Sinha <anisinha@redhat.com>
> Date: Thu, 19 Feb 2026 13:12:02 +0530
> Subject: [PATCH] Call cpu_synchronize_all_post_init for coco as well
>
> Fixes issue reported by Cedric
>
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> ---
> accel/kvm/kvm-all.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index d7ea60f582..0610bf8434 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2816,7 +2816,7 @@ static int kvm_reset_vmfd(MachineState *ms)
> }
>
> s->vmfd = ret;
> -
> + kvm_state->guest_state_protected = false;
> kvm_setup_dirty_ring(s);
>
> /* rebind memory to new vm fd */
> @@ -2872,6 +2872,9 @@ static int kvm_reset_vmfd(MachineState *ms)
> * kvm fd has changed. Commit the irq routes to KVM once more.
> */
> kvm_irqchip_commit_routes(s);
> + if (ms->cgs) {
> + cpu_synchronize_all_post_init();
> + }
> trace_kvm_reset_vmfd();
> return ret;
> }
Tested (OS reboot, system_reset) with a RHEL10 guest (2 vCPUs) on
a RHEL9 and a RHEL10 host, using a SATA device and active NIC VFs.
All looks good.
There is still one message:
qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
which seems normal.
Thanks,
C.
* Re: [PATCH v5 33/34] qom: add 'confidential-guest-reset' property for x86 confidential vms
2026-02-18 11:42 ` [PATCH v5 33/34] qom: add 'confidential-guest-reset' property for x86 confidential vms Ani Sinha
@ 2026-02-19 8:55 ` Markus Armbruster
2026-02-19 9:12 ` Ani Sinha
2026-02-23 10:08 ` Daniel P. Berrangé
1 sibling, 1 reply; 60+ messages in thread
From: Markus Armbruster @ 2026-02-19 8:55 UTC (permalink / raw)
To: Ani Sinha
Cc: Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
Eric Blake, kraxel, qemu-devel
Ani Sinha <anisinha@redhat.com> writes:
> Through the new 'confidential-guest-reset' property, control plane should be
> able to detect if the hypervisor supports x86 confidential guest resets. Older
> hypervisors that do not support resets will not have this property populated.
Double-checking... This is a static ability of QEMU, and QEMU alone.
It does not depend on QEMU's run-time environment (host kernel, ...) or
the guest. Correct?
> Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
Patch looks sane.
* Re: [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
2026-02-19 8:14 ` Cédric Le Goater
@ 2026-02-19 9:06 ` Ani Sinha
0 siblings, 0 replies; 60+ messages in thread
From: Ani Sinha @ 2026-02-19 9:06 UTC (permalink / raw)
To: Cedric Le Goater
Cc: Alex Williamson, Gerd Hoffmann, qemu-devel, Paolo Bonzini
> On 19 Feb 2026, at 1:44 PM, Cédric Le Goater <clg@redhat.com> wrote:
>
>> please try this patch on top of my v5. It should fix the issue. I tested
>> on a RHEL 9 host and a fc43 SEV-SNP host, 1 or multiple vcpus, two
>> different guests and all looks good.
>> From 26db44eba8c727160dd9e97c9d5582a0ddc5884d Mon Sep 17 00:00:00 2001
>> From: Ani Sinha <anisinha@redhat.com>
>> Date: Thu, 19 Feb 2026 13:12:02 +0530
>> Subject: [PATCH] Call cpu_synchronize_all_post_init for coco as well
>> Fixes issue reported by Cedric
>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>> ---
>> accel/kvm/kvm-all.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index d7ea60f582..0610bf8434 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -2816,7 +2816,7 @@ static int kvm_reset_vmfd(MachineState *ms)
>> }
>> s->vmfd = ret;
>> -
>> + kvm_state->guest_state_protected = false;
>> kvm_setup_dirty_ring(s);
>> /* rebind memory to new vm fd */
>> @@ -2872,6 +2872,9 @@ static int kvm_reset_vmfd(MachineState *ms)
>> * kvm fd has changed. Commit the irq routes to KVM once more.
>> */
>> kvm_irqchip_commit_routes(s);
>> + if (ms->cgs) {
>> + cpu_synchronize_all_post_init();
>> + }
>> trace_kvm_reset_vmfd();
>> return ret;
>> }
>
> Tested (OS reboot, system_reset) with a RHEL10 guest (2 vCPUs) on
> a RHEL9 and a RHEL10 host, using a SATA device and active NIC VFs.
> All Looks good.
Excellent. I also tested on a TDX host with RHEL 9.6 and it seems good as well. I also ran my integration tests on non-coco and all is fine there too.
>
> There is still one message :
>
> qemu-system-x86_64: info: virtual machine state has been rebuilt with new guest file handle.
>
> which seems normal.
>
> Thanks,
>
> C.
>
>
>
* Re: [PATCH v5 33/34] qom: add 'confidential-guest-reset' property for x86 confidential vms
2026-02-19 8:55 ` Markus Armbruster
@ 2026-02-19 9:12 ` Ani Sinha
2026-02-19 9:27 ` Markus Armbruster
0 siblings, 1 reply; 60+ messages in thread
From: Ani Sinha @ 2026-02-19 9:12 UTC (permalink / raw)
To: Markus Armbruster
Cc: Paolo Bonzini, Daniel Berrange, Eduardo Habkost, Eric Blake,
Gerd Hoffmann, qemu-devel
> On 19 Feb 2026, at 2:25 PM, Markus Armbruster <armbru@redhat.com> wrote:
>
> Ani Sinha <anisinha@redhat.com> writes:
>
>> Through the new 'confidential-guest-reset' property, control plane should be
>> able to detect if the hypervisor supports x86 confidential guest resets. Older
>> hypervisors that do not support resets will not have this property populated.
>
> Double-checking... This is an static ability of QEMU, and QEMU alone.
> It does not depend on QEMU's run-time environment (host kernel, ...) or
> the guest. Correct?
The run-time environment is the same as what is needed to spawn confidential guests: the host hardware, the host kernel/distribution and the guest must all support the confidential computing technology. Nothing additional is needed to support resets; there are no extra dependencies on the host kernel/environment or on the guest.
>
>> Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>
> Patch looks sane.
>
* Re: [PATCH v5 33/34] qom: add 'confidential-guest-reset' property for x86 confidential vms
2026-02-19 9:12 ` Ani Sinha
@ 2026-02-19 9:27 ` Markus Armbruster
2026-02-19 9:29 ` Ani Sinha
0 siblings, 1 reply; 60+ messages in thread
From: Markus Armbruster @ 2026-02-19 9:27 UTC (permalink / raw)
To: Ani Sinha
Cc: Markus Armbruster, Paolo Bonzini, Daniel Berrange,
Eduardo Habkost, Eric Blake, Gerd Hoffmann, qemu-devel
Ani Sinha <anisinha@redhat.com> writes:
>> On 19 Feb 2026, at 2:25 PM, Markus Armbruster <armbru@redhat.com> wrote:
>>
>> Ani Sinha <anisinha@redhat.com> writes:
>>
>>> Through the new 'confidential-guest-reset' property, control plane should be
>>> able to detect if the hypervisor supports x86 confidential guest resets. Older
>>> hypervisors that do not support resets will not have this property populated.
>>
>> Double-checking... This is an static ability of QEMU, and QEMU alone.
>> It does not depend on QEMU's run-time environment (host kernel, ...) or
>> the guest. Correct?
>
> The run time environment is the same as what is needed to spawn confidential guests. That is, the host should support confidential technology. The host kernel/distribution should support confidential technologies. Plus the guest should support confidential technologies. There is nothing additionally needed to support resets. There are no additional dependencies with host kernel/environment and/or the guest etc to support reset.
So... if a QEMU with this feature succeeded at starting a confidential
guest, then x86 confidential guest reset is definitely supported for
that guest. Correct?
>>
>>> Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>
>> Patch looks sane.
>>
* Re: [PATCH v5 33/34] qom: add 'confidential-guest-reset' property for x86 confidential vms
2026-02-19 9:27 ` Markus Armbruster
@ 2026-02-19 9:29 ` Ani Sinha
2026-02-19 10:19 ` Markus Armbruster
0 siblings, 1 reply; 60+ messages in thread
From: Ani Sinha @ 2026-02-19 9:29 UTC (permalink / raw)
To: Markus Armbruster
Cc: Paolo Bonzini, Daniel Berrange, Eduardo Habkost, Eric Blake,
Gerd Hoffmann, qemu-devel
> On 19 Feb 2026, at 2:57 PM, Markus Armbruster <armbru@redhat.com> wrote:
>
> Ani Sinha <anisinha@redhat.com> writes:
>
>>> On 19 Feb 2026, at 2:25 PM, Markus Armbruster <armbru@redhat.com> wrote:
>>>
>>> Ani Sinha <anisinha@redhat.com> writes:
>>>
>>>> Through the new 'confidential-guest-reset' property, control plane should be
>>>> able to detect if the hypervisor supports x86 confidential guest resets. Older
>>>> hypervisors that do not support resets will not have this property populated.
>>>
>>> Double-checking... This is an static ability of QEMU, and QEMU alone.
>>> It does not depend on QEMU's run-time environment (host kernel, ...) or
>>> the guest. Correct?
>>
>> The run time environment is the same as what is needed to spawn confidential guests. That is, the host should support confidential technology. The host kernel/distribution should support confidential technologies. Plus the guest should support confidential technologies. There is nothing additionally needed to support resets. There are no additional dependencies with host kernel/environment and/or the guest etc to support reset.
>
> So... if a QEMU with this feature succeeded at starting a confidential
> guest, then x86 confidential guest reset is definitely supported for
> that guest. Correct?
Yes :-)
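The outcome of this exchange is that the check is purely static: if the running QEMU exposes the 'confidential-guest-reset' property and the confidential guest started successfully, reset is expected to work, with no extra host or guest probing. A management-side sketch of that rule follows. It assumes the caller has already fetched the relevant type's property names (for example via the QMP qom-list-properties command; which QOM type carries the property is not shown in this excerpt). The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Property name taken from the patch subject. */
#define CGS_RESET_PROP "confidential-guest-reset"

/* True if the fetched property list advertises confidential guest reset. */
static bool qemu_has_cgs_reset(const char *const *props, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (props[i] != NULL && strcmp(props[i], CGS_RESET_PROP) == 0) {
            return true;
        }
    }
    return false; /* older QEMU: property absent */
}

/*
 * No run-time dependency beyond what is needed to start the
 * confidential guest in the first place.
 */
static bool cgs_reset_expected(const char *const *props, size_t n,
                               bool guest_started)
{
    return guest_started && qemu_has_cgs_reset(props, n);
}
```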
* Re: [PATCH v5 27/34] kvm/xen-emu: re-initialize capabilities during confidential guest reset
2026-02-18 11:42 ` [PATCH v5 27/34] kvm/xen-emu: re-initialize capabilities during " Ani Sinha
@ 2026-02-19 9:39 ` Paul Durrant
2026-02-19 10:31 ` Ani Sinha
0 siblings, 1 reply; 60+ messages in thread
From: Paul Durrant @ 2026-02-19 9:39 UTC (permalink / raw)
To: Ani Sinha, David Woodhouse, Paolo Bonzini, Marcelo Tosatti
Cc: kraxel, kvm, qemu-devel
On 18/02/2026 11:42, Ani Sinha wrote:
> On confidential guests KVM virtual machine file descriptor changes as a
> part of the guest reset process. Xen capabilities needs to be re-initialized in
> KVM against the new file descriptor.
>
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> ---
> target/i386/kvm/xen-emu.c | 50 +++++++++++++++++++++++++++++++++++++--
> 1 file changed, 48 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
> index 52de019834..69527145eb 100644
> --- a/target/i386/kvm/xen-emu.c
> +++ b/target/i386/kvm/xen-emu.c
> @@ -44,9 +44,12 @@
>
> #include "xen-compat.h"
>
> +NotifierWithReturn xen_vmfd_change_notifier;
> +static bool hyperv_enabled;
> static void xen_vcpu_singleshot_timer_event(void *opaque);
> static void xen_vcpu_periodic_timer_event(void *opaque);
> static int vcpuop_stop_singleshot_timer(CPUState *cs);
> +static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr);
>
> #ifdef TARGET_X86_64
> #define hypercall_compat32(longmode) (!(longmode))
> @@ -54,6 +57,30 @@ static int vcpuop_stop_singleshot_timer(CPUState *cs);
> #define hypercall_compat32(longmode) (false)
> #endif
>
> +static int xen_handle_vmfd_change(NotifierWithReturn *n,
> + void *data, Error** errp)
> +{
> + int ret;
> +
> + /* we are not interested in pre vmfd change notification */
> + if (((VmfdChangeNotifier *)data)->pre) {
> + return 0;
> + }
> +
> + ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR);
> + if (ret < 0) {
> + return ret;
> + }
> +
> + if (hyperv_enabled) {
> + ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR_HYPERV);
> + if (ret < 0) {
> + return ret;
> + }
> + }
> + return 0;
This seems odd. Why use the hyperv_enabled boolean, rather than simply
the msr value, since when hyperv_enabled is set you will be calling
do_initialize_xen_caps() twice.
> +}
> +
> static bool kvm_gva_to_gpa(CPUState *cs, uint64_t gva, uint64_t *gpa,
> size_t *len, bool is_write)
> {
> @@ -111,15 +138,16 @@ static inline int kvm_copy_to_gva(CPUState *cs, uint64_t gva, void *buf,
> return kvm_gva_rw(cs, gva, buf, sz, true);
> }
>
> -int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
> +static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr)
> {
> + int xen_caps, ret;
> const int required_caps = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
> KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL | KVM_XEN_HVM_CONFIG_SHARED_INFO;
> +
Gratuitous whitespace change.
> struct kvm_xen_hvm_config cfg = {
> .msr = hypercall_msr,
> .flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL,
> };
> - int xen_caps, ret;
>
> xen_caps = kvm_check_extension(s, KVM_CAP_XEN_HVM);
> if (required_caps & ~xen_caps) {
> @@ -143,6 +171,21 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
> strerror(-ret));
> return ret;
> }
> + return xen_caps;
> +}
> +
> +int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
> +{
> + int xen_caps;
> +
> + xen_caps = do_initialize_xen_caps(s, hypercall_msr);
> + if (xen_caps < 0) {
> + return xen_caps;
> + }
> +
Clearly the code here would be simpler if you just saved the value
of hypercall_msr which you have used in the call above.
> + if (!hyperv_enabled && (hypercall_msr == XEN_HYPERCALL_MSR_HYPERV)) {
> + hyperv_enabled = true;
> + }
>
> /* If called a second time, don't repeat the rest of the setup. */
> if (s->xen_caps) {
> @@ -185,6 +228,9 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
> xen_primary_console_reset();
> xen_xenstore_reset();
>
> + xen_vmfd_change_notifier.notify = xen_handle_vmfd_change;
> + kvm_vmfd_add_change_notifier(&xen_vmfd_change_notifier);
> +
> return 0;
> }
>
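The handler above acts only on the post-change notification and ignores the pre-change one. That two-phase pattern can be sketched as a small standalone C program. This is an illustration under stated assumptions, not QEMU's NotifierWithReturn API: VmfdChangeData, VmfdNotifierList, vmfd_notify_all(), simulate_vmfd_change() and xen_like_handler() are all hypothetical names, and the fd swap is simulated with plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the notifier payload; only the 'pre' flag
 * seen in the patch is modelled. */
typedef struct {
    bool pre;   /* true: old VM fd is about to go away; false: new fd is live */
} VmfdChangeData;

typedef int (*vmfd_notify_fn)(void *opaque, const VmfdChangeData *data);

#define MAX_NOTIFIERS 8

typedef struct {
    vmfd_notify_fn fn[MAX_NOTIFIERS];
    void *opaque[MAX_NOTIFIERS];
    size_t count;
} VmfdNotifierList;

static void vmfd_add_notifier(VmfdNotifierList *l, vmfd_notify_fn fn,
                              void *opaque)
{
    assert(l->count < MAX_NOTIFIERS);
    l->fn[l->count] = fn;
    l->opaque[l->count] = opaque;
    l->count++;
}

/* Run every notifier for one phase, stopping at the first error. */
static int vmfd_notify_all(VmfdNotifierList *l, bool pre)
{
    VmfdChangeData data = { .pre = pre };

    for (size_t i = 0; i < l->count; i++) {
        int ret = l->fn[i](l->opaque[i], &data);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* Simulated reset: pre-change phase, fd swap, post-change phase. */
static int simulate_vmfd_change(VmfdNotifierList *l, int *vmfd, int new_fd)
{
    int ret = vmfd_notify_all(l, true);
    if (ret < 0) {
        return ret;
    }
    *vmfd = new_fd;   /* stands in for closing the old VM fd and creating a new one */
    return vmfd_notify_all(l, false);
}

/* Subscriber shaped like xen_handle_vmfd_change(): ignores the pre phase. */
typedef struct {
    int reinit_count;
} XenLikeState;

static int xen_like_handler(void *opaque, const VmfdChangeData *data)
{
    XenLikeState *s = opaque;

    if (data->pre) {
        return 0;
    }
    s->reinit_count++;   /* where the real code re-configures Xen on the new fd */
    return 0;
}
```

Because the subscriber filters on `pre`, it runs exactly once per reset, after the new fd exists.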
* Re: [PATCH v5 33/34] qom: add 'confidential-guest-reset' property for x86 confidential vms
2026-02-19 9:29 ` Ani Sinha
@ 2026-02-19 10:19 ` Markus Armbruster
0 siblings, 0 replies; 60+ messages in thread
From: Markus Armbruster @ 2026-02-19 10:19 UTC (permalink / raw)
To: Ani Sinha
Cc: Paolo Bonzini, Daniel Berrange, Eduardo Habkost, Eric Blake,
Gerd Hoffmann, qemu-devel
Ani Sinha <anisinha@redhat.com> writes:
>> On 19 Feb 2026, at 2:57 PM, Markus Armbruster <armbru@redhat.com> wrote:
>>
>> Ani Sinha <anisinha@redhat.com> writes:
>>
>>>> On 19 Feb 2026, at 2:25 PM, Markus Armbruster <armbru@redhat.com> wrote:
>>>>
>>>> Ani Sinha <anisinha@redhat.com> writes:
>>>>
>>>>> Through the new 'confidential-guest-reset' property, the control plane should be
>>>>> able to detect if the hypervisor supports x86 confidential guest resets. Older
>>>>> hypervisors that do not support resets will not have this property populated.
>>>>
>>>> Double-checking... This is a static ability of QEMU, and QEMU alone.
>>>> It does not depend on QEMU's run-time environment (host kernel, ...) or
>>>> the guest. Correct?
>>>
>>> The run-time environment is the same as what is needed to spawn confidential guests: the host, the host kernel/distribution and the guest must all support the confidential technology in use. Nothing additional is needed to support resets; there are no extra dependencies on the host kernel/environment or the guest.
>>
>> So... if a QEMU with this feature succeeded at starting a confidential
>> guest, then x86 confidential guest reset is definitely supported for
>> that guest. Correct?
>
> Yes :-)
Thank you!
Reviewed-by: Markus Armbruster <armbru@redhat.com>
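Given the semantics confirmed above (a static QEMU capability with no extra run-time dependencies), a control plane only needs to test for the presence of the property. A minimal sketch, assuming the control plane has already obtained a list of property names from QEMU (for example from QMP introspection, not shown); has_property() and qemu_supports_coco_reset() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if 'name' appears in the list of property names. */
static bool has_property(const char *const *props, size_t n, const char *name)
{
    assert(props != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(props[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Older QEMU binaries simply do not expose the property. */
static bool qemu_supports_coco_reset(const char *const *props, size_t n)
{
    return has_property(props, n, "confidential-guest-reset");
}
```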
* Re: [PATCH v5 27/34] kvm/xen-emu: re-initialize capabilities during confidential guest reset
2026-02-19 9:39 ` Paul Durrant
@ 2026-02-19 10:31 ` Ani Sinha
2026-02-19 10:45 ` Paul Durrant
0 siblings, 1 reply; 60+ messages in thread
From: Ani Sinha @ 2026-02-19 10:31 UTC (permalink / raw)
To: Paul Durrant
Cc: David Woodhouse, Paolo Bonzini, Marcelo Tosatti, Gerd Hoffmann,
kvm, qemu-devel
> On 19 Feb 2026, at 3:09 PM, Paul Durrant <xadimgnik@gmail.com> wrote:
>
> On 18/02/2026 11:42, Ani Sinha wrote:
>> On confidential guests, the KVM virtual machine file descriptor changes as
>> part of the guest reset process. Xen capabilities need to be re-initialized in
>> KVM against the new file descriptor.
>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>> ---
>> target/i386/kvm/xen-emu.c | 50 +++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 48 insertions(+), 2 deletions(-)
>> diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
>> index 52de019834..69527145eb 100644
>> --- a/target/i386/kvm/xen-emu.c
>> +++ b/target/i386/kvm/xen-emu.c
>> @@ -44,9 +44,12 @@
>> #include "xen-compat.h"
>> +NotifierWithReturn xen_vmfd_change_notifier;
>> +static bool hyperv_enabled;
>> static void xen_vcpu_singleshot_timer_event(void *opaque);
>> static void xen_vcpu_periodic_timer_event(void *opaque);
>> static int vcpuop_stop_singleshot_timer(CPUState *cs);
>> +static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr);
>> #ifdef TARGET_X86_64
>> #define hypercall_compat32(longmode) (!(longmode))
>> @@ -54,6 +57,30 @@ static int vcpuop_stop_singleshot_timer(CPUState *cs);
>> #define hypercall_compat32(longmode) (false)
>> #endif
>> +static int xen_handle_vmfd_change(NotifierWithReturn *n,
>> + void *data, Error** errp)
>> +{
>> + int ret;
>> +
>> + /* we are not interested in pre vmfd change notification */
>> + if (((VmfdChangeNotifier *)data)->pre) {
>> + return 0;
>> + }
>> +
>> + ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> +
>> + if (hyperv_enabled) {
>> + ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR_HYPERV);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> + }
>> + return 0;
>
> This seems odd. Why use the hyperv_enabled boolean, rather than simply the msr value, since when hyperv_enabled is set you will be calling do_initialize_xen_caps() twice.
I am not sure how capabilities are enabled for Xen. I assumed we need to call kvm_xen_init() twice: once normally with XEN_HYPERCALL_MSR and, if Hyper-V is enabled, again with XEN_HYPERCALL_MSR_HYPERV. Is that not the case? Is it one or the other but not both? It seems kvm_arch_init() calls kvm_xen_init() once with XEN_HYPERCALL_MSR, and vcpu_arch_init() calls it again with XEN_HYPERCALL_MSR_HYPERV if Hyper-V is enabled.
>
>> +}
>> +
>> static bool kvm_gva_to_gpa(CPUState *cs, uint64_t gva, uint64_t *gpa,
>> size_t *len, bool is_write)
>> {
>> @@ -111,15 +138,16 @@ static inline int kvm_copy_to_gva(CPUState *cs, uint64_t gva, void *buf,
>> return kvm_gva_rw(cs, gva, buf, sz, true);
>> }
>> -int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
>> +static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr)
>> {
>> + int xen_caps, ret;
>> const int required_caps = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
>> KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL | KVM_XEN_HVM_CONFIG_SHARED_INFO;
>> +
>
> Gratuitous whitespace change.
>
>> struct kvm_xen_hvm_config cfg = {
>> .msr = hypercall_msr,
>> .flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL,
>> };
>> - int xen_caps, ret;
>> xen_caps = kvm_check_extension(s, KVM_CAP_XEN_HVM);
>> if (required_caps & ~xen_caps) {
>> @@ -143,6 +171,21 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
>> strerror(-ret));
>> return ret;
>> }
>> + return xen_caps;
>> +}
>> +
>> +int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
>> +{
>> + int xen_caps;
>> +
>> + xen_caps = do_initialize_xen_caps(s, hypercall_msr);
>> + if (xen_caps < 0) {
>> + return xen_caps;
>> + }
>> +
>
> Clearly the code here would be simpler if you just saved the value of hypercall_msr which you have used in the call above.
>
>> + if (!hyperv_enabled && (hypercall_msr == XEN_HYPERCALL_MSR_HYPERV)) {
>> + hyperv_enabled = true;
>> + }
>> /* If called a second time, don't repeat the rest of the setup. */
>> if (s->xen_caps) {
>> @@ -185,6 +228,9 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
>> xen_primary_console_reset();
>> xen_xenstore_reset();
>> + xen_vmfd_change_notifier.notify = xen_handle_vmfd_change;
>> + kvm_vmfd_add_change_notifier(&xen_vmfd_change_notifier);
>> +
>> return 0;
>> }
* Re: [PATCH v5 27/34] kvm/xen-emu: re-initialize capabilities during confidential guest reset
2026-02-19 10:31 ` Ani Sinha
@ 2026-02-19 10:45 ` Paul Durrant
2026-02-19 11:19 ` Ani Sinha
0 siblings, 1 reply; 60+ messages in thread
From: Paul Durrant @ 2026-02-19 10:45 UTC (permalink / raw)
To: Ani Sinha, Paul Durrant
Cc: David Woodhouse, Paolo Bonzini, Marcelo Tosatti, Gerd Hoffmann,
kvm, qemu-devel
On 19/02/2026 10:31, Ani Sinha wrote:
>
>
>> On 19 Feb 2026, at 3:09 PM, Paul Durrant <xadimgnik@gmail.com> wrote:
>>
>> On 18/02/2026 11:42, Ani Sinha wrote:
>>> On confidential guests, the KVM virtual machine file descriptor changes as
>>> part of the guest reset process. Xen capabilities need to be re-initialized in
>>> KVM against the new file descriptor.
>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>> ---
>>> target/i386/kvm/xen-emu.c | 50 +++++++++++++++++++++++++++++++++++++--
>>> 1 file changed, 48 insertions(+), 2 deletions(-)
>>> diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
>>> index 52de019834..69527145eb 100644
>>> --- a/target/i386/kvm/xen-emu.c
>>> +++ b/target/i386/kvm/xen-emu.c
>>> @@ -44,9 +44,12 @@
>>> #include "xen-compat.h"
>>> +NotifierWithReturn xen_vmfd_change_notifier;
>>> +static bool hyperv_enabled;
>>> static void xen_vcpu_singleshot_timer_event(void *opaque);
>>> static void xen_vcpu_periodic_timer_event(void *opaque);
>>> static int vcpuop_stop_singleshot_timer(CPUState *cs);
>>> +static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr);
>>> #ifdef TARGET_X86_64
>>> #define hypercall_compat32(longmode) (!(longmode))
>>> @@ -54,6 +57,30 @@ static int vcpuop_stop_singleshot_timer(CPUState *cs);
>>> #define hypercall_compat32(longmode) (false)
>>> #endif
>>> +static int xen_handle_vmfd_change(NotifierWithReturn *n,
>>> + void *data, Error** errp)
>>> +{
>>> + int ret;
>>> +
>>> + /* we are not interested in pre vmfd change notification */
>>> + if (((VmfdChangeNotifier *)data)->pre) {
>>> + return 0;
>>> + }
>>> +
>>> + ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR);
>>> + if (ret < 0) {
>>> + return ret;
>>> + }
>>> +
>>> + if (hyperv_enabled) {
>>> + ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR_HYPERV);
>>> + if (ret < 0) {
>>> + return ret;
>>> + }
>>> + }
>>> + return 0;
>>
>> This seems odd. Why use the hyperv_enabled boolean, rather than simply the msr value, since when hyperv_enabled is set you will be calling do_initialize_xen_caps() twice.
>
> I am not sure how capabilities are enabled for Xen. I assumed we need to call kvm_xen_init() twice: once normally with XEN_HYPERCALL_MSR and, if Hyper-V is enabled, again with XEN_HYPERCALL_MSR_HYPERV. Is that not the case? Is it one or the other but not both? It seems kvm_arch_init() calls kvm_xen_init() once with XEN_HYPERCALL_MSR, and vcpu_arch_init() calls it again with XEN_HYPERCALL_MSR_HYPERV if Hyper-V is enabled.
Yes, it has to be assumed that XEN_HYPERCALL_MSR is correct until
Hyper-V support is enabled, which comes later, at which point the MSR
is changed. So you only need to save the latest MSR value and use that in
xen_handle_vmfd_change().
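The suggestion amounts to remembering the MSR from the most recent successful kvm_xen_init() call and replaying only that one after the VM fd changes. A minimal standalone sketch: fake_initialize_xen_caps() stands in for do_initialize_xen_caps(), kvm_xen_init_sketch() and xen_vmfd_changed_sketch() are hypothetical, and the two MSR constants are assumed values that should be checked against QEMU's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; verify against QEMU's xen-emu.h. */
#define XEN_HYPERCALL_MSR        0x40000000u
#define XEN_HYPERCALL_MSR_HYPERV 0x40000200u

static uint32_t xen_saved_msr;        /* latest MSR configured successfully */
static unsigned config_calls;         /* counts simulated configuration calls */
static uint32_t last_configured_msr;  /* MSR passed to the latest call */

/* Stand-in for do_initialize_xen_caps(): just records its argument. */
static int fake_initialize_xen_caps(uint32_t msr)
{
    config_calls++;
    last_configured_msr = msr;
    return 0;
}

/* Init path: called first with the Xen MSR, later possibly with the
 * Hyper-V one; each successful call overwrites the saved value. */
static int kvm_xen_init_sketch(uint32_t msr)
{
    int ret = fake_initialize_xen_caps(msr);
    if (ret < 0) {
        return ret;
    }
    xen_saved_msr = msr;
    return 0;
}

/* Post-change handler: exactly one reconfiguration, using the latest MSR. */
static int xen_vmfd_changed_sketch(void)
{
    assert(xen_saved_msr != 0);   /* init must have run before any reset */
    return fake_initialize_xen_caps(xen_saved_msr);
}
```

Compared with a boolean plus two calls, the reset path issues a single configuration that matches whatever the guest last saw.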
>
>>
>>> +}
>>> +
>>> static bool kvm_gva_to_gpa(CPUState *cs, uint64_t gva, uint64_t *gpa,
>>> size_t *len, bool is_write)
>>> {
>>> @@ -111,15 +138,16 @@ static inline int kvm_copy_to_gva(CPUState *cs, uint64_t gva, void *buf,
>>> return kvm_gva_rw(cs, gva, buf, sz, true);
>>> }
>>> -int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
>>> +static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr)
>>> {
>>> + int xen_caps, ret;
>>> const int required_caps = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
>>> KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL | KVM_XEN_HVM_CONFIG_SHARED_INFO;
>>> +
>>
>> Gratuitous whitespace change.
>>
>>> struct kvm_xen_hvm_config cfg = {
>>> .msr = hypercall_msr,
>>> .flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL,
>>> };
>>> - int xen_caps, ret;
>>> xen_caps = kvm_check_extension(s, KVM_CAP_XEN_HVM);
>>> if (required_caps & ~xen_caps) {
>>> @@ -143,6 +171,21 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
>>> strerror(-ret));
>>> return ret;
>>> }
>>> + return xen_caps;
>>> +}
>>> +
>>> +int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
>>> +{
>>> + int xen_caps;
>>> +
>>> + xen_caps = do_initialize_xen_caps(s, hypercall_msr);
>>> + if (xen_caps < 0) {
>>> + return xen_caps;
>>> + }
>>> +
>>
>> Clearly the code here would be simpler if you just saved the value of hypercall_msr which you have used in the call above.
>>
>>> + if (!hyperv_enabled && (hypercall_msr == XEN_HYPERCALL_MSR_HYPERV)) {
>>> + hyperv_enabled = true;
>>> + }
>>> /* If called a second time, don't repeat the rest of the setup. */
>>> if (s->xen_caps) {
>>> @@ -185,6 +228,9 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
>>> xen_primary_console_reset();
>>> xen_xenstore_reset();
>>> + xen_vmfd_change_notifier.notify = xen_handle_vmfd_change;
>>> + kvm_vmfd_add_change_notifier(&xen_vmfd_change_notifier);
>>> +
>>> return 0;
>>> }
>
>
* Re: [PATCH v5 27/34] kvm/xen-emu: re-initialize capabilities during confidential guest reset
2026-02-19 10:45 ` Paul Durrant
@ 2026-02-19 11:19 ` Ani Sinha
2026-02-19 11:40 ` Paul Durrant
0 siblings, 1 reply; 60+ messages in thread
From: Ani Sinha @ 2026-02-19 11:19 UTC (permalink / raw)
To: Paul Durrant
Cc: David Woodhouse, Paolo Bonzini, Marcelo Tosatti, Gerd Hoffmann,
kvm, qemu-devel
On Thu, Feb 19, 2026 at 4:15 PM Paul Durrant <xadimgnik@gmail.com> wrote:
>
> On 19/02/2026 10:31, Ani Sinha wrote:
> >
> >
> >> On 19 Feb 2026, at 3:09 PM, Paul Durrant <xadimgnik@gmail.com> wrote:
> >>
> >> On 18/02/2026 11:42, Ani Sinha wrote:
> >>> On confidential guests, the KVM virtual machine file descriptor changes as
> >>> part of the guest reset process. Xen capabilities need to be re-initialized in
> >>> KVM against the new file descriptor.
> >>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> >>> ---
> >>> target/i386/kvm/xen-emu.c | 50 +++++++++++++++++++++++++++++++++++++--
> >>> 1 file changed, 48 insertions(+), 2 deletions(-)
> >>> diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
> >>> index 52de019834..69527145eb 100644
> >>> --- a/target/i386/kvm/xen-emu.c
> >>> +++ b/target/i386/kvm/xen-emu.c
> >>> @@ -44,9 +44,12 @@
> >>> #include "xen-compat.h"
> >>> +NotifierWithReturn xen_vmfd_change_notifier;
> >>> +static bool hyperv_enabled;
> >>> static void xen_vcpu_singleshot_timer_event(void *opaque);
> >>> static void xen_vcpu_periodic_timer_event(void *opaque);
> >>> static int vcpuop_stop_singleshot_timer(CPUState *cs);
> >>> +static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr);
> >>> #ifdef TARGET_X86_64
> >>> #define hypercall_compat32(longmode) (!(longmode))
> >>> @@ -54,6 +57,30 @@ static int vcpuop_stop_singleshot_timer(CPUState *cs);
> >>> #define hypercall_compat32(longmode) (false)
> >>> #endif
> >>> +static int xen_handle_vmfd_change(NotifierWithReturn *n,
> >>> + void *data, Error** errp)
> >>> +{
> >>> + int ret;
> >>> +
> >>> + /* we are not interested in pre vmfd change notification */
> >>> + if (((VmfdChangeNotifier *)data)->pre) {
> >>> + return 0;
> >>> + }
> >>> +
> >>> + ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR);
> >>> + if (ret < 0) {
> >>> + return ret;
> >>> + }
> >>> +
> >>> + if (hyperv_enabled) {
> >>> + ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR_HYPERV);
> >>> + if (ret < 0) {
> >>> + return ret;
> >>> + }
> >>> + }
> >>> + return 0;
> >>
> >> This seems odd. Why use the hyperv_enabled boolean, rather than simply the msr value, since when hyperv_enabled is set you will be calling do_initialize_xen_caps() twice.
> >
> > I am not sure how capabilities are enabled for Xen. I assumed we need to call kvm_xen_init() twice: once normally with XEN_HYPERCALL_MSR and, if Hyper-V is enabled, again with XEN_HYPERCALL_MSR_HYPERV. Is that not the case? Is it one or the other but not both? It seems kvm_arch_init() calls kvm_xen_init() once with XEN_HYPERCALL_MSR, and vcpu_arch_init() calls it again with XEN_HYPERCALL_MSR_HYPERV if Hyper-V is enabled.
>
> Yes, it has to be assumed that XEN_HYPERCALL_MSR is correct until
> Hyper-V support is enabled, which comes later, at which point the MSR
> is changed. So you only need to save the latest MSR value and use that in
> xen_handle_vmfd_change().
ok hopefully this looks good
https://gitlab.com/anisinha/qemu/-/commit/7f7ba25151b6a658c54f95a370f1970c01a6269a
sending this out to minimize churn and to make v6 as close to
merge-worthy as possible.
* Re: [PATCH v5 27/34] kvm/xen-emu: re-initialize capabilities during confidential guest reset
2026-02-19 11:19 ` Ani Sinha
@ 2026-02-19 11:40 ` Paul Durrant
0 siblings, 0 replies; 60+ messages in thread
From: Paul Durrant @ 2026-02-19 11:40 UTC (permalink / raw)
To: Ani Sinha, Paul Durrant
Cc: David Woodhouse, Paolo Bonzini, Marcelo Tosatti, Gerd Hoffmann,
kvm, qemu-devel
On 19/02/2026 11:19, Ani Sinha wrote:
> On Thu, Feb 19, 2026 at 4:15 PM Paul Durrant <xadimgnik@gmail.com> wrote:
>>
>> On 19/02/2026 10:31, Ani Sinha wrote:
>>>
>>>
>>>> On 19 Feb 2026, at 3:09 PM, Paul Durrant <xadimgnik@gmail.com> wrote:
>>>>
>>>> On 18/02/2026 11:42, Ani Sinha wrote:
>>>>> On confidential guests, the KVM virtual machine file descriptor changes as
>>>>> part of the guest reset process. Xen capabilities need to be re-initialized in
>>>>> KVM against the new file descriptor.
>>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>>> ---
>>>>> target/i386/kvm/xen-emu.c | 50 +++++++++++++++++++++++++++++++++++++--
>>>>> 1 file changed, 48 insertions(+), 2 deletions(-)
>>>>> diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
>>>>> index 52de019834..69527145eb 100644
>>>>> --- a/target/i386/kvm/xen-emu.c
>>>>> +++ b/target/i386/kvm/xen-emu.c
>>>>> @@ -44,9 +44,12 @@
>>>>> #include "xen-compat.h"
>>>>> +NotifierWithReturn xen_vmfd_change_notifier;
>>>>> +static bool hyperv_enabled;
>>>>> static void xen_vcpu_singleshot_timer_event(void *opaque);
>>>>> static void xen_vcpu_periodic_timer_event(void *opaque);
>>>>> static int vcpuop_stop_singleshot_timer(CPUState *cs);
>>>>> +static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr);
>>>>> #ifdef TARGET_X86_64
>>>>> #define hypercall_compat32(longmode) (!(longmode))
>>>>> @@ -54,6 +57,30 @@ static int vcpuop_stop_singleshot_timer(CPUState *cs);
>>>>> #define hypercall_compat32(longmode) (false)
>>>>> #endif
>>>>> +static int xen_handle_vmfd_change(NotifierWithReturn *n,
>>>>> + void *data, Error** errp)
>>>>> +{
>>>>> + int ret;
>>>>> +
>>>>> + /* we are not interested in pre vmfd change notification */
>>>>> + if (((VmfdChangeNotifier *)data)->pre) {
>>>>> + return 0;
>>>>> + }
>>>>> +
>>>>> + ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR);
>>>>> + if (ret < 0) {
>>>>> + return ret;
>>>>> + }
>>>>> +
>>>>> + if (hyperv_enabled) {
>>>>> + ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR_HYPERV);
>>>>> + if (ret < 0) {
>>>>> + return ret;
>>>>> + }
>>>>> + }
>>>>> + return 0;
>>>>
>>>> This seems odd. Why use the hyperv_enabled boolean, rather than simply the msr value, since when hyperv_enabled is set you will be calling do_initialize_xen_caps() twice.
>>>
>>> I am not sure how capabilities are enabled for Xen. I assumed we need to call kvm_xen_init() twice: once normally with XEN_HYPERCALL_MSR and, if Hyper-V is enabled, again with XEN_HYPERCALL_MSR_HYPERV. Is that not the case? Is it one or the other but not both? It seems kvm_arch_init() calls kvm_xen_init() once with XEN_HYPERCALL_MSR, and vcpu_arch_init() calls it again with XEN_HYPERCALL_MSR_HYPERV if Hyper-V is enabled.
>>
>> Yes, it has to be assumed that XEN_HYPERCALL_MSR is correct until
>> Hyper-V support is enabled, which comes later, at which point the MSR
>> is changed. So you only need to save the latest MSR value and use that in
>> xen_handle_vmfd_change().
>
> ok hopefully this looks good
> https://gitlab.com/anisinha/qemu/-/commit/7f7ba25151b6a658c54f95a370f1970c01a6269a
>
> sending this out to minimize churn and to make v6 as close to
> merge-worthy as possible.
>
Yeah, that looks better. I don't think you need to move the `int
xen_caps, ret;` line though so your patch can be even smaller AFAICS.
* Re: [PATCH v5 26/34] hw/hyperv/vmbus: add support for confidential guest reset
2026-02-18 11:42 ` [PATCH v5 26/34] hw/hyperv/vmbus: add support for confidential guest reset Ani Sinha
@ 2026-02-19 18:34 ` Maciej S. Szmigiero
0 siblings, 0 replies; 60+ messages in thread
From: Maciej S. Szmigiero @ 2026-02-19 18:34 UTC (permalink / raw)
To: Ani Sinha; +Cc: kraxel, qemu-devel
On 18.02.2026 12:42, Ani Sinha wrote:
> On confidential guests, when the KVM virtual machine file descriptor changes as
> part of the reset process, event file descriptors need to be reassociated
> with the new KVM VM file descriptor. This is achieved with the help of a
> callback handler that is called when the KVM VM file descriptor changes during
> the confidential guest reset process.
>
> This patch has been tested on non-confidential platforms only.
>
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> ---
I have also tested this patch with a non-confidential Windows guest
with VMBus, including live migrating it a few times.
No regressions observed, so:
Acked-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Thanks,
Maciej
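The reassociation step described in the commit message can be sketched as a loop that re-registers every eventfd against the new VM fd. A minimal standalone model under stated assumptions: EventBinding, bind_eventfd() and rebind_all() are hypothetical, and bind_eventfd() only records the new fd where a real implementation would issue a KVM VM ioctl (for VMBus, presumably KVM_HYPERV_EVENTFD; that is an assumption, not taken from the patch).

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int eventfd;      /* notification fd owned by the device model */
    int bound_vmfd;   /* VM fd this eventfd is currently associated with */
} EventBinding;

/* Hypothetical stand-in for the per-eventfd KVM registration. */
static int bind_eventfd(EventBinding *b, int vmfd)
{
    if (b->eventfd < 0 || vmfd < 0) {
        return -1;
    }
    b->bound_vmfd = vmfd;
    return 0;
}

/* Callback body: re-associate every eventfd with the new VM fd,
 * stopping at the first failure. */
static int rebind_all(EventBinding *bindings, size_t n, int new_vmfd)
{
    assert(bindings != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int ret = bind_eventfd(&bindings[i], new_vmfd);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

The eventfds themselves survive the reset; only their association with the VM fd has to be re-established.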
* Re: [PATCH v5 11/34] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset
2026-02-18 11:42 ` [PATCH v5 11/34] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset Ani Sinha
@ 2026-02-20 15:01 ` Michael S. Tsirkin
0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2026-02-20 15:01 UTC (permalink / raw)
To: Ani Sinha
Cc: Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, kraxel, qemu-devel
On Wed, Feb 18, 2026 at 05:12:04PM +0530, Ani Sinha wrote:
> For confidential guests, the BIOS image must be reinitialized upon reset. This
> is because BIOS memory is encrypted, so once the old confidential KVM
> context is destroyed, it cannot be decrypted.
> To make reinitialization possible, this change refactors x86_bios_rom_init()
> so that parts of it can be called during confidential guest reset.
>
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
just a refactor:
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> hw/i386/x86-common.c | 50 ++++++++++++++++++++++++++++++++------------
> 1 file changed, 37 insertions(+), 13 deletions(-)
>
> diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
> index de4cd7650a..c98abaf368 100644
> --- a/hw/i386/x86-common.c
> +++ b/hw/i386/x86-common.c
> @@ -1020,17 +1020,11 @@ void x86_isa_bios_init(MemoryRegion *isa_bios, MemoryRegion *isa_memory,
> memory_region_set_readonly(isa_bios, read_only);
> }
>
> -void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
> - MemoryRegion *rom_memory, bool isapc_ram_fw)
> +static int get_bios_size(X86MachineState *x86ms,
> + const char *bios_name, char *filename)
> {
> - const char *bios_name;
> - char *filename;
> int bios_size;
> - ssize_t ret;
>
> - /* BIOS load */
> - bios_name = MACHINE(x86ms)->firmware ?: default_firmware;
> - filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
> if (filename) {
> bios_size = get_image_size(filename, NULL);
> } else {
> @@ -1040,6 +1034,21 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
> (bios_size % 65536) != 0) {
> goto bios_error;
> }
> +
> + return bios_size;
> +
> + bios_error:
> + fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
> + exit(1);
> +}
> +
> +static void load_bios_from_file(X86MachineState *x86ms, const char *bios_name,
> + char *filename, int bios_size,
> + bool isapc_ram_fw)
> +{
> + ssize_t ret;
> +
> + /* BIOS load */
> if (machine_require_guest_memfd(MACHINE(x86ms))) {
> memory_region_init_ram_guest_memfd(&x86ms->bios, NULL, "pc.bios",
> bios_size, &error_fatal);
> @@ -1068,7 +1077,26 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
> goto bios_error;
> }
> }
> - g_free(filename);
> +
> + return;
> +
> + bios_error:
> + fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
> + exit(1);
> +}
> +
> +void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
> + MemoryRegion *rom_memory, bool isapc_ram_fw)
> +{
> + int bios_size;
> + const char *bios_name;
> + g_autofree char *filename;
> +
> + bios_name = MACHINE(x86ms)->firmware ?: default_firmware;
> + filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
> +
> + bios_size = get_bios_size(x86ms, bios_name, filename);
> + load_bios_from_file(x86ms, bios_name, filename, bios_size, isapc_ram_fw);
>
> if (!machine_require_guest_memfd(MACHINE(x86ms))) {
> /* map the last 128KB of the BIOS in ISA space */
> @@ -1081,8 +1109,4 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
> (uint32_t)(-bios_size),
> &x86ms->bios);
> return;
> -
> -bios_error:
> - fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
> - exit(1);
> }
> --
> 2.42.0
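The size check and address arithmetic that the refactored helpers preserve can be worked through in a small standalone sketch. bios_size_ok() mirrors the 64 KiB rule in get_bios_size() (its `bios_size <= 0` half is assumed, since that line falls outside the hunk context); bios_base_below_4g() mirrors the `(uint32_t)(-bios_size)` mapping; the 128 KiB ISA window ending at 1 MiB is assumed from x86_isa_bios_init(), which the hunk does not show, and applies only when guest_memfd is not required. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KiB 1024u

/* Positive and a multiple of 64 KiB, as checked before loading. */
static bool bios_size_ok(int64_t bios_size)
{
    return bios_size > 0 && (bios_size % 65536) == 0;
}

/* The whole image is mapped so that it ends exactly at 4 GiB:
 * (uint32_t)(-bios_size) == 2^32 - bios_size. */
static uint32_t bios_base_below_4g(int bios_size)
{
    assert(bios_size_ok(bios_size));
    return (uint32_t)(-bios_size);
}

/* At most the last 128 KiB of the image is aliased below 1 MiB. */
static uint32_t isa_bios_window_size(uint32_t bios_size)
{
    return bios_size < 128 * KiB ? bios_size : 128 * KiB;
}

static uint32_t isa_bios_window_base(uint32_t bios_size)
{
    return 0x100000u - isa_bios_window_size(bios_size);
}
```

For a 2 MiB image this gives a base of 0xFFE00000 below 4 GiB and an ISA alias starting at 0xE0000.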
* Re: [PATCH v5 33/34] qom: add 'confidential-guest-reset' property for x86 confidential vms
2026-02-18 11:42 ` [PATCH v5 33/34] qom: add 'confidential-guest-reset' property for x86 confidential vms Ani Sinha
2026-02-19 8:55 ` Markus Armbruster
@ 2026-02-23 10:08 ` Daniel P. Berrangé
1 sibling, 0 replies; 60+ messages in thread
From: Daniel P. Berrangé @ 2026-02-23 10:08 UTC (permalink / raw)
To: Ani Sinha
Cc: Paolo Bonzini, Eduardo Habkost, Eric Blake, Markus Armbruster,
kraxel, qemu-devel
On Wed, Feb 18, 2026 at 05:12:26PM +0530, Ani Sinha wrote:
> Through the new 'confidential-guest-reset' property, the control plane should be
> able to detect if the hypervisor supports x86 confidential guest resets. Older
> hypervisors that do not support resets will not have this property populated.
>
> Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> ---
> qapi/qom.json | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
With regards,
Daniel
--
|: https://berrange.com ~~ https://hachyderm.io/@berrange :|
|: https://libvirt.org ~~ https://entangle-photo.org :|
|: https://pixelfed.art/berrange ~~ https://fstop138.berrange.com :|
* Re: [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
2026-02-18 17:33 ` Ani Sinha
2026-02-18 17:39 ` Cédric Le Goater
@ 2026-02-23 11:56 ` Gerd Hoffmann
1 sibling, 0 replies; 60+ messages in thread
From: Gerd Hoffmann @ 2026-02-23 11:56 UTC (permalink / raw)
To: Ani Sinha
Cc: Cédric Le Goater, Alex Williamson, qemu-devel, Paolo Bonzini
Hi,
> > SEV-SNP on a RHEL9 host. Same guest I used before and host says :
> >
> > [1816531.409591] kvm_amd: SEV-ES guest requested termination: 0x0:0x0
>
> Strange! I am not sure why KVM thinks it's SEV-ES.
Historical reasons / just not checking I guess. It's a code path which
was added with SEV-ES support and it is used with SEV-SNP too. So it
means "SEV-ES or newer".
take care,
Gerd
end of thread, other threads:[~2026-02-23 11:56 UTC | newest]
Thread overview: 60+ messages
2026-02-18 11:41 [PATCH v5 00/34] Introduce support for confidential guest reset (x86) Ani Sinha
2026-02-18 11:41 ` [PATCH v5 01/34] i386/kvm: avoid installing duplicate msr entries in msr_handlers Ani Sinha
2026-02-18 11:41 ` [PATCH v5 02/34] accel/kvm: add confidential class member to indicate guest rebuild capability Ani Sinha
2026-02-18 11:41 ` [PATCH v5 03/34] hw/accel: add a per-accelerator callback to change VM accelerator handle Ani Sinha
2026-02-18 11:41 ` [PATCH v5 04/34] system/physmem: add helper to reattach existing memory after KVM VM fd change Ani Sinha
2026-02-18 11:41 ` [PATCH v5 05/34] accel/kvm: add changes required to support KVM VM file descriptor change Ani Sinha
2026-02-18 11:41 ` [PATCH v5 06/34] accel/kvm: add a notifier to indicate KVM VM file descriptor has changed Ani Sinha
2026-02-18 11:42 ` [PATCH v5 07/34] accel/kvm: notify when KVM VM file fd is about to be changed Ani Sinha
2026-02-18 11:42 ` [PATCH v5 08/34] i386/kvm: unregister smram listeners prior to vm file descriptor change Ani Sinha
2026-02-18 11:42 ` [PATCH v5 09/34] kvm/i386: implement architecture support for kvm " Ani Sinha
2026-02-18 11:42 ` [PATCH v5 10/34] i386/kvm: refactor xen init into a new function Ani Sinha
2026-02-18 11:42 ` [PATCH v5 11/34] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset Ani Sinha
2026-02-20 15:01 ` Michael S. Tsirkin
2026-02-18 11:42 ` [PATCH v5 12/34] hw/i386: export a new function x86_bios_rom_reload Ani Sinha
2026-02-18 11:42 ` [PATCH v5 13/34] kvm/i386: reload firmware for confidential guest reset Ani Sinha
2026-02-18 11:42 ` [PATCH v5 14/34] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset Ani Sinha
2026-02-18 11:42 ` [PATCH v5 15/34] i386/tdx: refactor TDX firmware memory initialization code into a new function Ani Sinha
2026-02-18 11:42 ` [PATCH v5 16/34] i386/tdx: finalize TDX guest state upon reset Ani Sinha
2026-02-18 11:42 ` [PATCH v5 17/34] i386/tdx: add a pre-vmfd change notifier to reset tdx state Ani Sinha
2026-02-18 11:42 ` [PATCH v5 18/34] i386/sev: add migration blockers only once Ani Sinha
2026-02-18 11:42 ` [PATCH v5 19/34] i386/sev: add notifiers " Ani Sinha
2026-02-18 11:42 ` [PATCH v5 20/34] i386/sev: free existing launch update data and kernel hashes data on init Ani Sinha
2026-02-18 11:42 ` [PATCH v5 21/34] i386/sev: add support for confidential guest reset Ani Sinha
2026-02-18 11:42 ` [PATCH v5 22/34] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors Ani Sinha
2026-02-18 14:07 ` Cédric Le Goater
2026-02-18 15:07 ` Ani Sinha
2026-02-18 15:30 ` Cédric Le Goater
2026-02-18 16:06 ` Ani Sinha
2026-02-18 16:09 ` Cédric Le Goater
2026-02-18 17:33 ` Ani Sinha
2026-02-18 17:39 ` Cédric Le Goater
2026-02-19 5:30 ` Ani Sinha
2026-02-19 8:01 ` Ani Sinha
2026-02-19 8:14 ` Cédric Le Goater
2026-02-19 9:06 ` Ani Sinha
2026-02-23 11:56 ` Gerd Hoffmann
2026-02-18 11:42 ` [PATCH v5 23/34] kvm/i8254: refactor pit initialization into a helper Ani Sinha
2026-02-18 11:42 ` [PATCH v5 24/34] kvm/i8254: add support for confidential guest reset Ani Sinha
2026-02-18 11:42 ` [PATCH v5 25/34] kvm/hyperv: add synic feature to CPU only if its not enabled Ani Sinha
2026-02-18 11:42 ` [PATCH v5 26/34] hw/hyperv/vmbus: add support for confidential guest reset Ani Sinha
2026-02-19 18:34 ` Maciej S. Szmigiero
2026-02-18 11:42 ` [PATCH v5 27/34] kvm/xen-emu: re-initialize capabilities during " Ani Sinha
2026-02-19 9:39 ` Paul Durrant
2026-02-19 10:31 ` Ani Sinha
2026-02-19 10:45 ` Paul Durrant
2026-02-19 11:19 ` Ani Sinha
2026-02-19 11:40 ` Paul Durrant
2026-02-18 11:42 ` [PATCH v5 28/34] ppc/openpic: create a new openpic device and reattach mem region on coco reset Ani Sinha
2026-02-18 11:42 ` [PATCH v5 29/34] kvm/vcpu: add notifiers to inform vcpu file descriptor change Ani Sinha
2026-02-18 11:42 ` [PATCH v5 30/34] kvm/clock: add support for confidential guest reset Ani Sinha
2026-02-18 11:42 ` [PATCH v5 31/34] hw/machine: introduce machine specific option 'x-change-vmfd-on-reset' Ani Sinha
2026-02-18 11:42 ` [PATCH v5 32/34] tests/functional/x86_64: add functional test to exercise vm fd change on reset Ani Sinha
2026-02-18 11:42 ` [PATCH v5 33/34] qom: add 'confidential-guest-reset' property for x86 confidential vms Ani Sinha
2026-02-19 8:55 ` Markus Armbruster
2026-02-19 9:12 ` Ani Sinha
2026-02-19 9:27 ` Markus Armbruster
2026-02-19 9:29 ` Ani Sinha
2026-02-19 10:19 ` Markus Armbruster
2026-02-23 10:08 ` Daniel P. Berrangé
2026-02-18 11:42 ` [PATCH v5 34/34] migration: return EEXIST when trying to add the same migration blocker Ani Sinha