* [PATCH v1 00/28] Introduce support for confidential guest reset
@ 2025-12-12 15:03 Ani Sinha
2025-12-12 15:03 ` [PATCH v1 01/28] i386/kvm: avoid installing duplicate msr entries in msr_handlers Ani Sinha
` (27 more replies)
0 siblings, 28 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
Cc: vkuznets, kraxel, pbonzini, qemu-devel, Ani Sinha
This change introduces support for confidential guests
(SEV-ES, SEV-SNP and TDX) to reset/reboot just like other non-confidential
guests. Currently, a reboot initiated from the confidential guest results
in termination of the QEMU hypervisor as the CPUs are not resettable. As the
initial state of the guest including private memory is locked and encrypted,
the contents of that memory will not be accessible post reset. Hence a new
KVM file descriptor must be opened to create a new confidential VM context
closing the old one. All KVM VM specific ioctls must be called again. New
VCPU file descriptors must be created against the new KVM fd and most VCPU
ioctls must be called again as well.
This change closes the old KVM fd and creates a new one. After
the new KVM fd is opened, all generic and architecture specific ioctl calls
are issued again. Notifiers are added to notify subsystems that:
- The KVM VM fd is about to change, so state should be synced from KVM to QEMU
if required.
- The KVM VM fd has changed, so ioctl calls have to be issued again against the
new KVM fd.
- New VCPU fds have been created, so VCPU ioctl calls must be issued again
where required.
Specific subsystems use these notifiers to re-issue ioctl calls where required.
Changes are made to SEV and TDX modules to reinitialize the confidential guest
state and seal it again. Along the way, some bug fixes are made so that some
initialization functions can be called again. Some refactoring of existing
code is done so that both init and reset paths can use them.
Tested on TDX and SEV-SNP.
CI pipeline passes: https://gitlab.com/anisinha/qemu/-/pipelines/2211550528
Rebased on top of version 10.2.0-rc3
CC: pbonzini@redhat.com
CC: kraxel@redhat.com
CC: vkuznets@redhat.com
Ani Sinha (28):
i386/kvm: avoid installing duplicate msr entries in msr_handlers
hw/accel: add a per-accelerator callback to change VM accelerator
handle
system/physmem: add helper to reattach existing memory after KVM VM fd
change
accel/kvm: add changes required to support KVM VM file descriptor
change
accel/kvm: mark guest state as unprotected after vm file descriptor
change
accel/kvm: add a notifier to indicate KVM VM file descriptor has
changed
kvm/i386: implement architecture support for kvm file descriptor
change
hw/i386: refactor x86_bios_rom_init for reuse in confidential guest
reset
kvm/i386: reload firmware for confidential guest reset
accel/kvm: Add notifier to inform that the KVM VM file fd is about to
be changed
accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon
reset
i386/tdx: refactor TDX firmware memory initialization code into a new
function
i386/tdx: finalize TDX guest state upon reset
i386/tdx: add a pre-vmfd change notifier to reset tdx state
i386/sev: add migration blockers only once
i386/sev: add notifiers only once
i386/sev: free existing launch update data and kernel hashes data on
init
i386/sev: add support for confidential guest reset
hw/vfio: generate new file fd for pseudo device and rebind existing
descriptors
kvm/i8254: add support for confidential guest reset
hw/hyperv/vmbus: add support for confidential guest reset
accel/kvm: add a per-confidential class callback to unlock guest state
kvm/xen-emu: re-initialize capabilities during confidential guest
reset
kvm/xen_evtchn: add support for confidential guest reset
ppc/openpic: create a new openpic device and reattach mem region on
coco reset
kvm/vcpu: add notifiers to inform vcpu file descriptor change
kvm/i386/apic: set local apic after vcpu file descriptors changed
kvm/clock: add support for confidential guest reset
accel/kvm/kvm-all.c | 354 +++++++++++++++++---
accel/stubs/kvm-stub.c | 26 ++
hw/hyperv/vmbus.c | 30 ++
hw/i386/kvm/apic.c | 13 +
hw/i386/kvm/clock.c | 56 ++++
hw/i386/kvm/i8254.c | 84 +++--
hw/i386/kvm/xen_evtchn.c | 100 +++++-
hw/i386/x86-common.c | 50 ++-
hw/intc/openpic_kvm.c | 108 ++++--
hw/vfio/helpers.c | 81 ++++-
include/accel/accel-ops.h | 1 +
include/hw/i386/apic_internal.h | 1 +
include/hw/i386/x86.h | 5 +-
include/system/confidential-guest-support.h | 27 ++
include/system/kvm.h | 54 +++
include/system/physmem.h | 1 +
system/physmem.c | 28 ++
system/runstate.c | 31 +-
target/arm/kvm.c | 5 +
target/i386/kvm/kvm.c | 189 +++++++++--
target/i386/kvm/tdx.c | 145 ++++++--
target/i386/kvm/tdx.h | 1 +
target/i386/kvm/xen-emu.c | 45 ++-
target/i386/sev.c | 110 +++++-
target/loongarch/kvm/kvm.c | 5 +
target/mips/kvm.c | 5 +
target/ppc/kvm.c | 5 +
target/riscv/kvm/kvm-cpu.c | 5 +
target/s390x/kvm/kvm.c | 5 +
29 files changed, 1382 insertions(+), 188 deletions(-)
--
2.42.0
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v1 01/28] i386/kvm: avoid installing duplicate msr entries in msr_handlers
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 02/28] hw/accel: add a per-accelerator callback to change VM accelerator handle Ani Sinha
` (26 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Marcelo Tosatti
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
kvm_filter_msr() does not check if an msr entry is already present in the
msr_handlers table and installs a new handler unconditionally. If the function
is called again with the same MSR, it will result in duplicate entries in the
table and multiple such calls will fill up the table needlessly. Fix that.
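The intended lookup-then-insert behaviour can be sketched outside QEMU as
follows. This is a simplified model, not the patched function: Handler, table,
TABLE_SIZE, filter_msr() and used_slots() are hypothetical names, the "tag"
field stands in for the rdmsr/wrmsr callbacks, and the call to
kvm_install_msr_filters() is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 4    /* hypothetical size, small so exhaustion is easy to hit */

typedef struct {
    uint32_t msr;       /* 0 marks a free slot, as in msr_handlers */
    int tag;            /* stand-in for the rdmsr/wrmsr callbacks */
} Handler;

static Handler table[TABLE_SIZE];

/* Returns 0 on success (including "already present"), -EINVAL when full. */
static int filter_msr(uint32_t msr, int tag)
{
    size_t i;

    assert(msr != 0);   /* 0 is reserved for free slots in this model */

    for (i = 0; i < TABLE_SIZE; i++) {
        if (table[i].msr == msr) {
            return 0;   /* duplicate: keep the existing entry untouched */
        }
        if (table[i].msr == 0) {
            /* slots fill from the front, so the first free slot ends the scan */
            table[i] = (Handler) { .msr = msr, .tag = tag };
            return 0;
        }
    }
    return -EINVAL;     /* no matching entry and no free slot */
}

/* Counts occupied slots; only used to observe the table from outside. */
static size_t used_slots(void)
{
    size_t i, n = 0;

    for (i = 0; i < TABLE_SIZE; i++) {
        if (table[i].msr != 0) {
            n++;
        }
    }
    return n;
}
```

Calling filter_msr() twice with the same MSR leaves one occupied slot in this
model, which is the property the reset path relies on when initialization
functions run a second time.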
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/kvm.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 60c7981138..02819de625 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5925,27 +5925,33 @@ static int kvm_install_msr_filters(KVMState *s)
static int kvm_filter_msr(KVMState *s, uint32_t msr, QEMURDMSRHandler *rdmsr,
QEMUWRMSRHandler *wrmsr)
{
- int i, ret;
+ int i, ret = 0;
for (i = 0; i < ARRAY_SIZE(msr_handlers); i++) {
- if (!msr_handlers[i].msr) {
+ if (msr_handlers[i].msr == msr) {
+ break;
+ } else if (!msr_handlers[i].msr) {
msr_handlers[i] = (KVMMSRHandlers) {
.msr = msr,
.rdmsr = rdmsr,
.wrmsr = wrmsr,
};
+ break;
+ }
+ }
- ret = kvm_install_msr_filters(s);
- if (ret) {
- msr_handlers[i] = (KVMMSRHandlers) { };
- return ret;
- }
+ if (i == ARRAY_SIZE(msr_handlers)) {
+ ret = -EINVAL;
+ goto end;
+ }
- return 0;
- }
+ ret = kvm_install_msr_filters(s);
+ if (ret) {
+ msr_handlers[i] = (KVMMSRHandlers) { };
}
- return -EINVAL;
+ end:
+ return ret;
}
static int kvm_handle_rdmsr(X86CPU *cpu, struct kvm_run *run)
--
2.42.0
* [PATCH v1 02/28] hw/accel: add a per-accelerator callback to change VM accelerator handle
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
2025-12-12 15:03 ` [PATCH v1 01/28] i386/kvm: avoid installing duplicate msr entries in msr_handlers Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 03/28] system/physmem: add helper to reattach existing memory after KVM VM fd change Ani Sinha
` (25 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Richard Henderson, Paolo Bonzini, Philippe Mathieu-Daudé
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha
When a confidential virtual machine is reset, a new guest context in the
accelerator must be generated post reset. Therefore, the old accelerator guest
file handle must be closed and a new one created. To this end, a per-accelerator
callback, "reset_vmfd", is introduced that is called when a confidential
guest is reset. Subsequent patches will introduce the KVM-specific
implementation of this callback.
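The optional-callback pattern described above can be sketched in isolation as
follows. All names here (Machine, AccelOps, fake_reset_vmfd, system_reset) are
hypothetical stand-ins rather than QEMU types, and unlike the patch, which
aborts on failure, this sketch simply returns the error to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MachineState. */
typedef struct Machine {
    bool confidential;  /* models "current_machine->cgs != NULL" */
    int vmfd;           /* models the accelerator's VM handle */
} Machine;

/* Hypothetical stand-in for AccelClass: the callback is optional. */
typedef struct AccelOps {
    int (*reset_vmfd)(Machine *m);
} AccelOps;

/* Pretend to close the old handle and open a new one. */
static int fake_reset_vmfd(Machine *m)
{
    m->vmfd += 1;
    return 0;
}

/*
 * Invoke the callback only for a guest-initiated reset of a confidential
 * machine, and only if the accelerator provides one.
 * Returns 0 if there was nothing to do or the reset succeeded.
 */
static int system_reset(const AccelOps *ops, Machine *m, bool guest_initiated)
{
    assert(ops != NULL && m != NULL);

    if (m->confidential && guest_initiated && ops->reset_vmfd) {
        return ops->reset_vmfd(m);
    }
    return 0;
}
```

Accelerators that leave the pointer NULL are unaffected in this model, which
matches the NULL check on ac->reset_vmfd in the diff below.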
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
include/accel/accel-ops.h | 1 +
system/runstate.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/include/accel/accel-ops.h b/include/accel/accel-ops.h
index 23a8c246e1..998a95ca69 100644
--- a/include/accel/accel-ops.h
+++ b/include/accel/accel-ops.h
@@ -23,6 +23,7 @@ struct AccelClass {
AccelOpsClass *ops;
int (*init_machine)(AccelState *as, MachineState *ms);
+ int (*reset_vmfd)(MachineState *ms);
bool (*cpu_common_realize)(CPUState *cpu, Error **errp);
void (*cpu_common_unrealize)(CPUState *cpu);
/* get_stats: Append statistics to @buf */
diff --git a/system/runstate.c b/system/runstate.c
index e3ec16ab74..f5e57fd1f7 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -42,6 +42,7 @@
#include "qapi/qapi-commands-run-state.h"
#include "qapi/qapi-events-run-state.h"
#include "qemu/accel.h"
+#include "accel/accel-ops.h"
#include "qemu/error-report.h"
#include "qemu/job.h"
#include "qemu/log.h"
@@ -508,6 +509,8 @@ void qemu_system_reset(ShutdownCause reason)
{
MachineClass *mc;
ResetType type;
+ AccelClass *ac = ACCEL_GET_CLASS(current_accel());
+ int ret;
mc = current_machine ? MACHINE_GET_CLASS(current_machine) : NULL;
@@ -520,6 +523,23 @@ void qemu_system_reset(ShutdownCause reason)
default:
type = RESET_TYPE_COLD;
}
+
+ /*
+ * different accelerators implement how to close the old file handle of
+ * the accelerator descriptor and create a new one here. Resetting
+ * file handle is necessary to create a new confidential VM context post
+ * VM reset.
+ */
+ if (current_machine->cgs && reason == SHUTDOWN_CAUSE_GUEST_RESET) {
+ if (ac->reset_vmfd) {
+ ret = ac->reset_vmfd(current_machine);
+ if (ret < 0) {
+ error_report("unable to reset vmfd: %d", ret);
+ abort();
+ }
+ }
+ }
+
if (mc && mc->reset) {
mc->reset(current_machine, type);
} else {
--
2.42.0
* [PATCH v1 03/28] system/physmem: add helper to reattach existing memory after KVM VM fd change
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
2025-12-12 15:03 ` [PATCH v1 01/28] i386/kvm: avoid installing duplicate msr entries in msr_handlers Ani Sinha
2025-12-12 15:03 ` [PATCH v1 02/28] hw/accel: add a per-accelerator callback to change VM accelerator handle Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 04/28] accel/kvm: add changes required to support KVM VM file descriptor change Ani Sinha
` (24 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Peter Xu, David Hildenbrand,
Philippe Mathieu-Daudé
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha
After the guest KVM file descriptor has changed as a part of the process of
confidential guest reset mechanism, existing memory needs to be reattached to
the new file descriptor. This change adds a helper function ram_block_rebind()
for this purpose. The next patch will make use of this function.
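As a rough model of the rebind loop, assuming hypothetical names throughout
(Block, rebind_blocks, fake_create, fake_close) and with the fd operations
injected as function pointers so the sketch needs no KVM or ramlist locking:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for RAMBlock. */
typedef struct Block {
    bool uses_guest_memfd;  /* models the RAM_GUEST_MEMFD flag */
    int fd;                 /* models block->guest_memfd, -1 if none */
    size_t len;             /* models block->max_length */
} Block;

typedef int (*create_fd_fn)(size_t len);
typedef void (*close_fd_fn)(int fd);

/*
 * For every block backed by a guest memfd, drop the old fd (if any) and
 * create a fresh one. Stops at the first failure and returns -1.
 */
static int rebind_blocks(Block *blocks, size_t n,
                         create_fd_fn create_fd, close_fd_fn close_fd)
{
    size_t i;

    assert(create_fd != NULL && close_fd != NULL);

    for (i = 0; i < n; i++) {
        Block *b = &blocks[i];

        if (!b->uses_guest_memfd) {
            continue;       /* ordinary RAM is left alone */
        }
        if (b->fd >= 0) {
            close_fd(b->fd);
        }
        b->fd = create_fd(b->len);
        if (b->fd < 0) {
            return -1;
        }
    }
    return 0;
}

/* Fake fd allocator and close counter, used only to exercise the loop. */
static int next_fd = 100;
static int closed_count;

static int fake_create(size_t len)
{
    (void)len;
    return next_fd++;
}

static void fake_close(int fd)
{
    (void)fd;
    closed_count++;
}
```

In this model a block that fails to get a new fd is left with a negative fd,
and blocks after it still hold their old descriptors.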
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
include/system/physmem.h | 1 +
system/physmem.c | 28 ++++++++++++++++++++++++++++
2 files changed, 29 insertions(+)
diff --git a/include/system/physmem.h b/include/system/physmem.h
index 879f6eae38..bfc0a623ac 100644
--- a/include/system/physmem.h
+++ b/include/system/physmem.h
@@ -50,5 +50,6 @@ physical_memory_snapshot_and_clear_dirty(MemoryRegion *mr, hwaddr offset,
bool physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap,
ram_addr_t start,
ram_addr_t length);
+int ram_block_rebind(Error **errp);
#endif
diff --git a/system/physmem.c b/system/physmem.c
index c9869e4049..9a3e3c16f8 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2839,6 +2839,34 @@ found:
return block;
}
+/*
+ * Creates new guest memfd for the ramblocks and closes the
+ * existing memfd.
+ */
+int ram_block_rebind(Error **errp)
+{
+ RAMBlock *block;
+
+ qemu_mutex_lock_ramlist();
+
+ RAMBLOCK_FOREACH(block) {
+ if (block->flags & RAM_GUEST_MEMFD) {
+ if (block->guest_memfd >= 0) {
+ close(block->guest_memfd);
+ }
+ block->guest_memfd = kvm_create_guest_memfd(block->max_length,
+ 0, errp);
+ if (block->guest_memfd < 0) {
+ qemu_mutex_unlock_ramlist();
+ return -1;
+ }
+
+ }
+ }
+ qemu_mutex_unlock_ramlist();
+ return 0;
+}
+
/*
* Finds the named RAMBlock
*
--
2.42.0
* [PATCH v1 04/28] accel/kvm: add changes required to support KVM VM file descriptor change
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (2 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 03/28] system/physmem: add helper to reattach existing memory after KVM VM fd change Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 05/28] accel/kvm: mark guest state as unprotected after vm " Ani Sinha
` (23 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Peter Maydell, Marcelo Tosatti, Song Gao,
Huacai Chen, Philippe Mathieu-Daudé, Aurelien Jarno,
Jiaxun Yang, Aleksandar Rikalo, Nicholas Piggin,
Harsh Prateek Bora, Chinmay Rath, Palmer Dabbelt,
Alistair Francis, Weiwei Li, Daniel Henrique Barboza, Liu Zhiwei,
Halil Pasic, Christian Borntraeger, Eric Farman, Matthew Rosato,
Richard Henderson, Ilya Leoshkevich, David Hildenbrand,
Thomas Huth
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm, qemu-arm, qemu-ppc,
qemu-riscv, qemu-s390x
This change adds common KVM support for handling a change of the KVM VM file
descriptor, which can happen as part of the confidential guest reset
mechanism. A new per-architecture function, kvm_arch_vmfd_change_ops(), is
added so that each architecture can implement the changes it requires.
A subsequent patch will add the x86-specific implementation of
kvm_arch_vmfd_change_ops(), as currently only x86 supports confidential guest
reset.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
accel/kvm/kvm-all.c | 70 ++++++++++++++++++++++++++++++++++++--
include/system/kvm.h | 1 +
target/arm/kvm.c | 5 +++
target/i386/kvm/kvm.c | 5 +++
target/loongarch/kvm/kvm.c | 5 +++
target/mips/kvm.c | 5 +++
target/ppc/kvm.c | 5 +++
target/riscv/kvm/kvm-cpu.c | 5 +++
target/s390x/kvm/kvm.c | 5 +++
9 files changed, 103 insertions(+), 3 deletions(-)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 28006d73c5..c9564bf681 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2415,11 +2415,9 @@ void kvm_irqchip_set_qemuirq_gsi(KVMState *s, qemu_irq irq, int gsi)
g_hash_table_insert(s->gsimap, irq, GINT_TO_POINTER(gsi));
}
-static void kvm_irqchip_create(KVMState *s)
+static void do_kvm_irqchip_create(KVMState *s)
{
int ret;
-
- assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
if (kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
;
} else if (kvm_check_extension(s, KVM_CAP_S390_IRQCHIP)) {
@@ -2452,7 +2450,13 @@ static void kvm_irqchip_create(KVMState *s)
fprintf(stderr, "Create kernel irqchip failed: %s\n", strerror(-ret));
exit(1);
}
+}
+static void kvm_irqchip_create(KVMState *s)
+{
+ assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
+
+ do_kvm_irqchip_create(s);
kvm_kernel_irqchip = true;
/* If we have an in-kernel IRQ chip then we must have asynchronous
* interrupt delivery (though the reverse is not necessarily true)
@@ -2607,6 +2611,65 @@ static int kvm_setup_dirty_ring(KVMState *s)
return 0;
}
+static int kvm_reset_vmfd(MachineState *ms)
+{
+ KVMState *s;
+ KVMMemoryListener *kml;
+ int ret, type;
+ Error *err = NULL;
+
+ s = KVM_STATE(ms->accelerator);
+ kml = &s->memory_listener;
+
+ memory_listener_unregister(&kml->listener);
+ memory_listener_unregister(&kvm_io_listener);
+
+ if (s->vmfd >= 0) {
+ close(s->vmfd);
+ }
+
+ type = find_kvm_machine_type(ms);
+ if (type < 0) {
+ return -EINVAL;
+ }
+
+ ret = do_kvm_create_vm(s, type);
+ if (ret < 0) {
+ return ret;
+ }
+
+ s->vmfd = ret;
+
+ kvm_setup_dirty_ring(s);
+
+ /* rebind memory to new vm fd */
+ ret = ram_block_rebind(&err);
+ if (ret < 0) {
+ return ret;
+ }
+ assert(!err);
+
+ ret = kvm_arch_vmfd_change_ops(ms, s);
+ if (ret < 0) {
+ return ret;
+ }
+
+ if (s->kernel_irqchip_allowed) {
+ do_kvm_irqchip_create(s);
+ }
+
+ /* these can be only called after ram_block_rebind() */
+ memory_listener_register(&kml->listener, &address_space_memory);
+ memory_listener_register(&kvm_io_listener, &address_space_io);
+
+ /*
+ * kvm fd has changed. Commit the irq routes to KVM once more.
+ */
+ kvm_irqchip_commit_routes(s);
+
+ return ret;
+}
+
static int kvm_init(AccelState *as, MachineState *ms)
{
MachineClass *mc = MACHINE_GET_CLASS(ms);
@@ -4014,6 +4077,7 @@ static void kvm_accel_class_init(ObjectClass *oc, const void *data)
AccelClass *ac = ACCEL_CLASS(oc);
ac->name = "KVM";
ac->init_machine = kvm_init;
+ ac->reset_vmfd = kvm_reset_vmfd;
ac->has_memory = kvm_accel_has_memory;
ac->allowed = &kvm_allowed;
ac->gdbstub_supported_sstep_flags = kvm_gdbstub_sstep_flags;
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 8f9eecf044..ade13dd8cc 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -358,6 +358,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s);
int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp);
int kvm_arch_init_vcpu(CPUState *cpu);
int kvm_arch_destroy_vcpu(CPUState *cpu);
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s);
#ifdef TARGET_KVM_HAVE_RESET_PARKED_VCPU
void kvm_arch_reset_parked_vcpu(unsigned long vcpu_id, int kvm_fd);
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 0d57081e69..919bf95ae1 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1568,6 +1568,11 @@ void kvm_arch_init_irq_routing(KVMState *s)
{
}
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
+{
+ abort();
+}
+
int kvm_arch_irqchip_create(KVMState *s)
{
if (kvm_kernel_irqchip_split()) {
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 02819de625..cdfcb70f40 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3252,6 +3252,11 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
return 0;
}
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
+{
+ abort();
+}
+
int kvm_arch_init(MachineState *ms, KVMState *s)
{
int ret;
diff --git a/target/loongarch/kvm/kvm.c b/target/loongarch/kvm/kvm.c
index 26e40c9bdc..4171781346 100644
--- a/target/loongarch/kvm/kvm.c
+++ b/target/loongarch/kvm/kvm.c
@@ -1312,6 +1312,11 @@ int kvm_arch_irqchip_create(KVMState *s)
return kvm_check_extension(s, KVM_CAP_DEVICE_CTRL);
}
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
+{
+ return 0;
+}
+
void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
{
}
diff --git a/target/mips/kvm.c b/target/mips/kvm.c
index 912cd5dfa0..28730da06b 100644
--- a/target/mips/kvm.c
+++ b/target/mips/kvm.c
@@ -44,6 +44,11 @@ unsigned long kvm_arch_vcpu_id(CPUState *cs)
return cs->cpu_index;
}
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
+{
+ return 0;
+}
+
int kvm_arch_init(MachineState *ms, KVMState *s)
{
/* MIPS has 128 signals */
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 43124bf1c7..a48dc7670b 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -180,6 +180,11 @@ int kvm_arch_irqchip_create(KVMState *s)
return 0;
}
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
+{
+ return 0;
+}
+
static int kvm_arch_sync_sregs(PowerPCCPU *cpu)
{
CPUPPCState *cenv = &cpu->env;
diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
index 47e672c7aa..ca384a8b85 100644
--- a/target/riscv/kvm/kvm-cpu.c
+++ b/target/riscv/kvm/kvm-cpu.c
@@ -1545,6 +1545,11 @@ int kvm_arch_irqchip_create(KVMState *s)
return kvm_check_extension(s, KVM_CAP_DEVICE_CTRL);
}
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
+{
+ return 0;
+}
+
int kvm_arch_process_async_events(CPUState *cs)
{
return 0;
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 916dac1f14..671c854634 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -393,6 +393,11 @@ int kvm_arch_irqchip_create(KVMState *s)
return 0;
}
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
+{
+ return 0;
+}
+
unsigned long kvm_arch_vcpu_id(CPUState *cpu)
{
return cpu->cpu_index;
--
2.42.0
* [PATCH v1 05/28] accel/kvm: mark guest state as unprotected after vm file descriptor change
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (3 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 04/28] accel/kvm: add changes required to support KVM VM file descriptor change Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 06/28] accel/kvm: add a notifier to indicate KVM VM file descriptor has changed Ani Sinha
` (22 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
When the KVM VM file descriptor has changed and a new one created, the guest
state is no longer in protected state. Mark it as such.
The guest state becomes protected again when TDX and SEV-ES and SEV-SNP mark
it as such.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
accel/kvm/kvm-all.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c9564bf681..abdf91c0de 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2640,6 +2640,9 @@ static int kvm_reset_vmfd(MachineState *ms)
s->vmfd = ret;
+ /* guest state is now unprotected again */
+ kvm_state->guest_state_protected = false;
+
kvm_setup_dirty_ring(s);
/* rebind memory to new vm fd */
--
2.42.0
* [PATCH v1 06/28] accel/kvm: add a notifier to indicate KVM VM file descriptor has changed
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (4 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 05/28] accel/kvm: mark guest state as unprotected after vm " Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 07/28] kvm/i386: implement architecture support for kvm file descriptor change Ani Sinha
` (21 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
A notifier callback can be used by various subsystems to perform actions when
KVM file descriptor for a virtual machine changes as a part of confidential
guest reset process. This change adds this notifier mechanism. Subsequent
patches will add specific implementations for various notifier callbacks
corresponding to various subsystems that need to take action when the KVM VM
file descriptor changes.
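A minimal standalone model of such a notifier chain is shown here. It does not
use QEMU's NotifierWithReturn API; ChangeNotifier, add_change_notifier,
notify_vmfd_changed, Device and device_on_vmfd_change are all hypothetical
names, and stopping at the first failing callback is an assumption of this
sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ChangeNotifier ChangeNotifier;

/* Hypothetical notifier: a callback plus an intrusive list link. */
struct ChangeNotifier {
    int (*notify)(ChangeNotifier *n, int new_vmfd);
    ChangeNotifier *next;
};

static ChangeNotifier *notifiers;   /* head of the registered list */

static void add_change_notifier(ChangeNotifier *n)
{
    assert(n != NULL && n->notify != NULL);
    n->next = notifiers;
    notifiers = n;
}

/* Call every registered notifier; stop at the first negative return. */
static int notify_vmfd_changed(int new_vmfd)
{
    ChangeNotifier *n;

    for (n = notifiers; n != NULL; n = n->next) {
        int ret = n->notify(n, new_vmfd);

        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* Example subscriber: a device that remembers which VM fd it is bound to. */
typedef struct Device {
    ChangeNotifier notifier;    /* must stay the first member for the cast */
    int bound_vmfd;
} Device;

static int device_on_vmfd_change(ChangeNotifier *n, int new_vmfd)
{
    Device *d = (Device *)n;    /* valid because notifier is the first member */

    d->bound_vmfd = new_vmfd;   /* a real subsystem would re-issue ioctls here */
    return 0;
}
```

Each subsystem registers once and is then told about every later fd change,
so the reset path does not need to know which subsystems exist.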
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
accel/kvm/kvm-all.c | 30 ++++++++++++++++++++++++++++++
accel/stubs/kvm-stub.c | 8 ++++++++
include/system/kvm.h | 21 +++++++++++++++++++++
3 files changed, 59 insertions(+)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index abdf91c0de..679cf04375 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -90,6 +90,7 @@ struct KVMParkedVcpu {
};
KVMState *kvm_state;
+VmfdChangeNotifier vmfd_notifier;
bool kvm_kernel_irqchip;
bool kvm_split_irqchip;
bool kvm_async_interrupts_allowed;
@@ -123,6 +124,9 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
static NotifierList kvm_irqchip_change_notifiers =
NOTIFIER_LIST_INITIALIZER(kvm_irqchip_change_notifiers);
+static NotifierWithReturnList register_vmfd_changed_notifiers =
+ NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_changed_notifiers);
+
struct KVMResampleFd {
int gsi;
EventNotifier *resample_event;
@@ -2173,6 +2177,22 @@ void kvm_irqchip_change_notify(void)
notifier_list_notify(&kvm_irqchip_change_notifiers, NULL);
}
+void kvm_vmfd_add_change_notifier(NotifierWithReturn *n)
+{
+ notifier_with_return_list_add(&register_vmfd_changed_notifiers, n);
+}
+
+void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n)
+{
+ notifier_with_return_remove(n);
+}
+
+static int kvm_vmfd_change_notify(Error **errp)
+{
+ return notifier_with_return_list_notify(&register_vmfd_changed_notifiers,
+ &vmfd_notifier, errp);
+}
+
int kvm_irqchip_get_virq(KVMState *s)
{
int next_virq;
@@ -2661,6 +2681,16 @@ static int kvm_reset_vmfd(MachineState *ms)
do_kvm_irqchip_create(s);
}
+ /*
+ * notify everyone that vmfd has changed.
+ */
+ vmfd_notifier.vmfd = s->vmfd;
+ ret = kvm_vmfd_change_notify(&err);
+ if (ret < 0) {
+ return ret;
+ }
+ assert(!err);
+
/* these can be only called after ram_block_rebind() */
memory_listener_register(&kml->listener, &address_space_memory);
memory_listener_register(&kvm_io_listener, &address_space_io);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 68cd33ba97..a6e8a6e16c 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -79,6 +79,14 @@ void kvm_irqchip_change_notify(void)
{
}
+void kvm_vmfd_add_change_notifier(NotifierWithReturn *n)
+{
+}
+
+void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n)
+{
+}
+
int kvm_irqchip_add_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
EventNotifier *rn, int virq)
{
diff --git a/include/system/kvm.h b/include/system/kvm.h
index ade13dd8cc..6844ebd56d 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -181,6 +181,7 @@ DECLARE_INSTANCE_CHECKER(KVMState, KVM_STATE,
extern KVMState *kvm_state;
typedef struct Notifier Notifier;
+typedef struct NotifierWithReturn NotifierWithReturn;
typedef struct KVMRouteChange {
KVMState *s;
@@ -565,4 +566,24 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size);
int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
+/* argument to vmfd change notifier */
+typedef struct VmfdChangeNotifier {
+ int vmfd;
+} VmfdChangeNotifier;
+
+/**
+ * kvm_vmfd_add_change_notifier - register a notifier to get notified when
+ * a KVM vm file descriptor changes as a part of the confidential guest "reset"
+ * process. Various subsystems should use this mechanism to take actions such
+ * as creating new fds against this new vm file descriptor.
+ * @n: notifier with return value.
+ */
+void kvm_vmfd_add_change_notifier(NotifierWithReturn *n);
+/**
+ * kvm_vmfd_remove_change_notifier - de-register a notifier previously
+ * registered with kvm_vmfd_add_change_notifier call.
+ * @n: notifier that was previously registered.
+ */
+void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n);
+
#endif
--
2.42.0
* [PATCH v1 07/28] kvm/i386: implement architecture support for kvm file descriptor change
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (5 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 06/28] accel/kvm: add a notifier to indicate KVM VM file descriptor has changed Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 08/28] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset Ani Sinha
` (20 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Marcelo Tosatti
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
When the KVM file descriptor changes as part of confidential guest reset,
some architecture-specific setup, including the SEV/SEV-SNP/TDX-specific setup,
needs to be redone. These changes are implemented as part of the
kvm_arch_vmfd_change_ops() call, which was introduced previously.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/kvm.c | 132 +++++++++++++++++++++++++++++++++++++-----
1 file changed, 119 insertions(+), 13 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index cdfcb70f40..e971f5f8c4 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3252,9 +3252,126 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
return 0;
}
+static int xen_init_wrapper(MachineState *ms, KVMState *s);
+
int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
{
- abort();
+ Error *local_err = NULL;
+ int ret;
+
+ /*
+ * Initialize confidential context, if required
+ *
+ * If no memory encryption is requested (ms->cgs == NULL) this is
+ * a no-op.
+ *
+ */
+ if (ms->cgs) {
+ ret = confidential_guest_kvm_init(ms->cgs, &local_err);
+ if (ret < 0) {
+ error_report_err(local_err);
+ return ret;
+ }
+ }
+
+ ret = kvm_vm_enable_exception_payload(s);
+ if (ret < 0) {
+ return ret;
+ }
+
+ ret = kvm_vm_enable_triple_fault_event(s);
+ if (ret < 0) {
+ return ret;
+ }
+
+ if (s->xen_version) {
+ ret = xen_init_wrapper(ms, s);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+
+ ret = kvm_vm_set_identity_map_addr(s, KVM_IDENTITY_BASE);
+ if (ret < 0) {
+ return ret;
+ }
+
+ ret = kvm_vm_set_tss_addr(s, KVM_IDENTITY_BASE + 0x1000);
+ if (ret < 0) {
+ return ret;
+ }
+ ret = kvm_vm_set_nr_mmu_pages(s);
+ if (ret < 0) {
+ return ret;
+ }
+
+ if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE) &&
+ x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
+ memory_listener_register(&smram_listener.listener,
+ &smram_address_space);
+ }
+
+ if (enable_cpu_pm) {
+ ret = kvm_vm_enable_disable_exits(s);
+ if (ret < 0) {
+ error_report("kvm: guest stopping CPU not supported: %s",
+ strerror(-ret));
+ return ret;
+ }
+ }
+
+ if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
+ X86MachineState *x86ms = X86_MACHINE(ms);
+
+ if (x86ms->bus_lock_ratelimit > 0) {
+ ret = kvm_vm_enable_bus_lock_exit(s);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+ kvm_set_max_apic_id(x86ms->apic_id_limit);
+ }
+
+ if (kvm_check_extension(s, KVM_CAP_X86_NOTIFY_VMEXIT)) {
+ ret = kvm_vm_enable_notify_vmexit(s);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+
+ if (kvm_vm_check_extension(s, KVM_CAP_X86_USER_SPACE_MSR)) {
+ ret = kvm_vm_enable_userspace_msr(s);
+ if (ret < 0) {
+ return ret;
+ }
+
+ if (s->msr_energy.enable == true) {
+ ret = kvm_vm_enable_energy_msrs(s);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+ }
+
+ return 0;
+}
+
+static int xen_init_wrapper(MachineState *ms, KVMState *s)
+{
+ int ret = 0;
+#ifdef CONFIG_XEN_EMU
+ if (!object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE)) {
+ error_report("kvm: Xen support only available in PC machine");
+ return -ENOTSUP;
+ }
+ /* hyperv_enabled() doesn't work yet. */
+ uint32_t msr = XEN_HYPERCALL_MSR;
+ ret = kvm_xen_init(s, msr);
+#else
+ error_report("kvm: Xen support not enabled in qemu");
+ return -ENOTSUP;
+#endif
+ return ret;
}
int kvm_arch_init(MachineState *ms, KVMState *s)
@@ -3290,21 +3407,10 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
}
if (s->xen_version) {
-#ifdef CONFIG_XEN_EMU
- if (!object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE)) {
- error_report("kvm: Xen support only available in PC machine");
- return -ENOTSUP;
- }
- /* hyperv_enabled() doesn't work yet. */
- uint32_t msr = XEN_HYPERCALL_MSR;
- ret = kvm_xen_init(s, msr);
+ ret = xen_init_wrapper(ms, s);
if (ret < 0) {
return ret;
}
-#else
- error_report("kvm: Xen support not enabled in qemu");
- return -ENOTSUP;
-#endif
}
ret = kvm_get_supported_msrs(s);
--
2.42.0
* [PATCH v1 08/28] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (6 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 07/28] kvm/i386: implement architecture support for kvm file descriptor change Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 09/28] kvm/i386: reload firmware for " Ani Sinha
` (19 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Michael S. Tsirkin, Marcel Apfelbaum
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha
For confidential guests, the BIOS image must be reinitialized upon reset.
BIOS memory is encrypted, so once the old confidential KVM context is
destroyed its contents can no longer be decrypted and must be loaded again.
To make that possible, this change refactors x86_bios_rom_init() so that
parts of it can be called during confidential guest reset.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/i386/x86-common.c | 50 ++++++++++++++++++++++++++++++++-----------
include/hw/i386/x86.h | 5 ++++-
2 files changed, 41 insertions(+), 14 deletions(-)
diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index c844749900..81fa4f47fb 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -1024,17 +1024,11 @@ void x86_isa_bios_init(MemoryRegion *isa_bios, MemoryRegion *isa_memory,
memory_region_set_readonly(isa_bios, read_only);
}
-void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
- MemoryRegion *rom_memory, bool isapc_ram_fw)
+int get_bios_size(X86MachineState *x86ms,
+ const char *bios_name, char *filename)
{
- const char *bios_name;
- char *filename;
int bios_size;
- ssize_t ret;
- /* BIOS load */
- bios_name = MACHINE(x86ms)->firmware ?: default_firmware;
- filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
if (filename) {
bios_size = get_image_size(filename, NULL);
} else {
@@ -1044,6 +1038,20 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
(bios_size % 65536) != 0) {
goto bios_error;
}
+
+ return bios_size;
+
+ bios_error:
+ fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
+ exit(1);
+}
+
+void load_bios_from_file(X86MachineState *x86ms, const char *bios_name,
+ char *filename, int bios_size, bool isapc_ram_fw)
+{
+ ssize_t ret;
+
+ /* BIOS load */
if (machine_require_guest_memfd(MACHINE(x86ms))) {
memory_region_init_ram_guest_memfd(&x86ms->bios, NULL, "pc.bios",
bios_size, &error_fatal);
@@ -1072,7 +1080,26 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
goto bios_error;
}
}
- g_free(filename);
+
+ return;
+
+ bios_error:
+ fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
+ exit(1);
+}
+
+void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
+ MemoryRegion *rom_memory, bool isapc_ram_fw)
+{
+ int bios_size;
+ const char *bios_name;
+ char *filename;
+
+ bios_name = MACHINE(x86ms)->firmware ?: default_firmware;
+ filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
+
+ bios_size = get_bios_size(x86ms, bios_name, filename);
+ load_bios_from_file(x86ms, bios_name, filename, bios_size, isapc_ram_fw);
if (!machine_require_guest_memfd(MACHINE(x86ms))) {
/* map the last 128KB of the BIOS in ISA space */
@@ -1084,9 +1111,6 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
memory_region_add_subregion(rom_memory,
(uint32_t)(-bios_size),
&x86ms->bios);
+ g_free(filename);
return;
-
-bios_error:
- fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
- exit(1);
}
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 8755cad50a..8871f95891 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -122,7 +122,10 @@ void x86_cpu_unplug_request_cb(HotplugHandler *hotplug_dev,
DeviceState *dev, Error **errp);
void x86_cpu_unplug_cb(HotplugHandler *hotplug_dev,
DeviceState *dev, Error **errp);
-
+int get_bios_size(X86MachineState *x86ms,
+ const char *bios_name, char *filename);
+void load_bios_from_file(X86MachineState *x86ms, const char *bios_name,
+ char *filename, int bios_size, bool isapc_ram_fw);
void x86_isa_bios_init(MemoryRegion *isa_bios, MemoryRegion *isa_memory,
MemoryRegion *bios, bool read_only);
void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
--
2.42.0
* [PATCH v1 09/28] kvm/i386: reload firmware for confidential guest reset
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (7 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 08/28] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 10/28] accel/kvm: Add notifier to inform that the KVM VM file fd is about to be changed Ani Sinha
` (18 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Marcelo Tosatti
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
When IGVM is not being used by the confidential guest, the guest firmware has
to be explicitly reloaded into memory. This is because the memory into
which the firmware was loaded before reset was encrypted and is thus lost
upon reset. When IGVM is used, the IGVM file is expected to contain the
guest firmware, and executing the IGVM directives will set up the guest
firmware memory.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/kvm.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
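
As a rough illustration of the address arithmetic used when the firmware is
reloaded (not code from this patch): the image is placed so that it ends at
the 4 GiB boundary, so its base is 0x100000000 minus the image size. The
helper names below are hypothetical, and the "size must be positive" check is
an assumption; only the 64 KiB multiple check is visible in the refactored
size helper.

```c
#include <assert.h>
#include <stdint.h>

#define FOUR_GIB 0x100000000ULL

/*
 * Hypothetical helper: accept only sizes that are positive (assumed) and a
 * multiple of 64 KiB (the check visible in the refactored size helper).
 */
static int bios_size_is_valid(int64_t bios_size)
{
    return bios_size > 0 && (bios_size % 65536) == 0;
}

/*
 * Hypothetical helper: guest physical address at which an image of
 * bios_size bytes starts when it is mapped to end exactly at 4 GiB.
 */
static uint64_t firmware_base(int64_t bios_size)
{
    assert(bios_size_is_valid(bios_size));
    return FOUR_GIB - (uint64_t)bios_size;
}
```

For a 2 MiB image this gives a base of 0xFFE00000, the same value as the
(uint32_t)(-bios_size) expression used when the region is mapped at init.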
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index e971f5f8c4..199a224dbf 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -50,6 +50,8 @@
#include "qemu/config-file.h"
#include "qemu/error-report.h"
#include "qemu/memalign.h"
+#include "qemu/datadir.h"
+#include "hw/loader.h"
#include "hw/i386/x86.h"
#include "hw/i386/kvm/xen_evtchn.h"
#include "hw/i386/pc.h"
@@ -3254,6 +3256,22 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
static int xen_init_wrapper(MachineState *ms, KVMState *s);
+static void reload_bios_rom(X86MachineState *x86ms)
+{
+ int bios_size;
+ const char *bios_name;
+ char *filename;
+
+ bios_name = MACHINE(x86ms)->firmware ?: "bios.bin";
+ filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
+
+ bios_size = get_bios_size(x86ms, bios_name, filename);
+
+ void *ptr = memory_region_get_ram_ptr(&x86ms->bios);
+ load_image_size(filename, ptr, bios_size);
+ x86_firmware_configure(0x100000000ULL - bios_size, ptr, bios_size);
+}
+
int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
{
Error *local_err = NULL;
@@ -3272,6 +3290,16 @@ int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
error_report_err(local_err);
return ret;
}
+ if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
+ X86MachineState *x86ms = X86_MACHINE(ms);
+ /*
+ * If an IGVM file is specified then the firmware must be provided
+ * in the IGVM file.
+ */
+ if (!x86ms->igvm) {
+ reload_bios_rom(x86ms);
+ }
+ }
}
ret = kvm_vm_enable_exception_payload(s);
--
2.42.0
* [PATCH v1 10/28] accel/kvm: Add notifier to inform that the KVM VM file fd is about to be changed
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (8 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 09/28] kvm/i386: reload firmware for " Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 11/28] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset Ani Sinha
` (17 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
Various subsystems might need to take some steps before the KVM file descriptor
for a virtual machine is changed, for example syncing state from KVM to QEMU
while the old descriptor is still open. A new notifier is added to inform them
that the KVM VM file descriptor is about to change.
Subsequent patches will add callback implementations for specific components
that need this notification.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
accel/kvm/kvm-all.c | 25 +++++++++++++++++++++++++
accel/stubs/kvm-stub.c | 8 ++++++++
include/system/kvm.h | 15 +++++++++++++++
3 files changed, 48 insertions(+)
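
A minimal, self-contained sketch of the notification pattern described above,
for readers unfamiliar with it. It does not use QEMU's NotifierWithReturn
API; PreChangeNotifier, pre_change_add() and pre_change_notify() are
hypothetical stand-ins. The assumed behaviour is that every registered
callback runs before the old VM fd is closed and that the first non-zero
return aborts the walk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a notifier with a return value. */
typedef struct PreChangeNotifier PreChangeNotifier;
struct PreChangeNotifier {
    int (*notify)(PreChangeNotifier *n, void *data);
    PreChangeNotifier *next;
};

static PreChangeNotifier *pre_change_list;

/* Prepend a notifier to the singly linked list. */
static void pre_change_add(PreChangeNotifier *n)
{
    assert(n && n->notify);
    n->next = pre_change_list;
    pre_change_list = n;
}

/* Call every notifier in turn; stop at and return the first non-zero result. */
static int pre_change_notify(void *data)
{
    for (PreChangeNotifier *n = pre_change_list; n; n = n->next) {
        int ret = n->notify(n, data);
        if (ret != 0) {
            return ret;
        }
    }
    return 0;
}

/* Example subscriber: counts how often it was told the VM fd will change. */
static int sync_calls;

static int sync_state_before_close(PreChangeNotifier *n, void *data)
{
    (void)n;
    (void)data;
    sync_calls++;
    return 0;
}
```

A subsystem would register once at init time and then be called on every
reset, just before the old descriptor is closed.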
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 679cf04375..5b854c9866 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -127,6 +127,9 @@ static NotifierList kvm_irqchip_change_notifiers =
static NotifierWithReturnList register_vmfd_changed_notifiers =
NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_changed_notifiers);
+static NotifierWithReturnList register_vmfd_pre_change_notifiers =
+ NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_pre_change_notifiers);
+
struct KVMResampleFd {
int gsi;
EventNotifier *resample_event;
@@ -2193,6 +2196,22 @@ static int kvm_vmfd_change_notify(Error **errp)
&vmfd_notifier, errp);
}
+void kvm_vmfd_add_pre_change_notifier(NotifierWithReturn *n)
+{
+ notifier_with_return_list_add(&register_vmfd_pre_change_notifiers, n);
+}
+
+void kvm_vmfd_remove_pre_change_notifier(NotifierWithReturn *n)
+{
+ notifier_with_return_remove(n);
+}
+
+static int kvm_vmfd_pre_change_notify(Error **errp)
+{
+ return notifier_with_return_list_notify(&register_vmfd_pre_change_notifiers,
+ NULL, errp);
+}
+
int kvm_irqchip_get_virq(KVMState *s)
{
int next_virq;
@@ -2644,6 +2663,12 @@ static int kvm_reset_vmfd(MachineState *ms)
memory_listener_unregister(&kml->listener);
memory_listener_unregister(&kvm_io_listener);
+ ret = kvm_vmfd_pre_change_notify(&err);
+ if (ret < 0) {
+ return ret;
+ }
+ assert(!err);
+
if (s->vmfd >= 0) {
close(s->vmfd);
}
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index a6e8a6e16c..7f4e3c4050 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -79,6 +79,14 @@ void kvm_irqchip_change_notify(void)
{
}
+void kvm_vmfd_add_pre_change_notifier(NotifierWithReturn *n)
+{
+}
+
+void kvm_vmfd_remove_pre_change_notifier(NotifierWithReturn *n)
+{
+}
+
void kvm_vmfd_add_change_notifier(NotifierWithReturn *n)
{
}
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 6844ebd56d..cb5db9ff67 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -586,4 +586,19 @@ void kvm_vmfd_add_change_notifier(NotifierWithReturn *n);
*/
void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n);
+/**
+ * kvm_vmfd_add_pre_change_notifier - register a notifier to get notified when
+ * kvm vm file descriptor is about to be changed as a part of the confidential
+ * guest "reset" process.
+ * @n: notifier with return value.
+ */
+void kvm_vmfd_add_pre_change_notifier(NotifierWithReturn *n);
+
+/**
+ * kvm_vmfd_remove_pre_change_notifier - de-register a notifier previously
+ * registered with kvm_vmfd_add_pre_change_notifier.
+ * @n: the notifier that was previously registered.
+ */
+void kvm_vmfd_remove_pre_change_notifier(NotifierWithReturn *n);
+
#endif
--
2.42.0
* [PATCH v1 11/28] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (9 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 10/28] accel/kvm: Add notifier to inform that the KVM VM file fd is about to be changed Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 12/28] i386/tdx: refactor TDX firmware memory initialization code into a new function Ani Sinha
` (16 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
Confidential guests need to generate a new KVM file descriptor upon virtual
machine reset. Existing VCPUs need to be reattached to this new
KVM VM file descriptor. As a part of this, new VCPU file descriptors need to
be created against the new KVM VM file descriptor and re-initialized.
Resources allocated against the old VCPU fds need to be released. This change
makes this happen.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
accel/kvm/kvm-all.c | 201 ++++++++++++++++++++++++++++++++++++--------
1 file changed, 166 insertions(+), 35 deletions(-)
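
A simplified model of the rebind loop, included only to show its shape. It
talks to no real KVM interface: MockVcpu, mock_create_vcpu() and
rebind_all() are hypothetical, and the mock simply hands out increasing
integers as descriptors. The one convention it relies on is that the create
call reports failure as a negative errno value, so the value must be negated
before it is passed to strerror().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-vCPU state: a descriptor and a dirty flag. */
typedef struct {
    int fd;
    int dirty;
} MockVcpu;

static int next_fd = 10;

/* Mock of "create a vCPU on the new VM fd": returns a new fd or -errno. */
static int mock_create_vcpu(int fail)
{
    return fail ? -ENOMEM : next_fd++;
}

/*
 * Give every vCPU a fresh descriptor and clear its cached state.
 * fail_at selects the index at which the mock fails (-1 for never).
 * On failure, *why points at a message and the negative errno is returned.
 */
static int rebind_all(MockVcpu *cpus, int n, int fail_at, const char **why)
{
    assert(cpus && why);
    for (int i = 0; i < n; i++) {
        int fd = mock_create_vcpu(i == fail_at);
        if (fd < 0) {
            *why = strerror(-fd); /* negate: fd holds -errno */
            return fd;
        }
        cpus[i].fd = fd;
        cpus[i].dirty = 0;
    }
    return 0;
}
```

In the real code each iteration also unmaps the old kvm_run and dirty-ring
regions and maps the new ones; that part is left out here.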
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 5b854c9866..638f193626 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -130,6 +130,12 @@ static NotifierWithReturnList register_vmfd_changed_notifiers =
static NotifierWithReturnList register_vmfd_pre_change_notifiers =
NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_pre_change_notifiers);
+static int kvm_rebind_vcpus(Error **errp);
+
+static int map_kvm_run(KVMState *s, CPUState *cpu, Error **errp);
+static int map_kvm_dirty_gfns(KVMState *s, CPUState *cpu, Error **errp);
+static int vcpu_unmap_regions(KVMState *s, CPUState *cpu);
+
struct KVMResampleFd {
int gsi;
EventNotifier *resample_event;
@@ -423,6 +429,82 @@ err:
return ret;
}
+static int kvm_rebind_vcpus(Error **errp)
+{
+ CPUState *cpu;
+ unsigned long vcpu_id;
+ KVMState *s = kvm_state;
+ int kvm_fd, ret = 0;
+
+ CPU_FOREACH(cpu) {
+ vcpu_id = kvm_arch_vcpu_id(cpu);
+
+ if (cpu->kvm_fd) {
+ close(cpu->kvm_fd);
+ }
+
+ ret = kvm_arch_destroy_vcpu(cpu);
+ if (ret < 0) {
+ goto err;
+ }
+
+ if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
+ s->coalesced_mmio_ring = NULL;
+ }
+
+ ret = vcpu_unmap_regions(s, cpu);
+ if (ret < 0) {
+ goto err;
+ }
+
+ ret = kvm_arch_pre_create_vcpu(cpu, errp);
+ if (ret < 0) {
+ goto err;
+ }
+
+ kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
+ if (kvm_fd < 0) {
+ error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu (%s)",
+ vcpu_id, strerror(-kvm_fd));
+ return kvm_fd;
+ }
+
+ cpu->kvm_fd = kvm_fd;
+
+ cpu->vcpu_dirty = false;
+ cpu->dirty_pages = 0;
+ cpu->throttle_us_per_full = 0;
+
+ ret = map_kvm_run(s, cpu, errp);
+ if (ret < 0) {
+ goto err;
+ }
+
+ if (s->kvm_dirty_ring_size) {
+ ret = map_kvm_dirty_gfns(s, cpu, errp);
+ if (ret < 0) {
+ goto err;
+ }
+ }
+
+ ret = kvm_arch_init_vcpu(cpu);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret,
+ "kvm_init_vcpu: kvm_arch_init_vcpu failed (%lu)",
+ vcpu_id);
+ }
+
+ close(cpu->kvm_vcpu_stats_fd);
+ cpu->kvm_vcpu_stats_fd = kvm_vcpu_ioctl(cpu, KVM_GET_STATS_FD, NULL);
+ kvm_init_cpu_signals(cpu);
+
+ kvm_cpu_synchronize_post_init(cpu);
+ }
+
+ err:
+ return ret;
+}
+
static void kvm_park_vcpu(CPUState *cpu)
{
struct KVMParkedVcpu *vcpu;
@@ -511,19 +593,11 @@ int kvm_create_and_park_vcpu(CPUState *cpu)
return ret;
}
-static int do_kvm_destroy_vcpu(CPUState *cpu)
+static int vcpu_unmap_regions(KVMState *s, CPUState *cpu)
{
- KVMState *s = kvm_state;
int mmap_size;
int ret = 0;
- trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
-
- ret = kvm_arch_destroy_vcpu(cpu);
- if (ret < 0) {
- goto err;
- }
-
mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
if (mmap_size < 0) {
ret = mmap_size;
@@ -551,39 +625,47 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
cpu->kvm_dirty_gfns = NULL;
}
- kvm_park_vcpu(cpu);
-err:
+ err:
return ret;
}
-void kvm_destroy_vcpu(CPUState *cpu)
-{
- if (do_kvm_destroy_vcpu(cpu) < 0) {
- error_report("kvm_destroy_vcpu failed");
- exit(EXIT_FAILURE);
- }
-}
-
-int kvm_init_vcpu(CPUState *cpu, Error **errp)
+static int do_kvm_destroy_vcpu(CPUState *cpu)
{
KVMState *s = kvm_state;
- int mmap_size;
- int ret;
+ int ret = 0;
- trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+ trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
- ret = kvm_arch_pre_create_vcpu(cpu, errp);
+ ret = kvm_arch_destroy_vcpu(cpu);
if (ret < 0) {
goto err;
}
- ret = kvm_create_vcpu(cpu);
+ /* If I am the CPU that created coalesced_mmio_ring, then discard it */
+ if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
+ s->coalesced_mmio_ring = NULL;
+ }
+
+ ret = vcpu_unmap_regions(s, cpu);
if (ret < 0) {
- error_setg_errno(errp, -ret,
- "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
- kvm_arch_vcpu_id(cpu));
goto err;
}
+ kvm_park_vcpu(cpu);
+err:
+ return ret;
+}
+
+void kvm_destroy_vcpu(CPUState *cpu)
+{
+ if (do_kvm_destroy_vcpu(cpu) < 0) {
+ error_report("kvm_destroy_vcpu failed");
+ exit(EXIT_FAILURE);
+ }
+}
+
+static int map_kvm_run(KVMState *s, CPUState *cpu, Error **errp)
+{
+ int mmap_size, ret = 0;
mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
if (mmap_size < 0) {
@@ -608,14 +690,53 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
(void *)cpu->kvm_run + s->coalesced_mmio * PAGE_SIZE;
}
+ err:
+ return ret;
+}
+
+static int map_kvm_dirty_gfns(KVMState *s, CPUState *cpu, Error **errp)
+{
+ int ret = 0;
+ /* Use MAP_SHARED to share pages with the kernel */
+ cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_bytes,
+ PROT_READ | PROT_WRITE, MAP_SHARED,
+ cpu->kvm_fd,
+ PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
+ if (cpu->kvm_dirty_gfns == MAP_FAILED) {
+ ret = -errno;
+ }
+
+ return ret;
+}
+
+int kvm_init_vcpu(CPUState *cpu, Error **errp)
+{
+ KVMState *s = kvm_state;
+ int ret;
+
+ trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+
+ ret = kvm_arch_pre_create_vcpu(cpu, errp);
+ if (ret < 0) {
+ goto err;
+ }
+
+ ret = kvm_create_vcpu(cpu);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret,
+ "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
+ kvm_arch_vcpu_id(cpu));
+ goto err;
+ }
+
+ ret = map_kvm_run(s, cpu, errp);
+ if (ret < 0) {
+ goto err;
+ }
+
if (s->kvm_dirty_ring_size) {
- /* Use MAP_SHARED to share pages with the kernel */
- cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_bytes,
- PROT_READ | PROT_WRITE, MAP_SHARED,
- cpu->kvm_fd,
- PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
- if (cpu->kvm_dirty_gfns == MAP_FAILED) {
- ret = -errno;
+ ret = map_kvm_dirty_gfns(s, cpu, errp);
+ if (ret < 0) {
goto err;
}
}
@@ -2716,6 +2837,16 @@ static int kvm_reset_vmfd(MachineState *ms)
}
assert(!err);
+ /*
+ * rebind new vcpu fds with the new kvm fds
+ * These can only be called after kvm_arch_vmfd_change_ops()
+ */
+ ret = kvm_rebind_vcpus(&err);
+ if (ret < 0) {
+ return ret;
+ }
+ assert(!err);
+
/* these can be only called after ram_block_rebind() */
memory_listener_register(&kml->listener, &address_space_memory);
memory_listener_register(&kvm_io_listener, &address_space_io);
--
2.42.0
* [PATCH v1 12/28] i386/tdx: refactor TDX firmware memory initialization code into a new function
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (10 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 11/28] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 13/28] i386/tdx: finalize TDX guest state upon reset Ani Sinha
` (15 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Marcelo Tosatti
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
A new helper function is introduced that refactors all firmware memory
initialization code into a separate function. No functional change.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/tdx.c | 73 ++++++++++++++++++++++++-------------------
1 file changed, 40 insertions(+), 33 deletions(-)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index dbf0fa2c91..bafaf62cdb 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -295,14 +295,51 @@ static void tdx_post_init_vcpus(void)
}
}
-static void tdx_finalize_vm(Notifier *notifier, void *unused)
+static void tdx_init_fw_mem_region(void)
{
TdxFirmware *tdvf = &tdx_guest->tdvf;
TdxFirmwareEntry *entry;
- RAMBlock *ram_block;
Error *local_err = NULL;
int r;
+ for_each_tdx_fw_entry(tdvf, entry) {
+ struct kvm_tdx_init_mem_region region;
+ uint32_t flags;
+
+ region = (struct kvm_tdx_init_mem_region) {
+ .source_addr = (uintptr_t)entry->mem_ptr,
+ .gpa = entry->address,
+ .nr_pages = entry->size >> 12,
+ };
+
+ flags = entry->attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND ?
+ KVM_TDX_MEASURE_MEMORY_REGION : 0;
+
+ do {
+ error_free(local_err);
+ local_err = NULL;
+ r = tdx_vcpu_ioctl(first_cpu, KVM_TDX_INIT_MEM_REGION, flags,
+ &region, &local_err);
+ } while (r == -EAGAIN || r == -EINTR);
+ if (r < 0) {
+ error_report_err(local_err);
+ exit(1);
+ }
+
+ if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
+ entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
+ qemu_ram_munmap(-1, entry->mem_ptr, entry->size);
+ entry->mem_ptr = NULL;
+ }
+ }
+}
+
+static void tdx_finalize_vm(Notifier *notifier, void *unused)
+{
+ TdxFirmware *tdvf = &tdx_guest->tdvf;
+ TdxFirmwareEntry *entry;
+ RAMBlock *ram_block;
+
tdx_init_ram_entries();
for_each_tdx_fw_entry(tdvf, entry) {
@@ -339,37 +376,7 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
tdx_post_init_vcpus();
-
- for_each_tdx_fw_entry(tdvf, entry) {
- struct kvm_tdx_init_mem_region region;
- uint32_t flags;
-
- region = (struct kvm_tdx_init_mem_region) {
- .source_addr = (uintptr_t)entry->mem_ptr,
- .gpa = entry->address,
- .nr_pages = entry->size >> 12,
- };
-
- flags = entry->attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND ?
- KVM_TDX_MEASURE_MEMORY_REGION : 0;
-
- do {
- error_free(local_err);
- local_err = NULL;
- r = tdx_vcpu_ioctl(first_cpu, KVM_TDX_INIT_MEM_REGION, flags,
- &region, &local_err);
- } while (r == -EAGAIN || r == -EINTR);
- if (r < 0) {
- error_report_err(local_err);
- exit(1);
- }
-
- if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
- entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
- qemu_ram_munmap(-1, entry->mem_ptr, entry->size);
- entry->mem_ptr = NULL;
- }
- }
+ tdx_init_fw_mem_region();
/*
* TDVF image has been copied into private region above via
--
2.42.0
* [PATCH v1 13/28] i386/tdx: finalize TDX guest state upon reset
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (11 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 12/28] i386/tdx: refactor TDX firmware memory initialization code into a new function Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 14/28] i386/tdx: add a pre-vmfd change notifier to reset tdx state Ani Sinha
` (14 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Marcelo Tosatti
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
When the confidential virtual machine KVM file descriptor changes due to the
guest reset, some TDX specific setup steps need to be done again. This
includes finalizing the initial guest launch state again. This change
re-executes some parts of the TDX setup during the device reset phase using a
resettable interface. This finalizes the guest launch state again and locks
it in. Care has also been taken so that notifiers are installed only once.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/tdx.c | 39 +++++++++++++++++++++++++++++++++++++--
target/i386/kvm/tdx.h | 1 +
2 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index bafaf62cdb..1903cc2132 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -19,6 +19,7 @@
#include "crypto/hash.h"
#include "system/kvm_int.h"
#include "system/runstate.h"
+#include "system/reset.h"
#include "system/system.h"
#include "system/ramblock.h"
#include "system/address-spaces.h"
@@ -389,6 +390,19 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
CONFIDENTIAL_GUEST_SUPPORT(tdx_guest)->ready = true;
}
+static void tdx_handle_reset(Object *obj, ResetType type)
+{
+ if (!runstate_is_running()) {
+ return;
+ }
+
+ if (!kvm_enable_hypercall(BIT_ULL(KVM_HC_MAP_GPA_RANGE))) {
+ error_setg(&error_fatal, "KVM_HC_MAP_GPA_RANGE not enabled for guest");
+ }
+
+ tdx_finalize_vm(NULL, NULL);
+}
+
static Notifier tdx_machine_done_notify = {
.notify = tdx_finalize_vm,
};
@@ -689,6 +703,7 @@ static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
X86MachineState *x86ms = X86_MACHINE(ms);
TdxGuest *tdx = TDX_GUEST(cgs);
int r = 0;
+ static bool notifier_added;
kvm_mark_guest_state_protected();
@@ -736,8 +751,10 @@ static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
*/
kvm_readonly_mem_allowed = false;
- qemu_add_machine_init_done_notifier(&tdx_machine_done_notify);
-
+ if (!notifier_added) {
+ qemu_add_machine_init_done_notifier(&tdx_machine_done_notify);
+ notifier_added = true;
+ }
tdx_guest = tdx;
return 0;
}
@@ -1503,6 +1520,7 @@ OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
TDX_GUEST,
X86_CONFIDENTIAL_GUEST,
{ TYPE_USER_CREATABLE },
+ { TYPE_RESETTABLE_INTERFACE },
{ NULL })
static void tdx_guest_init(Object *obj)
@@ -1536,20 +1554,37 @@ static void tdx_guest_init(Object *obj)
tdx->event_notify_vector = -1;
tdx->event_notify_apicid = -1;
+ qemu_register_resettable(obj);
}
static void tdx_guest_finalize(Object *obj)
{
}
+static ResettableState *tdx_reset_state(Object *obj)
+{
+ TdxGuest *tdx = TDX_GUEST(obj);
+ return &tdx->reset_state;
+}
+
static void tdx_guest_class_init(ObjectClass *oc, const void *data)
{
ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
X86ConfidentialGuestClass *x86_klass = X86_CONFIDENTIAL_GUEST_CLASS(oc);
+ ResettableClass *rc = RESETTABLE_CLASS(oc);
klass->kvm_init = tdx_kvm_init;
x86_klass->kvm_type = tdx_kvm_type;
x86_klass->cpu_instance_init = tdx_cpu_instance_init;
x86_klass->adjust_cpuid_features = tdx_adjust_cpuid_features;
x86_klass->check_features = tdx_check_features;
+
+ /*
+ * the exit phase makes sure TDX handles reset after all legacy resets
+ * have taken place (in the hold phase) and IGVM has also properly
+ * set up the boot state.
+ */
+ rc->phases.exit = tdx_handle_reset;
+ rc->get_state = tdx_reset_state;
+
}
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 1c38faf983..264fbe530c 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -70,6 +70,7 @@ typedef struct TdxGuest {
uint32_t event_notify_vector;
uint32_t event_notify_apicid;
+ ResettableState reset_state;
} TdxGuest;
#ifdef CONFIG_TDX
--
2.42.0
* [PATCH v1 14/28] i386/tdx: add a pre-vmfd change notifier to reset tdx state
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (12 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 13/28] i386/tdx: finalize TDX guest state upon reset Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 15/28] i386/sev: add migration blockers only once Ani Sinha
` (13 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Marcelo Tosatti
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
During reset, when the VM file descriptor is changed, the TDX state needs to be
re-initialized. A pre-VMFD notifier callback is implemented to reset the old
state and free memory before the new state is initialized post VM-fd change.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/tdx.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
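
A small sketch of the teardown ordering a pre-change callback of this kind
needs, using plain malloc/free rather than GLib. MockFwState and
fw_state_reset() are hypothetical and are not the TDX structures. The old
table is freed first and only then is the pointer cleared, so nothing leaks
and a second reset is harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for parsed firmware state that is rebuilt on reset. */
typedef struct {
    int *entries;
    unsigned nr_entries;
    int initialized;
} MockFwState;

/* Drop the old state so that it can be parsed and initialized again. */
static void fw_state_reset(MockFwState *s)
{
    assert(s);
    free(s->entries);   /* release the old table first ... */
    s->entries = NULL;  /* ... then drop the now dangling pointer */
    s->nr_entries = 0;
    s->initialized = 0;
}
```

Clearing the pointer before freeing it would pass NULL to free() and leak
the table, which is why the order matters.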
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 1903cc2132..b6fac162bd 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -403,6 +403,32 @@ static void tdx_handle_reset(Object *obj, ResetType type)
tdx_finalize_vm(NULL, NULL);
}
+/* TDX guest reset will require us to reinitialize some of tdx guest state. */
+static int set_tdx_vm_uninitialized(NotifierWithReturn *notifier,
+ void *data, Error** errp)
+{
+ TdxFirmware *fw = &tdx_guest->tdvf;
+
+ if (tdx_guest->initialized) {
+ tdx_guest->initialized = false;
+ }
+
+ g_free(tdx_guest->ram_entries);
+
+ /*
+ * the firmware entries will be parsed again, see
+ * x86_firmware_configure() -> tdx_parse_tdvf()
+ */
+ g_free(fw->entries);
+ fw->entries = NULL;
+
+ return 0;
+}
+
+static NotifierWithReturn tdx_vmfd_pre_change_notifier = {
+ .notify = set_tdx_vm_uninitialized,
+};
+
static Notifier tdx_machine_done_notify = {
.notify = tdx_finalize_vm,
};
@@ -753,6 +779,7 @@ static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
if (!notifier_added) {
qemu_add_machine_init_done_notifier(&tdx_machine_done_notify);
+ kvm_vmfd_add_pre_change_notifier(&tdx_vmfd_pre_change_notifier);
notifier_added = true;
}
tdx_guest = tdx;
--
2.42.0
* [PATCH v1 15/28] i386/sev: add migration blockers only once
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (13 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 14/28] i386/tdx: add a pre-vmfd change notifier to reset tdx state Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 16/28] i386/sev: add notifiers " Ani Sinha
` (12 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Zhao Liu, Marcelo Tosatti
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
sev_launch_finish() and sev_snp_launch_finish() could be called multiple times
if the confidential guest is capable of being reset/rebooted. The migration
blockers should not be added again on each invocation. This change
makes sure that the migration blockers are added only once and not on every
invocation of the launch_finish() calls.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/sev.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)
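
The guard used here is the usual function-local static flag. A minimal
sketch of the idea, with hypothetical names (register_blocker() and
launch_finish() below are stand-ins, not the SEV functions):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter standing in for the list of registered blockers. */
static int blockers_registered;

static void register_blocker(void)
{
    blockers_registered++;
}

/* May run once per guest launch or reset; registers only on the first call. */
static void launch_finish(void)
{
    static bool added_migration_blocker;

    if (!added_migration_blocker) {
        register_blocker();
        added_migration_blocker = true;
    }
    /* Invariant: exactly one blocker exists no matter how often we run. */
    assert(blockers_registered == 1);
}
```

The flag lives for the lifetime of the process, so this only fits state that
should be set up once per QEMU instance rather than once per guest launch.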
diff --git a/target/i386/sev.c b/target/i386/sev.c
index fd2dada013..9a3f488b24 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1409,6 +1409,7 @@ static void
sev_launch_finish(SevCommonState *sev_common)
{
int ret, error;
+ static bool added_migration_blocker;
trace_kvm_sev_launch_finish();
ret = sev_ioctl(sev_common->sev_fd, KVM_SEV_LAUNCH_FINISH, 0,
@@ -1421,10 +1422,13 @@ sev_launch_finish(SevCommonState *sev_common)
sev_set_guest_state(sev_common, SEV_STATE_RUNNING);
- /* add migration blocker */
- error_setg(&sev_mig_blocker,
- "SEV: Migration is not implemented");
- migrate_add_blocker(&sev_mig_blocker, &error_fatal);
+ if (!added_migration_blocker) {
+ /* add migration blocker */
+ error_setg(&sev_mig_blocker,
+ "SEV: Migration is not implemented");
+ migrate_add_blocker(&sev_mig_blocker, &error_fatal);
+ added_migration_blocker = true;
+ }
}
static int snp_launch_update_data(uint64_t gpa, void *hva, size_t len,
@@ -1608,6 +1612,7 @@ sev_snp_launch_finish(SevCommonState *sev_common)
{
int ret, error;
Error *local_err = NULL;
+ static bool added_migration_blocker;
OvmfSevMetadata *metadata;
SevLaunchUpdateData *data;
SevSnpGuestState *sev_snp = SEV_SNP_GUEST(sev_common);
@@ -1655,13 +1660,16 @@ sev_snp_launch_finish(SevCommonState *sev_common)
kvm_mark_guest_state_protected();
sev_set_guest_state(sev_common, SEV_STATE_RUNNING);
- /* add migration blocker */
- error_setg(&sev_mig_blocker,
- "SEV-SNP: Migration is not implemented");
- ret = migrate_add_blocker(&sev_mig_blocker, &local_err);
- if (local_err) {
- error_report_err(local_err);
- exit(1);
+ if (!added_migration_blocker) {
+ /* add migration blocker */
+ error_setg(&sev_mig_blocker,
+ "SEV-SNP: Migration is not implemented");
+ ret = migrate_add_blocker(&sev_mig_blocker, &local_err);
+ if (local_err) {
+ error_report_err(local_err);
+ exit(1);
+ }
+ added_migration_blocker = true;
}
}
--
2.42.0
* [PATCH v1 16/28] i386/sev: add notifiers only once
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (14 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 15/28] i386/sev: add migration blockers only once Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 17/28] i386/sev: free existing launch update data and kernel hashes data on init Ani Sinha
` (11 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Zhao Liu, Marcelo Tosatti
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
The VM state change notifier needs to be added only once, not on every SEV
state initialization. This matters once the SEV guest can be reset, because
the initialization then runs once per reset.
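For illustration only, the guard can be sketched outside QEMU as follows.
register_state_handler() and handlers_registered are hypothetical stand-ins
for qemu_add_vm_change_state_handler() and its effect; only the static flag
mirrors what the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for qemu_add_vm_change_state_handler(). */
int handlers_registered;

void register_state_handler(void)
{
    handlers_registered++;
}

/* Models an init routine that runs at first boot and again after each reset. */
void guest_init(void)
{
    /* Function-local static: zero-initialized, keeps its value across calls. */
    static bool notifiers_added;

    if (!notifiers_added) {
        register_state_handler();
        notifiers_added = true;
    }
    /* Per-reset initialization would continue here. */
}

/* Usage: three init passes still register exactly one handler. */
void guest_init_demo(void)
{
    guest_init();
    guest_init();
    guest_init();
    assert(handlers_registered == 1);
}
```

A function-local static keeps the flag private to the init routine, at the
cost of being shared by every object that goes through that function.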
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/sev.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 9a3f488b24..1212acfaa1 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1789,6 +1789,7 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
int ret, fw_error, cmd;
uint32_t ebx;
uint32_t host_cbitpos;
+ static bool notifiers_added;
struct sev_user_data_status status = {};
SevCommonState *sev_common = SEV_COMMON(cgs);
SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(cgs);
@@ -1939,8 +1940,11 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
return -1;
}
- qemu_add_vm_change_state_handler(sev_vm_state_change, sev_common);
-
+ if (!notifiers_added) {
+ /* add notifiers only once */
+ qemu_add_vm_change_state_handler(sev_vm_state_change, sev_common);
+ notifiers_added = true;
+ }
cgs->ready = true;
return 0;
--
2.42.0
* [PATCH v1 17/28] i386/sev: free existing launch update data and kernel hashes data on init
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (15 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 16/28] i386/sev: add notifiers " Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 18/28] i386/sev: add support for confidential guest reset Ani Sinha
` (10 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Marcelo Tosatti, Zhao Liu
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
Any existing launch update data and kernel hashes data must be freed when the
initialization code runs. This is important for resettable confidential
guests, where the initialization happens once per reset.
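A minimal standalone sketch of draining such a list before it is rebuilt is
shown here. UpdateData, launch_list and the helper names are hypothetical and
use a plain singly linked list rather than QEMU's QTAILQ macros. The sketch
also unlinks each entry before freeing it, so that the list head never points
at freed memory; that step is an assumption of the sketch, not a statement
about what the patch requires.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record standing in for SevLaunchUpdateData. */
typedef struct UpdateData {
    struct UpdateData *next;
    unsigned long gpa;
} UpdateData;

UpdateData *launch_list;    /* head of the pending-update list */

void launch_list_add(unsigned long gpa)
{
    UpdateData *d = calloc(1, sizeof(*d));

    assert(d != NULL);
    d->gpa = gpa;
    d->next = launch_list;
    launch_list = d;
}

/* Unlink and free every entry, leaving an empty list that can be reused. */
void launch_list_drain(void)
{
    while (launch_list) {
        UpdateData *d = launch_list;

        launch_list = d->next;  /* unlink before freeing */
        free(d);
    }
}

int launch_list_len(void)
{
    int n = 0;

    for (UpdateData *d = launch_list; d; d = d->next) {
        n++;
    }
    return n;
}
```

Draining on every init call is safe on the first boot as well, because an
empty list makes the loop a no-op.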
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/sev.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 1212acfaa1..83b9bfb2ae 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1791,6 +1791,7 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
uint32_t host_cbitpos;
static bool notifiers_added;
struct sev_user_data_status status = {};
+ SevLaunchUpdateData *data, *next_elm;
SevCommonState *sev_common = SEV_COMMON(cgs);
SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(cgs);
X86ConfidentialGuestClass *x86_klass =
@@ -1798,6 +1799,11 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
sev_common->state = SEV_STATE_UNINIT;
+ /* free existing launch update data if any */
+ QTAILQ_FOREACH_SAFE(data, &launch_update, next, next_elm) {
+ g_free(data);
+ }
+
host_cpuid(0x8000001F, 0, NULL, &ebx, NULL, NULL);
host_cbitpos = ebx & 0x3f;
@@ -1989,6 +1995,8 @@ static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
{
MachineState *ms = MACHINE(qdev_get_machine());
X86MachineState *x86ms = X86_MACHINE(ms);
+ SevCommonState *sev_common = SEV_COMMON(cgs);
+ SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(sev_common);
if (x86ms->smm == ON_OFF_AUTO_AUTO) {
x86ms->smm = ON_OFF_AUTO_OFF;
@@ -1997,6 +2005,10 @@ static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
return -1;
}
+ /* free existing kernel hashes data if any */
+ g_free(sev_snp_guest->kernel_hashes_data);
+ sev_snp_guest->kernel_hashes_data = NULL;
+
return 0;
}
--
2.42.0
* [PATCH v1 18/28] i386/sev: add support for confidential guest reset
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (16 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 17/28] i386/sev: free existing launch update data and kernel hashes data on init Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 19/28] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors Ani Sinha
` (9 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Marcelo Tosatti, Zhao Liu
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
When the KVM VM file descriptor changes as a part of the confidential guest
reset mechanism, it is necessary to create a new confidential guest context and
re-encrypt the VM memory. This happens for SEV-ES and SEV-SNP virtual machines
as a part of the SEV_LAUNCH_FINISH and SEV_SNP_LAUNCH_FINISH operations.
A new resettable interface for the SEV module has been added. A new reset
callback for the reset 'exit' phase has been implemented to perform the above
operations when the VM file descriptor has changed during VM reset.
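The reason for choosing the 'exit' phase can be shown with a toy model of a
phased reset. This is a sketch under assumptions: ResetObj, reset_all() and
the two callbacks are hypothetical and far simpler than QEMU's Resettable
machinery. The only point is that every 'hold' callback runs before any
'exit' callback, whatever the registration order.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*PhaseFn)(void *obj);

typedef struct ResetPhases {
    PhaseFn enter, hold, exit;
} ResetPhases;

typedef struct ResetObj {
    void *obj;
    ResetPhases phases;
} ResetObj;

/* Run one phase across all objects before moving on to the next phase. */
void reset_all(ResetObj *objs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (objs[i].phases.enter) {
            objs[i].phases.enter(objs[i].obj);
        }
    }
    for (size_t i = 0; i < n; i++) {
        if (objs[i].phases.hold) {
            objs[i].phases.hold(objs[i].obj);
        }
    }
    for (size_t i = 0; i < n; i++) {
        if (objs[i].phases.exit) {
            objs[i].phases.exit(objs[i].obj);
        }
    }
}

/* Records the order in which the callbacks ran. */
char reset_log[8];
size_t reset_log_len;

static void log_step(char c)
{
    assert(reset_log_len < sizeof(reset_log) - 1);
    reset_log[reset_log_len++] = c;
    reset_log[reset_log_len] = '\0';
}

/* 'D': an ordinary device restores its boot state in the hold phase. */
void device_hold(void *obj)
{
    (void)obj;
    log_step('D');
}

/* 'S': the confidential guest context seals that state in the exit phase. */
void seal_exit(void *obj)
{
    (void)obj;
    log_step('S');
}
```

Even if the sealing object is registered first, the log reads "DS": the state
is sealed only after the devices have finished their hold-phase work.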
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/sev.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 83b9bfb2ae..246a58c752 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -30,8 +30,10 @@
#include "system/kvm.h"
#include "kvm/kvm_i386.h"
#include "sev.h"
+#include "system/cpus.h"
#include "system/system.h"
#include "system/runstate.h"
+#include "system/reset.h"
#include "trace.h"
#include "migration/blocker.h"
#include "qom/object.h"
@@ -84,6 +86,10 @@ typedef struct QEMU_PACKED PaddedSevHashTable {
uint8_t padding[ROUND_UP(sizeof(SevHashTable), 16) - sizeof(SevHashTable)];
} PaddedSevHashTable;
+static void sev_handle_reset(Object *obj, ResetType type);
+
+SevKernelLoaderContext sev_load_ctx = {};
+
QEMU_BUILD_BUG_ON(sizeof(PaddedSevHashTable) % 16 != 0);
#define SEV_INFO_BLOCK_GUID "00f771de-1a7e-4fcb-890e-68c77e2fb44e"
@@ -127,6 +133,7 @@ struct SevCommonState {
uint8_t build_id;
int sev_fd;
SevState state;
+ ResettableState reset_state;
QTAILQ_HEAD(, SevLaunchVmsa) launch_vmsa;
};
@@ -2012,6 +2019,37 @@ static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
return 0;
}
+/*
+ * handle sev vm reset
+ */
+static void sev_handle_reset(Object *obj, ResetType type)
+{
+ SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
+ SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(sev_common);
+
+ if (!sev_common) {
+ return;
+ }
+
+ if (!runstate_is_running()) {
+ return;
+ }
+
+ sev_add_kernel_loader_hashes(&sev_load_ctx, &error_fatal);
+ if (!sev_check_state(sev_common, SEV_STATE_RUNNING)) {
+ /* this calls sev_snp_launch_finish() etc */
+ klass->launch_finish(sev_common);
+ }
+
+ return;
+}
+
+static ResettableState *sev_reset_state(Object *obj)
+{
+ SevCommonState *sev_common = SEV_COMMON(obj);
+ return &sev_common->reset_state;
+}
+
int
sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp)
{
@@ -2490,6 +2528,8 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
return false;
}
+ /* save the context here so that it can be re-used when vm is reset */
+ memcpy(&sev_load_ctx, ctx, sizeof(*ctx));
return klass->build_kernel_loader_hashes(sev_common, area, ctx, errp);
}
@@ -2750,8 +2790,16 @@ static void
sev_common_class_init(ObjectClass *oc, const void *data)
{
ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
+ ResettableClass *rc = RESETTABLE_CLASS(oc);
klass->kvm_init = sev_common_kvm_init;
+ /*
+ * the exit phase makes sure sev handles reset after all legacy resets
+ * have taken place (in the hold phase) and IGVM has also properly
+ * set up the boot state.
+ */
+ rc->phases.exit = sev_handle_reset;
+ rc->get_state = sev_reset_state;
object_class_property_add_str(oc, "sev-device",
sev_common_get_sev_device,
@@ -2786,6 +2834,8 @@ sev_common_instance_init(Object *obj)
cgs->get_mem_map_entry = cgs_get_mem_map_entry;
cgs->set_guest_policy = cgs_set_guest_policy;
+ qemu_register_resettable(OBJECT(sev_common));
+
QTAILQ_INIT(&sev_common->launch_vmsa);
}
@@ -2800,6 +2850,7 @@ static const TypeInfo sev_common_info = {
.abstract = true,
.interfaces = (const InterfaceInfo[]) {
{ TYPE_USER_CREATABLE },
+ { TYPE_RESETTABLE_INTERFACE },
{ }
}
};
--
2.42.0
* [PATCH v1 19/28] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (17 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 18/28] i386/sev: add support for confidential guest reset Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 20/28] kvm/i8254: add support for confidential guest reset Ani Sinha
` (8 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Alex Williamson, Cédric Le Goater
Cc: vkuznets, kraxel, pbonzini, qemu-devel, Ani Sinha
Normally the vfio pseudo device file descriptor lives for the life of the VM.
However, when the kvm VM file descriptor changes, a new file descriptor
for the pseudo device needs to be generated against the new kvm VM descriptor.
Other existing vfio descriptors need to be reattached to the new pseudo device
descriptor. This change performs the above steps.
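The bookkeeping behind these steps can be sketched as a "record, then replay"
pattern. Everything here is a hypothetical model: PseudoDev stands in for the
kernel-side pseudo device, dev_attach() for the KVM_SET_DEVICE_ATTR ioctl, and
a fixed array for the QLIST used in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FDS 8

/* Hypothetical model of the pseudo device: it only remembers attached fds. */
typedef struct PseudoDev {
    int attached[MAX_FDS];
    size_t n;
} PseudoDev;

/* Userspace record of every fd that has been attached so far. */
int tracked[MAX_FDS];
size_t n_tracked;
PseudoDev *current_dev;

/* Stand-in for the ioctl that attaches one fd to the pseudo device. */
static int dev_attach(PseudoDev *dev, int fd)
{
    assert(dev != NULL);
    if (dev->n == MAX_FDS) {
        return -1;
    }
    dev->attached[dev->n++] = fd;
    return 0;
}

/* Normal path: attach to the current device and remember the fd. */
int add_fd(int fd)
{
    if (n_tracked == MAX_FDS || dev_attach(current_dev, fd) < 0) {
        return -1;
    }
    tracked[n_tracked++] = fd;
    return 0;
}

/* VM fd changed: switch to the fresh device and replay every tracked fd. */
int rebind(PseudoDev *new_dev)
{
    int ret = 0;

    current_dev = new_dev;
    for (size_t i = 0; i < n_tracked; i++) {
        if (dev_attach(new_dev, tracked[i]) < 0) {
            ret = -1;   /* keep going, report the failure at the end */
        }
    }
    return ret;
}
```

The record has to be kept in userspace because the old pseudo device, and with
it the kernel's knowledge of the attached fds, goes away with the old VM fd.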
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/vfio/helpers.c | 81 +++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 78 insertions(+), 3 deletions(-)
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index 23d13e5db5..ad9e9c9ead 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -109,12 +109,66 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
#ifdef CONFIG_KVM
/*
* We have a single VFIO pseudo device per KVM VM. Once created it lives
- * for the life of the VM. Closing the file descriptor only drops our
- * reference to it and the device's reference to kvm. Therefore once
- * initialized, this file descriptor is only released on QEMU exit and
+ * for the life of the VM except when the vm file descriptor changes for
+ * confidential virtual machines. In that case, the old file descriptor is
+ * closed and a new file descriptor is recreated. Closing the file descriptor
+ * only drops our reference to it and the device's reference to kvm.
+ * Therefore once initialized, this file descriptor is normally only released
+ * on QEMU exit (except for confidential VMs as stated above) and
* we'll re-use it should another vfio device be attached before then.
*/
int vfio_kvm_device_fd = -1;
+
+typedef struct KVMVfioFileFd {
+ int fd;
+ QLIST_ENTRY(KVMVfioFileFd) node;
+} KVMVfioFileFd;
+
+static QLIST_HEAD(, KVMVfioFileFd) kvm_vfio_file_fds =
+ QLIST_HEAD_INITIALIZER(kvm_vfio_file_fds);
+
+static int kvm_vfio_filefd_rebind(NotifierWithReturn *notifier, void *data,
+ Error **errp);
+static struct NotifierWithReturn kvm_vfio_vmfd_change_notifier = {
+ .notify = kvm_vfio_filefd_rebind,
+};
+
+static int kvm_vfio_filefd_rebind(NotifierWithReturn *notifier, void *data,
+ Error **errp)
+{
+ KVMVfioFileFd *file_fd;
+ int ret = 0;
+ struct kvm_device_attr attr = {
+ .group = KVM_DEV_VFIO_FILE,
+ .attr = KVM_DEV_VFIO_FILE_ADD,
+ };
+ struct kvm_create_device cd = {
+ .type = KVM_DEV_TYPE_VFIO,
+ };
+
+ if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
+ error_setg_errno(errp, errno, "Failed to create KVM VFIO device");
+ return -errno;
+ }
+
+ if (vfio_kvm_device_fd >= 0) {
+ close(vfio_kvm_device_fd);
+ }
+
+ vfio_kvm_device_fd = cd.fd;
+
+ QLIST_FOREACH(file_fd, &kvm_vfio_file_fds, node) {
+ attr.addr = (uint64_t)(unsigned long)&file_fd->fd;
+ if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+ error_setg_errno(errp, errno,
+ "Failed to add fd %d to KVM VFIO device",
+ file_fd->fd);
+ ret = -errno;
+ }
+ }
+ return ret;
+}
+
#endif
void vfio_kvm_device_close(void)
@@ -136,6 +190,7 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
.attr = KVM_DEV_VFIO_FILE_ADD,
.addr = (uint64_t)(unsigned long)&fd,
};
+ KVMVfioFileFd *file_fd;
if (!kvm_enabled()) {
return 0;
@@ -152,6 +207,11 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
}
vfio_kvm_device_fd = cd.fd;
+ /*
+ * If the vm file descriptor changes, add a notifier so that we can
+ * re-create the vfio_kvm_device_fd.
+ */
+ kvm_vmfd_add_change_notifier(&kvm_vfio_vmfd_change_notifier);
}
if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
@@ -159,6 +219,11 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
fd);
return -errno;
}
+
+ file_fd = g_malloc0(sizeof(*file_fd));
+ file_fd->fd = fd;
+ QLIST_INSERT_HEAD(&kvm_vfio_file_fds, file_fd, node);
+
#endif
return 0;
}
@@ -171,6 +236,7 @@ int vfio_kvm_device_del_fd(int fd, Error **errp)
.attr = KVM_DEV_VFIO_FILE_DEL,
.addr = (uint64_t)(unsigned long)&fd,
};
+ KVMVfioFileFd *file_fd;
if (vfio_kvm_device_fd < 0) {
error_setg(errp, "KVM VFIO device isn't created yet");
@@ -182,6 +248,15 @@ int vfio_kvm_device_del_fd(int fd, Error **errp)
"Failed to remove fd %d from KVM VFIO device", fd);
return -errno;
}
+
+ QLIST_FOREACH(file_fd, &kvm_vfio_file_fds, node) {
+ if (file_fd->fd == fd) {
+ QLIST_REMOVE(file_fd, node);
+ g_free(file_fd);
+ break;
+ }
+ }
+
#endif
return 0;
}
--
2.42.0
* [PATCH v1 20/28] kvm/i8254: add support for confidential guest reset
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (18 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 19/28] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 21/28] hw/hyperv/vmbus: " Ani Sinha
` (7 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini,
Richard Henderson, Eduardo Habkost
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha
A confidential guest reset involves closing the old virtual machine KVM file
descriptor and opening a new one. Since it is a new KVM fd, the PIT needs to be
reinitialized. This is done with the help of a notifier which is invoked
upon KVM VM file descriptor change during the confidential guest reset process.
Some code refactoring is performed so that common operations for init and reset
are moved into a helper function.
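The shape of the refactoring can be sketched in isolation: one helper holds
the steps that are tied to a VM fd, and both the first-time path and the
post-reset path call it. The Vm and Pit types and all function names here are
hypothetical. Unlike the notifier in this patch, the sketch passes the
helper's result back to the caller; that is a choice made for the example
only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of per-VM kernel state; a new VM fd starts out empty. */
typedef struct Vm {
    bool has_pit;
    bool reinject;
} Vm;

typedef struct Pit {
    bool reinject;      /* configured policy, must survive a reset */
    int io_regions;     /* set up once, at realize time */
} Pit;

Vm *current_vm;

/* Steps tied to the VM fd: they must be repeated for each new fd. */
int pit_create_in_kernel(const Pit *s)
{
    assert(s != NULL);
    if (current_vm == NULL) {
        return -1;
    }
    current_vm->has_pit = true;
    current_vm->reinject = s->reinject;
    return 0;
}

/* First-time path: per-fd steps plus the one-time device setup. */
int pit_realize(Pit *s)
{
    int ret = pit_create_in_kernel(s);

    if (ret < 0) {
        return ret;
    }
    s->io_regions++;
    return 0;
}

/* Post-reset path: only the per-fd steps, against the new VM. */
int pit_post_vmfd_change(Pit *s)
{
    return pit_create_in_kernel(s);
}
```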
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/i386/kvm/i8254.c | 84 ++++++++++++++++++++++++++++-----------------
1 file changed, 53 insertions(+), 31 deletions(-)
diff --git a/hw/i386/kvm/i8254.c b/hw/i386/kvm/i8254.c
index 14b78f30a8..0b741464d9 100644
--- a/hw/i386/kvm/i8254.c
+++ b/hw/i386/kvm/i8254.c
@@ -52,6 +52,8 @@ struct KVMPITState {
LostTickPolicy lost_tick_policy;
bool vm_stopped;
int64_t kernel_clock_offset;
+
+ NotifierWithReturn kvmpit_vmfd_change_notifier;
};
struct KVMPITClass {
@@ -60,6 +62,43 @@ struct KVMPITClass {
DeviceRealize parent_realize;
};
+static void do_pit_initialize(KVMPITState *s, Error **errp)
+{
+ struct kvm_pit_config config = {
+ .flags = 0,
+ };
+ int ret;
+
+ ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_PIT2, &config);
+ if (ret < 0) {
+ error_setg(errp, "Create kernel PIC irqchip failed: %s",
+ strerror(-ret));
+ return;
+ }
+ switch (s->lost_tick_policy) {
+ case LOST_TICK_POLICY_DELAY:
+ break; /* enabled by default */
+ case LOST_TICK_POLICY_DISCARD:
+ if (kvm_check_extension(kvm_state, KVM_CAP_REINJECT_CONTROL)) {
+ struct kvm_reinject_control control = { .pit_reinject = 0 };
+
+ ret = kvm_vm_ioctl(kvm_state, KVM_REINJECT_CONTROL, &control);
+ if (ret < 0) {
+ error_setg(errp,
+ "Can't disable in-kernel PIT reinjection: %s",
+ strerror(-ret));
+ return;
+ }
+ }
+ break;
+ default:
+ error_setg(errp, "Lost tick policy not supported.");
+ return;
+ }
+
+ return;
+}
+
static void kvm_pit_update_clock_offset(KVMPITState *s)
{
int64_t offset, clock_offset;
@@ -166,6 +205,16 @@ static void kvm_pit_put(PITCommonState *pit)
}
}
+static int kvmpit_post_vmfd_change(NotifierWithReturn *notifier,
+ void *data, Error** errp)
+{
+ KVMPITState *s = container_of(notifier, KVMPITState,
+ kvmpit_vmfd_change_notifier);
+
+ do_pit_initialize(s, errp);
+ return 0;
+}
+
static void kvm_pit_set_gate(PITCommonState *s, PITChannelState *sc, int val)
{
kvm_pit_get(s);
@@ -241,49 +290,22 @@ static void kvm_pit_realizefn(DeviceState *dev, Error **errp)
PITCommonState *pit = PIT_COMMON(dev);
KVMPITClass *kpc = KVM_PIT_GET_CLASS(dev);
KVMPITState *s = KVM_PIT(pit);
- struct kvm_pit_config config = {
- .flags = 0,
- };
- int ret;
if (!kvm_check_extension(kvm_state, KVM_CAP_PIT_STATE2) ||
!kvm_check_extension(kvm_state, KVM_CAP_PIT2)) {
error_setg(errp, "In-kernel PIT not available");
}
- ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_PIT2, &config);
- if (ret < 0) {
- error_setg(errp, "Create kernel PIC irqchip failed: %s",
- strerror(-ret));
- return;
- }
- switch (s->lost_tick_policy) {
- case LOST_TICK_POLICY_DELAY:
- break; /* enabled by default */
- case LOST_TICK_POLICY_DISCARD:
- if (kvm_check_extension(kvm_state, KVM_CAP_REINJECT_CONTROL)) {
- struct kvm_reinject_control control = { .pit_reinject = 0 };
-
- ret = kvm_vm_ioctl(kvm_state, KVM_REINJECT_CONTROL, &control);
- if (ret < 0) {
- error_setg(errp,
- "Can't disable in-kernel PIT reinjection: %s",
- strerror(-ret));
- return;
- }
- }
- break;
- default:
- error_setg(errp, "Lost tick policy not supported.");
- return;
- }
-
+ do_pit_initialize(s, errp);
memory_region_init_io(&pit->ioports, OBJECT(dev), NULL, NULL, "kvm-pit", 4);
qdev_init_gpio_in(dev, kvm_pit_irq_control, 1);
qemu_add_vm_change_state_handler(kvm_pit_vm_state_change, s);
+ s->kvmpit_vmfd_change_notifier.notify = kvmpit_post_vmfd_change;
+ kvm_vmfd_add_change_notifier(&s->kvmpit_vmfd_change_notifier);
+
kpc->parent_realize(dev, errp);
}
--
2.42.0
* [PATCH v1 21/28] hw/hyperv/vmbus: add support for confidential guest reset
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (19 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 20/28] kvm/i8254: add support for confidential guest reset Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 22/28] accel/kvm: add a per-confidential class callback to unlock guest state Ani Sinha
` (6 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Maciej S. Szmigiero; +Cc: vkuznets, kraxel, pbonzini, qemu-devel, Ani Sinha
On confidential guests, when the KVM virtual machine file descriptor changes as
a part of the reset process, event file descriptors need to be reassociated
with the new KVM VM file descriptor. This is achieved with the help of a
callback handler that gets called when the KVM VM file descriptor changes during
the confidential guest reset process.
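The handler reaches its owning object through a notifier embedded in that
object. A minimal sketch of the technique follows. The Notifier and Bus types
are hypothetical, the container_of macro is defined locally rather than taken
from QEMU, and the meaning given to the data argument (the id of the new VM)
is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Notifier Notifier;
struct Notifier {
    int (*notify)(Notifier *n, void *data);
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct Bus {
    int event_fd;           /* stays the same across resets */
    int bound_vm;           /* id of the VM the event fd is bound to */
    Notifier vmfd_change;   /* embedded, so no separate allocation */
} Bus;

/* Invoked when the VM changes; 'data' carries the new VM id in this model. */
int bus_handle_vmfd_change(Notifier *n, void *data)
{
    Bus *bus = container_of(n, Bus, vmfd_change);

    assert(data != NULL);
    bus->bound_vm = *(int *)data;   /* re-associate with the new VM */
    return 0;
}
```

Embedding the notifier means the callback needs no global pointer to find its
object, which keeps the pattern usable when several instances exist.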
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/hyperv/vmbus.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/hw/hyperv/vmbus.c b/hw/hyperv/vmbus.c
index 961406cdd6..4763c0aebc 100644
--- a/hw/hyperv/vmbus.c
+++ b/hw/hyperv/vmbus.c
@@ -19,6 +19,7 @@
#include "hw/hyperv/vmbus.h"
#include "hw/hyperv/vmbus-bridge.h"
#include "hw/sysbus.h"
+#include "system/kvm.h"
#include "exec/target_page.h"
#include "trace.h"
@@ -247,6 +248,12 @@ struct VMBus {
* interrupt page
*/
EventNotifier notifier;
+
+ /*
+ * Notifier to inform when vmfd is changed as a part of confidential guest
+ * reset mechanism.
+ */
+ NotifierWithReturn vmbus_vmfd_change_notifier;
};
static bool gpadl_full(VMBusGpadl *gpadl)
@@ -2346,6 +2353,26 @@ static void vmbus_dev_unrealize(DeviceState *dev)
free_channels(vdev);
}
+/*
+ * If the KVM fd changes because of VM reset in confidential guests,
+ * reassociate event fd with the new KVM fd.
+ */
+static int vmbus_handle_vmfd_change(NotifierWithReturn *notifier,
+ void *data, Error** errp)
+{
+ VMBus *vmbus = container_of(notifier, VMBus,
+ vmbus_vmfd_change_notifier);
+ int ret = 0;
+ ret = hyperv_set_event_flag_handler(VMBUS_EVENT_CONNECTION_ID,
+ &vmbus->notifier);
+ /* if we are only using userland event handler, it may already exist */
+ if (ret != 0 && ret != -EEXIST) {
+ error_setg(errp, "hyperv set event handler failed with %d", ret);
+ }
+
+ return ret;
+}
+
static const Property vmbus_dev_props[] = {
DEFINE_PROP_UUID("instanceid", VMBusDevice, instanceid),
};
@@ -2428,6 +2455,9 @@ static void vmbus_realize(BusState *bus, Error **errp)
goto clear_event_notifier;
}
+ vmbus->vmbus_vmfd_change_notifier.notify = vmbus_handle_vmfd_change;
+ kvm_vmfd_add_change_notifier(&vmbus->vmbus_vmfd_change_notifier);
+
return;
clear_event_notifier:
--
2.42.0
* [PATCH v1 22/28] accel/kvm: add a per-confidential class callback to unlock guest state
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (20 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 21/28] hw/hyperv/vmbus: " Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 23/28] kvm/xen-emu: re-initialize capabilities during confidential guest reset Ani Sinha
` (5 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Marcelo Tosatti, Zhao Liu
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
As a part of the confidential guest reset process, the existing encrypted guest
state must be made mutable since it would be discarded after reset. A new
encrypted and locked guest state must be established after the reset. To this
end, a new callback per confidential guest support class (e.g. TDX or SEV-SNP)
is added that indicates whether it is possible to rebuild the guest state:
bool (*can_rebuild_guest_state)(ConfidentialGuestSupport *cgs)
This API returns true if rebuilding the guest state is possible and
false otherwise. A KVM-based confidential guest reset is only possible when
the existing state is locked but it is possible to rebuild the guest state.
Otherwise, the guest is not resettable.
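The resulting decision can be sketched as a small truth function. The Cgs and
CgsClass types and the reset_allowed() helper are hypothetical reductions of
the real structures; the logic follows the wrapper and the
qemu_system_reset_request() condition in this patch: no confidential guest
object means "yes", a missing callback means "no".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Cgs Cgs;

typedef struct CgsClass {
    /* Optional: left NULL by classes that cannot rebuild guest state. */
    bool (*can_rebuild_guest_state)(Cgs *cgs);
} CgsClass;

struct Cgs {
    const CgsClass *klass;
};

bool cgs_can_rebuild(Cgs *cgs)
{
    if (!cgs) {
        return true;    /* non-confidential guest */
    }
    assert(cgs->klass != NULL);
    if (cgs->klass->can_rebuild_guest_state) {
        return cgs->klass->can_rebuild_guest_state(cgs);
    }
    return false;       /* default: state cannot be rebuilt */
}

/* Reset proceeds if the CPUs are resettable or the state can be rebuilt. */
bool reset_allowed(bool cpus_resettable, Cgs *cgs)
{
    return cpus_resettable || cgs_can_rebuild(cgs);
}

/* Example callback for a class that always supports rebuilding. */
bool always_rebuildable(Cgs *cgs)
{
    (void)cgs;
    return true;
}
```

Defaulting to false keeps confidential guest classes that have not opted in
on the existing "terminate on reboot" behaviour.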
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
include/system/confidential-guest-support.h | 27 +++++++++++++++++++++
system/runstate.c | 11 +++++++--
target/i386/kvm/tdx.c | 6 +++++
target/i386/sev.c | 9 +++++++
4 files changed, 51 insertions(+), 2 deletions(-)
diff --git a/include/system/confidential-guest-support.h b/include/system/confidential-guest-support.h
index 0cc8b26e64..3c37227263 100644
--- a/include/system/confidential-guest-support.h
+++ b/include/system/confidential-guest-support.h
@@ -152,6 +152,11 @@ typedef struct ConfidentialGuestSupportClass {
*/
int (*get_mem_map_entry)(int index, ConfidentialGuestMemoryMapEntry *entry,
Error **errp);
+
+ /*
+ * is it possible to rebuild the guest state?
+ */
+ bool (*can_rebuild_guest_state)(ConfidentialGuestSupport *cgs);
} ConfidentialGuestSupportClass;
static inline int confidential_guest_kvm_init(ConfidentialGuestSupport *cgs,
@@ -167,6 +172,28 @@ static inline int confidential_guest_kvm_init(ConfidentialGuestSupport *cgs,
return 0;
}
+static inline bool
+confidential_guest_can_rebuild_state(ConfidentialGuestSupport *cgs)
+{
+ ConfidentialGuestSupportClass *klass;
+
+ if (!cgs) {
+ /* non-confidential guests */
+ return true;
+ }
+
+ klass = CONFIDENTIAL_GUEST_SUPPORT_GET_CLASS(cgs);
+ if (klass->can_rebuild_guest_state) {
+ return klass->can_rebuild_guest_state(cgs);
+ }
+
+ /*
+ * by default, we should not be able to unprotect the
+ * confidential guest state
+ */
+ return false;
+}
+
static inline int confidential_guest_kvm_reset(ConfidentialGuestSupport *cgs,
Error **errp)
{
diff --git a/system/runstate.c b/system/runstate.c
index f5e57fd1f7..fb878c2992 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -58,6 +58,7 @@
#include "system/reset.h"
#include "system/runstate.h"
#include "system/runstate-action.h"
+#include "system/confidential-guest-support.h"
#include "system/system.h"
#include "system/tpm.h"
#include "trace.h"
@@ -564,7 +565,12 @@ void qemu_system_reset(ShutdownCause reason)
if (cpus_are_resettable()) {
cpu_synchronize_all_post_reset();
} else {
- assert(runstate_check(RUN_STATE_PRELAUNCH));
+ /*
+ * for confidential guests, cpus are not resettable but their
+ * state can be rebuilt under some conditions.
+ */
+ assert(runstate_check(RUN_STATE_PRELAUNCH) ||
+ (current_machine->cgs && runstate_is_running()));
}
vm_set_suspended(false);
@@ -713,7 +719,8 @@ void qemu_system_reset_request(ShutdownCause reason)
if (reboot_action == REBOOT_ACTION_SHUTDOWN &&
reason != SHUTDOWN_CAUSE_SUBSYSTEM_RESET) {
shutdown_requested = reason;
- } else if (!cpus_are_resettable()) {
+ } else if (!cpus_are_resettable() &&
+ !confidential_guest_can_rebuild_state(current_machine->cgs)) {
error_report("cpus are not resettable, terminating");
shutdown_requested = reason;
} else {
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index b6fac162bd..20f9d63eff 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -1594,6 +1594,11 @@ static ResettableState *tdx_reset_state(Object *obj)
return &tdx->reset_state;
}
+static bool tdx_can_rebuild_guest_state(ConfidentialGuestSupport *cgs)
+{
+ return true;
+}
+
static void tdx_guest_class_init(ObjectClass *oc, const void *data)
{
ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
@@ -1601,6 +1606,7 @@ static void tdx_guest_class_init(ObjectClass *oc, const void *data)
ResettableClass *rc = RESETTABLE_CLASS(oc);
klass->kvm_init = tdx_kvm_init;
+ klass->can_rebuild_guest_state = tdx_can_rebuild_guest_state;
x86_klass->kvm_type = tdx_kvm_type;
x86_klass->cpu_instance_init = tdx_cpu_instance_init;
x86_klass->adjust_cpuid_features = tdx_adjust_cpuid_features;
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 246a58c752..4eea58d160 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -2659,6 +2659,14 @@ static int cgs_set_guest_state(hwaddr gpa, uint8_t *ptr, uint64_t len,
return -1;
}
+static bool sev_can_rebuild_guest_state(ConfidentialGuestSupport *cgs)
+{
+ if (!sev_snp_enabled() && !sev_es_enabled()) {
+ return false;
+ }
+ return true;
+}
+
static int cgs_get_mem_map_entry(int index,
ConfidentialGuestMemoryMapEntry *entry,
Error **errp)
@@ -2833,6 +2841,7 @@ sev_common_instance_init(Object *obj)
cgs->set_guest_state = cgs_set_guest_state;
cgs->get_mem_map_entry = cgs_get_mem_map_entry;
cgs->set_guest_policy = cgs_set_guest_policy;
+ cgs->can_rebuild_guest_state = sev_can_rebuild_guest_state;
qemu_register_resettable(OBJECT(sev_common));
--
2.42.0
* [PATCH v1 23/28] kvm/xen-emu: re-initialize capabilities during confidential guest reset
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (21 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 22/28] accel/kvm: add a per-confidential class callback to unlock guest state Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 24/28] kvm/xen_evtchn: add support for " Ani Sinha
` (4 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: David Woodhouse, Paul Durrant, Paolo Bonzini, Marcelo Tosatti
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
On confidential guests, the KVM virtual machine file descriptor changes as a
part of the guest reset process. Xen capabilities need to be re-initialized in
KVM against the new file descriptor.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
target/i386/kvm/xen-emu.c | 45 +++++++++++++++++++++++++++++++++++++--
1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 52de019834..4f4cde7c58 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -44,9 +44,12 @@
#include "xen-compat.h"
+NotifierWithReturn xen_vmfd_change_notifier;
+static bool hyperv_enabled;
static void xen_vcpu_singleshot_timer_event(void *opaque);
static void xen_vcpu_periodic_timer_event(void *opaque);
static int vcpuop_stop_singleshot_timer(CPUState *cs);
+static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr);
#ifdef TARGET_X86_64
#define hypercall_compat32(longmode) (!(longmode))
@@ -54,6 +57,25 @@ static int vcpuop_stop_singleshot_timer(CPUState *cs);
#define hypercall_compat32(longmode) (false)
#endif
+static int xen_handle_vmfd_change(NotifierWithReturn *n,
+ void *data, Error** errp)
+{
+ int ret;
+
+ ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR);
+ if (ret < 0) {
+ return ret;
+ }
+
+ if (hyperv_enabled) {
+ ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR_HYPERV);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+ return 0;
+}
+
static bool kvm_gva_to_gpa(CPUState *cs, uint64_t gva, uint64_t *gpa,
size_t *len, bool is_write)
{
@@ -111,15 +133,16 @@ static inline int kvm_copy_to_gva(CPUState *cs, uint64_t gva, void *buf,
return kvm_gva_rw(cs, gva, buf, sz, true);
}
-int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
+static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr)
{
+ int xen_caps, ret;
const int required_caps = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL | KVM_XEN_HVM_CONFIG_SHARED_INFO;
+
struct kvm_xen_hvm_config cfg = {
.msr = hypercall_msr,
.flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL,
};
- int xen_caps, ret;
xen_caps = kvm_check_extension(s, KVM_CAP_XEN_HVM);
if (required_caps & ~xen_caps) {
@@ -143,6 +166,21 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
strerror(-ret));
return ret;
}
+ return xen_caps;
+}
+
+int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
+{
+ int xen_caps;
+
+ xen_caps = do_initialize_xen_caps(s, hypercall_msr);
+ if (xen_caps < 0) {
+ return xen_caps;
+ }
+
+ if (!hyperv_enabled && (hypercall_msr == XEN_HYPERCALL_MSR_HYPERV)) {
+ hyperv_enabled = true;
+ }
/* If called a second time, don't repeat the rest of the setup. */
if (s->xen_caps) {
@@ -185,6 +223,9 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
xen_primary_console_reset();
xen_xenstore_reset();
+ xen_vmfd_change_notifier.notify = xen_handle_vmfd_change;
+ kvm_vmfd_add_change_notifier(&xen_vmfd_change_notifier);
+
return 0;
}
--
2.42.0
* [PATCH v1 24/28] kvm/xen_evtchn: add support for confidential guest reset
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (22 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 23/28] kvm/xen-emu: re-initialize capabilities during confidential guest reset Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 25/28] ppc/openpic: create a new openpic device and reattach mem region on coco reset Ani Sinha
` (3 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: David Woodhouse, Paul Durrant, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, Michael S. Tsirkin, Marcel Apfelbaum
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha
As a part of the confidential guest reset, when the KVM VM file handle is
changed, Xen event ports and kernel ports that were associated with the
previous KVM file handle need to be reassociated with the new handle. This is
performed with the help of a callback handler that gets invoked during the
confidential guest reset process when the KVM VM fd changes.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/i386/kvm/xen_evtchn.c | 100 +++++++++++++++++++++++++++++++++++++--
1 file changed, 97 insertions(+), 3 deletions(-)
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index dd566c4967..ddacb26c44 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -133,6 +133,26 @@ struct pirq_info {
bool is_translated;
};
+struct eventfds {
+ uint16_t type;
+ evtchn_port_t port;
+ int fd;
+ QLIST_ENTRY(eventfds) node;
+};
+
+struct kernel_ports {
+ uint16_t type;
+ evtchn_port_t port;
+ uint32_t vcpu_id;
+ QLIST_ENTRY(kernel_ports) node;
+};
+
+static QLIST_HEAD(, eventfds) eventfd_list =
+ QLIST_HEAD_INITIALIZER(eventfd_list);
+
+static QLIST_HEAD(, kernel_ports) kernel_port_list =
+ QLIST_HEAD_INITIALIZER(kernel_port_list);
+
struct XenEvtchnState {
/*< private >*/
SysBusDevice busdev;
@@ -178,6 +198,7 @@ struct XenEvtchnState {
#define pirq_inuse(s, pirq) (pirq_inuse_word(s, pirq) & pirq_inuse_bit(pirq))
struct XenEvtchnState *xen_evtchn_singleton;
+static NotifierWithReturn xen_eventchn_notifier;
/* Top bits of callback_param are the type (HVM_PARAM_CALLBACK_TYPE_xxx) */
#define CALLBACK_VIA_TYPE_SHIFT 56
@@ -304,6 +325,52 @@ static void gsi_assert_bh(void *opaque)
}
}
+static int xen_eventchn_handle_vmfd_change(NotifierWithReturn *notifier,
+ void *data, Error **errp)
+{
+ struct eventfds *ef;
+ struct kernel_ports *kp;
+ struct kvm_xen_hvm_attr ha;
+ CPUState *cpu;
+ int ret;
+
+ QLIST_FOREACH(ef, &eventfd_list, node) {
+ ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
+ ha.u.evtchn.send_port = ef->port;
+ ha.u.evtchn.type = ef->type;
+ ha.u.evtchn.flags = 0;
+ ha.u.evtchn.deliver.eventfd.port = 0;
+ ha.u.evtchn.deliver.eventfd.fd = ef->fd;
+
+ ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+ if (ret < 0) {
+ error_setg(errp, "KVM_XEN_HVM_SET_ATTR failed with %d", ret);
+ return ret;
+ }
+ }
+
+ memset(&ha, 0, sizeof(ha));
+
+ QLIST_FOREACH(kp, &kernel_port_list, node) {
+ cpu = qemu_get_cpu(kp->vcpu_id);
+ ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
+ ha.u.evtchn.send_port = kp->port;
+ ha.u.evtchn.type = kp->type;
+ ha.u.evtchn.flags = 0;
+ ha.u.evtchn.deliver.port.port = kp->port;
+ ha.u.evtchn.deliver.port.vcpu = kvm_arch_vcpu_id(cpu);
+ ha.u.evtchn.deliver.port.priority =
+ KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL;
+
+ ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+ if (ret < 0) {
+ error_setg(errp, "KVM_XEN_HVM_SET_ATTR failed with %d", ret);
+ return ret;
+ }
+ }
+ return 0;
+}
+
void xen_evtchn_create(unsigned int nr_gsis, qemu_irq *system_gsis)
{
XenEvtchnState *s = XEN_EVTCHN(sysbus_create_simple(TYPE_XEN_EVTCHN,
@@ -350,6 +417,9 @@ void xen_evtchn_create(unsigned int nr_gsis, qemu_irq *system_gsis)
/* Set event channel functions for backend drivers to use */
xen_evtchn_ops = &emu_evtchn_backend_ops;
+
+ xen_eventchn_notifier.notify = xen_eventchn_handle_vmfd_change;
+ kvm_vmfd_add_change_notifier(&xen_eventchn_notifier);
}
static void xen_evtchn_register_types(void)
@@ -547,6 +617,7 @@ static void inject_callback(XenEvtchnState *s, uint32_t vcpu)
static void deassign_kernel_port(evtchn_port_t port)
{
struct kvm_xen_hvm_attr ha;
+ struct kernel_ports *kp;
int ret;
ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
@@ -557,6 +628,12 @@ static void deassign_kernel_port(evtchn_port_t port)
if (ret) {
qemu_log_mask(LOG_GUEST_ERROR, "Failed to unbind kernel port %d: %s\n",
port, strerror(ret));
+ } else {
+ QLIST_FOREACH(kp, &kernel_port_list, node) {
+ if (kp->port == port) {
+ QLIST_REMOVE(kp, node);
+ }
+ }
}
}
@@ -565,6 +642,8 @@ static int assign_kernel_port(uint16_t type, evtchn_port_t port,
{
CPUState *cpu = qemu_get_cpu(vcpu_id);
struct kvm_xen_hvm_attr ha;
+ g_autofree struct kernel_ports *kp = g_malloc0(sizeof(*kp));
+ int ret;
if (!cpu) {
return -ENOENT;
@@ -578,12 +657,21 @@ static int assign_kernel_port(uint16_t type, evtchn_port_t port,
ha.u.evtchn.deliver.port.vcpu = kvm_arch_vcpu_id(cpu);
ha.u.evtchn.deliver.port.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL;
- return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+ ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+ if (ret == 0) {
+ kp->type = type;
+ kp->port = port;
+ kp->vcpu_id = vcpu_id;
+ QLIST_INSERT_HEAD(&kernel_port_list, kp, node);
+ }
+ return ret;
}
static int assign_kernel_eventfd(uint16_t type, evtchn_port_t port, int fd)
{
struct kvm_xen_hvm_attr ha;
+ g_autofree struct eventfds *ef = g_malloc0(sizeof(*ef));
+ int ret;
ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
ha.u.evtchn.send_port = port;
@@ -592,7 +680,14 @@ static int assign_kernel_eventfd(uint16_t type, evtchn_port_t port, int fd)
ha.u.evtchn.deliver.eventfd.port = 0;
ha.u.evtchn.deliver.eventfd.fd = fd;
- return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+ ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+ if (ret == 0) {
+ ef->type = type;
+ ef->port = port;
+ ef->fd = fd;
+ QLIST_INSERT_HEAD(&eventfd_list, ef, node);
+ }
+ return ret;
}
static bool valid_port(evtchn_port_t port)
@@ -2391,4 +2486,3 @@ void hmp_xen_event_inject(Monitor *mon, const QDict *qdict)
monitor_printf(mon, "Delivered port %d\n", port);
}
}
-
--
2.42.0
* [PATCH v1 25/28] ppc/openpic: create a new openpic device and reattach mem region on coco reset
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (23 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 24/28] kvm/xen_evtchn: add support for " Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 26/28] kvm/vcpu: add notifiers to inform vcpu file descriptor change Ani Sinha
` (2 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Bernhard Beschow
Cc: vkuznets, kraxel, pbonzini, qemu-devel, Ani Sinha, qemu-ppc
For confidential guests, the old KVM VM file descriptor is closed and a new
one is created during the reset process. When the new file descriptor is
created, a new openpic device needs to be created against it as well.
Additionally, the existing memory region needs to be reattached to this new
openpic device, and the proper CPU attributes need to be set so that they are
associated with the new file descriptor. This change makes this happen with
the help of a callback handler that gets called when the KVM VM file
descriptor changes as a part of the confidential guest reset process.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/intc/openpic_kvm.c | 108 ++++++++++++++++++++++++++++++++----------
1 file changed, 83 insertions(+), 25 deletions(-)
diff --git a/hw/intc/openpic_kvm.c b/hw/intc/openpic_kvm.c
index 673ea9ca05..1b7a1d0d00 100644
--- a/hw/intc/openpic_kvm.c
+++ b/hw/intc/openpic_kvm.c
@@ -49,6 +49,7 @@ struct KVMOpenPICState {
uint32_t fd;
uint32_t model;
hwaddr mapped;
+ NotifierWithReturn open_pic_vmfd_change_notifier;
};
static void kvm_openpic_set_irq(void *opaque, int n_IRQ, int level)
@@ -114,6 +115,83 @@ static const MemoryRegionOps kvm_openpic_mem_ops = {
},
};
+static int create_open_pic_device(KVMOpenPICState *opp, Error **errp)
+{
+ int kvm_openpic_model;
+ struct kvm_create_device cd = {0};
+ KVMState *s = kvm_state;
+ int ret;
+
+ switch (opp->model) {
+ case OPENPIC_MODEL_FSL_MPIC_20:
+ kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_20;
+ break;
+
+ case OPENPIC_MODEL_FSL_MPIC_42:
+ kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_42;
+ break;
+
+ default:
+ error_setg(errp, "Unsupported OpenPIC model %" PRIu32, opp->model);
+ return -1;
+ }
+
+ cd.type = kvm_openpic_model;
+ ret = kvm_vm_ioctl(s, KVM_CREATE_DEVICE, &cd);
+ if (ret < 0) {
+ error_setg(errp, "Can't create device %d: %s",
+ cd.type, strerror(errno));
+ return -1;
+ }
+ opp->fd = cd.fd;
+
+ return 0;
+}
+
+static int open_pic_vmfd_handle_vmfd_change(NotifierWithReturn *notifier,
+ void *data, Error **errp)
+{
+ KVMOpenPICState *opp = container_of(notifier, KVMOpenPICState,
+ open_pic_vmfd_change_notifier);
+ uint64_t reg_base;
+ struct kvm_device_attr attr;
+ CPUState *cs;
+ int ret;
+
+ /* close the old descriptor */
+ close(opp->fd);
+
+ if (create_open_pic_device(opp, errp) < 0) {
+ return -1;
+ }
+
+ if (!opp->mapped) {
+ return 0;
+ }
+
+ reg_base = opp->mapped;
+ attr.group = KVM_DEV_MPIC_GRP_MISC;
+ attr.attr = KVM_DEV_MPIC_BASE_ADDR;
+ attr.addr = (uint64_t)(unsigned long)&reg_base;
+
+ ret = ioctl(opp->fd, KVM_SET_DEVICE_ATTR, &attr);
+ if (ret < 0) {
+ fprintf(stderr, "%s: %s %" PRIx64 "\n", __func__,
+ strerror(errno), reg_base);
+ return -1;
+ }
+
+ CPU_FOREACH(cs) {
+ ret = kvm_vcpu_enable_cap(cs, KVM_CAP_IRQ_MPIC, 0, opp->fd,
+ kvm_arch_vcpu_id(cs));
+ if (ret < 0) {
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
static void kvm_openpic_region_add(MemoryListener *listener,
MemoryRegionSection *section)
{
@@ -197,37 +275,14 @@ static void kvm_openpic_realize(DeviceState *dev, Error **errp)
SysBusDevice *d = SYS_BUS_DEVICE(dev);
KVMOpenPICState *opp = KVM_OPENPIC(dev);
KVMState *s = kvm_state;
- int kvm_openpic_model;
- struct kvm_create_device cd = {0};
- int ret, i;
+ int i;
if (!kvm_check_extension(s, KVM_CAP_DEVICE_CTRL)) {
error_setg(errp, "Kernel is lacking Device Control API");
return;
}
- switch (opp->model) {
- case OPENPIC_MODEL_FSL_MPIC_20:
- kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_20;
- break;
-
- case OPENPIC_MODEL_FSL_MPIC_42:
- kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_42;
- break;
-
- default:
- error_setg(errp, "Unsupported OpenPIC model %" PRIu32, opp->model);
- return;
- }
-
- cd.type = kvm_openpic_model;
- ret = kvm_vm_ioctl(s, KVM_CREATE_DEVICE, &cd);
- if (ret < 0) {
- error_setg(errp, "Can't create device %d: %s",
- cd.type, strerror(errno));
- return;
- }
- opp->fd = cd.fd;
+ create_open_pic_device(opp, errp);
sysbus_init_mmio(d, &opp->mem);
qdev_init_gpio_in(dev, kvm_openpic_set_irq, OPENPIC_MAX_IRQ);
@@ -236,6 +291,9 @@ static void kvm_openpic_realize(DeviceState *dev, Error **errp)
opp->mem_listener.region_del = kvm_openpic_region_del;
opp->mem_listener.name = "openpic-kvm";
memory_listener_register(&opp->mem_listener, &address_space_memory);
+ opp->open_pic_vmfd_change_notifier.notify =
+ open_pic_vmfd_handle_vmfd_change;
+ kvm_vmfd_add_change_notifier(&opp->open_pic_vmfd_change_notifier);
/* indicate pic capabilities */
msi_nonbroken = true;
--
2.42.0
* [PATCH v1 26/28] kvm/vcpu: add notifiers to inform vcpu file descriptor change
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (24 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 25/28] ppc/openpic: create a new openpic device and reattach mem region on coco reset Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 27/28] kvm/i386/apic: set local apic after vcpu file descriptors changed Ani Sinha
2025-12-12 15:03 ` [PATCH v1 28/28] kvm/clock: add support for confidential guest reset Ani Sinha
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: vkuznets, kraxel, qemu-devel, Ani Sinha, kvm
When new vcpu file descriptors are created and bound to the new kvm file
descriptor as a part of the confidential guest reset mechanism, various
subsystems need to know about it. This change adds a notifier list so that
subsystems can register handlers and take appropriate actions when the vcpu
fds change.
Subsequent changes will register specific handlers with this notifier.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
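The registration and notification flow described in the commit message can
be sketched in isolation as follows. This is a simplified, hypothetical
model and not QEMU's NotifierWithReturn implementation: struct notifier,
notifier_add(), notifier_remove(), notifier_notify_all() and struct
subsystem are names invented for illustration. It assumes that notification
stops at the first handler that reports an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified notifier with a return value. */
struct notifier {
    int (*notify)(struct notifier *n, void *data);
    struct notifier *next;
};

/* Handlers to run after the vcpu file descriptors have changed. */
static struct notifier *vcpufd_notifiers;

static void notifier_add(struct notifier *n)
{
    n->next = vcpufd_notifiers;
    vcpufd_notifiers = n;
}

static void notifier_remove(struct notifier *n)
{
    struct notifier **pp;

    for (pp = &vcpufd_notifiers; *pp; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->next = NULL;
            return;
        }
    }
}

/* Run every handler; stop at the first failure and return its error. */
static int notifier_notify_all(void *data)
{
    struct notifier *n;

    for (n = vcpufd_notifiers; n; n = n->next) {
        int ret = n->notify(n, data);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* Example subsystem that embeds a notifier in its own state. */
struct subsystem {
    int reinit_count;
    struct notifier on_vcpufd_change;
};

/* Recover the owning subsystem from the embedded notifier and redo setup. */
static int subsystem_reinit(struct notifier *n, void *data)
{
    struct subsystem *s = (struct subsystem *)
        ((char *)n - offsetof(struct subsystem, on_vcpufd_change));

    (void)data;
    s->reinit_count++;
    return 0;
}
```

Embedding the notifier in the subsystem state lets the handler recover its
owner without a global pointer, which is the same idea as the container_of()
calls used by the handlers later in this series.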
accel/kvm/kvm-all.c | 27 ++++++++++++++++++++++++++-
accel/stubs/kvm-stub.c | 10 ++++++++++
include/system/kvm.h | 17 +++++++++++++++++
3 files changed, 53 insertions(+), 1 deletion(-)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 638f193626..7f9c0d454a 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -130,8 +130,10 @@ static NotifierWithReturnList register_vmfd_changed_notifiers =
static NotifierWithReturnList register_vmfd_pre_change_notifiers =
NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_pre_change_notifiers);
-static int kvm_rebind_vcpus(Error **errp);
+static NotifierWithReturnList register_vcpufd_changed_notifiers =
+ NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vcpufd_changed_notifiers);
+static int kvm_rebind_vcpus(Error **errp);
static int map_kvm_run(KVMState *s, CPUState *cpu, Error **errp);
static int map_kvm_dirty_gfns(KVMState *s, CPUState *cpu, Error **errp);
static int vcpu_unmap_regions(KVMState *s, CPUState *cpu);
@@ -2327,6 +2329,22 @@ void kvm_vmfd_remove_pre_change_notifier(NotifierWithReturn *n)
notifier_with_return_remove(n);
}
+void kvm_vcpufd_add_change_notifier(NotifierWithReturn *n)
+{
+ notifier_with_return_list_add(&register_vcpufd_changed_notifiers, n);
+}
+
+void kvm_vcpufd_remove_change_notifier(NotifierWithReturn *n)
+{
+ notifier_with_return_remove(n);
+}
+
+static int kvm_vcpufd_change_notify(Error **errp)
+{
+ return notifier_with_return_list_notify(&register_vcpufd_changed_notifiers,
+ &vmfd_notifier, errp);
+}
+
static int kvm_vmfd_pre_change_notify(Error **errp)
{
return notifier_with_return_list_notify(&register_vmfd_pre_change_notifiers,
@@ -2847,6 +2865,13 @@ static int kvm_reset_vmfd(MachineState *ms)
}
assert(!err);
+ /* notify everyone that vcpu fd has changed. */
+ ret = kvm_vcpufd_change_notify(&err);
+ if (ret < 0) {
+ return ret;
+ }
+ assert(!err);
+
/* these can be only called after ram_block_rebind() */
memory_listener_register(&kml->listener, &address_space_memory);
memory_listener_register(&kvm_io_listener, &address_space_io);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 7f4e3c4050..5b94f3dc3c 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -95,6 +95,16 @@ void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n)
{
}
+void kvm_vcpufd_add_change_notifier(NotifierWithReturn *n)
+{
+ return;
+}
+
+void kvm_vcpufd_remove_change_notifier(NotifierWithReturn *n)
+{
+ return;
+}
+
int kvm_irqchip_add_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
EventNotifier *rn, int virq)
{
diff --git a/include/system/kvm.h b/include/system/kvm.h
index cb5db9ff67..bfd09e70a0 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -586,6 +586,23 @@ void kvm_vmfd_add_change_notifier(NotifierWithReturn *n);
*/
void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n);
+/**
+ * kvm_vcpufd_add_change_notifier - register a notifier to get notified when
+ * the KVM vcpu file descriptors change as a part of the confidential guest
+ * "reset" process. Various subsystems should use this mechanism to take
+ * actions such as re-issuing vcpu ioctls as a part of setting up vcpu
+ * features.
+ * @n: notifier with return value.
+ */
+void kvm_vcpufd_add_change_notifier(NotifierWithReturn *n);
+
+/**
+ * kvm_vcpufd_remove_change_notifier - de-register a notifier previously
+ * registered with kvm_vcpufd_add_change_notifier call.
+ * @n: notifier that was previously registered.
+ */
+void kvm_vcpufd_remove_change_notifier(NotifierWithReturn *n);
+
/**
* kvm_vmfd_add_pre_change_notifier - register a notifier to get notified when
* kvm vm file descriptor is about to be changed as a part of the confidential
--
2.42.0
* [PATCH v1 27/28] kvm/i386/apic: set local apic after vcpu file descriptors changed
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (25 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 26/28] kvm/vcpu: add notifiers to inform vcpu file descriptor change Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
2025-12-12 15:03 ` [PATCH v1 28/28] kvm/clock: add support for confidential guest reset Ani Sinha
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini,
Richard Henderson, Eduardo Habkost
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha
Once the vcpu file descriptors have changed after a confidential guest reset,
the local apic needs to be reinitialized. This change adds a callback to the
vcpu fd change notifiers to reinitialize the local apic for kvm x86.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
hw/i386/kvm/apic.c | 13 +++++++++++++
include/hw/i386/apic_internal.h | 1 +
2 files changed, 14 insertions(+)
diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 82355f0463..f6f8ac2764 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -229,6 +229,16 @@ static void kvm_apic_reset(APICCommonState *s)
run_on_cpu(CPU(s->cpu), kvm_apic_put, RUN_ON_CPU_HOST_PTR(s));
}
+static int apic_vcpufd_change_handler(NotifierWithReturn *n,
+ void *data, Error** errp) {
+ APICCommonState *s = container_of(n, APICCommonState,
+ vcpufd_change_notifier);
+
+ run_on_cpu(CPU(s->cpu), kvm_apic_put, RUN_ON_CPU_HOST_PTR(s));
+
+ return 0;
+}
+
static void kvm_apic_realize(DeviceState *dev, Error **errp)
{
APICCommonState *s = APIC_COMMON(dev);
@@ -238,6 +248,9 @@ static void kvm_apic_realize(DeviceState *dev, Error **errp)
assert(kvm_has_gsi_routing());
msi_nonbroken = true;
+
+ s->vcpufd_change_notifier.notify = apic_vcpufd_change_handler;
+ kvm_vcpufd_add_change_notifier(&s->vcpufd_change_notifier);
}
static void kvm_apic_unrealize(DeviceState *dev)
diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
index 4a62fdceb4..ffe5815e7f 100644
--- a/include/hw/i386/apic_internal.h
+++ b/include/hw/i386/apic_internal.h
@@ -189,6 +189,7 @@ struct APICCommonState {
hwaddr vapic_paddr; /* note: persistence via kvmvapic */
bool legacy_instance_id;
uint32_t extended_log_dest;
+ NotifierWithReturn vcpufd_change_notifier;
};
typedef struct VAPICState {
--
2.42.0
* [PATCH v1 28/28] kvm/clock: add support for confidential guest reset
2025-12-12 15:03 [PATCH v1 00/28] Introduce support for confidential guest reset Ani Sinha
` (26 preceding siblings ...)
2025-12-12 15:03 ` [PATCH v1 27/28] kvm/i386/apic: set local apic after vcpu file descriptors changed Ani Sinha
@ 2025-12-12 15:03 ` Ani Sinha
27 siblings, 0 replies; 29+ messages in thread
From: Ani Sinha @ 2025-12-12 15:03 UTC (permalink / raw)
To: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Michael S. Tsirkin, Marcel Apfelbaum
Cc: vkuznets, kraxel, qemu-devel, Ani Sinha
Confidential guests change the KVM VM file descriptor upon reset and also create
new VCPU file descriptors against the new KVM VM file descriptor. We need to
save the clock state from KVM before the KVM VM file descriptor changes and
restore it afterwards. Also, after the VCPU file descriptors have changed, we
must call KVM_KVMCLOCK_CTRL on each VCPU file descriptor to inform KVM that
the VCPU is in a paused state.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
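The ordering described in the commit message (save before the VM file
descriptor changes, restore once the new descriptors exist) can be sketched
as below. This is a toy model under stated assumptions, not the kvmclock
code: struct vm_handle, struct clock_state, clock_pre_change() and
clock_post_change() are hypothetical, and a plain field stands in for the
KVM_GET_CLOCK and KVM_SET_CLOCK ioctls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical VM handle holding the in-kernel clock value. */
struct vm_handle {
    uint64_t clock_ns;
};

/* Hypothetical device state that survives the handle change. */
struct clock_state {
    uint64_t saved_ns;
    int valid; /* non-zero once a value has been saved */
};

/* Pre-change hook: read the clock while the old handle is still open. */
static void clock_pre_change(struct clock_state *s,
                             const struct vm_handle *old_vm)
{
    s->saved_ns = old_vm->clock_ns;
    s->valid = 1;
}

/* Post-change hook: write the saved value into the new handle. */
static int clock_post_change(const struct clock_state *s,
                             struct vm_handle *new_vm)
{
    if (!s->valid) {
        return -1; /* nothing was saved, so there is nothing to restore */
    }
    new_vm->clock_ns = s->saved_ns;
    return 0;
}
```

The value has to be captured in the pre-change hook because the old handle
is gone by the time the post-change hook runs.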
hw/i386/kvm/clock.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index f56382717f..91a5a08f05 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -49,6 +49,9 @@ struct KVMClockState {
/* whether the 'clock' value was obtained in a host with
* reliable KVM_GET_CLOCK */
bool clock_is_reliable;
+
+ NotifierWithReturn kvmclock_vcpufd_change_notifier;
+ NotifierWithReturn kvmclock_vmfd_pre_change_notifier;
};
struct pvclock_vcpu_time_info {
@@ -62,6 +65,9 @@ struct pvclock_vcpu_time_info {
uint8_t pad[2];
} __attribute__((__packed__)); /* 32 bytes */
+static int kvmclock_set_clock(NotifierWithReturn *notifier,
+ void *data, Error** errp);
+
static uint64_t kvmclock_current_nsec(KVMClockState *s)
{
CPUState *cpu = first_cpu;
@@ -218,6 +224,51 @@ static void kvmclock_vm_state_change(void *opaque, bool running,
}
}
+static int kvmclock_save_clock(NotifierWithReturn *notifier,
+ void *data, Error** errp)
+{
+ KVMClockState *s = container_of(notifier, KVMClockState,
+ kvmclock_vmfd_pre_change_notifier);
+ kvm_update_clock(s);
+ return 0;
+}
+
+static int kvmclock_set_clock(NotifierWithReturn *notifier,
+ void *data, Error** errp)
+{
+ struct kvm_clock_data clock_data = {};
+ CPUState *cpu;
+ int ret;
+ KVMClockState *s = container_of(notifier, KVMClockState,
+ kvmclock_vcpufd_change_notifier);
+ int cap_clock_ctrl = kvm_check_extension(kvm_state, KVM_CAP_KVMCLOCK_CTRL);
+
+ if (!s->clock_is_reliable) {
+ uint64_t pvclock_via_mem = kvmclock_current_nsec(s);
+ /* saved clock value before vmfd change is not reliable */
+ if (pvclock_via_mem) {
+ s->clock = pvclock_via_mem;
+ }
+ }
+
+ clock_data.clock = s->clock;
+ ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &clock_data);
+ if (ret < 0) {
+ fprintf(stderr, "KVM_SET_CLOCK failed: %s\n", strerror(-ret));
+ abort();
+ }
+
+ if (!cap_clock_ctrl) {
+ return 0;
+ }
+ CPU_FOREACH(cpu) {
+ run_on_cpu(cpu, do_kvmclock_ctrl, RUN_ON_CPU_NULL);
+ }
+
+ return 0;
+}
+
+
static void kvmclock_realize(DeviceState *dev, Error **errp)
{
KVMClockState *s = KVM_CLOCK(dev);
@@ -229,7 +280,12 @@ static void kvmclock_realize(DeviceState *dev, Error **errp)
kvm_update_clock(s);
+ s->kvmclock_vcpufd_change_notifier.notify = kvmclock_set_clock;
+ s->kvmclock_vmfd_pre_change_notifier.notify = kvmclock_save_clock;
+
qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
+ kvm_vcpufd_add_change_notifier(&s->kvmclock_vcpufd_change_notifier);
+ kvm_vmfd_add_pre_change_notifier(&s->kvmclock_vmfd_pre_change_notifier);
}
static bool kvmclock_clock_is_reliable_needed(void *opaque)
--
2.42.0