[PATCH v2 00/32] Introduce support for confidential guest reset

* [PATCH v2 00/32] Introduce support for confidential guest reset
@ 2026-01-12 13:22 Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 01/32] i386/kvm: avoid installing duplicate msr entries in msr_handlers Ani Sinha
                   ` (31 more replies)
  0 siblings, 32 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  Cc: Ani Sinha, qemu-devel, pbonzini, kraxel, vkuznets, graf

This change introduces support for confidential guests
(SEV-ES, SEV-SNP and TDX) to reset/reboot just like non-confidential
guests. Currently, a reboot initiated from the confidential guest results
in termination of the QEMU hypervisor as the CPUs are not resettable. As the
initial state of the guest including private memory is locked and encrypted,
the contents of that memory will not be accessible post reset. Hence the old
KVM VM file descriptor must be closed and a new one opened to create a new
confidential VM context. All KVM VM specific ioctls must be called again. New
VCPU file descriptors must be created against the new KVM fd and most VCPU
ioctls must be called again as well.

This change closes the old KVM fd and creates a new one. After
the new KVM fd is opened, all generic and architecture specific ioctl calls
are issued again. Notifiers are added to notify subsystems that:
- The KVM VM fd is about to be changed, so state syncing from KVM to QEMU
  should be done if required.
- The KVM VM fd has changed, so ioctl calls to the new KVM fd have to be
  performed again.
- New VCPU fds have been created, so VCPU ioctl calls must be issued again
  where required.

Specific subsystems use these notifiers to re-issue ioctl calls where required.
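
For readers new to the series, the three notifications listed above can be
pictured with the following stand-alone sketch. It is not code from the
patches: struct reset_hooks, run_reset_sequence() and the demo_* helpers are
hypothetical names, the fd swap is modelled by a plain integer assignment, and
placing the VCPU notification last is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook table; the series itself uses QEMU notifier lists. */
struct reset_hooks {
    int (*pre_vmfd_change)(int old_vmfd, void *opaque);  /* sync KVM -> QEMU */
    int (*post_vmfd_change)(int new_vmfd, void *opaque); /* redo VM ioctls */
    int (*vcpufd_changed)(int new_vmfd, void *opaque);   /* redo VCPU ioctls */
    void *opaque;
};

/*
 * Model of the reset sequence: notify while the old fd is still valid,
 * swap the fd, then notify about the new VM fd and the new VCPU fds.
 * The first hook that fails stops the sequence.
 */
static int run_reset_sequence(const struct reset_hooks *h, int *vmfd,
                              int fresh_vmfd)
{
    int ret;

    assert(h != NULL && vmfd != NULL);

    ret = h->pre_vmfd_change(*vmfd, h->opaque);
    if (ret < 0) {
        return ret;
    }

    /* stands in for closing the old VM fd and creating a new VM */
    *vmfd = fresh_vmfd;

    ret = h->post_vmfd_change(*vmfd, h->opaque);
    if (ret < 0) {
        return ret;
    }
    return h->vcpufd_changed(*vmfd, h->opaque);
}

/* Demo hooks that only record the order in which they were called. */
struct trace {
    char steps[4];
    size_t n;
};

static void trace_step(struct trace *t, char c)
{
    if (t->n < 3) {
        t->steps[t->n++] = c;
    }
}

static int demo_pre(int old_vmfd, void *opaque)
{
    (void)old_vmfd;
    trace_step(opaque, 'S');
    return 0;
}

static int demo_post(int new_vmfd, void *opaque)
{
    (void)new_vmfd;
    trace_step(opaque, 'V');
    return 0;
}

static int demo_vcpu(int new_vmfd, void *opaque)
{
    (void)new_vmfd;
    trace_step(opaque, 'C');
    return 0;
}
```

Running the sequence with the demo hooks records the steps in the order
'S', 'V', 'C' and leaves the caller's vmfd variable holding the new value.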

Changes are made to SEV and TDX modules to reinitialize the confidential guest
state and seal it again. Along the way, some bug fixes are made so that some
initialization functions can be called again. Some refactoring of existing
code is done so that both init and reset paths can use them.

Tested on TDX and SEV-SNP.
CI pipeline passes:
https://gitlab.com/anisinha/qemu/-/commit/eb647d2299ba8aac62a4bffbeb470c665c831421/pipelines?ref=coco-reboot-v2

Please review and test.

Changelog:

v2:
 - Bugfixes.
 - Added a new machine option so that we can exercise most of the non-coco changes
   related to reboot on non-coco platforms.
 - Added a new functional test. Currently it is skipped on the CI pipeline as KVM is not
   enabled (no /dev/kvm on the container) for QEMU CI tests. It can be run manually and it
   passes on those systems where KVM is enabled.
 - Addressed comments from v1 with regard to refactoring of code, code simplification by
   removal of redundant stuff, and moving code around
   so that notifiers and migration blockers are added in only one place.
 - Added some tracepoints for future debugging on newly added functions.
 - Rebased.

One thing I have not addressed in v2 is to combine pre and post notifiers into one
with a boolean argument to differentiate them. This will be addressed as a part of
v3 which is here: https://gitlab.com/anisinha/qemu/-/commits/coco-reboot-v3. The change
is getting tested: https://gitlab.com/anisinha/qemu/-/commit/7b3ef489a6d45c0282c851c38c54b6a2c3e2c20d
 
CC: qemu-devel@nongnu.org
CC: pbonzini@redhat.com
CC: kraxel@redhat.com
CC: vkuznets@redhat.com
CC: graf@amazon.com


Ani Sinha (32):
  i386/kvm: avoid installing duplicate msr entries in msr_handlers
  hw/accel: add a per-accelerator callback to change VM accelerator
    handle
  system/physmem: add helper to reattach existing memory after KVM VM fd
    change
  accel/kvm: add changes required to support KVM VM file descriptor
    change
  accel/kvm: mark guest state as unprotected after vm file descriptor
    change
  accel/kvm: add a notifier to indicate KVM VM file descriptor has
    changed
  accel/kvm: add notifier to inform that the KVM VM file fd is about to
    be changed
  i386/kvm: unregister smram listeners prior to vm file descriptor
    change
  kvm/i386: implement architecture support for kvm file descriptor
    change
  hw/i386: refactor x86_bios_rom_init for reuse in confidential guest
    reset
  kvm/i386: reload firmware for confidential guest reset
  accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon
    reset
  i386/tdx: refactor TDX firmware memory initialization code into a new
    function
  i386/tdx: finalize TDX guest state upon reset
  i386/tdx: add a pre-vmfd change notifier to reset tdx state
  i386/sev: add migration blockers only once
  i386/sev: add notifiers only once
  i386/sev: free existing launch update data and kernel hashes data on
    init
  i386/sev: add support for confidential guest reset
  hw/vfio: generate new file fd for pseudo device and rebind existing
    descriptors
  kvm/i8254: refactor pit initialization into a helper
  kvm/i8254: add support for confidential guest reset
  hw/hyperv/vmbus: add support for confidential guest reset
  accel/kvm: add a per-confidential class callback to unlock guest state
  kvm/xen-emu: re-initialize capabilities during confidential guest
    reset
  kvm/xen_evtchn: add support for confidential guest reset
  ppc/openpic: create a new openpic device and reattach mem region on
    coco reset
  kvm/vcpu: add notifiers to inform vcpu file descriptor change
  kvm/i386/apic: set local apic after vcpu file descriptors changed
  kvm/clock: add support for confidential guest reset
  hw/machine: introduce machine specific option 'x-change-vmfd-on-reset'
  tests/functional/x86_64: add functional test to exercise vm fd change
    on reset

 MAINTAINERS                                   |   6 +
 accel/kvm/kvm-all.c                           | 365 ++++++++++++++++--
 accel/kvm/trace-events                        |   2 +
 accel/stubs/kvm-stub.c                        |  26 ++
 hw/core/machine.c                             |  22 ++
 hw/hyperv/vmbus.c                             |  30 ++
 hw/i386/kvm/apic.c                            |  13 +
 hw/i386/kvm/clock.c                           |  56 +++
 hw/i386/kvm/i8254.c                           |  83 ++--
 hw/i386/kvm/xen_evtchn.c                      | 100 ++++-
 hw/i386/x86-common.c                          |  50 ++-
 hw/intc/openpic_kvm.c                         | 108 ++++--
 hw/vfio/helpers.c                             |  81 +++-
 include/accel/accel-ops.h                     |   1 +
 include/hw/core/boards.h                      |   6 +
 include/hw/i386/apic_internal.h               |   1 +
 include/hw/i386/x86.h                         |   5 +-
 include/system/confidential-guest-support.h   |  27 ++
 include/system/kvm.h                          |  55 +++
 include/system/physmem.h                      |   1 +
 system/physmem.c                              |  28 ++
 system/runstate.c                             |  36 +-
 target/arm/kvm.c                              |  10 +
 target/i386/kvm/kvm.c                         | 209 ++++++++--
 target/i386/kvm/tdx.c                         | 142 +++++--
 target/i386/kvm/tdx.h                         |   1 +
 target/i386/kvm/trace-events                  |   4 +
 target/i386/kvm/xen-emu.c                     |  45 ++-
 target/i386/sev.c                             |  97 ++++-
 target/i386/trace-events                      |   1 +
 target/loongarch/kvm/kvm.c                    |  10 +
 target/mips/kvm.c                             |  10 +
 target/ppc/kvm.c                              |  10 +
 target/riscv/kvm/kvm-cpu.c                    |  10 +
 target/s390x/kvm/kvm.c                        |  10 +
 tests/functional/x86_64/meson.build           |   1 +
 .../x86_64/test_vmfd_change_reboot.py         |  75 ++++
 37 files changed, 1544 insertions(+), 193 deletions(-)
 create mode 100755 tests/functional/x86_64/test_vmfd_change_reboot.py

-- 
2.42.0



^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 01/32] i386/kvm: avoid installing duplicate msr entries in msr_handlers
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 02/32] hw/accel: add a per-accelerator callback to change VM accelerator handle Ani Sinha
                   ` (30 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kvm, qemu-devel

kvm_filter_msr() does not check if an msr entry is already present in the
msr_handlers table and installs a new handler unconditionally. If the function
is called again with the same MSR, it will result in duplicate entries in the
table and multiple such calls will fill up the table needlessly. Fix that.
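
The intended table behaviour can be illustrated in isolation with the
stand-alone sketch below. It is a simplified model, not the QEMU code:
struct msr_slot, slot_find_or_add() and slots_used() are hypothetical names,
and, like the loop in the patch, it assumes the table is filled front to back
without holes, so the first free slot marks the end of the used entries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SLOTS 4

/* Hypothetical table entry; msr == 0 marks a free slot. */
struct msr_slot {
    uint32_t msr;
    int tag;
};

static struct msr_slot slots[NR_SLOTS];

/*
 * Return the index of the slot holding @msr, adding it to the first free
 * slot if it is not present yet. Returns -EINVAL when the table is full.
 */
static int slot_find_or_add(uint32_t msr, int tag)
{
    assert(msr != 0);

    for (size_t i = 0; i < NR_SLOTS; i++) {
        if (slots[i].msr == msr) {
            return (int)i;          /* already installed: reuse the entry */
        }
        if (slots[i].msr == 0) {
            slots[i] = (struct msr_slot){ .msr = msr, .tag = tag };
            return (int)i;          /* first free slot: install here */
        }
    }
    return -EINVAL;
}

/* Count the slots currently in use. */
static size_t slots_used(void)
{
    size_t used = 0;

    for (size_t i = 0; i < NR_SLOTS; i++) {
        if (slots[i].msr != 0) {
            used++;
        }
    }
    return used;
}
```

Calling slot_find_or_add() twice with the same MSR returns the same index and
leaves slots_used() unchanged, which is the property the fix is after.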

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 target/i386/kvm/kvm.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 7b9b740a8e..3fdb2a3f62 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -6043,27 +6043,33 @@ static int kvm_install_msr_filters(KVMState *s)
 static int kvm_filter_msr(KVMState *s, uint32_t msr, QEMURDMSRHandler *rdmsr,
                           QEMUWRMSRHandler *wrmsr)
 {
-    int i, ret;
+    int i, ret = 0;
 
     for (i = 0; i < ARRAY_SIZE(msr_handlers); i++) {
-        if (!msr_handlers[i].msr) {
+        if (msr_handlers[i].msr == msr) {
+            break;
+        } else if (!msr_handlers[i].msr) {
             msr_handlers[i] = (KVMMSRHandlers) {
                 .msr = msr,
                 .rdmsr = rdmsr,
                 .wrmsr = wrmsr,
             };
+            break;
+        }
+    }
 
-            ret = kvm_install_msr_filters(s);
-            if (ret) {
-                msr_handlers[i] = (KVMMSRHandlers) { };
-                return ret;
-            }
+    if (i == ARRAY_SIZE(msr_handlers)) {
+        ret = -EINVAL;
+        goto end;
+    }
 
-            return 0;
-        }
+    ret = kvm_install_msr_filters(s);
+    if (ret) {
+        msr_handlers[i] = (KVMMSRHandlers) { };
     }
 
-    return -EINVAL;
+ end:
+    return ret;
 }
 
 static int kvm_handle_rdmsr(X86CPU *cpu, struct kvm_run *run)
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 02/32] hw/accel: add a per-accelerator callback to change VM accelerator handle
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 01/32] i386/kvm: avoid installing duplicate msr entries in msr_handlers Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 17:01   ` Paolo Bonzini
  2026-01-12 13:22 ` [PATCH v2 03/32] system/physmem: add helper to reattach existing memory after KVM VM fd change Ani Sinha
                   ` (29 subsequent siblings)
  31 siblings, 1 reply; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Richard Henderson, Paolo Bonzini, Philippe Mathieu-Daudé
  Cc: Ani Sinha, qemu-devel

When a confidential virtual machine is reset, a new guest context in the
accelerator must be generated post reset. Therefore, the old accelerator guest
file handle must be closed and a new one created. To this end, a per-accelerator
callback, "reset_vmfd", is introduced that gets called when a confidential
guest is reset. Subsequent patches will introduce a specific implementation of
this callback for the KVM accelerator.
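
The dispatch rule described above (call the hook only for a confidential
guest, only on a guest-initiated reset, and only if the accelerator provides
one) can be sketched stand-alone as follows. All type and function names here
are hypothetical stand-ins rather than the QEMU definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum reset_cause { CAUSE_HOST_REQUEST, CAUSE_GUEST_RESET };

/* Hypothetical, much reduced machine and accelerator-class models. */
struct machine {
    bool confidential;
    int vmfd;
};

struct accel_ops {
    int (*reset_vmfd)(struct machine *m);   /* optional, may be NULL */
};

/* Demo callback: pretend a fresh VM fd was obtained. */
static int demo_reset_vmfd(struct machine *m)
{
    m->vmfd += 1;
    return 0;
}

/*
 * Returns 1 if the callback ran, 0 if it was skipped, or the negative
 * error code reported by the callback.
 */
static int maybe_reset_vmfd(const struct accel_ops *ops, struct machine *m,
                            enum reset_cause cause)
{
    int ret;

    assert(ops != NULL && m != NULL);

    if (!m->confidential || cause != CAUSE_GUEST_RESET) {
        return 0;
    }
    if (!ops->reset_vmfd) {
        return 0;       /* accelerator does not implement the hook */
    }
    ret = ops->reset_vmfd(m);
    return ret < 0 ? ret : 1;
}
```

Keeping the callback optional means accelerators that have no notion of a VM
file descriptor need no changes.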

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 include/accel/accel-ops.h |  1 +
 system/runstate.c         | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/accel/accel-ops.h b/include/accel/accel-ops.h
index 23a8c246e1..998a95ca69 100644
--- a/include/accel/accel-ops.h
+++ b/include/accel/accel-ops.h
@@ -23,6 +23,7 @@ struct AccelClass {
     AccelOpsClass *ops;
 
     int (*init_machine)(AccelState *as, MachineState *ms);
+    int (*reset_vmfd)(MachineState *ms);
     bool (*cpu_common_realize)(CPUState *cpu, Error **errp);
     void (*cpu_common_unrealize)(CPUState *cpu);
     /* get_stats: Append statistics to @buf */
diff --git a/system/runstate.c b/system/runstate.c
index ed2db56480..b0ce0410fa 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -42,6 +42,7 @@
 #include "qapi/qapi-commands-run-state.h"
 #include "qapi/qapi-events-run-state.h"
 #include "qemu/accel.h"
+#include "accel/accel-ops.h"
 #include "qemu/error-report.h"
 #include "qemu/job.h"
 #include "qemu/log.h"
@@ -508,6 +509,8 @@ void qemu_system_reset(ShutdownCause reason)
 {
     MachineClass *mc;
     ResetType type;
+    AccelClass *ac = ACCEL_GET_CLASS(current_accel());
+    int ret;
 
     mc = current_machine ? MACHINE_GET_CLASS(current_machine) : NULL;
 
@@ -520,6 +523,23 @@ void qemu_system_reset(ShutdownCause reason)
     default:
         type = RESET_TYPE_COLD;
     }
+
+    /*
+     * different accelerators implement how to close the old file handle of
+     * the accelerator descriptor and create a new one here. Resetting
+     * file handle is necessary to create a new confidential VM context post
+     * VM reset.
+     */
+    if (current_machine->cgs && reason == SHUTDOWN_CAUSE_GUEST_RESET) {
+        if (ac->reset_vmfd) {
+            ret = ac->reset_vmfd(current_machine);
+            if (ret < 0) {
+                error_report("unable to reset vmfd: %d", ret);
+                abort();
+            }
+        }
+    }
+
     if (mc && mc->reset) {
         mc->reset(current_machine, type);
     } else {
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 03/32] system/physmem: add helper to reattach existing memory after KVM VM fd change
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 01/32] i386/kvm: avoid installing duplicate msr entries in msr_handlers Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 02/32] hw/accel: add a per-accelerator callback to change VM accelerator handle Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 04/32] accel/kvm: add changes required to support KVM VM file descriptor change Ani Sinha
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé
  Cc: Ani Sinha, qemu-devel

After the guest KVM file descriptor has changed as part of the confidential
guest reset mechanism, existing memory needs to be reattached to
the new file descriptor. This change adds a helper function, ram_block_rebind(),
for this purpose. The next patch will make use of this function.
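
The shape of the helper, stripped of locking and of the real KVM calls, is
roughly the loop sketched below. This is a stand-alone model under stated
assumptions: struct block, rebind_blocks() and the demo_* helpers are
hypothetical, and fd creation and closing are injected as function pointers so
the sketch runs without /dev/kvm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a RAM block. */
struct block {
    bool uses_guest_memfd;
    int guest_memfd;        /* -1 when no fd is attached */
};

typedef int (*fd_create_fn)(void);
typedef void (*fd_close_fn)(int fd);

/*
 * For every block backed by a guest memfd, close the old fd (if any) and
 * attach a newly created one. Stops at the first creation failure.
 */
static int rebind_blocks(struct block *blocks, size_t n,
                         fd_create_fn create, fd_close_fn close_fd)
{
    assert(blocks != NULL && create != NULL && close_fd != NULL);

    for (size_t i = 0; i < n; i++) {
        if (!blocks[i].uses_guest_memfd) {
            continue;
        }
        if (blocks[i].guest_memfd >= 0) {
            close_fd(blocks[i].guest_memfd);
        }
        blocks[i].guest_memfd = create();
        if (blocks[i].guest_memfd < 0) {
            return -1;
        }
    }
    return 0;
}

/* Demo fd factory: hands out increasing numbers and counts closes. */
static int next_fd = 100;
static int closed_count;

static int demo_create(void)
{
    return next_fd++;
}

static void demo_close(int fd)
{
    (void)fd;
    closed_count++;
}
```

Blocks that do not use a guest memfd are left untouched, and a block without
an old fd simply gets a new one.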

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 include/system/physmem.h |  1 +
 system/physmem.c         | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/include/system/physmem.h b/include/system/physmem.h
index 7bb7d3e154..da91b77bd9 100644
--- a/include/system/physmem.h
+++ b/include/system/physmem.h
@@ -51,5 +51,6 @@ physical_memory_snapshot_and_clear_dirty(MemoryRegion *mr, hwaddr offset,
 bool physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap,
                                         ram_addr_t start,
                                         ram_addr_t length);
+int ram_block_rebind(Error **errp);
 
 #endif
diff --git a/system/physmem.c b/system/physmem.c
index 0105e88058..58c89500e9 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2857,6 +2857,34 @@ found:
     return block;
 }
 
+/*
+ * Creates new guest memfd for the ramblocks and closes the
+ * existing memfd.
+ */
+int ram_block_rebind(Error **errp)
+{
+    RAMBlock *block;
+
+    qemu_mutex_lock_ramlist();
+
+    RAMBLOCK_FOREACH(block) {
+        if (block->flags & RAM_GUEST_MEMFD) {
+            if (block->guest_memfd >= 0) {
+                close(block->guest_memfd);
+            }
+            block->guest_memfd = kvm_create_guest_memfd(block->max_length,
+                                                        0, errp);
+            if (block->guest_memfd < 0) {
+                qemu_mutex_unlock_ramlist();
+                return -1;
+            }
+
+        }
+    }
+    qemu_mutex_unlock_ramlist();
+    return 0;
+}
+
 /*
  * Finds the named RAMBlock
  *
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 04/32] accel/kvm: add changes required to support KVM VM file descriptor change
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (2 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 03/32] system/physmem: add helper to reattach existing memory after KVM VM fd change Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 17:02   ` Paolo Bonzini
  2026-01-12 13:22 ` [PATCH v2 05/32] accel/kvm: mark guest state as unprotected after vm " Ani Sinha
                   ` (27 subsequent siblings)
  31 siblings, 1 reply; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Peter Maydell, Marcelo Tosatti, Song Gao,
	Huacai Chen, Philippe Mathieu-Daudé, Aurelien Jarno,
	Jiaxun Yang, Aleksandar Rikalo, Nicholas Piggin,
	Harsh Prateek Bora, Chinmay Rath, Palmer Dabbelt,
	Alistair Francis, Weiwei Li, Daniel Henrique Barboza, Liu Zhiwei,
	Halil Pasic, Christian Borntraeger, Eric Farman, Matthew Rosato,
	Thomas Huth, Richard Henderson, Ilya Leoshkevich,
	David Hildenbrand
  Cc: Ani Sinha, kvm, qemu-devel, qemu-arm, qemu-ppc, qemu-riscv,
	qemu-s390x

This change adds common KVM support to handle a KVM VM file descriptor
change. The KVM VM file descriptor can change as part of the confidential guest
reset mechanism. A new per-architecture function, kvm_arch_vmfd_change_ops(),
is added in order to implement the architecture specific
changes required to support it. A subsequent patch will add the x86 specific
implementation of kvm_arch_vmfd_change_ops(), as currently only x86 supports
confidential guest reset.
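
The split between a capability query and the actual per-architecture hook can
be modelled stand-alone as below. The sketch uses hypothetical names
(struct arch_hooks, generic_vmfd_change(), stub_* and demo_*); it only
illustrates why the aborting stubs are never reached when the generic code
checks for support first.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical pair of per-architecture hooks. */
struct arch_hooks {
    bool (*supports_vmfd_change)(void);
    int (*vmfd_change_ops)(int new_vmfd);
};

/* Stub for architectures without support: must never be called. */
static bool stub_supports(void)
{
    return false;
}

static int stub_change_ops(int new_vmfd)
{
    (void)new_vmfd;
    abort();
}

/* Demo implementation for an architecture that supports the change. */
static int demo_last_vmfd = -1;

static bool demo_supports(void)
{
    return true;
}

static int demo_change_ops(int new_vmfd)
{
    demo_last_vmfd = new_vmfd;
    return 0;
}

/* Generic code: query support first, then run the arch specific part. */
static int generic_vmfd_change(const struct arch_hooks *arch, int new_vmfd)
{
    assert(arch != NULL);
    assert(arch->supports_vmfd_change != NULL);
    assert(arch->vmfd_change_ops != NULL);

    if (!arch->supports_vmfd_change()) {
        return -EOPNOTSUPP;
    }
    return arch->vmfd_change_ops(new_vmfd);
}
```

With the stub hooks the generic function returns -EOPNOTSUPP before anything
is torn down; with the demo hooks the architecture part sees the new fd.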

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 accel/kvm/kvm-all.c        | 80 ++++++++++++++++++++++++++++++++++++--
 accel/kvm/trace-events     |  1 +
 include/system/kvm.h       |  2 +
 target/arm/kvm.c           | 10 +++++
 target/i386/kvm/kvm.c      | 10 +++++
 target/loongarch/kvm/kvm.c | 10 +++++
 target/mips/kvm.c          | 10 +++++
 target/ppc/kvm.c           | 10 +++++
 target/riscv/kvm/kvm-cpu.c | 10 +++++
 target/s390x/kvm/kvm.c     | 10 +++++
 10 files changed, 150 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f85eb42d78..762f302551 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2415,11 +2415,9 @@ void kvm_irqchip_set_qemuirq_gsi(KVMState *s, qemu_irq irq, int gsi)
     g_hash_table_insert(s->gsimap, irq, GINT_TO_POINTER(gsi));
 }
 
-static void kvm_irqchip_create(KVMState *s)
+static void do_kvm_irqchip_create(KVMState *s)
 {
     int ret;
-
-    assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
     if (kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
         ;
     } else if (kvm_check_extension(s, KVM_CAP_S390_IRQCHIP)) {
@@ -2452,7 +2450,13 @@ static void kvm_irqchip_create(KVMState *s)
         fprintf(stderr, "Create kernel irqchip failed: %s\n", strerror(-ret));
         exit(1);
     }
+}
 
+static void kvm_irqchip_create(KVMState *s)
+{
+    assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
+
+    do_kvm_irqchip_create(s);
     kvm_kernel_irqchip = true;
     /* If we have an in-kernel IRQ chip then we must have asynchronous
      * interrupt delivery (though the reverse is not necessarily true)
@@ -2607,6 +2611,75 @@ static int kvm_setup_dirty_ring(KVMState *s)
     return 0;
 }
 
+static int kvm_reset_vmfd(MachineState *ms)
+{
+    KVMState *s;
+    KVMMemoryListener *kml;
+    int ret = 0, type;
+    Error *err = NULL;
+
+    /*
+     * bail if the current architecture does not support VM file
+     * descriptor change.
+     */
+    if (!kvm_arch_supports_vmfd_change()) {
+        error_report("This target architecture does not support KVM VM "
+                     "file descriptor change.");
+        return -EOPNOTSUPP;
+    }
+
+    s = KVM_STATE(ms->accelerator);
+    kml = &s->memory_listener;
+
+    memory_listener_unregister(&kml->listener);
+    memory_listener_unregister(&kvm_io_listener);
+
+    if (s->vmfd >= 0) {
+        close(s->vmfd);
+    }
+
+    type = find_kvm_machine_type(ms);
+    if (type < 0) {
+        return -EINVAL;
+    }
+
+    ret = do_kvm_create_vm(s, type);
+    if (ret < 0) {
+        return ret;
+    }
+
+    s->vmfd = ret;
+
+    kvm_setup_dirty_ring(s);
+
+    /* rebind memory to new vm fd */
+    ret = ram_block_rebind(&err);
+    if (ret < 0) {
+        return ret;
+    }
+    assert(!err);
+
+    ret = kvm_arch_vmfd_change_ops(ms, s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (s->kernel_irqchip_allowed) {
+        do_kvm_irqchip_create(s);
+    }
+
+    /* these can be only called after ram_block_rebind() */
+    memory_listener_register(&kml->listener, &address_space_memory);
+    memory_listener_register(&kvm_io_listener, &address_space_io);
+
+    /*
+     * kvm fd has changed. Commit the irq routes to KVM once more.
+     */
+    kvm_irqchip_commit_routes(s);
+    trace_kvm_reset_vmfd();
+    return ret;
+}
+
 static int kvm_init(AccelState *as, MachineState *ms)
 {
     MachineClass *mc = MACHINE_GET_CLASS(ms);
@@ -4014,6 +4087,7 @@ static void kvm_accel_class_init(ObjectClass *oc, const void *data)
     AccelClass *ac = ACCEL_CLASS(oc);
     ac->name = "KVM";
     ac->init_machine = kvm_init;
+    ac->reset_vmfd = kvm_reset_vmfd;
     ac->has_memory = kvm_accel_has_memory;
     ac->allowed = &kvm_allowed;
     ac->gdbstub_supported_sstep_flags = kvm_gdbstub_sstep_flags;
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index e43d18a869..e4beda0148 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -14,6 +14,7 @@ kvm_destroy_vcpu(int cpu_index, unsigned long arch_cpu_id) "index: %d id: %lu"
 kvm_park_vcpu(int cpu_index, unsigned long arch_cpu_id) "index: %d id: %lu"
 kvm_unpark_vcpu(unsigned long arch_cpu_id, const char *msg) "id: %lu %s"
 kvm_irqchip_commit_routes(void) ""
+kvm_reset_vmfd(void) ""
 kvm_irqchip_add_msi_route(char *name, int vector, int virq) "dev %s vector %d virq %d"
 kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
 kvm_irqchip_release_virq(int virq) "virq %d"
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 8f9eecf044..a5ab22421d 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -358,6 +358,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s);
 int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp);
 int kvm_arch_init_vcpu(CPUState *cpu);
 int kvm_arch_destroy_vcpu(CPUState *cpu);
+bool kvm_arch_supports_vmfd_change(void);
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s);
 
 #ifdef TARGET_KVM_HAVE_RESET_PARKED_VCPU
 void kvm_arch_reset_parked_vcpu(unsigned long vcpu_id, int kvm_fd);
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 48f853fff8..10cd94a57d 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1569,6 +1569,16 @@ void kvm_arch_init_irq_routing(KVMState *s)
 {
 }
 
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
+{
+    abort();
+}
+
+bool kvm_arch_supports_vmfd_change(void)
+{
+    return false;
+}
+
 int kvm_arch_irqchip_create(KVMState *s)
 {
     if (kvm_kernel_irqchip_split()) {
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3fdb2a3f62..6aa17cecba 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3253,6 +3253,16 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
     return 0;
 }
 
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
+{
+    abort();
+}
+
+bool kvm_arch_supports_vmfd_change(void)
+{
+    return false;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     int ret;
diff --git a/target/loongarch/kvm/kvm.c b/target/loongarch/kvm/kvm.c
index ef3359ced9..9d5c73f3a3 100644
--- a/target/loongarch/kvm/kvm.c
+++ b/target/loongarch/kvm/kvm.c
@@ -1312,6 +1312,16 @@ int kvm_arch_irqchip_create(KVMState *s)
     return kvm_check_extension(s, KVM_CAP_DEVICE_CTRL);
 }
 
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
+{
+    abort();
+}
+
+bool kvm_arch_supports_vmfd_change(void)
+{
+    return false;
+}
+
 void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
 {
 }
diff --git a/target/mips/kvm.c b/target/mips/kvm.c
index a85e162409..fbef498bd7 100644
--- a/target/mips/kvm.c
+++ b/target/mips/kvm.c
@@ -44,6 +44,16 @@ unsigned long kvm_arch_vcpu_id(CPUState *cs)
     return cs->cpu_index;
 }
 
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
+{
+    abort();
+}
+
+bool kvm_arch_supports_vmfd_change(void)
+{
+    return false;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     /* MIPS has 128 signals */
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 3b2f1077da..7cdc0d09f4 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -180,6 +180,16 @@ int kvm_arch_irqchip_create(KVMState *s)
     return 0;
 }
 
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
+{
+    abort();
+}
+
+bool kvm_arch_supports_vmfd_change(void)
+{
+    return false;
+}
+
 static int kvm_arch_sync_sregs(PowerPCCPU *cpu)
 {
     CPUPPCState *cenv = &cpu->env;
diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
index 5d792563b9..548ea3aeab 100644
--- a/target/riscv/kvm/kvm-cpu.c
+++ b/target/riscv/kvm/kvm-cpu.c
@@ -1545,6 +1545,16 @@ int kvm_arch_irqchip_create(KVMState *s)
     return kvm_check_extension(s, KVM_CAP_DEVICE_CTRL);
 }
 
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
+{
+    abort();
+}
+
+bool kvm_arch_supports_vmfd_change(void)
+{
+    return false;
+}
+
 int kvm_arch_process_async_events(CPUState *cs)
 {
     return 0;
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index bd6c440aef..6374246416 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -393,6 +393,16 @@ int kvm_arch_irqchip_create(KVMState *s)
     return 0;
 }
 
+int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
+{
+    abort();
+}
+
+bool kvm_arch_supports_vmfd_change(void)
+{
+    return false;
+}
+
 unsigned long kvm_arch_vcpu_id(CPUState *cpu)
 {
     return cpu->cpu_index;
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 05/32] accel/kvm: mark guest state as unprotected after vm file descriptor change
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (3 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 04/32] accel/kvm: add changes required to support KVM VM file descriptor change Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 06/32] accel/kvm: add a notifier to indicate KVM VM file descriptor has changed Ani Sinha
                   ` (26 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Ani Sinha, kvm, qemu-devel

When the KVM VM file descriptor has been replaced with a new one, the guest
state is no longer protected. Mark it as such.
The guest state becomes protected again when TDX, SEV-ES or SEV-SNP marks
it as such.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 accel/kvm/kvm-all.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 762f302551..df49a24466 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2650,6 +2650,9 @@ static int kvm_reset_vmfd(MachineState *ms)
 
     s->vmfd = ret;
 
+    /* guest state is now unprotected again */
+    kvm_state->guest_state_protected = false;
+
     kvm_setup_dirty_ring(s);
 
     /* rebind memory to new vm fd */
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 06/32] accel/kvm: add a notifier to indicate KVM VM file descriptor has changed
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (4 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 05/32] accel/kvm: mark guest state as unprotected after vm " Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 07/32] accel/kvm: add notifier to inform that the KVM VM file fd is about to be changed Ani Sinha
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Ani Sinha, kvm, qemu-devel

A notifier callback can be used by various subsystems to perform actions when
the KVM file descriptor for a virtual machine changes as part of the confidential
guest reset process. This change adds this notifier mechanism. Subsequent
patches will add specific implementations of the notifier callbacks
for the various subsystems that need to take action when the KVM VM file
descriptor changes.
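
How a subsystem might consume such a notifier is sketched below as a
stand-alone model. It deliberately does not use the QEMU headers: struct
notifier is a simplified stand-in for NotifierWithReturn (the real callback
also takes an Error **errp), struct vmfd_change mirrors the payload added by
this patch, and struct demo_device with its callback is purely hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a notifier with a return value. */
struct notifier {
    int (*notify)(struct notifier *n, void *data);
    struct notifier *next;
};

/* Payload handed to every callback; mirrors VmfdChangeNotifier. */
struct vmfd_change {
    int vmfd;
};

static struct notifier *vmfd_notifiers;

static void vmfd_add_notifier(struct notifier *n)
{
    n->next = vmfd_notifiers;
    vmfd_notifiers = n;
}

/* Walk the list; the first non-zero return stops the walk. */
static int vmfd_change_notify(struct vmfd_change *data)
{
    for (struct notifier *n = vmfd_notifiers; n != NULL; n = n->next) {
        int ret = n->notify(n, data);
        if (ret != 0) {
            return ret;
        }
    }
    return 0;
}

/* Hypothetical subsystem that embeds its notifier in its own state. */
struct demo_device {
    int bound_vmfd;
    struct notifier vmfd_notifier;
};

static int demo_device_vmfd_changed(struct notifier *n, void *data)
{
    /* recover the enclosing device from the embedded notifier */
    struct demo_device *dev = (struct demo_device *)
        ((char *)n - offsetof(struct demo_device, vmfd_notifier));
    struct vmfd_change *change = data;

    if (change->vmfd < 0) {
        return -1;      /* refuse to bind to an invalid fd */
    }
    /* a real subsystem would re-issue its ioctls against the new fd here */
    dev->bound_vmfd = change->vmfd;
    return 0;
}
```

A failing callback leaves the device bound to its previous fd and its error
code is what the notify function returns to the caller.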

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 accel/kvm/kvm-all.c    | 30 ++++++++++++++++++++++++++++++
 accel/stubs/kvm-stub.c |  8 ++++++++
 include/system/kvm.h   | 21 +++++++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index df49a24466..ef8e855af5 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -90,6 +90,7 @@ struct KVMParkedVcpu {
 };
 
 KVMState *kvm_state;
+VmfdChangeNotifier vmfd_notifier;
 bool kvm_kernel_irqchip;
 bool kvm_split_irqchip;
 bool kvm_async_interrupts_allowed;
@@ -123,6 +124,9 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
 static NotifierList kvm_irqchip_change_notifiers =
     NOTIFIER_LIST_INITIALIZER(kvm_irqchip_change_notifiers);
 
+static NotifierWithReturnList register_vmfd_changed_notifiers =
+    NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_changed_notifiers);
+
 struct KVMResampleFd {
     int gsi;
     EventNotifier *resample_event;
@@ -2173,6 +2177,22 @@ void kvm_irqchip_change_notify(void)
     notifier_list_notify(&kvm_irqchip_change_notifiers, NULL);
 }
 
+void kvm_vmfd_add_change_notifier(NotifierWithReturn *n)
+{
+    notifier_with_return_list_add(&register_vmfd_changed_notifiers, n);
+}
+
+void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n)
+{
+    notifier_with_return_remove(n);
+}
+
+static int kvm_vmfd_change_notify(Error **errp)
+{
+    return notifier_with_return_list_notify(&register_vmfd_changed_notifiers,
+                                            &vmfd_notifier, errp);
+}
+
 int kvm_irqchip_get_virq(KVMState *s)
 {
     int next_virq;
@@ -2671,6 +2691,16 @@ static int kvm_reset_vmfd(MachineState *ms)
         do_kvm_irqchip_create(s);
     }
 
+    /*
+     * notify everyone that vmfd has changed.
+     */
+    vmfd_notifier.vmfd = s->vmfd;
+    ret = kvm_vmfd_change_notify(&err);
+    if (ret < 0) {
+        return ret;
+    }
+    assert(!err);
+
     /* these can be only called after ram_block_rebind() */
     memory_listener_register(&kml->listener, &address_space_memory);
     memory_listener_register(&kvm_io_listener, &address_space_io);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 68cd33ba97..a6e8a6e16c 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -79,6 +79,14 @@ void kvm_irqchip_change_notify(void)
 {
 }
 
+void kvm_vmfd_add_change_notifier(NotifierWithReturn *n)
+{
+}
+
+void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n)
+{
+}
+
 int kvm_irqchip_add_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
                                        EventNotifier *rn, int virq)
 {
diff --git a/include/system/kvm.h b/include/system/kvm.h
index a5ab22421d..7df162b1f7 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -181,6 +181,7 @@ DECLARE_INSTANCE_CHECKER(KVMState, KVM_STATE,
 
 extern KVMState *kvm_state;
 typedef struct Notifier Notifier;
+typedef struct NotifierWithReturn NotifierWithReturn;
 
 typedef struct KVMRouteChange {
      KVMState *s;
@@ -566,4 +567,24 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size);
 
 int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
 
+/* argument to vmfd change notifier */
+typedef struct VmfdChangeNotifier {
+    int vmfd;
+} VmfdChangeNotifier;
+
+/**
+ * kvm_vmfd_add_change_notifier - register a notifier to get notified when
+ * a KVM vm file descriptor changes as a part of the confidential guest "reset"
+ * process. Various subsystems should use this mechanism to take actions such
+ * as creating new fds against this new vm file descriptor.
+ * @n: notifier with return value.
+ */
+void kvm_vmfd_add_change_notifier(NotifierWithReturn *n);
+/**
+ * kvm_vmfd_remove_change_notifier - de-register a notifier previously
+ * registered with kvm_vmfd_add_change_notifier call.
+ * @n: notifier that was previously registered.
+ */
+void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n);
+
 #endif
-- 
2.42.0



* [PATCH v2 07/32] accel/kvm: add notifier to inform that the KVM VM file fd is about to be changed
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (5 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 06/32] accel/kvm: add a notifier to indicate KVM VM file descriptor has changed Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 08/32] i386/kvm: unregister smram listeners prior to vm file descriptor change Ani Sinha
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Ani Sinha, kvm, qemu-devel

Various subsystems might need to take some steps before the KVM file descriptor
for a virtual machine is changed. So a new notifier is added to inform them that
the KVM VM file descriptor is about to change.

Subsequent patches will add callback implementations for specific components
that need this notification.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 accel/kvm/kvm-all.c    | 25 +++++++++++++++++++++++++
 accel/stubs/kvm-stub.c |  8 ++++++++
 include/system/kvm.h   | 15 +++++++++++++++
 3 files changed, 48 insertions(+)
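
For readers following the series, the pre-change/changed ordering described
above can be sketched in isolation. This is only a toy model: the Mini* types
and mini_* functions below are hypothetical stand-ins and not QEMU's
NotifierWithReturn API. The assumption illustrated is that a failing
pre-change notifier stops the chain, so the old fd is left untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a notifier with a return value. */
typedef struct MiniNotifier MiniNotifier;
struct MiniNotifier {
    int (*notify)(MiniNotifier *n, void *data);
    MiniNotifier *next;
};

typedef struct MiniNotifierList {
    MiniNotifier *head;
} MiniNotifierList;

/* Hypothetical argument handed to "changed" notifiers. */
typedef struct MiniVmfdChange {
    int vmfd;
} MiniVmfdChange;

static MiniNotifierList pre_change_list;
static MiniNotifierList changed_list;
static int current_vmfd = 3;
static int seen_vmfd = -1;

static void mini_list_add(MiniNotifierList *list, MiniNotifier *n)
{
    n->next = list->head;
    list->head = n;
}

/* Call each notifier in turn; stop at the first non-zero return value. */
static int mini_list_notify(MiniNotifierList *list, void *data)
{
    for (MiniNotifier *n = list->head; n; n = n->next) {
        int ret = n->notify(n, data);
        if (ret != 0) {
            return ret;
        }
    }
    return 0;
}

/* Example callbacks: one refuses the change, one records the new fd. */
static int refuse_change(MiniNotifier *n, void *data)
{
    (void)n;
    (void)data;
    return -1;
}

static int record_new_vmfd(MiniNotifier *n, void *data)
{
    (void)n;
    seen_vmfd = ((MiniVmfdChange *)data)->vmfd;
    return 0;
}

/* Toy reset: pre-change notifiers run before the fd is replaced. */
static int mini_reset_vmfd(int new_vmfd)
{
    MiniVmfdChange arg;
    int ret = mini_list_notify(&pre_change_list, NULL);

    if (ret < 0) {
        return ret;             /* old fd is still in place */
    }
    current_vmfd = new_vmfd;    /* real code closes the old fd, opens a new one */
    arg.vmfd = current_vmfd;
    return mini_list_notify(&changed_list, &arg);
}
```

In this model a subsystem adds one notifier to each list: the pre-change
callback syncs or tears down state tied to the old fd, and the changed
callback re-issues its ioctls against the fd it receives.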

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index ef8e855af5..367968427b 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -127,6 +127,9 @@ static NotifierList kvm_irqchip_change_notifiers =
 static NotifierWithReturnList register_vmfd_changed_notifiers =
     NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_changed_notifiers);
 
+static NotifierWithReturnList register_vmfd_pre_change_notifiers =
+    NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_pre_change_notifiers);
+
 struct KVMResampleFd {
     int gsi;
     EventNotifier *resample_event;
@@ -2193,6 +2196,22 @@ static int kvm_vmfd_change_notify(Error **errp)
                                             &vmfd_notifier, errp);
 }
 
+void kvm_vmfd_add_pre_change_notifier(NotifierWithReturn *n)
+{
+    notifier_with_return_list_add(&register_vmfd_pre_change_notifiers, n);
+}
+
+void kvm_vmfd_remove_pre_change_notifier(NotifierWithReturn *n)
+{
+    notifier_with_return_remove(n);
+}
+
+static int kvm_vmfd_pre_change_notify(Error **errp)
+{
+    return notifier_with_return_list_notify(&register_vmfd_pre_change_notifiers,
+                                            NULL, errp);
+}
+
 int kvm_irqchip_get_virq(KVMState *s)
 {
     int next_virq;
@@ -2654,6 +2673,12 @@ static int kvm_reset_vmfd(MachineState *ms)
     memory_listener_unregister(&kml->listener);
     memory_listener_unregister(&kvm_io_listener);
 
+    ret = kvm_vmfd_pre_change_notify(&err);
+    if (ret < 0) {
+        return ret;
+    }
+    assert(!err);
+
     if (s->vmfd >= 0) {
         close(s->vmfd);
     }
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index a6e8a6e16c..7f4e3c4050 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -79,6 +79,14 @@ void kvm_irqchip_change_notify(void)
 {
 }
 
+void kvm_vmfd_add_pre_change_notifier(NotifierWithReturn *n)
+{
+}
+
+void kvm_vmfd_remove_pre_change_notifier(NotifierWithReturn *n)
+{
+}
+
 void kvm_vmfd_add_change_notifier(NotifierWithReturn *n)
 {
 }
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 7df162b1f7..edc3fa5004 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -587,4 +587,19 @@ void kvm_vmfd_add_change_notifier(NotifierWithReturn *n);
  */
 void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n);
 
+/**
+ * kvm_vmfd_add_pre_change_notifier - register a notifier to get notified when
+ * kvm vm file descriptor is about to be changed as a part of the confidential
+ * guest "reset" process.
+ * @n: notifier with return value.
+ */
+void kvm_vmfd_add_pre_change_notifier(NotifierWithReturn *n);
+
+/**
+ * kvm_vmfd_remove_pre_change_notifier - de-register a notifier previously
+ * registered with kvm_vmfd_add_pre_change_notifier.
+ * @n: the notifier that was previously registered.
+ */
+void kvm_vmfd_remove_pre_change_notifier(NotifierWithReturn *n);
+
 #endif
-- 
2.42.0



* [PATCH v2 08/32] i386/kvm: unregister smram listeners prior to vm file descriptor change
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (6 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 07/32] accel/kvm: add notifier to inform that the KVM VM file fd is about to be changed Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 09/32] kvm/i386: implement architecture support for kvm " Ani Sinha
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kvm, qemu-devel

We will re-register smram listeners after the VM file descriptor has changed.
We need to unregister them first to make sure addresses and reference counters
work properly.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 target/i386/kvm/kvm.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
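
The reason for unregistering first can be illustrated with a toy model. The
types and functions below are hypothetical and much simpler than QEMU's
MemoryListener. They only assume that registering a listener takes a
reference on its address space, so registering again without a matching
unregister would leave the count unbalanced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical address space that counts attached listeners. */
typedef struct MiniAddressSpace {
    int listeners;
} MiniAddressSpace;

/* Hypothetical listener that remembers which address space it is on. */
typedef struct MiniListener {
    MiniAddressSpace *as;
} MiniListener;

static void mini_listener_register(MiniListener *l, MiniAddressSpace *as)
{
    l->as = as;
    as->listeners++;
}

static void mini_listener_unregister(MiniListener *l)
{
    if (!l->as) {
        return;                 /* not registered, nothing to undo */
    }
    l->as->listeners--;
    l->as = NULL;
}

/* Pre-change step: drop the listener while the old VM fd is still valid. */
static void mini_pre_change(MiniListener *l)
{
    mini_listener_unregister(l);
}

/* Post-change step: attach the listener again for the new VM fd. */
static void mini_post_change(MiniListener *l, MiniAddressSpace *as)
{
    mini_listener_register(l, as);
}
```

With the unregister in the pre-change step, the count returns to exactly one
after the reset instead of growing with every reboot.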

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6aa17cecba..89f9e11d3a 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -113,6 +113,11 @@ typedef struct {
 static void kvm_init_msrs(X86CPU *cpu);
 static int kvm_filter_msr(KVMState *s, uint32_t msr, QEMURDMSRHandler *rdmsr,
                           QEMUWRMSRHandler *wrmsr);
+static int unregister_smram_listener(NotifierWithReturn *notifier,
+                                     void *data, Error** errp);
+NotifierWithReturn kvm_vmfd_pre_change_notifier = {
+    .notify = unregister_smram_listener,
+};
 
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
     KVM_CAP_INFO(SET_TSS_ADDR),
@@ -2749,6 +2754,13 @@ static void register_smram_listener(Notifier *n, void *unused)
     }
 }
 
+static int unregister_smram_listener(NotifierWithReturn *notifier,
+                                     void *data, Error** errp)
+{
+    memory_listener_unregister(&smram_listener.listener);
+    return 0;
+}
+
 /* It should only be called in cpu's hotplug callback */
 void kvm_smm_cpu_address_space_init(X86CPU *cpu)
 {
@@ -3401,6 +3413,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
 
+    kvm_vmfd_add_pre_change_notifier(&kvm_vmfd_pre_change_notifier);
+
     return 0;
 }
 
-- 
2.42.0



* [PATCH v2 09/32] kvm/i386: implement architecture support for kvm file descriptor change
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (7 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 08/32] i386/kvm: unregister smram listeners prior to vm file descriptor change Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 17:06   ` Paolo Bonzini
  2026-01-12 13:22 ` [PATCH v2 10/32] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset Ani Sinha
                   ` (22 subsequent siblings)
  31 siblings, 1 reply; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kvm, qemu-devel

When the kvm file descriptor changes as a part of confidential guest reset,
some architecture specific setups, including SEV/SEV-SNP/TDX specific setups,
need to be redone. These changes are implemented as a part of the
kvm_arch_vmfd_change_ops() call which was introduced previously.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 target/i386/kvm/kvm.c        | 135 +++++++++++++++++++++++++++++++----
 target/i386/kvm/trace-events |   1 +
 2 files changed, 122 insertions(+), 14 deletions(-)
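
The shape of the redo logic (run each VM-scoped setup step again, skip the
ones that do not apply, and stop at the first failure) can be sketched as
below. The patch itself writes these steps as straight-line calls. The step
table and every Mini* name here are hypothetical and used only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VM state for the sketch. */
typedef struct MiniVm {
    bool confidential;
    int steps_run;
} MiniVm;

/* One setup step: a predicate saying whether it applies, and the action. */
typedef struct MiniStep {
    bool (*wanted)(const MiniVm *vm);
    int (*run)(MiniVm *vm);
} MiniStep;

static bool always(const MiniVm *vm)
{
    (void)vm;
    return true;
}

static bool if_confidential(const MiniVm *vm)
{
    return vm->confidential;
}

static int step_ok(MiniVm *vm)
{
    vm->steps_run++;
    return 0;
}

static int step_fail(MiniVm *vm)
{
    (void)vm;
    return -1;
}

/* Re-run every applicable step against the new VM; bail out on error. */
static int mini_redo_vm_setup(MiniVm *vm, const MiniStep *steps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int ret;

        if (!steps[i].wanted(vm)) {
            continue;
        }
        ret = steps[i].run(vm);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

The early return mirrors the "if (ret < 0) return ret;" pattern used after
each call in kvm_arch_vmfd_change_ops().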

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 89f9e11d3a..4fedc621b8 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3265,14 +3265,132 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
     return 0;
 }
 
+static int xen_init_wrapper(MachineState *ms, KVMState *s);
+
 int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
 {
-    abort();
+    Error *local_err = NULL;
+    int ret;
+
+    /*
+     * Initialize confidential context, if required
+     *
+     * If no memory encryption is requested (ms->cgs == NULL) this is
+     * a no-op.
+     *
+     */
+    if (ms->cgs) {
+        ret = confidential_guest_kvm_init(ms->cgs, &local_err);
+        if (ret < 0) {
+            error_report_err(local_err);
+            return ret;
+        }
+    }
+
+    ret = kvm_vm_enable_exception_payload(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = kvm_vm_enable_triple_fault_event(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (s->xen_version) {
+        ret = xen_init_wrapper(ms, s);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    ret = kvm_vm_set_identity_map_addr(s, KVM_IDENTITY_BASE);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = kvm_vm_set_tss_addr(s, KVM_IDENTITY_BASE + 0x1000);
+    if (ret < 0) {
+        return ret;
+    }
+    ret = kvm_vm_set_nr_mmu_pages(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE) &&
+        x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
+        memory_listener_register(&smram_listener.listener,
+                                 &smram_address_space);
+    }
+
+    if (enable_cpu_pm) {
+        ret = kvm_vm_enable_disable_exits(s);
+        if (ret < 0) {
+            error_report("kvm: guest stopping CPU not supported: %s",
+                         strerror(-ret));
+            return ret;
+        }
+    }
+
+    if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
+        X86MachineState *x86ms = X86_MACHINE(ms);
+
+        if (x86ms->bus_lock_ratelimit > 0) {
+            ret = kvm_vm_enable_bus_lock_exit(s);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+        kvm_set_max_apic_id(x86ms->apic_id_limit);
+    }
+
+    if (kvm_check_extension(s, KVM_CAP_X86_NOTIFY_VMEXIT)) {
+        ret = kvm_vm_enable_notify_vmexit(s);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    if (kvm_vm_check_extension(s, KVM_CAP_X86_USER_SPACE_MSR)) {
+        ret = kvm_vm_enable_userspace_msr(s);
+        if (ret < 0) {
+            return ret;
+        }
+
+        if (s->msr_energy.enable == true) {
+            ret = kvm_vm_enable_energy_msrs(s);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+    }
+
+    trace_kvm_arch_vmfd_change_ops();
+    return 0;
+}
+
+static int xen_init_wrapper(MachineState *ms, KVMState *s)
+{
+    int ret = 0;
+#ifdef CONFIG_XEN_EMU
+    if (!object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE)) {
+        error_report("kvm: Xen support only available in PC machine");
+        return -ENOTSUP;
+    }
+    /* hyperv_enabled() doesn't work yet. */
+    uint32_t msr = XEN_HYPERCALL_MSR;
+    ret = kvm_xen_init(s, msr);
+#else
+    error_report("kvm: Xen support not enabled in qemu");
+    return -ENOTSUP;
+#endif
+    return ret;
 }
 
 bool kvm_arch_supports_vmfd_change(void)
 {
-    return false;
+    return true;
 }
 
 int kvm_arch_init(MachineState *ms, KVMState *s)
@@ -3308,21 +3426,10 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     }
 
     if (s->xen_version) {
-#ifdef CONFIG_XEN_EMU
-        if (!object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE)) {
-            error_report("kvm: Xen support only available in PC machine");
-            return -ENOTSUP;
-        }
-        /* hyperv_enabled() doesn't work yet. */
-        uint32_t msr = XEN_HYPERCALL_MSR;
-        ret = kvm_xen_init(s, msr);
+        ret = xen_init_wrapper(ms, s);
         if (ret < 0) {
             return ret;
         }
-#else
-        error_report("kvm: Xen support not enabled in qemu");
-        return -ENOTSUP;
-#endif
     }
 
     ret = kvm_get_supported_msrs(s);
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index 74a6234ff7..1f4786f687 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -6,6 +6,7 @@ kvm_x86_add_msi_route(int virq) "Adding route entry for virq %d"
 kvm_x86_remove_msi_route(int virq) "Removing route entry for virq %d"
 kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
 kvm_hc_map_gpa_range(uint64_t gpa, uint64_t size, uint64_t attributes, uint64_t flags) "gpa 0x%" PRIx64 " size 0x%" PRIx64 " attributes 0x%" PRIx64 " flags 0x%" PRIx64
+kvm_arch_vmfd_change_ops(void) ""
 
 # xen-emu.c
 kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
-- 
2.42.0



* [PATCH v2 10/32] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (8 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 09/32] kvm/i386: implement architecture support for kvm " Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 11/32] kvm/i386: reload firmware for " Ani Sinha
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S. Tsirkin, Marcel Apfelbaum
  Cc: Ani Sinha, qemu-devel

For confidential guests, the bios image must be reinitialized upon reset. This
is because bios memory is encrypted and hence, once the old confidential
kvm context is destroyed, it cannot be decrypted. It needs to be reinitialized.
In order to do that, this change refactors x86_bios_rom_init() code so that
parts of it can be called during confidential guest reset.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 hw/i386/x86-common.c  | 50 ++++++++++++++++++++++++++++++++-----------
 include/hw/i386/x86.h |  5 ++++-
 2 files changed, 41 insertions(+), 14 deletions(-)
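
As a worked example of the arithmetic involved: the size check requires a
multiple of 64 KiB, and the ROM is mapped so that it ends at the 4 GiB
boundary, which is what (uint32_t)(-bios_size) computes. The helper names
below are hypothetical, and the "greater than zero" part of the check is an
assumption about the condition line that is not visible in this hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: size must be positive and a multiple of 64 KiB. */
static int bios_size_is_valid(int bios_size)
{
    return bios_size > 0 && (bios_size % 65536) == 0;
}

/* Hypothetical helper: base address of a ROM that ends at 4 GiB. */
static uint64_t bios_base_below_4g(int bios_size)
{
    return 0x100000000ULL - (uint64_t)bios_size;
}
```

For a 256 KiB image (262144 bytes) both forms give 0xFFFC0000: the 64-bit
subtraction above and the 32-bit wraparound of (uint32_t)(-262144).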

diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index c1c9224039..e58ab846d2 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -1024,17 +1024,11 @@ void x86_isa_bios_init(MemoryRegion *isa_bios, MemoryRegion *isa_memory,
     memory_region_set_readonly(isa_bios, read_only);
 }
 
-void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
-                       MemoryRegion *rom_memory, bool isapc_ram_fw)
+int get_bios_size(X86MachineState *x86ms,
+                  const char *bios_name, char *filename)
 {
-    const char *bios_name;
-    char *filename;
     int bios_size;
-    ssize_t ret;
 
-    /* BIOS load */
-    bios_name = MACHINE(x86ms)->firmware ?: default_firmware;
-    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
     if (filename) {
         bios_size = get_image_size(filename, NULL);
     } else {
@@ -1044,6 +1038,20 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
         (bios_size % 65536) != 0) {
         goto bios_error;
     }
+
+    return bios_size;
+
+ bios_error:
+    fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
+    exit(1);
+}
+
+void load_bios_from_file(X86MachineState *x86ms, const char *bios_name,
+                         char *filename, int bios_size, bool isapc_ram_fw)
+{
+    ssize_t ret;
+
+    /* BIOS load */
     if (machine_require_guest_memfd(MACHINE(x86ms))) {
         memory_region_init_ram_guest_memfd(&x86ms->bios, NULL, "pc.bios",
                                            bios_size, &error_fatal);
@@ -1072,7 +1080,26 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
             goto bios_error;
         }
     }
-    g_free(filename);
+
+    return;
+
+ bios_error:
+    fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
+    exit(1);
+}
+
+void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
+                       MemoryRegion *rom_memory, bool isapc_ram_fw)
+{
+    int bios_size;
+    const char *bios_name;
+    char *filename;
+
+    bios_name = MACHINE(x86ms)->firmware ?: default_firmware;
+    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
+
+    bios_size = get_bios_size(x86ms, bios_name, filename);
+    load_bios_from_file(x86ms, bios_name, filename, bios_size, isapc_ram_fw);
 
     if (!machine_require_guest_memfd(MACHINE(x86ms))) {
         /* map the last 128KB of the BIOS in ISA space */
@@ -1084,9 +1111,6 @@ void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
     memory_region_add_subregion(rom_memory,
                                 (uint32_t)(-bios_size),
                                 &x86ms->bios);
+    g_free(filename);
     return;
-
-bios_error:
-    fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
-    exit(1);
 }
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 0dffba95f9..86f14a7d87 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -122,7 +122,10 @@ void x86_cpu_unplug_request_cb(HotplugHandler *hotplug_dev,
                                DeviceState *dev, Error **errp);
 void x86_cpu_unplug_cb(HotplugHandler *hotplug_dev,
                        DeviceState *dev, Error **errp);
-
+int get_bios_size(X86MachineState *x86ms,
+                  const char *bios_name, char *filename);
+void load_bios_from_file(X86MachineState *x86ms, const char *bios_name,
+                         char *filename, int bios_size, bool isapc_ram_fw);
 void x86_isa_bios_init(MemoryRegion *isa_bios, MemoryRegion *isa_memory,
                        MemoryRegion *bios, bool read_only);
 void x86_bios_rom_init(X86MachineState *x86ms, const char *default_firmware,
-- 
2.42.0




* [PATCH v2 11/32] kvm/i386: reload firmware for confidential guest reset
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (9 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 10/32] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 17:08   ` Paolo Bonzini
  2026-01-13 13:58   ` Bernhard Beschow
  2026-01-12 13:22 ` [PATCH v2 12/32] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset Ani Sinha
                   ` (20 subsequent siblings)
  31 siblings, 2 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kvm, qemu-devel

When IGVM is not being used by the confidential guest, the guest firmware has
to be explicitly reloaded into memory. This is because the memory into
which the firmware was loaded before reset was encrypted and is thus lost
upon reset. When IGVM is used, it is expected that the IGVM will contain the
guest firmware and the execution of the IGVM directives will set up the guest
firmware memory.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 target/i386/kvm/kvm.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
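
The decision described above (reload the image into the existing firmware
memory unless IGVM supplies it) can be sketched with plain stdio. This is an
illustration under stated assumptions: reload_firmware_if_needed() is a
hypothetical helper, and a FILE pointer stands in for QEMU's file lookup and
load_image_size().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper. Returns 0 when IGVM provides the firmware (nothing
 * to do), the number of bytes copied on success, or -1 on a short read.
 */
static long reload_firmware_if_needed(bool has_igvm, FILE *image,
                                      unsigned char *dst, size_t size)
{
    size_t got;

    if (has_igvm) {
        return 0;               /* firmware comes from the IGVM file */
    }
    if (fseek(image, 0, SEEK_SET) != 0) {
        return -1;
    }
    got = fread(dst, 1, size, image);
    return got == size ? (long)got : -1;
}
```

The destination buffer is reused rather than reallocated, which matches the
patch writing into the memory region that already backs x86ms->bios.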

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4fedc621b8..46c4f9487b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -51,6 +51,8 @@
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
 #include "qemu/memalign.h"
+#include "qemu/datadir.h"
+#include "hw/core/loader.h"
 #include "hw/i386/x86.h"
 #include "hw/i386/kvm/xen_evtchn.h"
 #include "hw/i386/pc.h"
@@ -3267,6 +3269,22 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
 
 static int xen_init_wrapper(MachineState *ms, KVMState *s);
 
+static void reload_bios_rom(X86MachineState *x86ms)
+{
+    int bios_size;
+    const char *bios_name;
+    char *filename;
+
+    bios_name = MACHINE(x86ms)->firmware ?: "bios.bin";
+    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
+
+    bios_size = get_bios_size(x86ms, bios_name, filename);
+
+    void *ptr = memory_region_get_ram_ptr(&x86ms->bios);
+    load_image_size(filename, ptr, bios_size);
+    x86_firmware_configure(0x100000000ULL - bios_size, ptr, bios_size);
+}
+
 int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
 {
     Error *local_err = NULL;
@@ -3285,6 +3303,16 @@ int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
             error_report_err(local_err);
             return ret;
         }
+        if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
+            X86MachineState *x86ms = X86_MACHINE(ms);
+            /*
+             * If an IGVM file is specified then the firmware must be provided
+             * in the IGVM file.
+             */
+            if (!x86ms->igvm) {
+                reload_bios_rom(x86ms);
+            }
+        }
     }
 
     ret = kvm_vm_enable_exception_payload(s);
-- 
2.42.0



* [PATCH v2 12/32] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (10 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 11/32] kvm/i386: reload firmware for " Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 13/32] i386/tdx: refactor TDX firmware memory initialization code into a new function Ani Sinha
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Ani Sinha, kvm, qemu-devel

Confidential guests need to generate a new KVM file descriptor upon virtual
machine reset. Existing VCPUs need to be reattached to this new
KVM VM file descriptor. As a part of this, new VCPU file descriptors against
this new KVM VM file descriptor need to be created and re-initialized.
Resources allocated against the old VCPU fds need to be released. This change
makes this happen.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 accel/kvm/kvm-all.c    | 202 ++++++++++++++++++++++++++++++++++-------
 accel/kvm/trace-events |   1 +
 2 files changed, 168 insertions(+), 35 deletions(-)
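
The per-VCPU sequence described above (release the old fd, create a new one
against the new VM, reset bookkeeping, stop on the first error) can be
modelled without KVM as follows. All Mini* names are hypothetical, and
mini_create_vcpu() merely hands out fake fd numbers or a negative errno
value in place of the KVM_CREATE_VCPU ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical VCPU bookkeeping. */
typedef struct MiniVcpu {
    int id;
    int fd;
    bool dirty;
} MiniVcpu;

/* Hypothetical new VM: hands out fds, can be told to fail for one id. */
typedef struct MiniVm {
    int next_fd;
    int fail_on_id;
} MiniVm;

/* Stand-in for VCPU creation: returns a new fd or a negative errno. */
static int mini_create_vcpu(MiniVm *vm, int id)
{
    if (id == vm->fail_on_id) {
        return -ENOMEM;
    }
    return vm->next_fd++;
}

/* Rebind every VCPU to the new VM; return the first error encountered. */
static int mini_rebind_vcpus(MiniVm *new_vm, MiniVcpu *cpus, int n)
{
    for (int i = 0; i < n; i++) {
        int fd;

        cpus[i].fd = -1;        /* real code would close() the old fd here */
        fd = mini_create_vcpu(new_vm, cpus[i].id);
        if (fd < 0) {
            return fd;          /* negative errno; negate it for strerror() */
        }
        cpus[i].fd = fd;
        cpus[i].dirty = false;  /* state cached from the old VCPU is stale */
    }
    return 0;
}
```

Marking a released fd as -1 keeps "no fd" distinguishable from fd 0, which is
a valid descriptor number.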

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 367968427b..2bd4dcd43b 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -130,6 +130,12 @@ static NotifierWithReturnList register_vmfd_changed_notifiers =
 static NotifierWithReturnList register_vmfd_pre_change_notifiers =
     NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_pre_change_notifiers);
 
+static int kvm_rebind_vcpus(Error **errp);
+
+static int map_kvm_run(KVMState *s, CPUState *cpu, Error **errp);
+static int map_kvm_dirty_gfns(KVMState *s, CPUState *cpu, Error **errp);
+static int vcpu_unmap_regions(KVMState *s, CPUState *cpu);
+
 struct KVMResampleFd {
     int gsi;
     EventNotifier *resample_event;
@@ -423,6 +429,83 @@ err:
     return ret;
 }
 
+static int kvm_rebind_vcpus(Error **errp)
+{
+    CPUState *cpu;
+    unsigned long vcpu_id;
+    KVMState *s = kvm_state;
+    int kvm_fd, ret = 0;
+
+    CPU_FOREACH(cpu) {
+        vcpu_id = kvm_arch_vcpu_id(cpu);
+
+        if (cpu->kvm_fd) {
+            close(cpu->kvm_fd);
+        }
+
+        ret = kvm_arch_destroy_vcpu(cpu);
+        if (ret < 0) {
+            goto err;
+        }
+
+        if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
+            s->coalesced_mmio_ring = NULL;
+        }
+
+        ret = vcpu_unmap_regions(s, cpu);
+        if (ret < 0) {
+            goto err;
+        }
+
+        ret = kvm_arch_pre_create_vcpu(cpu, errp);
+        if (ret < 0) {
+            goto err;
+        }
+
+        kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
+        if (kvm_fd < 0) {
+            error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu (%s)",
+                         vcpu_id, strerror(-kvm_fd));
+            return kvm_fd;
+        }
+
+        cpu->kvm_fd = kvm_fd;
+
+        cpu->vcpu_dirty = false;
+        cpu->dirty_pages = 0;
+        cpu->throttle_us_per_full = 0;
+
+        ret = map_kvm_run(s, cpu, errp);
+        if (ret < 0) {
+            goto err;
+        }
+
+        if (s->kvm_dirty_ring_size) {
+            ret = map_kvm_dirty_gfns(s, cpu, errp);
+            if (ret < 0) {
+                goto err;
+            }
+        }
+
+        ret = kvm_arch_init_vcpu(cpu);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret,
+                             "kvm_init_vcpu: kvm_arch_init_vcpu failed (%lu)",
+                             vcpu_id);
+        }
+
+        close(cpu->kvm_vcpu_stats_fd);
+        cpu->kvm_vcpu_stats_fd = kvm_vcpu_ioctl(cpu, KVM_GET_STATS_FD, NULL);
+        kvm_init_cpu_signals(cpu);
+
+        kvm_cpu_synchronize_post_init(cpu);
+    }
+    trace_kvm_rebind_vcpus();
+
+ err:
+    return ret;
+}
+
 static void kvm_park_vcpu(CPUState *cpu)
 {
     struct KVMParkedVcpu *vcpu;
@@ -511,19 +594,11 @@ int kvm_create_and_park_vcpu(CPUState *cpu)
     return ret;
 }
 
-static int do_kvm_destroy_vcpu(CPUState *cpu)
+static int vcpu_unmap_regions(KVMState *s, CPUState *cpu)
 {
-    KVMState *s = kvm_state;
     int mmap_size;
     int ret = 0;
 
-    trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
-
-    ret = kvm_arch_destroy_vcpu(cpu);
-    if (ret < 0) {
-        goto err;
-    }
-
     mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
     if (mmap_size < 0) {
         ret = mmap_size;
@@ -551,39 +626,47 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
         cpu->kvm_dirty_gfns = NULL;
     }
 
-    kvm_park_vcpu(cpu);
-err:
+ err:
     return ret;
 }
 
-void kvm_destroy_vcpu(CPUState *cpu)
-{
-    if (do_kvm_destroy_vcpu(cpu) < 0) {
-        error_report("kvm_destroy_vcpu failed");
-        exit(EXIT_FAILURE);
-    }
-}
-
-int kvm_init_vcpu(CPUState *cpu, Error **errp)
+static int do_kvm_destroy_vcpu(CPUState *cpu)
 {
     KVMState *s = kvm_state;
-    int mmap_size;
-    int ret;
+    int ret = 0;
 
-    trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+    trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
-    ret = kvm_arch_pre_create_vcpu(cpu, errp);
+    ret = kvm_arch_destroy_vcpu(cpu);
     if (ret < 0) {
         goto err;
     }
 
-    ret = kvm_create_vcpu(cpu);
+    /* If I am the CPU that created coalesced_mmio_ring, then discard it */
+    if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
+        s->coalesced_mmio_ring = NULL;
+    }
+
+    ret = vcpu_unmap_regions(s, cpu);
     if (ret < 0) {
-        error_setg_errno(errp, -ret,
-                         "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
-                         kvm_arch_vcpu_id(cpu));
         goto err;
     }
+    kvm_park_vcpu(cpu);
+err:
+    return ret;
+}
+
+void kvm_destroy_vcpu(CPUState *cpu)
+{
+    if (do_kvm_destroy_vcpu(cpu) < 0) {
+        error_report("kvm_destroy_vcpu failed");
+        exit(EXIT_FAILURE);
+    }
+}
+
+static int map_kvm_run(KVMState *s, CPUState *cpu, Error **errp)
+{
+    int mmap_size, ret = 0;
 
     mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
     if (mmap_size < 0) {
@@ -608,14 +691,53 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
             (void *)cpu->kvm_run + s->coalesced_mmio * PAGE_SIZE;
     }
 
+ err:
+    return ret;
+}
+
+static int map_kvm_dirty_gfns(KVMState *s, CPUState *cpu, Error **errp)
+{
+    int ret = 0;
+    /* Use MAP_SHARED to share pages with the kernel */
+    cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_bytes,
+                               PROT_READ | PROT_WRITE, MAP_SHARED,
+                               cpu->kvm_fd,
+                               PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
+    if (cpu->kvm_dirty_gfns == MAP_FAILED) {
+        ret = -errno;
+    }
+
+    return ret;
+}
+
+int kvm_init_vcpu(CPUState *cpu, Error **errp)
+{
+    KVMState *s = kvm_state;
+    int ret;
+
+    trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+
+    ret = kvm_arch_pre_create_vcpu(cpu, errp);
+    if (ret < 0) {
+        goto err;
+    }
+
+    ret = kvm_create_vcpu(cpu);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
+                         kvm_arch_vcpu_id(cpu));
+        goto err;
+    }
+
+    ret = map_kvm_run(s, cpu, errp);
+    if (ret < 0) {
+        goto err;
+    }
+
     if (s->kvm_dirty_ring_size) {
-        /* Use MAP_SHARED to share pages with the kernel */
-        cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_bytes,
-                                   PROT_READ | PROT_WRITE, MAP_SHARED,
-                                   cpu->kvm_fd,
-                                   PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
-        if (cpu->kvm_dirty_gfns == MAP_FAILED) {
-            ret = -errno;
+        ret = map_kvm_dirty_gfns(s, cpu, errp);
+        if (ret < 0) {
             goto err;
         }
     }
@@ -2726,6 +2848,16 @@ static int kvm_reset_vmfd(MachineState *ms)
     }
     assert(!err);
 
+    /*
+     * rebind new vcpu fds with the new kvm fds
+     * These can only be called after kvm_arch_vmfd_change_ops()
+     */
+    ret = kvm_rebind_vcpus(&err);
+    if (ret < 0) {
+        return ret;
+    }
+    assert(!err);
+
     /* these can be only called after ram_block_rebind() */
     memory_listener_register(&kml->listener, &address_space_memory);
     memory_listener_register(&kvm_io_listener, &address_space_io);
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index e4beda0148..4a8921c632 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -15,6 +15,7 @@ kvm_park_vcpu(int cpu_index, unsigned long arch_cpu_id) "index: %d id: %lu"
 kvm_unpark_vcpu(unsigned long arch_cpu_id, const char *msg) "id: %lu %s"
 kvm_irqchip_commit_routes(void) ""
 kvm_reset_vmfd(void) ""
+kvm_rebind_vcpus(void) ""
 kvm_irqchip_add_msi_route(char *name, int vector, int virq) "dev %s vector %d virq %d"
 kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
 kvm_irqchip_release_virq(int virq) "virq %d"
-- 
2.42.0



* [PATCH v2 13/32] i386/tdx: refactor TDX firmware memory initialization code into a new function
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (11 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 12/32] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 14/32] i386/tdx: finalize TDX guest state upon reset Ani Sinha
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kvm, qemu-devel

Move the firmware memory initialization code out of tdx_finalize_vm() into a
new helper function, tdx_init_fw_mem_region(). No functional change.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 target/i386/kvm/tdx.c | 73 ++++++++++++++++++++++++-------------------
 1 file changed, 40 insertions(+), 33 deletions(-)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 0161985768..b595eabb6a 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -295,14 +295,51 @@ static void tdx_post_init_vcpus(void)
     }
 }
 
-static void tdx_finalize_vm(Notifier *notifier, void *unused)
+static void tdx_init_fw_mem_region(void)
 {
     TdxFirmware *tdvf = &tdx_guest->tdvf;
     TdxFirmwareEntry *entry;
-    RAMBlock *ram_block;
     Error *local_err = NULL;
     int r;
 
+    for_each_tdx_fw_entry(tdvf, entry) {
+        struct kvm_tdx_init_mem_region region;
+        uint32_t flags;
+
+        region = (struct kvm_tdx_init_mem_region) {
+            .source_addr = (uintptr_t)entry->mem_ptr,
+            .gpa = entry->address,
+            .nr_pages = entry->size >> 12,
+        };
+
+        flags = entry->attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND ?
+                KVM_TDX_MEASURE_MEMORY_REGION : 0;
+
+        do {
+            error_free(local_err);
+            local_err = NULL;
+            r = tdx_vcpu_ioctl(first_cpu, KVM_TDX_INIT_MEM_REGION, flags,
+                               &region, &local_err);
+        } while (r == -EAGAIN || r == -EINTR);
+        if (r < 0) {
+            error_report_err(local_err);
+            exit(1);
+        }
+
+        if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
+            entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
+            qemu_ram_munmap(-1, entry->mem_ptr, entry->size);
+            entry->mem_ptr = NULL;
+        }
+    }
+}
+
+static void tdx_finalize_vm(Notifier *notifier, void *unused)
+{
+    TdxFirmware *tdvf = &tdx_guest->tdvf;
+    TdxFirmwareEntry *entry;
+    RAMBlock *ram_block;
+
     tdx_init_ram_entries();
 
     for_each_tdx_fw_entry(tdvf, entry) {
@@ -339,37 +376,7 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
     tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
 
     tdx_post_init_vcpus();
-
-    for_each_tdx_fw_entry(tdvf, entry) {
-        struct kvm_tdx_init_mem_region region;
-        uint32_t flags;
-
-        region = (struct kvm_tdx_init_mem_region) {
-            .source_addr = (uintptr_t)entry->mem_ptr,
-            .gpa = entry->address,
-            .nr_pages = entry->size >> 12,
-        };
-
-        flags = entry->attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND ?
-                KVM_TDX_MEASURE_MEMORY_REGION : 0;
-
-        do {
-            error_free(local_err);
-            local_err = NULL;
-            r = tdx_vcpu_ioctl(first_cpu, KVM_TDX_INIT_MEM_REGION, flags,
-                               &region, &local_err);
-        } while (r == -EAGAIN || r == -EINTR);
-        if (r < 0) {
-            error_report_err(local_err);
-            exit(1);
-        }
-
-        if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
-            entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
-            qemu_ram_munmap(-1, entry->mem_ptr, entry->size);
-            entry->mem_ptr = NULL;
-        }
-    }
+    tdx_init_fw_mem_region();
 
     /*
      * TDVF image has been copied into private region above via
-- 
2.42.0



* [PATCH v2 14/32] i386/tdx: finalize TDX guest state upon reset
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (12 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 13/32] i386/tdx: refactor TDX firmware memory initialization code into a new function Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 15/32] i386/tdx: add a pre-vmfd change notifier to reset tdx state Ani Sinha
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kvm, qemu-devel

When the KVM file descriptor of a confidential virtual machine changes due to
a guest reset, some TDX-specific setup steps need to be done again. This
includes finalizing the initial guest launch state again. This change
re-executes parts of the TDX setup during the device reset phase using the
resettable interface, which finalizes the guest launch state again and locks
it in. The machine done notifier that was previously used is no longer needed,
as the same code is now executed as part of VM reset.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
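The reset callback is attached to the 'exit' phase so that it runs after
work done by other objects in the 'hold' phase. The following is a
simplified model of that ordering, not QEMU's resettable API: the Phases
struct, run_reset() and the two callbacks are hypothetical names used only
to show that every hold callback runs before any exit callback, regardless
of registration order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-object phase callbacks; any of them may be NULL. */
typedef struct {
    void (*phase_enter)(void);
    void (*phase_hold)(void);
    void (*phase_exit)(void);
} Phases;

static int boot_state_ready;   /* set by a "legacy" device in hold */
static int launch_finalized;   /* set by the guest object in exit */

static void legacy_hold(void)
{
    boot_state_ready = 1;
}

static void guest_exit(void)
{
    /* Relies on all hold callbacks having completed. */
    assert(boot_state_ready);
    launch_finalized = 1;
}

/* Run each phase across all objects before moving to the next phase. */
static void run_reset(const Phases *objs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (objs[i].phase_enter) {
            objs[i].phase_enter();
        }
    }
    for (size_t i = 0; i < n; i++) {
        if (objs[i].phase_hold) {
            objs[i].phase_hold();
        }
    }
    for (size_t i = 0; i < n; i++) {
        if (objs[i].phase_exit) {
            objs[i].phase_exit();
        }
    }
}

static int demo_reset(void)
{
    /* The guest object is listed first, the legacy device second. */
    const Phases objs[] = {
        { NULL, NULL, guest_exit },
        { NULL, legacy_hold, NULL },
    };

    boot_state_ready = 0;
    launch_finalized = 0;
    run_reset(objs, sizeof(objs) / sizeof(objs[0]));
    return launch_finalized;
}
```

In this model demo_reset() returns 1 even though the guest object comes
first in the array, because phases are walked one at a time.
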
 target/i386/kvm/tdx.c        | 38 +++++++++++++++++++++++++++++++-----
 target/i386/kvm/tdx.h        |  1 +
 target/i386/kvm/trace-events |  3 +++
 3 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index b595eabb6a..cba07785f7 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -19,6 +19,7 @@
 #include "crypto/hash.h"
 #include "system/kvm_int.h"
 #include "system/runstate.h"
+#include "system/reset.h"
 #include "system/system.h"
 #include "system/ramblock.h"
 #include "system/address-spaces.h"
@@ -38,6 +39,7 @@
 #include "kvm_i386.h"
 #include "tdx.h"
 #include "tdx-quote-generator.h"
+#include "trace.h"
 
 #include "standard-headers/asm-x86/kvm_para.h"
 
@@ -389,9 +391,19 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
     CONFIDENTIAL_GUEST_SUPPORT(tdx_guest)->ready = true;
 }
 
-static Notifier tdx_machine_done_notify = {
-    .notify = tdx_finalize_vm,
-};
+static void tdx_handle_reset(Object *obj, ResetType type)
+{
+    if (!runstate_is_running() && !phase_check(PHASE_MACHINE_READY)) {
+        return;
+    }
+
+    if (!kvm_enable_hypercall(BIT_ULL(KVM_HC_MAP_GPA_RANGE))) {
+        error_setg(&error_fatal, "KVM_HC_MAP_GPA_RANGE not enabled for guest");
+    }
+
+    tdx_finalize_vm(NULL, NULL);
+    trace_tdx_handle_reset();
+}
 
 /*
  * Some CPUID bits change from fixed1 to configurable bits when TDX module
@@ -738,8 +750,6 @@ static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
      */
     kvm_readonly_mem_allowed = false;
 
-    qemu_add_machine_init_done_notifier(&tdx_machine_done_notify);
-
     tdx_guest = tdx;
     return 0;
 }
@@ -1505,6 +1515,7 @@ OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
                                    TDX_GUEST,
                                    X86_CONFIDENTIAL_GUEST,
                                    { TYPE_USER_CREATABLE },
+                                   { TYPE_RESETTABLE_INTERFACE },
                                    { NULL })
 
 static void tdx_guest_init(Object *obj)
@@ -1538,20 +1549,37 @@ static void tdx_guest_init(Object *obj)
 
     tdx->event_notify_vector = -1;
     tdx->event_notify_apicid = -1;
+    qemu_register_resettable(obj);
 }
 
 static void tdx_guest_finalize(Object *obj)
 {
 }
 
+static ResettableState *tdx_reset_state(Object *obj)
+{
+    TdxGuest *tdx = TDX_GUEST(obj);
+    return &tdx->reset_state;
+}
+
 static void tdx_guest_class_init(ObjectClass *oc, const void *data)
 {
     ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
     X86ConfidentialGuestClass *x86_klass = X86_CONFIDENTIAL_GUEST_CLASS(oc);
+    ResettableClass *rc = RESETTABLE_CLASS(oc);
 
     klass->kvm_init = tdx_kvm_init;
     x86_klass->kvm_type = tdx_kvm_type;
     x86_klass->cpu_instance_init = tdx_cpu_instance_init;
     x86_klass->adjust_cpuid_features = tdx_adjust_cpuid_features;
     x86_klass->check_features = tdx_check_features;
+
+    /*
+     * the exit phase makes sure tdx handles reset after all legacy resets
+     * have taken place (in the hold phase) and IGVM has also properly
+     * set up the boot state.
+     */
+    rc->phases.exit = tdx_handle_reset;
+    rc->get_state = tdx_reset_state;
+
 }
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 1c38faf983..264fbe530c 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -70,6 +70,7 @@ typedef struct TdxGuest {
 
     uint32_t event_notify_vector;
     uint32_t event_notify_apicid;
+    ResettableState reset_state;
 } TdxGuest;
 
 #ifdef CONFIG_TDX
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index 1f4786f687..47473001d8 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -14,3 +14,6 @@ kvm_xen_soft_reset(void) ""
 kvm_xen_set_shared_info(uint64_t gfn) "shared info at gfn 0x%" PRIx64
 kvm_xen_set_vcpu_attr(int cpu, int type, uint64_t gpa) "vcpu attr cpu %d type %d gpa 0x%" PRIx64
 kvm_xen_set_vcpu_callback(int cpu, int vector) "callback vcpu %d vector %d"
+
+# tdx.c
+tdx_handle_reset(void) ""
-- 
2.42.0



* [PATCH v2 15/32] i386/tdx: add a pre-vmfd change notifier to reset tdx state
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (13 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 14/32] i386/tdx: finalize TDX guest state upon reset Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 16/32] i386/sev: add migration blockers only once Ani Sinha
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti; +Cc: Ani Sinha, kvm, qemu-devel

During reset, when the VM file descriptor is changed, the TDX state needs to be
re-initialized. A pre-VMFD notifier callback is implemented to reset the old
state and free memory before the new state is initialized post VM-fd change.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
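The callback's job is to drop the old state so that the later
re-initialization starts from a clean slate. A minimal sketch of such a
teardown, under the assumption that the state is an array plus a count:
FwTable, fw_table_fill() and fw_table_reset() are hypothetical names for
illustration and do not appear in QEMU. The buffer is freed first and the
pointer is cleared afterwards, which also makes a repeated reset harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parsed firmware table. */
typedef struct {
    unsigned nr_entries;
    int *entries;
} FwTable;

/* (Re)populate the table; expects it to be empty on entry. */
static void fw_table_fill(FwTable *t, unsigned n)
{
    assert(n > 0);
    assert(t->entries == NULL);
    t->entries = calloc(n, sizeof(*t->entries));
    assert(t->entries != NULL);
    t->nr_entries = n;
}

/* Release the old contents and leave the table ready to be refilled. */
static void fw_table_reset(FwTable *t)
{
    free(t->entries);      /* free(NULL) is a no-op, so this is idempotent */
    t->entries = NULL;     /* clear only after the buffer has been freed */
    t->nr_entries = 0;
}
```
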
 target/i386/kvm/tdx.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index cba07785f7..314d316b7c 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -405,6 +405,32 @@ static void tdx_handle_reset(Object *obj, ResetType type)
     trace_tdx_handle_reset();
 }
 
+/* TDX guest reset will require us to reinitialize some of tdx guest state. */
+static int set_tdx_vm_uninitialized(NotifierWithReturn *notifier,
+                                    void *data, Error** errp)
+{
+    TdxFirmware *fw = &tdx_guest->tdvf;
+
+    if (tdx_guest->initialized) {
+        tdx_guest->initialized = false;
+    }
+
+    g_free(tdx_guest->ram_entries);
+
+    /*
+     * the firmware entries will be parsed again, see
+     * x86_firmware_configure() -> tdx_parse_tdvf()
+     */
+    g_free(fw->entries);
+    fw->entries = NULL;
+
+    return 0;
+}
+
+static NotifierWithReturn tdx_vmfd_pre_change_notifier = {
+    .notify = set_tdx_vm_uninitialized,
+};
+
 /*
  * Some CPUID bits change from fixed1 to configurable bits when TDX module
  * supports TDX_FEATURES0.VE_REDUCTION. e.g., MCA/MCE/MTRR/CORE_CAPABILITY.
@@ -1549,6 +1575,7 @@ static void tdx_guest_init(Object *obj)
 
     tdx->event_notify_vector = -1;
     tdx->event_notify_apicid = -1;
+    kvm_vmfd_add_pre_change_notifier(&tdx_vmfd_pre_change_notifier);
     qemu_register_resettable(obj);
 }
 
-- 
2.42.0



* [PATCH v2 16/32] i386/sev: add migration blockers only once
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (14 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 15/32] i386/tdx: add a pre-vmfd change notifier to reset tdx state Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 17:16   ` Paolo Bonzini
  2026-01-12 13:22 ` [PATCH v2 17/32] i386/sev: add notifiers " Ani Sinha
                   ` (15 subsequent siblings)
  31 siblings, 1 reply; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Zhao Liu, Marcelo Tosatti; +Cc: Ani Sinha, kvm, qemu-devel

sev_launch_finish() and sev_snp_launch_finish() can be called multiple times
if the confidential guest is capable of being reset/rebooted. The migration
blocker must not be added again on each invocation. This change makes sure
that the migration blocker is added only once by adding it from the SEV
instance init code, which is called only once.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
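The idea is to move one-time registration out of a path that now runs on
every reset and into a path that runs once per object. A small sketch of
that split, with hypothetical names (Guest, add_blocker() and the guest_*
functions are illustrative only, not the SEV code):

```c
#include <assert.h>
#include <stddef.h>

static int blockers_registered;

/* Stand-in for a registration that must happen exactly once. */
static void add_blocker(void)
{
    blockers_registered++;
}

typedef struct {
    int running;
} Guest;

/* Runs once per object: the right place for one-time registration. */
static void guest_instance_init(Guest *g)
{
    assert(g != NULL);
    g->running = 0;
    add_blocker();
}

/* Runs on first boot and again after every reset: no registration here. */
static void guest_launch_finish(Guest *g)
{
    assert(g != NULL);
    g->running = 1;
}

static void guest_reset(Guest *g)
{
    g->running = 0;
    guest_launch_finish(g);
}
```

After one guest_instance_init(), one guest_launch_finish() and any number
of guest_reset() calls, blockers_registered stays at 1.
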
 target/i386/sev.c | 20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index fb5a3b5d77..c260c162b1 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1421,11 +1421,6 @@ sev_launch_finish(SevCommonState *sev_common)
     }
 
     sev_set_guest_state(sev_common, SEV_STATE_RUNNING);
-
-    /* add migration blocker */
-    error_setg(&sev_mig_blocker,
-               "SEV: Migration is not implemented");
-    migrate_add_blocker(&sev_mig_blocker, &error_fatal);
 }
 
 static int snp_launch_update_data(uint64_t gpa, void *hva, size_t len,
@@ -1608,7 +1603,6 @@ static void
 sev_snp_launch_finish(SevCommonState *sev_common)
 {
     int ret, error;
-    Error *local_err = NULL;
     OvmfSevMetadata *metadata;
     SevLaunchUpdateData *data;
     SevSnpGuestState *sev_snp = SEV_SNP_GUEST(sev_common);
@@ -1655,15 +1649,6 @@ sev_snp_launch_finish(SevCommonState *sev_common)
 
     kvm_mark_guest_state_protected();
     sev_set_guest_state(sev_common, SEV_STATE_RUNNING);
-
-    /* add migration blocker */
-    error_setg(&sev_mig_blocker,
-               "SEV-SNP: Migration is not implemented");
-    ret = migrate_add_blocker(&sev_mig_blocker, &local_err);
-    if (local_err) {
-        error_report_err(local_err);
-        exit(1);
-    }
 }
 
 
@@ -2764,6 +2749,11 @@ sev_common_instance_init(Object *obj)
     cgs->set_guest_policy = cgs_set_guest_policy;
 
     QTAILQ_INIT(&sev_common->launch_vmsa);
+
+    /* add migration blocker */
+    error_setg(&sev_mig_blocker,
+               "SEV: Migration is not implemented");
+    migrate_add_blocker(&sev_mig_blocker, &error_fatal);
 }
 
 /* sev guest info common to sev/sev-es/sev-snp */
-- 
2.42.0



* [PATCH v2 17/32] i386/sev: add notifiers only once
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (15 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 16/32] i386/sev: add migration blockers only once Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 18/32] i386/sev: free existing launch update data and kernel hashes data on init Ani Sinha
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Zhao Liu, Marcelo Tosatti; +Cc: Ani Sinha, kvm, qemu-devel

The VM state change notifier needs to be added only once, not every time
the SEV state is initialized. This matters when the SEV guest can be reset,
because initialization then happens once per reset. Therefore, move the
addition of the VM state change notifier to sev_common_instance_init(), as
it is called only once.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 target/i386/sev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index c260c162b1..cb2213a32a 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1917,8 +1917,6 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
         return -1;
     }
 
-    qemu_add_vm_change_state_handler(sev_vm_state_change, sev_common);
-
     cgs->ready = true;
 
     return 0;
@@ -2754,6 +2752,8 @@ sev_common_instance_init(Object *obj)
     error_setg(&sev_mig_blocker,
                "SEV: Migration is not implemented");
     migrate_add_blocker(&sev_mig_blocker, &error_fatal);
+
+    qemu_add_vm_change_state_handler(sev_vm_state_change, sev_common);
 }
 
 /* sev guest info common to sev/sev-es/sev-snp */
-- 
2.42.0



* [PATCH v2 18/32] i386/sev: free existing launch update data and kernel hashes data on init
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (16 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 17/32] i386/sev: add notifiers " Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 19/32] i386/sev: add support for confidential guest reset Ani Sinha
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Zhao Liu; +Cc: Ani Sinha, kvm, qemu-devel

If there is existing launch update data and kernel hashes data, they need to be
freed when initialization code is executed. This is important for resettable
confidential guests where the initialization happens once every reset.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
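For reference, a generic sketch of draining a linked queue before it is
reused. This is a hand-rolled list with hypothetical names (Node, Queue,
queue_drain()), not the QTAILQ macros: each element's link is read before
the element is freed, and the head is reinitialized afterwards so the
queue can be appended to again.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    struct Node *next;
    int payload;
} Node;

typedef struct {
    Node *head;
    Node **tail;   /* points at the link where the next node goes */
} Queue;

static void queue_init(Queue *q)
{
    q->head = NULL;
    q->tail = &q->head;
}

static void queue_push(Queue *q, int payload)
{
    Node *n = malloc(sizeof(*n));

    assert(n != NULL);
    n->payload = payload;
    n->next = NULL;
    *q->tail = n;
    q->tail = &n->next;
}

/* Free every element and return how many were freed. */
static unsigned queue_drain(Queue *q)
{
    unsigned freed = 0;
    Node *n = q->head;

    while (n) {
        Node *next = n->next;   /* read the link before freeing */
        free(n);
        n = next;
        freed++;
    }
    queue_init(q);              /* leave no dangling head or tail */
    return freed;
}
```
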
 target/i386/sev.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index cb2213a32a..d7425dde96 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1768,6 +1768,7 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
     uint32_t ebx;
     uint32_t host_cbitpos;
     struct sev_user_data_status status = {};
+    SevLaunchUpdateData *data, *next_elm;
     SevCommonState *sev_common = SEV_COMMON(cgs);
     SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(cgs);
     X86ConfidentialGuestClass *x86_klass =
@@ -1775,6 +1776,11 @@ static int sev_common_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 
     sev_common->state = SEV_STATE_UNINIT;
 
+    /* free existing launch update data if any */
+    QTAILQ_FOREACH_SAFE(data, &launch_update, next, next_elm) {
+        g_free(data);
+    }
+
     host_cpuid(0x8000001F, 0, NULL, &ebx, NULL, NULL);
     host_cbitpos = ebx & 0x3f;
 
@@ -1961,6 +1967,8 @@ static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
     X86MachineState *x86ms = X86_MACHINE(ms);
+    SevCommonState *sev_common = SEV_COMMON(cgs);
+    SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(sev_common);
 
     if (x86ms->smm == ON_OFF_AUTO_AUTO) {
         x86ms->smm = ON_OFF_AUTO_OFF;
@@ -1969,6 +1977,10 @@ static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
         return -1;
     }
 
+    /* free existing kernel hashes data if any */
+    g_free(sev_snp_guest->kernel_hashes_data);
+    sev_snp_guest->kernel_hashes_data = NULL;
+
     return 0;
 }
 
-- 
2.42.0



* [PATCH v2 19/32] i386/sev: add support for confidential guest reset
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (17 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 18/32] i386/sev: free existing launch update data and kernel hashes data on init Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 17:12   ` Paolo Bonzini
  2026-01-12 13:22 ` [PATCH v2 20/32] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors Ani Sinha
                   ` (12 subsequent siblings)
  31 siblings, 1 reply; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Zhao Liu; +Cc: Ani Sinha, kvm, qemu-devel

When the KVM VM file descriptor changes as part of the confidential guest
reset mechanism, it is necessary to create a new confidential guest context and
re-encrypt the VM memory. This happens for SEV-ES and SEV-SNP virtual machines
as part of the SEV_LAUNCH_FINISH and SEV_SNP_LAUNCH_FINISH operations.

A new resettable interface has been added for the SEV module. A new reset
callback for the reset 'exit' phase has been implemented to perform the above
operations when the VM file descriptor has changed during VM reset.

A tracepoint has also been added for tracing purposes.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
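The reset path depends on remembering the loader context seen at first
boot and replaying it later. A minimal sketch of that save-and-replay
idea, with hypothetical names (LoaderCtx, add_loader_hashes() and
handle_reset() are illustrative only). The copy is shallow, so any
pointers inside the context must stay valid until the replay, and the
sketch skips the copy when the saved context itself is passed back in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel loader context. */
typedef struct {
    const char *cmdline;
    unsigned long kernel_size;
} LoaderCtx;

static LoaderCtx saved_ctx;
static unsigned hashes_built;

static void add_loader_hashes(const LoaderCtx *ctx)
{
    assert(ctx != NULL);
    /* Remember the inputs for a later reset (shallow copy). */
    if (ctx != &saved_ctx) {
        saved_ctx = *ctx;
    }
    hashes_built++;   /* stands in for building the hash table */
}

/* On reset, replay the build with the remembered inputs. */
static void handle_reset(void)
{
    add_loader_hashes(&saved_ctx);
}
```
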
 target/i386/sev.c        | 52 ++++++++++++++++++++++++++++++++++++++++
 target/i386/trace-events |  1 +
 2 files changed, 53 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index d7425dde96..d45356843c 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -30,8 +30,10 @@
 #include "system/kvm.h"
 #include "kvm/kvm_i386.h"
 #include "sev.h"
+#include "system/cpus.h"
 #include "system/system.h"
 #include "system/runstate.h"
+#include "system/reset.h"
 #include "trace.h"
 #include "migration/blocker.h"
 #include "qom/object.h"
@@ -85,6 +87,10 @@ typedef struct QEMU_PACKED PaddedSevHashTable {
     uint8_t padding[ROUND_UP(sizeof(SevHashTable), 16) - sizeof(SevHashTable)];
 } PaddedSevHashTable;
 
+static void sev_handle_reset(Object *obj, ResetType type);
+
+SevKernelLoaderContext sev_load_ctx = {};
+
 QEMU_BUILD_BUG_ON(sizeof(PaddedSevHashTable) % 16 != 0);
 
 #define SEV_INFO_BLOCK_GUID     "00f771de-1a7e-4fcb-890e-68c77e2fb44e"
@@ -128,6 +134,7 @@ struct SevCommonState {
     uint8_t build_id;
     int sev_fd;
     SevState state;
+    ResettableState reset_state;
 
     QTAILQ_HEAD(, SevLaunchVmsa) launch_vmsa;
 };
@@ -1984,6 +1991,38 @@ static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
     return 0;
 }
 
+/*
+ * handle sev vm reset
+ */
+static void sev_handle_reset(Object *obj, ResetType type)
+{
+    SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
+    SevCommonStateClass *klass = SEV_COMMON_GET_CLASS(sev_common);
+
+    if (!sev_common) {
+        return;
+    }
+
+    if (!runstate_is_running()) {
+        return;
+    }
+
+    sev_add_kernel_loader_hashes(&sev_load_ctx, &error_fatal);
+    if (!sev_check_state(sev_common, SEV_STATE_RUNNING)) {
+        /* this calls sev_snp_launch_finish() etc */
+        klass->launch_finish(sev_common);
+    }
+
+    trace_sev_handle_reset();
+    return;
+}
+
+static ResettableState *sev_reset_state(Object *obj)
+{
+    SevCommonState *sev_common = SEV_COMMON(obj);
+    return &sev_common->reset_state;
+}
+
 int
 sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp)
 {
@@ -2462,6 +2501,8 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
         return false;
     }
 
+    /* save the context here so that it can be re-used when vm is reset */
+    sev_load_ctx = *ctx;
     return klass->build_kernel_loader_hashes(sev_common, area, ctx, errp);
 }
 
@@ -2722,8 +2763,16 @@ static void
 sev_common_class_init(ObjectClass *oc, const void *data)
 {
     ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
+    ResettableClass *rc = RESETTABLE_CLASS(oc);
 
     klass->kvm_init = sev_common_kvm_init;
+    /*
+     * the exit phase makes sure sev handles reset after all legacy resets
+     * have taken place (in the hold phase) and IGVM has also properly
+     * set up the boot state.
+     */
+    rc->phases.exit = sev_handle_reset;
+    rc->get_state = sev_reset_state;
 
     object_class_property_add_str(oc, "sev-device",
                                   sev_common_get_sev_device,
@@ -2758,6 +2807,8 @@ sev_common_instance_init(Object *obj)
     cgs->get_mem_map_entry = cgs_get_mem_map_entry;
     cgs->set_guest_policy = cgs_set_guest_policy;
 
+    qemu_register_resettable(OBJECT(sev_common));
+
     QTAILQ_INIT(&sev_common->launch_vmsa);
 
     /* add migration blocker */
@@ -2779,6 +2830,7 @@ static const TypeInfo sev_common_info = {
     .abstract = true,
     .interfaces = (const InterfaceInfo[]) {
         { TYPE_USER_CREATABLE },
+        { TYPE_RESETTABLE_INTERFACE },
         { }
     }
 };
diff --git a/target/i386/trace-events b/target/i386/trace-events
index 51301673f0..b320f655ee 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -14,3 +14,4 @@ kvm_sev_attestation_report(const char *mnonce, const char *data) "mnonce %s data
 kvm_sev_snp_launch_start(uint64_t policy, char *gosvw) "policy 0x%" PRIx64 " gosvw %s"
 kvm_sev_snp_launch_update(uint64_t src, uint64_t gpa, uint64_t len, const char *type) "src 0x%" PRIx64 " gpa 0x%" PRIx64 " len 0x%" PRIx64 " (%s page)"
 kvm_sev_snp_launch_finish(char *id_block, char *id_auth, char *host_data) "id_block %s id_auth %s host_data %s"
+sev_handle_reset(void) ""
-- 
2.42.0



* [PATCH v2 20/32] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (18 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 19/32] i386/sev: add support for confidential guest reset Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 21/32] kvm/i8254: refactor pit initialization into a helper Ani Sinha
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Alex Williamson, Cédric Le Goater; +Cc: Ani Sinha, qemu-devel

Normally the vfio pseudo device file descriptor lives for the life of the VM.
However, when the kvm VM file descriptor changes, a new file descriptor
for the pseudo device needs to be generated against the new kvm VM descriptor.
Other existing vfio descriptors need to be reattached to the new pseudo device
descriptor. This change performs the above steps.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
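The approach is to keep a list of every descriptor that was attached, so
that the same set can be attached again to a freshly created pseudo
device. A standalone sketch of that bookkeeping follows. FakeDevice and
device_attach() stand in for the kernel device and its ioctl, and all
names here are hypothetical rather than taken from hw/vfio.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ATTACHED 8

/* Stand-in for the kernel pseudo device. */
typedef struct {
    int attached[MAX_ATTACHED];
    int count;
} FakeDevice;

typedef struct TrackedFd {
    int fd;
    struct TrackedFd *next;
} TrackedFd;

static TrackedFd *tracked;

static int device_attach(FakeDevice *dev, int fd)
{
    if (dev->count >= MAX_ATTACHED) {
        return -1;
    }
    dev->attached[dev->count++] = fd;
    return 0;
}

/* Attach fd to the device and remember it for a later rebind. */
static int track_fd(FakeDevice *dev, int fd)
{
    TrackedFd *t;

    if (device_attach(dev, fd) < 0) {
        return -1;
    }
    t = malloc(sizeof(*t));
    assert(t != NULL);
    t->fd = fd;
    t->next = tracked;
    tracked = t;
    return 0;
}

/* Forget fd once it has been detached from the device. */
static void untrack_fd(int fd)
{
    for (TrackedFd **p = &tracked; *p; p = &(*p)->next) {
        if ((*p)->fd == fd) {
            TrackedFd *dead = *p;

            *p = dead->next;
            free(dead);
            return;
        }
    }
}

/* Attach every remembered fd to a newly created device. */
static int rebind_all(FakeDevice *fresh)
{
    int ret = 0;

    for (TrackedFd *t = tracked; t; t = t->next) {
        if (device_attach(fresh, t->fd) < 0) {
            ret = -1;
        }
    }
    return ret;
}
```
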
 hw/vfio/helpers.c | 81 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 78 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index c595f860ce..ab13cddfc2 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -110,12 +110,66 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
 #ifdef CONFIG_KVM
 /*
  * We have a single VFIO pseudo device per KVM VM.  Once created it lives
- * for the life of the VM.  Closing the file descriptor only drops our
- * reference to it and the device's reference to kvm.  Therefore once
- * initialized, this file descriptor is only released on QEMU exit and
+ * for the life of the VM except when the vm file descriptor changes for
+ * confidential virtual machines. In that case, the old file descriptor is
+ * closed and a new file descriptor is recreated.  Closing the file descriptor
+ * only drops our reference to it and the device's reference to kvm.
+ * Therefore once initialized, this file descriptor is normally only released
+ * on QEMU exit (except for confidential VMs as stated above) and
  * we'll re-use it should another vfio device be attached before then.
  */
 int vfio_kvm_device_fd = -1;
+
+typedef struct KVMVfioFileFd {
+    int fd;
+    QLIST_ENTRY(KVMVfioFileFd) node;
+} KVMVfioFileFd;
+
+static QLIST_HEAD(, KVMVfioFileFd) kvm_vfio_file_fds =
+    QLIST_HEAD_INITIALIZER(kvm_vfio_file_fds);
+
+static int kvm_vfio_filefd_rebind(NotifierWithReturn *notifier, void *data,
+                                  Error **errp);
+static struct NotifierWithReturn kvm_vfio_vmfd_change_notifier = {
+    .notify = kvm_vfio_filefd_rebind,
+};
+
+static int kvm_vfio_filefd_rebind(NotifierWithReturn *notifier, void *data,
+                                  Error **errp)
+{
+    KVMVfioFileFd *file_fd;
+    int ret = 0;
+    struct kvm_device_attr attr = {
+        .group = KVM_DEV_VFIO_FILE,
+        .attr = KVM_DEV_VFIO_FILE_ADD,
+    };
+    struct kvm_create_device cd = {
+        .type = KVM_DEV_TYPE_VFIO,
+    };
+
+    if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
+        error_setg_errno(errp, errno, "Failed to create KVM VFIO device");
+        return -errno;
+    }
+
+    if (vfio_kvm_device_fd >= 0) {
+        close(vfio_kvm_device_fd);
+    }
+
+    vfio_kvm_device_fd = cd.fd;
+
+    QLIST_FOREACH(file_fd, &kvm_vfio_file_fds, node) {
+        attr.addr = (uint64_t)(unsigned long)&file_fd->fd;
+        if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+            error_setg_errno(errp, errno,
+                             "Failed to add fd %d to KVM VFIO device",
+                             file_fd->fd);
+            ret = -errno;
+        }
+    }
+    return ret;
+}
+
 #endif
 
 void vfio_kvm_device_close(void)
@@ -137,6 +191,7 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
         .attr = KVM_DEV_VFIO_FILE_ADD,
         .addr = (uint64_t)(unsigned long)&fd,
     };
+    KVMVfioFileFd *file_fd;
 
     if (!kvm_enabled()) {
         return 0;
@@ -153,6 +208,11 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
         }
 
         vfio_kvm_device_fd = cd.fd;
+        /*
+         * If the vm file descriptor changes, add a notifier so that we can
+         * re-create the vfio_kvm_device_fd.
+         */
+        kvm_vmfd_add_change_notifier(&kvm_vfio_vmfd_change_notifier);
     }
 
     if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
@@ -160,6 +220,11 @@ int vfio_kvm_device_add_fd(int fd, Error **errp)
                          fd);
         return -errno;
     }
+
+    file_fd = g_malloc0(sizeof(*file_fd));
+    file_fd->fd = fd;
+    QLIST_INSERT_HEAD(&kvm_vfio_file_fds, file_fd, node);
+
 #endif
     return 0;
 }
@@ -172,6 +237,7 @@ int vfio_kvm_device_del_fd(int fd, Error **errp)
         .attr = KVM_DEV_VFIO_FILE_DEL,
         .addr = (uint64_t)(unsigned long)&fd,
     };
+    KVMVfioFileFd *file_fd;
 
     if (vfio_kvm_device_fd < 0) {
         error_setg(errp, "KVM VFIO device isn't created yet");
@@ -183,6 +249,15 @@ int vfio_kvm_device_del_fd(int fd, Error **errp)
                          "Failed to remove fd %d from KVM VFIO device", fd);
         return -errno;
     }
+
+    QLIST_FOREACH(file_fd, &kvm_vfio_file_fds, node) {
+        if (file_fd->fd == fd) {
+            QLIST_REMOVE(file_fd, node);
+            g_free(file_fd);
+            break;
+        }
+    }
+
 #endif
     return 0;
 }
-- 
2.42.0




* [PATCH v2 21/32] kvm/i8254: refactor pit initialization into a helper
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (19 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 20/32] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 22/32] kvm/i8254: add support for confidential guest reset Ani Sinha
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S. Tsirkin, Marcel Apfelbaum
  Cc: Ani Sinha, qemu-devel

The initialization code will be used again by the VM file descriptor change
notifier callback in a subsequent change. So refactor the common code into a
new helper function.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 hw/i386/kvm/i8254.c | 68 +++++++++++++++++++++++++--------------------
 1 file changed, 38 insertions(+), 30 deletions(-)

diff --git a/hw/i386/kvm/i8254.c b/hw/i386/kvm/i8254.c
index 81e742f866..255047458a 100644
--- a/hw/i386/kvm/i8254.c
+++ b/hw/i386/kvm/i8254.c
@@ -60,6 +60,43 @@ struct KVMPITClass {
     DeviceRealize parent_realize;
 };
 
+static void do_pit_initialize(KVMPITState *s, Error **errp)
+{
+    struct kvm_pit_config config = {
+        .flags = 0,
+    };
+    int ret;
+
+    ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_PIT2, &config);
+    if (ret < 0) {
+        error_setg(errp, "Create kernel PIC irqchip failed: %s",
+                   strerror(-ret));
+        return;
+    }
+    switch (s->lost_tick_policy) {
+    case LOST_TICK_POLICY_DELAY:
+        break; /* enabled by default */
+    case LOST_TICK_POLICY_DISCARD:
+        if (kvm_check_extension(kvm_state, KVM_CAP_REINJECT_CONTROL)) {
+            struct kvm_reinject_control control = { .pit_reinject = 0 };
+
+            ret = kvm_vm_ioctl(kvm_state, KVM_REINJECT_CONTROL, &control);
+            if (ret < 0) {
+                error_setg(errp,
+                           "Can't disable in-kernel PIT reinjection: %s",
+                           strerror(-ret));
+                return;
+            }
+        }
+        break;
+    default:
+        error_setg(errp, "Lost tick policy not supported.");
+        return;
+    }
+
+    return;
+}
+
 static void kvm_pit_update_clock_offset(KVMPITState *s)
 {
     int64_t offset, clock_offset;
@@ -241,42 +278,13 @@ static void kvm_pit_realizefn(DeviceState *dev, Error **errp)
     PITCommonState *pit = PIT_COMMON(dev);
     KVMPITClass *kpc = KVM_PIT_GET_CLASS(dev);
     KVMPITState *s = KVM_PIT(pit);
-    struct kvm_pit_config config = {
-        .flags = 0,
-    };
-    int ret;
 
     if (!kvm_check_extension(kvm_state, KVM_CAP_PIT_STATE2) ||
         !kvm_check_extension(kvm_state, KVM_CAP_PIT2)) {
         error_setg(errp, "In-kernel PIT not available");
     }
 
-    ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_PIT2, &config);
-    if (ret < 0) {
-        error_setg(errp, "Create kernel PIC irqchip failed: %s",
-                   strerror(-ret));
-        return;
-    }
-    switch (s->lost_tick_policy) {
-    case LOST_TICK_POLICY_DELAY:
-        break; /* enabled by default */
-    case LOST_TICK_POLICY_DISCARD:
-        if (kvm_check_extension(kvm_state, KVM_CAP_REINJECT_CONTROL)) {
-            struct kvm_reinject_control control = { .pit_reinject = 0 };
-
-            ret = kvm_vm_ioctl(kvm_state, KVM_REINJECT_CONTROL, &control);
-            if (ret < 0) {
-                error_setg(errp,
-                           "Can't disable in-kernel PIT reinjection: %s",
-                           strerror(-ret));
-                return;
-            }
-        }
-        break;
-    default:
-        error_setg(errp, "Lost tick policy not supported.");
-        return;
-    }
+    do_pit_initialize(s, errp);
 
     memory_region_init_io(&pit->ioports, OBJECT(dev), NULL, NULL, "kvm-pit", 4);
 
-- 
2.42.0




* [PATCH v2 22/32] kvm/i8254: add support for confidential guest reset
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (20 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 21/32] kvm/i8254: refactor pit initialization into a helper Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 23/32] hw/hyperv/vmbus: " Ani Sinha
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost
  Cc: Ani Sinha, qemu-devel

A confidential guest reset involves closing the old virtual machine KVM file
descriptor and opening a new one. Since it is a new KVM fd, the PIT needs to be
reinitialized. This is done with the help of a notifier which is invoked upon
KVM VM file descriptor change during the confidential guest reset process.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
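Not part of the patch: a minimal standalone sketch of the pattern described
above, where a device embeds its notifier and the callback recovers the device
state before re-running the initialization helper. All Demo*/demo_* names are
hypothetical stand-ins, not QEMU types. Unlike the handler in this patch, the
sketch returns the helper's status; that is an illustrative choice only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a notifier with a return value. */
typedef struct DemoNotifier DemoNotifier;
struct DemoNotifier {
    int (*notify)(DemoNotifier *n, void *data);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical device state that embeds its own notifier. */
typedef struct DemoPit {
    int init_count;            /* number of (re)initializations so far */
    DemoNotifier vmfd_change;  /* invoked after the VM fd has changed */
} DemoPit;

/* Shared initialization helper: 0 on success, negative on failure. */
static int demo_pit_initialize(DemoPit *s)
{
    s->init_count++;
    return 0;
}

/* Callback: find the owning device and initialize it again. */
static int demo_pit_post_vmfd_change(DemoNotifier *n, void *data)
{
    DemoPit *s;

    assert(n != NULL);
    (void)data;
    s = demo_container_of(n, DemoPit, vmfd_change);
    return demo_pit_initialize(s);
}

/* Realize-time setup: hook up the callback, then initialize once. */
static int demo_pit_realize(DemoPit *s)
{
    s->init_count = 0;
    s->vmfd_change.notify = demo_pit_post_vmfd_change;
    return demo_pit_initialize(s);
}
```

After demo_pit_realize(), each call through vmfd_change.notify increments
init_count by one, mirroring one re-initialization per VM fd change.
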
 hw/i386/kvm/i8254.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/hw/i386/kvm/i8254.c b/hw/i386/kvm/i8254.c
index 255047458a..4d212fad1b 100644
--- a/hw/i386/kvm/i8254.c
+++ b/hw/i386/kvm/i8254.c
@@ -52,6 +52,8 @@ struct KVMPITState {
     LostTickPolicy lost_tick_policy;
     bool vm_stopped;
     int64_t kernel_clock_offset;
+
+    NotifierWithReturn kvmpit_vmfd_change_notifier;
 };
 
 struct KVMPITClass {
@@ -203,6 +205,16 @@ static void kvm_pit_put(PITCommonState *pit)
     }
 }
 
+static int kvmpit_post_vmfd_change(NotifierWithReturn *notifier,
+                                   void *data, Error **errp)
+{
+    KVMPITState *s = container_of(notifier, KVMPITState,
+                                  kvmpit_vmfd_change_notifier);
+
+    do_pit_initialize(s, errp);
+    return 0;
+}
+
 static void kvm_pit_set_gate(PITCommonState *s, PITChannelState *sc, int val)
 {
     kvm_pit_get(s);
@@ -292,6 +304,9 @@ static void kvm_pit_realizefn(DeviceState *dev, Error **errp)
 
     qemu_add_vm_change_state_handler(kvm_pit_vm_state_change, s);
 
+    s->kvmpit_vmfd_change_notifier.notify = kvmpit_post_vmfd_change;
+    kvm_vmfd_add_change_notifier(&s->kvmpit_vmfd_change_notifier);
+
     kpc->parent_realize(dev, errp);
 }
 
-- 
2.42.0




* [PATCH v2 23/32] hw/hyperv/vmbus: add support for confidential guest reset
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (21 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 22/32] kvm/i8254: add support for confidential guest reset Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-14 13:38   ` Maciej S. Szmigiero
  2026-01-12 13:22 ` [PATCH v2 24/32] accel/kvm: add a per-confidential class callback to unlock guest state Ani Sinha
                   ` (8 subsequent siblings)
  31 siblings, 1 reply; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Maciej S. Szmigiero; +Cc: Ani Sinha, qemu-devel

On confidential guests, when the KVM virtual machine file descriptor changes as
a part of the reset process, event file descriptors need to be reassociated
with the new KVM VM file descriptor. This is achieved with the help of a
callback handler that gets called when KVM VM file descriptor changes during
the confidential guest reset process.

This patch is untested on confidential guests and only exists for completeness.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
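Not part of the patch: a small sketch of how a tolerated -EEXIST could be
mapped to success before the value is handed back to the notifier caller.
It assumes that the caller treats any nonzero return as a failure, which is
how notifier lists with return values commonly behave. demo_rebind_result is
a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: translate the result of a "set event handler" call
 * into the value a change notifier returns. -EEXIST is tolerated because a
 * userland handler may already be registered; anything else that is
 * nonzero is reported as a failure.
 */
static int demo_rebind_result(int ret)
{
    assert(ret <= 0);   /* convention: 0 or a negative errno value */
    if (ret == -EEXIST) {
        return 0;
    }
    return ret;
}
```

With this mapping, 0 and -EEXIST both yield 0, while for example -EINVAL is
passed through unchanged.
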
 hw/hyperv/vmbus.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/hw/hyperv/vmbus.c b/hw/hyperv/vmbus.c
index c5bab5d245..947fb7f4f8 100644
--- a/hw/hyperv/vmbus.c
+++ b/hw/hyperv/vmbus.c
@@ -20,6 +20,7 @@
 #include "hw/hyperv/vmbus-bridge.h"
 #include "hw/core/sysbus.h"
 #include "exec/cpu-common.h"
+#include "system/kvm.h"
 #include "exec/target_page.h"
 #include "trace.h"
 
@@ -248,6 +249,12 @@ struct VMBus {
      * interrupt page
      */
     EventNotifier notifier;
+
+    /*
+     * Notifier to inform when vmfd is changed as a part of confidential guest
+     * reset mechanism.
+     */
+    NotifierWithReturn vmbus_vmfd_change_notifier;
 };
 
 static bool gpadl_full(VMBusGpadl *gpadl)
@@ -2347,6 +2354,26 @@ static void vmbus_dev_unrealize(DeviceState *dev)
     free_channels(vdev);
 }
 
+/*
+ * If the KVM fd changes because of VM reset in confidential guests,
+ * reassociate event fd with the new KVM fd.
+ */
+static int vmbus_handle_vmfd_change(NotifierWithReturn *notifier,
+                                    void *data, Error **errp)
+{
+    VMBus *vmbus = container_of(notifier, VMBus,
+                                vmbus_vmfd_change_notifier);
+    int ret = 0;
+    ret = hyperv_set_event_flag_handler(VMBUS_EVENT_CONNECTION_ID,
+                                            &vmbus->notifier);
+    /* if we are only using userland event handler, it may already exist */
+    if (ret != 0 && ret != -EEXIST) {
+        error_setg(errp, "hyperv set event handler failed with %d", ret);
+    }
+
+    return ret;
+}
+
 static const Property vmbus_dev_props[] = {
     DEFINE_PROP_UUID("instanceid", VMBusDevice, instanceid),
 };
@@ -2429,6 +2456,9 @@ static void vmbus_realize(BusState *bus, Error **errp)
         goto clear_event_notifier;
     }
 
+    vmbus->vmbus_vmfd_change_notifier.notify = vmbus_handle_vmfd_change;
+    kvm_vmfd_add_change_notifier(&vmbus->vmbus_vmfd_change_notifier);
+
     return;
 
 clear_event_notifier:
-- 
2.42.0




* [PATCH v2 24/32] accel/kvm: add a per-confidential class callback to unlock guest state
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (22 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 23/32] hw/hyperv/vmbus: " Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 17:11   ` Paolo Bonzini
  2026-01-12 13:22 ` [PATCH v2 25/32] kvm/xen-emu: re-initialize capabilities during confidential guest reset Ani Sinha
                   ` (7 subsequent siblings)
  31 siblings, 1 reply; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Zhao Liu; +Cc: Ani Sinha, qemu-devel, kvm

As a part of the confidential guest reset process, the existing encrypted guest
state must be made mutable since it would be discarded after reset. A new
encrypted and locked guest state must be established after the reset. To this
end, a new callback per confidential guest support class (e.g. tdx or sev-snp)
is added that indicates whether it is possible to rebuild the guest state:

bool (*can_rebuild_guest_state)(ConfidentialGuestSupport *cgs)

This API returns true if rebuilding the guest state is possible and
false otherwise. A KVM based confidential guest reset is only possible when
the existing state is locked but it is possible to rebuild the guest state.
Otherwise, the guest is not resettable.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
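Not part of the patch: a standalone sketch of the decision described above,
using hypothetical Demo*/demo_* names in place of the QEMU types. It mirrors
the logic of the diff below: a missing callback means "cannot rebuild", a
NULL support object stands for a non-confidential guest, and a reset request
is honoured when the CPUs are resettable or the state can be rebuilt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a confidential guest support object. */
typedef struct DemoCgs DemoCgs;
struct DemoCgs {
    /* Optional per-class callback; NULL when the class does not set it. */
    bool (*can_rebuild_guest_state)(DemoCgs *cgs);
};

static bool demo_can_rebuild_state(DemoCgs *cgs)
{
    if (!cgs) {
        return true;    /* non-confidential guest */
    }
    if (cgs->can_rebuild_guest_state) {
        return cgs->can_rebuild_guest_state(cgs);
    }
    return false;       /* default: state cannot be rebuilt */
}

/* true: the reset proceeds; false: the request ends in termination. */
static bool demo_reset_allowed(bool cpus_resettable, DemoCgs *cgs)
{
    return cpus_resettable || demo_can_rebuild_state(cgs);
}

/* Example callback for a class that can always rebuild its state. */
static bool demo_always_true(DemoCgs *cgs)
{
    (void)cgs;
    return true;
}
```

Note that with this logic a NULL support object yields "allowed" even when
the CPUs are reported as not resettable.
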
 include/system/confidential-guest-support.h | 27 +++++++++++++++++++++
 system/runstate.c                           | 11 +++++++--
 target/i386/kvm/tdx.c                       |  6 +++++
 target/i386/sev.c                           |  9 +++++++
 4 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/include/system/confidential-guest-support.h b/include/system/confidential-guest-support.h
index 0cc8b26e64..3c37227263 100644
--- a/include/system/confidential-guest-support.h
+++ b/include/system/confidential-guest-support.h
@@ -152,6 +152,11 @@ typedef struct ConfidentialGuestSupportClass {
      */
     int (*get_mem_map_entry)(int index, ConfidentialGuestMemoryMapEntry *entry,
                              Error **errp);
+
+    /*
+     * is it possible to rebuild the guest state?
+     */
+    bool (*can_rebuild_guest_state)(ConfidentialGuestSupport *cgs);
 } ConfidentialGuestSupportClass;
 
 static inline int confidential_guest_kvm_init(ConfidentialGuestSupport *cgs,
@@ -167,6 +172,28 @@ static inline int confidential_guest_kvm_init(ConfidentialGuestSupport *cgs,
     return 0;
 }
 
+static inline bool
+confidential_guest_can_rebuild_state(ConfidentialGuestSupport *cgs)
+{
+    ConfidentialGuestSupportClass *klass;
+
+    if (!cgs) {
+        /* non-confidential guests */
+        return true;
+    }
+
+    klass = CONFIDENTIAL_GUEST_SUPPORT_GET_CLASS(cgs);
+    if (klass->can_rebuild_guest_state) {
+        return klass->can_rebuild_guest_state(cgs);
+    }
+
+    /*
+     * by default, we should not be able to unprotect the
+     * confidential guest state
+     */
+    return false;
+}
+
 static inline int confidential_guest_kvm_reset(ConfidentialGuestSupport *cgs,
                                                Error **errp)
 {
diff --git a/system/runstate.c b/system/runstate.c
index b0ce0410fa..710f5882d9 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -58,6 +58,7 @@
 #include "system/reset.h"
 #include "system/runstate.h"
 #include "system/runstate-action.h"
+#include "system/confidential-guest-support.h"
 #include "system/system.h"
 #include "system/tpm.h"
 #include "trace.h"
@@ -564,7 +565,12 @@ void qemu_system_reset(ShutdownCause reason)
     if (cpus_are_resettable()) {
         cpu_synchronize_all_post_reset();
     } else {
-        assert(runstate_check(RUN_STATE_PRELAUNCH));
+        /*
+         * for confidential guests, cpus are not resettable but their
+         * state can be rebuilt under some conditions.
+         */
+        assert(runstate_check(RUN_STATE_PRELAUNCH) ||
+               (current_machine->cgs && runstate_is_running()));
     }
 
     vm_set_suspended(false);
@@ -713,7 +719,8 @@ void qemu_system_reset_request(ShutdownCause reason)
     if (reboot_action == REBOOT_ACTION_SHUTDOWN &&
         reason != SHUTDOWN_CAUSE_SUBSYSTEM_RESET) {
         shutdown_requested = reason;
-    } else if (!cpus_are_resettable()) {
+    } else if (!cpus_are_resettable() &&
+               !confidential_guest_can_rebuild_state(current_machine->cgs)) {
         error_report("cpus are not resettable, terminating");
         shutdown_requested = reason;
     } else {
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 314d316b7c..a89b14d401 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -1589,6 +1589,11 @@ static ResettableState *tdx_reset_state(Object *obj)
     return &tdx->reset_state;
 }
 
+static bool tdx_can_rebuild_guest_state(ConfidentialGuestSupport *cgs)
+{
+    return true;
+}
+
 static void tdx_guest_class_init(ObjectClass *oc, const void *data)
 {
     ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
@@ -1596,6 +1601,7 @@ static void tdx_guest_class_init(ObjectClass *oc, const void *data)
     ResettableClass *rc = RESETTABLE_CLASS(oc);
 
     klass->kvm_init = tdx_kvm_init;
+    klass->can_rebuild_guest_state = tdx_can_rebuild_guest_state;
     x86_klass->kvm_type = tdx_kvm_type;
     x86_klass->cpu_instance_init = tdx_cpu_instance_init;
     x86_klass->adjust_cpuid_features = tdx_adjust_cpuid_features;
diff --git a/target/i386/sev.c b/target/i386/sev.c
index d45356843c..c52027c935 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -2632,6 +2632,14 @@ static int cgs_set_guest_state(hwaddr gpa, uint8_t *ptr, uint64_t len,
     return -1;
 }
 
+static bool sev_can_rebuild_guest_state(ConfidentialGuestSupport *cgs)
+{
+    if (!sev_snp_enabled() && !sev_es_enabled()) {
+        return false;
+    }
+    return true;
+}
+
 static int cgs_get_mem_map_entry(int index,
                                  ConfidentialGuestMemoryMapEntry *entry,
                                  Error **errp)
@@ -2806,6 +2814,7 @@ sev_common_instance_init(Object *obj)
     cgs->set_guest_state = cgs_set_guest_state;
     cgs->get_mem_map_entry = cgs_get_mem_map_entry;
     cgs->set_guest_policy = cgs_set_guest_policy;
+    cgs->can_rebuild_guest_state = sev_can_rebuild_guest_state;
 
     qemu_register_resettable(OBJECT(sev_common));
 
-- 
2.42.0



* [PATCH v2 25/32] kvm/xen-emu: re-initialize capabilities during confidential guest reset
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (23 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 24/32] accel/kvm: add a per-confidential class callback to unlock guest state Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 17:19   ` Paolo Bonzini
  2026-01-12 13:22 ` [PATCH v2 26/32] kvm/xen_evtchn: add support for " Ani Sinha
                   ` (6 subsequent siblings)
  31 siblings, 1 reply; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: David Woodhouse, Paul Durrant, Paolo Bonzini, Marcelo Tosatti
  Cc: Ani Sinha, kvm, qemu-devel

On confidential guests, the KVM virtual machine file descriptor changes as a
part of the guest reset process. Xen capabilities need to be re-initialized in
KVM against the new file descriptor.

This patch is untested on confidential guests and exists only for completeness.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 target/i386/kvm/xen-emu.c | 45 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 52de019834..4f4cde7c58 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -44,9 +44,12 @@
 
 #include "xen-compat.h"
 
+NotifierWithReturn xen_vmfd_change_notifier;
+static bool hyperv_enabled;
 static void xen_vcpu_singleshot_timer_event(void *opaque);
 static void xen_vcpu_periodic_timer_event(void *opaque);
 static int vcpuop_stop_singleshot_timer(CPUState *cs);
+static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr);
 
 #ifdef TARGET_X86_64
 #define hypercall_compat32(longmode) (!(longmode))
@@ -54,6 +57,25 @@ static int vcpuop_stop_singleshot_timer(CPUState *cs);
 #define hypercall_compat32(longmode) (false)
 #endif
 
+static int xen_handle_vmfd_change(NotifierWithReturn *n,
+                                  void *data, Error **errp)
+{
+    int ret;
+
+    ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (hyperv_enabled) {
+        ret = do_initialize_xen_caps(kvm_state, XEN_HYPERCALL_MSR_HYPERV);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+    return 0;
+}
+
 static bool kvm_gva_to_gpa(CPUState *cs, uint64_t gva, uint64_t *gpa,
                            size_t *len, bool is_write)
 {
@@ -111,15 +133,16 @@ static inline int kvm_copy_to_gva(CPUState *cs, uint64_t gva, void *buf,
     return kvm_gva_rw(cs, gva, buf, sz, true);
 }
 
-int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
+static int do_initialize_xen_caps(KVMState *s, uint32_t hypercall_msr)
 {
+    int xen_caps, ret;
     const int required_caps = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
         KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL | KVM_XEN_HVM_CONFIG_SHARED_INFO;
+
     struct kvm_xen_hvm_config cfg = {
         .msr = hypercall_msr,
         .flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL,
     };
-    int xen_caps, ret;
 
     xen_caps = kvm_check_extension(s, KVM_CAP_XEN_HVM);
     if (required_caps & ~xen_caps) {
@@ -143,6 +166,21 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
                      strerror(-ret));
         return ret;
     }
+    return xen_caps;
+}
+
+int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
+{
+    int xen_caps;
+
+    xen_caps = do_initialize_xen_caps(s, hypercall_msr);
+    if (xen_caps < 0) {
+        return xen_caps;
+    }
+
+    if (!hyperv_enabled && (hypercall_msr == XEN_HYPERCALL_MSR_HYPERV)) {
+        hyperv_enabled = true;
+    }
 
     /* If called a second time, don't repeat the rest of the setup. */
     if (s->xen_caps) {
@@ -185,6 +223,9 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
     xen_primary_console_reset();
     xen_xenstore_reset();
 
+    xen_vmfd_change_notifier.notify = xen_handle_vmfd_change;
+    kvm_vmfd_add_change_notifier(&xen_vmfd_change_notifier);
+
     return 0;
 }
 
-- 
2.42.0



* [PATCH v2 26/32] kvm/xen_evtchn: add support for confidential guest reset
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (24 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 25/32] kvm/xen-emu: re-initialize capabilities during confidential guest reset Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 27/32] ppc/openpic: create a new openpic device and reattach mem region on coco reset Ani Sinha
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: David Woodhouse, Paul Durrant, Michael S. Tsirkin,
	Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost
  Cc: Ani Sinha, qemu-devel

As a part of the confidential guest reset, when the KVM VM file descriptor is
changed, Xen event ports and kernel ports that were associated with the
previous KVM file descriptor need to be reassociated with the new one. This is
performed with the help of a callback handler that gets invoked during the
confidential guest reset process when the KVM VM file descriptor changes.

This patch is untested on confidential guests and exists only for completeness.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
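Not part of the patch: a standalone sketch of the record-and-replay idea
described above. Each successful binding is recorded on a list, records are
dropped when a port is unbound, and the list is replayed against the new VM
after the fd change. It uses a plain singly linked list and hypothetical
Demo*/demo_* names rather than QLIST and the KVM ioctls. In the sketch the
list owns every record until it is removed, at which point it is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record of one port-to-eventfd binding. */
typedef struct DemoBinding {
    unsigned port;
    int fd;
    struct DemoBinding *next;
} DemoBinding;

static DemoBinding *demo_bindings;

/* Record a binding once it was installed; the list owns the node. */
static int demo_record(unsigned port, int fd)
{
    DemoBinding *b = calloc(1, sizeof(*b));

    if (!b) {
        return -1;
    }
    b->port = port;
    b->fd = fd;
    b->next = demo_bindings;
    demo_bindings = b;
    return 0;
}

/* Unlink and free every record for @port; returns how many were removed. */
static int demo_forget(unsigned port)
{
    DemoBinding **link = &demo_bindings;
    int removed = 0;

    while (*link) {
        DemoBinding *b = *link;

        if (b->port == port) {
            *link = b->next;    /* unlink before freeing */
            free(b);
            removed++;
        } else {
            link = &b->next;
        }
    }
    return removed;
}

/* Re-install every recorded binding; stop at the first failure. */
static int demo_replay(int (*install)(unsigned port, int fd))
{
    for (DemoBinding *b = demo_bindings; b; b = b->next) {
        int ret = install(b->port, b->fd);

        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* Example install callback that only counts how often it was called. */
static int demo_installed;

static int demo_count_install(unsigned port, int fd)
{
    (void)port;
    (void)fd;
    demo_installed++;
    return 0;
}
```

Recording two ports and replaying calls the install callback twice; after
forgetting one port, a second replay calls it once.
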
 hw/i386/kvm/xen_evtchn.c | 100 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 97 insertions(+), 3 deletions(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index b65871f354..18d6aa9cd5 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -133,6 +133,26 @@ struct pirq_info {
     bool is_translated;
 };
 
+struct eventfds {
+    uint16_t type;
+    evtchn_port_t port;
+    int fd;
+    QLIST_ENTRY(eventfds) node;
+};
+
+struct kernel_ports {
+    uint16_t type;
+    evtchn_port_t port;
+    uint32_t vcpu_id;
+    QLIST_ENTRY(kernel_ports) node;
+};
+
+static QLIST_HEAD(, eventfds) eventfd_list =
+    QLIST_HEAD_INITIALIZER(eventfd_list);
+
+static QLIST_HEAD(, kernel_ports) kernel_port_list =
+    QLIST_HEAD_INITIALIZER(kernel_port_list);
+
 struct XenEvtchnState {
     /*< private >*/
     SysBusDevice busdev;
@@ -178,6 +198,7 @@ struct XenEvtchnState {
 #define pirq_inuse(s, pirq) (pirq_inuse_word(s, pirq) & pirq_inuse_bit(pirq))
 
 struct XenEvtchnState *xen_evtchn_singleton;
+static NotifierWithReturn xen_eventchn_notifier;
 
 /* Top bits of callback_param are the type (HVM_PARAM_CALLBACK_TYPE_xxx) */
 #define CALLBACK_VIA_TYPE_SHIFT 56
@@ -304,6 +325,52 @@ static void gsi_assert_bh(void *opaque)
     }
 }
 
+static int xen_eventchn_handle_vmfd_change(NotifierWithReturn *notifier,
+                                           void *data, Error **errp)
+{
+    struct eventfds *ef;
+    struct kernel_ports *kp;
+    struct kvm_xen_hvm_attr ha;
+    CPUState *cpu;
+    int ret;
+
+    QLIST_FOREACH(ef, &eventfd_list, node) {
+        ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
+        ha.u.evtchn.send_port = ef->port;
+        ha.u.evtchn.type = ef->type;
+        ha.u.evtchn.flags = 0;
+        ha.u.evtchn.deliver.eventfd.port = 0;
+        ha.u.evtchn.deliver.eventfd.fd = ef->fd;
+
+        ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+        if (ret < 0) {
+            error_setg(errp, "KVM_XEN_HVM_SET_ATTR failed with %d", ret);
+            return ret;
+        }
+    }
+
+    memset(&ha, 0, sizeof(ha));
+
+    QLIST_FOREACH(kp, &kernel_port_list, node) {
+        cpu = qemu_get_cpu(kp->vcpu_id);
+        ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
+        ha.u.evtchn.send_port = kp->port;
+        ha.u.evtchn.type = kp->type;
+        ha.u.evtchn.flags = 0;
+        ha.u.evtchn.deliver.port.port = kp->port;
+        ha.u.evtchn.deliver.port.vcpu = kvm_arch_vcpu_id(cpu);
+        ha.u.evtchn.deliver.port.priority =
+            KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL;
+
+        ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+        if (ret < 0) {
+            error_setg(errp, "KVM_XEN_HVM_SET_ATTR failed with %d", ret);
+            return ret;
+        }
+    }
+    return 0;
+}
+
 void xen_evtchn_create(unsigned int nr_gsis, qemu_irq *system_gsis)
 {
     XenEvtchnState *s = XEN_EVTCHN(sysbus_create_simple(TYPE_XEN_EVTCHN,
@@ -350,6 +417,9 @@ void xen_evtchn_create(unsigned int nr_gsis, qemu_irq *system_gsis)
 
     /* Set event channel functions for backend drivers to use */
     xen_evtchn_ops = &emu_evtchn_backend_ops;
+
+    xen_eventchn_notifier.notify = xen_eventchn_handle_vmfd_change;
+    kvm_vmfd_add_change_notifier(&xen_eventchn_notifier);
 }
 
 static void xen_evtchn_register_types(void)
@@ -547,6 +617,7 @@ static void inject_callback(XenEvtchnState *s, uint32_t vcpu)
 static void deassign_kernel_port(evtchn_port_t port)
 {
     struct kvm_xen_hvm_attr ha;
+    struct kernel_ports *kp;
     int ret;
 
     ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
@@ -557,6 +628,12 @@ static void deassign_kernel_port(evtchn_port_t port)
     if (ret) {
         qemu_log_mask(LOG_GUEST_ERROR, "Failed to unbind kernel port %d: %s\n",
                       port, strerror(ret));
+    } else {
+        QLIST_FOREACH(kp, &kernel_port_list, node) {
+            if (kp->port == port) {
+                QLIST_REMOVE(kp, node);
+            }
+        }
     }
 }
 
@@ -565,6 +642,8 @@ static int assign_kernel_port(uint16_t type, evtchn_port_t port,
 {
     CPUState *cpu = qemu_get_cpu(vcpu_id);
     struct kvm_xen_hvm_attr ha;
+    g_autofree struct kernel_ports *kp = g_malloc0(sizeof(*kp));
+    int ret;
 
     if (!cpu) {
         return -ENOENT;
@@ -578,12 +657,21 @@ static int assign_kernel_port(uint16_t type, evtchn_port_t port,
     ha.u.evtchn.deliver.port.vcpu = kvm_arch_vcpu_id(cpu);
     ha.u.evtchn.deliver.port.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL;
 
-    return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+    ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+    if (ret == 0) {
+        kp->type = type;
+        kp->port = port;
+        kp->vcpu_id = vcpu_id;
+        QLIST_INSERT_HEAD(&kernel_port_list, kp, node);
+    }
+    return ret;
 }
 
 static int assign_kernel_eventfd(uint16_t type, evtchn_port_t port, int fd)
 {
     struct kvm_xen_hvm_attr ha;
+    g_autofree struct eventfds *ef = g_malloc0(sizeof(*ef));
+    int ret;
 
     ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
     ha.u.evtchn.send_port = port;
@@ -592,7 +680,14 @@ static int assign_kernel_eventfd(uint16_t type, evtchn_port_t port, int fd)
     ha.u.evtchn.deliver.eventfd.port = 0;
     ha.u.evtchn.deliver.eventfd.fd = fd;
 
-    return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+    ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+    if (ret == 0) {
+        ef->type = type;
+        ef->port = port;
+        ef->fd = fd;
+        QLIST_INSERT_HEAD(&eventfd_list, ef, node);
+    }
+    return ret;
 }
 
 static bool valid_port(evtchn_port_t port)
@@ -2391,4 +2486,3 @@ void hmp_xen_event_inject(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, "Delivered port %d\n", port);
     }
 }
-
-- 
2.42.0




* [PATCH v2 27/32] ppc/openpic: create a new openpic device and reattach mem region on coco reset
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (25 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 26/32] kvm/xen_evtchn: add support for " Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-13 14:13   ` Bernhard Beschow
  2026-01-12 13:22 ` [PATCH v2 28/32] kvm/vcpu: add notifiers to inform vcpu file descriptor change Ani Sinha
                   ` (4 subsequent siblings)
  31 siblings, 1 reply; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Bernhard Beschow; +Cc: Ani Sinha, qemu-ppc, qemu-devel

For confidential guests, during the reset process the old KVM VM file
descriptor is closed and a new one is created. When a new file descriptor is
created, a new openpic device needs to be created against this new KVM VM file
descriptor as well. Additionally, the existing memory region needs to be
reattached to this new openpic device and the proper CPU attributes set to
associate it with the new file descriptor. This change makes this happen with
the help of a callback handler
that gets called when the KVM VM file descriptor changes as a part of the
confidential guest reset process.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 hw/intc/openpic_kvm.c | 108 ++++++++++++++++++++++++++++++++----------
 1 file changed, 83 insertions(+), 25 deletions(-)

diff --git a/hw/intc/openpic_kvm.c b/hw/intc/openpic_kvm.c
index 9aafef5d9e..4fd70d4b32 100644
--- a/hw/intc/openpic_kvm.c
+++ b/hw/intc/openpic_kvm.c
@@ -49,6 +49,7 @@ struct KVMOpenPICState {
     uint32_t fd;
     uint32_t model;
     hwaddr mapped;
+    NotifierWithReturn open_pic_vmfd_change_notifier;
 };
 
 static void kvm_openpic_set_irq(void *opaque, int n_IRQ, int level)
@@ -114,6 +115,83 @@ static const MemoryRegionOps kvm_openpic_mem_ops = {
     },
 };
 
+static int create_open_pic_device(KVMOpenPICState *opp, Error **errp)
+{
+    int kvm_openpic_model;
+    struct kvm_create_device cd = {0};
+    KVMState *s = kvm_state;
+    int ret;
+
+    switch (opp->model) {
+    case OPENPIC_MODEL_FSL_MPIC_20:
+        kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_20;
+        break;
+
+    case OPENPIC_MODEL_FSL_MPIC_42:
+        kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_42;
+        break;
+
+    default:
+        error_setg(errp, "Unsupported OpenPIC model %" PRIu32, opp->model);
+        return -1;
+    }
+
+    cd.type = kvm_openpic_model;
+    ret = kvm_vm_ioctl(s, KVM_CREATE_DEVICE, &cd);
+    if (ret < 0) {
+        error_setg(errp, "Can't create device %d: %s",
+                   cd.type, strerror(errno));
+        return -1;
+    }
+    opp->fd = cd.fd;
+
+    return 0;
+}
+
+static int open_pic_vmfd_handle_vmfd_change(NotifierWithReturn *notifier,
+                                            void *data, Error **errp)
+{
+    KVMOpenPICState *opp = container_of(notifier, KVMOpenPICState,
+                                        open_pic_vmfd_change_notifier);
+    uint64_t reg_base;
+    struct kvm_device_attr attr;
+    CPUState *cs;
+    int ret;
+
+    /* close the old descriptor */
+    close(opp->fd);
+
+    if (create_open_pic_device(opp, errp) < 0) {
+        return -1;
+    }
+
+    if (!opp->mapped) {
+        return 0;
+    }
+
+    reg_base = opp->mapped;
+    attr.group = KVM_DEV_MPIC_GRP_MISC;
+    attr.attr = KVM_DEV_MPIC_BASE_ADDR;
+    attr.addr = (uint64_t)(unsigned long)&reg_base;
+
+    ret = ioctl(opp->fd, KVM_SET_DEVICE_ATTR, &attr);
+    if (ret < 0) {
+        fprintf(stderr, "%s: %s %" PRIx64 "\n", __func__,
+                strerror(errno), reg_base);
+        return -1;
+    }
+
+    CPU_FOREACH(cs) {
+        ret = kvm_vcpu_enable_cap(cs, KVM_CAP_IRQ_MPIC, 0, opp->fd,
+                                   kvm_arch_vcpu_id(cs));
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 static void kvm_openpic_region_add(MemoryListener *listener,
                                    MemoryRegionSection *section)
 {
@@ -197,37 +275,14 @@ static void kvm_openpic_realize(DeviceState *dev, Error **errp)
     SysBusDevice *d = SYS_BUS_DEVICE(dev);
     KVMOpenPICState *opp = KVM_OPENPIC(dev);
     KVMState *s = kvm_state;
-    int kvm_openpic_model;
-    struct kvm_create_device cd = {0};
-    int ret, i;
+    int i;
 
     if (!kvm_check_extension(s, KVM_CAP_DEVICE_CTRL)) {
         error_setg(errp, "Kernel is lacking Device Control API");
         return;
     }
 
-    switch (opp->model) {
-    case OPENPIC_MODEL_FSL_MPIC_20:
-        kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_20;
-        break;
-
-    case OPENPIC_MODEL_FSL_MPIC_42:
-        kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_42;
-        break;
-
-    default:
-        error_setg(errp, "Unsupported OpenPIC model %" PRIu32, opp->model);
-        return;
-    }
-
-    cd.type = kvm_openpic_model;
-    ret = kvm_vm_ioctl(s, KVM_CREATE_DEVICE, &cd);
-    if (ret < 0) {
-        error_setg(errp, "Can't create device %d: %s",
-                   cd.type, strerror(errno));
-        return;
-    }
-    opp->fd = cd.fd;
+    create_open_pic_device(opp, errp);
 
     sysbus_init_mmio(d, &opp->mem);
     qdev_init_gpio_in(dev, kvm_openpic_set_irq, OPENPIC_MAX_IRQ);
@@ -236,6 +291,9 @@ static void kvm_openpic_realize(DeviceState *dev, Error **errp)
     opp->mem_listener.region_del = kvm_openpic_region_del;
     opp->mem_listener.name = "openpic-kvm";
     memory_listener_register(&opp->mem_listener, &address_space_memory);
+    opp->open_pic_vmfd_change_notifier.notify =
+        open_pic_vmfd_handle_vmfd_change;
+    kvm_vmfd_add_change_notifier(&opp->open_pic_vmfd_change_notifier);
 
     /* indicate pic capabilities */
     msi_nonbroken = true;
-- 
2.42.0




* [PATCH v2 28/32] kvm/vcpu: add notifiers to inform vcpu file descriptor change
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (26 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 27/32] ppc/openpic: create a new openpic device and reattach mem region on coco reset Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 29/32] kvm/i386/apic: set local apic after vcpu file descriptors changed Ani Sinha
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Ani Sinha, kvm, qemu-devel

When new vcpu file descriptors are created and bound to the new kvm file
descriptor as a part of the confidential guest reset mechanism, various
subsystems need to know about it. This change adds notifiers so that
subsystems can register handlers and take appropriate action when the vcpu
fds change.
Subsequent changes will register specific handlers to this notifier.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
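For readers unfamiliar with the pattern described above, here is a minimal,
self-contained sketch of a notifier list whose handlers can report failure.
It does not use QEMU's NotifierWithReturn code: every demo_* name is
hypothetical, and both the handler ordering and the stop-on-first-failure
behaviour are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a notifier whose handler can report failure. */
typedef struct demo_notifier demo_notifier;
struct demo_notifier {
    int (*notify)(demo_notifier *n, void *data);
    demo_notifier *next;
};

typedef struct {
    demo_notifier *head;
} demo_notifier_list;

/* Register a handler; in this sketch the newest registration runs first. */
void demo_list_add(demo_notifier_list *list, demo_notifier *n)
{
    assert(n->notify != NULL);
    n->next = list->head;
    list->head = n;
}

/* Unlink a handler if it is present on the list. */
void demo_list_remove(demo_notifier_list *list, demo_notifier *n)
{
    demo_notifier **p;

    for (p = &list->head; *p; p = &(*p)->next) {
        if (*p == n) {
            *p = n->next;
            n->next = NULL;
            return;
        }
    }
}

/* Call every handler in turn; stop at the first negative return value. */
int demo_list_notify(demo_notifier_list *list, void *data)
{
    demo_notifier *n;

    for (n = list->head; n; n = n->next) {
        int ret = n->notify(n, data);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* Example handler: counts how often the vcpu fd change was announced. */
int demo_count_handler(demo_notifier *n, void *data)
{
    (void)n;
    ++*(int *)data;
    return 0;
}
```

A subsystem would embed a demo_notifier in its own state, add it once at
realize time, and redo its per-vcpu setup from the handler each time the
list is notified.
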
 accel/kvm/kvm-all.c    | 27 ++++++++++++++++++++++++++-
 accel/stubs/kvm-stub.c | 10 ++++++++++
 include/system/kvm.h   | 17 +++++++++++++++++
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 2bd4dcd43b..efdfdf0ccb 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -130,8 +130,10 @@ static NotifierWithReturnList register_vmfd_changed_notifiers =
 static NotifierWithReturnList register_vmfd_pre_change_notifiers =
     NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vmfd_pre_change_notifiers);
 
-static int kvm_rebind_vcpus(Error **errp);
+static NotifierWithReturnList register_vcpufd_changed_notifiers =
+    NOTIFIER_WITH_RETURN_LIST_INITIALIZER(register_vcpufd_changed_notifiers);
 
+static int kvm_rebind_vcpus(Error **errp);
 static int map_kvm_run(KVMState *s, CPUState *cpu, Error **errp);
 static int map_kvm_dirty_gfns(KVMState *s, CPUState *cpu, Error **errp);
 static int vcpu_unmap_regions(KVMState *s, CPUState *cpu);
@@ -2328,6 +2330,22 @@ void kvm_vmfd_remove_pre_change_notifier(NotifierWithReturn *n)
     notifier_with_return_remove(n);
 }
 
+void kvm_vcpufd_add_change_notifier(NotifierWithReturn *n)
+{
+    notifier_with_return_list_add(&register_vcpufd_changed_notifiers, n);
+}
+
+void kvm_vcpufd_remove_change_notifier(NotifierWithReturn *n)
+{
+    notifier_with_return_remove(n);
+}
+
+static int kvm_vcpufd_change_notify(Error **errp)
+{
+    return notifier_with_return_list_notify(&register_vcpufd_changed_notifiers,
+                                            &vmfd_notifier, errp);
+}
+
 static int kvm_vmfd_pre_change_notify(Error **errp)
 {
     return notifier_with_return_list_notify(&register_vmfd_pre_change_notifiers,
@@ -2858,6 +2876,13 @@ static int kvm_reset_vmfd(MachineState *ms)
     }
     assert(!err);
 
+    /* notify everyone that vcpu fd has changed. */
+    ret = kvm_vcpufd_change_notify(&err);
+    if (ret < 0) {
+        return ret;
+    }
+    assert(!err);
+
     /* these can be only called after ram_block_rebind() */
     memory_listener_register(&kml->listener, &address_space_memory);
     memory_listener_register(&kvm_io_listener, &address_space_io);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 7f4e3c4050..5b94f3dc3c 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -95,6 +95,16 @@ void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n)
 {
 }
 
+void kvm_vcpufd_add_change_notifier(NotifierWithReturn *n)
+{
+    return;
+}
+
+void kvm_vcpufd_remove_change_notifier(NotifierWithReturn *n)
+{
+    return;
+}
+
 int kvm_irqchip_add_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
                                        EventNotifier *rn, int virq)
 {
diff --git a/include/system/kvm.h b/include/system/kvm.h
index edc3fa5004..120b77d039 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -587,6 +587,23 @@ void kvm_vmfd_add_change_notifier(NotifierWithReturn *n);
  */
 void kvm_vmfd_remove_change_notifier(NotifierWithReturn *n);
 
+/**
+ * kvm_vcpufd_add_change_notifier - register a notifier to get notified when
+ * the KVM vcpu file descriptors change as a part of the confidential guest
+ * "reset" process. Various subsystems should use this mechanism to take
+ * actions such as re-issuing vcpu ioctls as a part of setting up vcpu
+ * features.
+ * @n: notifier with return value.
+ */
+void kvm_vcpufd_add_change_notifier(NotifierWithReturn *n);
+
+/**
+ * kvm_vcpufd_remove_change_notifier - de-register a notifier previously
+ * registered with a kvm_vcpufd_add_change_notifier() call.
+ * @n: notifier that was previously registered.
+ */
+void kvm_vcpufd_remove_change_notifier(NotifierWithReturn *n);
+
 /**
  * kvm_vmfd_add_pre_change_notifier - register a notifier to get notified when
  * kvm vm file descriptor is about to be changed as a part of the confidential
-- 
2.42.0



* [PATCH v2 29/32] kvm/i386/apic: set local apic after vcpu file descriptors changed
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (27 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 28/32] kvm/vcpu: add notifiers to inform vcpu file descriptor change Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 30/32] kvm/clock: add support for confidential guest reset Ani Sinha
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S. Tsirkin, Marcel Apfelbaum
  Cc: Ani Sinha, qemu-devel

Once the vcpu file descriptors have changed after a confidential guest reset,
the local APIC needs to be reinitialized. This change registers a handler with
the vcpu fd change notifier to reinitialize the local APIC for KVM on x86.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
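The handler in this patch recovers the APIC state from the notifier pointer
it is given. The sketch below illustrates that idea in isolation, assuming a
locally defined demo_container_of() macro and hypothetical demo_* types; it
is not the QEMU code, and the counter merely stands in for pushing the APIC
state to the new vcpu.

```c
#include <assert.h>
#include <stddef.h>

/* Local definition for the sketch; QEMU provides its own container_of(). */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct demo_notifier demo_notifier;
struct demo_notifier {
    int (*notify)(demo_notifier *n, void *data);
};

/* Hypothetical device state with the notifier embedded as a member. */
typedef struct {
    int id;
    int reinit_count;
    demo_notifier vcpufd_change_notifier;
} demo_apic_state;

/* Handler: recover the owning device from the embedded notifier. */
int demo_apic_vcpufd_changed(demo_notifier *n, void *data)
{
    demo_apic_state *s = demo_container_of(n, demo_apic_state,
                                           vcpufd_change_notifier);

    (void)data;
    s->reinit_count++;   /* stands in for re-sending state to the vcpu */
    return 0;
}

/* Wire the handler up the way a realize function would. */
void demo_apic_realize(demo_apic_state *s, int id)
{
    assert(s != NULL);
    s->id = id;
    s->reinit_count = 0;
    s->vcpufd_change_notifier.notify = demo_apic_vcpufd_changed;
}
```

Embedding the notifier in the device state means no separate lookup table is
needed: each APIC instance gets its own callback context for free.
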
 hw/i386/kvm/apic.c              | 13 +++++++++++++
 include/hw/i386/apic_internal.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 82355f0463..f6f8ac2764 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -229,6 +229,16 @@ static void kvm_apic_reset(APICCommonState *s)
     run_on_cpu(CPU(s->cpu), kvm_apic_put, RUN_ON_CPU_HOST_PTR(s));
 }
 
+static int apic_vcpufd_change_handler(NotifierWithReturn *n,
+                                      void *data, Error **errp) {
+    APICCommonState *s = container_of(n, APICCommonState,
+                                      vcpufd_change_notifier);
+
+    run_on_cpu(CPU(s->cpu), kvm_apic_put, RUN_ON_CPU_HOST_PTR(s));
+
+    return 0;
+}
+
 static void kvm_apic_realize(DeviceState *dev, Error **errp)
 {
     APICCommonState *s = APIC_COMMON(dev);
@@ -238,6 +248,9 @@ static void kvm_apic_realize(DeviceState *dev, Error **errp)
 
     assert(kvm_has_gsi_routing());
     msi_nonbroken = true;
+
+    s->vcpufd_change_notifier.notify = apic_vcpufd_change_handler;
+    kvm_vcpufd_add_change_notifier(&s->vcpufd_change_notifier);
 }
 
 static void kvm_apic_unrealize(DeviceState *dev)
diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
index 4a62fdceb4..ffe5815e7f 100644
--- a/include/hw/i386/apic_internal.h
+++ b/include/hw/i386/apic_internal.h
@@ -189,6 +189,7 @@ struct APICCommonState {
     hwaddr vapic_paddr; /* note: persistence via kvmvapic */
     bool legacy_instance_id;
     uint32_t extended_log_dest;
+    NotifierWithReturn vcpufd_change_notifier;
 };
 
 typedef struct VAPICState {
-- 
2.42.0




* [PATCH v2 30/32] kvm/clock: add support for confidential guest reset
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (28 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 29/32] kvm/i386/apic: set local apic after vcpu file descriptors changed Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 31/32] hw/machine: introduce machine specific option 'x-change-vmfd-on-reset' Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 32/32] tests/functional/x86_64: add functional test to exercise vm fd change on reset Ani Sinha
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost
  Cc: Ani Sinha, qemu-devel

Confidential guests change the KVM VM file descriptor upon reset and also create
new VCPU file descriptors against the new KVM VM file descriptor. We need to
save the clock state from KVM before the KVM VM file descriptor changes and
restore it afterwards. Also, after the VCPU file descriptors have changed, we
must call KVM_KVMCLOCK_CTRL on each VCPU file descriptor to inform KVM that the
VCPU is in a paused state.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 hw/i386/kvm/clock.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index aba6842a22..fb0f500b7f 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -50,6 +50,9 @@ struct KVMClockState {
     /* whether the 'clock' value was obtained in a host with
      * reliable KVM_GET_CLOCK */
     bool clock_is_reliable;
+
+    NotifierWithReturn kvmclock_vcpufd_change_notifier;
+    NotifierWithReturn kvmclock_vmfd_pre_change_notifier;
 };
 
 struct pvclock_vcpu_time_info {
@@ -63,6 +66,9 @@ struct pvclock_vcpu_time_info {
     uint8_t    pad[2];
 } __attribute__((__packed__)); /* 32 bytes */
 
+static int kvmclock_set_clock(NotifierWithReturn *notifier,
+                              void *data, Error **errp);
+
 static uint64_t kvmclock_current_nsec(KVMClockState *s)
 {
     CPUState *cpu = first_cpu;
@@ -219,6 +225,51 @@ static void kvmclock_vm_state_change(void *opaque, bool running,
     }
 }
 
+static int kvmclock_save_clock(NotifierWithReturn *notifier,
+                               void *data, Error **errp)
+{
+    KVMClockState *s = container_of(notifier, KVMClockState,
+                                    kvmclock_vmfd_pre_change_notifier);
+    kvm_update_clock(s);
+    return 0;
+}
+
+static int kvmclock_set_clock(NotifierWithReturn *notifier,
+                              void *data, Error **errp)
+{
+    struct kvm_clock_data clock_data = {};
+    CPUState *cpu;
+    int ret;
+    KVMClockState *s = container_of(notifier, KVMClockState,
+                                    kvmclock_vcpufd_change_notifier);
+    int cap_clock_ctrl = kvm_check_extension(kvm_state, KVM_CAP_KVMCLOCK_CTRL);
+
+    if (!s->clock_is_reliable) {
+        uint64_t pvclock_via_mem = kvmclock_current_nsec(s);
+        /* saved clock value before vmfd change is not reliable */
+        if (pvclock_via_mem) {
+            s->clock = pvclock_via_mem;
+        }
+    }
+
+    clock_data.clock = s->clock;
+    ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &clock_data);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_SET_CLOCK failed: %s\n", strerror(-ret));
+        abort();
+    }
+
+    if (!cap_clock_ctrl) {
+        return 0;
+    }
+    CPU_FOREACH(cpu) {
+        run_on_cpu(cpu, do_kvmclock_ctrl, RUN_ON_CPU_NULL);
+    }
+
+    return 0;
+}
+
+
 static void kvmclock_realize(DeviceState *dev, Error **errp)
 {
     KVMClockState *s = KVM_CLOCK(dev);
@@ -230,7 +281,12 @@ static void kvmclock_realize(DeviceState *dev, Error **errp)
 
     kvm_update_clock(s);
 
+    s->kvmclock_vcpufd_change_notifier.notify = kvmclock_set_clock;
+    s->kvmclock_vmfd_pre_change_notifier.notify = kvmclock_save_clock;
+
     qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
+    kvm_vcpufd_add_change_notifier(&s->kvmclock_vcpufd_change_notifier);
+    kvm_vmfd_add_pre_change_notifier(&s->kvmclock_vmfd_pre_change_notifier);
 }
 
 static bool kvmclock_clock_is_reliable_needed(void *opaque)
-- 
2.42.0




* [PATCH v2 31/32] hw/machine: introduce machine specific option 'x-change-vmfd-on-reset'
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (29 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 30/32] kvm/clock: add support for confidential guest reset Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 13:22 ` [PATCH v2 32/32] tests/functional/x86_64: add functional test to exercise vm fd change on reset Ani Sinha
  31 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Zhao Liu, Paolo Bonzini
  Cc: Ani Sinha, qemu-devel

A new machine specific option 'x-change-vmfd-on-reset' is introduced for
debugging and testing only (hence the 'x-' prefix). When enabled, this option
forces the KVM VM file descriptor to be changed upon guest reset, as happens
for confidential guests. This can be used to exercise the code
changes that are specific to confidential guests on non-confidential
guests as well (except changes that require hardware support for
confidential guests).
A new functional test, added in the next patch, uses this new
parameter to test the VM file descriptor changes.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 hw/core/machine.c        | 22 ++++++++++++++++++++++
 include/hw/core/boards.h |  6 ++++++
 system/runstate.c        |  7 ++++++-
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 6411e68856..95d7650db9 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -450,6 +450,21 @@ static void machine_set_dump_guest_core(Object *obj, bool value, Error **errp)
     ms->dump_guest_core = value;
 }
 
+static bool machine_get_new_accel_vmfd_on_reset(Object *obj, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    return ms->new_accel_vmfd_on_reset;
+}
+
+static void machine_set_new_accel_vmfd_on_reset(Object *obj,
+                                                bool value, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    ms->new_accel_vmfd_on_reset = value;
+}
+
 static bool machine_get_mem_merge(Object *obj, Error **errp)
 {
     MachineState *ms = MACHINE(obj);
@@ -1198,6 +1213,13 @@ static void machine_class_init(ObjectClass *oc, const void *data)
     object_class_property_set_description(oc, "dump-guest-core",
         "Include guest memory in a core dump");
 
+    object_class_property_add_bool(oc, "x-change-vmfd-on-reset",
+        machine_get_new_accel_vmfd_on_reset,
+        machine_set_new_accel_vmfd_on_reset);
+    object_class_property_set_description(oc, "x-change-vmfd-on-reset",
+        "Generate new accelerator fd on reset, "
+        "to be used only for testing and debugging.");
+
     object_class_property_add_bool(oc, "mem-merge",
         machine_get_mem_merge, machine_set_mem_merge);
     object_class_property_set_description(oc, "mem-merge",
diff --git a/include/hw/core/boards.h b/include/hw/core/boards.h
index 07f8938752..ee3cc9130e 100644
--- a/include/hw/core/boards.h
+++ b/include/hw/core/boards.h
@@ -447,6 +447,12 @@ struct MachineState {
     struct NVDIMMState *nvdimms_state;
     struct NumaState *numa_state;
     bool acpi_spcr_enabled;
+    /*
+     * Whether to change the virtual machine accelerator file descriptor
+     * upon reset. Used only for debugging and testing purposes.
+     * It should be set to false for all regular use.
+     */
+    bool new_accel_vmfd_on_reset;
 };
 
 /*
diff --git a/system/runstate.c b/system/runstate.c
index 710f5882d9..a4572af2af 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -531,13 +531,18 @@ void qemu_system_reset(ShutdownCause reason)
      * file handle is necessary to create a new confidential VM context post
      * VM reset.
      */
-    if (current_machine->cgs && reason == SHUTDOWN_CAUSE_GUEST_RESET) {
+    if (reason == SHUTDOWN_CAUSE_GUEST_RESET &&
+        (current_machine->new_accel_vmfd_on_reset || current_machine->cgs)) {
         if (ac->reset_vmfd) {
             ret = ac->reset_vmfd(current_machine);
             if (ret < 0) {
                 error_report("unable to reset vmfd: %d", ret);
                 abort();
             }
+            if (current_machine->new_accel_vmfd_on_reset) {
+                qemu_log("INFO: virtual machine accel file descriptor "
+                         "has changed.\n");
+            }
         }
     }
 
-- 
2.42.0




* [PATCH v2 32/32] tests/functional/x86_64: add functional test to exercise vm fd change on reset
  2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
                   ` (30 preceding siblings ...)
  2026-01-12 13:22 ` [PATCH v2 31/32] hw/machine: introduce machine specific option 'x-change-vmfd-on-reset' Ani Sinha
@ 2026-01-12 13:22 ` Ani Sinha
  2026-01-12 14:36   ` Daniel P. Berrangé
  31 siblings, 1 reply; 54+ messages in thread
From: Ani Sinha @ 2026-01-12 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Zhao Liu, Ani Sinha; +Cc: Ani Sinha, qemu-devel

A new functional test is added that exercises the code changes related to
closing of the old KVM VM file descriptor and opening a new one upon VM reset.
This normally happens when confidential guests are reset; for
non-confidential guests, we use a special machine specific debug/test parameter
'x-change-vmfd-on-reset' to enable this behavior.
Only the code changes specific to re-initialization of SEV-ES, SEV-SNP and
TDX platforms are not exercised by this test, as they require hardware that
supports running confidential guests.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 MAINTAINERS                                   |  6 ++
 tests/functional/x86_64/meson.build           |  1 +
 .../x86_64/test_vmfd_change_reboot.py         | 75 +++++++++++++++++++
 3 files changed, 82 insertions(+)
 create mode 100755 tests/functional/x86_64/test_vmfd_change_reboot.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 9a55b649e8..11871fdd35 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -150,6 +150,12 @@ F: tools/i386/
 F: tests/functional/i386/
 F: tests/functional/x86_64/
 
+X86 VM file descriptor change on reset test
+M: Ani Sinha <anisinha@redhat.com>
+M: Paolo Bonzini <pbonzini@redhat.com>
+S: Maintained
+F: tests/functional/x86_64/test_vmfd_change_reboot.py
+
 Guest CPU cores (TCG)
 ---------------------
 Overall TCG CPUs
diff --git a/tests/functional/x86_64/meson.build b/tests/functional/x86_64/meson.build
index f78eec5e6c..784d9791cb 100644
--- a/tests/functional/x86_64/meson.build
+++ b/tests/functional/x86_64/meson.build
@@ -36,4 +36,5 @@ tests_x86_64_system_thorough = [
   'vfio_user_client',
   'virtio_balloon',
   'virtio_gpu',
+  'vmfd_change_reboot',
 ]
diff --git a/tests/functional/x86_64/test_vmfd_change_reboot.py b/tests/functional/x86_64/test_vmfd_change_reboot.py
new file mode 100755
index 0000000000..3b33322880
--- /dev/null
+++ b/tests/functional/x86_64/test_vmfd_change_reboot.py
@@ -0,0 +1,75 @@
+#!/usr/bin/env python3
+#
+# KVM VM file descriptor change on reset test
+#
+# Copyright © 2026 Red Hat, Inc.
+#
+# Author:
+#  Ani Sinha <anisinha@redhat.com>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import os
+from qemu.machine import machine
+
+from qemu_test import QemuSystemTest, Asset, exec_command_and_wait_for_pattern
+from qemu_test import wait_for_console_pattern
+
+class KVMGuest(QemuSystemTest):
+
+    ASSET_UKI = Asset('https://gitlab.com/anisinha/misc-artifacts/'
+                      '-/raw/main/uki.x86-64.efi?ref_type=heads',
+                      'e0f806bd1fa24111312e1fe849d2ee69808d4343930a5'
+                      'dc8c1688da17c65f576')
+    ASSET_OVMF = Asset('https://gitlab.com/anisinha/misc-artifacts/'
+                       '-/raw/main/OVMF.stateless.fd?ref_type=heads',
+                       '58a4275aafa8774bd6b1540adceae4ea434b8db75b476'
+                       '11839ff47be88cfcf22')
+
+    def common_vm_setup(self):
+        self.require_accelerator("kvm")
+
+        self.vm.set_console()
+
+        self.vm.add_args("-accel", "kvm")
+        self.vm.add_args("-smp", "2")
+        self.vm.add_args("-cpu", "host")
+        self.vm.add_args("-m", "2G")
+        self.vm.add_args("-nographic", "-nodefaults")
+
+        self.uki_path = self.ASSET_UKI.fetch()
+        self.ovmf_path = self.ASSET_OVMF.fetch()
+
+    def run_and_check(self):
+        self.vm.add_args('-kernel', self.uki_path)
+        self.vm.add_args("-bios", self.ovmf_path)
+        # enable KVM VMFD change on reset for a non-coco VM
+        self.vm.add_args("-machine", "q35,x-change-vmfd-on-reset=on")
+        # enable tracing
+        self.vm.add_args("-d", "trace:kvm_reset_vmfd")
+
+        try:
+            self.vm.launch()
+        except machine.VMLaunchFailure as e:
+            raise e
+
+        self.log.info('VM launched')
+        console_pattern = 'bash-5.1#'
+        wait_for_console_pattern(self, console_pattern)
+        self.log.info('VM ready with a bash prompt')
+
+        exec_command_and_wait_for_pattern(self, '/usr/sbin/reboot -f',
+                                          'reboot: machine restart')
+        console_pattern = '# --- Hello world ---'
+        wait_for_console_pattern(self, console_pattern)
+        self.vm.shutdown()
+        self.assertRegex(self.vm.get_log(),
+                         r'kvm_reset_vmfd \nINFO: virtual machine accel file '
+                         'descriptor has changed')
+
+    def test_vmfd_change_on_reset(self):
+        self.common_vm_setup()
+        self.run_and_check()
+
+if __name__ == '__main__':
+    QemuSystemTest.main()
-- 
2.42.0




* Re: [PATCH v2 32/32] tests/functional/x86_64: add functional test to exercise vm fd change on reset
  2026-01-12 13:22 ` [PATCH v2 32/32] tests/functional/x86_64: add functional test to exercise vm fd change on reset Ani Sinha
@ 2026-01-12 14:36   ` Daniel P. Berrangé
  2026-01-13  5:53     ` Ani Sinha
  0 siblings, 1 reply; 54+ messages in thread
From: Daniel P. Berrangé @ 2026-01-12 14:36 UTC (permalink / raw)
  To: Ani Sinha; +Cc: Paolo Bonzini, Zhao Liu, qemu-devel

On Mon, Jan 12, 2026 at 06:52:45PM +0530, Ani Sinha wrote:
> A new functional test is added that exercises the code changes related to
> closing of the old KVM VM file descriptor and opening a new one upon VM reset.
> This normally happens when confidential guests are reset; for
> non-confidential guests, we use a special machine specific debug/test parameter
> 'x-change-vmfd-on-reset' to enable this behavior.
> Only the code changes specific to re-initialization of SEV-ES, SEV-SNP and
> TDX platforms are not exercised by this test, as they require hardware that
> supports running confidential guests.
> 
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> ---
>  MAINTAINERS                                   |  6 ++
>  tests/functional/x86_64/meson.build           |  1 +
>  .../x86_64/test_vmfd_change_reboot.py         | 75 +++++++++++++++++++
>  3 files changed, 82 insertions(+)
>  create mode 100755 tests/functional/x86_64/test_vmfd_change_reboot.py


> diff --git a/tests/functional/x86_64/test_vmfd_change_reboot.py b/tests/functional/x86_64/test_vmfd_change_reboot.py
> new file mode 100755
> index 0000000000..3b33322880
> --- /dev/null
> +++ b/tests/functional/x86_64/test_vmfd_change_reboot.py
> @@ -0,0 +1,75 @@
> +#!/usr/bin/env python3
> +#
> +# KVM VM file descriptor change on reset test
> +#
> +# Copyright © 2026 Red Hat, Inc.
> +#
> +# Author:
> +#  Ani Sinha <anisinha@redhat.com>
> +#
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +import os
> +from qemu.machine import machine
> +
> +from qemu_test import QemuSystemTest, Asset, exec_command_and_wait_for_pattern
> +from qemu_test import wait_for_console_pattern
> +
> +class KVMGuest(QemuSystemTest):
> +
> +    ASSET_UKI = Asset('https://gitlab.com/anisinha/misc-artifacts/'
> +                      '-/raw/main/uki.x86-64.efi?ref_type=heads',
> +                      'e0f806bd1fa24111312e1fe849d2ee69808d4343930a5'
> +                      'dc8c1688da17c65f576')
> +    ASSET_OVMF = Asset('https://gitlab.com/anisinha/misc-artifacts/'
> +                       '-/raw/main/OVMF.stateless.fd?ref_type=heads',
> +                       '58a4275aafa8774bd6b1540adceae4ea434b8db75b476'
> +                       '11839ff47be88cfcf22')

What is the source of these two binaries? The repo doesn't show any
source code or references. Is there no way we can use standard distro
images for this test?

> +
> +    def common_vm_setup(self):
> +        self.require_accelerator("kvm")
> +
> +        self.vm.set_console()
> +
> +        self.vm.add_args("-accel", "kvm")
> +        self.vm.add_args("-smp", "2")
> +        self.vm.add_args("-cpu", "host")
> +        self.vm.add_args("-m", "2G")
> +        self.vm.add_args("-nographic", "-nodefaults")
> +
> +        self.uki_path = self.ASSET_UKI.fetch()
> +        self.ovmf_path = self.ASSET_OVMF.fetch()
> +
> +    def run_and_check(self):
> +        self.vm.add_args('-kernel', self.uki_path)
> +        self.vm.add_args("-bios", self.ovmf_path)
> +        # enable KVM VMFD change on reset for a non-coco VM
> +        self.vm.add_args("-machine", "q35,x-change-vmfd-on-reset=on")
> +        # enable tracing
> +        self.vm.add_args("-d", "trace:kvm_reset_vmfd")
> +
> +        try:
> +            self.vm.launch()
> +        except machine.VMLaunchFailure as e:
> +            raise e
> +
> +        self.log.info('VM launched')
> +        console_pattern = 'bash-5.1#'
> +        wait_for_console_pattern(self, console_pattern)
> +        self.log.info('VM ready with a bash prompt')
> +
> +        exec_command_and_wait_for_pattern(self, '/usr/sbin/reboot -f',
> +                                          'reboot: machine restart')
> +        console_pattern = '# --- Hello world ---'
> +        wait_for_console_pattern(self, console_pattern)
> +        self.vm.shutdown()
> +        self.assertRegex(self.vm.get_log(),
> +                         r'kvm_reset_vmfd \nINFO: virtual machine accel file '
> +                         'descriptor has changed')
> +
> +    def test_vmfd_change_on_reset(self):
> +        self.common_vm_setup()
> +        self.run_and_check()
> +
> +if __name__ == '__main__':
> +    QemuSystemTest.main()
> -- 
> 2.42.0
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v2 02/32] hw/accel: add a per-accelerator callback to change VM accelerator handle
  2026-01-12 13:22 ` [PATCH v2 02/32] hw/accel: add a per-accelerator callback to change VM accelerator handle Ani Sinha
@ 2026-01-12 17:01   ` Paolo Bonzini
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Bonzini @ 2026-01-12 17:01 UTC (permalink / raw)
  To: Ani Sinha; +Cc: Richard Henderson, Philippe Mathieu-Daudé, qemu-devel

On Mon, Jan 12, 2026 at 2:23 PM Ani Sinha <anisinha@redhat.com> wrote:
> +    if (current_machine->cgs && reason == SHUTDOWN_CAUSE_GUEST_RESET) {

This should check (I think) !cpus_are_resettable(), instead of
current_machine->cgs.

> +        if (ac->reset_vmfd) {
> +            ret = ac->reset_vmfd(current_machine);
> +            if (ret < 0) {
> +                error_report("unable to reset vmfd: %d", ret);
> +                abort();

This should not be an abort, but it should change the runstate to
RUN_STATE_INTERNAL_ERROR.

Also, I would move the introduction of the
confidential_guest_can_rebuild_state() callback to before this patch
(see upcoming review of patch 24).

At the end of the series, this should look something like:

    if (reason == SHUTDOWN_CAUSE_GUEST_RESET
        && (current_machine->new_accel_vmfd_on_reset
            || !cpus_are_resettable())) {
        if (ac->reset_vmfd) {
            ret = ac->reset_vmfd(current_machine);
            if (ret < 0) {
                error_report(..., strerror(-ret));
                vm_stop(RUN_STATE_INTERNAL_ERROR);
            }
        } else if (!cpus_are_resettable()) {
            error_report("accelerator does not support reset");
        } else {
            error_report("accelerator does not support "
                         "x-change-vmfd-on-reset=..., "
                         "proceeding with normal reset");
        }
    }

Paolo

> +            }
> +        }
> +    }
> +
>      if (mc && mc->reset) {
>          mc->reset(current_machine, type);
>      } else {
> --
> 2.42.0
>




* Re: [PATCH v2 04/32] accel/kvm: add changes required to support KVM VM file descriptor change
  2026-01-12 13:22 ` [PATCH v2 04/32] accel/kvm: add changes required to support KVM VM file descriptor change Ani Sinha
@ 2026-01-12 17:02   ` Paolo Bonzini
  2026-01-13  5:22     ` Ani Sinha
  0 siblings, 1 reply; 54+ messages in thread
From: Paolo Bonzini @ 2026-01-12 17:02 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Peter Maydell, Marcelo Tosatti, Song Gao, Huacai Chen,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Nicholas Piggin, Harsh Prateek Bora,
	Chinmay Rath, Palmer Dabbelt, Alistair Francis, Weiwei Li,
	Daniel Henrique Barboza, Liu Zhiwei, Halil Pasic,
	Christian Borntraeger, Eric Farman, Matthew Rosato, Thomas Huth,
	Richard Henderson, Ilya Leoshkevich, David Hildenbrand, kvm,
	qemu-devel, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x

On Mon, Jan 12, 2026 at 2:23 PM Ani Sinha <anisinha@redhat.com> wrote:
> +int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)

Weird name since there are no "operations". Maybe kvm_arch_on_vmfd_change?

Paolo



* Re: [PATCH v2 09/32] kvm/i386: implement architecture support for kvm file descriptor change
  2026-01-12 13:22 ` [PATCH v2 09/32] kvm/i386: implement architecture support for kvm " Ani Sinha
@ 2026-01-12 17:06   ` Paolo Bonzini
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Bonzini @ 2026-01-12 17:06 UTC (permalink / raw)
  To: Ani Sinha; +Cc: Marcelo Tosatti, kvm, qemu-devel

On Mon, Jan 12, 2026 at 2:23 PM Ani Sinha <anisinha@redhat.com> wrote:
>  int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
>  {
> -    abort();
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    /*
> +     * Initialize confidential context, if required
> +     *
> +     * If no memory encryption is requested (ms->cgs == NULL) this is
> +     * a no-op.
> +     *
> +     */
> +    if (ms->cgs) {
> +        ret = confidential_guest_kvm_init(ms->cgs, &local_err);
> +        if (ret < 0) {
> +            error_report_err(local_err);
> +            return ret;
> +        }
> +    }

Most of the code here is in common with guest startup; please extract
it out of kvm_arch_init() and into a separate function.

There shouldn't be many ordering dependencies, if any. For functions
like kvm_get_supported_msrs() you can add a "static bool first" to
ensure they aren't rerun.
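
A minimal sketch of the "static bool first" guard suggested above, assuming
a hypothetical shared init function that runs at startup and again after
every vmfd change. The demo_* names and the run counter exist only for
illustration; the real one-time work would replace the counter increment.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how often the one-time work actually ran (sketch only). */
int demo_query_runs;

/* Hypothetical one-time probe guarded by a function-local static flag. */
int demo_probe_once(void)
{
    static bool first = true;

    if (!first) {
        return 0;                   /* already done on an earlier init */
    }
    first = false;
    demo_query_runs++;              /* the real work would happen here */
    assert(demo_query_runs == 1);   /* the guard allows a single run */
    return 0;
}

/* Shared init path, called at startup and after each vmfd change. */
int demo_arch_common_init(void)
{
    return demo_probe_once();
}
```

With the guard inside the probe itself, the shared init path can be called
from both places without the callers having to track which steps are
repeatable.
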

Paolo



* Re: [PATCH v2 11/32] kvm/i386: reload firmware for confidential guest reset
  2026-01-12 13:22 ` [PATCH v2 11/32] kvm/i386: reload firmware for " Ani Sinha
@ 2026-01-12 17:08   ` Paolo Bonzini
  2026-01-13 13:58   ` Bernhard Beschow
  1 sibling, 0 replies; 54+ messages in thread
From: Paolo Bonzini @ 2026-01-12 17:08 UTC (permalink / raw)
  To: Ani Sinha; +Cc: Marcelo Tosatti, kvm, qemu-devel

On Mon, Jan 12, 2026 at 2:24 PM Ani Sinha <anisinha@redhat.com> wrote:
>
> When IGVM is not being used by the confidential guest, the guest firmware has
> to be reloaded explicitly again into memory. This is because the memory into
> which the firmware was loaded before reset was encrypted and is thus lost
> upon reset. When IGVM is used, it is expected that the IGVM will contain the
> guest firmware and the execution of the IGVM directives will set up the guest
> firmware memory.
>
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> ---
>  target/i386/kvm/kvm.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 4fedc621b8..46c4f9487b 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -51,6 +51,8 @@
>  #include "qemu/config-file.h"
>  #include "qemu/error-report.h"
>  #include "qemu/memalign.h"
> +#include "qemu/datadir.h"
> +#include "hw/core/loader.h"
>  #include "hw/i386/x86.h"
>  #include "hw/i386/kvm/xen_evtchn.h"
>  #include "hw/i386/pc.h"
> @@ -3267,6 +3269,22 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
>
>  static int xen_init_wrapper(MachineState *ms, KVMState *s);
>
> +static void reload_bios_rom(X86MachineState *x86ms)
> +{
> +    int bios_size;
> +    const char *bios_name;
> +    char *filename;
> +
> +    bios_name = MACHINE(x86ms)->firmware ?: "bios.bin";
> +    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
> +
> +    bios_size = get_bios_size(x86ms, bios_name, filename);
> +
> +    void *ptr = memory_region_get_ram_ptr(&x86ms->bios);
> +    load_image_size(filename, ptr, bios_size);
> +    x86_firmware_configure(0x100000000ULL - bios_size, ptr, bios_size);
> +}
> +
>  int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
>  {
>      Error *local_err = NULL;
> @@ -3285,6 +3303,16 @@ int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
>              error_report_err(local_err);
>              return ret;
>          }
> +        if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
> +            X86MachineState *x86ms = X86_MACHINE(ms);
> +            /*
> +             * If an IGVM file is specified then the firmware must be provided
> +             * in the IGVM file.
> +             */
> +            if (!x86ms->igvm) {
> +                reload_bios_rom(x86ms);
> +            }
> +        }

Does this have to be done here, as opposed to in its own notifier, or
at least in a notifier owned by the machine?

In any case, this can be done after the part in common with kvm_arch_init().
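
To illustrate what a machine-owned notifier could look like, here is a
self-contained sketch. It does not use QEMU's NotifierWithReturn API; the
DemoNotifier type, the list helpers and DemoMachine are all hypothetical
stand-ins, and new notifiers are simply prepended to the list:

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoNotifier DemoNotifier;
struct DemoNotifier {
    int (*notify)(DemoNotifier *n, void *data);
    DemoNotifier *next;
};

static DemoNotifier *demo_vmfd_notifiers;

/* Register a callback to run whenever the VM fd changes. */
void demo_vmfd_add_change_notifier(DemoNotifier *n)
{
    assert(n->notify != NULL);
    n->next = demo_vmfd_notifiers;
    demo_vmfd_notifiers = n;
}

/* Run all callbacks, stopping at the first one that fails. */
int demo_vmfd_notify_change(void *data)
{
    for (DemoNotifier *n = demo_vmfd_notifiers; n; n = n->next) {
        int ret = n->notify(n, data);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* The machine embeds its own notifier and owns the firmware reload. */
typedef struct DemoMachine {
    int firmware_loads;
    int has_igvm;
    DemoNotifier vmfd_notifier;
} DemoMachine;

int demo_machine_reload_firmware(DemoNotifier *n, void *data)
{
    /* Recover the containing machine from the embedded notifier. */
    DemoMachine *m = (DemoMachine *)((char *)n -
                                     offsetof(DemoMachine, vmfd_notifier));

    (void)data;
    if (!m->has_igvm) {
        m->firmware_loads++;    /* IGVM would provide the firmware itself */
    }
    return 0;
}
```

With this shape the arch hook no longer needs to know about X86MachineState
or IGVM at all.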

Paolo


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 24/32] accel/kvm: add a per-confidential class callback to unlock guest state
  2026-01-12 13:22 ` [PATCH v2 24/32] accel/kvm: add a per-confidential class callback to unlock guest state Ani Sinha
@ 2026-01-12 17:11   ` Paolo Bonzini
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Bonzini @ 2026-01-12 17:11 UTC (permalink / raw)
  To: Ani Sinha; +Cc: Marcelo Tosatti, Zhao Liu, qemu-devel, kvm

On Mon, Jan 12, 2026 at 2:24 PM Ani Sinha <anisinha@redhat.com> wrote:
> diff --git a/system/runstate.c b/system/runstate.c
> index b0ce0410fa..710f5882d9 100644
> --- a/system/runstate.c
> +++ b/system/runstate.c
> @@ -58,6 +58,7 @@
>  #include "system/reset.h"
>  #include "system/runstate.h"
>  #include "system/runstate-action.h"
> +#include "system/confidential-guest-support.h"
>  #include "system/system.h"
>  #include "system/tpm.h"
>  #include "trace.h"
> @@ -564,7 +565,12 @@ void qemu_system_reset(ShutdownCause reason)
>      if (cpus_are_resettable()) {
>          cpu_synchronize_all_post_reset();
>      } else {
> -        assert(runstate_check(RUN_STATE_PRELAUNCH));
> +        /*
> +         * for confidential guests, cpus are not resettable but their
> +         * state can be rebuilt under some conditions.
> +         */
> +        assert(runstate_check(RUN_STATE_PRELAUNCH) ||
> +               (current_machine->cgs && runstate_is_running()));

You can remove the assertion altogether.

> +static bool tdx_can_rebuild_guest_state(ConfidentialGuestSupport *cgs)
> +{
> +    return true;
> +}
> +
>  static void tdx_guest_class_init(ObjectClass *oc, const void *data)
>  {
>      ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
> @@ -1596,6 +1601,7 @@ static void tdx_guest_class_init(ObjectClass *oc, const void *data)
>      ResettableClass *rc = RESETTABLE_CLASS(oc);
>
>      klass->kvm_init = tdx_kvm_init;
> +    klass->can_rebuild_guest_state = tdx_can_rebuild_guest_state;
>      x86_klass->kvm_type = tdx_kvm_type;
>      x86_klass->cpu_instance_init = tdx_cpu_instance_init;
>      x86_klass->adjust_cpuid_features = tdx_adjust_cpuid_features;
> diff --git a/target/i386/sev.c b/target/i386/sev.c
> index d45356843c..c52027c935 100644
> --- a/target/i386/sev.c
> +++ b/target/i386/sev.c
> @@ -2632,6 +2632,14 @@ static int cgs_set_guest_state(hwaddr gpa, uint8_t *ptr, uint64_t len,
>      return -1;
>  }
>
> +static bool sev_can_rebuild_guest_state(ConfidentialGuestSupport *cgs)
> +{
> +    if (!sev_snp_enabled() && !sev_es_enabled()) {
> +        return false;
> +    }
> +    return true;

This is always true, because if both are false then CPUs *are* resettable.

So I think .can_rebuild_guest_state can become a bool member of the
ConfidentialGuestSupportClass, instead of a function.
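
A rough sketch of the bool-member variant, using simplified stand-in types
rather than the real ConfidentialGuestSupportClass. DemoCgsClass,
DemoMachine and demo_machine_can_rebuild() are hypothetical names chosen for
illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the confidential guest support class. */
typedef struct DemoCgsClass {
    const char *name;
    bool can_rebuild_guest_state;   /* plain flag instead of a callback */
} DemoCgsClass;

/* Each class sets the flag once, where it would have set the callback. */
const DemoCgsClass demo_tdx_class = {
    .name = "tdx-guest",
    .can_rebuild_guest_state = true,
};

const DemoCgsClass demo_sev_class = {
    .name = "sev-guest",
    .can_rebuild_guest_state = true,
};

/* Stand-in for the machine; cgs_class is NULL for non-confidential guests. */
typedef struct DemoMachine {
    const DemoCgsClass *cgs_class;
} DemoMachine;

/* The reset path only needs to read the flag. */
bool demo_machine_can_rebuild(const DemoMachine *ms)
{
    assert(ms != NULL);
    return ms->cgs_class && ms->cgs_class->can_rebuild_guest_state;
}
```

A class that cannot rebuild its state simply leaves the member at its
zero-initialized default of false.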

Paolo


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 19/32] i386/sev: add support for confidential guest reset
  2026-01-12 13:22 ` [PATCH v2 19/32] i386/sev: add support for confidential guest reset Ani Sinha
@ 2026-01-12 17:12   ` Paolo Bonzini
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Bonzini @ 2026-01-12 17:12 UTC (permalink / raw)
  To: Ani Sinha; +Cc: Marcelo Tosatti, Zhao Liu, kvm, qemu-devel

On Mon, Jan 12, 2026 at 2:24 PM Ani Sinha <anisinha@redhat.com> wrote:
> @@ -2758,6 +2807,8 @@ sev_common_instance_init(Object *obj)
>      cgs->get_mem_map_entry = cgs_get_mem_map_entry;
>      cgs->set_guest_policy = cgs_set_guest_policy;
>
> +    qemu_register_resettable(OBJECT(sev_common));

Same issue as previous patch.

Paolo


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 16/32] i386/sev: add migration blockers only once
  2026-01-12 13:22 ` [PATCH v2 16/32] i386/sev: add migration blockers only once Ani Sinha
@ 2026-01-12 17:16   ` Paolo Bonzini
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Bonzini @ 2026-01-12 17:16 UTC (permalink / raw)
  To: Ani Sinha; +Cc: Zhao Liu, Marcelo Tosatti, kvm, qemu-devel

On Mon, Jan 12, 2026 at 2:24 PM Ani Sinha <anisinha@redhat.com> wrote:
> @@ -2764,6 +2749,11 @@ sev_common_instance_init(Object *obj)
>      cgs->set_guest_policy = cgs_set_guest_policy;
>
>      QTAILQ_INIT(&sev_common->launch_vmsa);
> +
> +    /* add migration blocker */
> +    error_setg(&sev_mig_blocker,
> +               "SEV: Migration is not implemented");
> +    migrate_add_blocker(&sev_mig_blocker, &error_fatal);
>  }

.instance_init callbacks cannot have side effects. For patch 17 this
is particularly bad because it causes a dangling pointer (the notifier
is attached to an object that might never be used, and is instead
unreferenced/freed immediately); here it just causes migration to
be blocked forever.

If you can find a good place to put these that would be best,
otherwise you can add the usual "static bool first" method/hack.
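
As a sketch of the difference, assume an object type whose instance_init
only initializes its own fields, while the global registration happens in a
later hook that runs only for an object that is really used. The DemoSev
names and the demo_blockers_registered counter are hypothetical, and the
counter merely stands in for migrate_add_blocker():

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for global state such as the list of migration blockers. */
int demo_blockers_registered;

typedef struct DemoSevState {
    bool initialized;
} DemoSevState;

/* instance_init: touch only the object itself, no global side effects. */
void demo_sev_instance_init(DemoSevState *s)
{
    s->initialized = true;
}

/*
 * Runs only for an object that is actually used, and possibly again after
 * a guest reset, so the one-time registration is guarded.
 */
void demo_sev_kvm_init(DemoSevState *s)
{
    static bool first = true;

    assert(s->initialized);
    if (first) {
        first = false;
        demo_blockers_registered++;
    }
}
```

An object that is created only for introspection and then freed never reaches
demo_sev_kvm_init(), so it leaves no global state behind.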

Paolo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 25/32] kvm/xen-emu: re-initialize capabilities during confidential guest reset
  2026-01-12 13:22 ` [PATCH v2 25/32] kvm/xen-emu: re-initialize capabilities during confidential guest reset Ani Sinha
@ 2026-01-12 17:19   ` Paolo Bonzini
  2026-01-12 18:22     ` David Woodhouse
  2026-01-13  5:26     ` Ani Sinha
  0 siblings, 2 replies; 54+ messages in thread
From: Paolo Bonzini @ 2026-01-12 17:19 UTC (permalink / raw)
  To: Ani Sinha; +Cc: David Woodhouse, Paul Durrant, Marcelo Tosatti, kvm, qemu-devel

On Mon, Jan 12, 2026 at 2:24 PM Ani Sinha <anisinha@redhat.com> wrote:
>
> On confidential guests KVM virtual machine file descriptor changes as a
> part of the guest reset process. Xen capabilities need to be re-initialized in
> KVM against the new file descriptor.
>
> This patch is untested on confidential guests and exists only for completeness.

This sentence should be changed since now your code can be tested on
non-confidential guests (or removed altogether).  Same for patch
23/32.

Paolo


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 25/32] kvm/xen-emu: re-initialize capabilities during confidential guest reset
  2026-01-12 17:19   ` Paolo Bonzini
@ 2026-01-12 18:22     ` David Woodhouse
  2026-01-13  5:26     ` Ani Sinha
  1 sibling, 0 replies; 54+ messages in thread
From: David Woodhouse @ 2026-01-12 18:22 UTC (permalink / raw)
  To: Paolo Bonzini, Ani Sinha; +Cc: Paul Durrant, Marcelo Tosatti, kvm, qemu-devel

On Mon, 2026-01-12 at 18:19 +0100, Paolo Bonzini wrote:
> On Mon, Jan 12, 2026 at 2:24 PM Ani Sinha <anisinha@redhat.com> wrote:
> > 
> > On confidential guests KVM virtual machine file descriptor changes as a
> > part of the guest reset process. Xen capabilities need to be re-initialized in
> > KVM against the new file descriptor.
> > 
> > This patch is untested on confidential guests and exists only for completeness.
> 
> This sentence should be changed since now your code can be tested on
> non-confidential guests (or removed altogether).  Same for patch
> 23/32.

Are the KVM selftests expanded to cover this?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 04/32] accel/kvm: add changes required to support KVM VM file descriptor change
  2026-01-12 17:02   ` Paolo Bonzini
@ 2026-01-13  5:22     ` Ani Sinha
  2026-01-13  5:50       ` Paolo Bonzini
  0 siblings, 1 reply; 54+ messages in thread
From: Ani Sinha @ 2026-01-13  5:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Marcelo Tosatti, Song Gao, Huacai Chen,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Nicholas Piggin, Harsh Prateek Bora,
	Chinmay Rath, Palmer Dabbelt, Alistair Francis, Weiwei Li,
	Daniel Henrique Barboza, Liu Zhiwei, Halil Pasic,
	Christian Borntraeger, Eric Farman, Matthew Rosato, Thomas Huth,
	Richard Henderson, Ilya Leoshkevich, David Hildenbrand, kvm,
	qemu-devel, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x

On Mon, Jan 12, 2026 at 10:32 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On Mon, Jan 12, 2026 at 2:23 PM Ani Sinha <anisinha@redhat.com>:
>  > +int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
>
> Weird name since there are no "operations". Maybe kvm_arch_on_vmfd_change?

I meant the operations the arch wants to do on vmfd change.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 25/32] kvm/xen-emu: re-initialize capabilities during confidential guest reset
  2026-01-12 17:19   ` Paolo Bonzini
  2026-01-12 18:22     ` David Woodhouse
@ 2026-01-13  5:26     ` Ani Sinha
  2026-01-13  5:48       ` Paolo Bonzini
  1 sibling, 1 reply; 54+ messages in thread
From: Ani Sinha @ 2026-01-13  5:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: David Woodhouse, Paul Durrant, Marcelo Tosatti, kvm, qemu-devel

On Mon, Jan 12, 2026 at 10:50 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On Mon, Jan 12, 2026 at 2:24 PM Ani Sinha <anisinha@redhat.com> wrote:
> >
> > On confidential guests KVM virtual machine file descriptor changes as a
> > part of the guest reset process. Xen capabilities need to be re-initialized in
> > KVM against the new file descriptor.
> >
> > This patch is untested on confidential guests and exists only for completeness.
>
> This sentence should be changed since now your code can be tested on
> non-confidential guests (or removed altogether).  Same for patch
> 23/32.

I can drop all the xen changes altogether for now, if no one objects.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 25/32] kvm/xen-emu: re-initialize capabilities during confidential guest reset
  2026-01-13  5:26     ` Ani Sinha
@ 2026-01-13  5:48       ` Paolo Bonzini
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Bonzini @ 2026-01-13  5:48 UTC (permalink / raw)
  To: Ani Sinha; +Cc: David Woodhouse, Paul Durrant, Marcelo Tosatti, kvm, qemu-devel

On Tue, Jan 13, 2026 at 06:26, Ani Sinha <anisinha@redhat.com> wrote:

> > > This patch is untested on confidential guests and exists only for
> > > completeness.
> >
> > This sentence should be changed since now your code can be tested on
> > non-confidential guests (or removed altogether).  Same for patch
> > 23/32.
>
> I can drop all the xen changes altogether for now, if no one objects.
>

It's the opposite, if you dropped the changes you would have to make
reset_fds fail for Xen guests. Keep them and add a test.

Paolo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 04/32] accel/kvm: add changes required to support KVM VM file descriptor change
  2026-01-13  5:22     ` Ani Sinha
@ 2026-01-13  5:50       ` Paolo Bonzini
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Bonzini @ 2026-01-13  5:50 UTC (permalink / raw)
  To: Ani Sinha
  Cc: Peter Maydell, Marcelo Tosatti, Song Gao, Huacai Chen,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Nicholas Piggin, Harsh Prateek Bora,
	Chinmay Rath, Palmer Dabbelt, Alistair Francis, Weiwei Li,
	Daniel Henrique Barboza, Liu Zhiwei, Halil Pasic,
	Christian Borntraeger, Eric Farman, Matthew Rosato, Thomas Huth,
	Richard Henderson, Ilya Leoshkevich, David Hildenbrand, kvm,
	qemu-devel, qemu-arm,
	zmta06.collab.prod.int.phx2.redhat.com, list@suse.de,
	open list:RISC-V, qemu-s390x

On Tue, Jan 13, 2026 at 06:22, Ani Sinha <anisinha@redhat.com> wrote:

> On Mon, Jan 12, 2026 at 10:32 PM Paolo Bonzini <pbonzini@redhat.com>
> wrote:
> >
> > On Mon, Jan 12, 2026 at 2:23 PM Ani Sinha <anisinha@redhat.com>:
> >  > +int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
> >
> > Weird name since there are no "operations". Maybe
> > kvm_arch_on_vmfd_change?
>
> I meant the operations the arch wants to do on vmfd change.
>

All callbacks are "operations that the implementor wants to do", aren't
they? But "ops" is usually a suffix in types that group multiple callbacks,
not individual functions.
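
A small sketch of that naming convention, with hypothetical names
throughout: DemoVmfdOps is an "ops" type because it groups several
callbacks, while a single hook is named after the event it handles.

```c
#include <assert.h>
#include <stddef.h>

/* An "ops" type: one struct grouping several related callbacks. */
typedef struct DemoVmfdOps {
    int (*pre_change)(void *opaque);
    int (*post_change)(void *opaque);
} DemoVmfdOps;

/* A single hook is named after the event it reacts to. */
int demo_arch_on_vmfd_change(void *opaque)
{
    int *calls = opaque;

    (*calls)++;
    return 0;
}

/* Runs every callback in the group, stopping at the first failure. */
int demo_run_vmfd_ops(const DemoVmfdOps *ops, void *opaque)
{
    int ret = 0;

    assert(ops != NULL);
    if (ops->pre_change) {
        ret = ops->pre_change(opaque);
    }
    if (ret == 0 && ops->post_change) {
        ret = ops->post_change(opaque);
    }
    return ret;
}
```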

Paolo


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 32/32] tests/functional/x86_64: add functional test to exercise vm fd change on reset
  2026-01-12 14:36   ` Daniel P. Berrangé
@ 2026-01-13  5:53     ` Ani Sinha
  0 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-13  5:53 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Paolo Bonzini, Zhao Liu, qemu-devel

On Mon, Jan 12, 2026 at 8:06 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Mon, Jan 12, 2026 at 06:52:45PM +0530, Ani Sinha wrote:
> > A new functional test is added that exercises the code changes related to
> > closing of the old KVM VM file descriptor and opening a new one upon VM reset.
> > This normally happens when confidential guests are reset, but for
> > non-confidential guests, we use a special machine specific debug/test parameter
> > 'x-change-vmfd-on-reset' to enable this behavior.
> > Only specific code changes related to re-initialization of SEV-ES, SEV-SNP and
> > TDX platforms are not exercised in this test as they require hardware that
> > supports running confidential guests.
> >
> > Signed-off-by: Ani Sinha <anisinha@redhat.com>
> > ---
> >  MAINTAINERS                                   |  6 ++
> >  tests/functional/x86_64/meson.build           |  1 +
> >  .../x86_64/test_vmfd_change_reboot.py         | 75 +++++++++++++++++++
> >  3 files changed, 82 insertions(+)
> >  create mode 100755 tests/functional/x86_64/test_vmfd_change_reboot.py
>
>
> > diff --git a/tests/functional/x86_64/test_vmfd_change_reboot.py b/tests/functional/x86_64/test_vmfd_change_reboot.py
> > new file mode 100755
> > index 0000000000..3b33322880
> > --- /dev/null
> > +++ b/tests/functional/x86_64/test_vmfd_change_reboot.py
> > @@ -0,0 +1,75 @@
> > +#!/usr/bin/env python3
> > +#
> > +# KVM VM file descriptor change on reset test
> > +#
> > +# Copyright © 2026 Red Hat, Inc.
> > +#
> > +# Author:
> > +#  Ani Sinha <anisinha@redhat.com>
> > +#
> > +# SPDX-License-Identifier: GPL-2.0-or-later
> > +
> > +import os
> > +from qemu.machine import machine
> > +
> > +from qemu_test import QemuSystemTest, Asset, exec_command_and_wait_for_pattern
> > +from qemu_test import wait_for_console_pattern
> > +
> > +class KVMGuest(QemuSystemTest):
> > +
> > +    ASSET_UKI = Asset('https://gitlab.com/anisinha/misc-artifacts/'
> > +                      '-/raw/main/uki.x86-64.efi?ref_type=heads',
> > +                      'e0f806bd1fa24111312e1fe849d2ee69808d4343930a5'
> > +                      'dc8c1688da17c65f576')
> > +    ASSET_OVMF = Asset('https://gitlab.com/anisinha/misc-artifacts/'
> > +                       '-/raw/main/OVMF.stateless.fd?ref_type=heads',
> > +                       '58a4275aafa8774bd6b1540adceae4ea434b8db75b476'
> > +                       '11839ff47be88cfcf22')
>
> What is the source of these two binaries - the repo doesn't show any
> source code or references ?

For ASSET_UKI, it was built using
https://gitlab.com/kraxel/edk2-tests/-/blob/unittest/tools/make-supermin.sh
Maybe I can add a comment in the test to that effect.

ASSET_OVMF is a little more complicated ...

> Is there no way we can use standard distro
> images for this test ?

ASSET_OVMF comes from /usr/share/edk2/ovmf/OVMF.stateless.fd of a fc43
VM. It comes from edk2-ovmf-20251119-3.fc43.noarch rpm.
This rpm must be installed in the container where the test is run. I
checked that CI images we use do not have this rpm or the edk2 binary.
In fact, we do not have OVMF.amdsev.fd or its TDX variant or any of
OVMF_CODE.secboot.fd etc either. OVMF packages are simply not in the
container images generated.

There are two reasons why I did it this way -
a) I know the long path to adding a package into all CI container
images and I wanted to avoid doing that just yet since ...
b) as we spoke offline, even if we did add the package, since /dev/kvm
is not available, this test will be skipped and not run in CI.

So for this test to run successfully, we must enable kvm tests in CI
(have /dev/kvm available). Then we can add the above package and
remove this. Also skip the test where OVMF.stateless.fd is not
available.

For now, I wanted to make sure at least everyone can run this test
manually where kvm is available even if they do not have the package
installed or if the package is not available for their host OS (for
example RHEL-9.6 does not have the stateless variant).


>
> > +
> > +    def common_vm_setup(self):
> > +        self.require_accelerator("kvm")
> > +
> > +        self.vm.set_console()
> > +
> > +        self.vm.add_args("-accel", "kvm")
> > +        self.vm.add_args("-smp", "2")
> > +        self.vm.add_args("-cpu", "host")
> > +        self.vm.add_args("-m", "2G")
> > +        self.vm.add_args("-nographic", "-nodefaults")
> > +
> > +        self.uki_path = self.ASSET_UKI.fetch()
> > +        self.ovmf_path = self.ASSET_OVMF.fetch()
> > +
> > +    def run_and_check(self):
> > +        self.vm.add_args('-kernel', self.uki_path)
> > +        self.vm.add_args("-bios", self.ovmf_path)
> > +        # enable KVM VMFD change on reset for a non-coco VM
> > +        self.vm.add_args("-machine", "q35,x-change-vmfd-on-reset=on")
> > +        # enable tracing
> > +        self.vm.add_args("-d", "trace:kvm_reset_vmfd")
> > +
> > +        try:
> > +            self.vm.launch()
> > +        except machine.VMLaunchFailure as e:
> > +            raise e
> > +
> > +        self.log.info('VM launched')
> > +        console_pattern = 'bash-5.1#'
> > +        wait_for_console_pattern(self, console_pattern)
> > +        self.log.info('VM ready with a bash prompt')
> > +
> > +        exec_command_and_wait_for_pattern(self, '/usr/sbin/reboot -f',
> > +                                          'reboot: machine restart')
> > +        console_pattern = '# --- Hello world ---'
> > +        wait_for_console_pattern(self, console_pattern)
> > +        self.vm.shutdown()
> > +        self.assertRegex(self.vm.get_log(),
> > +                         r'kvm_reset_vmfd \nINFO: virtual machine accel file '
> > +                         'descriptor has changed')
> > +
> > +    def test_vmfd_change_on_reset(self):
> > +        self.common_vm_setup()
> > +        self.run_and_check()
> > +
> > +if __name__ == '__main__':
> > +    QemuSystemTest.main()
> > --
> > 2.42.0
> >
> >
>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 11/32] kvm/i386: reload firmware for confidential guest reset
  2026-01-12 13:22 ` [PATCH v2 11/32] kvm/i386: reload firmware for " Ani Sinha
  2026-01-12 17:08   ` Paolo Bonzini
@ 2026-01-13 13:58   ` Bernhard Beschow
  1 sibling, 0 replies; 54+ messages in thread
From: Bernhard Beschow @ 2026-01-13 13:58 UTC (permalink / raw)
  To: qemu-devel, Ani Sinha, Paolo Bonzini, Marcelo Tosatti; +Cc: kvm



On 12 January 2026 13:22:24 UTC, Ani Sinha <anisinha@redhat.com> wrote:
>When IGVM is not being used by the confidential guest, the guest firmware has
>to be reloaded explicitly again into memory. This is because the memory into
>which the firmware was loaded before reset was encrypted and is thus lost
>upon reset. When IGVM is used, it is expected that the IGVM will contain the
>guest firmware and the execution of the IGVM directives will set up the guest
>firmware memory.
>
>Signed-off-by: Ani Sinha <anisinha@redhat.com>
>---
> target/i386/kvm/kvm.c | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
>diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>index 4fedc621b8..46c4f9487b 100644
>--- a/target/i386/kvm/kvm.c
>+++ b/target/i386/kvm/kvm.c
>@@ -51,6 +51,8 @@
> #include "qemu/config-file.h"
> #include "qemu/error-report.h"
> #include "qemu/memalign.h"
>+#include "qemu/datadir.h"
>+#include "hw/core/loader.h"
> #include "hw/i386/x86.h"
> #include "hw/i386/kvm/xen_evtchn.h"
> #include "hw/i386/pc.h"
>@@ -3267,6 +3269,22 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
> 
> static int xen_init_wrapper(MachineState *ms, KVMState *s);
> 
>+static void reload_bios_rom(X86MachineState *x86ms)
>+{
>+    int bios_size;
>+    const char *bios_name;
>+    char *filename;
>+
>+    bios_name = MACHINE(x86ms)->firmware ?: "bios.bin";
>+    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
>+
>+    bios_size = get_bios_size(x86ms, bios_name, filename);
>+
>+    void *ptr = memory_region_get_ram_ptr(&x86ms->bios);
>+    load_image_size(filename, ptr, bios_size);
>+    x86_firmware_configure(0x100000000ULL - bios_size, ptr, bios_size);
>+}

All code in this function is already present in x86-common.c. Can we move this function there (possibly renaming it to x86_bios_rom_reload()) and export it? This way, we could avoid code duplication and we wouldn't need to export additional functions as in the previous patch.

Best regards,
Bernhard

>+
> int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
> {
>     Error *local_err = NULL;
>@@ -3285,6 +3303,16 @@ int kvm_arch_vmfd_change_ops(MachineState *ms, KVMState *s)
>             error_report_err(local_err);
>             return ret;
>         }
>+        if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
>+            X86MachineState *x86ms = X86_MACHINE(ms);
>+            /*
>+             * If an IGVM file is specified then the firmware must be provided
>+             * in the IGVM file.
>+             */
>+            if (!x86ms->igvm) {
>+                reload_bios_rom(x86ms);
>+            }
>+        }
>     }
> 
>     ret = kvm_vm_enable_exception_payload(s);

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 27/32] ppc/openpic: create a new openpic device and reattach mem region on coco reset
  2026-01-12 13:22 ` [PATCH v2 27/32] ppc/openpic: create a new openpic device and reattach mem region on coco reset Ani Sinha
@ 2026-01-13 14:13   ` Bernhard Beschow
  0 siblings, 0 replies; 54+ messages in thread
From: Bernhard Beschow @ 2026-01-13 14:13 UTC (permalink / raw)
  To: Ani Sinha; +Cc: qemu-ppc, qemu-devel



On 12 January 2026 13:22:40 UTC, Ani Sinha <anisinha@redhat.com> wrote:
>For confidential guests during the reset process, the old KVM VM file
>descriptor is closed and a new one is created. When a new file descriptor is
>created, a new openpic device needs to be created against this new KVM VM file
>descriptor as well. Additionally, existing memory region needs to be reattached
>to this new openpic device and proper CPU attributes set associating new file
>descriptor. This change makes this happen with the help of a callback handler
>that gets called when the KVM VM file descriptor changes as a part of the
>confidential guest reset process.
>
>Signed-off-by: Ani Sinha <anisinha@redhat.com>
>---
> hw/intc/openpic_kvm.c | 108 ++++++++++++++++++++++++++++++++----------
> 1 file changed, 83 insertions(+), 25 deletions(-)
>
>diff --git a/hw/intc/openpic_kvm.c b/hw/intc/openpic_kvm.c
>index 9aafef5d9e..4fd70d4b32 100644
>--- a/hw/intc/openpic_kvm.c
>+++ b/hw/intc/openpic_kvm.c
>@@ -49,6 +49,7 @@ struct KVMOpenPICState {
>     uint32_t fd;
>     uint32_t model;
>     hwaddr mapped;
>+    NotifierWithReturn open_pic_vmfd_change_notifier;

I'd drop the "open_pic_" prefix here since the attribute resides inside a struct where the context is clear.

> };
> 
> static void kvm_openpic_set_irq(void *opaque, int n_IRQ, int level)
>@@ -114,6 +115,83 @@ static const MemoryRegionOps kvm_openpic_mem_ops = {
>     },
> };
> 
>+static int create_open_pic_device(KVMOpenPICState *opp, Error **errp)

Here in turn I'd stick to existing conventions and use a "kvm_openpic_" prefix. What about naming this function kvm_openpic_setup_vmfd() (or reversed: kvm_openpic_vmfd_setup())?

>+{
>+    int kvm_openpic_model;
>+    struct kvm_create_device cd = {0};
>+    KVMState *s = kvm_state;
>+    int ret;
>+
>+    switch (opp->model) {
>+    case OPENPIC_MODEL_FSL_MPIC_20:
>+        kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_20;
>+        break;
>+
>+    case OPENPIC_MODEL_FSL_MPIC_42:
>+        kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_42;
>+        break;
>+
>+    default:
>+        error_setg(errp, "Unsupported OpenPIC model %" PRIu32, opp->model);
>+        return -1;
>+    }
>+
>+    cd.type = kvm_openpic_model;
>+    ret = kvm_vm_ioctl(s, KVM_CREATE_DEVICE, &cd);
>+    if (ret < 0) {
>+        error_setg(errp, "Can't create device %d: %s",
>+                   cd.type, strerror(errno));
>+        return -1;
>+    }
>+    opp->fd = cd.fd;
>+
>+    return 0;
>+}
>+
>+static int open_pic_vmfd_handle_vmfd_change(NotifierWithReturn *notifier,
>+                                            void *data, Error **errp)

kvm_openpic_handle_vmfd_change() or similar?

>+{
>+    KVMOpenPICState *opp = container_of(notifier, KVMOpenPICState,
>+                                        open_pic_vmfd_change_notifier);
>+    uint64_t reg_base;
>+    struct kvm_device_attr attr;
>+    CPUState *cs;
>+    int ret;
>+
>+    /* close the old descriptor */
>+    close(opp->fd);
>+
>+    if (create_open_pic_device(opp, errp) < 0) {
>+        return -1;
>+    }
>+
>+    if (!opp->mapped) {
>+        return 0;
>+    }
>+
>+    reg_base = opp->mapped;
>+    attr.group = KVM_DEV_MPIC_GRP_MISC;
>+    attr.attr = KVM_DEV_MPIC_BASE_ADDR;
>+    attr.addr = (uint64_t)(unsigned long)&reg_base;
>+
>+    ret = ioctl(opp->fd, KVM_SET_DEVICE_ATTR, &attr);
>+    if (ret < 0) {
>+        fprintf(stderr, "%s: %s %" PRIx64 "\n", __func__,
>+                strerror(errno), reg_base);

Why not use error_set*()?
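
For illustration, a self-contained sketch of reporting the failure through
errp instead of fprintf(). DemoError and demo_error_setg_errno() are minimal
stand-ins written for this example (they only mimic the message-plus-strerror
shape of QEMU's error_setg_errno()), and demo_set_base_addr() is a
hypothetical handler that simulates a rejected KVM_SET_DEVICE_ATTR:

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for an error object; not QEMU's Error type. */
typedef struct DemoError {
    char msg[160];
} DemoError;

/* Formats the message, appends ": <strerror>", and stores it in *errp. */
static void demo_error_setg_errno(DemoError **errp, int os_errno,
                                  const char *fmt, ...)
{
    DemoError *err;
    va_list ap;
    int n;

    if (!errp) {
        return;                 /* caller is not interested in details */
    }
    err = calloc(1, sizeof(*err));
    if (!err) {
        return;
    }
    va_start(ap, fmt);
    n = vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    if (n >= 0 && (size_t)n < sizeof(err->msg)) {
        snprintf(err->msg + n, sizeof(err->msg) - (size_t)n, ": %s",
                 strerror(os_errno));
    }
    *errp = err;
}

/* Hypothetical handler: the caller decides how to report the error. */
int demo_set_base_addr(int simulated_errno, DemoError **errp)
{
    if (simulated_errno) {
        demo_error_setg_errno(errp, simulated_errno,
                              "failed to set OpenPIC base address");
        return -1;
    }
    return 0;
}
```

The handler then no longer prints anything itself; whoever invoked the
notifier chain owns the error and decides whether to report or propagate it.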

Best regards,
Bernhard

>+        return -1;
>+    }
>+
>+    CPU_FOREACH(cs) {
>+        ret = kvm_vcpu_enable_cap(cs, KVM_CAP_IRQ_MPIC, 0, opp->fd,
>+                                   kvm_arch_vcpu_id(cs));
>+        if (ret < 0) {
>+            return ret;
>+        }
>+    }
>+
>+    return 0;
>+}
>+
> static void kvm_openpic_region_add(MemoryListener *listener,
>                                    MemoryRegionSection *section)
> {
>@@ -197,37 +275,14 @@ static void kvm_openpic_realize(DeviceState *dev, Error **errp)
>     SysBusDevice *d = SYS_BUS_DEVICE(dev);
>     KVMOpenPICState *opp = KVM_OPENPIC(dev);
>     KVMState *s = kvm_state;
>-    int kvm_openpic_model;
>-    struct kvm_create_device cd = {0};
>-    int ret, i;
>+    int i;
> 
>     if (!kvm_check_extension(s, KVM_CAP_DEVICE_CTRL)) {
>         error_setg(errp, "Kernel is lacking Device Control API");
>         return;
>     }
> 
>-    switch (opp->model) {
>-    case OPENPIC_MODEL_FSL_MPIC_20:
>-        kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_20;
>-        break;
>-
>-    case OPENPIC_MODEL_FSL_MPIC_42:
>-        kvm_openpic_model = KVM_DEV_TYPE_FSL_MPIC_42;
>-        break;
>-
>-    default:
>-        error_setg(errp, "Unsupported OpenPIC model %" PRIu32, opp->model);
>-        return;
>-    }
>-
>-    cd.type = kvm_openpic_model;
>-    ret = kvm_vm_ioctl(s, KVM_CREATE_DEVICE, &cd);
>-    if (ret < 0) {
>-        error_setg(errp, "Can't create device %d: %s",
>-                   cd.type, strerror(errno));
>-        return;
>-    }
>-    opp->fd = cd.fd;
>+    create_open_pic_device(opp, errp);
> 
>     sysbus_init_mmio(d, &opp->mem);
>     qdev_init_gpio_in(dev, kvm_openpic_set_irq, OPENPIC_MAX_IRQ);
>@@ -236,6 +291,9 @@ static void kvm_openpic_realize(DeviceState *dev, Error **errp)
>     opp->mem_listener.region_del = kvm_openpic_region_del;
>     opp->mem_listener.name = "openpic-kvm";
>     memory_listener_register(&opp->mem_listener, &address_space_memory);
>+    opp->open_pic_vmfd_change_notifier.notify =
>+        open_pic_vmfd_handle_vmfd_change;
>+    kvm_vmfd_add_change_notifier(&opp->open_pic_vmfd_change_notifier);
> 
>     /* indicate pic capabilities */
>     msi_nonbroken = true;


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 23/32] hw/hyperv/vmbus: add support for confidential guest reset
  2026-01-12 13:22 ` [PATCH v2 23/32] hw/hyperv/vmbus: " Ani Sinha
@ 2026-01-14 13:38   ` Maciej S. Szmigiero
  2026-01-14 13:42     ` Daniel P. Berrangé
  0 siblings, 1 reply; 54+ messages in thread
From: Maciej S. Szmigiero @ 2026-01-14 13:38 UTC (permalink / raw)
  To: Ani Sinha; +Cc: qemu-devel

On 12.01.2026 14:22, Ani Sinha wrote:
> On confidential guests when the KVM virtual machine file descriptor changes as
> a part of the reset process, event file descriptors need to be reassociated
> with the new KVM VM file descriptor. This is achieved with the help of a
> callback handler that gets called when KVM VM file descriptor changes during
> the confidential guest reset process.
> 
> This patch is untested on confidential guests and only exists for completeness.
> 
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> ---
>   hw/hyperv/vmbus.c | 30 ++++++++++++++++++++++++++++++
>   1 file changed, 30 insertions(+)
> 

Quick question: is this patch set targeting QEMU 11.0 or which version?

Thanks,
Maciej




* Re: [PATCH v2 23/32] hw/hyperv/vmbus: add support for confidential guest reset
  2026-01-14 13:38   ` Maciej S. Szmigiero
@ 2026-01-14 13:42     ` Daniel P. Berrangé
  2026-01-14 14:40       ` Maciej S. Szmigiero
  0 siblings, 1 reply; 54+ messages in thread
From: Daniel P. Berrangé @ 2026-01-14 13:42 UTC (permalink / raw)
  To: Maciej S. Szmigiero; +Cc: Ani Sinha, qemu-devel

On Wed, Jan 14, 2026 at 02:38:54PM +0100, Maciej S. Szmigiero wrote:
> On 12.01.2026 14:22, Ani Sinha wrote:
> > On confidential guests when the KVM virtual machine file descriptor changes as
> > a part of the reset process, event file descriptors need to be reassociated
> > with the new KVM VM file descriptor. This is achieved with the help of a
> > callback handler that gets called when KVM VM file descriptor changes during
> > the confidential guest reset process.
> > 
> > This patch is untested on confidential guests and only exists for completeness.
> > 
> > Signed-off-by: Ani Sinha <anisinha@redhat.com>
> > ---
> >   hw/hyperv/vmbus.c | 30 ++++++++++++++++++++++++++++++
> >   1 file changed, 30 insertions(+)
> > 
> 
> Quick question: is this patch set targeting QEMU 11.0 or which version?

Patches are always assumed to be targeting current git master unless
the cover letter / subject line explicitly says otherwise.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v2 23/32] hw/hyperv/vmbus: add support for confidential guest reset
  2026-01-14 13:42     ` Daniel P. Berrangé
@ 2026-01-14 14:40       ` Maciej S. Szmigiero
  2026-01-15  7:26         ` Ani Sinha
  0 siblings, 1 reply; 54+ messages in thread
From: Maciej S. Szmigiero @ 2026-01-14 14:40 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Ani Sinha, qemu-devel

On 14.01.2026 14:42, Daniel P. Berrangé wrote:
> On Wed, Jan 14, 2026 at 02:38:54PM +0100, Maciej S. Szmigiero wrote:
>> On 12.01.2026 14:22, Ani Sinha wrote:
>>> On confidential guests when the KVM virtual machine file descriptor changes as
>>> a part of the reset process, event file descriptors need to be reassociated
>>> with the new KVM VM file descriptor. This is achieved with the help of a
>>> callback handler that gets called when KVM VM file descriptor changes during
>>> the confidential guest reset process.
>>>
>>> This patch is untested on confidential guests and only exists for completeness.
>>>
>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>> ---
>>>    hw/hyperv/vmbus.c | 30 ++++++++++++++++++++++++++++++
>>>    1 file changed, 30 insertions(+)
>>>
>>
>> Quick question: is this patch set targeting QEMU 11.0 or which version?
> 
> Patches are always assumed to be targeting current git master unless
> the cover letter / subject line explicitly says otherwise.

I was asking which QEMU release the submitter expects this patch set to
land in, rather than which git tree it is based on, but thanks anyway.
  
> With regards,
> Daniel

Thanks,
Maciej




* Re: [PATCH v2 23/32] hw/hyperv/vmbus: add support for confidential guest reset
  2026-01-14 14:40       ` Maciej S. Szmigiero
@ 2026-01-15  7:26         ` Ani Sinha
  0 siblings, 0 replies; 54+ messages in thread
From: Ani Sinha @ 2026-01-15  7:26 UTC (permalink / raw)
  To: Maciej S. Szmigiero; +Cc: Daniel P. Berrangé, qemu-devel

On Wed, Jan 14, 2026 at 8:10 PM Maciej S. Szmigiero
<mail@maciej.szmigiero.name> wrote:
>
> On 14.01.2026 14:42, Daniel P. Berrangé wrote:
> > On Wed, Jan 14, 2026 at 02:38:54PM +0100, Maciej S. Szmigiero wrote:
> >> On 12.01.2026 14:22, Ani Sinha wrote:
> >>> On confidential guests when the KVM virtual machine file descriptor changes as
> >>> a part of the reset process, event file descriptors need to be reassociated
> >>> with the new KVM VM file descriptor. This is achieved with the help of a
> >>> callback handler that gets called when KVM VM file descriptor changes during
> >>> the confidential guest reset process.
> >>>
> >>> This patch is untested on confidential guests and only exists for completeness.
> >>>
> >>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> >>> ---
> >>>    hw/hyperv/vmbus.c | 30 ++++++++++++++++++++++++++++++
> >>>    1 file changed, 30 insertions(+)
> >>>
> >>
> >> Quick question: is this patch set targeting QEMU 11.0 or which version?
> >
> > Patches are always assumed to be targeting current git master unless
> > the cover letter / subject line explicitly says otherwise.
>
> I was asking which QEMU release the submitter expects this patch set to
> land in, rather than which git tree it is based on, but thanks anyway.

Looking at https://wiki.qemu.org/Planning/11.0, I am not sure we
will make it into 11.0, since the soft freeze is March 10. Perhaps 11.1.




end of thread

Thread overview: 54+ messages
2026-01-12 13:22 [PATCH v2 00/32] Introduce support for confidential guest reset Ani Sinha
2026-01-12 13:22 ` [PATCH v2 01/32] i386/kvm: avoid installing duplicate msr entries in msr_handlers Ani Sinha
2026-01-12 13:22 ` [PATCH v2 02/32] hw/accel: add a per-accelerator callback to change VM accelerator handle Ani Sinha
2026-01-12 17:01   ` Paolo Bonzini
2026-01-12 13:22 ` [PATCH v2 03/32] system/physmem: add helper to reattach existing memory after KVM VM fd change Ani Sinha
2026-01-12 13:22 ` [PATCH v2 04/32] accel/kvm: add changes required to support KVM VM file descriptor change Ani Sinha
2026-01-12 17:02   ` Paolo Bonzini
2026-01-13  5:22     ` Ani Sinha
2026-01-13  5:50       ` Paolo Bonzini
2026-01-12 13:22 ` [PATCH v2 05/32] accel/kvm: mark guest state as unprotected after vm " Ani Sinha
2026-01-12 13:22 ` [PATCH v2 06/32] accel/kvm: add a notifier to indicate KVM VM file descriptor has changed Ani Sinha
2026-01-12 13:22 ` [PATCH v2 07/32] accel/kvm: add notifier to inform that the KVM VM file fd is about to be changed Ani Sinha
2026-01-12 13:22 ` [PATCH v2 08/32] i386/kvm: unregister smram listeners prior to vm file descriptor change Ani Sinha
2026-01-12 13:22 ` [PATCH v2 09/32] kvm/i386: implement architecture support for kvm " Ani Sinha
2026-01-12 17:06   ` Paolo Bonzini
2026-01-12 13:22 ` [PATCH v2 10/32] hw/i386: refactor x86_bios_rom_init for reuse in confidential guest reset Ani Sinha
2026-01-12 13:22 ` [PATCH v2 11/32] kvm/i386: reload firmware for " Ani Sinha
2026-01-12 17:08   ` Paolo Bonzini
2026-01-13 13:58   ` Bernhard Beschow
2026-01-12 13:22 ` [PATCH v2 12/32] accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon reset Ani Sinha
2026-01-12 13:22 ` [PATCH v2 13/32] i386/tdx: refactor TDX firmware memory initialization code into a new function Ani Sinha
2026-01-12 13:22 ` [PATCH v2 14/32] i386/tdx: finalize TDX guest state upon reset Ani Sinha
2026-01-12 13:22 ` [PATCH v2 15/32] i386/tdx: add a pre-vmfd change notifier to reset tdx state Ani Sinha
2026-01-12 13:22 ` [PATCH v2 16/32] i386/sev: add migration blockers only once Ani Sinha
2026-01-12 17:16   ` Paolo Bonzini
2026-01-12 13:22 ` [PATCH v2 17/32] i386/sev: add notifiers " Ani Sinha
2026-01-12 13:22 ` [PATCH v2 18/32] i386/sev: free existing launch update data and kernel hashes data on init Ani Sinha
2026-01-12 13:22 ` [PATCH v2 19/32] i386/sev: add support for confidential guest reset Ani Sinha
2026-01-12 17:12   ` Paolo Bonzini
2026-01-12 13:22 ` [PATCH v2 20/32] hw/vfio: generate new file fd for pseudo device and rebind existing descriptors Ani Sinha
2026-01-12 13:22 ` [PATCH v2 21/32] kvm/i8254: refactor pit initialization into a helper Ani Sinha
2026-01-12 13:22 ` [PATCH v2 22/32] kvm/i8254: add support for confidential guest reset Ani Sinha
2026-01-12 13:22 ` [PATCH v2 23/32] hw/hyperv/vmbus: " Ani Sinha
2026-01-14 13:38   ` Maciej S. Szmigiero
2026-01-14 13:42     ` Daniel P. Berrangé
2026-01-14 14:40       ` Maciej S. Szmigiero
2026-01-15  7:26         ` Ani Sinha
2026-01-12 13:22 ` [PATCH v2 24/32] accel/kvm: add a per-confidential class callback to unlock guest state Ani Sinha
2026-01-12 17:11   ` Paolo Bonzini
2026-01-12 13:22 ` [PATCH v2 25/32] kvm/xen-emu: re-initialize capabilities during confidential guest reset Ani Sinha
2026-01-12 17:19   ` Paolo Bonzini
2026-01-12 18:22     ` David Woodhouse
2026-01-13  5:26     ` Ani Sinha
2026-01-13  5:48       ` Paolo Bonzini
2026-01-12 13:22 ` [PATCH v2 26/32] kvm/xen_evtchn: add support for " Ani Sinha
2026-01-12 13:22 ` [PATCH v2 27/32] ppc/openpic: create a new openpic device and reattach mem region on coco reset Ani Sinha
2026-01-13 14:13   ` Bernhard Beschow
2026-01-12 13:22 ` [PATCH v2 28/32] kvm/vcpu: add notifiers to inform vcpu file descriptor change Ani Sinha
2026-01-12 13:22 ` [PATCH v2 29/32] kvm/i386/apic: set local apic after vcpu file descriptors changed Ani Sinha
2026-01-12 13:22 ` [PATCH v2 30/32] kvm/clock: add support for confidential guest reset Ani Sinha
2026-01-12 13:22 ` [PATCH v2 31/32] hw/machine: introduce machine specific option 'x-change-vmfd-on-reset' Ani Sinha
2026-01-12 13:22 ` [PATCH v2 32/32] tests/functional/x86_64: add functional test to exercise vm fd change on reset Ani Sinha
2026-01-12 14:36   ` Daniel P. Berrangé
2026-01-13  5:53     ` Ani Sinha
