* [RFC PATCH] target/i386: SEV: Allow pflash devices with SEV-ES guests
@ 2026-06-02 7:12 Naveen N Rao (AMD)
2026-06-03 4:32 ` Naveen N Rao
2026-06-10 0:35 ` Michael Roth
0 siblings, 2 replies; 11+ messages in thread
From: Naveen N Rao (AMD) @ 2026-06-02 7:12 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
Cc: Daniel P. Berrangé, Eduardo Habkost, Eric Blake,
Markus Armbruster, Marcelo Tosatti, Zhao Liu, Nikunj A Dadhania,
Tom Lendacky, Michael Roth, Roy Hopkins, Srikanth Aithal,
Joerg Roedel
KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for
SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0
("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM")
stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests
(KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM
type SEV-ES guests. As a result of this, it is no longer possible to
start SEV-ES guests with any SEV feature enabled (in particular,
debug-swap) with pflash devices.
This is an issue since SEV-ES guests have historically used pflash
devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap
while using pflash devices, so work around the KVM limitation by
switching to using a VMA-based write protection. This allows
well-behaved SEV-ES guests to continue to work with pflash devices and
enable debug-swap. Mis-behaving guests trying to write to the protected
OVMF area will be killed.
Enable VMA protection and set the memory to be RO when adding the KVM
memory slot. Because pflash devices support command-mode, change VMA
protection to RW when tearing down the KVM memory slot. KVM
SEV_LAUNCH_UPDATE also requires memory to be RW, so disable the
protection when calling that.
Print a warning when switching to VMA-based protection so that it is
clear that KVM itself isn't supporting readonly memory, and that a
workaround is in place. Users can plan on switching to using '-bios'.
Finally, drop the check rejecting SEV-ES guests with SEV features so
that debug-swap can be enabled.
Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org>
---
Background discussion on this issue:
http://lore.kernel.org/r/aUq9_cUDWeEW_qli@google.com
This series depends on VMSA features support:
http://lore.kernel.org/r/cover.1779281646.git.naveen@kernel.org
- Naveen
include/system/kvm.h | 5 ++++
include/system/kvm_int.h | 1 +
accel/kvm/kvm-all.c | 52 +++++++++++++++++++++++++++++++++++++++-
hw/i386/pc_sysfw.c | 19 +++++++++------
target/i386/sev.c | 21 +++++++++++-----
5 files changed, 84 insertions(+), 14 deletions(-)
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 5fa33eddda38..585058bd6f1c 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -555,6 +555,8 @@ uint32_t kvm_dirty_ring_size(void);
void kvm_mark_guest_state_protected(void);
+void kvm_enable_ro_mem_vma_protection(void);
+
/**
* kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page
* reported for the VM.
@@ -568,6 +570,9 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size);
int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
+void kvm_set_memory_readonly(void *addr, size_t len);
+void kvm_set_memory_readwrite(void *addr, size_t len);
+
/* argument to vmfd change notifier */
typedef struct VmfdChangeNotifier {
int vmfd;
diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h
index 0876aac938d3..0e083a56ce2a 100644
--- a/include/system/kvm_int.h
+++ b/include/system/kvm_int.h
@@ -123,6 +123,7 @@ struct KVMState
OnOffAuto kernel_irqchip_split;
bool sync_mmu;
bool guest_state_protected;
+ bool guest_wants_ro_mem_vma_protection;
uint64_t manual_dirty_log_protect;
/*
* Older POSIX says that ioctl numbers are signed int, but in
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 96f90ebb240f..4208df5b25ac 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1629,6 +1629,42 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size)
return kvm_set_memory_attributes(start, size, 0);
}
+static void kvm_set_memory_flags(void *addr, size_t len, int flags)
+{
+ if (mprotect(addr, len, flags)) {
+ error_report("failed to apply memory protection "
+ "(0x%" HWADDR_PRIx "+0x%" PRIx64 ") error '%s'",
+ (hwaddr)addr, len, strerror(errno));
+ exit(1);
+ }
+}
+
+void kvm_set_memory_readonly(void *addr, size_t len)
+{
+ if (kvm_state->guest_wants_ro_mem_vma_protection) {
+ kvm_set_memory_flags(addr, len, PROT_READ);
+ }
+}
+
+void kvm_set_memory_readwrite(void *addr, size_t len)
+{
+ if (kvm_state->guest_wants_ro_mem_vma_protection) {
+ kvm_set_memory_flags(addr, len, PROT_READ | PROT_WRITE);
+ }
+}
+
+static bool kvm_mem_wants_vma_protection(MemoryRegion *mr)
+{
+ if (!memory_region_is_ram(mr) &&
+ (mr->readonly || mr->rom_device) &&
+ !kvm_readonly_mem_allowed &&
+ kvm_state->guest_wants_ro_mem_vma_protection) {
+ return true;
+ }
+
+ return false;
+}
+
/* Called with KVMMemoryListener.slots_lock held */
static void kvm_set_phys_mem(KVMMemoryListener *kml,
MemoryRegionSection *section, bool add)
@@ -1642,7 +1678,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
void *ram;
if (!memory_region_is_ram(mr)) {
- if (writable || !kvm_readonly_mem_allowed) {
+ if (writable || (!kvm_readonly_mem_allowed &&
+ !kvm_state->guest_wants_ro_mem_vma_protection)) {
return;
} else if (!mr->romd_mode) {
/* If the memory device is not in romd_mode, then we actually want
@@ -1697,6 +1734,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
kvm_slot_sync_dirty_pages(mem);
}
+ if (kvm_mem_wants_vma_protection(mr)) {
+ kvm_set_memory_readwrite(mem->ram, mem->memory_size);
+ }
+
/* unregister the slot */
g_free(mem->dirty_bmap);
mem->dirty_bmap = NULL;
@@ -1746,6 +1787,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
}
}
+ if (kvm_mem_wants_vma_protection(mr)) {
+ kvm_set_memory_readonly(mem->ram, mem->memory_size);
+ }
+
start_addr += slot_size;
ram_start_offset += slot_size;
ram += slot_size;
@@ -4771,6 +4816,11 @@ void kvm_mark_guest_state_protected(void)
kvm_state->guest_state_protected = true;
}
+void kvm_enable_ro_mem_vma_protection(void)
+{
+ kvm_state->guest_wants_ro_mem_vma_protection = true;
+}
+
int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
{
int fd;
diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index 1a41a5972bd0..9590458e00c5 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -254,13 +254,18 @@ void pc_system_firmware_init(PCMachineState *pcms,
}
} else {
if (kvm_enabled() && !kvm_readonly_mem_enabled()) {
- /*
- * Older KVM cannot execute from device memory. So, flash
- * memory cannot be used unless the readonly memory kvm
- * capability is present.
- */
- error_report("pflash with kvm requires KVM readonly memory support");
- exit(1);
+ if (sev_es_enabled() && !sev_snp_enabled()) {
+ warn_report("pflash not supported with SEV-ES guests, "
+ "attempting VMA based protection");
+ } else {
+ /*
+ * Older KVM cannot execute from device memory. So, flash
+ * memory cannot be used unless the readonly memory kvm
+ * capability is present.
+ */
+ error_report("pflash with kvm requires KVM readonly memory support");
+ exit(1);
+ }
}
pc_system_flash_map(pcms, rom_memory);
diff --git a/target/i386/sev.c b/target/i386/sev.c
index f04ae4e91f3e..82cf2c562729 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -550,12 +550,6 @@ static int check_sev_features(SevCommonState *sev_common, uint64_t sev_features,
return -1;
}
} else {
- if (sev_features && sev_es_enabled()) {
- error_setg(errp,
- "%s: SEV features are not supported with SEV-ES at this time",
- __func__);
- return -1;
- }
if (sev_features & SVM_SEV_FEAT_SNP_ACTIVE) {
error_setg(errp,
"%s: SEV_SNP is not enabled but is enabled in VMSA sev_features",
@@ -2024,6 +2018,15 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
return -1;
}
+ /*
+ * Use VMA-based protection for SEV-ES guests that enable any
+ * SEV feature, since KVM does not advertise readonly memory
+ * support for non-default type SEV guests.
+ */
+ if (sev_es_enabled() && SEV_COMMON(cgs)->sev_features) {
+ kvm_enable_ro_mem_vma_protection();
+ }
+
if (!cgs->ready) {
/*
* SEV uses these notifiers to register/pin pages prior to guest use,
@@ -2111,7 +2114,13 @@ sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp)
if (sev_check_state(sev_common, SEV_STATE_LAUNCH_UPDATE)) {
int ret;
+ /*
+ * KVM requires these pages to be RW, so remove VMA RO protection
+ * for the duration of SEV_LAUNCH_UPDATE if using SEV features.
+ */
+ kvm_set_memory_readwrite(ptr, len);
ret = klass->launch_update_data(sev_common, gpa, ptr, len, errp);
+ kvm_set_memory_readonly(ptr, len);
if (ret < 0) {
return ret;
}
base-commit: 5611a9268dae7b7ff99d478ed134052a9fc7e9f7
prerequisite-patch-id: dc27ad6297d47d063b04fa797c1b8203ee97d9c8
prerequisite-patch-id: 603eff49233c4b0483e7c405754b95aa455dd38c
prerequisite-patch-id: d4085e72ecfb0fbf977f7358d1edd29951b93784
prerequisite-patch-id: f6f201825b1e56f89a87a26bc457b3e6018aee49
prerequisite-patch-id: 08d79cec4c3f117178be8f6c866ff1be08e971f3
prerequisite-patch-id: c112bccab9ab9cee2d0227516fc857590b99a75b
prerequisite-patch-id: f42fd829d0ea4909537e722477057a4013a247ab
prerequisite-patch-id: 67136368ed1d2fa0ae55fee4368a7bd1fe394368
prerequisite-patch-id: 08349bb1e0e11ee1518a9041771302c97866b5cd
--
2.54.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [RFC PATCH] target/i386: SEV: Allow pflash devices with SEV-ES guests 2026-06-02 7:12 [RFC PATCH] target/i386: SEV: Allow pflash devices with SEV-ES guests Naveen N Rao (AMD) @ 2026-06-03 4:32 ` Naveen N Rao 2026-06-10 0:35 ` Michael Roth 1 sibling, 0 replies; 11+ messages in thread From: Naveen N Rao @ 2026-06-03 4:32 UTC (permalink / raw) To: Paolo Bonzini, qemu-devel Cc: Daniel P. Berrangé, Eduardo Habkost, Eric Blake, Markus Armbruster, Marcelo Tosatti, Zhao Liu, Nikunj A Dadhania, Tom Lendacky, Michael Roth, Roy Hopkins, Srikanth Aithal, Joerg Roedel, Manali Shukla On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote: > KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for > SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0 > ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM") > stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests > (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM > type SEV-ES guests. As a result of this, it is no longer possible to > start SEV-ES guests with any SEV feature enabled (in particular, > debug-swap) with pflash devices. > > This is an issue since SEV-ES guests have historically used pflash > devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap > while using pflash devices, so work around the KVM limitation by > switching to using a VMA-based write protection. This allows > well-behaved SEV-ES guests to continue to work with pflash devices and > enable debug-swap. Mis-behaving guests trying to write to the protected > OVMF area will be killed. > > Enable VMA protection and set the memory to be RO when adding the KVM > memory slot. Because pflash devices support command-mode, change VMA > protection to RW when tearing down the KVM memory slot. KVM > SEV_LAUNCH_UPDATE also requires memory to be RW, so disable the > protection when calling that. > > Print a warning when switching to VMA-based protection so that it is > clear that KVM itself isn't supporting readonly memory, and that a > workaround is in place. Users can plan on switching to using '-bios'. > > Finally, drop the check rejecting SEV-ES guests with SEV features so > that debug-swap can be enabled. > > > Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org> > --- > Background discussion on this issue: > http://lore.kernel.org/r/aUq9_cUDWeEW_qli@google.com > > This series depends on VMSA features support: > http://lore.kernel.org/r/cover.1779281646.git.naveen@kernel.org > ... > @@ -2024,6 +2018,15 @@ static int > sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) > return -1; > } > > + /* > + * Use VMA-based protection for SEV-ES guests that enable any > + * SEV feature, since KVM does not advertise readonly memory > + * support for non-default type SEV guests. > + */ > + if (sev_es_enabled() && SEV_COMMON(cgs)->sev_features) { > + kvm_enable_ro_mem_vma_protection(); > + } > + Manali reported that -bios no longer works due to this change. That's because we enable protection for all SEV-ES guests here. A simple fix is to move this to where we throw the warning for pflash today (diff below). It looks like we will need an additional change for kernel-hashes since that also calls into sev_encrypt_flash(), so we need to move calls to mprotect to the OVMF caller. The diff below also addresses that. I will wait for some feedback on this patch/approach before posting the next version. - Naveen --- hw/i386/pc_sysfw.c | 1 + hw/i386/pc_sysfw_ovmf.c | 8 ++++++++ target/i386/sev.c | 15 --------------- 3 files changed, 9 insertions(+), 15 deletions(-) diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c index 9590458e00c5..41da3685b98f 100644 --- a/hw/i386/pc_sysfw.c +++ b/hw/i386/pc_sysfw.c @@ -257,6 +257,7 @@ void pc_system_firmware_init(PCMachineState *pcms, if (sev_es_enabled() && !sev_snp_enabled()) { warn_report("pflash not supported with SEV-ES guests, " "attempting VMA based protection"); + kvm_enable_ro_mem_vma_protection(); } else { /* * Older KVM cannot execute from device memory. So, flash diff --git a/hw/i386/pc_sysfw_ovmf.c b/hw/i386/pc_sysfw_ovmf.c index 2f7d15c9f3eb..d7a2e36aa736 100644 --- a/hw/i386/pc_sysfw_ovmf.c +++ b/hw/i386/pc_sysfw_ovmf.c @@ -30,6 +30,7 @@ #include "cpu.h" #include "target/i386/sev.h" #include "kvm/tdx.h" +#include "system/kvm.h" #define OVMF_TABLE_FOOTER_GUID "96b582de-1fb2-45f7-baea-a366c55a082d" @@ -184,7 +185,14 @@ void x86_firmware_configure(hwaddr gpa, void *ptr, int size) exit(1); } + /* + * KVM requires these pages to be RW for SEV_LAUNCH_UPDATE_DATA, + * so remove VMA RO protection if any, for the duration of + * sev_encrypt_flash(). + */ + kvm_set_memory_readwrite(ptr, size); sev_encrypt_flash(gpa, ptr, size, &error_fatal); + kvm_set_memory_readonly(ptr, size); } else if (is_tdx_vm()) { ret = tdx_parse_tdvf(ptr, size); if (ret) { diff --git a/target/i386/sev.c b/target/i386/sev.c index 0876e7c1fa1e..dc8851a12927 100644 --- a/target/i386/sev.c +++ b/target/i386/sev.c @@ -2018,15 +2018,6 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) return -1; } - /* - * Use VMA-based protection for SEV-ES guests that enable any - * SEV feature, since KVM does not advertise readonly memory - * support for non-default type SEV guests. - */ - if (sev_es_enabled() && SEV_COMMON(cgs)->sev_features) { - kvm_enable_ro_mem_vma_protection(); - } - if (!cgs->ready) { /* * SEV uses these notifiers to register/pin pages prior to guest use, @@ -2114,13 +2105,7 @@ sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp) if (sev_check_state(sev_common, SEV_STATE_LAUNCH_UPDATE)) { int ret; - /* - * KVM requires these pages to be RW, so remove VMA RO protection - * for the duration of SEV_LAUNCH_UPDATE if using SEV features. - */ - kvm_set_memory_readwrite(ptr, len); ret = klass->launch_update_data(sev_common, gpa, ptr, len, errp); - kvm_set_memory_readonly(ptr, len); if (ret < 0) { return ret; } -- ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] target/i386: SEV: Allow pflash devices with SEV-ES guests 2026-06-02 7:12 [RFC PATCH] target/i386: SEV: Allow pflash devices with SEV-ES guests Naveen N Rao (AMD) 2026-06-03 4:32 ` Naveen N Rao @ 2026-06-10 0:35 ` Michael Roth 2026-06-10 11:30 ` Naveen N Rao 1 sibling, 1 reply; 11+ messages in thread From: Michael Roth @ 2026-06-10 0:35 UTC (permalink / raw) To: Naveen N Rao (AMD) Cc: Paolo Bonzini, qemu-devel, Daniel P. Berrangé, Eduardo Habkost, Eric Blake, Markus Armbruster, Marcelo Tosatti, Zhao Liu, Nikunj A Dadhania, Tom Lendacky, Roy Hopkins, Srikanth Aithal, Joerg Roedel On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote: > KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for > SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0 > ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM") > stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests > (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM > type SEV-ES guests. As a result of this, it is no longer possible to > start SEV-ES guests with any SEV feature enabled (in particular, > debug-swap) with pflash devices. > > This is an issue since SEV-ES guests have historically used pflash > devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap > while using pflash devices, so work around the KVM limitation by > switching to using a VMA-based write protection. This allows > well-behaved SEV-ES guests to continue to work with pflash devices and > enable debug-swap. Mis-behaving guests trying to write to the protected > OVMF area will be killed. Based on Sean's description, a write access to a read-only memslot would cause the vCPU to permanently spin on #NPFs if trying to write to it as MMIO due to #VC handler not triggering, and that's why we don't support read-only memslots. But since SEV-ES was previously working with pflash, it seems like it does not rely on this functionality... So if OVMF isn't writing to write-protected memory, then it wouldn't be triggering the MMIO emulation path in the first place. And if we don't care about enabling the emulation path in this case... then I'm not sure the original reasons for not allowing it for SEV-ES/SNP are applicable. It feels like KVM_CAP_READONLY_MEM is more like KVM_CAP_EMULATE_ON_WRITE, whereas we literally just need as actually slot that's permanently mapped in the NPT without write access. Is that an accurate summary of the situation? If so, maybe we can introduce a KVM_CAP_READONLY_NO_MMIO that captures this and simply errors out if it hits the KVM_PFN_ERR_RO_FAULT. Or, for a QEMU-specific workaround, just have a pflash implementation that doesn't rely on KVM_MEM_READONLY for cases like this where we don't need MMIO emulation. There's actually another case in hw/nvram/nrf51_nvm.c where this would be applicable. I guess it could be done automatically for the confidential VM case to retain cmdline compatibility...though you're wanting to add the debugswap feature anyway so not sure how important that aspect is. Thanks, Mike > > Enable VMA protection and set the memory to be RO when adding the KVM > memory slot. Because pflash devices support command-mode, change VMA > protection to RW when tearing down the KVM memory slot. KVM > SEV_LAUNCH_UPDATE also requires memory to be RW, so disable the > protection when calling that. > > Print a warning when switching to VMA-based protection so that it is > clear that KVM itself isn't supporting readonly memory, and that a > workaround is in place. Users can plan on switching to using '-bios'. > > Finally, drop the check rejecting SEV-ES guests with SEV features so > that debug-swap can be enabled. > > > Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org> > --- > Background discussion on this issue: > http://lore.kernel.org/r/aUq9_cUDWeEW_qli@google.com > > This series depends on VMSA features support: > http://lore.kernel.org/r/cover.1779281646.git.naveen@kernel.org > > > - Naveen > > > include/system/kvm.h | 5 ++++ > include/system/kvm_int.h | 1 + > accel/kvm/kvm-all.c | 52 +++++++++++++++++++++++++++++++++++++++- > hw/i386/pc_sysfw.c | 19 +++++++++------ > target/i386/sev.c | 21 +++++++++++----- > 5 files changed, 84 insertions(+), 14 deletions(-) > > diff --git a/include/system/kvm.h b/include/system/kvm.h > index 5fa33eddda38..585058bd6f1c 100644 > --- a/include/system/kvm.h > +++ b/include/system/kvm.h > @@ -555,6 +555,8 @@ uint32_t kvm_dirty_ring_size(void); > > void kvm_mark_guest_state_protected(void); > > +void kvm_enable_ro_mem_vma_protection(void); > + > /** > * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page > * reported for the VM. > @@ -568,6 +570,9 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size); > > int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private); > > +void kvm_set_memory_readonly(void *addr, size_t len); > +void kvm_set_memory_readwrite(void *addr, size_t len); > + > /* argument to vmfd change notifier */ > typedef struct VmfdChangeNotifier { > int vmfd; > diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h > index 0876aac938d3..0e083a56ce2a 100644 > --- a/include/system/kvm_int.h > +++ b/include/system/kvm_int.h > @@ -123,6 +123,7 @@ struct KVMState > OnOffAuto kernel_irqchip_split; > bool sync_mmu; > bool guest_state_protected; > + bool guest_wants_ro_mem_vma_protection; > uint64_t manual_dirty_log_protect; > /* > * Older POSIX says that ioctl numbers are signed int, but in > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > index 96f90ebb240f..4208df5b25ac 100644 > --- a/accel/kvm/kvm-all.c > +++ b/accel/kvm/kvm-all.c > @@ -1629,6 +1629,42 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size) > return kvm_set_memory_attributes(start, size, 0); > } > > +static void kvm_set_memory_flags(void *addr, size_t len, int flags) > +{ > + if (mprotect(addr, len, flags)) { > + error_report("failed to apply memory protection " > + "(0x%" HWADDR_PRIx "+0x%" PRIx64 ") error '%s'", > + (hwaddr)addr, len, strerror(errno)); > + exit(1); > + } > +} > + > +void kvm_set_memory_readonly(void *addr, size_t len) > +{ > + if (kvm_state->guest_wants_ro_mem_vma_protection) { > + kvm_set_memory_flags(addr, len, PROT_READ); > + } > +} > + > +void kvm_set_memory_readwrite(void *addr, size_t len) > +{ > + if (kvm_state->guest_wants_ro_mem_vma_protection) { > + kvm_set_memory_flags(addr, len, PROT_READ | PROT_WRITE); > + } > +} > + > +static bool kvm_mem_wants_vma_protection(MemoryRegion *mr) > +{ > + if (!memory_region_is_ram(mr) && > + (mr->readonly || mr->rom_device) && > + !kvm_readonly_mem_allowed && > + kvm_state->guest_wants_ro_mem_vma_protection) { > + return true; > + } > + > + return false; > +} > + > /* Called with KVMMemoryListener.slots_lock held */ > static void kvm_set_phys_mem(KVMMemoryListener *kml, > MemoryRegionSection *section, bool add) > @@ -1642,7 +1678,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > void *ram; > > if (!memory_region_is_ram(mr)) { > - if (writable || !kvm_readonly_mem_allowed) { > + if (writable || (!kvm_readonly_mem_allowed && > + !kvm_state->guest_wants_ro_mem_vma_protection)) { > return; > } else if (!mr->romd_mode) { > /* If the memory device is not in romd_mode, then we actually want > @@ -1697,6 +1734,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > kvm_slot_sync_dirty_pages(mem); > } > > + if (kvm_mem_wants_vma_protection(mr)) { > + kvm_set_memory_readwrite(mem->ram, mem->memory_size); > + } > + > /* unregister the slot */ > g_free(mem->dirty_bmap); > mem->dirty_bmap = NULL; > @@ -1746,6 +1787,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > } > } > > + if (kvm_mem_wants_vma_protection(mr)) { > + kvm_set_memory_readonly(mem->ram, mem->memory_size); > + } > + > start_addr += slot_size; > ram_start_offset += slot_size; > ram += slot_size; > @@ -4771,6 +4816,11 @@ void kvm_mark_guest_state_protected(void) > kvm_state->guest_state_protected = true; > } > > +void kvm_enable_ro_mem_vma_protection(void) > +{ > + kvm_state->guest_wants_ro_mem_vma_protection = true; > +} > + > int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp) > { > int fd; > diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c > index 1a41a5972bd0..9590458e00c5 100644 > --- a/hw/i386/pc_sysfw.c > +++ b/hw/i386/pc_sysfw.c > @@ -254,13 +254,18 @@ void pc_system_firmware_init(PCMachineState *pcms, > } > } else { > if (kvm_enabled() && !kvm_readonly_mem_enabled()) { > - /* > - * Older KVM cannot execute from device memory. So, flash > - * memory cannot be used unless the readonly memory kvm > - * capability is present. > - */ > - error_report("pflash with kvm requires KVM readonly memory support"); > - exit(1); > + if (sev_es_enabled() && !sev_snp_enabled()) { > + warn_report("pflash not supported with SEV-ES guests, " > + "attempting VMA based protection"); > + } else { > + /* > + * Older KVM cannot execute from device memory. So, flash > + * memory cannot be used unless the readonly memory kvm > + * capability is present. > + */ > + error_report("pflash with kvm requires KVM readonly memory support"); > + exit(1); > + } > } > > pc_system_flash_map(pcms, rom_memory); > diff --git a/target/i386/sev.c b/target/i386/sev.c > index f04ae4e91f3e..82cf2c562729 100644 > --- a/target/i386/sev.c > +++ b/target/i386/sev.c > @@ -550,12 +550,6 @@ static int check_sev_features(SevCommonState *sev_common, uint64_t sev_features, > return -1; > } > } else { > - if (sev_features && sev_es_enabled()) { > - error_setg(errp, > - "%s: SEV features are not supported with SEV-ES at this time", > - __func__); > - return -1; > - } > if (sev_features & SVM_SEV_FEAT_SNP_ACTIVE) { > error_setg(errp, > "%s: SEV_SNP is not enabled but is enabled in VMSA sev_features", > @@ -2024,6 +2018,15 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) > return -1; > } > > + /* > + * Use VMA-based protection for SEV-ES guests that enable any > + * SEV feature, since KVM does not advertise readonly memory > + * support for non-default type SEV guests. > + */ > + if (sev_es_enabled() && SEV_COMMON(cgs)->sev_features) { > + kvm_enable_ro_mem_vma_protection(); > + } > + > if (!cgs->ready) { > /* > * SEV uses these notifiers to register/pin pages prior to guest use, > @@ -2111,7 +2114,13 @@ sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp) > if (sev_check_state(sev_common, SEV_STATE_LAUNCH_UPDATE)) { > int ret; > > + /* > + * KVM requires these pages to be RW, so remove VMA RO protection > + * for the duration of SEV_LAUNCH_UPDATE if using SEV features. > + */ > + kvm_set_memory_readwrite(ptr, len); > ret = klass->launch_update_data(sev_common, gpa, ptr, len, errp); > + kvm_set_memory_readonly(ptr, len); > if (ret < 0) { > return ret; > } > > base-commit: 5611a9268dae7b7ff99d478ed134052a9fc7e9f7 > prerequisite-patch-id: dc27ad6297d47d063b04fa797c1b8203ee97d9c8 > prerequisite-patch-id: 603eff49233c4b0483e7c405754b95aa455dd38c > prerequisite-patch-id: d4085e72ecfb0fbf977f7358d1edd29951b93784 > prerequisite-patch-id: f6f201825b1e56f89a87a26bc457b3e6018aee49 > prerequisite-patch-id: 08d79cec4c3f117178be8f6c866ff1be08e971f3 > prerequisite-patch-id: c112bccab9ab9cee2d0227516fc857590b99a75b > prerequisite-patch-id: f42fd829d0ea4909537e722477057a4013a247ab > prerequisite-patch-id: 67136368ed1d2fa0ae55fee4368a7bd1fe394368 > prerequisite-patch-id: 08349bb1e0e11ee1518a9041771302c97866b5cd > -- > 2.54.0 > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] target/i386: SEV: Allow pflash devices with SEV-ES guests 2026-06-10 0:35 ` Michael Roth @ 2026-06-10 11:30 ` Naveen N Rao 2026-06-10 14:52 ` Tom Lendacky 2026-06-10 19:41 ` Michael Roth 0 siblings, 2 replies; 11+ messages in thread From: Naveen N Rao @ 2026-06-10 11:30 UTC (permalink / raw) To: Michael Roth, Sean Christopherson Cc: Paolo Bonzini, qemu-devel, Daniel P. Berrangé, Eduardo Habkost, Eric Blake, Markus Armbruster, Marcelo Tosatti, Zhao Liu, Nikunj A Dadhania, Tom Lendacky, Roy Hopkins, Srikanth Aithal, Joerg Roedel [+Sean] Hi Mike, On Tue, Jun 09, 2026 at 07:35:46PM -0500, Michael Roth wrote: > On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote: > > KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for > > SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0 > > ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM") > > stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests > > (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM > > type SEV-ES guests. As a result of this, it is no longer possible to > > start SEV-ES guests with any SEV feature enabled (in particular, > > debug-swap) with pflash devices. > > > > This is an issue since SEV-ES guests have historically used pflash > > devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap > > while using pflash devices, so work around the KVM limitation by > > switching to using a VMA-based write protection. This allows > > well-behaved SEV-ES guests to continue to work with pflash devices and > > enable debug-swap. Mis-behaving guests trying to write to the protected > > OVMF area will be killed. > > Based on Sean's description, a write access to a read-only memslot would > cause the vCPU to permanently spin on #NPFs if trying to write to it as > MMIO due to #VC handler not triggering, and that's why we don't support > read-only memslots. But since SEV-ES was previously working with pflash, > it seems like it does not rely on this functionality... Right, normal well-behaved SEV-ES/SNP guests work just fine as they don't write to any of the read-only areas. > > So if OVMF isn't writing to write-protected memory, then it wouldn't be > triggering the MMIO emulation path in the first place. And if we don't > care about enabling the emulation path in this case... then I'm not sure > the original reasons for not allowing it for SEV-ES/SNP are applicable. Guest (not just OVMF) could try and write to the read-only area triggering this issue. A simple write to 0xc0000 from within the guest triggers this. > > It feels like KVM_CAP_READONLY_MEM is more like KVM_CAP_EMULATE_ON_WRITE, > whereas we literally just need as actually slot that's permanently mapped > in the NPT without write access. > > Is that an accurate summary of the situation? Yes, that sounds right to me. > > If so, maybe we can introduce a KVM_CAP_READONLY_NO_MMIO that captures > this and simply errors out if it hits the KVM_PFN_ERR_RO_FAULT. That would certainly work. > Or, for > a QEMU-specific workaround, just have a pflash implementation that doesn't > rely on KVM_MEM_READONLY for cases like this where we don't need MMIO > emulation. Not sure I follow that... are you suggesting that pflash use regular RW memslots and just let the write through? Thanks, Naveen > There's actually another case in hw/nvram/nrf51_nvm.c where this > would be applicable. I guess it could be done automatically for the > confidential VM case to retain cmdline compatibility...though you're > wanting to add the debugswap feature anyway so not sure how important that > aspect is. > > Thanks, > > Mike > > > > > Enable VMA protection and set the memory to be RO when adding the KVM > > memory slot. Because pflash devices support command-mode, change VMA > > protection to RW when tearing down the KVM memory slot. KVM > > SEV_LAUNCH_UPDATE also requires memory to be RW, so disable the > > protection when calling that. > > > > Print a warning when switching to VMA-based protection so that it is > > clear that KVM itself isn't supporting readonly memory, and that a > > workaround is in place. Users can plan on switching to using '-bios'. > > > > Finally, drop the check rejecting SEV-ES guests with SEV features so > > that debug-swap can be enabled. > > > > > > Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org> > > --- > > Background discussion on this issue: > > http://lore.kernel.org/r/aUq9_cUDWeEW_qli@google.com > > > > This series depends on VMSA features support: > > http://lore.kernel.org/r/cover.1779281646.git.naveen@kernel.org > > > > > > - Naveen > > > > > > include/system/kvm.h | 5 ++++ > > include/system/kvm_int.h | 1 + > > accel/kvm/kvm-all.c | 52 +++++++++++++++++++++++++++++++++++++++- > > hw/i386/pc_sysfw.c | 19 +++++++++------ > > target/i386/sev.c | 21 +++++++++++----- > > 5 files changed, 84 insertions(+), 14 deletions(-) > > > > diff --git a/include/system/kvm.h b/include/system/kvm.h > > index 5fa33eddda38..585058bd6f1c 100644 > > --- a/include/system/kvm.h > > +++ b/include/system/kvm.h > > @@ -555,6 +555,8 @@ uint32_t kvm_dirty_ring_size(void); > > > > void kvm_mark_guest_state_protected(void); > > > > +void kvm_enable_ro_mem_vma_protection(void); > > + > > /** > > * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page > > * reported for the VM. > > @@ -568,6 +570,9 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size); > > > > int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private); > > > > +void kvm_set_memory_readonly(void *addr, size_t len); > > +void kvm_set_memory_readwrite(void *addr, size_t len); > > + > > /* argument to vmfd change notifier */ > > typedef struct VmfdChangeNotifier { > > int vmfd; > > diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h > > index 0876aac938d3..0e083a56ce2a 100644 > > --- a/include/system/kvm_int.h > > +++ b/include/system/kvm_int.h > > @@ -123,6 +123,7 @@ struct KVMState > > OnOffAuto kernel_irqchip_split; > > bool sync_mmu; > > bool guest_state_protected; > > + bool guest_wants_ro_mem_vma_protection; > > uint64_t manual_dirty_log_protect; > > /* > > * Older POSIX says that ioctl numbers are signed int, but in > > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > > index 96f90ebb240f..4208df5b25ac 100644 > > --- a/accel/kvm/kvm-all.c > > +++ b/accel/kvm/kvm-all.c > > @@ -1629,6 +1629,42 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size) > > return kvm_set_memory_attributes(start, size, 0); > > } > > > > +static void kvm_set_memory_flags(void *addr, size_t len, int flags) > > +{ > > + if (mprotect(addr, len, flags)) { > > + error_report("failed to apply memory protection " > > + "(0x%" HWADDR_PRIx "+0x%" PRIx64 ") error '%s'", > > + (hwaddr)addr, len, strerror(errno)); > > + exit(1); > > + } > > +} > > + > > +void kvm_set_memory_readonly(void *addr, size_t len) > > +{ > > + if (kvm_state->guest_wants_ro_mem_vma_protection) { > > + kvm_set_memory_flags(addr, len, PROT_READ); > > + } > > +} > > + > > +void kvm_set_memory_readwrite(void *addr, size_t len) > > +{ > > + if (kvm_state->guest_wants_ro_mem_vma_protection) { > > + kvm_set_memory_flags(addr, len, PROT_READ | PROT_WRITE); > > + } > > +} > > + > > +static bool kvm_mem_wants_vma_protection(MemoryRegion *mr) > > +{ > > + if (!memory_region_is_ram(mr) && > > + (mr->readonly || mr->rom_device) && > > + !kvm_readonly_mem_allowed && > > + kvm_state->guest_wants_ro_mem_vma_protection) { > > + return true; > > + } > > + > > + return false; > > +} > > + > > /* Called with KVMMemoryListener.slots_lock held */ > > static void kvm_set_phys_mem(KVMMemoryListener *kml, > > MemoryRegionSection *section, bool add) > > @@ -1642,7 +1678,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > > void *ram; > > > > if (!memory_region_is_ram(mr)) { > > - if (writable || !kvm_readonly_mem_allowed) { > > + if (writable || (!kvm_readonly_mem_allowed && > > + !kvm_state->guest_wants_ro_mem_vma_protection)) { > > return; > > } else if (!mr->romd_mode) { > > /* If the memory device is not in romd_mode, then we actually want > > @@ -1697,6 +1734,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > > kvm_slot_sync_dirty_pages(mem); > > } > > > > + if (kvm_mem_wants_vma_protection(mr)) { > > + kvm_set_memory_readwrite(mem->ram, mem->memory_size); > > + } > > + > > /* unregister the slot */ > > g_free(mem->dirty_bmap); > > mem->dirty_bmap = NULL; > > @@ -1746,6 +1787,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > > } > > } > > > > + if (kvm_mem_wants_vma_protection(mr)) { > > + kvm_set_memory_readonly(mem->ram, mem->memory_size); > > + } > > + > > start_addr += slot_size; > > ram_start_offset += slot_size; > > ram += slot_size; > > @@ -4771,6 +4816,11 @@ void kvm_mark_guest_state_protected(void) > > kvm_state->guest_state_protected = true; > > } > > > > +void kvm_enable_ro_mem_vma_protection(void) > > +{ > > + kvm_state->guest_wants_ro_mem_vma_protection = true; > > +} > > + > > int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp) > > { > > int fd; > > diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c > > index 1a41a5972bd0..9590458e00c5 100644 > > --- a/hw/i386/pc_sysfw.c > > +++ b/hw/i386/pc_sysfw.c > > @@ -254,13 +254,18 @@ void pc_system_firmware_init(PCMachineState *pcms, > > } > > } else { > > if (kvm_enabled() && !kvm_readonly_mem_enabled()) { > > - /* > > - * Older KVM cannot execute from device memory. So, flash > > - * memory cannot be used unless the readonly memory kvm > > - * capability is present. > > - */ > > - error_report("pflash with kvm requires KVM readonly memory support"); > > - exit(1); > > + if (sev_es_enabled() && !sev_snp_enabled()) { > > + warn_report("pflash not supported with SEV-ES guests, " > > + "attempting VMA based protection"); > > + } else { > > + /* > > + * Older KVM cannot execute from device memory. So, flash > > + * memory cannot be used unless the readonly memory kvm > > + * capability is present. > > + */ > > + error_report("pflash with kvm requires KVM readonly memory support"); > > + exit(1); > > + } > > } > > > > pc_system_flash_map(pcms, rom_memory); > > diff --git a/target/i386/sev.c b/target/i386/sev.c > > index f04ae4e91f3e..82cf2c562729 100644 > > --- a/target/i386/sev.c > > +++ b/target/i386/sev.c > > @@ -550,12 +550,6 @@ static int check_sev_features(SevCommonState *sev_common, uint64_t sev_features, > > return -1; > > } > > } else { > > - if (sev_features && sev_es_enabled()) { > > - error_setg(errp, > > - "%s: SEV features are not supported with SEV-ES at this time", > > - __func__); > > - return -1; > > - } > > if (sev_features & SVM_SEV_FEAT_SNP_ACTIVE) { > > error_setg(errp, > > "%s: SEV_SNP is not enabled but is enabled in VMSA sev_features", > > @@ -2024,6 +2018,15 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) > > return -1; > > } > > > > + /* > > + * Use VMA-based protection for SEV-ES guests that enable any > > + * SEV feature, since KVM does not advertise readonly memory > > + * support for non-default type SEV guests. > > + */ > > + if (sev_es_enabled() && SEV_COMMON(cgs)->sev_features) { > > + kvm_enable_ro_mem_vma_protection(); > > + } > > + > > if (!cgs->ready) { > > /* > > * SEV uses these notifiers to register/pin pages prior to guest use, > > @@ -2111,7 +2114,13 @@ sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp) > > if (sev_check_state(sev_common, SEV_STATE_LAUNCH_UPDATE)) { > > int ret; > > > > + /* > > + * KVM requires these pages to be RW, so remove VMA RO protection > > + * for the duration of SEV_LAUNCH_UPDATE if using SEV features. > > + */ > > + kvm_set_memory_readwrite(ptr, len); > > ret = klass->launch_update_data(sev_common, gpa, ptr, len, errp); > > + kvm_set_memory_readonly(ptr, len); > > if (ret < 0) { > > return ret; > > } > > > > base-commit: 5611a9268dae7b7ff99d478ed134052a9fc7e9f7 > > prerequisite-patch-id: dc27ad6297d47d063b04fa797c1b8203ee97d9c8 > > prerequisite-patch-id: 603eff49233c4b0483e7c405754b95aa455dd38c > > prerequisite-patch-id: d4085e72ecfb0fbf977f7358d1edd29951b93784 > > prerequisite-patch-id: f6f201825b1e56f89a87a26bc457b3e6018aee49 > > prerequisite-patch-id: 08d79cec4c3f117178be8f6c866ff1be08e971f3 > > prerequisite-patch-id: c112bccab9ab9cee2d0227516fc857590b99a75b > > prerequisite-patch-id: f42fd829d0ea4909537e722477057a4013a247ab > > prerequisite-patch-id: 67136368ed1d2fa0ae55fee4368a7bd1fe394368 > > prerequisite-patch-id: 08349bb1e0e11ee1518a9041771302c97866b5cd > > -- > > 2.54.0 > > > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] target/i386: SEV: Allow pflash devices with SEV-ES guests 2026-06-10 11:30 ` Naveen N Rao @ 2026-06-10 14:52 ` Tom Lendacky 2026-06-10 14:52 ` Tom Lendacky 2026-06-10 19:33 ` Michael Roth 2026-06-10 19:41 ` Michael Roth 1 sibling, 2 replies; 11+ messages in thread From: Tom Lendacky @ 2026-06-10 14:52 UTC (permalink / raw) To: Naveen N Rao, Michael Roth, Sean Christopherson Cc: Paolo Bonzini, qemu-devel, Daniel P. Berrangé, Eduardo Habkost, Eric Blake, Markus Armbruster, Marcelo Tosatti, Zhao Liu, Nikunj A Dadhania, Roy Hopkins, Srikanth Aithal, Joerg Roedel On 6/10/26 06:30, Naveen N Rao wrote: > [+Sean] > > Hi Mike, > > On Tue, Jun 09, 2026 at 07:35:46PM -0500, Michael Roth wrote: >> On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote: >>> KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for >>> SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0 >>> ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM") >>> stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests >>> (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM >>> type SEV-ES guests. As a result of this, it is no longer possible to >>> start SEV-ES guests with any SEV feature enabled (in particular, >>> debug-swap) with pflash devices. >>> >>> This is an issue since SEV-ES guests have historically used pflash >>> devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap >>> while using pflash devices, so work around the KVM limitation by >>> switching to using a VMA-based write protection. This allows >>> well-behaved SEV-ES guests to continue to work with pflash devices and >>> enable debug-swap. Mis-behaving guests trying to write to the protected >>> OVMF area will be killed. >> >> Based on Sean's description, a write access to a read-only memslot would >> cause the vCPU to permanently spin on #NPFs if trying to write to it as >> MMIO due to #VC handler not triggering, and that's why we don't support >> read-only memslots. But since SEV-ES was previously working with pflash, >> it seems like it does not rely on this functionality... > > Right, normal well-behaved SEV-ES/SNP guests work just fine as they > don't write to any of the read-only areas. Yes they do. There is specific support to make a direct GHCB MMIO request because of the lack of the #VC exception (see OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlashDxe.c). Thanks, Tom > >> >> So if OVMF isn't writing to write-protected memory, then it wouldn't be >> triggering the MMIO emulation path in the first place. And if we don't >> care about enabling the emulation path in this case... then I'm not sure >> the original reasons for not allowing it for SEV-ES/SNP are applicable. > > Guest (not just OVMF) could try and write to the read-only area > triggering this issue. A simple write to 0xc0000 from within the guest > triggers this. > >> >> It feels like KVM_CAP_READONLY_MEM is more like KVM_CAP_EMULATE_ON_WRITE, >> whereas we literally just need as actually slot that's permanently mapped >> in the NPT without write access. >> >> Is that an accurate summary of the situation? > > Yes, that sounds right to me. > >> >> If so, maybe we can introduce a KVM_CAP_READONLY_NO_MMIO that captures >> this and simply errors out if it hits the KVM_PFN_ERR_RO_FAULT. > > That would certainly work. > >> Or, for >> a QEMU-specific workaround, just have a pflash implementation that doesn't >> rely on KVM_MEM_READONLY for cases like this where we don't need MMIO >> emulation. > > Not sure I follow that... are you suggesting that pflash use regular RW > memslots and just let the write through? > > > Thanks, > Naveen > >> There's actually another case in hw/nvram/nrf51_nvm.c where this >> would be applicable. I guess it could be done automatically for the >> confidential VM case to retain cmdline compatibility...though you're >> wanting to add the debugswap feature anyway so not sure how important that >> aspect is. >> >> Thanks, >> >> Mike >> >>> >>> Enable VMA protection and set the memory to be RO when adding the KVM >>> memory slot. Because pflash devices support command-mode, change VMA >>> protection to RW when tearing down the KVM memory slot. KVM >>> SEV_LAUNCH_UPDATE also requires memory to be RW, so disable the >>> protection when calling that. >>> >>> Print a warning when switching to VMA-based protection so that it is >>> clear that KVM itself isn't supporting readonly memory, and that a >>> workaround is in place. Users can plan on switching to using '-bios'. >>> >>> Finally, drop the check rejecting SEV-ES guests with SEV features so >>> that debug-swap can be enabled. >>> >>> >>> Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org> >>> --- >>> Background discussion on this issue: >>> http://lore.kernel.org/r/aUq9_cUDWeEW_qli@google.com >>> >>> This series depends on VMSA features support: >>> http://lore.kernel.org/r/cover.1779281646.git.naveen@kernel.org >>> >>> >>> - Naveen >>> >>> >>> include/system/kvm.h | 5 ++++ >>> include/system/kvm_int.h | 1 + >>> accel/kvm/kvm-all.c | 52 +++++++++++++++++++++++++++++++++++++++- >>> hw/i386/pc_sysfw.c | 19 +++++++++------ >>> target/i386/sev.c | 21 +++++++++++----- >>> 5 files changed, 84 insertions(+), 14 deletions(-) >>> >>> diff --git a/include/system/kvm.h b/include/system/kvm.h >>> index 5fa33eddda38..585058bd6f1c 100644 >>> --- a/include/system/kvm.h >>> +++ b/include/system/kvm.h >>> @@ -555,6 +555,8 @@ uint32_t kvm_dirty_ring_size(void); >>> >>> void kvm_mark_guest_state_protected(void); >>> >>> +void kvm_enable_ro_mem_vma_protection(void); >>> + >>> /** >>> * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page >>> * reported for the VM. >>> @@ -568,6 +570,9 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size); >>> >>> int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private); >>> >>> +void kvm_set_memory_readonly(void *addr, size_t len); >>> +void kvm_set_memory_readwrite(void *addr, size_t len); >>> + >>> /* argument to vmfd change notifier */ >>> typedef struct VmfdChangeNotifier { >>> int vmfd; >>> diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h >>> index 0876aac938d3..0e083a56ce2a 100644 >>> --- a/include/system/kvm_int.h >>> +++ b/include/system/kvm_int.h >>> @@ -123,6 +123,7 @@ struct KVMState >>> OnOffAuto kernel_irqchip_split; >>> bool sync_mmu; >>> bool guest_state_protected; >>> + bool guest_wants_ro_mem_vma_protection; >>> uint64_t manual_dirty_log_protect; >>> /* >>> * Older POSIX says that ioctl numbers are signed int, but in >>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c >>> index 96f90ebb240f..4208df5b25ac 100644 >>> --- a/accel/kvm/kvm-all.c >>> +++ b/accel/kvm/kvm-all.c >>> @@ -1629,6 +1629,42 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size) >>> return kvm_set_memory_attributes(start, size, 0); >>> } >>> >>> +static void kvm_set_memory_flags(void *addr, size_t len, int flags) >>> +{ >>> + if (mprotect(addr, len, flags)) { >>> + error_report("failed to apply memory protection " >>> + "(0x%" HWADDR_PRIx "+0x%" PRIx64 ") error '%s'", >>> + (hwaddr)addr, len, strerror(errno)); >>> + exit(1); >>> + } >>> +} >>> + >>> +void kvm_set_memory_readonly(void *addr, size_t len) >>> +{ >>> + if (kvm_state->guest_wants_ro_mem_vma_protection) { >>> + kvm_set_memory_flags(addr, len, PROT_READ); >>> + } >>> +} >>> + >>> +void kvm_set_memory_readwrite(void *addr, size_t len) >>> +{ >>> + if (kvm_state->guest_wants_ro_mem_vma_protection) { >>> + kvm_set_memory_flags(addr, len, PROT_READ | PROT_WRITE); >>> + } >>> +} >>> + >>> +static bool kvm_mem_wants_vma_protection(MemoryRegion *mr) >>> +{ >>> + if (!memory_region_is_ram(mr) && >>> + (mr->readonly || mr->rom_device) && >>> + !kvm_readonly_mem_allowed && >>> + kvm_state->guest_wants_ro_mem_vma_protection) { >>> + return true; >>> + } >>> + >>> + return false; >>> +} >>> + >>> /* Called with KVMMemoryListener.slots_lock held */ >>> static void kvm_set_phys_mem(KVMMemoryListener *kml, >>> MemoryRegionSection *section, bool add) >>> @@ -1642,7 +1678,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, >>> void *ram; >>> >>> if (!memory_region_is_ram(mr)) { >>> - if (writable || !kvm_readonly_mem_allowed) { >>> + if (writable || (!kvm_readonly_mem_allowed && >>> + !kvm_state->guest_wants_ro_mem_vma_protection)) { >>> return; >>> } else if (!mr->romd_mode) { >>> /* If the memory device is not in romd_mode, then we actually want >>> @@ -1697,6 +1734,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, >>> kvm_slot_sync_dirty_pages(mem); >>> } >>> >>> + if (kvm_mem_wants_vma_protection(mr)) { >>> + kvm_set_memory_readwrite(mem->ram, mem->memory_size); >>> + } >>> + >>> /* unregister the slot */ >>> g_free(mem->dirty_bmap); >>> mem->dirty_bmap = NULL; >>> @@ -1746,6 +1787,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, >>> } >>> } >>> >>> + if (kvm_mem_wants_vma_protection(mr)) { >>> + kvm_set_memory_readonly(mem->ram, mem->memory_size); >>> + } >>> + >>> start_addr += slot_size; >>> ram_start_offset += slot_size; >>> ram += slot_size; >>> @@ -4771,6 +4816,11 @@ void kvm_mark_guest_state_protected(void) >>> kvm_state->guest_state_protected = true; >>> } >>> >>> +void kvm_enable_ro_mem_vma_protection(void) >>> +{ >>> + kvm_state->guest_wants_ro_mem_vma_protection = true; >>> +} >>> + >>> int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp) >>> { >>> int fd; >>> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c >>> index 1a41a5972bd0..9590458e00c5 100644 >>> --- a/hw/i386/pc_sysfw.c >>> +++ b/hw/i386/pc_sysfw.c >>> @@ -254,13 +254,18 @@ void pc_system_firmware_init(PCMachineState *pcms, >>> } >>> } else { >>> if (kvm_enabled() && !kvm_readonly_mem_enabled()) { >>> - /* >>> - * Older KVM cannot execute from device memory. So, flash >>> - * memory cannot be used unless the readonly memory kvm >>> - * capability is present. >>> - */ >>> - error_report("pflash with kvm requires KVM readonly memory support"); >>> - exit(1); >>> + if (sev_es_enabled() && !sev_snp_enabled()) { >>> + warn_report("pflash not supported with SEV-ES guests, " >>> + "attempting VMA based protection"); >>> + } else { >>> + /* >>> + * Older KVM cannot execute from device memory. So, flash >>> + * memory cannot be used unless the readonly memory kvm >>> + * capability is present. >>> + */ >>> + error_report("pflash with kvm requires KVM readonly memory support"); >>> + exit(1); >>> + } >>> } >>> >>> pc_system_flash_map(pcms, rom_memory); >>> diff --git a/target/i386/sev.c b/target/i386/sev.c >>> index f04ae4e91f3e..82cf2c562729 100644 >>> --- a/target/i386/sev.c >>> +++ b/target/i386/sev.c >>> @@ -550,12 +550,6 @@ static int check_sev_features(SevCommonState *sev_common, uint64_t sev_features, >>> return -1; >>> } >>> } else { >>> - if (sev_features && sev_es_enabled()) { >>> - error_setg(errp, >>> - "%s: SEV features are not supported with SEV-ES at this time", >>> - __func__); >>> - return -1; >>> - } >>> if (sev_features & SVM_SEV_FEAT_SNP_ACTIVE) { >>> error_setg(errp, >>> "%s: SEV_SNP is not enabled but is enabled in VMSA sev_features", >>> @@ -2024,6 +2018,15 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) >>> return -1; >>> } >>> >>> + /* >>> + * Use VMA-based protection for SEV-ES guests that enable any >>> + * SEV feature, since KVM does not advertise readonly memory >>> + * support for non-default type SEV guests. >>> + */ >>> + if (sev_es_enabled() && SEV_COMMON(cgs)->sev_features) { >>> + kvm_enable_ro_mem_vma_protection(); >>> + } >>> + >>> if (!cgs->ready) { >>> /* >>> * SEV uses these notifiers to register/pin pages prior to guest use, >>> @@ -2111,7 +2114,13 @@ sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp) >>> if (sev_check_state(sev_common, SEV_STATE_LAUNCH_UPDATE)) { >>> int ret; >>> >>> + /* >>> + * KVM requires these pages to be RW, so remove VMA RO protection >>> + * for the duration of SEV_LAUNCH_UPDATE if using SEV features. >>> + */ >>> + kvm_set_memory_readwrite(ptr, len); >>> ret = klass->launch_update_data(sev_common, gpa, ptr, len, errp); >>> + kvm_set_memory_readonly(ptr, len); >>> if (ret < 0) { >>> return ret; >>> } >>> >>> base-commit: 5611a9268dae7b7ff99d478ed134052a9fc7e9f7 >>> prerequisite-patch-id: dc27ad6297d47d063b04fa797c1b8203ee97d9c8 >>> prerequisite-patch-id: 603eff49233c4b0483e7c405754b95aa455dd38c >>> prerequisite-patch-id: d4085e72ecfb0fbf977f7358d1edd29951b93784 >>> prerequisite-patch-id: f6f201825b1e56f89a87a26bc457b3e6018aee49 >>> prerequisite-patch-id: 08d79cec4c3f117178be8f6c866ff1be08e971f3 >>> prerequisite-patch-id: c112bccab9ab9cee2d0227516fc857590b99a75b >>> prerequisite-patch-id: f42fd829d0ea4909537e722477057a4013a247ab >>> prerequisite-patch-id: 67136368ed1d2fa0ae55fee4368a7bd1fe394368 >>> prerequisite-patch-id: 08349bb1e0e11ee1518a9041771302c97866b5cd >>> -- >>> 2.54.0 >>> >>> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] target/i386: SEV: Allow pflash devices with SEV-ES guests 2026-06-10 14:52 ` Tom Lendacky @ 2026-06-10 14:52 ` Tom Lendacky 2026-06-10 19:33 ` Michael Roth 1 sibling, 0 replies; 11+ messages in thread From: Tom Lendacky @ 2026-06-10 14:52 UTC (permalink / raw) To: Naveen N Rao, Michael Roth, Sean Christopherson Cc: Paolo Bonzini, qemu-devel, Daniel P. Berrangé, Eduardo Habkost, Eric Blake, Markus Armbruster, Marcelo Tosatti, Zhao Liu, Nikunj A Dadhania, Roy Hopkins, Srikanth Aithal, Joerg Roedel On 6/10/26 09:52, Tom Lendacky wrote: > On 6/10/26 06:30, Naveen N Rao wrote: >> [+Sean] >> >> Hi Mike, >> >> On Tue, Jun 09, 2026 at 07:35:46PM -0500, Michael Roth wrote: >>> On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote: >>>> KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for >>>> SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0 >>>> ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM") >>>> stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests >>>> (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM >>>> type SEV-ES guests. As a result of this, it is no longer possible to >>>> start SEV-ES guests with any SEV feature enabled (in particular, >>>> debug-swap) with pflash devices. >>>> >>>> This is an issue since SEV-ES guests have historically used pflash >>>> devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap >>>> while using pflash devices, so work around the KVM limitation by >>>> switching to using a VMA-based write protection. This allows >>>> well-behaved SEV-ES guests to continue to work with pflash devices and >>>> enable debug-swap. Mis-behaving guests trying to write to the protected >>>> OVMF area will be killed. >>> >>> Based on Sean's description, a write access to a read-only memslot would >>> cause the vCPU to permanently spin on #NPFs if trying to write to it as >>> MMIO due to #VC handler not triggering, and that's why we don't support >>> read-only memslots. But since SEV-ES was previously working with pflash, >>> it seems like it does not rely on this functionality... >> >> Right, normal well-behaved SEV-ES/SNP guests work just fine as they >> don't write to any of the read-only areas. > > Yes they do. There is specific support to make a direct GHCB MMIO > request because of the lack of the #VC exception (see > OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlashDxe.c). Specifically the QemuFlashPtrWrite() function. Thanks, Tom > > Thanks, > Tom > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] target/i386: SEV: Allow pflash devices with SEV-ES guests 2026-06-10 14:52 ` Tom Lendacky 2026-06-10 14:52 ` Tom Lendacky @ 2026-06-10 19:33 ` Michael Roth 2026-06-11 11:32 ` Naveen N Rao 1 sibling, 1 reply; 11+ messages in thread From: Michael Roth @ 2026-06-10 19:33 UTC (permalink / raw) To: Tom Lendacky Cc: Naveen N Rao, Sean Christopherson, Paolo Bonzini, qemu-devel, Daniel P. Berrangé, Eduardo Habkost, Eric Blake, Markus Armbruster, Marcelo Tosatti, Zhao Liu, Nikunj A Dadhania, Roy Hopkins, Srikanth Aithal, Joerg Roedel On Wed, Jun 10, 2026 at 09:52:00AM -0500, Tom Lendacky wrote: > On 6/10/26 06:30, Naveen N Rao wrote: > > [+Sean] > > > > Hi Mike, > > > > On Tue, Jun 09, 2026 at 07:35:46PM -0500, Michael Roth wrote: > >> On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote: > >>> KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for > >>> SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0 > >>> ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM") > >>> stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests > >>> (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM > >>> type SEV-ES guests. As a result of this, it is no longer possible to > >>> start SEV-ES guests with any SEV feature enabled (in particular, > >>> debug-swap) with pflash devices. > >>> > >>> This is an issue since SEV-ES guests have historically used pflash > >>> devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap > >>> while using pflash devices, so work around the KVM limitation by > >>> switching to using a VMA-based write protection. This allows > >>> well-behaved SEV-ES guests to continue to work with pflash devices and > >>> enable debug-swap. Mis-behaving guests trying to write to the protected > >>> OVMF area will be killed. > >> > >> Based on Sean's description, a write access to a read-only memslot would > >> cause the vCPU to permanently spin on #NPFs if trying to write to it as > >> MMIO due to #VC handler not triggering, and that's why we don't support > >> read-only memslots. But since SEV-ES was previously working with pflash, > >> it seems like it does not rely on this functionality... > > > > Right, normal well-behaved SEV-ES/SNP guests work just fine as they > > don't write to any of the read-only areas. > > Yes they do. There is specific support to make a direct GHCB MMIO > request because of the lack of the #VC exception (see > OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlashDxe.c). With that change in place, it seems like we don't have remaining guest-side code for ES/SNP guests that relies on emulate-on-write in OVMF for private MMIO (seems like it never would have worked properly anyway). It's possible we still rely on emulate-on-write for writes to shared MMIO ranges though. But in that case I don't see why it wouldn't be okay to continue to just forward the corresponding write-faults to userspace as KVM_EXIT_MMIO events since QEMU can access shared memory just fine. It's only the private MMIO that would misbehave because the emulation path ... but I'm a little confused on this, because we'd still get #NPFs due to the write protection... and it looks like this would trigger a KVM_EXIT_MEMORY_FAULT to QEMU... so if QEMU really wanted to catch this case... which seems to be the only one that's indicative of misbehavior, we could just terminate if the access is to a read-only memslot and we are running an ES/SNP guest... so if that's all that's holding us back on the kernel side, we could directly start re-advertising KVM_CAP_READONLY_MEM, or some new variant of it where userspace needs to be aware of these additional considerations for private MMIO. I think maybe the case that Sean is referencing in his commit, where we can't make use of MMIO stub entries to trigger #VC, comes into play when QEMU switches the memory region from romd_mode to !romd_mode, which then unmaps the memslot and relies on the noslot MMIO handling. That's where private MMIO would stop triggering the (desired) QEMU crash, but KVM would catch this too as an #NPF, and this would also be forwarded to userspace as a KVM_EXIT_MEMORY_FAULT... so just like the above, if we accept that private MMIO is not possible, and only want to actively catch it so we can crash the guest or warn... then we can handle this the same as above and error if the KVM_EXIT_MEMORY_FAULT is for a private access to a GPA range backed by a read-only memslot... *maybe* the fault info would need some flag to indicate that this is MMIO since we do allow implicit conversions via KVM_EXIT_MEMORY_FAULT in general. and userspace might like some way to easily differentiate between the good/bad conversions without tracking to much state, but wouldn't that work in theory at least? Thanks, Mike > > Thanks, > Tom > > > > >> > >> So if OVMF isn't writing to write-protected memory, then it wouldn't be > >> triggering the MMIO emulation path in the first place. And if we don't > >> care about enabling the emulation path in this case... then I'm not sure > >> the original reasons for not allowing it for SEV-ES/SNP are applicable. > > > > Guest (not just OVMF) could try and write to the read-only area > > triggering this issue. A simple write to 0xc0000 from within the guest > > triggers this. > > > >> > >> It feels like KVM_CAP_READONLY_MEM is more like KVM_CAP_EMULATE_ON_WRITE, > >> whereas we literally just need as actually slot that's permanently mapped > >> in the NPT without write access. > >> > >> Is that an accurate summary of the situation? > > > > Yes, that sounds right to me. > > > >> > >> If so, maybe we can introduce a KVM_CAP_READONLY_NO_MMIO that captures > >> this and simply errors out if it hits the KVM_PFN_ERR_RO_FAULT. > > > > That would certainly work. > > > >> Or, for > >> a QEMU-specific workaround, just have a pflash implementation that doesn't > >> rely on KVM_MEM_READONLY for cases like this where we don't need MMIO > >> emulation. > > > > Not sure I follow that... are you suggesting that pflash use regular RW > > memslots and just let the write through? > > > > > > Thanks, > > Naveen > > > >> There's actually another case in hw/nvram/nrf51_nvm.c where this > >> would be applicable. I guess it could be done automatically for the > >> confidential VM case to retain cmdline compatibility...though you're > >> wanting to add the debugswap feature anyway so not sure how important that > >> aspect is. > >> > >> Thanks, > >> > >> Mike > >> > >>> > >>> Enable VMA protection and set the memory to be RO when adding the KVM > >>> memory slot. Because pflash devices support command-mode, change VMA > >>> protection to RW when tearing down the KVM memory slot. KVM > >>> SEV_LAUNCH_UPDATE also requires memory to be RW, so disable the > >>> protection when calling that. > >>> > >>> Print a warning when switching to VMA-based protection so that it is > >>> clear that KVM itself isn't supporting readonly memory, and that a > >>> workaround is in place. Users can plan on switching to using '-bios'. > >>> > >>> Finally, drop the check rejecting SEV-ES guests with SEV features so > >>> that debug-swap can be enabled. > >>> > >>> > >>> Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org> > >>> --- > >>> Background discussion on this issue: > >>> http://lore.kernel.org/r/aUq9_cUDWeEW_qli@google.com > >>> > >>> This series depends on VMSA features support: > >>> http://lore.kernel.org/r/cover.1779281646.git.naveen@kernel.org > >>> > >>> > >>> - Naveen > >>> > >>> > >>> include/system/kvm.h | 5 ++++ > >>> include/system/kvm_int.h | 1 + > >>> accel/kvm/kvm-all.c | 52 +++++++++++++++++++++++++++++++++++++++- > >>> hw/i386/pc_sysfw.c | 19 +++++++++------ > >>> target/i386/sev.c | 21 +++++++++++----- > >>> 5 files changed, 84 insertions(+), 14 deletions(-) > >>> > >>> diff --git a/include/system/kvm.h b/include/system/kvm.h > >>> index 5fa33eddda38..585058bd6f1c 100644 > >>> --- a/include/system/kvm.h > >>> +++ b/include/system/kvm.h > >>> @@ -555,6 +555,8 @@ uint32_t kvm_dirty_ring_size(void); > >>> > >>> void kvm_mark_guest_state_protected(void); > >>> > >>> +void kvm_enable_ro_mem_vma_protection(void); > >>> + > >>> /** > >>> * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page > >>> * reported for the VM. > >>> @@ -568,6 +570,9 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size); > >>> > >>> int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private); > >>> > >>> +void kvm_set_memory_readonly(void *addr, size_t len); > >>> +void kvm_set_memory_readwrite(void *addr, size_t len); > >>> + > >>> /* argument to vmfd change notifier */ > >>> typedef struct VmfdChangeNotifier { > >>> int vmfd; > >>> diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h > >>> index 0876aac938d3..0e083a56ce2a 100644 > >>> --- a/include/system/kvm_int.h > >>> +++ b/include/system/kvm_int.h > >>> @@ -123,6 +123,7 @@ struct KVMState > >>> OnOffAuto kernel_irqchip_split; > >>> bool sync_mmu; > >>> bool guest_state_protected; > >>> + bool guest_wants_ro_mem_vma_protection; > >>> uint64_t manual_dirty_log_protect; > >>> /* > >>> * Older POSIX says that ioctl numbers are signed int, but in > >>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > >>> index 96f90ebb240f..4208df5b25ac 100644 > >>> --- a/accel/kvm/kvm-all.c > >>> +++ b/accel/kvm/kvm-all.c > >>> @@ -1629,6 +1629,42 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size) > >>> return kvm_set_memory_attributes(start, size, 0); > >>> } > >>> > >>> +static void kvm_set_memory_flags(void *addr, size_t len, int flags) > >>> +{ > >>> + if (mprotect(addr, len, flags)) { > >>> + error_report("failed to apply memory protection " > >>> + "(0x%" HWADDR_PRIx "+0x%" PRIx64 ") error '%s'", > >>> + (hwaddr)addr, len, strerror(errno)); > >>> + exit(1); > >>> + } > >>> +} > >>> + > >>> +void kvm_set_memory_readonly(void *addr, size_t len) > >>> +{ > >>> + if (kvm_state->guest_wants_ro_mem_vma_protection) { > >>> + kvm_set_memory_flags(addr, len, PROT_READ); > >>> + } > >>> +} > >>> + > >>> +void kvm_set_memory_readwrite(void *addr, size_t len) > >>> +{ > >>> + if (kvm_state->guest_wants_ro_mem_vma_protection) { > >>> + kvm_set_memory_flags(addr, len, PROT_READ | PROT_WRITE); > >>> + } > >>> +} > >>> + > >>> +static bool kvm_mem_wants_vma_protection(MemoryRegion *mr) > >>> +{ > >>> + if (!memory_region_is_ram(mr) && > >>> + (mr->readonly || mr->rom_device) && > >>> + !kvm_readonly_mem_allowed && > >>> + kvm_state->guest_wants_ro_mem_vma_protection) { > >>> + return true; > >>> + } > >>> + > >>> + return false; > >>> +} > >>> + > >>> /* Called with KVMMemoryListener.slots_lock held */ > >>> static void kvm_set_phys_mem(KVMMemoryListener *kml, > >>> MemoryRegionSection *section, bool add) > >>> @@ -1642,7 +1678,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > >>> void *ram; > >>> > >>> if (!memory_region_is_ram(mr)) { > >>> - if (writable || !kvm_readonly_mem_allowed) { > >>> + if (writable || (!kvm_readonly_mem_allowed && > >>> + !kvm_state->guest_wants_ro_mem_vma_protection)) { > >>> return; > >>> } else if (!mr->romd_mode) { > >>> /* If the memory device is not in romd_mode, then we actually want > >>> @@ -1697,6 +1734,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > >>> kvm_slot_sync_dirty_pages(mem); > >>> } > >>> > >>> + if (kvm_mem_wants_vma_protection(mr)) { > >>> + kvm_set_memory_readwrite(mem->ram, mem->memory_size); > >>> + } > >>> + > >>> /* unregister the slot */ > >>> g_free(mem->dirty_bmap); > >>> mem->dirty_bmap = NULL; > >>> @@ -1746,6 +1787,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > >>> } > >>> } > >>> > >>> + if (kvm_mem_wants_vma_protection(mr)) { > >>> + kvm_set_memory_readonly(mem->ram, mem->memory_size); > >>> + } > >>> + > >>> start_addr += slot_size; > >>> ram_start_offset += slot_size; > >>> ram += slot_size; > >>> @@ -4771,6 +4816,11 @@ void kvm_mark_guest_state_protected(void) > >>> kvm_state->guest_state_protected = true; > >>> } > >>> > >>> +void kvm_enable_ro_mem_vma_protection(void) > >>> +{ > >>> + kvm_state->guest_wants_ro_mem_vma_protection = true; > >>> +} > >>> + > >>> int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp) > >>> { > >>> int fd; > >>> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c > >>> index 1a41a5972bd0..9590458e00c5 100644 > >>> --- a/hw/i386/pc_sysfw.c > >>> +++ b/hw/i386/pc_sysfw.c > >>> @@ -254,13 +254,18 @@ void pc_system_firmware_init(PCMachineState *pcms, > >>> } > >>> } else { > >>> if (kvm_enabled() && !kvm_readonly_mem_enabled()) { > >>> - /* > >>> - * Older KVM cannot execute from device memory. So, flash > >>> - * memory cannot be used unless the readonly memory kvm > >>> - * capability is present. > >>> - */ > >>> - error_report("pflash with kvm requires KVM readonly memory support"); > >>> - exit(1); > >>> + if (sev_es_enabled() && !sev_snp_enabled()) { > >>> + warn_report("pflash not supported with SEV-ES guests, " > >>> + "attempting VMA based protection"); > >>> + } else { > >>> + /* > >>> + * Older KVM cannot execute from device memory. So, flash > >>> + * memory cannot be used unless the readonly memory kvm > >>> + * capability is present. > >>> + */ > >>> + error_report("pflash with kvm requires KVM readonly memory support"); > >>> + exit(1); > >>> + } > >>> } > >>> > >>> pc_system_flash_map(pcms, rom_memory); > >>> diff --git a/target/i386/sev.c b/target/i386/sev.c > >>> index f04ae4e91f3e..82cf2c562729 100644 > >>> --- a/target/i386/sev.c > >>> +++ b/target/i386/sev.c > >>> @@ -550,12 +550,6 @@ static int check_sev_features(SevCommonState *sev_common, uint64_t sev_features, > >>> return -1; > >>> } > >>> } else { > >>> - if (sev_features && sev_es_enabled()) { > >>> - error_setg(errp, > >>> - "%s: SEV features are not supported with SEV-ES at this time", > >>> - __func__); > >>> - return -1; > >>> - } > >>> if (sev_features & SVM_SEV_FEAT_SNP_ACTIVE) { > >>> error_setg(errp, > >>> "%s: SEV_SNP is not enabled but is enabled in VMSA sev_features", > >>> @@ -2024,6 +2018,15 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) > >>> return -1; > >>> } > >>> > >>> + /* > >>> + * Use VMA-based protection for SEV-ES guests that enable any > >>> + * SEV feature, since KVM does not advertise readonly memory > >>> + * support for non-default type SEV guests. > >>> + */ > >>> + if (sev_es_enabled() && SEV_COMMON(cgs)->sev_features) { > >>> + kvm_enable_ro_mem_vma_protection(); > >>> + } > >>> + > >>> if (!cgs->ready) { > >>> /* > >>> * SEV uses these notifiers to register/pin pages prior to guest use, > >>> @@ -2111,7 +2114,13 @@ sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp) > >>> if (sev_check_state(sev_common, SEV_STATE_LAUNCH_UPDATE)) { > >>> int ret; > >>> > >>> + /* > >>> + * KVM requires these pages to be RW, so remove VMA RO protection > >>> + * for the duration of SEV_LAUNCH_UPDATE if using SEV features. > >>> + */ > >>> + kvm_set_memory_readwrite(ptr, len); > >>> ret = klass->launch_update_data(sev_common, gpa, ptr, len, errp); > >>> + kvm_set_memory_readonly(ptr, len); > >>> if (ret < 0) { > >>> return ret; > >>> } > >>> > >>> base-commit: 5611a9268dae7b7ff99d478ed134052a9fc7e9f7 > >>> prerequisite-patch-id: dc27ad6297d47d063b04fa797c1b8203ee97d9c8 > >>> prerequisite-patch-id: 603eff49233c4b0483e7c405754b95aa455dd38c > >>> prerequisite-patch-id: d4085e72ecfb0fbf977f7358d1edd29951b93784 > >>> prerequisite-patch-id: f6f201825b1e56f89a87a26bc457b3e6018aee49 > >>> prerequisite-patch-id: 08d79cec4c3f117178be8f6c866ff1be08e971f3 > >>> prerequisite-patch-id: c112bccab9ab9cee2d0227516fc857590b99a75b > >>> prerequisite-patch-id: f42fd829d0ea4909537e722477057a4013a247ab > >>> prerequisite-patch-id: 67136368ed1d2fa0ae55fee4368a7bd1fe394368 > >>> prerequisite-patch-id: 08349bb1e0e11ee1518a9041771302c97866b5cd > >>> -- > >>> 2.54.0 > >>> > >>> > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] target/i386: SEV: Allow pflash devices with SEV-ES guests 2026-06-10 19:33 ` Michael Roth @ 2026-06-11 11:32 ` Naveen N Rao 2026-06-11 16:11 ` Sean Christopherson 0 siblings, 1 reply; 11+ messages in thread From: Naveen N Rao @ 2026-06-11 11:32 UTC (permalink / raw) To: Michael Roth Cc: Tom Lendacky, Sean Christopherson, Paolo Bonzini, qemu-devel, Daniel P. Berrangé, Eduardo Habkost, Eric Blake, Markus Armbruster, Marcelo Tosatti, Zhao Liu, Nikunj A Dadhania, Roy Hopkins, Srikanth Aithal, Joerg Roedel On Wed, Jun 10, 2026 at 02:33:20PM -0500, Michael Roth wrote: > On Wed, Jun 10, 2026 at 09:52:00AM -0500, Tom Lendacky wrote: > > On 6/10/26 06:30, Naveen N Rao wrote: > > > [+Sean] > > > > > > Hi Mike, > > > > > > On Tue, Jun 09, 2026 at 07:35:46PM -0500, Michael Roth wrote: > > >> On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote: > > >>> KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for > > >>> SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0 > > >>> ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM") > > >>> stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests > > >>> (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM > > >>> type SEV-ES guests. As a result of this, it is no longer possible to > > >>> start SEV-ES guests with any SEV feature enabled (in particular, > > >>> debug-swap) with pflash devices. > > >>> > > >>> This is an issue since SEV-ES guests have historically used pflash > > >>> devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap > > >>> while using pflash devices, so work around the KVM limitation by > > >>> switching to using a VMA-based write protection. This allows > > >>> well-behaved SEV-ES guests to continue to work with pflash devices and > > >>> enable debug-swap. Mis-behaving guests trying to write to the protected > > >>> OVMF area will be killed. > > >> > > >> Based on Sean's description, a write access to a read-only memslot would > > >> cause the vCPU to permanently spin on #NPFs if trying to write to it as > > >> MMIO due to #VC handler not triggering, and that's why we don't support > > >> read-only memslots. But since SEV-ES was previously working with pflash, > > >> it seems like it does not rely on this functionality... > > > > > > Right, normal well-behaved SEV-ES/SNP guests work just fine as they > > > don't write to any of the read-only areas. > > > > Yes they do. There is specific support to make a direct GHCB MMIO > > request because of the lack of the #VC exception (see > > OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlashDxe.c). Good to know! > > With that change in place, it seems like we don't have remaining guest-side > code for ES/SNP guests that relies on emulate-on-write in OVMF for private > MMIO (seems like it never would have worked properly anyway). > > It's possible we still rely on emulate-on-write for writes to shared MMIO > ranges though. But in that case I don't see why it wouldn't be okay to > continue to just forward the corresponding write-faults to userspace as > KVM_EXIT_MMIO events since QEMU can access shared memory just fine. > > It's only the private MMIO that would misbehave because the emulation > path ... but I'm a little confused on this, because we'd still get #NPFs > due to the write protection... and it looks like this would trigger a > KVM_EXIT_MEMORY_FAULT to QEMU... so if QEMU really wanted to catch this > case... which seems to be the only one that's indicative of misbehavior, > we could just terminate if the access is to a read-only memslot and we > are running an ES/SNP guest... so if that's all that's holding us back > on the kernel side, we could directly start re-advertising > KVM_CAP_READONLY_MEM, or some new variant of it where userspace needs to > be aware of these additional considerations for private MMIO. Right, Sean did suggest a change to do exactly that (send out -EFAULT to userspace on writes to RO memslots): https://lore.kernel.org/kvm/aUq9_cUDWeEW_qli@google.com/ This change kills a KVM_X86_DEFAULT_VM SEV-ES guest if it writes to RO memslot. If Sean's change disabling KVM_CAP_READONLY_MEM for SEV-ES guests (and the subsequent commit d30d9ee94cc0 ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM")) are reverted, this results in what you are describing: killing SEV-ES guests that write to RO memslot without #NPF loop. This was discussed in PUCK and my understanding is that Sean is still opposed to enabling KVM_CAP_READONLY_MEM for SEV-ES guests (and he has written as much in the above mail I have linked to). Introducing a variant of KVM_CAP_READONLY_MEM might be a good option - I suppose Qemu can just check for that capability for encrypted guests and everything else can mostly work as-is. Sean? > > I think maybe the case that Sean is referencing in his commit, where we > can't make use of MMIO stub entries to trigger #VC, comes into play > when QEMU switches the memory region from romd_mode to !romd_mode, which > then unmaps the memslot and relies on the noslot MMIO handling. That's > where private MMIO would stop triggering the (desired) QEMU crash, but > KVM would catch this too as an #NPF, and this would also be forwarded > to userspace as a KVM_EXIT_MEMORY_FAULT... so just like the above, if > we accept that private MMIO is not possible, and only want to actively > catch it so we can crash the guest or warn... then we can handle this > the same as above and error if the KVM_EXIT_MEMORY_FAULT is for a > private access to a GPA range backed by a read-only memslot... My understanding from Sean's commit is about the romd_mode itself. For !romd_mode, we install MMIO SPTEs with reserved bits set, so all accesses trap. But, for romd_mode, we can't set reserved bits since we want to allow read access, and if reserved bits are not set, there won't be a #VC generated and the #NPF will end up being an automatic exit, which KVM can't emulate or do anything about, short of punting to userspace. > > *maybe* the fault info would need some flag to indicate that this is MMIO > since we do allow implicit conversions via KVM_EXIT_MEMORY_FAULT in general. > and userspace might like some way to easily differentiate between the > good/bad conversions without tracking to much state, but wouldn't that > work in theory at least? > > Thanks, > > Mike Thanks, Naveen ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] target/i386: SEV: Allow pflash devices with SEV-ES guests 2026-06-11 11:32 ` Naveen N Rao @ 2026-06-11 16:11 ` Sean Christopherson 0 siblings, 0 replies; 11+ messages in thread From: Sean Christopherson @ 2026-06-11 16:11 UTC (permalink / raw) To: Naveen N Rao Cc: Michael Roth, Tom Lendacky, Paolo Bonzini, qemu-devel, Daniel P. Berrangé, Eduardo Habkost, Eric Blake, Markus Armbruster, Marcelo Tosatti, Zhao Liu, Nikunj A Dadhania, Roy Hopkins, Srikanth Aithal, Joerg Roedel On Thu, Jun 11, 2026, Naveen N Rao wrote: > On Wed, Jun 10, 2026 at 02:33:20PM -0500, Michael Roth wrote: > > > > Right, normal well-behaved SEV-ES/SNP guests work just fine as they > > > > don't write to any of the read-only areas. > > > > > > Yes they do. There is specific support to make a direct GHCB MMIO > > > request because of the lack of the #VC exception (see > > > OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlashDxe.c). > > Good to know! > > > With that change in place, it seems like we don't have remaining guest-side > > code for ES/SNP guests that relies on emulate-on-write in OVMF for private > > MMIO (seems like it never would have worked properly anyway). > > > > It's possible we still rely on emulate-on-write for writes to shared MMIO > > ranges though. But in that case I don't see why it wouldn't be okay to > > continue to just forward the corresponding write-faults to userspace as > > KVM_EXIT_MMIO events since QEMU can access shared memory just fine. > > > > It's only the private MMIO that would misbehave because the emulation > > path ... but I'm a little confused on this, because we'd still get #NPFs > > due to the write protection... and it looks like this would trigger a > > KVM_EXIT_MEMORY_FAULT to QEMU... so if QEMU really wanted to catch this > > case... which seems to be the only one that's indicative of misbehavior, > > we could just terminate if the access is to a read-only memslot and we > > are running an ES/SNP guest... so if that's all that's holding us back > > on the kernel side, we could directly start re-advertising > > KVM_CAP_READONLY_MEM, or some new variant of it where userspace needs to > > be aware of these additional considerations for private MMIO. > > Right, Sean did suggest a change to do exactly that (send out -EFAULT to > userspace on writes to RO memslots): > https://lore.kernel.org/kvm/aUq9_cUDWeEW_qli@google.com/ > > This change kills a KVM_X86_DEFAULT_VM SEV-ES guest if it writes to RO > memslot. If Sean's change disabling KVM_CAP_READONLY_MEM for SEV-ES > guests (and the subsequent commit d30d9ee94cc0 ("KVM: x86: Only > advertise KVM_CAP_READONLY_MEM when supported by VM")) are reverted, > this results in what you are describing: killing SEV-ES guests that > write to RO memslot without #NPF loop. > > This was discussed in PUCK and my understanding is that Sean is still > opposed to enabling KVM_CAP_READONLY_MEM for SEV-ES guests (and he has > written as much in the above mail I have linked to). > > Introducing a variant of KVM_CAP_READONLY_MEM might be a good option > - I suppose Qemu can just check for that capability for encrypted guests > and everything else can mostly work as-is. > > Sean? If you're ok terminating the guest, just map the memory as read-only in the VMA so that faulting in the PFN fails with KVM_PFN_ERR_FAULT => -EFAULT. If we also need to support guest_memfd, then I would prefer to add support for RWX memory attributes (per-VM), which are allegedly being working on by Amazon and/or Microsoft for VBS(-like) support in KVM. https://lore.kernel.org/all/Y1a1i9vbJ%2FpVmV9r@google.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] target/i386: SEV: Allow pflash devices with SEV-ES guests 2026-06-10 11:30 ` Naveen N Rao 2026-06-10 14:52 ` Tom Lendacky @ 2026-06-10 19:41 ` Michael Roth 2026-06-11 11:55 ` Naveen N Rao 1 sibling, 1 reply; 11+ messages in thread From: Michael Roth @ 2026-06-10 19:41 UTC (permalink / raw) To: Naveen N Rao Cc: Sean Christopherson, Paolo Bonzini, qemu-devel, Daniel P. Berrangé, Eduardo Habkost, Eric Blake, Markus Armbruster, Marcelo Tosatti, Zhao Liu, Nikunj A Dadhania, Tom Lendacky, Roy Hopkins, Srikanth Aithal, Joerg Roedel On Wed, Jun 10, 2026 at 05:00:11PM +0530, Naveen N Rao wrote: > [+Sean] > > Hi Mike, > > On Tue, Jun 09, 2026 at 07:35:46PM -0500, Michael Roth wrote: > > On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote: > > > KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for > > > SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0 > > > ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM") > > > stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests > > > (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM > > > type SEV-ES guests. As a result of this, it is no longer possible to > > > start SEV-ES guests with any SEV feature enabled (in particular, > > > debug-swap) with pflash devices. > > > > > > This is an issue since SEV-ES guests have historically used pflash > > > devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap > > > while using pflash devices, so work around the KVM limitation by > > > switching to using a VMA-based write protection. This allows > > > well-behaved SEV-ES guests to continue to work with pflash devices and > > > enable debug-swap. Mis-behaving guests trying to write to the protected > > > OVMF area will be killed. > > > > Based on Sean's description, a write access to a read-only memslot would > > cause the vCPU to permanently spin on #NPFs if trying to write to it as > > MMIO due to #VC handler not triggering, and that's why we don't support > > read-only memslots. But since SEV-ES was previously working with pflash, > > it seems like it does not rely on this functionality... > > Right, normal well-behaved SEV-ES/SNP guests work just fine as they > don't write to any of the read-only areas. > > > > > So if OVMF isn't writing to write-protected memory, then it wouldn't be > > triggering the MMIO emulation path in the first place. And if we don't > > care about enabling the emulation path in this case... then I'm not sure > > the original reasons for not allowing it for SEV-ES/SNP are applicable. > > Guest (not just OVMF) could try and write to the read-only area > triggering this issue. A simple write to 0xc0000 from within the guest > triggers this. Is that still true even with this patch? commit 0f4a1e80989aca185d955fcd791d7750082044a2 Author: Kevin Loughlin <kevinloughlin@google.com> Date: Wed Mar 13 12:15:46 2024 +0000 x86/sev: Skip ROM range scans and validation for SEV-SNP guests SEV-SNP requires encrypted memory to be validated before access. Because the ROM memory range is not part of the e820 table, it is not pre-validated by the BIOS. Therefore, if a SEV-SNP guest kernel wishes to access this range, the guest must first validate the range. The current SEV-SNP code does indeed scan the ROM range during early boot and thus attempts to validate the ROM range in probe_roms(). However, this behavior is neither sufficient nor necessary for the following reasons: ... but in that case, those private accesses didn't work because they were accessing legacy MMIO regions as private/encrypted even though none of the option ROMs were loaded into memory as encrypted, so they're basically just garbage/legacy regions we try to completely ignore on the guest-side now and any lingering cases should probably get the same treatment. It would be nice to still be able to catch write accesses....but I think we still could (with the kernel changes discussed in my reply to Tom) if we really wanted that. But is that really a hard requirement? Personally, the -bios vs pflash argument thing makes this feel justified since -bios also let's the writes through silently, but maybe we can do better with kernel changes. > > > > > It feels like KVM_CAP_READONLY_MEM is more like KVM_CAP_EMULATE_ON_WRITE, > > whereas we literally just need as actually slot that's permanently mapped > > in the NPT without write access. > > > > Is that an accurate summary of the situation? > > Yes, that sounds right to me. > > > > > If so, maybe we can introduce a KVM_CAP_READONLY_NO_MMIO that captures > > this and simply errors out if it hits the KVM_PFN_ERR_RO_FAULT. > > That would certainly work. > > > Or, for > > a QEMU-specific workaround, just have a pflash implementation that doesn't > > rely on KVM_MEM_READONLY for cases like this where we don't need MMIO > > emulation. > > Not sure I follow that... are you suggesting that pflash use regular RW > memslots and just let the write through? Yes, isn't that basically what we're getting with -bios? At least this way we don't have the awkwardness of needing to randomly switch from -bios to pflash based on what SEV features the user selects, which is pretty bad. But that was more of a last resort, maybe we haven't yet bottomed out on whether we could do things a bit more nicely with some kernel help as discussed elsewhere in this thread. -Mike > > > Thanks, > Naveen > > > There's actually another case in hw/nvram/nrf51_nvm.c where this > > would be applicable. I guess it could be done automatically for the > > confidential VM case to retain cmdline compatibility...though you're > > wanting to add the debugswap feature anyway so not sure how important that > > aspect is. > > > > Thanks, > > > > Mike > > > > > > > > Enable VMA protection and set the memory to be RO when adding the KVM > > > memory slot. Because pflash devices support command-mode, change VMA > > > protection to RW when tearing down the KVM memory slot. KVM > > > SEV_LAUNCH_UPDATE also requires memory to be RW, so disable the > > > protection when calling that. > > > > > > Print a warning when switching to VMA-based protection so that it is > > > clear that KVM itself isn't supporting readonly memory, and that a > > > workaround is in place. Users can plan on switching to using '-bios'. > > > > > > Finally, drop the check rejecting SEV-ES guests with SEV features so > > > that debug-swap can be enabled. > > > > > > > > > Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org> > > > --- > > > Background discussion on this issue: > > > http://lore.kernel.org/r/aUq9_cUDWeEW_qli@google.com > > > > > > This series depends on VMSA features support: > > > http://lore.kernel.org/r/cover.1779281646.git.naveen@kernel.org > > > > > > > > > - Naveen > > > > > > > > > include/system/kvm.h | 5 ++++ > > > include/system/kvm_int.h | 1 + > > > accel/kvm/kvm-all.c | 52 +++++++++++++++++++++++++++++++++++++++- > > > hw/i386/pc_sysfw.c | 19 +++++++++------ > > > target/i386/sev.c | 21 +++++++++++----- > > > 5 files changed, 84 insertions(+), 14 deletions(-) > > > > > > diff --git a/include/system/kvm.h b/include/system/kvm.h > > > index 5fa33eddda38..585058bd6f1c 100644 > > > --- a/include/system/kvm.h > > > +++ b/include/system/kvm.h > > > @@ -555,6 +555,8 @@ uint32_t kvm_dirty_ring_size(void); > > > > > > void kvm_mark_guest_state_protected(void); > > > > > > +void kvm_enable_ro_mem_vma_protection(void); > > > + > > > /** > > > * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page > > > * reported for the VM. > > > @@ -568,6 +570,9 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size); > > > > > > int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private); > > > > > > +void kvm_set_memory_readonly(void *addr, size_t len); > > > +void kvm_set_memory_readwrite(void *addr, size_t len); > > > + > > > /* argument to vmfd change notifier */ > > > typedef struct VmfdChangeNotifier { > > > int vmfd; > > > diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h > > > index 0876aac938d3..0e083a56ce2a 100644 > > > --- a/include/system/kvm_int.h > > > +++ b/include/system/kvm_int.h > > > @@ -123,6 +123,7 @@ struct KVMState > > > OnOffAuto kernel_irqchip_split; > > > bool sync_mmu; > > > bool guest_state_protected; > > > + bool guest_wants_ro_mem_vma_protection; > > > uint64_t manual_dirty_log_protect; > > > /* > > > * Older POSIX says that ioctl numbers are signed int, but in > > > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > > > index 96f90ebb240f..4208df5b25ac 100644 > > > --- a/accel/kvm/kvm-all.c > > > +++ b/accel/kvm/kvm-all.c > > > @@ -1629,6 +1629,42 @@ int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size) > > > return kvm_set_memory_attributes(start, size, 0); > > > } > > > > > > +static void kvm_set_memory_flags(void *addr, size_t len, int flags) > > > +{ > > > + if (mprotect(addr, len, flags)) { > > > + error_report("failed to apply memory protection " > > > + "(0x%" HWADDR_PRIx "+0x%" PRIx64 ") error '%s'", > > > + (hwaddr)addr, len, strerror(errno)); > > > + exit(1); > > > + } > > > +} > > > + > > > +void kvm_set_memory_readonly(void *addr, size_t len) > > > +{ > > > + if (kvm_state->guest_wants_ro_mem_vma_protection) { > > > + kvm_set_memory_flags(addr, len, PROT_READ); > > > + } > > > +} > > > + > > > +void kvm_set_memory_readwrite(void *addr, size_t len) > > > +{ > > > + if (kvm_state->guest_wants_ro_mem_vma_protection) { > > > + kvm_set_memory_flags(addr, len, PROT_READ | PROT_WRITE); > > > + } > > > +} > > > + > > > +static bool kvm_mem_wants_vma_protection(MemoryRegion *mr) > > > +{ > > > + if (!memory_region_is_ram(mr) && > > > + (mr->readonly || mr->rom_device) && > > > + !kvm_readonly_mem_allowed && > > > + kvm_state->guest_wants_ro_mem_vma_protection) { > > > + return true; > > > + } > > > + > > > + return false; > > > +} > > > + > > > /* Called with KVMMemoryListener.slots_lock held */ > > > static void kvm_set_phys_mem(KVMMemoryListener *kml, > > > MemoryRegionSection *section, bool add) > > > @@ -1642,7 +1678,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > > > void *ram; > > > > > > if (!memory_region_is_ram(mr)) { > > > - if (writable || !kvm_readonly_mem_allowed) { > > > + if (writable || (!kvm_readonly_mem_allowed && > > > + !kvm_state->guest_wants_ro_mem_vma_protection)) { > > > return; > > > } else if (!mr->romd_mode) { > > > /* If the memory device is not in romd_mode, then we actually want > > > @@ -1697,6 +1734,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > > > kvm_slot_sync_dirty_pages(mem); > > > } > > > > > > + if (kvm_mem_wants_vma_protection(mr)) { > > > + kvm_set_memory_readwrite(mem->ram, mem->memory_size); > > > + } > > > + > > > /* unregister the slot */ > > > g_free(mem->dirty_bmap); > > > mem->dirty_bmap = NULL; > > > @@ -1746,6 +1787,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > > > } > > > } > > > > > > + if (kvm_mem_wants_vma_protection(mr)) { > > > + kvm_set_memory_readonly(mem->ram, mem->memory_size); > > > + } > > > + > > > start_addr += slot_size; > > > ram_start_offset += slot_size; > > > ram += slot_size; > > > @@ -4771,6 +4816,11 @@ void kvm_mark_guest_state_protected(void) > > > kvm_state->guest_state_protected = true; > > > } > > > > > > +void kvm_enable_ro_mem_vma_protection(void) > > > +{ > > > + kvm_state->guest_wants_ro_mem_vma_protection = true; > > > +} > > > + > > > int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp) > > > { > > > int fd; > > > diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c > > > index 1a41a5972bd0..9590458e00c5 100644 > > > --- a/hw/i386/pc_sysfw.c > > > +++ b/hw/i386/pc_sysfw.c > > > @@ -254,13 +254,18 @@ void pc_system_firmware_init(PCMachineState *pcms, > > > } > > > } else { > > > if (kvm_enabled() && !kvm_readonly_mem_enabled()) { > > > - /* > > > - * Older KVM cannot execute from device memory. So, flash > > > - * memory cannot be used unless the readonly memory kvm > > > - * capability is present. > > > - */ > > > - error_report("pflash with kvm requires KVM readonly memory support"); > > > - exit(1); > > > + if (sev_es_enabled() && !sev_snp_enabled()) { > > > + warn_report("pflash not supported with SEV-ES guests, " > > > + "attempting VMA based protection"); > > > + } else { > > > + /* > > > + * Older KVM cannot execute from device memory. So, flash > > > + * memory cannot be used unless the readonly memory kvm > > > + * capability is present. > > > + */ > > > + error_report("pflash with kvm requires KVM readonly memory support"); > > > + exit(1); > > > + } > > > } > > > > > > pc_system_flash_map(pcms, rom_memory); > > > diff --git a/target/i386/sev.c b/target/i386/sev.c > > > index f04ae4e91f3e..82cf2c562729 100644 > > > --- a/target/i386/sev.c > > > +++ b/target/i386/sev.c > > > @@ -550,12 +550,6 @@ static int check_sev_features(SevCommonState *sev_common, uint64_t sev_features, > > > return -1; > > > } > > > } else { > > > - if (sev_features && sev_es_enabled()) { > > > - error_setg(errp, > > > - "%s: SEV features are not supported with SEV-ES at this time", > > > - __func__); > > > - return -1; > > > - } > > > if (sev_features & SVM_SEV_FEAT_SNP_ACTIVE) { > > > error_setg(errp, > > > "%s: SEV_SNP is not enabled but is enabled in VMSA sev_features", > > > @@ -2024,6 +2018,15 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) > > > return -1; > > > } > > > > > > + /* > > > + * Use VMA-based protection for SEV-ES guests that enable any > > > + * SEV feature, since KVM does not advertise readonly memory > > > + * support for non-default type SEV guests. > > > + */ > > > + if (sev_es_enabled() && SEV_COMMON(cgs)->sev_features) { > > > + kvm_enable_ro_mem_vma_protection(); > > > + } > > > + > > > if (!cgs->ready) { > > > /* > > > * SEV uses these notifiers to register/pin pages prior to guest use, > > > @@ -2111,7 +2114,13 @@ sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t len, Error **errp) > > > if (sev_check_state(sev_common, SEV_STATE_LAUNCH_UPDATE)) { > > > int ret; > > > > > > + /* > > > + * KVM requires these pages to be RW, so remove VMA RO protection > > > + * for the duration of SEV_LAUNCH_UPDATE if using SEV features. > > > + */ > > > + kvm_set_memory_readwrite(ptr, len); > > > ret = klass->launch_update_data(sev_common, gpa, ptr, len, errp); > > > + kvm_set_memory_readonly(ptr, len); > > > if (ret < 0) { > > > return ret; > > > } > > > > > > base-commit: 5611a9268dae7b7ff99d478ed134052a9fc7e9f7 > > > prerequisite-patch-id: dc27ad6297d47d063b04fa797c1b8203ee97d9c8 > > > prerequisite-patch-id: 603eff49233c4b0483e7c405754b95aa455dd38c > > > prerequisite-patch-id: d4085e72ecfb0fbf977f7358d1edd29951b93784 > > > prerequisite-patch-id: f6f201825b1e56f89a87a26bc457b3e6018aee49 > > > prerequisite-patch-id: 08d79cec4c3f117178be8f6c866ff1be08e971f3 > > > prerequisite-patch-id: c112bccab9ab9cee2d0227516fc857590b99a75b > > > prerequisite-patch-id: f42fd829d0ea4909537e722477057a4013a247ab > > > prerequisite-patch-id: 67136368ed1d2fa0ae55fee4368a7bd1fe394368 > > > prerequisite-patch-id: 08349bb1e0e11ee1518a9041771302c97866b5cd > > > -- > > > 2.54.0 > > > > > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] target/i386: SEV: Allow pflash devices with SEV-ES guests 2026-06-10 19:41 ` Michael Roth @ 2026-06-11 11:55 ` Naveen N Rao 0 siblings, 0 replies; 11+ messages in thread From: Naveen N Rao @ 2026-06-11 11:55 UTC (permalink / raw) To: Michael Roth Cc: Sean Christopherson, Paolo Bonzini, qemu-devel, Daniel P. Berrangé, Eduardo Habkost, Eric Blake, Markus Armbruster, Marcelo Tosatti, Zhao Liu, Nikunj A Dadhania, Tom Lendacky, Roy Hopkins, Srikanth Aithal, Joerg Roedel On Wed, Jun 10, 2026 at 02:41:58PM -0500, Michael Roth wrote: > On Wed, Jun 10, 2026 at 05:00:11PM +0530, Naveen N Rao wrote: > > [+Sean] > > > > Hi Mike, > > > > On Tue, Jun 09, 2026 at 07:35:46PM -0500, Michael Roth wrote: > > > On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote: > > > > KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for > > > > SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0 > > > > ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM") > > > > stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests > > > > (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM > > > > type SEV-ES guests. As a result of this, it is no longer possible to > > > > start SEV-ES guests with any SEV feature enabled (in particular, > > > > debug-swap) with pflash devices. > > > > > > > > This is an issue since SEV-ES guests have historically used pflash > > > > devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap > > > > while using pflash devices, so work around the KVM limitation by > > > > switching to using a VMA-based write protection. This allows > > > > well-behaved SEV-ES guests to continue to work with pflash devices and > > > > enable debug-swap. Mis-behaving guests trying to write to the protected > > > > OVMF area will be killed. > > > > > > Based on Sean's description, a write access to a read-only memslot would > > > cause the vCPU to permanently spin on #NPFs if trying to write to it as > > > MMIO due to #VC handler not triggering, and that's why we don't support > > > read-only memslots. But since SEV-ES was previously working with pflash, > > > it seems like it does not rely on this functionality... > > > > Right, normal well-behaved SEV-ES/SNP guests work just fine as they > > don't write to any of the read-only areas. > > > > > > > > So if OVMF isn't writing to write-protected memory, then it wouldn't be > > > triggering the MMIO emulation path in the first place. And if we don't > > > care about enabling the emulation path in this case... then I'm not sure > > > the original reasons for not allowing it for SEV-ES/SNP are applicable. > > > > Guest (not just OVMF) could try and write to the read-only area > > triggering this issue. A simple write to 0xc0000 from within the guest > > triggers this. > > Is that still true even with this patch? > > commit 0f4a1e80989aca185d955fcd791d7750082044a2 > Author: Kevin Loughlin <kevinloughlin@google.com> > Date: Wed Mar 13 12:15:46 2024 +0000 > > x86/sev: Skip ROM range scans and validation for SEV-SNP guests > > SEV-SNP requires encrypted memory to be validated before access. > Because the ROM memory range is not part of the e820 table, it is not > pre-validated by the BIOS. Therefore, if a SEV-SNP guest kernel wishes > to access this range, the guest must first validate the range. > > The current SEV-SNP code does indeed scan the ROM range during early > boot and thus attempts to validate the ROM range in probe_roms(). > However, this behavior is neither sufficient nor necessary for the > following reasons: > > ... Yes, that was mostly a change for SEV-SNP guests, and only to not have the kernel access those regions. Userspace is still free to access through /dev/mem. > > but in that case, those private accesses didn't work because they were > accessing legacy MMIO regions as private/encrypted even though none of the > option ROMs were loaded into memory as encrypted, so they're basically just > garbage/legacy regions we try to completely ignore on the guest-side now and > any lingering cases should probably get the same treatment. > > It would be nice to still be able to catch write accesses....but I think we > still could (with the kernel changes discussed in my reply to Tom) if we > really wanted that. But is that really a hard requirement? Personally, the > -bios vs pflash argument thing makes this feel justified since -bios also > let's the writes through silently, but maybe we can do better with kernel > changes. Indeed. > > > > > > > > > It feels like KVM_CAP_READONLY_MEM is more like KVM_CAP_EMULATE_ON_WRITE, > > > whereas we literally just need as actually slot that's permanently mapped > > > in the NPT without write access. > > > > > > Is that an accurate summary of the situation? > > > > Yes, that sounds right to me. > > > > > > > > If so, maybe we can introduce a KVM_CAP_READONLY_NO_MMIO that captures > > > this and simply errors out if it hits the KVM_PFN_ERR_RO_FAULT. > > > > That would certainly work. > > > > > Or, for > > > a QEMU-specific workaround, just have a pflash implementation that doesn't > > > rely on KVM_MEM_READONLY for cases like this where we don't need MMIO > > > emulation. > > > > Not sure I follow that... are you suggesting that pflash use regular RW > > memslots and just let the write through? > > Yes, isn't that basically what we're getting with -bios? At least this > way we don't have the awkwardness of needing to randomly switch from -bios > to pflash based on what SEV features the user selects, which is pretty > bad. > > But that was more of a last resort, maybe we haven't yet bottomed out on > whether we could do things a bit more nicely with some kernel help as > discussed elsewhere in this thread. Yes, letting the writes through is simple enough as a last resort. The VMA based protection I have implemented here is the other option if we want to be able to prevent writes without KVM's help (but will likely need more work overall). Thanks, Naveen ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-06-11 16:12 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-02 7:12 [RFC PATCH] target/i386: SEV: Allow pflash devices with SEV-ES guests Naveen N Rao (AMD) 2026-06-03 4:32 ` Naveen N Rao 2026-06-10 0:35 ` Michael Roth 2026-06-10 11:30 ` Naveen N Rao 2026-06-10 14:52 ` Tom Lendacky 2026-06-10 14:52 ` Tom Lendacky 2026-06-10 19:33 ` Michael Roth 2026-06-11 11:32 ` Naveen N Rao 2026-06-11 16:11 ` Sean Christopherson 2026-06-10 19:41 ` Michael Roth 2026-06-11 11:55 ` Naveen N Rao
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.