From: Sean Christopherson <seanjc@google.com>
To: Naveen N Rao <naveen@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Peter Gonda <pgonda@google.com>,
Michael Roth <michael.roth@amd.com>,
Vishal Annapurve <vannapurve@google.com>,
Ackerly Tng <ackerleytng@google.com>,
Nikunj A Dadhania <nikunj@amd.com>,
Tom Lendacky <thomas.lendacky@amd.com>
Subject: Re: [PATCH 01/22] KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
Date: Tue, 23 Dec 2025 08:06:35 -0800 [thread overview]
Message-ID: <aUq9_cUDWeEW_qli@google.com> (raw)
In-Reply-To: <fcqjl5a7m27f2mfpblnhgmozbipdjmvpdyk3m5hhzwcenp4cpg@m2ooa7ykrcvs>
On Wed, Dec 03, 2025, Naveen N Rao wrote:
> Hi Sean,
>
> On Fri, Aug 09, 2024 at 12:02:58PM -0700, Sean Christopherson wrote:
> > Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't
> > directly emulate instructions for ES/SNP, and instead the guest must
> > explicitly request emulation. Unless the guest explicitly requests
> > emulation without accessing memory, ES/SNP relies on KVM creating an MMIO
> > SPTE, with the subsequent #NPF being reflected into the guest as a #VC.
> >
> > But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs,
> > because except for ES/SNP, doing so requires setting reserved bits in the
> > SPTE, i.e. the SPTE can't be readable while also generating a #VC on
> > writes. Because KVM never creates MMIO SPTEs and jumps directly to
> > emulation, the guest never gets a #VC. And since KVM simply resumes the
> > guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU
> > into an infinite #NPF loop if the vCPU attempts to write read-only memory.
> >
> > Disallow read-only memory for all VMs with protected state, i.e. for
> > upcoming TDX VMs as well as ES/SNP VMs. For TDX, it's actually possible
> > to support read-only memory, as TDX uses EPT Violation #VE to reflect the
> > fault into the guest, e.g. KVM could configure read-only SPTEs with RX
> > protections and SUPPRESS_VE=0. But there is no strong use case for
> > supporting read-only memslots on TDX, e.g. the main historical usage is
> > to emulate option ROMs, but TDX disallows executing from shared memory.
> > And if someone comes along with a legitimate, strong use case, the
> > restriction can always be lifted for TDX.
> >
> > Don't bother trying to retroactively apply the restriction to SEV-ES
> > VMs that are created as type KVM_X86_DEFAULT_VM. Read-only memslots can't
> > possibly work for SEV-ES, i.e. disallowing such memslots is really just
> > means reporting an error to userspace instead of silently hanging vCPUs.
> > Trying to deal with the ordering between KVM_SEV_INIT and memslot creation
> > isn't worth the marginal benefit it would provide userspace.
> >
> > Fixes: 26c44aa9e076 ("KVM: SEV: define VM types for SEV and SEV-ES")
> > Fixes: 1dfe571c12cf ("KVM: SEV: Add initial SEV-SNP support")
> > Cc: Peter Gonda <pgonda@google.com>
> > Cc: Michael Roth <michael.roth@amd.com>
> > Cc: Vishal Annapurve <vannapurve@google.com>
> > Cc: Ackerly Tng <ackerleytng@google.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/include/asm/kvm_host.h | 2 ++
> > include/linux/kvm_host.h | 7 +++++++
> > virt/kvm/kvm_main.c | 5 ++---
> > 3 files changed, 11 insertions(+), 3 deletions(-)
>
> As discussed in one of the previous PUCK calls, this is causing Qemu to
> throw an error when trying to enable debug-swap for a SEV-ES guest when
> using a pflash drive for OVMF. Sample qemu invocation (*):
> qemu-system-x86_64 ... \
> -drive if=pflash,format=raw,unit=0,file=/path/to/OVMF_CODE.fd,readonly=on \
> -drive if=pflash,format=raw,unit=1,file=/path/to/OVMF_VARS.fd \
> -machine q35,confidential-guest-support=sev0 \
> -object sev-guest,id=sev0,policy=0x5,cbitpos=51,reduced-phys-bits=1,debug-swap=on
>
> This is expected since enabling debug-swap requires use of
> KVM_SEV_INIT2, which implies a VM type of KVM_X86_SEV_ES_VM. However,
> SEV-ES VMs that do not enable any VMSA SEV features (and are hence
> KVM_X86_DEFAULT_VM type) are allowed to continue to launch though they
> are also susceptible to this issue.
>
> One of the suggestions in the call was to consider returning an error to
> userspace instead. Is this close to what you had in mind:
>
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 73cdcbccc89e..19e27ed27e17 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -387,8 +387,10 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> * they can fix it by changing memory to shared, or they can
> * provide a better error.
> */
> - if (r == RET_PF_EMULATE && fault.is_private) {
> - pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
> + if (r == RET_PF_EMULATE && (fault.is_private ||
> + (!fault.map_writable && fault.write && vcpu->arch.guest_state_protected))) {
> + if (fault.is_private)
> + pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
> kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
> return -EFAULT;
> }
>
> This seems to work though Qemu seems to think we are asking it to
> convert the memory to shared (so we probably need to signal this error
> some other way?):
> qemu-system-x86_64: Convert non guest_memfd backed memory region (0xf0000 ,+ 0x1000) to shared
>
> Thoughts?
The choke point would be kvm_handle_error_pfn() (see below), where the RET_PF_EMULATE
originates. But looking at all of this again, I am opposed to changing KVM's
ABI to allow KVM_MEM_READONLY for SEV-ES guests, it simply can't work. And KVM
enumerates as much.
case KVM_CAP_READONLY_MEM:
r = kvm ? kvm_arch_has_readonly_mem(kvm) : 1;
break;
More importantly, if QEMU wants to provide a not-fully-functional configuration
to allow KVM_SEV_INIT2 with pflash, QEMU can fudge around the lack of read-only
memory without KVM's assistance. It likely won't be pretty, but it's doable,
by clearing PROT_WRITE in the backing VMA that's handed to the KVM memslot.
KVM will see a normal memslot that the guest can read/execute, and if the guest
attempts to write to the memory, hva_to_pfn() will return KVM_PFN_RR_FAULT and
kvm_handle_error_pfn() will send that out to userspace as -EFAULT.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f17324546900..27dc909b8225 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3493,8 +3493,12 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
* into the spte otherwise read access on readonly gfn also can
* caused mmio page fault and treat it as mmio access.
*/
- if (fault->pfn == KVM_PFN_ERR_RO_FAULT)
+ if (fault->pfn == KVM_PFN_ERR_RO_FAULT) {
+ if (kvm_arch_has_readonly_mem(vcpu->kvm))
+ return -EFAULT;
+
return RET_PF_EMULATE;
+ }
if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
kvm_send_hwpoison_signal(fault->slot, fault->gfn);
next prev parent reply other threads:[~2025-12-23 16:06 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-09 19:02 [PATCH 00/22] KVM: x86: Fix multiple #PF RO infinite loop bugs Sean Christopherson
2024-08-09 19:02 ` [PATCH 01/22] KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX) Sean Christopherson
2024-08-14 16:31 ` Paolo Bonzini
2025-12-03 13:04 ` Naveen N Rao
2025-12-23 16:06 ` Sean Christopherson [this message]
2024-08-09 19:02 ` [PATCH 02/22] KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid Sean Christopherson
2024-08-14 11:11 ` Yuan Yao
2024-08-09 19:03 ` [PATCH 03/22] KVM: x86/mmu: Trigger unprotect logic only on write-protection page faults Sean Christopherson
2024-08-14 11:42 ` Yuan Yao
2024-08-14 14:21 ` Sean Christopherson
2024-08-15 8:30 ` Yuan Yao
2024-08-14 16:40 ` Paolo Bonzini
2024-08-14 19:34 ` Sean Christopherson
2024-08-09 19:03 ` [PATCH 04/22] KVM: x86/mmu: Skip emulation on page fault iff 1+ SPs were unprotected Sean Christopherson
2024-08-14 14:22 ` Yuan Yao
2024-08-15 23:31 ` Yao Yuan
2024-08-23 0:39 ` Sean Christopherson
2024-08-23 23:46 ` Sean Christopherson
2024-08-26 20:28 ` Sean Christopherson
2024-08-14 16:47 ` Paolo Bonzini
2024-08-09 19:03 ` [PATCH 05/22] KVM: x86: Retry to-be-emulated insn in "slow" unprotect path iff sp is zapped Sean Christopherson
2024-08-09 19:03 ` [PATCH 06/22] KVM: x86: Get RIP from vCPU state when storing it to last_retry_eip Sean Christopherson
2024-08-14 17:01 ` Paolo Bonzini
2024-08-09 19:03 ` [PATCH 07/22] KVM: x86: Store gpa as gpa_t, not unsigned long, when unprotecting for retry Sean Christopherson
2024-08-09 19:03 ` [PATCH 08/22] KVM: x86/mmu: Apply retry protection to "fast nTDP unprotect" path Sean Christopherson
2024-08-09 19:03 ` [PATCH 09/22] KVM: x86/mmu: Try "unprotect for retry" iff there are indirect SPs Sean Christopherson
2024-08-14 17:30 ` Paolo Bonzini
2024-08-15 14:09 ` Sean Christopherson
2024-08-15 16:48 ` Paolo Bonzini
2024-08-28 23:28 ` Sean Christopherson
2024-08-09 19:03 ` [PATCH 10/22] KVM: x86/mmu: Replace PFERR_NESTED_GUEST_PAGE with a more descriptive helper Sean Christopherson
2024-08-14 17:32 ` Paolo Bonzini
2024-08-15 14:10 ` Sean Christopherson
2024-08-09 19:03 ` [PATCH 11/22] KVM: x86: Move EMULTYPE_ALLOW_RETRY_PF to x86_emulate_instruction() Sean Christopherson
2024-08-09 19:03 ` [PATCH 12/22] KVM: x86: Fold retry_instruction() into x86_emulate_instruction() Sean Christopherson
2024-08-09 19:03 ` [PATCH 13/22] KVM: x86/mmu: Don't try to unprotect an INVALID_GPA Sean Christopherson
2024-08-09 19:03 ` [PATCH 14/22] KVM: x86/mmu: Always walk guest PTEs with WRITE access when unprotecting Sean Christopherson
2024-08-09 19:03 ` [PATCH 15/22] KVM: x86/mmu: Move event re-injection unprotect+retry into common path Sean Christopherson
2024-08-14 17:43 ` Paolo Bonzini
2024-08-26 21:52 ` Sean Christopherson
2024-08-09 19:03 ` [PATCH 16/22] KVM: x86: Remove manual pfn lookup when retrying #PF after failed emulation Sean Christopherson
2024-08-14 17:50 ` Paolo Bonzini
2024-08-09 19:03 ` [PATCH 17/22] KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn Sean Christopherson
2024-08-14 17:53 ` Paolo Bonzini
2024-08-09 19:03 ` [PATCH 18/22] KVM: x86: Apply retry protection to "unprotect on failure" path Sean Christopherson
2024-08-09 19:03 ` [PATCH 19/22] KVM: x86: Update retry protection fields when forcing retry on emulation failure Sean Christopherson
2024-08-09 19:03 ` [PATCH 20/22] KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure() Sean Christopherson
2024-08-09 19:03 ` [PATCH 21/22] KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version Sean Christopherson
2024-08-09 19:03 ` [PATCH 22/22] KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list Sean Christopherson
2024-08-14 17:57 ` Paolo Bonzini
2024-08-15 14:25 ` Sean Christopherson
2024-08-30 23:54 ` Sean Christopherson
2024-08-14 17:58 ` [PATCH 00/22] KVM: x86: Fix multiple #PF RO infinite loop bugs Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aUq9_cUDWeEW_qli@google.com \
--to=seanjc@google.com \
--cc=ackerleytng@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=michael.roth@amd.com \
--cc=naveen@kernel.org \
--cc=nikunj@amd.com \
--cc=pbonzini@redhat.com \
--cc=pgonda@google.com \
--cc=thomas.lendacky@amd.com \
--cc=vannapurve@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.