linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Naveen N Rao <naveen@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Peter Gonda <pgonda@google.com>,
	Michael Roth <michael.roth@amd.com>,
	 Vishal Annapurve <vannapurve@google.com>,
	Ackerly Tng <ackerleytng@google.com>,
	 Nikunj A Dadhania <nikunj@amd.com>,
	Tom Lendacky <thomas.lendacky@amd.com>
Subject: Re: [PATCH 01/22] KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
Date: Tue, 23 Dec 2025 08:06:35 -0800	[thread overview]
Message-ID: <aUq9_cUDWeEW_qli@google.com> (raw)
In-Reply-To: <fcqjl5a7m27f2mfpblnhgmozbipdjmvpdyk3m5hhzwcenp4cpg@m2ooa7ykrcvs>

On Wed, Dec 03, 2025, Naveen N Rao wrote:
> Hi Sean,
> 
> On Fri, Aug 09, 2024 at 12:02:58PM -0700, Sean Christopherson wrote:
> > Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't
> > directly emulate instructions for ES/SNP, and instead the guest must
> > explicitly request emulation.  Unless the guest explicitly requests
> > emulation without accessing memory, ES/SNP relies on KVM creating an MMIO
> > SPTE, with the subsequent #NPF being reflected into the guest as a #VC.
> > 
> > But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs,
> > because except for ES/SNP, doing so requires setting reserved bits in the
> > SPTE, i.e. the SPTE can't be readable while also generating a #VC on
> > writes.  Because KVM never creates MMIO SPTEs and jumps directly to
> > emulation, the guest never gets a #VC.  And since KVM simply resumes the
> > guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU
> > into an infinite #NPF loop if the vCPU attempts to write read-only memory.
> > 
> > Disallow read-only memory for all VMs with protected state, i.e. for
> > upcoming TDX VMs as well as ES/SNP VMs.  For TDX, it's actually possible
> > to support read-only memory, as TDX uses EPT Violation #VE to reflect the
> > fault into the guest, e.g. KVM could configure read-only SPTEs with RX
> > protections and SUPPRESS_VE=0.  But there is no strong use case for
> > supporting read-only memslots on TDX, e.g. the main historical usage is
> > to emulate option ROMs, but TDX disallows executing from shared memory.
> > And if someone comes along with a legitimate, strong use case, the
> > restriction can always be lifted for TDX.
> > 
> > Don't bother trying to retroactively apply the restriction to SEV-ES
> > VMs that are created as type KVM_X86_DEFAULT_VM.  Read-only memslots can't
> > possibly work for SEV-ES, i.e. disallowing such memslots is really just
> > means reporting an error to userspace instead of silently hanging vCPUs.
> > Trying to deal with the ordering between KVM_SEV_INIT and memslot creation
> > isn't worth the marginal benefit it would provide userspace.
> > 
> > Fixes: 26c44aa9e076 ("KVM: SEV: define VM types for SEV and SEV-ES")
> > Fixes: 1dfe571c12cf ("KVM: SEV: Add initial SEV-SNP support")
> > Cc: Peter Gonda <pgonda@google.com>
> > Cc: Michael Roth <michael.roth@amd.com>
> > Cc: Vishal Annapurve <vannapurve@google.com>
> > Cc: Ackerly Tng <ackerleytng@google.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h | 2 ++
> >  include/linux/kvm_host.h        | 7 +++++++
> >  virt/kvm/kvm_main.c             | 5 ++---
> >  3 files changed, 11 insertions(+), 3 deletions(-)
> 
> As discussed in one of the previous PUCK calls, this is causing Qemu to 
> throw an error when trying to enable debug-swap for a SEV-ES guest when 
> using a pflash drive for OVMF. Sample qemu invocation (*):
>   qemu-system-x86_64 ... \
>     -drive if=pflash,format=raw,unit=0,file=/path/to/OVMF_CODE.fd,readonly=on \
>     -drive if=pflash,format=raw,unit=1,file=/path/to/OVMF_VARS.fd \
>     -machine q35,confidential-guest-support=sev0 \
>     -object sev-guest,id=sev0,policy=0x5,cbitpos=51,reduced-phys-bits=1,debug-swap=on
> 
> This is expected since enabling debug-swap requires use of 
> KVM_SEV_INIT2, which implies a VM type of KVM_X86_SEV_ES_VM. However, 
> SEV-ES VMs that do not enable any VMSA SEV features (and are hence 
> KVM_X86_DEFAULT_VM type) are allowed to continue to launch though they 
> are also susceptible to this issue.
> 
> One of the suggestions in the call was to consider returning an error to 
> userspace instead. Is this close to what you had in mind:
> 
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 73cdcbccc89e..19e27ed27e17 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -387,8 +387,10 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>          * they can fix it by changing memory to shared, or they can
>          * provide a better error.
>          */
> -       if (r == RET_PF_EMULATE && fault.is_private) {
> -               pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
> +       if (r == RET_PF_EMULATE && (fault.is_private ||
> +           (!fault.map_writable && fault.write && vcpu->arch.guest_state_protected))) {
> +               if (fault.is_private)
> +                       pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
>                 kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
>                 return -EFAULT;
>         }
> 
> This seems to work though Qemu seems to think we are asking it to 
> convert the memory to shared (so we probably need to signal this error 
> some other way?):
>   qemu-system-x86_64: Convert non guest_memfd backed memory region (0xf0000 ,+ 0x1000) to shared
> 
> Thoughts?

The choke point would be kvm_handle_error_pfn() (see below), where the RET_PF_EMULATE
originates.  But looking at all of this again, I am opposed to changing KVM's
ABI to allow KVM_MEM_READONLY for SEV-ES guests, it simply can't work.  And KVM
enumerates as much.

	case KVM_CAP_READONLY_MEM:
		r = kvm ? kvm_arch_has_readonly_mem(kvm) : 1;
		break;

More importantly, if QEMU wants to provide a not-fully-functional configuration
to allow KVM_SEV_INIT2 with pflash, QEMU can fudge around the lack of read-only
memory without KVM's assistance.  It likely won't be pretty, but it's doable,
by clearing PROT_WRITE in the backing VMA that's handed to the KVM memslot.

KVM will see a normal memslot that the guest can read/execute, and if the guest
attempts to write to the memory, hva_to_pfn() will return KVM_PFN_RR_FAULT and
kvm_handle_error_pfn() will send that out to userspace as -EFAULT.

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f17324546900..27dc909b8225 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3493,8 +3493,12 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
         * into the spte otherwise read access on readonly gfn also can
         * caused mmio page fault and treat it as mmio access.
         */
-       if (fault->pfn == KVM_PFN_ERR_RO_FAULT)
+       if (fault->pfn == KVM_PFN_ERR_RO_FAULT) {
+               if (kvm_arch_has_readonly_mem(vcpu->kvm))
+                       return -EFAULT;
+
                return RET_PF_EMULATE;
+       }
 
        if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
                kvm_send_hwpoison_signal(fault->slot, fault->gfn);

  reply	other threads:[~2025-12-23 16:06 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-09 19:02 [PATCH 00/22] KVM: x86: Fix multiple #PF RO infinite loop bugs Sean Christopherson
2024-08-09 19:02 ` [PATCH 01/22] KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX) Sean Christopherson
2024-08-14 16:31   ` Paolo Bonzini
2025-12-03 13:04   ` Naveen N Rao
2025-12-23 16:06     ` Sean Christopherson [this message]
2024-08-09 19:02 ` [PATCH 02/22] KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid Sean Christopherson
2024-08-14 11:11   ` Yuan Yao
2024-08-09 19:03 ` [PATCH 03/22] KVM: x86/mmu: Trigger unprotect logic only on write-protection page faults Sean Christopherson
2024-08-14 11:42   ` Yuan Yao
2024-08-14 14:21     ` Sean Christopherson
2024-08-15  8:30       ` Yuan Yao
2024-08-14 16:40   ` Paolo Bonzini
2024-08-14 19:34     ` Sean Christopherson
2024-08-09 19:03 ` [PATCH 04/22] KVM: x86/mmu: Skip emulation on page fault iff 1+ SPs were unprotected Sean Christopherson
2024-08-14 14:22   ` Yuan Yao
2024-08-15 23:31     ` Yao Yuan
2024-08-23  0:39       ` Sean Christopherson
2024-08-23 23:46         ` Sean Christopherson
2024-08-26 20:28           ` Sean Christopherson
2024-08-14 16:47   ` Paolo Bonzini
2024-08-09 19:03 ` [PATCH 05/22] KVM: x86: Retry to-be-emulated insn in "slow" unprotect path iff sp is zapped Sean Christopherson
2024-08-09 19:03 ` [PATCH 06/22] KVM: x86: Get RIP from vCPU state when storing it to last_retry_eip Sean Christopherson
2024-08-14 17:01   ` Paolo Bonzini
2024-08-09 19:03 ` [PATCH 07/22] KVM: x86: Store gpa as gpa_t, not unsigned long, when unprotecting for retry Sean Christopherson
2024-08-09 19:03 ` [PATCH 08/22] KVM: x86/mmu: Apply retry protection to "fast nTDP unprotect" path Sean Christopherson
2024-08-09 19:03 ` [PATCH 09/22] KVM: x86/mmu: Try "unprotect for retry" iff there are indirect SPs Sean Christopherson
2024-08-14 17:30   ` Paolo Bonzini
2024-08-15 14:09     ` Sean Christopherson
2024-08-15 16:48       ` Paolo Bonzini
2024-08-28 23:28         ` Sean Christopherson
2024-08-09 19:03 ` [PATCH 10/22] KVM: x86/mmu: Replace PFERR_NESTED_GUEST_PAGE with a more descriptive helper Sean Christopherson
2024-08-14 17:32   ` Paolo Bonzini
2024-08-15 14:10     ` Sean Christopherson
2024-08-09 19:03 ` [PATCH 11/22] KVM: x86: Move EMULTYPE_ALLOW_RETRY_PF to x86_emulate_instruction() Sean Christopherson
2024-08-09 19:03 ` [PATCH 12/22] KVM: x86: Fold retry_instruction() into x86_emulate_instruction() Sean Christopherson
2024-08-09 19:03 ` [PATCH 13/22] KVM: x86/mmu: Don't try to unprotect an INVALID_GPA Sean Christopherson
2024-08-09 19:03 ` [PATCH 14/22] KVM: x86/mmu: Always walk guest PTEs with WRITE access when unprotecting Sean Christopherson
2024-08-09 19:03 ` [PATCH 15/22] KVM: x86/mmu: Move event re-injection unprotect+retry into common path Sean Christopherson
2024-08-14 17:43   ` Paolo Bonzini
2024-08-26 21:52     ` Sean Christopherson
2024-08-09 19:03 ` [PATCH 16/22] KVM: x86: Remove manual pfn lookup when retrying #PF after failed emulation Sean Christopherson
2024-08-14 17:50   ` Paolo Bonzini
2024-08-09 19:03 ` [PATCH 17/22] KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn Sean Christopherson
2024-08-14 17:53   ` Paolo Bonzini
2024-08-09 19:03 ` [PATCH 18/22] KVM: x86: Apply retry protection to "unprotect on failure" path Sean Christopherson
2024-08-09 19:03 ` [PATCH 19/22] KVM: x86: Update retry protection fields when forcing retry on emulation failure Sean Christopherson
2024-08-09 19:03 ` [PATCH 20/22] KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure() Sean Christopherson
2024-08-09 19:03 ` [PATCH 21/22] KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version Sean Christopherson
2024-08-09 19:03 ` [PATCH 22/22] KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list Sean Christopherson
2024-08-14 17:57   ` Paolo Bonzini
2024-08-15 14:25     ` Sean Christopherson
2024-08-30 23:54       ` Sean Christopherson
2024-08-14 17:58 ` [PATCH 00/22] KVM: x86: Fix multiple #PF RO infinite loop bugs Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aUq9_cUDWeEW_qli@google.com \
    --to=seanjc@google.com \
    --cc=ackerleytng@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michael.roth@amd.com \
    --cc=naveen@kernel.org \
    --cc=nikunj@amd.com \
    --cc=pbonzini@redhat.com \
    --cc=pgonda@google.com \
    --cc=thomas.lendacky@amd.com \
    --cc=vannapurve@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).