The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 2/3] KVM: MMU: unconditionally clear MMIO cache on root rebuild
Date: Tue, 9 Jun 2026 13:24:54 -0700	[thread overview]
Message-ID: <aih2lgkT9Tmt09uv@google.com> (raw)
In-Reply-To: <20260604160733.12555-3-pbonzini@redhat.com>

On Thu, Jun 04, 2026, Paolo Bonzini wrote:
> Upon changing CR3, the MMIO cache becomes invalid because the
> GVA->GPA mapping has changed.  However, kvm_load_new_pgd() calls

s/kvm_load_new_pgd/kvm_mmu_new_pgd

> vcpu_clear_mmio_info() call only if the fast switch succeeded.
> The early-return path instead leaves the root invalid; the next entry
> then calls kvm_mmu_reload() and from there kvm_mmu_load().
> 
> kvm_mmu_load() calls kvm_mmu_sync_roots(), which clears the MMIO
> cache, but one combination that falls through is root_role.direct==1,
> i.e. CR0.PG=0, for which kvm_mmu_sync_roots() bails before reaching the
> call to vcpu_clear_mmio_info().
> 
> That combination is barely reachable: a valid direct root is pretty much
> always a fast-switch success because it does not check the PGD for a
> match.  The early return for a direct root thus requires the current root
> to already be invalid, and kvm_mmu_unload() itself clears the MMIO cache.
> 
> That said, doing an independent clear in the style of kvm_mmu_new_pgd()
> is more obviously correct and basically free, so harden it.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index f8aa7eda661e..6689c9f8ae16 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6138,6 +6138,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
>  	if (r)
>  		goto out;
>  
> +	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
>  	kvm_mmu_sync_roots(vcpu);


I don't dislike this change, but I don't really like it either.  MMIO caching as
a whole is a mess of spaghetti.  E.g. handle_mmio_page_fault() will never use or
cache a gva for the CR0.PG=0 case, and so the only way for this to be reachable
in any capacity is if KVM emulates an instruction, in response to something *other*
than a #PF (otherwise KVM will use the GPA), immediately after inserting into the
cache via kvm_handle_noslot_fault().

IMO, flushing cache here, when it's _just_ barely necessary is rather confusing.
E.g. kvm_mmu_sync_roots() obviously handles the shadow paging case, and TDP can't
ever use the GVA-based cache because KVM doesn't track CR3 changes (and it's
*really* hard to see that TDP is safe/fine).  We could "fix" that with a comment,
but rather than go through all that effort to support a path that in all likelihood
never gets a cache hit, or at least never gets enough hits to provide meaningful
value, we actually make direct MMUs completely mutually exclusive with GVA-based
caching?

It's actually a fairly nice cleanup, and would entirely obviate the need to worry
about flushing the cache for CR0.PG=0.

E.g. over a few patches (completely untested):

---
 arch/x86/kvm/mmu/mmu.c | 29 ++++++++++++-----------------
 arch/x86/kvm/x86.h     | 24 +++++++++++++++++-------
 2 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 91843e9224d0..42ac9986d73b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3533,14 +3533,12 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
 				   struct kvm_page_fault *fault,
 				   unsigned int access)
 {
-	gva_t gva = fault->is_tdp ? 0 : fault->addr;
-
 	if (fault->is_private) {
 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
 		return -EFAULT;
 	}
 
-	vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
+	vcpu_cache_mmio_info(vcpu, fault->addr, fault->gfn,
 			     access & shadow_mmio_access_mask);
 
 	fault->slot = NULL;
@@ -4364,19 +4362,20 @@ static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return kvm_translate_gpa(vcpu, mmu, vaddr, access, exception);
 }
 
-static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direct)
+static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 cr2_or_gpa)
 {
 	/*
-	 * A nested guest cannot use the MMIO cache if it is using nested
-	 * page tables, because cr2 is a nGPA while the cache stores GPAs.
+	 * A nested guest cannot use the software MMIO cache if it is using
+	 * nested TDP, because the address at the time of fault is a nGPA,
+	 * while the cache stores GPAs.
 	 */
 	if (mmu_is_nested(vcpu))
 		return false;
 
-	if (direct)
-		return vcpu_match_mmio_gpa(vcpu, addr);
+	if (vcpu_can_cache_mmio_gva(vcpu))
+		return vcpu_match_mmio_gva(vcpu, cr2_or_gpa);
 
-	return vcpu_match_mmio_gva(vcpu, addr);
+	return vcpu_match_mmio_gpa(vcpu, cr2_or_gpa);
 }
 
 /*
@@ -4462,12 +4461,12 @@ static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
 	return reserved;
 }
 
-static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
+static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr)
 {
 	u64 spte;
 	bool reserved;
 
-	if (mmio_info_in_cache(vcpu, addr, direct))
+	if (mmio_info_in_cache(vcpu, addr))
 		return RET_PF_EMULATE;
 
 	reserved = get_mmio_spte(vcpu, addr, &spte);
@@ -4481,9 +4480,6 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 		if (!check_mmio_spte(vcpu, spte))
 			return RET_PF_INVALID;
 
-		if (direct)
-			addr = 0;
-
 		trace_handle_mmio_page_fault(addr, gfn, access);
 		vcpu_cache_mmio_info(vcpu, addr, gfn, access);
 		return RET_PF_EMULATE;
@@ -6359,7 +6355,7 @@ static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * and could put the vCPU into an infinite loop because the processor
 	 * will keep faulting on the non-existent MMIO address.
 	 */
-	if (WARN_ON_ONCE(mmio_info_in_cache(vcpu, cr2_or_gpa, direct)))
+	if (WARN_ON_ONCE(mmio_info_in_cache(vcpu, cr2_or_gpa)))
 		return RET_PF_EMULATE;
 
 	/*
@@ -6421,7 +6417,6 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 		       void *insn, int insn_len)
 {
 	int r, emulation_type = EMULTYPE_PF;
-	bool direct = vcpu->arch.mmu->root_role.direct;
 
 	if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
 		return RET_PF_RETRY;
@@ -6445,7 +6440,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 		if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
 			return -EFAULT;
 
-		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
+		r = handle_mmio_page_fault(vcpu, cr2_or_gpa);
 		if (r == RET_PF_EMULATE)
 			goto emulate;
 	}
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 38a905fa86de..64da5ba8539b 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -366,6 +366,19 @@ static inline bool is_noncanonical_invlpg_address(u64 la, struct kvm_vcpu *vcpu)
 	return is_noncanonical_address(la, vcpu, X86EMUL_F_INVLPG);
 }
 
+static inline bool vcpu_can_cache_mmio_gva(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Disable GVA-based caching if TDP is enabled, as GVA is actually a
+	 * GPA or nGPA (if L2 is active), and KVM doesn't track CR3 changes,
+	 * i.e. can't know when to flush the cache.  Similarly, don't track
+	 * GVAs for direct MMUs (CR0.PG=0), as CR2 == GPA, i.e. KVM can simply
+	 * use the GFN cache, so that KVM doesn't have to flush the cache even
+	 * when all other shadow paging update/sync operations can be skipped.
+	 */
+	return !tdp_enabled && is_paging(vcpu);
+}
+
 static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
 					gva_t gva, gfn_t gfn, unsigned access)
 {
@@ -374,11 +387,7 @@ static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
 	if (unlikely(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS))
 		return;
 
-	/*
-	 * If this is a shadow nested page table, the "GVA" is
-	 * actually a nGPA.
-	 */
-	vcpu->arch.mmio_gva = mmu_is_nested(vcpu) ? 0 : gva & PAGE_MASK;
+	vcpu->arch.mmio_gva = vcpu_can_cache_mmio_gva(vcpu) ? gva & PAGE_MASK : 0;
 	vcpu->arch.mmio_access = access;
 	vcpu->arch.mmio_gfn = gfn;
 	vcpu->arch.mmio_gen = gen;
@@ -405,8 +414,9 @@ static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
 
 static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long gva)
 {
-	if (vcpu_match_mmio_gen(vcpu) && vcpu->arch.mmio_gva &&
-	      vcpu->arch.mmio_gva == (gva & PAGE_MASK))
+	if (vcpu_can_cache_mmio_gva(vcpu) &&
+	    vcpu_match_mmio_gen(vcpu) &&
+	    vcpu->arch.mmio_gva && vcpu->arch.mmio_gva == (gva & PAGE_MASK))
 		return true;
 
 	return false;

base-commit: 03e00f8b41e58b5dd5ceef21940a62e5d8e08766
--

  reply	other threads:[~2026-06-09 20:24 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-04 16:07 [PATCH 0/3] KVM: harden and cleanup PDPTR load on forced L1 reload Paolo Bonzini
2026-06-04 16:07 ` [PATCH 1/3] KVM: nVMX: unwind PDPTR load if processor triggers a nested VMFail Paolo Bonzini
2026-06-09  3:31   ` Sean Christopherson
2026-06-04 16:07 ` [PATCH 2/3] KVM: MMU: unconditionally clear MMIO cache on root rebuild Paolo Bonzini
2026-06-09 20:24   ` Sean Christopherson [this message]
2026-06-04 16:07 ` [PATCH 3/3] KVM: nVMX: remove unnecessary unload on processor-detected VMFail Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aih2lgkT9Tmt09uv@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox