From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 2/3] KVM: MMU: unconditionally clear MMIO cache on root rebuild
Date: Tue, 9 Jun 2026 13:24:54 -0700 [thread overview]
Message-ID: <aih2lgkT9Tmt09uv@google.com> (raw)
In-Reply-To: <20260604160733.12555-3-pbonzini@redhat.com>
On Thu, Jun 04, 2026, Paolo Bonzini wrote:
> Upon changing CR3, the MMIO cache becomes invalid because the
> GVA->GPA mapping has changed. However, kvm_load_new_pgd() calls
s/kvm_load_new_pgd/kvm_mmu_new_pgd
> vcpu_clear_mmio_info() call only if the fast switch succeeded.
> The early-return path instead leaves the root invalid; the next entry
> then calls kvm_mmu_reload() and from there kvm_mmu_load().
>
> kvm_mmu_load() calls kvm_mmu_sync_roots(), which clears the MMIO
> cache, but one combination that falls through is root_role.direct==1,
> i.e. CR0.PG=0, for which kvm_mmu_sync_roots() bails before reaching the
> call to vcpu_clear_mmio_info().
>
> That combination is barely reachable: a valid direct root is pretty much
> always a fast-switch success because it does not check the PGD for a
> match. The early return for a direct root thus requires the current root
> to already be invalid, and kvm_mmu_unload() itself clears the MMIO cache.
>
> That said, doing an independent clear in the style of kvm_mmu_new_pgd()
> is more obviously correct and basically free, so harden it.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index f8aa7eda661e..6689c9f8ae16 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6138,6 +6138,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
> if (r)
> goto out;
>
> + vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
> kvm_mmu_sync_roots(vcpu);
I don't dislike this change, but I don't really like it either. MMIO caching as
a whole is a mess of spaghetti. E.g. handle_mmio_page_fault() will never use or
cache a gva for the CR0.PG=0 case, and so the only way for this to be reachable
in any capacity is if KVM emulates an instruction, in response to something *other*
than a #PF (otherwise KVM will use the GPA), immediately after inserting into the
cache via kvm_handle_noslot_fault().
IMO, flushing cache here, when it's _just_ barely necessary is rather confusing.
E.g. kvm_mmu_sync_roots() obviously handles the shadow paging case, and TDP can't
ever use the GVA-based cache because KVM doesn't track CR3 changes (and it's
*really* hard to see that TDP is safe/fine). We could "fix" that with a comment,
but rather than go through all that effort to support a path that in all likelihood
never gets a cache hit, or at least never gets enough hits to provide meaningful
value, we actually make direct MMUs completely mutually exclusive with GVA-based
caching?
It's actually a fairly nice cleanup, and would entirely obviate the need to worry
about flushing the cache for CR0.PG=0.
E.g. over a few patches (completely untested):
---
arch/x86/kvm/mmu/mmu.c | 29 ++++++++++++-----------------
arch/x86/kvm/x86.h | 24 +++++++++++++++++-------
2 files changed, 29 insertions(+), 24 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 91843e9224d0..42ac9986d73b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3533,14 +3533,12 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault,
unsigned int access)
{
- gva_t gva = fault->is_tdp ? 0 : fault->addr;
-
if (fault->is_private) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return -EFAULT;
}
- vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
+ vcpu_cache_mmio_info(vcpu, fault->addr, fault->gfn,
access & shadow_mmio_access_mask);
fault->slot = NULL;
@@ -4364,19 +4362,20 @@ static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
return kvm_translate_gpa(vcpu, mmu, vaddr, access, exception);
}
-static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direct)
+static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 cr2_or_gpa)
{
/*
- * A nested guest cannot use the MMIO cache if it is using nested
- * page tables, because cr2 is a nGPA while the cache stores GPAs.
+ * A nested guest cannot use the software MMIO cache if it is using
+ * nested TDP, because the address at the time of fault is a nGPA,
+ * while the cache stores GPAs.
*/
if (mmu_is_nested(vcpu))
return false;
- if (direct)
- return vcpu_match_mmio_gpa(vcpu, addr);
+ if (vcpu_can_cache_mmio_gva(vcpu))
+ return vcpu_match_mmio_gva(vcpu, cr2_or_gpa);
- return vcpu_match_mmio_gva(vcpu, addr);
+ return vcpu_match_mmio_gpa(vcpu, cr2_or_gpa);
}
/*
@@ -4462,12 +4461,12 @@ static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
return reserved;
}
-static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
+static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr)
{
u64 spte;
bool reserved;
- if (mmio_info_in_cache(vcpu, addr, direct))
+ if (mmio_info_in_cache(vcpu, addr))
return RET_PF_EMULATE;
reserved = get_mmio_spte(vcpu, addr, &spte);
@@ -4481,9 +4480,6 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
if (!check_mmio_spte(vcpu, spte))
return RET_PF_INVALID;
- if (direct)
- addr = 0;
-
trace_handle_mmio_page_fault(addr, gfn, access);
vcpu_cache_mmio_info(vcpu, addr, gfn, access);
return RET_PF_EMULATE;
@@ -6359,7 +6355,7 @@ static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
* and could put the vCPU into an infinite loop because the processor
* will keep faulting on the non-existent MMIO address.
*/
- if (WARN_ON_ONCE(mmio_info_in_cache(vcpu, cr2_or_gpa, direct)))
+ if (WARN_ON_ONCE(mmio_info_in_cache(vcpu, cr2_or_gpa)))
return RET_PF_EMULATE;
/*
@@ -6421,7 +6417,6 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
void *insn, int insn_len)
{
int r, emulation_type = EMULTYPE_PF;
- bool direct = vcpu->arch.mmu->root_role.direct;
if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
return RET_PF_RETRY;
@@ -6445,7 +6440,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
return -EFAULT;
- r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
+ r = handle_mmio_page_fault(vcpu, cr2_or_gpa);
if (r == RET_PF_EMULATE)
goto emulate;
}
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 38a905fa86de..64da5ba8539b 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -366,6 +366,19 @@ static inline bool is_noncanonical_invlpg_address(u64 la, struct kvm_vcpu *vcpu)
return is_noncanonical_address(la, vcpu, X86EMUL_F_INVLPG);
}
+static inline bool vcpu_can_cache_mmio_gva(struct kvm_vcpu *vcpu)
+{
+ /*
+ * Disable GVA-based caching if TDP is enabled, as GVA is actually a
+ * GPA or nGPA (if L2 is active), and KVM doesn't track CR3 changes,
+ * i.e. can't know when to flush the cache. Similarly, don't track
+ * GVAs for direct MMUs (CR0.PG=0), as CR2 == GPA, i.e. KVM can simply
+ * use the GFN cache, so that KVM doesn't have to flush the cache even
+ * when all other shadow paging update/sync operations can be skipped.
+ */
+ return !tdp_enabled && is_paging(vcpu);
+}
+
static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
gva_t gva, gfn_t gfn, unsigned access)
{
@@ -374,11 +387,7 @@ static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
if (unlikely(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS))
return;
- /*
- * If this is a shadow nested page table, the "GVA" is
- * actually a nGPA.
- */
- vcpu->arch.mmio_gva = mmu_is_nested(vcpu) ? 0 : gva & PAGE_MASK;
+ vcpu->arch.mmio_gva = vcpu_can_cache_mmio_gva(vcpu) ? gva & PAGE_MASK : 0;
vcpu->arch.mmio_access = access;
vcpu->arch.mmio_gfn = gfn;
vcpu->arch.mmio_gen = gen;
@@ -405,8 +414,9 @@ static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long gva)
{
- if (vcpu_match_mmio_gen(vcpu) && vcpu->arch.mmio_gva &&
- vcpu->arch.mmio_gva == (gva & PAGE_MASK))
+ if (vcpu_can_cache_mmio_gva(vcpu) &&
+ vcpu_match_mmio_gen(vcpu) &&
+ vcpu->arch.mmio_gva && vcpu->arch.mmio_gva == (gva & PAGE_MASK))
return true;
return false;
base-commit: 03e00f8b41e58b5dd5ceef21940a62e5d8e08766
--
next prev parent reply other threads:[~2026-06-09 20:24 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-04 16:07 [PATCH 0/3] KVM: harden and cleanup PDPTR load on forced L1 reload Paolo Bonzini
2026-06-04 16:07 ` [PATCH 1/3] KVM: nVMX: unwind PDPTR load if processor triggers a nested VMFail Paolo Bonzini
2026-06-09 3:31 ` Sean Christopherson
2026-06-04 16:07 ` [PATCH 2/3] KVM: MMU: unconditionally clear MMIO cache on root rebuild Paolo Bonzini
2026-06-09 20:24 ` Sean Christopherson [this message]
2026-06-04 16:07 ` [PATCH 3/3] KVM: nVMX: remove unnecessary unload on processor-detected VMFail Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aih2lgkT9Tmt09uv@google.com \
--to=seanjc@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox