All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues
@ 2026-06-26 11:26 Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 01/17] KVM: x86/mmu: Capture 'mmu' in a local variable when allocating roots Paolo Bonzini
                   ` (16 more replies)
  0 siblings, 17 replies; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable

Sasha, Greg,

this is the backport to 5.10 for the above CVE.  Similar to 5.15, the
fix was relatively simple upstream but only due to years of refactoring
and cleaning up of the code; fixing from scratch is not really feasible
so start by applying the patches that are needed.

Paolo

David Matlack (2):
  KVM: x86/mmu: Use a bool for direct
  KVM: x86/mmu: Stop passing "direct" to mmu_alloc_root()

Lai Jiangshan (2):
  KVM: X86: Fix missed remote tlb flush in rmap_write_protect()
  KVM: X86: Synchronize the shadow pagetable before link it

Paolo Bonzini (5):
  KVM: x86/mmu: Derive shadow MMU page role from parent
  KVM: x86/mmu: Always pass 0 for @quadrant when gptes are 8 bytes
  KVM: x86/mmu: pull call to drop_large_spte() into __link_shadow_page()
  KVM: x86: Fix shadow paging use-after-free due to unexpected role

Sean Christopherson (9):
  KVM: x86/mmu: Capture 'mmu' in a local variable when allocating roots
  KVM: x86/mmu: Allocate the lm_root before allocating PAE roots
  KVM: x86/mmu: Allocate pae_root and lm_root pages in dedicated helper
  KVM: x86/mmu: Ensure MMU pages are available when allocating roots
  KVM: x86/mmu: Refactor shadow walk in __direct_map() to reduce
    indentation
  KVM: x86/mmu: Check PDPTRs before allocating PAE roots
  KVM: x86: Fix shadow paging use-after-free due to unexpected GFN
  KVM: x86/mmu: Pass the memslot to the rmap callbacks
  KVM: x86/mmu: Ensure hugepage is in by slot before checking max
    mapping level

 arch/x86/kvm/mmu/mmu.c         | 431 ++++++++++++++++++++-------------
 arch/x86/kvm/mmu/paging_tmpl.h |  72 +++---
 arch/x86/kvm/mmu/spte.h        |   5 +
 arch/x86/kvm/mmu/tdp_mmu.c     |  23 +-
 arch/x86/kvm/vmx/vmx_ops.h     |   3 +-
 include/linux/kvm_host.h       |   5 +
 6 files changed, 308 insertions(+), 231 deletions(-)

-- 
2.54.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 01/17] KVM: x86/mmu: Capture 'mmu' in a local variable when allocating roots
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 02/17] KVM: x86/mmu: Allocate the lm_root before allocating PAE roots Paolo Bonzini
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable; +Cc: Sean Christopherson

From: Sean Christopherson <seanjc@google.com>

commit b37233c911cbecd22a8a2a80137efe706c727d76 upstream.

Grab 'mmu' and do s/vcpu->arch.mmu/mmu to shorten line lengths and yield
smaller diffs when moving code around in future cleanup without forcing
the new code to use the same ugly pattern.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 58 ++++++++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 13bf3198d0ce..c2c76419af0c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3234,7 +3234,8 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
 
 static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 {
-	u8 shadow_root_level = vcpu->arch.mmu->shadow_root_level;
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	u8 shadow_root_level = mmu->shadow_root_level;
 	hpa_t root;
 	unsigned i;
 
@@ -3243,42 +3244,43 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 		if (!VALID_PAGE(root))
 			return -ENOSPC;
-		vcpu->arch.mmu->root_hpa = root;
+		mmu->root_hpa = root;
 	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
 		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level,
 				      true);
 
 		if (!VALID_PAGE(root))
 			return -ENOSPC;
-		vcpu->arch.mmu->root_hpa = root;
+		mmu->root_hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
 		for (i = 0; i < 4; ++i) {
-			MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
+			MMU_WARN_ON(VALID_PAGE(mmu->pae_root[i]));
 
 			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
 					      i << 30, PT32_ROOT_LEVEL, true);
 			if (!VALID_PAGE(root))
 				return -ENOSPC;
-			vcpu->arch.mmu->pae_root[i] = root | PT_PRESENT_MASK;
+			mmu->pae_root[i] = root | PT_PRESENT_MASK;
 		}
-		vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
+		mmu->root_hpa = __pa(mmu->pae_root);
 	} else
 		BUG();
 
 	/* root_pgd is ignored for direct MMUs. */
-	vcpu->arch.mmu->root_pgd = 0;
+	mmu->root_pgd = 0;
 
 	return 0;
 }
 
 static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 {
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	u64 pdptr, pm_mask;
 	gfn_t root_gfn, root_pgd;
 	hpa_t root;
 	int i;
 
-	root_pgd = vcpu->arch.mmu->get_guest_pgd(vcpu);
+	root_pgd = mmu->get_guest_pgd(vcpu);
 	root_gfn = root_pgd >> PAGE_SHIFT;
 
 	if (mmu_check_root(vcpu, root_gfn))
@@ -3288,14 +3290,14 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * Do we shadow a long mode page table? If so we need to
 	 * write-protect the guests page table root.
 	 */
-	if (vcpu->arch.mmu->root_level >= PT64_ROOT_4LEVEL) {
-		MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->root_hpa));
+	if (mmu->root_level >= PT64_ROOT_4LEVEL) {
+		MMU_WARN_ON(VALID_PAGE(mmu->root_hpa));
 
 		root = mmu_alloc_root(vcpu, root_gfn, 0,
-				      vcpu->arch.mmu->shadow_root_level, false);
+				      mmu->shadow_root_level, false);
 		if (!VALID_PAGE(root))
 			return -ENOSPC;
-		vcpu->arch.mmu->root_hpa = root;
+		mmu->root_hpa = root;
 		goto set_root_pgd;
 	}
 
@@ -3305,7 +3307,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * the shadow page table may be a PAE or a long mode page table.
 	 */
 	pm_mask = PT_PRESENT_MASK;
-	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
+	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
 		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
 
 		/*
@@ -3313,21 +3315,21 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		 * with 64-bit only when needed.  Unlike 32-bit NPT, it doesn't
 		 * need to be in low mem.  See also lm_root below.
 		 */
-		if (!vcpu->arch.mmu->pae_root) {
+		if (!mmu->pae_root) {
 			WARN_ON_ONCE(!tdp_enabled);
 
-			vcpu->arch.mmu->pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-			if (!vcpu->arch.mmu->pae_root)
+			mmu->pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+			if (!mmu->pae_root)
 				return -ENOMEM;
 		}
 	}
 
 	for (i = 0; i < 4; ++i) {
-		MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->pae_root[i]));
-		if (vcpu->arch.mmu->root_level == PT32E_ROOT_LEVEL) {
-			pdptr = vcpu->arch.mmu->get_pdptr(vcpu, i);
+		MMU_WARN_ON(VALID_PAGE(mmu->pae_root[i]));
+		if (mmu->root_level == PT32E_ROOT_LEVEL) {
+			pdptr = mmu->get_pdptr(vcpu, i);
 			if (!(pdptr & PT_PRESENT_MASK)) {
-				vcpu->arch.mmu->pae_root[i] = 0;
+				mmu->pae_root[i] = 0;
 				continue;
 			}
 			root_gfn = pdptr >> PAGE_SHIFT;
@@ -3339,9 +3341,9 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 				      PT32_ROOT_LEVEL, false);
 		if (!VALID_PAGE(root))
 			return -ENOSPC;
-		vcpu->arch.mmu->pae_root[i] = root | pm_mask;
+		mmu->pae_root[i] = root | pm_mask;
 	}
-	vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root);
+	mmu->root_hpa = __pa(mmu->pae_root);
 
 	/*
 	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
@@ -3350,24 +3352,24 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * on demand, as running a 32-bit L1 VMM is very rare.  The PDP is
 	 * handled above (to share logic with PAE), deal with the PML4 here.
 	 */
-	if (vcpu->arch.mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
-		if (vcpu->arch.mmu->lm_root == NULL) {
+	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
+		if (mmu->lm_root == NULL) {
 			u64 *lm_root;
 
 			lm_root = (void*)get_zeroed_page(GFP_KERNEL_ACCOUNT);
 			if (!lm_root)
 				return -ENOMEM;
 
-			lm_root[0] = __pa(vcpu->arch.mmu->pae_root) | pm_mask;
+			lm_root[0] = __pa(mmu->pae_root) | pm_mask;
 
-			vcpu->arch.mmu->lm_root = lm_root;
+			mmu->lm_root = lm_root;
 		}
 
-		vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->lm_root);
+		mmu->root_hpa = __pa(mmu->lm_root);
 	}
 
 set_root_pgd:
-	vcpu->arch.mmu->root_pgd = root_pgd;
+	mmu->root_pgd = root_pgd;
 
 	return 0;
 }
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 02/17] KVM: x86/mmu: Allocate the lm_root before allocating PAE roots
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 01/17] KVM: x86/mmu: Capture 'mmu' in a local variable when allocating roots Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 03/17] KVM: x86/mmu: Allocate pae_root and lm_root pages in dedicated helper Paolo Bonzini
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable; +Cc: Sean Christopherson

From: Sean Christopherson <seanjc@google.com>

commit ba0a194ffbfb4168a277fb2116e8362013e2078f upstream.

Allocate lm_root before the PAE roots so that the PAE roots aren't
leaked if the memory allocation for the lm_root happens to fail.

Note, KVM can still leak PAE roots if mmu_check_root() fails on a guest's
PDPTR, or if mmu_alloc_root() fails due to MMU pages not being available.
Those issues will be fixed in future commits.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 64 ++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c2c76419af0c..508acf26e30c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3307,21 +3307,38 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * the shadow page table may be a PAE or a long mode page table.
 	 */
 	pm_mask = PT_PRESENT_MASK;
-	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
+	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL)
 		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
 
-		/*
-		 * Allocate the page for the PDPTEs when shadowing 32-bit NPT
-		 * with 64-bit only when needed.  Unlike 32-bit NPT, it doesn't
-		 * need to be in low mem.  See also lm_root below.
-		 */
-		if (!mmu->pae_root) {
-			WARN_ON_ONCE(!tdp_enabled);
+	/*
+	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
+	 * tables are allocated and initialized at root creation as there is no
+	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
+	 * on demand, as running a 32-bit L1 VMM is very rare.  Unlike 32-bit
+	 * NPT, the PDP table doesn't need to be in low mem.  Preallocate the
+	 * pages so that the PAE roots aren't leaked on failure.
+	 */
+	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL &&
+	    (!mmu->pae_root || !mmu->lm_root)) {
+		u64 *lm_root, *pae_root;
 
-			mmu->pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-			if (!mmu->pae_root)
-				return -ENOMEM;
+		if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->lm_root))
+			return -EIO;
+
+		pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+		if (!pae_root)
+			return -ENOMEM;
+
+		lm_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+		if (!lm_root) {
+			free_page((unsigned long)pae_root);
+			return -ENOMEM;
 		}
+
+		mmu->pae_root = pae_root;
+		mmu->lm_root = lm_root;
+
+		lm_root[0] = __pa(mmu->pae_root) | pm_mask;
 	}
 
 	for (i = 0; i < 4; ++i) {
@@ -3343,30 +3360,11 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 			return -ENOSPC;
 		mmu->pae_root[i] = root | pm_mask;
 	}
-	mmu->root_hpa = __pa(mmu->pae_root);
-
-	/*
-	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
-	 * tables are allocated and initialized at MMU creation as there is no
-	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
-	 * on demand, as running a 32-bit L1 VMM is very rare.  The PDP is
-	 * handled above (to share logic with PAE), deal with the PML4 here.
-	 */
-	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
-		if (mmu->lm_root == NULL) {
-			u64 *lm_root;
-
-			lm_root = (void*)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-			if (!lm_root)
-				return -ENOMEM;
-
-			lm_root[0] = __pa(mmu->pae_root) | pm_mask;
-
-			mmu->lm_root = lm_root;
-		}
 
+	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL)
 		mmu->root_hpa = __pa(mmu->lm_root);
-	}
+	else
+		mmu->root_hpa = __pa(mmu->pae_root);
 
 set_root_pgd:
 	mmu->root_pgd = root_pgd;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 03/17] KVM: x86/mmu: Allocate pae_root and lm_root pages in dedicated helper
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 01/17] KVM: x86/mmu: Capture 'mmu' in a local variable when allocating roots Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 02/17] KVM: x86/mmu: Allocate the lm_root before allocating PAE roots Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 04/17] KVM: x86/mmu: Ensure MMU pages are available when allocating roots Paolo Bonzini
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable; +Cc: Sean Christopherson

From: Sean Christopherson <seanjc@google.com>

commit 748e52b9b7368017d3fccb486914804ed4577b42 upstream.

Move the on-demand allocation of the pae_root and lm_root pages, used by
nested NPT for 32-bit L1s, into a separate helper.  This will allow a
future patch to hold mmu_lock while allocating the non-special roots so
that make_mmu_pages_available() can be checked once at the start of root
allocation, and thus avoid having to deal with failure in the middle of
root allocation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 84 +++++++++++++++++++++++++++---------------
 1 file changed, 54 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 508acf26e30c..a506c0818e77 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3307,38 +3307,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * the shadow page table may be a PAE or a long mode page table.
 	 */
 	pm_mask = PT_PRESENT_MASK;
-	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL)
+	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
 		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
 
-	/*
-	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
-	 * tables are allocated and initialized at root creation as there is no
-	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
-	 * on demand, as running a 32-bit L1 VMM is very rare.  Unlike 32-bit
-	 * NPT, the PDP table doesn't need to be in low mem.  Preallocate the
-	 * pages so that the PAE roots aren't leaked on failure.
-	 */
-	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL &&
-	    (!mmu->pae_root || !mmu->lm_root)) {
-		u64 *lm_root, *pae_root;
-
-		if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->lm_root))
-			return -EIO;
-
-		pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-		if (!pae_root)
-			return -ENOMEM;
-
-		lm_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-		if (!lm_root) {
-			free_page((unsigned long)pae_root);
-			return -ENOMEM;
-		}
-
-		mmu->pae_root = pae_root;
-		mmu->lm_root = lm_root;
-
-		lm_root[0] = __pa(mmu->pae_root) | pm_mask;
+		mmu->lm_root[0] = __pa(mmu->pae_root) | pm_mask;
 	}
 
 	for (i = 0; i < 4; ++i) {
@@ -3372,6 +3344,55 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
+{
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	u64 *lm_root, *pae_root;
+
+	/*
+	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
+	 * tables are allocated and initialized at root creation as there is no
+	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
+	 * on demand, as running a 32-bit L1 VMM on 64-bit KVM is very rare.
+	 */
+	if (mmu->direct_map || mmu->root_level >= PT64_ROOT_4LEVEL ||
+	    mmu->shadow_root_level < PT64_ROOT_4LEVEL)
+		return 0;
+
+	/*
+	 * This mess only works with 4-level paging and needs to be updated to
+	 * work with 5-level paging.
+	 */
+	if (WARN_ON_ONCE(mmu->shadow_root_level != PT64_ROOT_4LEVEL))
+		return -EIO;
+
+	if (mmu->pae_root && mmu->lm_root)
+		return 0;
+
+	/*
+	 * The special roots should always be allocated in concert.  Yell and
+	 * bail if KVM ends up in a state where only one of the roots is valid.
+	 */
+	if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->lm_root))
+		return -EIO;
+
+	/* Unlike 32-bit NPT, the PDP table doesn't need to be in low mem. */
+	pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+	if (!pae_root)
+		return -ENOMEM;
+
+	lm_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+	if (!lm_root) {
+		free_page((unsigned long)pae_root);
+		return -ENOMEM;
+	}
+
+	mmu->pae_root = pae_root;
+	mmu->lm_root = lm_root;
+
+	return 0;
+}
+
 static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 {
 	if (vcpu->arch.mmu->direct_map)
@@ -4846,6 +4867,9 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	int r;
 
 	r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
+	if (r)
+		goto out;
+	r = mmu_alloc_special_roots(vcpu);
 	if (r)
 		goto out;
 	r = mmu_alloc_roots(vcpu);
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 04/17] KVM: x86/mmu: Ensure MMU pages are available when allocating roots
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
                   ` (2 preceding siblings ...)
  2026-06-26 11:26 ` [PATCH 5.10.y 03/17] KVM: x86/mmu: Allocate pae_root and lm_root pages in dedicated helper Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 12:26   ` sashiko-bot
  2026-06-26 17:54   ` Sasha Levin
  2026-06-26 11:26 ` [PATCH 5.10.y 05/17] KVM: x86/mmu: Use a bool for direct Paolo Bonzini
                   ` (12 subsequent siblings)
  16 siblings, 2 replies; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable; +Cc: Sean Christopherson, Ben Gardon

From: Sean Christopherson <seanjc@google.com>

commit 6e6ec58485746eb64487bd49bf5cd90ded3d2cf6 upstream.

Hold the mmu_lock for write for the entire duration of allocating and
initializing an MMU's roots.  This ensures there are MMU pages available
and thus prevents root allocations from failing.  That in turn fixes a
bug where KVM would fail to free valid PAE roots if a one of the later
roots failed to allocate.

Add a comment to make_mmu_pages_available() to call out that the limit
is a soft limit, e.g. KVM will temporarily exceed the threshold if a
page fault allocates multiple shadow pages and there was only one page
"available".

Note, KVM _still_ leaks the PAE roots if the guest PDPTR checks fail.
This will be addressed in a future commit.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c     | 50 +++++++++++++++-----------------------
 arch/x86/kvm/mmu/tdp_mmu.c | 23 ++++--------------
 2 files changed, 25 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a506c0818e77..9b1f63b5e86e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2440,6 +2440,15 @@ static int make_mmu_pages_available(struct kvm_vcpu *vcpu)
 
 	kvm_mmu_zap_oldest_mmu_pages(vcpu->kvm, KVM_REFILL_PAGES - avail);
 
+	/*
+	 * Note, this check is intentionally soft, it only guarantees that one
+	 * page is available, while the caller may end up allocating as many as
+	 * four pages, e.g. for PAE roots or for 5-level paging.  Temporarily
+	 * exceeding the (arbitrary by default) limit will not harm the host,
+	 * being too agressive may unnecessarily kill the guest, and getting an
+	 * exact count is far more trouble than it's worth, especially in the
+	 * page fault paths.
+	 */
 	if (!kvm_mmu_available_pages(vcpu->kvm))
 		return -ENOSPC;
 	return 0;
@@ -3219,16 +3228,9 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
 {
 	struct kvm_mmu_page *sp;
 
-	spin_lock(&vcpu->kvm->mmu_lock);
-
-	if (make_mmu_pages_available(vcpu)) {
-		spin_unlock(&vcpu->kvm->mmu_lock);
-		return INVALID_PAGE;
-	}
 	sp = kvm_mmu_get_page(vcpu, gfn, gva, level, direct, ACC_ALL);
 	++sp->root_count;
 
-	spin_unlock(&vcpu->kvm->mmu_lock);
 	return __pa(sp->spt);
 }
 
@@ -3241,16 +3243,9 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 	if (vcpu->kvm->arch.tdp_mmu_enabled) {
 		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
-
-		if (!VALID_PAGE(root))
-			return -ENOSPC;
 		mmu->root_hpa = root;
 	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
-		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level,
-				      true);
-
-		if (!VALID_PAGE(root))
-			return -ENOSPC;
+		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, true);
 		mmu->root_hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
 		for (i = 0; i < 4; ++i) {
@@ -3258,8 +3253,6 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
 					      i << 30, PT32_ROOT_LEVEL, true);
-			if (!VALID_PAGE(root))
-				return -ENOSPC;
 			mmu->pae_root[i] = root | PT_PRESENT_MASK;
 		}
 		mmu->root_hpa = __pa(mmu->pae_root);
@@ -3295,8 +3288,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 
 		root = mmu_alloc_root(vcpu, root_gfn, 0,
 				      mmu->shadow_root_level, false);
-		if (!VALID_PAGE(root))
-			return -ENOSPC;
 		mmu->root_hpa = root;
 		goto set_root_pgd;
 	}
@@ -3315,6 +3306,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 
 	for (i = 0; i < 4; ++i) {
 		MMU_WARN_ON(VALID_PAGE(mmu->pae_root[i]));
+
 		if (mmu->root_level == PT32E_ROOT_LEVEL) {
 			pdptr = mmu->get_pdptr(vcpu, i);
 			if (!(pdptr & PT_PRESENT_MASK)) {
@@ -3328,8 +3320,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 
 		root = mmu_alloc_root(vcpu, root_gfn, i << 30,
 				      PT32_ROOT_LEVEL, false);
-		if (!VALID_PAGE(root))
-			return -ENOSPC;
 		mmu->pae_root[i] = root | pm_mask;
 	}
 
@@ -3393,14 +3383,6 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
-{
-	if (vcpu->arch.mmu->direct_map)
-		return mmu_alloc_direct_roots(vcpu);
-	else
-		return mmu_alloc_shadow_roots(vcpu);
-}
-
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 {
 	int i;
@@ -4872,7 +4854,15 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	r = mmu_alloc_special_roots(vcpu);
 	if (r)
 		goto out;
-	r = mmu_alloc_roots(vcpu);
+	spin_lock(&vcpu->kvm->mmu_lock);
+	if (make_mmu_pages_available(vcpu))
+		r = -ENOSPC;
+	else if (vcpu->arch.mmu->direct_map)
+		r = mmu_alloc_direct_roots(vcpu);
+	else
+		r = mmu_alloc_shadow_roots(vcpu);
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
 	kvm_mmu_sync_roots(vcpu);
 	if (r)
 		goto out;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 073514bbb5f7..5cba2e85ce04 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -152,22 +152,21 @@ static struct kvm_mmu_page *alloc_tdp_mmu_page(struct kvm_vcpu *vcpu, gfn_t gfn,
 	return sp;
 }
 
-static struct kvm_mmu_page *get_tdp_mmu_vcpu_root(struct kvm_vcpu *vcpu)
+hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 {
 	union kvm_mmu_page_role role;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_page *root;
 
-	role = page_role_for_level(vcpu, vcpu->arch.mmu->shadow_root_level);
+	lockdep_assert_held_write(&kvm->mmu_lock);
 
-	spin_lock(&kvm->mmu_lock);
+	role = page_role_for_level(vcpu, vcpu->arch.mmu->shadow_root_level);
 
 	/* Check for an existing root before allocating a new one. */
 	for_each_tdp_mmu_root(kvm, root) {
 		if (root->role.word == role.word) {
 			kvm_mmu_get_root(kvm, root);
-			spin_unlock(&kvm->mmu_lock);
-			return root;
+			goto out;
 		}
 	}
 
@@ -176,19 +175,7 @@ static struct kvm_mmu_page *get_tdp_mmu_vcpu_root(struct kvm_vcpu *vcpu)
 
 	list_add(&root->link, &kvm->arch.tdp_mmu_roots);
 
-	spin_unlock(&kvm->mmu_lock);
-
-	return root;
-}
-
-hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
-{
-	struct kvm_mmu_page *root;
-
-	root = get_tdp_mmu_vcpu_root(vcpu);
-	if (!root)
-		return INVALID_PAGE;
-
+out:
 	return __pa(root->spt);
 }
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 05/17] KVM: x86/mmu: Use a bool for direct
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
                   ` (3 preceding siblings ...)
  2026-06-26 11:26 ` [PATCH 5.10.y 04/17] KVM: x86/mmu: Ensure MMU pages are available when allocating roots Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 06/17] KVM: x86/mmu: Stop passing "direct" to mmu_alloc_root() Paolo Bonzini
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable
  Cc: David Matlack, Lai Jiangshan, Sean Christopherson

From: David Matlack <dmatlack@google.com>

commit 27a59d57f073f21f029df1517c2c0a1abea5b0ce upstream.

The parameter "direct" can either be true or false, and all of the
callers pass in a bool variable or true/false literal, so just use the
type bool.

No functional change intended.

Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220516232138.1783324-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9b1f63b5e86e..97705c28a97e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1659,7 +1659,7 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
 	mmu_spte_clear_no_track(parent_pte);
 }
 
-static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
+static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, bool direct)
 {
 	struct kvm_mmu_page *sp;
 
@@ -2021,7 +2021,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 					     gfn_t gfn,
 					     gva_t gaddr,
 					     unsigned level,
-					     int direct,
+					     bool direct,
 					     unsigned int access)
 {
 	bool direct_mmu = vcpu->arch.mmu->direct_map;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 06/17] KVM: x86/mmu: Stop passing "direct" to mmu_alloc_root()
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
                   ` (4 preceding siblings ...)
  2026-06-26 11:26 ` [PATCH 5.10.y 05/17] KVM: x86/mmu: Use a bool for direct Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 07/17] KVM: x86/mmu: Refactor shadow walk in __direct_map() to reduce indentation Paolo Bonzini
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable; +Cc: David Matlack, Lai Jiangshan

From: David Matlack <dmatlack@google.com>

commit 86938ab6925b8fe174ca6abf397e6ea9d3c054a4 upstream.

The "direct" argument is vcpu->arch.mmu->root_role.direct,
because unlike non-root page tables, it's impossible to have
a direct root in an indirect MMU.  So just use that.

Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220516232138.1783324-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 97705c28a97e..b168b804ac7f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3224,8 +3224,9 @@ static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
 }
 
 static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
-			    u8 level, bool direct)
+			    u8 level)
 {
+	bool direct = vcpu->arch.mmu->mmu_role.base.direct;
 	struct kvm_mmu_page *sp;
 
 	sp = kvm_mmu_get_page(vcpu, gfn, gva, level, direct, ACC_ALL);
@@ -3245,14 +3246,14 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
 		mmu->root_hpa = root;
 	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
-		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, true);
+		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level);
 		mmu->root_hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
 		for (i = 0; i < 4; ++i) {
 			MMU_WARN_ON(VALID_PAGE(mmu->pae_root[i]));
 
 			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
-					      i << 30, PT32_ROOT_LEVEL, true);
+					      i << 30, PT32_ROOT_LEVEL);
 			mmu->pae_root[i] = root | PT_PRESENT_MASK;
 		}
 		mmu->root_hpa = __pa(mmu->pae_root);
@@ -3287,7 +3288,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		MMU_WARN_ON(VALID_PAGE(mmu->root_hpa));
 
 		root = mmu_alloc_root(vcpu, root_gfn, 0,
-				      mmu->shadow_root_level, false);
+				      mmu->shadow_root_level);
 		mmu->root_hpa = root;
 		goto set_root_pgd;
 	}
@@ -3319,7 +3320,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		}
 
 		root = mmu_alloc_root(vcpu, root_gfn, i << 30,
-				      PT32_ROOT_LEVEL, false);
+				      PT32_ROOT_LEVEL);
 		mmu->pae_root[i] = root | pm_mask;
 	}
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 07/17] KVM: x86/mmu: Refactor shadow walk in __direct_map() to reduce indentation
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
                   ` (5 preceding siblings ...)
  2026-06-26 11:26 ` [PATCH 5.10.y 06/17] KVM: x86/mmu: Stop passing "direct" to mmu_alloc_root() Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 08/17] KVM: X86: Fix missed remote tlb flush in rmap_write_protect() Paolo Bonzini
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable; +Cc: Sean Christopherson, Isaku Yamahata

From: Sean Christopherson <sean.j.christopherson@intel.com>

commit 03fffc5493c8c8d850586df5740c985026c313bf upstream.

Employ a 'continue' to reduce the indentation for linking a new shadow
page during __direct_map() in preparation for linking private pages.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <702419686d5700373123f6ea84e7a946c2cad8b4.1625186503.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b168b804ac7f..161e05783629 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2891,15 +2891,16 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 			break;
 
 		drop_large_spte(vcpu, it.sptep);
-		if (!is_shadow_present_pte(*it.sptep)) {
-			sp = kvm_mmu_get_page(vcpu, base_gfn, it.addr,
-					      it.level - 1, true, ACC_ALL);
+		if (is_shadow_present_pte(*it.sptep))
+			continue;
 
-			link_shadow_page(vcpu, it.sptep, sp);
-			if (is_tdp && huge_page_disallowed &&
-			    req_level >= it.level)
-				account_huge_nx_page(vcpu->kvm, sp);
-		}
+		sp = kvm_mmu_get_page(vcpu, base_gfn, it.addr,
+				      it.level - 1, true, ACC_ALL);
+
+		link_shadow_page(vcpu, it.sptep, sp);
+		if (is_tdp && huge_page_disallowed &&
+		    req_level >= it.level)
+			account_huge_nx_page(vcpu->kvm, sp);
 	}
 
 	ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 08/17] KVM: X86: Fix missed remote tlb flush in rmap_write_protect()
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
                   ` (6 preceding siblings ...)
  2026-06-26 11:26 ` [PATCH 5.10.y 07/17] KVM: x86/mmu: Refactor shadow walk in __direct_map() to reduce indentation Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 09/17] KVM: X86: Synchronize the shadow pagetable before link it Paolo Bonzini
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable; +Cc: Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

commit f81602958c115fc7c87b985f71574042a20ff858 upstream.

When kvm->tlbs_dirty > 0, some rmaps might have been deleted
without flushing tlb remotely after kvm_sync_page().  If @gfn
was writable before and it's rmaps was deleted in kvm_sync_page(),
and if the tlb entry is still in a remote running VCPU,  the @gfn
is not safely protected.

To fix the problem, kvm_sync_page() does the remote flush when
needed to avoid the problem.

Fixes: a4ee1ca4a36e ("KVM: MMU: delay flush all tlbs on sync_page path")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210918005636.3675-2-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/paging_tmpl.h | 23 ++---------------------
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index c6daeeff1d9c..1500fc877aec 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1007,14 +1007,6 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gpa_t vaddr,
  * Using the cached information from sp->gfns is safe because:
  * - The spte has a reference to the struct page, so the pfn for a given gfn
  *   can't change unless all sptes pointing to it are nuked first.
- *
- * Note:
- *   We should flush all tlbs if spte is dropped even though guest is
- *   responsible for it. Since if we don't, kvm_mmu_notifier_invalidate_page
- *   and kvm_mmu_notifier_invalidate_range_start detect the mapping page isn't
- *   used by guest then tlbs are not flushed, so guest is allowed to access the
- *   freed pages.
- *   And we increase kvm->tlbs_dirty to delay tlbs flush in this case.
  */
 static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
@@ -1044,13 +1036,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 			return 0;
 
 		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
-			/*
-			 * Update spte before increasing tlbs_dirty to make
-			 * sure no tlb flush is lost after spte is zapped; see
-			 * the comments in kvm_flush_remote_tlbs().
-			 */
-			smp_wmb();
-			vcpu->kvm->tlbs_dirty++;
+			set_spte_ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
 			continue;
 		}
 
@@ -1065,12 +1051,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 
 		if (gfn != sp->gfns[i]) {
 			drop_spte(vcpu->kvm, &sp->spt[i]);
-			/*
-			 * The same as above where we are doing
-			 * prefetch_invalid_gpte().
-			 */
-			smp_wmb();
-			vcpu->kvm->tlbs_dirty++;
+			set_spte_ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
 			continue;
 		}
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 09/17] KVM: X86: Synchronize the shadow pagetable before link it
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
                   ` (7 preceding siblings ...)
  2026-06-26 11:26 ` [PATCH 5.10.y 08/17] KVM: X86: Fix missed remote tlb flush in rmap_write_protect() Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 10/17] KVM: x86/mmu: Derive shadow MMU page role from parent Paolo Bonzini
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable; +Cc: Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

commit 65855ed8b03437e79e42f2a89a993206981ac6cb upstream.

If gpte is changed from non-present to present, the guest doesn't need
to flush tlb per SDM.  So the host must synchronze sp before
link it.  Otherwise the guest might use a wrong mapping.

For example: the guest first changes a level-1 pagetable, and then
links its parent to a new place where the original gpte is non-present.
Finally the guest can access the remapped area without flushing
the tlb.  The guest's behavior should be allowed per SDM, but the host
kvm mmu makes it wrong.

Fixes: 4731d4c7a077 ("KVM: MMU: out of sync shadow core")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210918005636.3675-3-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c         | 17 ++++++++++-------
 arch/x86/kvm/mmu/paging_tmpl.h | 23 +++++++++++++++++++++--
 2 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 161e05783629..6db07ebeb695 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1972,8 +1972,8 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
 	} while (!sp->unsync_children);
 }
 
-static void mmu_sync_children(struct kvm_vcpu *vcpu,
-			      struct kvm_mmu_page *parent)
+static int mmu_sync_children(struct kvm_vcpu *vcpu,
+			     struct kvm_mmu_page *parent, bool can_yield)
 {
 	int i;
 	struct kvm_mmu_page *sp;
@@ -1999,12 +1999,18 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
 		}
 		if (need_resched() || spin_needbreak(&vcpu->kvm->mmu_lock)) {
 			kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
+			if (!can_yield) {
+				kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+				return -EINTR;
+			}
+
 			cond_resched_lock(&vcpu->kvm->mmu_lock);
 			flush = false;
 		}
 	}
 
 	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
+	return 0;
 }
 
 static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
@@ -2073,9 +2079,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 		}
 
-		if (sp->unsync_children)
-			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-
 		__clear_sp_write_flooding_count(sp);
 
 trace_get_page:
@@ -3419,7 +3422,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 		spin_lock(&vcpu->kvm->mmu_lock);
 		kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
 
-		mmu_sync_children(vcpu, sp);
+		mmu_sync_children(vcpu, sp, true);
 
 		kvm_mmu_audit(vcpu, AUDIT_POST_SYNC);
 		spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3435,7 +3438,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 		if (root && VALID_PAGE(root)) {
 			root &= PT64_BASE_ADDR_MASK;
 			sp = to_shadow_page(root);
-			mmu_sync_children(vcpu, sp);
+			mmu_sync_children(vcpu, sp, true);
 		}
 	}
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 1500fc877aec..25d4484c78aa 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -667,8 +667,27 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 		if (!is_shadow_present_pte(*it.sptep)) {
 			table_gfn = gw->table_gfn[it.level - 2];
 			access = gw->pt_access[it.level - 2];
-			sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
-					      false, access);
+			sp = kvm_mmu_get_page(vcpu, table_gfn, addr,
+					      it.level-1, false, access);
+			/*
+			 * We must synchronize the pagetable before linking it
+			 * because the guest doesn't need to flush tlb when
+			 * the gpte is changed from non-present to present.
+			 * Otherwise, the guest may use the wrong mapping.
+			 *
+			 * For PG_LEVEL_4K, kvm_mmu_get_page() has already
+			 * synchronized it transiently via kvm_sync_page().
+			 *
+			 * For higher level pagetable, we synchronize it via
+			 * the slower mmu_sync_children().  If it needs to
+			 * break, some progress has been made; return
+			 * RET_PF_RETRY and retry on the next #PF.
+			 * KVM_REQ_MMU_SYNC is not necessary but it
+			 * expedites the process.
+			 */
+			if (sp->unsync_children &&
+			    mmu_sync_children(vcpu, sp, false))
+				return RET_PF_RETRY;
 		}
 
 		/*
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 10/17] KVM: x86/mmu: Derive shadow MMU page role from parent
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
                   ` (8 preceding siblings ...)
  2026-06-26 11:26 ` [PATCH 5.10.y 09/17] KVM: X86: Synchronize the shadow pagetable before link it Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 11/17] KVM: x86/mmu: Always pass 0 for @quadrant when gptes are 8 bytes Paolo Bonzini
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable; +Cc: Peter Xu, David Matlack

commit 2e65e842c57d72e9a573ba42bc2055b7f626ea1f upstream.

Instead of computing the shadow page role from scratch for every new
page, derive most of the information from the parent shadow page.  This
eliminates the dependency on the vCPU root role to allocate shadow page
tables, and reduces the number of parameters to kvm_mmu_get_page().

Preemptively split out the role calculation to a separate function for
use in a following commit.

Note that when calculating the MMU root role, we can take
@role.passthrough, @role.direct, and @role.access directly from
@vcpu->arch.mmu->root_role. Only @role.level and @role.quadrant still
must be overridden for PAE page directories, when shadowing 32-bit
guest page tables with PAE page tables.

No functional change intended.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220516232138.1783324-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c         | 99 ++++++++++++++++++++++------------
 arch/x86/kvm/mmu/paging_tmpl.h |  9 ++--
 2 files changed, 71 insertions(+), 37 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6db07ebeb695..e4759156a2dc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2023,35 +2023,17 @@ static void clear_sp_write_flooding_count(u64 *spte)
 	__clear_sp_write_flooding_count(sptep_to_sp(spte));
 }
 
-static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
-					     gfn_t gfn,
-					     gva_t gaddr,
-					     unsigned level,
-					     bool direct,
-					     unsigned int access)
+static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, gfn_t gfn,
+					     union kvm_mmu_page_role role)
 {
 	bool direct_mmu = vcpu->arch.mmu->direct_map;
-	union kvm_mmu_page_role role;
 	struct hlist_head *sp_list;
-	unsigned quadrant;
 	struct kvm_mmu_page *sp;
 	bool need_sync = false;
 	bool flush = false;
 	int collisions = 0;
 	LIST_HEAD(invalid_list);
 
-	role = vcpu->arch.mmu->mmu_role.base;
-	role.level = level;
-	role.direct = direct;
-	if (role.direct)
-		role.gpte_is_8_bytes = true;
-	role.access = access;
-	if (!direct_mmu && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) {
-		quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
-		quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
-		role.quadrant = quadrant;
-	}
-
 	sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
 	for_each_valid_sp(vcpu->kvm, sp, sp_list) {
 		if (sp->gfn != gfn) {
@@ -2088,22 +2070,22 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 
 	++vcpu->kvm->stat.mmu_cache_miss;
 
-	sp = kvm_mmu_alloc_page(vcpu, direct);
+	sp = kvm_mmu_alloc_page(vcpu, role.direct);
 
 	sp->gfn = gfn;
 	sp->role = role;
 	hlist_add_head(&sp->hash_link, sp_list);
-	if (!direct) {
+	if (!role.direct) {
 		/*
 		 * we should do write protection before syncing pages
 		 * otherwise the content of the synced shadow page may
 		 * be inconsistent with guest page table.
 		 */
 		account_shadowed(vcpu->kvm, sp);
-		if (level == PG_LEVEL_4K && rmap_write_protect(vcpu, gfn))
+		if (role.level == PG_LEVEL_4K && rmap_write_protect(vcpu, gfn))
 			kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn, 1);
 
-		if (level > PG_LEVEL_4K && need_sync)
+		if (role.level > PG_LEVEL_4K && need_sync)
 			flush |= kvm_sync_pages(vcpu, gfn, &invalid_list);
 	}
 	trace_kvm_mmu_get_page(sp, true);
@@ -2115,6 +2097,54 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	return sp;
 }
 
+static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct, unsigned int access)
+{
+	struct kvm_mmu_page *parent_sp = sptep_to_sp(sptep);
+	union kvm_mmu_page_role role;
+
+	role = parent_sp->role;
+	role.level--;
+	role.access = access;
+	role.direct = direct;
+
+	/*
+	 * If the guest has 4-byte PTEs then that means it's using 32-bit,
+	 * 2-level, non-PAE paging. KVM shadows such guests with PAE paging
+	 * (i.e. 8-byte PTEs). The difference in PTE size means that KVM must
+	 * shadow each guest page table with multiple shadow page tables, which
+	 * requires extra bookkeeping in the role.
+	 *
+	 * Specifically, to shadow the guest's page directory (which covers a
+	 * 4GiB address space), KVM uses 4 PAE page directories, each mapping
+	 * 1GiB of the address space. @role.quadrant encodes which quarter of
+	 * the address space each maps.
+	 *
+	 * To shadow the guest's page tables (which each map a 4MiB region), KVM
+	 * uses 2 PAE page tables, each mapping a 2MiB region. For these,
+	 * @role.quadrant encodes which half of the region they map.
+	 *
+	 * Note, the 4 PAE page directories are pre-allocated and the quadrant
+	 * assigned in mmu_alloc_root(). So only page tables need to be handled
+	 * here.
+	 */
+	if (!role.gpte_is_8_bytes) {
+		WARN_ON_ONCE(role.level != PG_LEVEL_4K);
+		role.quadrant = (sptep - parent_sp->spt) % 2;
+	}
+
+	return role;
+}
+
+static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu,
+						 u64 *sptep, gfn_t gfn,
+						 bool direct, unsigned int access)
+{
+	union kvm_mmu_page_role role;
+
+	role = kvm_mmu_child_role(sptep, direct, access);
+	return kvm_mmu_get_page(vcpu, gfn, role);
+}
+
 static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator,
 					struct kvm_vcpu *vcpu, hpa_t root,
 					u64 addr)
@@ -2897,8 +2927,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 		if (is_shadow_present_pte(*it.sptep))
 			continue;
 
-		sp = kvm_mmu_get_page(vcpu, base_gfn, it.addr,
-				      it.level - 1, true, ACC_ALL);
+		sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, ACC_ALL);
 
 		link_shadow_page(vcpu, it.sptep, sp);
 		if (is_tdp && huge_page_disallowed &&
@@ -3227,13 +3256,18 @@ static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
 	return ret;
 }
 
-static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
+static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
 			    u8 level)
 {
-	bool direct = vcpu->arch.mmu->mmu_role.base.direct;
+	union kvm_mmu_page_role role = vcpu->arch.mmu->mmu_role.base;
 	struct kvm_mmu_page *sp;
 
-	sp = kvm_mmu_get_page(vcpu, gfn, gva, level, direct, ACC_ALL);
+	role.level = level;
+
+	if (!role.gpte_is_8_bytes)
+		role.quadrant = quadrant;
+
+	sp = kvm_mmu_get_page(vcpu, gfn, role);
 	++sp->root_count;
 
 	return __pa(sp->spt);
@@ -3256,8 +3290,8 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 		for (i = 0; i < 4; ++i) {
 			MMU_WARN_ON(VALID_PAGE(mmu->pae_root[i]));
 
-			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
-					      i << 30, PT32_ROOT_LEVEL);
+			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT), i,
+					      PT32_ROOT_LEVEL);
 			mmu->pae_root[i] = root | PT_PRESENT_MASK;
 		}
 		mmu->root_hpa = __pa(mmu->pae_root);
@@ -3323,8 +3357,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 				return 1;
 		}
 
-		root = mmu_alloc_root(vcpu, root_gfn, i << 30,
-				      PT32_ROOT_LEVEL);
+		root = mmu_alloc_root(vcpu, root_gfn, i, PT32_ROOT_LEVEL);
 		mmu->pae_root[i] = root | pm_mask;
 	}
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 25d4484c78aa..d5facee0db60 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -667,8 +667,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 		if (!is_shadow_present_pte(*it.sptep)) {
 			table_gfn = gw->table_gfn[it.level - 2];
 			access = gw->pt_access[it.level - 2];
-			sp = kvm_mmu_get_page(vcpu, table_gfn, addr,
-					      it.level-1, false, access);
+			sp = kvm_mmu_get_child_sp(vcpu, it.sptep, table_gfn,
+						  false, access);
+
 			/*
 			 * We must synchronize the pagetable before linking it
 			 * because the guest doesn't need to flush tlb when
@@ -726,8 +727,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 		drop_large_spte(vcpu, it.sptep);
 
 		if (!is_shadow_present_pte(*it.sptep)) {
-			sp = kvm_mmu_get_page(vcpu, base_gfn, addr,
-					      it.level - 1, true, direct_access);
+			sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn,
+						  true, direct_access);
 			link_shadow_page(vcpu, it.sptep, sp);
 			if (huge_page_disallowed && req_level >= it.level)
 				account_huge_nx_page(vcpu->kvm, sp);
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 11/17] KVM: x86/mmu: Always pass 0 for @quadrant when gptes are 8 bytes
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
                   ` (9 preceding siblings ...)
  2026-06-26 11:26 ` [PATCH 5.10.y 10/17] KVM: x86/mmu: Derive shadow MMU page role from parent Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 13:28   ` sashiko-bot
  2026-06-26 11:26 ` [PATCH 5.10.y 12/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots Paolo Bonzini
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable; +Cc: David Matlack

commit 7f49777550e55a7d6832cbb0873f48f91c175b9c upstream.

The quadrant is only used when gptes are 4 bytes, but
mmu_alloc_{direct,shadow}_roots() pass in a non-zero quadrant for PAE
page directories regardless. Make this less confusing by only passing in
a non-zero quadrant when it is actually necessary.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220516232138.1783324-6-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e4759156a2dc..76d87da1d071 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3263,9 +3263,10 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
 	struct kvm_mmu_page *sp;
 
 	role.level = level;
+	role.quadrant = quadrant;
 
-	if (!role.gpte_is_8_bytes)
-		role.quadrant = quadrant;
+	WARN_ON_ONCE(quadrant && role.gpte_is_8_bytes);
+	WARN_ON_ONCE(role.direct && !role.gpte_is_8_bytes);
 
 	sp = kvm_mmu_get_page(vcpu, gfn, role);
 	++sp->root_count;
@@ -3290,7 +3291,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 		for (i = 0; i < 4; ++i) {
 			MMU_WARN_ON(VALID_PAGE(mmu->pae_root[i]));
 
-			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT), i,
+			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT), 0,
 					      PT32_ROOT_LEVEL);
 			mmu->pae_root[i] = root | PT_PRESENT_MASK;
 		}
@@ -3309,8 +3310,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	u64 pdptr, pm_mask;
 	gfn_t root_gfn, root_pgd;
+	int quadrant, i;
 	hpa_t root;
-	int i;
 
 	root_pgd = mmu->get_guest_pgd(vcpu);
 	root_gfn = root_pgd >> PAGE_SHIFT;
@@ -3357,7 +3358,15 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 				return 1;
 		}
 
-		root = mmu_alloc_root(vcpu, root_gfn, i, PT32_ROOT_LEVEL);
+		/*
+		 * If shadowing 32-bit non-PAE page tables, each PAE page
+		 * directory maps one quarter of the guest's non-PAE page
+		 * directory. Othwerise each PAE page direct shadows one guest
+		 * PAE page directory so that quadrant should be 0.
+		 */
+		quadrant = !mmu->mmu_role.base.gpte_is_8_bytes ? i : 0;
+
+		root = mmu_alloc_root(vcpu, root_gfn, quadrant, PT32_ROOT_LEVEL);
 		mmu->pae_root[i] = root | pm_mask;
 	}
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 12/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
                   ` (10 preceding siblings ...)
  2026-06-26 11:26 ` [PATCH 5.10.y 11/17] KVM: x86/mmu: Always pass 0 for @quadrant when gptes are 8 bytes Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 13:44   ` sashiko-bot
  2026-06-26 11:26 ` [PATCH 5.10.y 13/17] KVM: x86/mmu: pull call to drop_large_spte() into __link_shadow_page() Paolo Bonzini
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable; +Cc: Sean Christopherson

From: Sean Christopherson <seanjc@google.com>

commit 6e0918aec49a5f89ca22c60c60cb5d20d8c9af29 upstream.

Check the validity of the PDPTRs before allocating any of the PAE roots,
otherwise a bad PDPTR will cause KVM to leak any previously allocated
roots.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 76d87da1d071..5df1cd5bff1b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3308,7 +3308,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
-	u64 pdptr, pm_mask;
+	u64 pdptrs[4], pm_mask;
 	gfn_t root_gfn, root_pgd;
 	int quadrant, i;
 	hpa_t root;
@@ -3319,6 +3319,17 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	if (mmu_check_root(vcpu, root_gfn))
 		return 1;
 
+	if (mmu->root_level == PT32E_ROOT_LEVEL) {
+		for (i = 0; i < 4; ++i) {
+			pdptrs[i] = mmu->get_pdptr(vcpu, i);
+			if (!(pdptrs[i] & PT_PRESENT_MASK))
+				continue;
+
+			if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
+				return 1;
+		}
+	}
+
 	/*
 	 * Do we shadow a long mode page table? If so we need to
 	 * write-protect the guests page table root.
@@ -3348,14 +3359,11 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		MMU_WARN_ON(VALID_PAGE(mmu->pae_root[i]));
 
 		if (mmu->root_level == PT32E_ROOT_LEVEL) {
-			pdptr = mmu->get_pdptr(vcpu, i);
-			if (!(pdptr & PT_PRESENT_MASK)) {
+			if (!(pdptrs[i] & PT_PRESENT_MASK)) {
 				mmu->pae_root[i] = 0;
 				continue;
 			}
-			root_gfn = pdptr >> PAGE_SHIFT;
-			if (mmu_check_root(vcpu, root_gfn))
-				return 1;
+			root_gfn = pdptrs[i] >> PAGE_SHIFT;
 		}
 
 		/*
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 13/17] KVM: x86/mmu: pull call to drop_large_spte() into __link_shadow_page()
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
                   ` (11 preceding siblings ...)
  2026-06-26 11:26 ` [PATCH 5.10.y 12/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 14:01   ` sashiko-bot
  2026-06-26 11:26 ` [PATCH 5.10.y 14/17] KVM: x86: Fix shadow paging use-after-free due to unexpected GFN Paolo Bonzini
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable; +Cc: Sean Christopherson

commit 0cd8dc739833080aa0813cbd94d907a93e3a14c3 upstream.

Before allocating a child shadow page table, all callers check
whether the parent already points to a huge page and, if so, they
drop that SPTE.  This is done by drop_large_spte().

However, dropping the large SPTE is really only necessary before the
sp is installed.  While the sp is returned by kvm_mmu_get_child_sp(),
installing it happens later in __link_shadow_page().  Move the call
there instead of having it in each and every caller.

To ensure that the shadow page is not linked twice if it was present,
do _not_ opportunistically make kvm_mmu_get_child_sp() idempotent:
instead, return an error value if the shadow page already existed.
This is a bit more verbose, but clearer than NULL.

Finally, now that the drop_large_spte() name is not taken anymore,
remove the two underscores in front of __drop_large_spte().

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c         | 51 +++++++++++++++++++---------------
 arch/x86/kvm/mmu/paging_tmpl.h | 29 +++++++++----------
 2 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5df1cd5bff1b..47c5c3613b68 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1067,27 +1067,17 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
 		rmap_remove(kvm, sptep);
 }
 
-
-static bool __drop_large_spte(struct kvm *kvm, u64 *sptep)
+static void drop_large_spte(struct kvm *kvm, u64 *sptep)
 {
-	if (is_large_pte(*sptep)) {
-		WARN_ON(sptep_to_sp(sptep)->role.level == PG_LEVEL_4K);
-		drop_spte(kvm, sptep);
-		--kvm->stat.lpages;
-		return true;
-	}
+	struct kvm_mmu_page *sp;
 
-	return false;
-}
+	sp = sptep_to_sp(sptep);
+	WARN_ON(sp->role.level == PG_LEVEL_4K);
 
-static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
-{
-	if (__drop_large_spte(vcpu->kvm, sptep)) {
-		struct kvm_mmu_page *sp = sptep_to_sp(sptep);
-
-		kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
+	drop_spte(kvm, sptep);
+	--kvm->stat.lpages;
+	kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
 			KVM_PAGES_PER_HPAGE(sp->role.level));
-	}
 }
 
 /*
@@ -2141,6 +2131,9 @@ static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu,
 {
 	union kvm_mmu_page_role role;
 
+	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
+		return ERR_PTR(-EEXIST);
+
 	role = kvm_mmu_child_role(sptep, direct, access);
 	return kvm_mmu_get_page(vcpu, gfn, role);
 }
@@ -2208,13 +2201,21 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
 	__shadow_walk_next(iterator, *iterator->sptep);
 }
 
-static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
-			     struct kvm_mmu_page *sp)
+static void __link_shadow_page(struct kvm_vcpu *vcpu,
+			       struct kvm_mmu_memory_cache *cache, u64 *sptep,
+			       struct kvm_mmu_page *sp)
 {
 	u64 spte;
 
 	BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
 
+	/*
+	 * If an SPTE is present already, it must be a leaf and therefore
+	 * a large one.  Drop it and flush the TLB before installing sp.
+	 */
+	if (is_shadow_present_pte(*sptep))
+		drop_large_spte(vcpu->kvm, sptep);
+
 	spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
 
 	mmu_spte_set(sptep, spte);
@@ -2225,6 +2226,12 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
 		mark_unsync(sptep);
 }
 
+static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+			     struct kvm_mmu_page *sp)
+{
+	__link_shadow_page(vcpu, &vcpu->arch.mmu_pte_list_desc_cache, sptep, sp);
+}
+
 static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 				   unsigned direct_access)
 {
@@ -2923,11 +2930,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 		if (it.level == level)
 			break;
 
-		drop_large_spte(vcpu, it.sptep);
-		if (is_shadow_present_pte(*it.sptep))
-			continue;
-
 		sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, ACC_ALL);
+		if (sp == ERR_PTR(-EEXIST))
+			continue;
 
 		link_shadow_page(vcpu, it.sptep, sp);
 		if (is_tdp && huge_page_disallowed &&
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index d5facee0db60..250b4b0f6b68 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -661,15 +661,13 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 		gfn_t table_gfn;
 
 		clear_sp_write_flooding_count(it.sptep);
-		drop_large_spte(vcpu, it.sptep);
 
-		sp = NULL;
-		if (!is_shadow_present_pte(*it.sptep)) {
-			table_gfn = gw->table_gfn[it.level - 2];
-			access = gw->pt_access[it.level - 2];
-			sp = kvm_mmu_get_child_sp(vcpu, it.sptep, table_gfn,
-						  false, access);
+		table_gfn = gw->table_gfn[it.level - 2];
+		access = gw->pt_access[it.level - 2];
+		sp = kvm_mmu_get_child_sp(vcpu, it.sptep, table_gfn,
+					  false, access);
 
+		if (sp != ERR_PTR(-EEXIST)) {
 			/*
 			 * We must synchronize the pagetable before linking it
 			 * because the guest doesn't need to flush tlb when
@@ -698,7 +696,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 		if (FNAME(gpte_changed)(vcpu, gw, it.level - 1))
 			goto out_gpte_changed;
 
-		if (sp)
+		if (sp != ERR_PTR(-EEXIST))
 			link_shadow_page(vcpu, it.sptep, sp);
 	}
 
@@ -724,15 +722,14 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 
 		validate_direct_spte(vcpu, it.sptep, direct_access);
 
-		drop_large_spte(vcpu, it.sptep);
+		sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn,
+					  true, direct_access);
+		if (sp == ERR_PTR(-EEXIST))
+			continue;
 
-		if (!is_shadow_present_pte(*it.sptep)) {
-			sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn,
-						  true, direct_access);
-			link_shadow_page(vcpu, it.sptep, sp);
-			if (huge_page_disallowed && req_level >= it.level)
-				account_huge_nx_page(vcpu->kvm, sp);
-		}
+		link_shadow_page(vcpu, it.sptep, sp);
+		if (huge_page_disallowed && req_level >= it.level)
+			account_huge_nx_page(vcpu->kvm, sp);
 	}
 
 	ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault,
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 14/17] KVM: x86: Fix shadow paging use-after-free due to unexpected GFN
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
                   ` (12 preceding siblings ...)
  2026-06-26 11:26 ` [PATCH 5.10.y 13/17] KVM: x86/mmu: pull call to drop_large_spte() into __link_shadow_page() Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 15/17] KVM: x86: Fix shadow paging use-after-free due to unexpected role Paolo Bonzini
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable
  Cc: Sean Christopherson, Alexander Bulekov, Fred Griffoul

From: Sean Christopherson <seanjc@google.com>

commit 0cb2af2ea66ad8ff195c156ea690f11216285bdf upstream.

The shadow MMU computes GFNs for direct shadow pages using sp->gfn plus
the SPTE index. This assumption breaks for shadow paging if the guest
page tables are modified between VM entries (similar to commit
aad885e77496, "KVM: x86/mmu: Drop/zap existing present SPTE even
when creating an MMIO SPTE", 2026-03-27).  The flow is as follows:

- a PDE is installed for a 2MB mapping, and a page in that area is
  accessed.  KVM creates a kvm_mmu_page consisting of 512 4KB pages;
  the kvm_mmu_page is marked by FNAME(fetch) as direct-mapped because
  the guest's mapping is a huge page (and thus contiguous).

- the PDE mapping is changed from outside the guest.

- the guest accesses another page in the same 2MB area.  KVM installs
  a new leaf SPTE and rmap entry; the SPTE uses the "correct" GFN
  (i.e. based on the new mapping, as changed in the previous step) but
  that GFN is outside of the [sp->gfn, sp->gfn + 511] range; therefore
  the rmap entry cannot be found and removed when the kvm_mmu_page
  is zapped.

- the memslot that covers the first 2MB mapping is deleted, and the
  kvm_mmu_page for the now-invalid GPA is zapped.  However, rmap_remove()
  only looks at the [sp->gfn, sp->gfn + 511] range established in step 1,
  and fails to find the rmap entry that was recorded by step 3.

- any operation that causes an rmap walk for the same page accessed
  by step 3 then walks a stale rmap and dereferences a freed kvm_mmu_page.
  This includes dirty logging or MMU notifier invalidations (e.g., from
  MADV_DONTNEED).

The underlying issue is that KVM's walking of shadow PTEs assumes that
if a SPTE is present when KVM wants to install a non-leaf SPTE, then the
existing kvm_mmu_page must be for the correct gfn.  Because the only way
for the gfn to be wrong is if KVM messed up and failed to zap a SPTE...
which shouldn't happen, but *actually* only happens in response to a
guest write.

That bug dates back literally forever, as even the first version of KVM
assumes that the GFN matches and walks into the "wrong" shadow page.
However, that was only an imprecision until 2032a93d66fa ("KVM: MMU:
Don't allocate gfns page for direct mmu pages") came along.

Fix it by checking for a target gfn mismatch and zapping the existing
SPTE.  That way the old SP and rmap entries are gone, KVM installs
the rmap in the right location, and everyone is happy.

Fixes: 2032a93d66fa ("KVM: MMU: Don't allocate gfns page for direct mmu pages")
Fixes: 6aa8b732ca01 ("kvm: userspace interface")
Reported-by: Alexander Bulekov <bkov@amazon.com>
Reported-by: Fred Griffoul <fgriffo@amazon.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20260503201029.106481-1-pbonzini@redhat.com/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c  | 34 ++++++++++++++--------------------
 arch/x86/kvm/mmu/spte.h |  5 +++++
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 47c5c3613b68..b669a847e007 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -176,6 +176,8 @@ static struct percpu_counter kvm_total_used_mmu_pages;
 static void mmu_spte_set(u64 *sptep, u64 spte);
 static union kvm_mmu_page_role
 kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);
+static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
+			    u64 *spte, struct list_head *invalid_list);
 
 #define CREATE_TRACE_POINTS
 #include "mmutrace.h"
@@ -1067,19 +1069,6 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
 		rmap_remove(kvm, sptep);
 }
 
-static void drop_large_spte(struct kvm *kvm, u64 *sptep)
-{
-	struct kvm_mmu_page *sp;
-
-	sp = sptep_to_sp(sptep);
-	WARN_ON(sp->role.level == PG_LEVEL_4K);
-
-	drop_spte(kvm, sptep);
-	--kvm->stat.lpages;
-	kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
-			KVM_PAGES_PER_HPAGE(sp->role.level));
-}
-
 /*
  * Write-protect on the specified @sptep, @pt_protect indicates whether
  * spte write-protection is caused by protecting shadow page table.
@@ -2131,7 +2120,8 @@ static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu,
 {
 	union kvm_mmu_page_role role;
 
-	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
+	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep) &&
+	    spte_to_child_sp(*sptep) && spte_to_child_sp(*sptep)->gfn == gfn)
 		return ERR_PTR(-EEXIST);
 
 	role = kvm_mmu_child_role(sptep, direct, access);
@@ -2209,12 +2199,16 @@ static void __link_shadow_page(struct kvm_vcpu *vcpu,
 
 	BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
 
-	/*
-	 * If an SPTE is present already, it must be a leaf and therefore
-	 * a large one.  Drop it and flush the TLB before installing sp.
-	 */
-	if (is_shadow_present_pte(*sptep))
-		drop_large_spte(vcpu->kvm, sptep);
+	if (is_shadow_present_pte(*sptep)) {
+		struct kvm_mmu_page *parent_sp;
+		LIST_HEAD(invalid_list);
+
+		parent_sp = sptep_to_sp(sptep);
+		WARN_ON_ONCE(parent_sp->role.level == PG_LEVEL_4K);
+
+		mmu_page_zap_pte(vcpu->kvm, parent_sp, sptep, &invalid_list);
+		kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, true);
+	}
 
 	spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
 
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 667f207d3d09..01fd29c91141 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -203,6 +203,11 @@ static inline bool is_executable_pte(u64 spte)
 	return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask;
 }
 
+static inline struct kvm_mmu_page *spte_to_child_sp(u64 spte)
+{
+	return to_shadow_page(spte & PT64_BASE_ADDR_MASK);
+}
+
 static inline kvm_pfn_t spte_to_pfn(u64 pte)
 {
 	return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 15/17] KVM: x86: Fix shadow paging use-after-free due to unexpected role
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
                   ` (13 preceding siblings ...)
  2026-06-26 11:26 ` [PATCH 5.10.y 14/17] KVM: x86: Fix shadow paging use-after-free due to unexpected GFN Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 16/17] KVM: x86/mmu: Pass the memslot to the rmap callbacks Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 17/17] KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level Paolo Bonzini
  16 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable; +Cc: Hyunwoo Kim

Commit 0cb2af2ea66ad ("KVM: x86: Fix shadow paging use-after-free due
to unexpected GFN") fixed a shadow paging mismatch between stored and
computed GFNs; the bug could be triggered by changing a PDE mapping from
outside the guest, and then deleting a memslot.  The rmap_remove()
call would miss entries created after the PDE change because the GFN
of the leaf SPTE does not match the GFN of the struct kvm_mmu_page.

A similar hole however remains if the modified PDE points to a non-leaf
page.  In this case the gfn can be made to match, but the role does not
match: the original large 2MB page creates a kvm_mmu_page with direct=1,
while the new 4KB needs a kvm_mmu_page with direct=0.  However,
kvm_mmu_get_child_sp() does not compare the role, and therefore reuses
the page.

The next step is installing a leaf (4KB) SPTE on the new path which
records an rmap entry under the gfn resolved by the walk.  But when
that child is zapped its parent kvm_mmu_page has direct=1 and
kvm_mmu_page_get_gfn() computes the gfn for the 4KB page as
sp->gfn + index instead of using sp->shadowed_translation[] (or sp->gfns[]
in older kernels).  It therefore fails to remove the recorded entry.

When the memslot is dropped the shadow page is freed but the rmap
entry survives, as in the scenario that was already fixed.  Code that
later walks that gfn (dirty logging, MMU notifier invalidation, and
so on) dereferences an sptep that lies in the freed page, causing the
use-after-free.

Fixes: 2032a93d66fa ("KVM: MMU: Don't allocate gfns page for direct mmu pages")
Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b669a847e007..276cf62eb94d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2118,13 +2118,15 @@ static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu,
 						 u64 *sptep, gfn_t gfn,
 						 bool direct, unsigned int access)
 {
-	union kvm_mmu_page_role role;
+	union kvm_mmu_page_role role = kvm_mmu_child_role(sptep, direct, access);
 
-	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep) &&
-	    spte_to_child_sp(*sptep) && spte_to_child_sp(*sptep)->gfn == gfn)
+	if (is_shadow_present_pte(*sptep) &&
+	    !is_large_pte(*sptep) &&
+	    spte_to_child_sp(*sptep) &&
+	    spte_to_child_sp(*sptep)->gfn == gfn &&
+	    spte_to_child_sp(*sptep)->role.word == role.word)
 		return ERR_PTR(-EEXIST);
 
-	role = kvm_mmu_child_role(sptep, direct, access);
 	return kvm_mmu_get_page(vcpu, gfn, role);
 }
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 16/17] KVM: x86/mmu: Pass the memslot to the rmap callbacks
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
                   ` (14 preceding siblings ...)
  2026-06-26 11:26 ` [PATCH 5.10.y 15/17] KVM: x86: Fix shadow paging use-after-free due to unexpected role Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 11:26 ` [PATCH 5.10.y 17/17] KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level Paolo Bonzini
  16 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable; +Cc: Sean Christopherson

From: Sean Christopherson <seanjc@google.com>

commit 0a234f5dd06582e82edec7cf17a0f971c5a4142e upstream.

Pass the memslot to the rmap callbacks, it will be used when zapping
collapsible SPTEs to verify the memslot is compatible with hugepages
before zapping its SPTEs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 276cf62eb94d..39186d695269 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1140,7 +1140,8 @@ static bool spte_wrprot_for_clear_dirty(u64 *sptep)
  *	- W bit on ad-disabled SPTEs.
  * Returns true iff any D or W bits were cleared.
  */
-static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
+static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			       struct kvm_memory_slot *slot)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -1171,7 +1172,8 @@ static bool spte_set_dirty(u64 *sptep)
 	return mmu_spte_update(sptep, spte);
 }
 
-static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
+static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			     struct kvm_memory_slot *slot)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -1235,7 +1237,7 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 	while (mask) {
 		rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
 					  PG_LEVEL_4K, slot);
-		__rmap_clear_dirty(kvm, rmap_head);
+		__rmap_clear_dirty(kvm, rmap_head, slot);
 
 		/* clear the first set bit */
 		mask &= mask - 1;
@@ -1291,7 +1293,8 @@ static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
 	return kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn);
 }
 
-static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
+static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			  struct kvm_memory_slot *slot)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -1311,7 +1314,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			   struct kvm_memory_slot *slot, gfn_t gfn, int level,
 			   unsigned long data)
 {
-	return kvm_zap_rmapp(kvm, rmap_head);
+	return kvm_zap_rmapp(kvm, rmap_head, slot);
 }
 
 static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
@@ -5298,7 +5301,8 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
 EXPORT_SYMBOL_GPL(kvm_configure_mmu);
 
 /* The return value indicates if tlb flush on all vcpus is needed. */
-typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
+typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+				    struct kvm_memory_slot *slot);
 
 /* The caller should hold mmu-lock before calling this function. */
 static __always_inline bool
@@ -5312,7 +5316,7 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	for_each_slot_rmap_range(memslot, start_level, end_level, start_gfn,
 			end_gfn, &iterator) {
 		if (iterator.rmap)
-			flush |= fn(kvm, iterator.rmap);
+			flush |= fn(kvm, iterator.rmap, memslot);
 
 		if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
 			if (flush && lock_flush_tlb) {
@@ -5605,7 +5609,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 }
 
 static bool slot_rmap_write_protect(struct kvm *kvm,
-				    struct kvm_rmap_head *rmap_head)
+				    struct kvm_rmap_head *rmap_head,
+				    struct kvm_memory_slot *slot)
 {
 	return __rmap_write_protect(kvm, rmap_head, false);
 }
@@ -5639,7 +5644,8 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 }
 
 static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
-					 struct kvm_rmap_head *rmap_head)
+					 struct kvm_rmap_head *rmap_head,
+					 struct kvm_memory_slot *slot)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5.10.y 17/17] KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level
  2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
                   ` (15 preceding siblings ...)
  2026-06-26 11:26 ` [PATCH 5.10.y 16/17] KVM: x86/mmu: Pass the memslot to the rmap callbacks Paolo Bonzini
@ 2026-06-26 11:26 ` Paolo Bonzini
  2026-06-26 14:53   ` sashiko-bot
  16 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2026-06-26 11:26 UTC (permalink / raw)
  To: linux-kernel, kvm, stable
  Cc: Sean Christopherson, David Matlack, James Houghton,
	Alexander Bulekov, Fred Griffoul, Alexander Graf, David Woodhouse,
	Filippo Sironi, Ivan Orlov

From: Sean Christopherson <seanjc@google.com>

commit ef057cbf825e03b63f6edf5980f96abf3c53089d upstream.

When recovering hugepages in the shadow MMU, verify that the base gfn of
the shadow page is actually contained within the target memslot, *before*
querying the max mapping level given the shadow page's gfn.  Failure to
pre-check the validity of the gfn can lead to an out-of-bounds access to
the slot's lpage_info (which typically manifests as a host #PF because the
lpage_info is vmalloc'd) if the guest creates a hugepage mapping (in its
PTEs) that extends "below" the bounds of a memslot.

When faulting in memory for a guest, and the size of the guest mapping is
greater than KVM's (current) max mapping, then KVM will create a "direct"
shadow page (direct in that there are no gPTEs to shadow, and so the target
gfn is a direct calculation given the base gfn of the shadow page).  The
hugepage recovery flow looks for such direct shadow pages, as forcing 4KiB
mappings when dirty logging generates the guest > host mapping size case.
When the 4KiB restriction is lifted, then KVM can replace the shadow page
with a hugepage.

But if KVM originally used a smaller mapping than the guest because the
range of memory covered by the guest hugepage exceeds the bounds of a
memslot, then KVM will link a direct shadow page with a gfn that is outside
the bounds of the memslot being used to fault in memory.  The rmap entry
added for the leaf mapping is correct and within bounds, but the gfn of the
leaf SPTE's parent shadow page will be out of bounds.

  BUG: unable to handle page fault for address: ffffc90000806ffc
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 100000067 P4D 100000067 PUD 1002a7067 PMD 10612f067 PTE 0
  Oops: Oops: 0000 [#1] SMP
  CPU: 13 UID: 1000 PID: 757 Comm: mmu_stress_test Not tainted 7.1.0-rc1-48ce1e26eace-x86_pir_to_irr_comments-vm #341 PREEMPT
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kvm_mmu_max_mapping_level+0x79/0x2b0 [kvm]
  Call Trace:
   <TASK>
   kvm_mmu_recover_huge_pages+0x21b/0x320 [kvm]
   kvm_set_memslot+0x1ee/0x590 [kvm]
   kvm_set_memory_region.part.0+0x3a1/0x4d0 [kvm]
   kvm_vm_ioctl+0x9bf/0x15d0 [kvm]
   __x64_sys_ioctl+0x8a/0xd0
   do_syscall_64+0xb7/0xbb0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f21c0f1a9bf
   </TASK>

Don't bother pre-checking the bounds of the potential hugepage, i.e. don't
check that e.g. sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level + 1) is also
within the memslot, as the checks performed by kvm_mmu_max_mapping_level()
are a superset of the basic bounds checks.  I.e. pre-checking the full
range would be a dubious micro-optimization.

Fixes: 9eba50f8d7fc ("KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Alexander Bulekov <bkov@amazon.com>
Cc: Fred Griffoul <fgriffo@amazon.co.uk>
Cc: Alexander Graf <graf@amazon.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Filippo Sironi <sironi@amazon.de>
Cc: Ivan Orlov <iorlov@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c   | 19 +++++++++++++------
 include/linux/kvm_host.h |  5 +++++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 39186d695269..b7e8618d3df5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5659,13 +5659,20 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 		pfn = spte_to_pfn(*sptep);
 
 		/*
-		 * We cannot do huge page mapping for indirect shadow pages,
-		 * which are found on the last rmap (level = 1) when not using
-		 * tdp; such shadow pages are synced with the page table in
-		 * the guest, and the guest page table is using 4K page size
-		 * mapping if the indirect sp has level = 1.
+		 * Direct shadow page can be replaced by a hugepage if the host
+		 * mapping level allows it and the memslot maps all of the host
+		 * hugepage.  Note!  If the memslot maps only part of the
+		 * hugepage, sp->gfn may be below slot->base_gfn, and querying
+		 * the max mapping level would cause an out-of-bounds lpage_info
+		 * access.  So the gfn bounds check *must* be done first.
+		 *
+		 * Indirect shadow pages are created when the guest page tables
+		 * are using 4K pages.  Since the host mapping is always
+		 * constrained by the page size in the guest, indirect shadow
+		 * pages are never collapsible.
 		 */
-		if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
+		if (sp->role.direct && is_gfn_in_memslot(slot, sp->gfn) &&
+		    !kvm_is_reserved_pfn(pfn) &&
 		    (kvm_is_zone_device_pfn(pfn) ||
 		     PageCompound(pfn_to_page(pfn)))) {
 			pte_list_remove(rmap_head, sptep);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6b8b562407a0..b737604a676e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1092,6 +1092,11 @@ int kvm_request_irq_source_id(struct kvm *kvm);
 void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
 bool kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args);
 
+static inline bool is_gfn_in_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	return gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages;
+}
+
 /*
  * search_memslots() and __gfn_to_memslot() are here because they are
  * used in non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c.
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 5.10.y 04/17] KVM: x86/mmu: Ensure MMU pages are available when allocating roots
  2026-06-26 11:26 ` [PATCH 5.10.y 04/17] KVM: x86/mmu: Ensure MMU pages are available when allocating roots Paolo Bonzini
@ 2026-06-26 12:26   ` sashiko-bot
  2026-06-26 17:54   ` Sasha Levin
  1 sibling, 0 replies; 24+ messages in thread
From: sashiko-bot @ 2026-06-26 12:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] The patch introduces a sleep-in-atomic bug by holding `mmu_lock` (a spinlock) while reading guest PDPTRs, which can sleep.
--

commit 2d0be26b413397b6f1a85adc0319fa2727427ab6
Author: Sean Christopherson <seanjc@google.com>

KVM: x86/mmu: Ensure MMU pages are available when allocating roots

This patch modifies the root allocation flow to hold the mmu_lock for write
for the entire duration of allocating and initializing an MMU's roots. This
prevents root allocations from failing by ensuring MMU pages are available.

> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a506c0818e77a..9b1f63b5e86ec 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c

[ ... ]

> @@ -3315,6 +3306,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>  
>  	for (i = 0; i < 4; ++i) {
>  		MMU_WARN_ON(VALID_PAGE(mmu->pae_root[i]));
> +
>  		if (mmu->root_level == PT32E_ROOT_LEVEL) {
>  			pdptr = mmu->get_pdptr(vcpu, i);

[Severity: High]
Does this code sleep while holding a spinlock?

Since kvm_mmu_load() now holds vcpu->kvm->mmu_lock when calling
mmu_alloc_shadow_roots(), this ends up calling mmu->get_pdptr()
under a spinlock.

On SVM, this resolves to kvm_pdptr_read(), which can sleep:

arch/x86/kvm/kvm_cache_regs.h:kvm_pdptr_read() {
    might_sleep();
    ...
}

Could this lead to a "scheduling while atomic" panic?

>  			if (!(pdptr & PT_PRESENT_MASK)) {

[ ... ]

> @@ -4872,7 +4854,15 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
>  	r = mmu_alloc_special_roots(vcpu);
>  	if (r)
>  		goto out;
> -	r = mmu_alloc_roots(vcpu);
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +	if (make_mmu_pages_available(vcpu))
> +		r = -ENOSPC;
> +	else if (vcpu->arch.mmu->direct_map)
> +		r = mmu_alloc_direct_roots(vcpu);
> +	else
> +		r = mmu_alloc_shadow_roots(vcpu);
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
>  	kvm_mmu_sync_roots(vcpu);
>  	if (r)
>  		goto out;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260626112634.1778506-1-pbonzini@redhat.com?part=4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5.10.y 11/17] KVM: x86/mmu: Always pass 0 for @quadrant when gptes are 8 bytes
  2026-06-26 11:26 ` [PATCH 5.10.y 11/17] KVM: x86/mmu: Always pass 0 for @quadrant when gptes are 8 bytes Paolo Bonzini
@ 2026-06-26 13:28   ` sashiko-bot
  0 siblings, 0 replies; 24+ messages in thread
From: sashiko-bot @ 2026-06-26 13:28 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] The newly added `WARN_ON_ONCE(role.direct && !role.gpte_is_8_bytes)` is factually incorrect and trivially triggerable by an unprivileged guest, leading to a host-side Denial of Service if `panic_on_warn` is enabled.

Pre-existing issues:
- [High] In direct MMU mappings for non-paging guests, child shadow page allocation incorrectly applies 32-bit quadrant logic and triggers a warning.
--

commit 890f97eba3156db8c16203bd93e23c5a69ffdd15
Author: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/mmu: Always pass 0 for @quadrant when gptes are 8 bytes

The quadrant is only used when gptes are 4 bytes, but
mmu_alloc_{direct,shadow}_roots() pass in a non-zero quadrant for PAE
page directories regardless. Make this less confusing by only passing in
a non-zero quadrant when it is actually necessary.

Message-Id: <20220516232138.1783324-6-dmatlack@google.com>

> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e4759156a2dc8..76d87da1d0712 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3263,9 +3263,10 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
>  	struct kvm_mmu_page *sp;
>  
>  	role.level = level;
> +	role.quadrant = quadrant;
>  
> -	if (!role.gpte_is_8_bytes)
> -		role.quadrant = quadrant;
> +	WARN_ON_ONCE(quadrant && role.gpte_is_8_bytes);
> +	WARN_ON_ONCE(role.direct && !role.gpte_is_8_bytes);

[Severity: High]
Can this new warning be triggered by an unprivileged guest?

If a guest runs with TDP disabled (Shadow MMU active) and disables paging
(e.g., boots in real mode), kvm_calc_shadow_mmu_root_page_role() sets
role.base.direct = 1 and role.base.gpte_is_8_bytes = 0.

When mmu_alloc_direct_roots() calls mmu_alloc_root() for a direct MMU
root, won't this unconditionally hit the WARN_ON_ONCE and potentially
cause a host panic if panic_on_warn is enabled?

>  	sp = kvm_mmu_get_page(vcpu, gfn, role);
>  	++sp->root_count;

[ ... ]

[Severity: High]
This isn't a bug introduced by this patch, but a prior patch in the series
appears to have introduced a similar issue for direct MMU mappings for
non-paging guests.

When a guest operates without paging using the Shadow MMU, the root shadow
page has role.direct = 1 and role.gpte_is_8_bytes = 0.

During page faults, __direct_map() allocates intermediate shadow pages by
calling kvm_mmu_get_child_sp() -> kvm_mmu_child_role():

arch/x86/kvm/mmu/mmu.c:kvm_mmu_child_role() {
        ...
        if (!role.gpte_is_8_bytes) {
                WARN_ON_ONCE(role.level != PG_LEVEL_4K);
                role.quadrant = (sptep - parent_sp->spt) % 2;
        }
        ...
}

Since role.gpte_is_8_bytes == 0, does this erroneously execute the 32-bit
guest PAE-shadowing path, triggering the warning when allocating level 2
shadow pages and incorrectly assigning a quadrant to direct mapping pages?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260626112634.1778506-1-pbonzini@redhat.com?part=11

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5.10.y 12/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots
  2026-06-26 11:26 ` [PATCH 5.10.y 12/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots Paolo Bonzini
@ 2026-06-26 13:44   ` sashiko-bot
  0 siblings, 0 replies; 24+ messages in thread
From: sashiko-bot @ 2026-06-26 13:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Calling mmu->get_pdptr() (which can sleep) while holding the vcpu->kvm->mmu_lock spinlock, leading to sleep-in-atomic bugs.
--

commit f2009072d01cd7e9d707fa7cab9918b457bec0aa
Author: Sean Christopherson <seanjc@google.com>

KVM: x86/mmu: Check PDPTRs before allocating PAE roots

This commit adds a validity check for PDPTRs before allocating any of the PAE
roots. It iterates over the PDPTRs and checks if they are valid to prevent KVM
from leaking previously allocated roots when a bad PDPTR is encountered.

> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3308,7 +3308,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
>  static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_mmu *mmu = vcpu->arch.mmu;
> -	u64 pdptr, pm_mask;
> +	u64 pdptrs[4], pm_mask;
>  	gfn_t root_gfn, root_pgd;
>  	int quadrant, i;
>  	hpa_t root;
> @@ -3319,6 +3319,17 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>  	if (mmu_check_root(vcpu, root_gfn))
>  		return 1;
>  
> +	if (mmu->root_level == PT32E_ROOT_LEVEL) {
> +		for (i = 0; i < 4; ++i) {
> +			pdptrs[i] = mmu->get_pdptr(vcpu, i);

[Severity: High]
Does this sequence call mmu->get_pdptr() while holding a spinlock?

Looking at kvm_mmu_load(), it calls mmu_alloc_shadow_roots() while holding
vcpu->kvm->mmu_lock:

kvm_mmu_load()
    spin_lock(&vcpu->kvm->mmu_lock);
    ...
    r = mmu_alloc_shadow_roots(vcpu);

If the vCPU is running a nested guest (L2) with a 32-bit PAE NPT, the callback
for mmu->get_pdptr() is nested_svm_get_tdp_pdptr(). This function reads guest
memory via kvm_vcpu_read_guest_page(), which can sleep.

Can this lead to a sleep-in-atomic issue when the lock is held?

> +			if (!(pdptrs[i] & PT_PRESENT_MASK))
> +				continue;
> +
> +			if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
> +				return 1;
> +		}
> +	}
> +
>  	/*
>  	 * Do we shadow a long mode page table? If so we need to
>  	 * write-protect the guests page table root.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260626112634.1778506-1-pbonzini@redhat.com?part=12

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5.10.y 13/17] KVM: x86/mmu: pull call to drop_large_spte() into __link_shadow_page()
  2026-06-26 11:26 ` [PATCH 5.10.y 13/17] KVM: x86/mmu: pull call to drop_large_spte() into __link_shadow_page() Paolo Bonzini
@ 2026-06-26 14:01   ` sashiko-bot
  0 siblings, 0 replies; 24+ messages in thread
From: sashiko-bot @ 2026-06-26 14:01 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] The newly added `cache` parameter in `__link_shadow_page()` is never used.
--

commit 58832cb7365080324e7906cc11df0e03073304a5
Author: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/mmu: pull call to drop_large_spte() into __link_shadow_page()

This commit moves the dropping of large SPTEs from the callers into
__link_shadow_page() to happen just before the shadow page is installed.
It also modifies kvm_mmu_get_child_sp() to return an error pointer if the
shadow page already exists, preventing duplicate links.

> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5df1cd5bff1b9..47c5c3613b682 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c

[ ... ]

> @@ -2208,13 +2201,21 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
>  	__shadow_walk_next(iterator, *iterator->sptep);
>  }
>  
> -static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
> -			     struct kvm_mmu_page *sp)
> +static void __link_shadow_page(struct kvm_vcpu *vcpu,
> +			       struct kvm_mmu_memory_cache *cache, u64 *sptep,

[Severity: Low]
This isn't a bug, but is the new cache parameter intended to be used?

Looking at the rest of this function, the cache argument is never referenced.
Instead, nested operations like mmu_page_add_parent_pte() continue to
implicitly use vcpu->arch.mmu_pte_list_desc_cache from the vcpu struct.

> +			       struct kvm_mmu_page *sp)
>  {
>  	u64 spte;
>  
>  	BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260626112634.1778506-1-pbonzini@redhat.com?part=13

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5.10.y 17/17] KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level
  2026-06-26 11:26 ` [PATCH 5.10.y 17/17] KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level Paolo Bonzini
@ 2026-06-26 14:53   ` sashiko-bot
  0 siblings, 0 replies; 24+ messages in thread
From: sashiko-bot @ 2026-06-26 14:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [Medium] The patch backports a fix for an out-of-bounds array access to a kernel version where the buggy function does not exist, introducing dead logic.

Pre-existing issues:
- [Critical] KVM flushes the wrong GFN when zapping SPTEs, leaving stale EPT/TLB entries for the actual guest memory.
--

commit 1fc28b2436d7c174304f3fc391cd3180566514b7
Author: Sean Christopherson <seanjc@google.com>

KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level

This patch adds a bounds check to verify that a shadow page's base GFN is
within the target memslot when checking for collapsible SPTEs. This is intended
to prevent out-of-bounds accesses to the slot's lpage_info during hugepage
recovery.

> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 39186d6952696..b7e8618d3df52 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5659,13 +5659,20 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
>  		pfn = spte_to_pfn(*sptep);
>  
>  		/*
> -		 * We cannot do huge page mapping for indirect shadow pages,
[ ... ]
> +		 * Direct shadow page can be replaced by a hugepage if the host
> +		 * mapping level allows it and the memslot maps all of the host
> +		 * hugepage.  Note!  If the memslot maps only part of the
> +		 * hugepage, sp->gfn may be below slot->base_gfn, and querying
> +		 * the max mapping level would cause an out-of-bounds lpage_info
> +		 * access.  So the gfn bounds check *must* be done first.
> +		 *
> +		 * Indirect shadow pages are created when the guest page tables
> +		 * are using 4K pages.  Since the host mapping is always
> +		 * constrained by the page size in the guest, indirect shadow
> +		 * pages are never collapsible.
>  		 */
> -		if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
> +		if (sp->role.direct && is_gfn_in_memslot(slot, sp->gfn) &&
> +		    !kvm_is_reserved_pfn(pfn) &&
> +		    (kvm_is_zone_device_pfn(pfn) ||
> +		     PageCompound(pfn_to_page(pfn)))) {
>  			pte_list_remove(rmap_head, sptep);

[Severity: Medium]
The commit message mentions that querying the max mapping level causes an
out-of-bounds lpage_info access, but looking at kvm_mmu_zap_collapsible_spte()
in this kernel version, neither kvm_mmu_max_mapping_level() nor lpage_info are
accessed here.

Does this added check just prevent KVM from collapsing 4KB pages into a 2MB
hugepage if the 2MB region crosses the memslot boundary, without actually fixing
an out-of-bounds access in this code path?

[Severity: Critical]
This is a pre-existing issue, but does this code flush the correct GFN?

In kvm_mmu_zap_collapsible_spte(), just below this change, the flush uses
sp->gfn:

			if (kvm_available_flush_tlb_with_range())
				kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
					KVM_PAGES_PER_HPAGE(sp->role.level));

sp->gfn is the base GFN of the shadow page, not the actual data page GFN mapped
by the zapped SPTE. If KVM_PAGES_PER_HPAGE(sp->role.level) is 1 (for a level-1
shadow page), wouldn't this flush only the base GFN and leave stale TLB entries
for the actual guest memory?

(Similar pre-existing issues appear to exist in rmap_recycle() and
FNAME(sync_page)() which also flush sp->gfn instead of the actual data page GFN.)

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260626112634.1778506-1-pbonzini@redhat.com?part=17

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5.10.y 04/17] KVM: x86/mmu: Ensure MMU pages are available when allocating roots
  2026-06-26 11:26 ` [PATCH 5.10.y 04/17] KVM: x86/mmu: Ensure MMU pages are available when allocating roots Paolo Bonzini
  2026-06-26 12:26   ` sashiko-bot
@ 2026-06-26 17:54   ` Sasha Levin
  1 sibling, 0 replies; 24+ messages in thread
From: Sasha Levin @ 2026-06-26 17:54 UTC (permalink / raw)
  To: linux-kernel, kvm, stable
  Cc: Sasha Levin, Sean Christopherson, Ben Gardon, Paolo Bonzini

> [PATCH 5.10.y 04/17] KVM: x86/mmu: Ensure MMU pages are available when allocating roots

On 5.10 mmu_lock is still a spinlock. Hoisting it into kvm_mmu_load() means
mmu_alloc_shadow_roots() -> mmu->get_pdptr() -> kvm_vcpu_read_guest_page()
can sleep under the spinlock on the nested-SVM PAE path. The fix for that,
4a38162ee9f1 ("KVM: MMU: load PDPTRs outside mmu_lock"), isn't in the series.

Do we need that one too?

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2026-06-26 17:55 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-26 11:26 [PATCH 5.10.y 00/17] KVM: fixes for CVE-2026-46113 and related issues Paolo Bonzini
2026-06-26 11:26 ` [PATCH 5.10.y 01/17] KVM: x86/mmu: Capture 'mmu' in a local variable when allocating roots Paolo Bonzini
2026-06-26 11:26 ` [PATCH 5.10.y 02/17] KVM: x86/mmu: Allocate the lm_root before allocating PAE roots Paolo Bonzini
2026-06-26 11:26 ` [PATCH 5.10.y 03/17] KVM: x86/mmu: Allocate pae_root and lm_root pages in dedicated helper Paolo Bonzini
2026-06-26 11:26 ` [PATCH 5.10.y 04/17] KVM: x86/mmu: Ensure MMU pages are available when allocating roots Paolo Bonzini
2026-06-26 12:26   ` sashiko-bot
2026-06-26 17:54   ` Sasha Levin
2026-06-26 11:26 ` [PATCH 5.10.y 05/17] KVM: x86/mmu: Use a bool for direct Paolo Bonzini
2026-06-26 11:26 ` [PATCH 5.10.y 06/17] KVM: x86/mmu: Stop passing "direct" to mmu_alloc_root() Paolo Bonzini
2026-06-26 11:26 ` [PATCH 5.10.y 07/17] KVM: x86/mmu: Refactor shadow walk in __direct_map() to reduce indentation Paolo Bonzini
2026-06-26 11:26 ` [PATCH 5.10.y 08/17] KVM: X86: Fix missed remote tlb flush in rmap_write_protect() Paolo Bonzini
2026-06-26 11:26 ` [PATCH 5.10.y 09/17] KVM: X86: Synchronize the shadow pagetable before link it Paolo Bonzini
2026-06-26 11:26 ` [PATCH 5.10.y 10/17] KVM: x86/mmu: Derive shadow MMU page role from parent Paolo Bonzini
2026-06-26 11:26 ` [PATCH 5.10.y 11/17] KVM: x86/mmu: Always pass 0 for @quadrant when gptes are 8 bytes Paolo Bonzini
2026-06-26 13:28   ` sashiko-bot
2026-06-26 11:26 ` [PATCH 5.10.y 12/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots Paolo Bonzini
2026-06-26 13:44   ` sashiko-bot
2026-06-26 11:26 ` [PATCH 5.10.y 13/17] KVM: x86/mmu: pull call to drop_large_spte() into __link_shadow_page() Paolo Bonzini
2026-06-26 14:01   ` sashiko-bot
2026-06-26 11:26 ` [PATCH 5.10.y 14/17] KVM: x86: Fix shadow paging use-after-free due to unexpected GFN Paolo Bonzini
2026-06-26 11:26 ` [PATCH 5.10.y 15/17] KVM: x86: Fix shadow paging use-after-free due to unexpected role Paolo Bonzini
2026-06-26 11:26 ` [PATCH 5.10.y 16/17] KVM: x86/mmu: Pass the memslot to the rmap callbacks Paolo Bonzini
2026-06-26 11:26 ` [PATCH 5.10.y 17/17] KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level Paolo Bonzini
2026-06-26 14:53   ` sashiko-bot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.