[PATCH 22/23] KVM: x86/mmu: Split huge pages aliased by multiple SPTEs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: David Matlack <dmatlack@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>,
	Huacai Chen <chenhuacai@kernel.org>,
	leksandar Markovic <aleksandar.qemu.devel@gmail.com>,
	Sean Christopherson <seanjc@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Peter Xu <peterx@redhat.com>, Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Peter Feiner <pfeiner@google.com>,
	Andrew Jones <drjones@redhat.com>,
	maciej.szmigiero@oracle.com, kvm@vger.kernel.org,
	David Matlack <dmatlack@google.com>
Subject: [PATCH 22/23] KVM: x86/mmu: Split huge pages aliased by multiple SPTEs
Date: Thu,  3 Feb 2022 01:00:50 +0000	[thread overview]
Message-ID: <20220203010051.2813563-23-dmatlack@google.com> (raw)
In-Reply-To: <20220203010051.2813563-1-dmatlack@google.com>

The existing huge page splitting code bails if it encounters a huge page
that is aliased by another SPTE that has already been split (either due
to NX huge pages or eager page splitting). Extend the huge page
splitting code to also handle such aliases.

The thing we have to be careful about is dealing with what's already in
the lower level page table. If eager page splitting was the only
operation that split huge pages, this would be fine. However huge pages
can also be split by NX huge pages. This means the lower level page
table may only be partially filled in and may point to even lower level
page tables that are partially filled in. We can fill in the rest of the
page table but dealing with the lower level page tables would be too
complex.

To handle this we flush TLBs after dropping the huge SPTE whenever we
are about to install a lower level page table that was partially filled
in (*). We can skip the TLB flush if the lower level page table was
empty (no aliasing) or identical to what we were already going to
populate it with (aliased huge page that was just eagerly split).

(*) This TLB flush could probably be delayed until we're about to drop
the MMU lock, which would also let us batch flushes for multiple splits.
However such scenarios should be rare in practice (a huge page must be
aliased in multiple SPTEs and have been split for NX Huge Pages in only
some of them). Flushing immediately is simpler to plumb and also reduces
the chances of tripping over a CPU bug (e.g. see iTLB multi-hit).

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm_host.h |  5 ++-
 arch/x86/kvm/mmu/mmu.c          | 77 +++++++++++++++------------------
 2 files changed, 38 insertions(+), 44 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a0f7578f7a26..c11f27f38981 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1237,9 +1237,10 @@ struct kvm_arch {
 	 * Memory cache used to allocate pte_list_desc structs while splitting
 	 * huge pages. In the worst case, to split one huge page we need 512
 	 * pte_list_desc structs to add each new lower level leaf sptep to the
-	 * memslot rmap.
+	 * memslot rmap plus 1 to extend the parent_ptes rmap of the new lower
+	 * level page table.
 	 */
-#define HUGE_PAGE_SPLIT_DESC_CACHE_CAPACITY 512
+#define HUGE_PAGE_SPLIT_DESC_CACHE_CAPACITY 513
 	__DEFINE_KVM_MMU_MEMORY_CACHE(huge_page_split_desc_cache,
 				      HUGE_PAGE_SPLIT_DESC_CACHE_CAPACITY);
 };
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c7981a934237..62fbff8979ba 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6056,7 +6056,8 @@ static int min_descs_for_split(const struct kvm_memory_slot *slot, u64 *huge_spt
 		gfn += KVM_PAGES_PER_HPAGE(split_level);
 	}
 
-	return min;
+	/* Plus 1 to extend the parent_ptes rmap of the lower level SP. */
+	return min + 1;
 }
 
 static int topup_huge_page_split_desc_cache(struct kvm *kvm, int min, gfp_t gfp)
@@ -6126,6 +6127,7 @@ static struct kvm_mmu_page *kvm_mmu_get_sp_for_split(struct kvm *kvm,
 	struct kvm_mmu_page *huge_sp = sptep_to_sp(huge_sptep);
 	struct kvm_mmu_page *split_sp;
 	union kvm_mmu_page_role role;
+	bool created = false;
 	unsigned int access;
 	gfn_t gfn;
 
@@ -6138,25 +6140,21 @@ static struct kvm_mmu_page *kvm_mmu_get_sp_for_split(struct kvm *kvm,
 	 */
 	role = kvm_mmu_child_role(huge_sp, true, access);
 	split_sp = kvm_mmu_get_existing_direct_sp(kvm, gfn, role);
-
-	/*
-	 * Opt not to split if the lower-level SP already exists. This requires
-	 * more complex handling as the SP may be already partially filled in
-	 * and may need extra pte_list_desc structs to update parent_ptes.
-	 */
 	if (split_sp)
-		return NULL;
+		goto out;
 
+	created = true;
 	swap(split_sp, *spp);
 	kvm_mmu_init_sp(kvm, split_sp, slot, gfn, role);
-	trace_kvm_mmu_get_page(split_sp, true);
 
+out:
+	trace_kvm_mmu_get_page(split_sp, created);
 	return split_sp;
 }
 
-static int kvm_mmu_split_huge_page(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot,
-				   u64 *huge_sptep, struct kvm_mmu_page **spp)
+static void kvm_mmu_split_huge_page(struct kvm *kvm,
+				    const struct kvm_memory_slot *slot,
+				    u64 *huge_sptep, struct kvm_mmu_page **spp)
 
 {
 	struct kvm_mmu_memory_cache *cache;
@@ -6164,22 +6162,11 @@ static int kvm_mmu_split_huge_page(struct kvm *kvm,
 	u64 huge_spte, split_spte;
 	int split_level, index;
 	unsigned int access;
+	bool flush = false;
 	u64 *split_sptep;
 	gfn_t split_gfn;
 
 	split_sp = kvm_mmu_get_sp_for_split(kvm, slot, huge_sptep, spp);
-	if (!split_sp)
-		return -EOPNOTSUPP;
-
-	/*
-	 * We did not allocate an extra pte_list_desc struct to add huge_sptep
-	 * to split_sp->parent_ptes. An extra pte_list_desc struct should never
-	 * be necessary in practice though since split_sp is brand new.
-	 *
-	 * Note, this makes it safe to pass NULL to __link_shadow_page() below.
-	 */
-	if (WARN_ON_ONCE(pte_list_need_new_desc(&split_sp->parent_ptes)))
-		return -EINVAL;
 
 	huge_spte = READ_ONCE(*huge_sptep);
 
@@ -6191,7 +6178,20 @@ static int kvm_mmu_split_huge_page(struct kvm *kvm,
 		split_sptep = &split_sp->spt[index];
 		split_gfn = kvm_mmu_page_get_gfn(split_sp, index);
 
-		BUG_ON(is_shadow_present_pte(*split_sptep));
+		/*
+		 * split_sp may have populated page table entries if this huge
+		 * page is aliased in multiple shadow page table entries. We
+		 * know the existing SP will be mapping the same GFN->PFN
+		 * translation since this is a direct SP. However, the SPTE may
+		 * point to an even lower level page table that may only be
+		 * partially filled in (e.g. for NX huge pages). In other words,
+		 * we may be unmapping a portion of the huge page, which
+		 * requires a TLB flush.
+		 */
+		if (is_shadow_present_pte(*split_sptep)) {
+			flush |= !is_last_spte(*split_sptep, split_level);
+			continue;
+		}
 
 		split_spte = make_huge_page_split_spte(
 				huge_spte, split_level + 1, index, access);
@@ -6202,16 +6202,12 @@ static int kvm_mmu_split_huge_page(struct kvm *kvm,
 
 	/*
 	 * Replace the huge spte with a pointer to the populated lower level
-	 * page table. Since we are making this change without a TLB flush vCPUs
-	 * will see a mix of the split mappings and the original huge mapping,
-	 * depending on what's currently in their TLB. This is fine from a
-	 * correctness standpoint since the translation will be the same either
-	 * way.
+	 * page table. If the lower-level page table indentically maps the huge
+	 * page, there's no need for a TLB flush. Otherwise, flush TLBs after
+	 * dropping the huge page and before installing the shadow page table.
 	 */
-	drop_large_spte(kvm, huge_sptep, false);
-	__link_shadow_page(NULL, huge_sptep, split_sp);
-
-	return 0;
+	drop_large_spte(kvm, huge_sptep, flush);
+	__link_shadow_page(cache, huge_sptep, split_sp);
 }
 
 static bool should_split_huge_page(u64 *huge_sptep)
@@ -6266,16 +6262,13 @@ static bool rmap_try_split_huge_pages(struct kvm *kvm,
 		if (dropped_lock)
 			goto restart;
 
-		r = kvm_mmu_split_huge_page(kvm, slot, huge_sptep, &sp);
-
-		trace_kvm_mmu_split_huge_page(gfn, spte, level, r);
-
 		/*
-		 * If splitting is successful we must restart the iterator
-		 * because huge_sptep has just been removed from it.
+		 * After splitting we must restart the iterator because
+		 * huge_sptep has just been removed from it.
 		 */
-		if (!r)
-			goto restart;
+		kvm_mmu_split_huge_page(kvm, slot, huge_sptep, &sp);
+		trace_kvm_mmu_split_huge_page(gfn, spte, level, 0);
+		goto restart;
 	}
 
 	if (sp)
-- 
2.35.0.rc2.247.g8bbb082509-goog

next prev parent reply	other threads:[~2022-02-03  1:01 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-03  1:00 [PATCH 00/23] Extend Eager Page Splitting to the shadow MMU David Matlack
2022-02-03  1:00 ` [PATCH 01/23] KVM: x86/mmu: Optimize MMU page cache lookup for all direct SPs David Matlack
2022-02-19  0:57   ` Sean Christopherson
2022-02-03  1:00 ` [PATCH 02/23] KVM: x86/mmu: Derive shadow MMU page role from parent David Matlack
2022-02-19  1:14   ` Sean Christopherson
2022-02-24 18:45     ` David Matlack
2022-03-04  0:22     ` David Matlack
2022-02-03  1:00 ` [PATCH 03/23] KVM: x86/mmu: Decompose kvm_mmu_get_page() into separate functions David Matlack
2022-02-19  1:25   ` Sean Christopherson
2022-02-24 18:54     ` David Matlack
2022-02-03  1:00 ` [PATCH 04/23] KVM: x86/mmu: Rename shadow MMU functions that deal with shadow pages David Matlack
2022-02-03  1:00 ` [PATCH 05/23] KVM: x86/mmu: Pass memslot to kvm_mmu_create_sp() David Matlack
2022-02-03  1:00 ` [PATCH 06/23] KVM: x86/mmu: Separate shadow MMU sp allocation from initialization David Matlack
2022-02-16 19:37   ` Ben Gardon
2022-02-16 21:42     ` David Matlack
2022-02-03  1:00 ` [PATCH 07/23] KVM: x86/mmu: Move huge page split sp allocation code to mmu.c David Matlack
2022-02-03  1:00 ` [PATCH 08/23] KVM: x86/mmu: Use common code to free kvm_mmu_page structs David Matlack
2022-02-03  1:00 ` [PATCH 09/23] KVM: x86/mmu: Use common code to allocate kvm_mmu_page structs from vCPU caches David Matlack
2022-02-03  1:00 ` [PATCH 10/23] KVM: x86/mmu: Pass const memslot to rmap_add() David Matlack
2022-02-23 23:25   ` Ben Gardon
2022-02-03  1:00 ` [PATCH 11/23] KVM: x86/mmu: Pass const memslot to kvm_mmu_init_sp() and descendants David Matlack
2022-02-23 23:27   ` Ben Gardon
2022-02-03  1:00 ` [PATCH 12/23] KVM: x86/mmu: Decouple rmap_add() and link_shadow_page() from kvm_vcpu David Matlack
2022-02-23 23:30   ` Ben Gardon
2022-02-03  1:00 ` [PATCH 13/23] KVM: x86/mmu: Update page stats in __rmap_add() David Matlack
2022-02-23 23:32   ` Ben Gardon
2022-02-23 23:35     ` Ben Gardon
2022-02-03  1:00 ` [PATCH 14/23] KVM: x86/mmu: Cache the access bits of shadowed translations David Matlack
2022-02-28 20:30   ` Ben Gardon
2022-02-03  1:00 ` [PATCH 15/23] KVM: x86/mmu: Pass access information to make_huge_page_split_spte() David Matlack
2022-02-28 20:32   ` Ben Gardon
2022-02-03  1:00 ` [PATCH 16/23] KVM: x86/mmu: Zap collapsible SPTEs at all levels in the shadow MMU David Matlack
2022-02-28 20:39   ` Ben Gardon
2022-03-03 19:42     ` David Matlack
2022-02-03  1:00 ` [PATCH 17/23] KVM: x86/mmu: Pass bool flush parameter to drop_large_spte() David Matlack
2022-02-28 20:47   ` Ben Gardon
2022-03-03 19:52     ` David Matlack
2022-02-03  1:00 ` [PATCH 18/23] KVM: x86/mmu: Extend Eager Page Splitting to the shadow MMU David Matlack
2022-02-28 21:09   ` Ben Gardon
2022-02-28 23:29     ` David Matlack
2022-02-03  1:00 ` [PATCH 19/23] KVM: Allow for different capacities in kvm_mmu_memory_cache structs David Matlack
2022-02-24 11:28   ` Marc Zyngier
2022-02-24 19:20     ` David Matlack
2022-03-04 21:59       ` David Matlack
2022-03-04 22:24         ` David Matlack
2022-03-05 16:55         ` Marc Zyngier
2022-03-07 23:49           ` David Matlack
2022-03-08  7:42             ` Marc Zyngier
2022-03-09 21:49             ` David Matlack
2022-03-10  8:30               ` Marc Zyngier
2022-02-03  1:00 ` [PATCH 20/23] KVM: Allow GFP flags to be passed when topping up MMU caches David Matlack
2022-02-28 21:12   ` Ben Gardon
2022-02-03  1:00 ` [PATCH 21/23] KVM: x86/mmu: Fully split huge pages that require extra pte_list_desc structs David Matlack
2022-02-28 21:22   ` Ben Gardon
2022-02-28 23:41     ` David Matlack
2022-03-01  0:37       ` Ben Gardon
2022-03-03 19:59         ` David Matlack
2022-02-03  1:00 ` David Matlack [this message]
2022-02-03  1:00 ` [PATCH 23/23] KVM: selftests: Map x86_64 guest virtual memory with huge pages David Matlack
2022-03-07  5:21 ` [PATCH 00/23] Extend Eager Page Splitting to the shadow MMU Peter Xu
2022-03-07 23:39   ` David Matlack
2022-03-09  7:31     ` Peter Xu
2022-03-09 23:39       ` David Matlack
2022-03-10  7:03         ` Peter Xu
2022-03-10 19:26           ` David Matlack

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:a0f7578f7a2 dfblob:c11f27f3898 dfblob:c7981a93423
dfblob:62fbff8979b )
 OR (
bs:"[PATCH 22/23] KVM: x86/mmu: Split huge pages aliased by multiple SPTEs" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220203010051.2813563-23-dmatlack@google.com \
    --to=dmatlack@google.com \
    --cc=aleksandar.qemu.devel@gmail.com \
    --cc=chenhuacai@kernel.org \
    --cc=drjones@redhat.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=maciej.szmigiero@oracle.com \
    --cc=maz@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pfeiner@google.com \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.