public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: David Matlack <dmatlack@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>,
	Huacai Chen <chenhuacai@kernel.org>,
	Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
	Anup Patel <anup@brainfault.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Sean Christopherson <seanjc@google.com>,
	Andrew Jones <drjones@redhat.com>,
	Ben Gardon <bgardon@google.com>, Peter Xu <peterx@redhat.com>,
	maciej.szmigiero@oracle.com,
	"moderated list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)" 
	<kvmarm@lists.cs.columbia.edu>,
	"open list:KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips)" 
	<linux-mips@vger.kernel.org>,
	"open list:KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips)" 
	<kvm@vger.kernel.org>,
	"open list:KERNEL VIRTUAL MACHINE FOR RISC-V (KVM/riscv)" 
	<kvm-riscv@lists.infradead.org>,
	Peter Feiner <pfeiner@google.com>,
	David Matlack <dmatlack@google.com>
Subject: [PATCH v3 16/23] KVM: x86/mmu: Cache the access bits of shadowed translations
Date: Fri,  1 Apr 2022 17:55:47 +0000	[thread overview]
Message-ID: <20220401175554.1931568-17-dmatlack@google.com> (raw)
In-Reply-To: <20220401175554.1931568-1-dmatlack@google.com>

In order to split a huge page we need to know what access bits to assign
to the role of the new child page table. This can't be easily derived
from the huge page SPTE itself since KVM applies its own access policies
on top, such as for HugePage NX.

We could walk the guest page tables to determine the correct access
bits, but that is difficult to plumb outside of a vCPU fault context.
Instead, we can store the original access bits for each leaf SPTE
alongside the GFN in the gfns array. The access bits only take up 3
bits, which leaves 61 bits left over for gfns, which is more than
enough. So this change does not require any additional memory.

In order to keep the access bit cache in sync with the guest, we have to
extend FNAME(sync_page) to also update the access bits.

Now that the gfns array caches more information than just GFNs, rename
it to shadowed_translation.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/mmu/mmu.c          | 71 ++++++++++++++++++++++++++++-----
 arch/x86/kvm/mmu/mmu_internal.h | 20 +++++++++-
 arch/x86/kvm/mmu/paging_tmpl.h  |  8 +++-
 4 files changed, 85 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9694dd5e6ccc..be4349c9ffea 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -696,7 +696,7 @@ struct kvm_vcpu_arch {
 
 	struct kvm_mmu_memory_cache mmu_pte_list_desc_cache;
 	struct kvm_mmu_memory_cache mmu_shadow_page_cache;
-	struct kvm_mmu_memory_cache mmu_gfn_array_cache;
+	struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
 	struct kvm_mmu_memory_cache mmu_page_header_cache;
 
 	/*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5e1002d57689..3a425ed80e23 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -708,7 +708,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 	if (r)
 		return r;
 	if (maybe_indirect) {
-		r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
+		r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadowed_info_cache,
 					       PT64_ROOT_MAX_LEVEL);
 		if (r)
 			return r;
@@ -721,7 +721,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
-	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_gfn_array_cache);
+	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
 }
 
@@ -733,7 +733,7 @@ static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
 static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
 {
 	if (!sp->role.direct)
-		return sp->gfns[index];
+		return sp->shadowed_translation[index].gfn;
 
 	return sp->gfn + (index << ((sp->role.level - 1) * PT64_LEVEL_BITS));
 }
@@ -741,7 +741,7 @@ static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
 static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn)
 {
 	if (!sp->role.direct) {
-		sp->gfns[index] = gfn;
+		sp->shadowed_translation[index].gfn = gfn;
 		return;
 	}
 
@@ -752,6 +752,47 @@ static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn)
 				   kvm_mmu_page_get_gfn(sp, index), gfn);
 }
 
+static void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index, u32 access)
+{
+	if (!sp->role.direct) {
+		sp->shadowed_translation[index].access = access;
+		return;
+	}
+
+	if (WARN_ON(access != sp->role.access))
+		pr_err_ratelimited("access mismatch under direct page %llx "
+				   "(expected %llx, got %llx)\n",
+				   kvm_mmu_page_get_gfn(sp, index),
+				   sp->role.access, access);
+}
+
+/*
+ * For leaf SPTEs, fetch the *guest* access permissions being shadowed. Note
+ * that the SPTE itself may have a more constrained access permissions that
+ * what the guest enforces. For example, a guest may create an executable
+ * huge PTE but KVM may disallow execution to mitigate iTLB multihit.
+ */
+static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
+{
+	if (!sp->role.direct)
+		return sp->shadowed_translation[index].access;
+
+	/*
+	 * For direct MMUs (e.g. TDP or non-paging guests) there are no *guest*
+	 * access permissions being shadowed. So we can just return ACC_ALL
+	 * here.
+	 *
+	 * For indirect MMUs (shadow paging), direct shadow pages exist when KVM
+	 * is shadowing a guest huge page with smaller pages, since the guest
+	 * huge page is being directly mapped. In this case the guest access
+	 * permissions being shadowed are the access permissions of the huge
+	 * page.
+	 *
+	 * In both cases, sp->role.access contains exactly what we want.
+	 */
+	return sp->role.access;
+}
+
 /*
  * Return the pointer to the large page information for a given gfn,
  * handling slots that are not large page aligned.
@@ -1594,7 +1635,7 @@ static bool kvm_test_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 static void __rmap_add(struct kvm *kvm,
 		       struct kvm_mmu_memory_cache *cache,
 		       const struct kvm_memory_slot *slot,
-		       u64 *spte, gfn_t gfn)
+		       u64 *spte, gfn_t gfn, u32 access)
 {
 	struct kvm_mmu_page *sp;
 	struct kvm_rmap_head *rmap_head;
@@ -1602,6 +1643,7 @@ static void __rmap_add(struct kvm *kvm,
 
 	sp = sptep_to_sp(spte);
 	kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
+	kvm_mmu_page_set_access(sp, spte - sp->spt, access);
 	kvm_update_page_stats(kvm, sp->role.level, 1);
 
 	rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
@@ -1615,9 +1657,9 @@ static void __rmap_add(struct kvm *kvm,
 }
 
 static void rmap_add(struct kvm_vcpu *vcpu, const struct kvm_memory_slot *slot,
-		     u64 *spte, gfn_t gfn)
+		     u64 *spte, gfn_t gfn, u32 access)
 {
-	__rmap_add(vcpu->kvm, &vcpu->arch.mmu_pte_list_desc_cache, slot, spte, gfn);
+	__rmap_add(vcpu->kvm, &vcpu->arch.mmu_pte_list_desc_cache, slot, spte, gfn, access);
 }
 
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
@@ -1678,7 +1720,7 @@ void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp)
 {
 	free_page((unsigned long)sp->spt);
 	if (!sp->role.direct)
-		free_page((unsigned long)sp->gfns);
+		free_page((unsigned long)sp->shadowed_translation);
 	kmem_cache_free(mmu_page_header_cache, sp);
 }
 
@@ -1715,8 +1757,12 @@ struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm_vcpu *vcpu, bool direc
 
 	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
 	sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
+
+	BUILD_BUG_ON(sizeof(sp->shadowed_translation[0]) != sizeof(u64));
+
 	if (!direct)
-		sp->gfns = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_gfn_array_cache);
+		sp->shadowed_translation =
+			kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadowed_info_cache);
 
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
@@ -1738,7 +1784,7 @@ static inline gfp_t gfp_flags_for_split(bool locked)
  *
  * Huge page splitting always uses direct shadow pages since the huge page is
  * being mapped directly with a lower level page table. Thus there's no need to
- * allocate the gfns array.
+ * allocate the shadowed_translation array.
  */
 struct kvm_mmu_page *kvm_mmu_alloc_direct_sp_for_split(bool locked)
 {
@@ -2841,7 +2887,10 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
 
 	if (!was_rmapped) {
 		WARN_ON_ONCE(ret == RET_PF_SPURIOUS);
-		rmap_add(vcpu, slot, sptep, gfn);
+		rmap_add(vcpu, slot, sptep, gfn, pte_access);
+	} else {
+		/* Already rmapped but the pte_access bits may have changed. */
+		kvm_mmu_page_set_access(sp, sptep - sp->spt, pte_access);
 	}
 
 	return ret;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index b6e22ba9c654..3f76f4c1ae59 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -32,6 +32,18 @@ extern bool dbg;
 
 typedef u64 __rcu *tdp_ptep_t;
 
+/*
+ * Stores the result of the guest translation being shadowed by an SPTE. KVM
+ * shadows two types of guest translations: nGPA -> GPA (shadow EPT/NPT) and
+ * GVA -> GPA (traditional shadow paging). In both cases the result of the
+ * translation is a GPA and a set of access constraints.
+ */
+struct shadowed_translation_entry {
+	/* Note, GFNs can have at most 64 - PAGE_SHIFT = 52 bits. */
+	u64 gfn:52;
+	u64 access:3;
+};
+
 struct kvm_mmu_page {
 	/*
 	 * Note, "link" through "spt" fit in a single 64 byte cache line on
@@ -53,8 +65,12 @@ struct kvm_mmu_page {
 	gfn_t gfn;
 
 	u64 *spt;
-	/* hold the gfn of each spte inside spt */
-	gfn_t *gfns;
+	/*
+	 * Caches the result of the intermediate guest translation being
+	 * shadowed by each SPTE. NULL for direct shadow pages.
+	 */
+	struct shadowed_translation_entry *shadowed_translation;
+
 	/* Currently serving as active root */
 	union {
 		int root_count;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index db63b5377465..91c2088464ce 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1014,7 +1014,8 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 }
 
 /*
- * Using the cached information from sp->gfns is safe because:
+ * Using the information in sp->shadowed_translation (kvm_mmu_page_get_gfn()
+ * and kvm_mmu_page_get_access()) is safe because:
  * - The spte has a reference to the struct page, so the pfn for a given gfn
  *   can't change unless all sptes pointing to it are nuked first.
  *
@@ -1088,12 +1089,15 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 		if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access))
 			continue;
 
-		if (gfn != sp->gfns[i]) {
+		if (gfn != kvm_mmu_page_get_gfn(sp, i)) {
 			drop_spte(vcpu->kvm, &sp->spt[i]);
 			flush = true;
 			continue;
 		}
 
+		if (pte_access != kvm_mmu_page_get_access(sp, i))
+			kvm_mmu_page_set_access(sp, i, pte_access);
+
 		sptep = &sp->spt[i];
 		spte = *sptep;
 		host_writable = spte & shadow_host_writable_mask;
-- 
2.35.1.1094.g7c7d902a7c-goog


  parent reply	other threads:[~2022-04-01 17:56 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-01 17:55 [PATCH v3 00/23] KVM: Extend Eager Page Splitting to the shadow MMU David Matlack
2022-04-01 17:55 ` [PATCH v3 01/23] KVM: x86/mmu: Optimize MMU page cache lookup for all direct SPs David Matlack
2022-04-01 17:55 ` [PATCH v3 02/23] KVM: x86/mmu: Use a bool for direct David Matlack
2022-04-08 22:24   ` Sean Christopherson
2022-04-01 17:55 ` [PATCH v3 03/23] KVM: x86/mmu: Derive shadow MMU page role from parent David Matlack
2022-04-01 17:55 ` [PATCH v3 04/23] KVM: x86/mmu: Decompose kvm_mmu_get_page() into separate functions David Matlack
2022-04-01 17:55 ` [PATCH v3 05/23] KVM: x86/mmu: Rename shadow MMU functions that deal with shadow pages David Matlack
2022-04-01 17:55 ` [PATCH v3 06/23] KVM: x86/mmu: Pass memslot to kvm_mmu_new_shadow_page() David Matlack
2022-04-01 17:55 ` [PATCH v3 07/23] KVM: x86/mmu: Separate shadow MMU sp allocation from initialization David Matlack
2022-04-01 17:55 ` [PATCH v3 08/23] KVM: x86/mmu: Link spt to sp during allocation David Matlack
2022-04-01 17:55 ` [PATCH v3 09/23] KVM: x86/mmu: Move huge page split sp allocation code to mmu.c David Matlack
2022-04-01 17:55 ` [PATCH v3 10/23] KVM: x86/mmu: Use common code to free kvm_mmu_page structs David Matlack
2022-04-01 17:55 ` [PATCH v3 11/23] KVM: x86/mmu: Use common code to allocate shadow pages from vCPU caches David Matlack
2022-04-01 17:55 ` [PATCH v3 12/23] KVM: x86/mmu: Pass const memslot to rmap_add() David Matlack
2022-04-01 17:55 ` [PATCH v3 13/23] KVM: x86/mmu: Pass const memslot to init_shadow_page() and descendants David Matlack
2022-04-01 17:55 ` [PATCH v3 14/23] KVM: x86/mmu: Decouple rmap_add() and link_shadow_page() from kvm_vcpu David Matlack
2022-04-01 17:55 ` [PATCH v3 15/23] KVM: x86/mmu: Update page stats in __rmap_add() David Matlack
2022-04-01 17:55 ` David Matlack [this message]
2022-04-09  0:02   ` [PATCH v3 16/23] KVM: x86/mmu: Cache the access bits of shadowed translations Sean Christopherson
2022-04-14 16:47     ` David Matlack
2022-04-01 17:55 ` [PATCH v3 17/23] KVM: x86/mmu: Extend make_huge_page_split_spte() for the shadow MMU David Matlack
2022-04-01 17:55 ` [PATCH v3 18/23] KVM: x86/mmu: Zap collapsible SPTEs at all levels in " David Matlack
2022-04-01 17:55 ` [PATCH v3 19/23] KVM: x86/mmu: Refactor drop_large_spte() David Matlack
2022-04-01 17:55 ` [PATCH v3 20/23] KVM: Allow for different capacities in kvm_mmu_memory_cache structs David Matlack
2022-04-20 10:55   ` Anup Patel
2022-04-21 16:19   ` Ben Gardon
2022-04-21 16:33     ` David Matlack
2022-04-01 17:55 ` [PATCH v3 21/23] KVM: Allow GFP flags to be passed when topping up MMU caches David Matlack
2022-04-01 17:55 ` [PATCH v3 22/23] KVM: x86/mmu: Support Eager Page Splitting in the shadow MMU David Matlack
2022-04-09  0:39   ` Sean Christopherson
2022-04-14 16:50     ` David Matlack
2022-04-01 17:55 ` [PATCH v3 23/23] KVM: selftests: Map x86_64 guest virtual memory with huge pages David Matlack
2022-04-11 17:12 ` [PATCH v3 00/23] KVM: Extend Eager Page Splitting to the shadow MMU Sean Christopherson
2022-04-11 17:54   ` David Matlack
2022-04-11 20:12     ` Sean Christopherson
2022-04-11 23:41       ` David Matlack
2022-04-12  0:39         ` Sean Christopherson
2022-04-12 16:49           ` David Matlack
2022-04-13  1:02             ` Sean Christopherson
2022-04-13 17:57               ` David Matlack
2022-04-13 18:28                 ` Sean Christopherson
2022-04-13 21:22                   ` David Matlack

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220401175554.1931568-17-dmatlack@google.com \
    --to=dmatlack@google.com \
    --cc=aleksandar.qemu.devel@gmail.com \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=bgardon@google.com \
    --cc=chenhuacai@kernel.org \
    --cc=drjones@redhat.com \
    --cc=kvm-riscv@lists.infradead.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-mips@vger.kernel.org \
    --cc=maciej.szmigiero@oracle.com \
    --cc=maz@kernel.org \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pfeiner@google.com \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox