From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, erdemaktas@google.com,
isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org,
sagis@google.com, yan.y.zhao@intel.com,
rick.p.edgecombe@intel.com,
Isaku Yamahata <isaku.yamahata@intel.com>
Subject: [PATCH v3 16/17] KVM: x86/tdp_mmu: Propagate tearing down mirror page tables
Date: Wed, 19 Jun 2024 15:36:13 -0700 [thread overview]
Message-ID: <20240619223614.290657-17-rick.p.edgecombe@intel.com> (raw)
In-Reply-To: <20240619223614.290657-1-rick.p.edgecombe@intel.com>
From: Isaku Yamahata <isaku.yamahata@intel.com>
Integrate hooks for mirroring page table operations for cases where TDX
will zap PTEs or free page tables.
Like other Coco technologies, TDX has the concept of private and shared
memory. For TDX the private and shared mappings are managed on separate
EPT roots. The private half is managed indirectly though calls into a
protected runtime environment called the TDX module, where the shared half
is managed within KVM in normal page tables.
Since calls into the TDX module are relatively slow, walking private page
tables by making calls into the TDX module would not be efficient. Because
of this, previous changes have taught the TDP MMU to keep a mirror root,
which is separate, unmapped TDP root that private operations can be
directed to. Currently this root is disconnected from the guest. Now add
plumbing to propagate changes to the "external" page tables being
mirrored. Just create the x86_ops for now, leave plumbing the operations
into the TDX module for future patches.
Add two operations for tearing down page tables, one for freeing page
tables (free_external_spt) and one for zapping PTEs (remove_external_spte).
Define them such that remove_external_spte will perform a TLB flush as
well. (in TDX terms "ensure there are no active translations").
TDX MMU support will exclude certain MMU operations, so only plug in the
mirroring x86 ops where they will be needed. For zapping/freeing, only
hook tdp_mmu_iter_set_spte() which is use used for mapping and linking
PTs. Don't bother hooking tdp_mmu_set_spte_atomic() as it is only used for
zapping PTEs in operations unsupported by TDX: zapping collapsible PTEs and
kvm_mmu_zap_all_fast().
In previous changes to address races around concurrent populating using
tdp_mmu_set_spte_atomic(), a solution was introduced to temporarily set
REMOVED_SPTE in the mirrored page tables while performing the external
operations. Such a solution is not needed for the tear down paths in TDX
as these will always be performed with the mmu_lock held for write.
Sprinkle some KVM_BUG_ON()s to reflect this.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Co-developed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
TDX MMU Prep v3:
- Rename mirrored->external (Paolo)
- Drop new_spte arg from reflect_removed_spte() (Paolo)
- ...and drop was_present and is_present bools (Paolo)
- Use base_gfn instead of sp->gfn (Paolo)
- Better comment on logic that bugs if doing tdp_mmu_set_spte() on
present PTE. (Paolo)
- Move comment around KVM_BUG_ON() in __tdp_mmu_set_spte_atomic() to this
patch, and add better comment. (Paolo)
- In remove_external_spte(), remove was_leaf bool, skip duplicates
present check and add comment.
- Rename REMOVED_SPTE to FROZEN_SPTE (Paolo)
TDX MMU Prep v2:
- Split from "KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU"
- Rename x86_ops from "private" to "reflect"
- In response to "sp->mirrored_spt" rename helpers to "mirrored"
- Remove unused present mirroring support in tdp_mmu_set_spte()
- Merge reflect_zap_spte() into reflect_remove_spte()
- Move mirror zapping logic out of handle_changed_spte()
- Add some KVM_BUG_ONs
---
arch/x86/include/asm/kvm-x86-ops.h | 2 ++
arch/x86/include/asm/kvm_host.h | 8 +++++
arch/x86/kvm/mmu/tdp_mmu.c | 51 +++++++++++++++++++++++++++++-
3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 3ef19fcb5e42..18a83b211c90 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -97,6 +97,8 @@ KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
KVM_X86_OP(load_mmu_pgd)
KVM_X86_OP_OPTIONAL(link_external_spt)
KVM_X86_OP_OPTIONAL(set_external_spte)
+KVM_X86_OP_OPTIONAL(free_external_spt)
+KVM_X86_OP_OPTIONAL(remove_external_spte)
KVM_X86_OP(has_wbinvd_exit)
KVM_X86_OP(get_l2_tsc_offset)
KVM_X86_OP(get_l2_tsc_multiplier)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 12ff04135a0e..dca623ffa903 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1745,6 +1745,14 @@ struct kvm_x86_ops {
int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
kvm_pfn_t pfn_for_gfn);
+ /* Update external page tables for page table about to be freed */
+ int (*free_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
+ void *external_spt);
+
+ /* Update external page table from spte getting removed, and flush TLB */
+ int (*remove_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
+ kvm_pfn_t pfn_for_gfn);
+
bool (*has_wbinvd_exit)(void);
u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index bc1ad127046d..630e6b6d4bf2 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -340,6 +340,29 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
}
+static void remove_external_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
+ int level)
+{
+ kvm_pfn_t old_pfn = spte_to_pfn(old_spte);
+ int ret;
+
+ /*
+ * External (TDX) SPTEs are limited to PG_LEVEL_4K, and external
+ * PTs are removed in a special order, involving free_external_spt().
+ * But remove_external_spte() will be called on non-leaf PTEs via
+ * __tdp_mmu_zap_root(), so avoid the error the former would return
+ * in this case.
+ */
+ if (!is_last_spte(old_spte, level))
+ return;
+
+ /* Zapping leaf spte is allowed only when write lock is held. */
+ lockdep_assert_held_write(&kvm->mmu_lock);
+ /* Because write lock is held, operation should success. */
+ ret = static_call(kvm_x86_remove_external_spte)(kvm, gfn, level, old_pfn);
+ KVM_BUG_ON(ret, kvm);
+}
+
/**
* handle_removed_pt() - handle a page table removed from the TDP structure
*
@@ -435,6 +458,23 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
}
handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
old_spte, FROZEN_SPTE, level, shared);
+
+ if (is_mirror_sp(sp)) {
+ KVM_BUG_ON(shared, kvm);
+ remove_external_spte(kvm, gfn, old_spte, level);
+ }
+ }
+
+ if (is_mirror_sp(sp) &&
+ WARN_ON(static_call(kvm_x86_free_external_spt)(kvm, base_gfn, sp->role.level,
+ sp->external_spt))) {
+ /*
+ * Failed to free page table page in mirror page table and
+ * there is nothing to do further.
+ * Intentionally leak the page to prevent the kernel from
+ * accessing the encrypted page.
+ */
+ sp->external_spt = NULL;
}
call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
@@ -616,6 +656,13 @@ static inline int __tdp_mmu_set_spte_atomic(struct kvm *kvm, struct tdp_iter *it
if (is_mirror_sptep(iter->sptep) && !is_frozen_spte(new_spte)) {
int ret;
+ /*
+ * Users of atomic zapping don't operate on mirror roots,
+ * so don't handle it and bug the VM if it's seen.
+ */
+ if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
+ return -EBUSY;
+
ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
iter->old_spte, new_spte, iter->level);
if (ret)
@@ -749,8 +796,10 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
* Users that do non-atomic setting of PTEs don't operate on mirror
* roots, so don't handle it and bug the VM if it's seen.
*/
- if (is_mirror_sptep(sptep))
+ if (is_mirror_sptep(sptep)) {
KVM_BUG_ON(is_shadow_present_pte(new_spte), kvm);
+ remove_external_spte(kvm, gfn, old_spte, level);
+ }
return old_spte;
}
--
2.34.1
next prev parent reply other threads:[~2024-06-19 22:36 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-19 22:35 [PATCH v3 00/17] TDX MMU prep series part 1 Rick Edgecombe
2024-06-19 22:35 ` [PATCH v3 01/17] KVM: x86/tdp_mmu: Rename REMOVED_SPTE to FROZEN_SPTE Rick Edgecombe
2024-06-19 22:35 ` [PATCH v3 02/17] KVM: Add member to struct kvm_gfn_range for target alias Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 03/17] KVM: x86: Add a VM type define for TDX Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 04/17] KVM: x86/mmu: Add an external pointer to struct kvm_mmu_page Rick Edgecombe
2024-07-03 7:03 ` Yan Zhao
2024-06-19 22:36 ` [PATCH v3 05/17] KVM: x86/mmu: Add an is_mirror member for union kvm_mmu_page_role Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 06/17] KVM: x86/mmu: Make kvm_tdp_mmu_alloc_root() return void Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 07/17] KVM: x86/tdp_mmu: Take struct kvm in iter loops Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 08/17] KVM: x86/tdp_mmu: Take a GFN in kvm_tdp_mmu_fast_pf_get_last_sptep() Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 09/17] KVM: x86/mmu: Support GFN direct bits Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 10/17] KVM: x86/tdp_mmu: Extract root invalid check from tdx_mmu_next_root() Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 11/17] KVM: x86/tdp_mmu: Introduce KVM MMU root types to specify page table type Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 12/17] KVM: x86/tdp_mmu: Take root in tdp_mmu_for_each_pte() Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 13/17] KVM: x86/tdp_mmu: Support mirror root for TDP MMU Rick Edgecombe
2024-06-24 8:30 ` Yan Zhao
2024-06-25 0:51 ` Edgecombe, Rick P
2024-06-25 5:43 ` Yan Zhao
2024-06-25 20:33 ` Edgecombe, Rick P
2024-06-26 5:05 ` Yan Zhao
2024-07-03 19:40 ` Edgecombe, Rick P
2024-07-04 8:09 ` Yan Zhao
2024-07-09 22:36 ` Edgecombe, Rick P
2024-07-04 8:51 ` Yan Zhao
2024-07-09 22:38 ` Edgecombe, Rick P
2024-07-11 23:54 ` Edgecombe, Rick P
2024-07-12 1:42 ` Yan Zhao
2024-06-19 22:36 ` [PATCH v3 14/17] KVM: x86/tdp_mmu: Propagate attr_filter to MMU notifier callbacks Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 15/17] KVM: x86/tdp_mmu: Propagate building mirror page tables Rick Edgecombe
2024-06-20 5:15 ` Binbin Wu
2024-06-24 23:52 ` Edgecombe, Rick P
2024-06-19 22:36 ` Rick Edgecombe [this message]
2024-06-20 8:44 ` [PATCH v3 16/17] KVM: x86/tdp_mmu: Propagate tearing down " Binbin Wu
2024-06-24 23:55 ` Edgecombe, Rick P
2024-06-19 22:36 ` [PATCH v3 17/17] KVM: x86/tdp_mmu: Take root types for kvm_tdp_mmu_invalidate_all_roots() Rick Edgecombe
2024-06-21 7:10 ` Yan Zhao
2024-06-21 19:08 ` Edgecombe, Rick P
2024-06-24 8:29 ` Yan Zhao
2024-06-24 23:15 ` Edgecombe, Rick P
2024-06-25 6:14 ` Yan Zhao
2024-06-25 20:56 ` Edgecombe, Rick P
2024-06-26 2:25 ` Yan Zhao
2024-07-03 20:00 ` Edgecombe, Rick P
2024-07-05 1:16 ` Yan Zhao
2024-07-09 22:52 ` Edgecombe, Rick P
2024-07-18 15:28 ` Isaku Yamahata
2024-07-18 15:55 ` Edgecombe, Rick P
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240619223614.290657-17-rick.p.edgecombe@intel.com \
--to=rick.p.edgecombe@intel.com \
--cc=dmatlack@google.com \
--cc=erdemaktas@google.com \
--cc=isaku.yamahata@gmail.com \
--cc=isaku.yamahata@intel.com \
--cc=kai.huang@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=sagis@google.com \
--cc=seanjc@google.com \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox