* Re: [PATCH 06/17] KVM: x86/tdp_mmu: Morph the !is_frozen_spte() check into a KVM_MMU_WARN_ON()
[not found] ` <20260327201421.2824383-7-rick.p.edgecombe@intel.com>
@ 2026-03-30 5:00 ` Yan Zhao
0 siblings, 0 replies; 6+ messages in thread
From: Yan Zhao @ 2026-03-30 5:00 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kai.huang, kvm, kas, linux-kernel, x86,
dave.hansen
On Fri, Mar 27, 2026 at 01:14:10PM -0700, Rick Edgecombe wrote:
> Remove the conditional logic for handling the setting of mirror EPTs to
Should we unify the terms "mirror EPTs," "mirror TDP," and "mirror page tables"
in this series?
> frozen in __tdp_mmu_set_spte_atomic() and add it as a warning instead.
>
> Mirror TDP needs propagate PTE changes to the to the external TDP. This
Two "to the".
> presents a problem for atomic updates which can't update both at once. So
> a special value, FROZEN_SPTE, is used as a temporary state during these
> updates to prevent concurrent operations to the PTE. If the TDP MMU tried
> to install this as a long term value, it would confuse these updates.
> Despite this __tdp_mmu_set_spte_atomic() includes a check to handle it
> being set. Remove this check and turn it into a warning.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 0809fe8e8737..338957bc5109 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -656,7 +656,13 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
> */
> WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte));
>
> - if (is_mirror_sptep(iter->sptep) && !is_frozen_spte(new_spte)) {
> + /*
> + * FROZEN_SPTE is a temporary state and should never be set via higher
> + * level helpers.
> + */
> + KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
Why is KVM_MMU_WARN_ON() used here for new_spte while WARN_ON_ONCE() is used
above for old_spte?
> + if (is_mirror_sptep(iter->sptep)) {
> int ret;
>
> ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
> --
> 2.53.0
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 07/17] KVM: x86/tdp_mmu: Centralize updates to present external PTEs
[not found] ` <20260327201421.2824383-8-rick.p.edgecombe@intel.com>
@ 2026-03-30 6:14 ` Yan Zhao
From: Yan Zhao @ 2026-03-30 6:14 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kai.huang, kvm, kas, linux-kernel, x86,
dave.hansen
On Fri, Mar 27, 2026 at 01:14:11PM -0700, Rick Edgecombe wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> Centralize the updates to present external PTEs to the
> handle_changed_spte() function.
>
> When setting a PTE to present in the mirror page tables, the update needs
> to propagate to the external page tables (in TDX parlance the S-EPT).
> Today this is handled by special mirror page tables branching in
> __tdp_mmu_set_spte_atomic(), which is the only place where present PTEs
> are set for TDX.
>
> This keeps things running, but is a bit hacked on. The hook for setting
> present leaf PTEs are added only where TDX happens to need them. For
> example, TDX does not support any of the operations that use the
> non-atomic variant, tdp_mmu_set_spte() to set present PTEs. Since the hook
> is missing there, it is very hard to understand the code from a non-TDX
> lens. If the reader doesn’t know the TDX specifics it could look like the
> external update is missing.
>
> In addition to being confusing, it also litters the TDP MMU with
> "external" update callbacks. This is especially unfortunate because there
> is already a central place to react to TDP updates, handle_changed_spte().
>
> Begin the process of moving towards a model where all mirror page table
> updates are forwarded to TDX code where the TDX specific logic can live
> with a more proper separation of concerns. Do this by teaching
> handle_changed_spte() how to return error codes, such that it can
> propagate the failures that may come from TDX external page table updates.
>
> Atomic mirror page table updates need to be done in a special way to
> prevent concurrent updates to the mirror page table while the external
> page table is updated. The mirror page table is set to the frozen PTE
> value while the external version is updates. This frozen PTE dance is
"is updates" --> "is updating"?
> currently done in __tdp_mmu_set_spte_atomic(). Hoist it up a level so that
> the external update in handle_changed_spte() can be done while the PTE is
> frozen.
>
> Link: https://lore.kernel.org/lkml/aYYn0nf2cayYu8e7@google.com/
> Not-yet-Signed-off-by: Sean Christopherson <seanjc@google.com>
> [Based on a diff by Sean Chrisopherson]
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 150 ++++++++++++++++++++++---------------
> 1 file changed, 88 insertions(+), 62 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 338957bc5109..db16e81b9701 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -320,9 +320,9 @@ void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu, bool mirror)
> }
> }
>
> -static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> - u64 old_spte, u64 new_spte, int level,
> - bool shared);
> +static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
> + gfn_t gfn, u64 old_spte, u64 new_spte,
> + int level, bool shared);
>
> static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> {
> @@ -471,8 +471,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte,
> FROZEN_SPTE, level);
> }
> - handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
> - old_spte, FROZEN_SPTE, level, shared);
> + handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared);
>
> if (is_mirror_sp(sp)) {
> KVM_BUG_ON(shared, kvm);
> @@ -508,22 +507,15 @@ static void *get_external_spt(gfn_t gfn, u64 new_spte, int level)
> return NULL;
> }
>
> -static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sptep,
> - gfn_t gfn, u64 *old_spte,
> - u64 new_spte, int level)
> +static int __must_check set_external_spte_present(struct kvm *kvm,
> + gfn_t gfn, u64 old_spte,
> + u64 new_spte, int level)
> {
> bool is_present = is_shadow_present_pte(new_spte);
> bool is_leaf = is_present && is_last_spte(new_spte, level);
> int ret = 0;
>
> lockdep_assert_held(&kvm->mmu_lock);
> - /*
> - * We need to lock out other updates to the SPTE until the external
> - * page table has been modified. Use FROZEN_SPTE similar to
> - * the zapping case.
> - */
> - if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
> - return -EBUSY;
>
> /*
> * Use different call to either set up middle level
> @@ -537,17 +529,13 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
> KVM_BUG_ON(!external_spt, kvm);
> ret = kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt);
> }
> - if (ret)
> - __kvm_tdp_mmu_write_spte(sptep, *old_spte);
> - else
> - __kvm_tdp_mmu_write_spte(sptep, new_spte);
> return ret;
> }
>
> /**
> - * handle_changed_spte - handle bookkeeping associated with an SPTE change
> + * __handle_changed_spte - handle bookkeeping associated with an SPTE change
> * @kvm: kvm instance
> - * @as_id: the address space of the paging structure the SPTE was a part of
> + * @sp: the page table in which the SPTE resides
> * @gfn: the base GFN that was mapped by the SPTE
> * @old_spte: The value of the SPTE before the change
> * @new_spte: The value of the SPTE after the change
> @@ -560,15 +548,17 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
> * dirty logging updates are handled in common code, not here (see make_spte()
> * and fast_pf_fix_direct_spte()).
> */
> -static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> - u64 old_spte, u64 new_spte, int level,
> - bool shared)
> +static int __handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
> + gfn_t gfn, u64 old_spte, u64 new_spte,
> + int level, bool shared)
> {
> bool was_present = is_shadow_present_pte(old_spte);
> bool is_present = is_shadow_present_pte(new_spte);
> bool was_leaf = was_present && is_last_spte(old_spte, level);
> bool is_leaf = is_present && is_last_spte(new_spte, level);
> bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
> + int as_id = kvm_mmu_page_as_id(sp);
> + int r;
>
> WARN_ON_ONCE(level > PT64_ROOT_MAX_LEVEL);
> WARN_ON_ONCE(level < PG_LEVEL_4K);
> @@ -598,9 +588,7 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> }
>
> if (old_spte == new_spte)
> - return;
> -
> - trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
> + return 0;
>
> if (is_leaf)
> check_spte_writable_invariants(new_spte);
> @@ -627,21 +615,41 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> "a temporary frozen SPTE.\n"
> "as_id: %d gfn: %llx old_spte: %llx new_spte: %llx level: %d",
> as_id, gfn, old_spte, new_spte, level);
> - return;
> + return 0;
> }
>
> - if (is_leaf != was_leaf)
> - kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
> -
> /*
> * Recursively handle child PTs if the change removed a subtree from
> * the paging structure. Note the WARN on the PFN changing without the
> * SPTE being converted to a hugepage (leaf) or being zapped. Shadow
> * pages are kernel allocations and should never be migrated.
> + *
> + * When modifying leaf entries in mirrored page tables, propagate all
> + * changes to the external SPTE.
> */
> if (was_present && !was_leaf &&
> - (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
> + (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) {
> handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
> + } else if (is_mirror_sp(sp) && is_present) {
> + r = set_external_spte_present(kvm, gfn, old_spte, new_spte, level);
> + if (r)
> + return r;
> + }
> +
> + trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
> +
> + if (is_leaf != was_leaf)
> + kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
> +
> + return 0;
> +}
> +
> +static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
> + gfn_t gfn, u64 old_spte, u64 new_spte,
> + int level, bool shared)
> +{
> + KVM_BUG_ON(__handle_changed_spte(kvm, sp, gfn, old_spte, new_spte,
> + level, shared), kvm);
> }
>
> static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
> @@ -657,32 +665,14 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
> WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte));
>
> /*
> - * FROZEN_SPTE is a temporary state and should never be set via higher
> - * level helpers.
> + * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs and
> + * does not hold the mmu_lock. On failure, i.e. if a different logical
> + * CPU modified the SPTE, try_cmpxchg64() updates iter->old_spte with
> + * the current value, so the caller operates on fresh data, e.g. if it
> + * retries tdp_mmu_set_spte_atomic()
> */
> - KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
> -
> - if (is_mirror_sptep(iter->sptep)) {
> - int ret;
> -
> - ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
> - &iter->old_spte, new_spte, iter->level);
> - if (ret)
> - return ret;
> - } else {
> - u64 *sptep = rcu_dereference(iter->sptep);
> -
> - /*
> - * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs
> - * and does not hold the mmu_lock. On failure, i.e. if a
> - * different logical CPU modified the SPTE, try_cmpxchg64()
> - * updates iter->old_spte with the current value, so the caller
> - * operates on fresh data, e.g. if it retries
> - * tdp_mmu_set_spte_atomic()
> - */
> - if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte))
> - return -EBUSY;
> - }
> + if (!try_cmpxchg64(rcu_dereference(iter->sptep), &iter->old_spte, new_spte))
> + return -EBUSY;
>
> return 0;
> }
> @@ -708,18 +698,49 @@ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm,
> struct tdp_iter *iter,
> u64 new_spte)
> {
> + struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
> int ret;
>
> lockdep_assert_held_read(&kvm->mmu_lock);
>
> - ret = __tdp_mmu_set_spte_atomic(kvm, iter, new_spte);
> + /* KVM should never freeze SPTEs using higher level APIs. */
"higher level API" is ambiguous. e.g. kvm_tdp_mmu_write_spte_atomic() allows
new_spte to be FROZEN_SPTE.
What about just "callers of tdp_mmu_set_spte_atomic() should not freeze SPTEs
directly"?
> + KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
> +
> + /*
> + * Temporarily freeze the SPTE until the external PTE operation has
> + * completed (unless the new SPTE itself will be frozen), e.g. so that
> + * concurrent faults don't attempt to install a child PTE in the
> + * external page table before the parent PTE has been written, or try
> + * to re-install a page table before the old one was removed.
> + */
> + if (is_mirror_sptep(iter->sptep))
> + ret = __tdp_mmu_set_spte_atomic(kvm, iter, FROZEN_SPTE);
> + else
> + ret = __tdp_mmu_set_spte_atomic(kvm, iter, new_spte);
> if (ret)
> return ret;
>
> - handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
> - new_spte, iter->level, true);
> + ret = __handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte,
> + new_spte, iter->level, true);
What about adding a comment for the tricky part of the mirror page table
handling: while the above __tdp_mmu_set_spte_atomic() writes FROZEN_SPTE to
freeze the mirror page table, the original new_spte from the caller of
tdp_mmu_set_spte_atomic() is still passed to __handle_changed_spte(), so that
statistics are updated and the value is propagated to the external page table
correctly.
> - return 0;
> + /*
> + * Unfreeze the mirror SPTE. If updating the external SPTE failed,
> + * restore the old SPTE so that the SPTE isn't frozen in perpetuity,
> + * otherwise set the mirror SPTE to the new desired value.
> + */
> + if (is_mirror_sptep(iter->sptep)) {
> + if (ret)
> + __kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte);
> + else
> + __kvm_tdp_mmu_write_spte(iter->sptep, new_spte);
> + } else {
> + /*
> + * Bug the VM if handling the change failed, as failure is only
> + * allowed if KVM couldn't update the external SPTE.
> + */
> + KVM_BUG_ON(ret, kvm);
> + }
> + return ret;
> }
>
> /*
> @@ -738,6 +759,8 @@ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm,
> static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
> u64 old_spte, u64 new_spte, gfn_t gfn, int level)
> {
> + struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(sptep));
> +
> lockdep_assert_held_write(&kvm->mmu_lock);
>
> /*
> @@ -751,7 +774,7 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
>
> old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
>
> - handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
> + handle_changed_spte(kvm, sp, gfn, old_spte, new_spte, level, false);
>
> /*
> * Users that do non-atomic setting of PTEs don't operate on mirror
> @@ -1373,6 +1396,9 @@ static void kvm_tdp_mmu_age_spte(struct kvm *kvm, struct tdp_iter *iter)
> {
> u64 new_spte;
>
> + if (WARN_ON_ONCE(is_mirror_sptep(iter->sptep)))
> + return;
> +
Add a comment for why a mirror page table is not expected here?
And do we need a similar WARN_ON_ONCE() in kvm_tdp_mmu_clear_dirty_pt_masked()
or clear_dirty_pt_masked()?
> if (spte_ad_enabled(iter->old_spte)) {
> iter->old_spte = tdp_mmu_clear_spte_bits_atomic(iter->sptep,
> shadow_accessed_mask);
> --
> 2.53.0
>
* Re: [PATCH 08/17] KVM: TDX: Drop kvm_x86_ops.link_external_spt(), use .set_external_spte() for all
[not found] ` <20260327201421.2824383-9-rick.p.edgecombe@intel.com>
@ 2026-03-30 6:28 ` Yan Zhao
From: Yan Zhao @ 2026-03-30 6:28 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kai.huang, kvm, kas, linux-kernel, x86,
dave.hansen
On Fri, Mar 27, 2026 at 01:14:12PM -0700, Rick Edgecombe wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> Drop the dedicated .link_external_spt() for linking non-leaf S-EPT pages,
> and instead funnel everything through .set_external_spte(). Using separate
> hooks doesn't help prevent TDP MMU details from bleeding into TDX, and vice
> versa; to the contrary, dedicated callbacks will result in _more_ pollution
> when hugepage support is added, e.g. will require the TDP MMU to know
> details about the splitting rules for TDX that aren't all that relevant to
> the TDP MMU.
>
> Ideally, KVM would provide a single pair of hooks to set S-EPT entries,
> one hook for setting SPTEs under write-lock and another for settings SPTEs
> under read-lock (e.g. to ensure the entire operation is "atomic", to allow
> for failure, etc.). Sadly, TDX's requirement that all child S-EPT entries
> are removed before the parent makes that impractical: the TDP MMU
> deliberately prunes non-leaf SPTEs and _then_ processes its children, thus
> making it quite important for the TDP MMU to differentiate between zapping
> leaf and non-leaf S-EPT entries.
>
> However, that's the _only_ case that's truly special, and even that case
> could be shoehorned into a single hook; it's just wouldn't be a net
"it's" --> "it".
> positive.
* Re: [PATCH 09/17] KVM: TDX: Add helper to handle mapping leaf SPTE into S-EPT
[not found] ` <20260327201421.2824383-10-rick.p.edgecombe@intel.com>
@ 2026-03-30 6:43 ` Yan Zhao
From: Yan Zhao @ 2026-03-30 6:43 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kai.huang, kvm, kas, linux-kernel, x86,
dave.hansen
On Fri, Mar 27, 2026 at 01:14:13PM -0700, Rick Edgecombe wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> Add a helper, tdx_sept_map_leaf_spte(), to wrap and isolate PAGE.ADD and
> PAGE.AUG operations, and thus complete tdx_sept_set_private_spte()'s
> transition into a "dispatch" routine for setting/writing S-EPT entries.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> arch/x86/kvm/vmx/tdx.c | 21 +++++++++++++--------
> 1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 92a846b91bac..361a75b42ae7 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1689,18 +1689,12 @@ static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
> return 0;
> }
>
> -static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
> - enum pg_level level, u64 mirror_spte)
> +static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> + u64 mirror_spte)
> {
> struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
>
> - if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm))
> - return -EIO;
> -
> - if (!is_last_spte(mirror_spte, level))
> - return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte);
> -
> /* TODO: handle large pages. */
> if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
> return -EIO;
> @@ -1725,7 +1719,18 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
> return tdx_mem_page_aug(kvm, gfn, level, pfn);
> }
>
> +static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
> + enum pg_level level, u64 mirror_spte)
> +{
>
> + if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm))
> + return -EIO;
> +
> + if (!is_last_spte(mirror_spte, level))
> + return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte);
For symmetry, what about renaming tdx_sept_link_private_spt() to
tdx_sept_map_nonleaf_spte()?
> +
> + return tdx_sept_map_leaf_spte(kvm, gfn, level, mirror_spte);
> +}
>
> /*
> * Ensure shared and private EPTs to be flushed on all vCPUs.
> --
> 2.53.0
>
* Re: [PATCH 14/17] KVM: x86/mmu: Remove KVM_BUG_ON() that checks lock when removing PTs
[not found] ` <20260327201421.2824383-15-rick.p.edgecombe@intel.com>
@ 2026-03-30 7:01 ` Yan Zhao
From: Yan Zhao @ 2026-03-30 7:01 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kai.huang, kvm, kas, linux-kernel, x86,
dave.hansen
On Fri, Mar 27, 2026 at 01:14:18PM -0700, Rick Edgecombe wrote:
> As part of an ongoing effort to move TDX specific bits from the MMU into
> the TDX code, drop the KVM_BUG_ON() that checks the MMU lock is held for
> write while removing page tables.
>
> Future changes forward PTE removal mirror EPT updates into the
> set_private_spte() and let TDX code parse the PTE to decide what S-EPT
> operations to take. This operations does not pass a shared bool for this
"operations" --> "operation" ?
> KVM_BUG_ON() to use in the logics future home.
By "in the logics future home", do you mean "when this logic is moved to its
future location"?
> But even today there are already MMU write lockdep asserts that mostly
> cover the case. Since the KVM_BUG_ON() is already a bit redundant, just
> remove it instead of trying to plumb the bool into TDX code.
>
> Link: https://lore.kernel.org/kvm/aYUarHf3KEwHGuJe@google.com/
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 991870789863..5dc9633c866e 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -473,10 +473,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> }
> handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared);
>
> - if (is_mirror_sp(sp)) {
> - KVM_BUG_ON(shared, kvm);
> + if (is_mirror_sp(sp))
> remove_external_spte(kvm, gfn, old_spte, level);
> - }
> }
>
> if (is_mirror_sp(sp) &&
> --
> 2.53.0
>
* Re: [PATCH 17/17] KVM: TDX: Move external page table freeing to TDX code
[not found] ` <20260327201421.2824383-18-rick.p.edgecombe@intel.com>
@ 2026-03-30 7:49 ` Yan Zhao
From: Yan Zhao @ 2026-03-30 7:49 UTC (permalink / raw)
To: Rick Edgecombe
Cc: seanjc, pbonzini, kai.huang, kvm, kas, linux-kernel, x86,
dave.hansen
On Fri, Mar 27, 2026 at 01:14:21PM -0700, Rick Edgecombe wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> Move the freeing of external page tables into the reclaim operation that
> lives in TDX code.
>
> The TDP MMU supports traversing the TDP without holding locks. Page
> tables needs to be freed via RCU to prevent walking one that gets
> freed.
>
> While none of these lockless walk operations actually happen for the
> mirror EPT, the TDP MMU none-the-less frees the mirror EPT page tables in
> the same way, and because it’s a handy place to plug it in, the external
> page tables as well.
>
> However, the external page tables definitely can’t be walked once they are
> reclaimed from the TDX module. The TDX module releases the page for the
> host VMM to use, so this RCU-time free is unnecessary for external page
> tables.
>
> So move the free_page() call to TDX code. Create an
> tdp_mmu_free_unused_sp() to allow for freeing external page tables that
> have never left the TDP MMU code (i.e. don’t need freed in a special way.
>
> Link: https://lore.kernel.org/kvm/aYpjNrtGmogNzqwT@google.com/
> Not-yet-Signed-off-by: Sean Christopherson <seanjc@google.com>
> [Based on a diff by Sean, added log]
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 16 +++++++++++-----
> arch/x86/kvm/vmx/tdx.c | 11 ++++++++++-
> 2 files changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 575033cc7fe4..18e11c1c7631 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -53,13 +53,18 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
> rcu_barrier();
> }
>
> -static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
> +static void __tdp_mmu_free_sp(struct kvm_mmu_page *sp)
> {
> - free_page((unsigned long)sp->external_spt);
> free_page((unsigned long)sp->spt);
> kmem_cache_free(mmu_page_header_cache, sp);
> }
>
> +static void tdp_mmu_free_unused_sp(struct kvm_mmu_page *sp)
> +{
> + free_page((unsigned long)sp->external_spt);
> + __tdp_mmu_free_sp(sp);
> +}
> +
> /*
> * This is called through call_rcu in order to free TDP page table memory
> * safely with respect to other kernel threads that may be operating on
> @@ -73,7 +78,8 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
> struct kvm_mmu_page *sp = container_of(head, struct kvm_mmu_page,
> rcu_head);
>
> - tdp_mmu_free_sp(sp);
> + WARN_ON_ONCE(sp->external_spt);
> + __tdp_mmu_free_sp(sp);
> }
>
> void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
> @@ -1261,7 +1267,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> * failed, e.g. because a different task modified the SPTE.
> */
> if (r) {
> - tdp_mmu_free_sp(sp);
> + tdp_mmu_free_unused_sp(sp);
> goto retry;
> }
>
> @@ -1571,7 +1577,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
> * installs its own sp in place of the last sp we tried to split.
> */
> if (sp)
> - tdp_mmu_free_sp(sp);
> + tdp_mmu_free_unused_sp(sp);
>
> return 0;
> }
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index d064b40a6b31..1346e891ca94 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1782,7 +1782,16 @@ static void tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
> */
> if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
> tdx_reclaim_page(virt_to_page(sp->external_spt)))
> - sp->external_spt = NULL;
> + goto out;
> +
> + /*
> + * Immediately free the S-EPT page as the TDX subsystem doesn't support
> + * freeing pages from RCU callbacks, and more importantly because
> + * TDH.PHYMEM.PAGE.RECLAIM ensures there are no outstanding readers.
> + */
> + free_page((unsigned long)sp->external_spt);
Since sp->external_spt is allocated in kvm_mmu_alloc_external_spt() in a similar
way as sp->spt in tdp_mmu_alloc_sp(), why bother making freeing of
sp->external_spt special? i.e., what's the downside of freeing it together with
sp->spt in tdp_mmu_free_sp() ?
> +out:
> + sp->external_spt = NULL;
> }
>
> static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
> --
> 2.53.0
>