public inbox for linux-kernel@vger.kernel.org
* Re: [PATCH 06/17] KVM: x86/tdp_mmu: Morph the !is_frozen_spte() check into a KVM_MMU_WARN_ON()
       [not found] ` <20260327201421.2824383-7-rick.p.edgecombe@intel.com>
@ 2026-03-30  5:00   ` Yan Zhao
  0 siblings, 0 replies; 6+ messages in thread
From: Yan Zhao @ 2026-03-30  5:00 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: seanjc, pbonzini, kai.huang, kvm, kas, linux-kernel, x86,
	dave.hansen

On Fri, Mar 27, 2026 at 01:14:10PM -0700, Rick Edgecombe wrote:
> Remove the conditional logic for handling the setting of mirror EPTs to
Should we unify the terms "mirror EPTs," "mirror TDP," and "mirror page tables"
in this series?

> frozen in __tdp_mmu_set_spte_atomic() and add it as a warning instead.
> 
> Mirror TDP needs propagate PTE changes to the to the external TDP. This
Two "to the".

> presents a problem for atomic updates which can't update both at once. So
> a special value, FROZEN_SPTE, is used as a temporary state during these
> updates to prevent concurrent operations to the PTE. If the TDP MMU tried
> to install this as a long term value, it would confuse these updates.
> Despite this __tdp_mmu_set_spte_atomic() includes a check to handle it
> being set. Remove this check and turn it into a warning.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 0809fe8e8737..338957bc5109 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -656,7 +656,13 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
>  	 */
>  	WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte));
>  
> -	if (is_mirror_sptep(iter->sptep) && !is_frozen_spte(new_spte)) {
> +	/*
> +	 * FROZEN_SPTE is a temporary state and should never be set via higher
> +	 * level helpers.
> +	 */
> +	KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
Why is KVM_MMU_WARN_ON() used here for new_spte while WARN_ON_ONCE() is used
above for old_spte?
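(For context, the distinction between the two macros is roughly the below; this is paraphrased from arch/x86/kvm/mmu/mmu_internal.h and may differ by kernel version. KVM_MMU_WARN_ON() only fires in CONFIG_KVM_PROVE_MMU builds, while WARN_ON_ONCE() is always active.)

```
/* Sketch of the distinction, paraphrased from mmu_internal.h: */
#ifdef CONFIG_KVM_PROVE_MMU
#define KVM_MMU_WARN_ON(x)	WARN_ON_ONCE(x)
#else
#define KVM_MMU_WARN_ON(x)	BUILD_BUG_ON_INVALID(x)
#endif
```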

> +	if (is_mirror_sptep(iter->sptep)) {
>  		int ret;
>  
>  		ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
> -- 
> 2.53.0
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 07/17] KVM: x86/tdp_mmu: Centralize updates to present external PTEs
       [not found] ` <20260327201421.2824383-8-rick.p.edgecombe@intel.com>
@ 2026-03-30  6:14   ` Yan Zhao
  0 siblings, 0 replies; 6+ messages in thread
From: Yan Zhao @ 2026-03-30  6:14 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: seanjc, pbonzini, kai.huang, kvm, kas, linux-kernel, x86,
	dave.hansen

On Fri, Mar 27, 2026 at 01:14:11PM -0700, Rick Edgecombe wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Centralize the updates to present external PTEs to the
> handle_changed_spte() function.
> 
> When setting a PTE to present in the mirror page tables, the update needs
> to propagate to the external page tables (in TDX parlance the S-EPT).
> Today this is handled by special mirror page tables branching in
> __tdp_mmu_set_spte_atomic(), which is the only place where present PTEs
> are set for TDX.
> 
> This keeps things running, but is a bit hacked on. The hook for setting
> present leaf PTEs are added only where TDX happens to need them. For
> example, TDX does not support any of the operations that use the
> non-atomic variant, tdp_mmu_set_spte() to set present PTEs. Since the hook
> is missing there, it is very hard to understand the code from a non-TDX
> lens. If the reader doesn’t know the TDX specifics it could look like the
> external update is missing.
> 
> In addition to being confusing, it also litters the TDP MMU with
> "external" update callbacks. This is especially unfortunate because there
> is already a central place to react to TDP updates, handle_changed_spte().
> 
> Begin the process of moving towards a model where all mirror page table
> updates are forwarded to TDX code where the TDX specific logic can live
> with a more proper separation of concerns. Do this by teaching
> handle_changed_spte() how to return error codes, such that it can
> propagate the failures that may come from TDX external page table updates.
> 
> Atomic mirror page table updates need to be done in a special way to
> prevent concurrent updates to the mirror page table while the external
> page table is updated. The mirror page table is set to the frozen PTE
> value while the external version is updates. This frozen PTE dance is
"is updates" --> "is updated"?

> currently done in __tdp_mmu_set_spte_atomic(). Hoist it up a level so that
> the external update in handle_changed_spte() can be done while the PTE is
> frozen.
> 
> Link: https://lore.kernel.org/lkml/aYYn0nf2cayYu8e7@google.com/
> Not-yet-Signed-off-by: Sean Christopherson <seanjc@google.com>
> [Based on a diff by Sean Chrisopherson]
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 150 ++++++++++++++++++++++---------------
>  1 file changed, 88 insertions(+), 62 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 338957bc5109..db16e81b9701 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -320,9 +320,9 @@ void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu, bool mirror)
>  	}
>  }
>  
> -static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> -				u64 old_spte, u64 new_spte, int level,
> -				bool shared);
> +static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
> +				gfn_t gfn, u64 old_spte, u64 new_spte,
> +				int level, bool shared);
>  
>  static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
> @@ -471,8 +471,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
>  			old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte,
>  							  FROZEN_SPTE, level);
>  		}
> -		handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
> -				    old_spte, FROZEN_SPTE, level, shared);
> +		handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared);
>  
>  		if (is_mirror_sp(sp)) {
>  			KVM_BUG_ON(shared, kvm);
> @@ -508,22 +507,15 @@ static void *get_external_spt(gfn_t gfn, u64 new_spte, int level)
>  	return NULL;
>  }
>  
> -static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sptep,
> -						 gfn_t gfn, u64 *old_spte,
> -						 u64 new_spte, int level)
> +static int __must_check set_external_spte_present(struct kvm *kvm,
> +						  gfn_t gfn, u64 old_spte,
> +						  u64 new_spte, int level)
>  {
>  	bool is_present = is_shadow_present_pte(new_spte);
>  	bool is_leaf = is_present && is_last_spte(new_spte, level);
>  	int ret = 0;
>  
>  	lockdep_assert_held(&kvm->mmu_lock);
> -	/*
> -	 * We need to lock out other updates to the SPTE until the external
> -	 * page table has been modified. Use FROZEN_SPTE similar to
> -	 * the zapping case.
> -	 */
> -	if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
> -		return -EBUSY;
>  
>  	/*
>  	 * Use different call to either set up middle level
> @@ -537,17 +529,13 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
>  		KVM_BUG_ON(!external_spt, kvm);
>  		ret = kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt);
>  	}
> -	if (ret)
> -		__kvm_tdp_mmu_write_spte(sptep, *old_spte);
> -	else
> -		__kvm_tdp_mmu_write_spte(sptep, new_spte);
>  	return ret;
>  }
>  
>  /**
> - * handle_changed_spte - handle bookkeeping associated with an SPTE change
> + * __handle_changed_spte - handle bookkeeping associated with an SPTE change
>   * @kvm: kvm instance
> - * @as_id: the address space of the paging structure the SPTE was a part of
> + * @sp: the page table in which the SPTE resides
>   * @gfn: the base GFN that was mapped by the SPTE
>   * @old_spte: The value of the SPTE before the change
>   * @new_spte: The value of the SPTE after the change
> @@ -560,15 +548,17 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
>   * dirty logging updates are handled in common code, not here (see make_spte()
>   * and fast_pf_fix_direct_spte()).
>   */
> -static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> -				u64 old_spte, u64 new_spte, int level,
> -				bool shared)
> +static int __handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
> +				 gfn_t gfn, u64 old_spte, u64 new_spte,
> +				 int level, bool shared)
>  {
>  	bool was_present = is_shadow_present_pte(old_spte);
>  	bool is_present = is_shadow_present_pte(new_spte);
>  	bool was_leaf = was_present && is_last_spte(old_spte, level);
>  	bool is_leaf = is_present && is_last_spte(new_spte, level);
>  	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
> +	int as_id = kvm_mmu_page_as_id(sp);
> +	int r;
>  
>  	WARN_ON_ONCE(level > PT64_ROOT_MAX_LEVEL);
>  	WARN_ON_ONCE(level < PG_LEVEL_4K);
> @@ -598,9 +588,7 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
>  	}
>  
>  	if (old_spte == new_spte)
> -		return;
> -
> -	trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
> +		return 0;
>  
>  	if (is_leaf)
>  		check_spte_writable_invariants(new_spte);
> @@ -627,21 +615,41 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
>  			       "a temporary frozen SPTE.\n"
>  			       "as_id: %d gfn: %llx old_spte: %llx new_spte: %llx level: %d",
>  			       as_id, gfn, old_spte, new_spte, level);
> -		return;
> +		return 0;
>  	}
>  
> -	if (is_leaf != was_leaf)
> -		kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
> -
>  	/*
>  	 * Recursively handle child PTs if the change removed a subtree from
>  	 * the paging structure.  Note the WARN on the PFN changing without the
>  	 * SPTE being converted to a hugepage (leaf) or being zapped.  Shadow
>  	 * pages are kernel allocations and should never be migrated.
> +	 *
> +	 * When modifying leaf entries in mirrored page tables, propagate all
> +	 * changes to the external SPTE.
>  	 */
>  	if (was_present && !was_leaf &&
> -	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
> +	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) {
>  		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
> +	} else if (is_mirror_sp(sp) && is_present) {
> +		r = set_external_spte_present(kvm, gfn, old_spte, new_spte, level);
> +		if (r)
> +			return r;
> +	}
> +
> +	trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
> +
> +	if (is_leaf != was_leaf)
> +		kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
> +
> +	return 0;
> +}
> +
> +static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
> +				gfn_t gfn, u64 old_spte, u64 new_spte,
> +				int level, bool shared)
> +{
> +	KVM_BUG_ON(__handle_changed_spte(kvm, sp, gfn, old_spte, new_spte,
> +					 level, shared), kvm);
>  }
>  
>  static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
> @@ -657,32 +665,14 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
>  	WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte));
>  
>  	/*
> -	 * FROZEN_SPTE is a temporary state and should never be set via higher
> -	 * level helpers.
> +	 * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs and
> +	 * does not hold the mmu_lock.  On failure, i.e. if a different logical
> +	 * CPU modified the SPTE, try_cmpxchg64() updates iter->old_spte with
> +	 * the current value, so the caller operates on fresh data, e.g. if it
> +	 * retries tdp_mmu_set_spte_atomic()
>  	 */
> -	KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
> -
> -	if (is_mirror_sptep(iter->sptep)) {
> -		int ret;
> -
> -		ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
> -						&iter->old_spte, new_spte, iter->level);
> -		if (ret)
> -			return ret;
> -	} else {
> -		u64 *sptep = rcu_dereference(iter->sptep);
> -
> -		/*
> -		 * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs
> -		 * and does not hold the mmu_lock.  On failure, i.e. if a
> -		 * different logical CPU modified the SPTE, try_cmpxchg64()
> -		 * updates iter->old_spte with the current value, so the caller
> -		 * operates on fresh data, e.g. if it retries
> -		 * tdp_mmu_set_spte_atomic()
> -		 */
> -		if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte))
> -			return -EBUSY;
> -	}
> +	if (!try_cmpxchg64(rcu_dereference(iter->sptep), &iter->old_spte, new_spte))
> +		return -EBUSY;
>  
>  	return 0;
>  }
> @@ -708,18 +698,49 @@ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm,
>  						       struct tdp_iter *iter,
>  						       u64 new_spte)
>  {
> +	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
>  	int ret;
>  
>  	lockdep_assert_held_read(&kvm->mmu_lock);
>  
> -	ret = __tdp_mmu_set_spte_atomic(kvm, iter, new_spte);
> +	/* KVM should never freeze SPTEs using higher level APIs. */
"higher level APIs" is ambiguous, e.g. kvm_tdp_mmu_write_spte_atomic() allows
new_spte to be FROZEN_SPTE.

What about just "callers of tdp_mmu_set_spte_atomic() should not freeze SPTEs
directly"?
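As a concrete sketch, the rewritten comment and check might then read (the
wording is only a suggestion):

```
	/*
	 * Callers of tdp_mmu_set_spte_atomic() should not freeze SPTEs
	 * directly; lower level helpers such as
	 * kvm_tdp_mmu_write_spte_atomic() may still legitimately write
	 * FROZEN_SPTE.
	 */
	KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
```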

> +	KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
> +
> +	/*
> +	 * Temporarily freeze the SPTE until the external PTE operation has
> +	 * completed (unless the new SPTE itself will be frozen), e.g. so that
> +	 * concurrent faults don't attempt to install a child PTE in the
> +	 * external page table before the parent PTE has been written, or try
> +	 * to re-install a page table before the old one was removed.
> +	 */
> +	if (is_mirror_sptep(iter->sptep))
> +		ret = __tdp_mmu_set_spte_atomic(kvm, iter, FROZEN_SPTE);
> +	else
> +		ret = __tdp_mmu_set_spte_atomic(kvm, iter, new_spte);
>  	if (ret)
>  		return ret;
>  
> -	handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
> -			    new_spte, iter->level, true);
> +	ret = __handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte,
> +				    new_spte, iter->level, true);

What about adding a comment for the tricky part of the mirror page table
handling: while new_spte is set to FROZEN_SPTE in the above
__tdp_mmu_set_spte_atomic() to freeze the mirror page table, the original
new_spte from the caller of tdp_mmu_set_spte_atomic() is passed to
__handle_changed_spte() in order to properly update statistics and propagate
the change to the external page table.

> -	return 0;
> +	/*
> +	 * Unfreeze the mirror SPTE.  If updating the external SPTE failed,
> +	 * restore the old SPTE so that the SPTE isn't frozen in perpetuity,
> +	 * otherwise set the mirror SPTE to the new desired value.
> +	 */
> +	if (is_mirror_sptep(iter->sptep)) {
> +		if (ret)
> +			__kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte);
> +		else
> +			__kvm_tdp_mmu_write_spte(iter->sptep, new_spte);
> +	} else {
> +		/*
> +		 * Bug the VM if handling the change failed, as failure is only
> +		 * allowed if KVM couldn't update the external SPTE.
> +		 */
> +		KVM_BUG_ON(ret, kvm);
> +	}
> +	return ret;
>  }
>  
>  /*
> @@ -738,6 +759,8 @@ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm,
>  static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
>  			    u64 old_spte, u64 new_spte, gfn_t gfn, int level)
>  {
> +	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(sptep));
> +
>  	lockdep_assert_held_write(&kvm->mmu_lock);
>  
>  	/*
> @@ -751,7 +774,7 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
>  
>  	old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
>  
> -	handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
> +	handle_changed_spte(kvm, sp, gfn, old_spte, new_spte, level, false);
>  
>  	/*
>  	 * Users that do non-atomic setting of PTEs don't operate on mirror
> @@ -1373,6 +1396,9 @@ static void kvm_tdp_mmu_age_spte(struct kvm *kvm, struct tdp_iter *iter)
>  {
>  	u64 new_spte;
>  
> +	if (WARN_ON_ONCE(is_mirror_sptep(iter->sptep)))
> +		return;
> +
Add a comment explaining why mirror page tables are not expected here?

And do we need a similar WARN_ON_ONCE() in kvm_tdp_mmu_clear_dirty_pt_masked()
or clear_dirty_pt_masked()?

>  	if (spte_ad_enabled(iter->old_spte)) {
>  		iter->old_spte = tdp_mmu_clear_spte_bits_atomic(iter->sptep,
>  								shadow_accessed_mask);
> -- 
> 2.53.0
> 


* Re: [PATCH 08/17] KVM: TDX: Drop kvm_x86_ops.link_external_spt(), use .set_external_spte() for all
       [not found] ` <20260327201421.2824383-9-rick.p.edgecombe@intel.com>
@ 2026-03-30  6:28   ` Yan Zhao
  0 siblings, 0 replies; 6+ messages in thread
From: Yan Zhao @ 2026-03-30  6:28 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: seanjc, pbonzini, kai.huang, kvm, kas, linux-kernel, x86,
	dave.hansen

On Fri, Mar 27, 2026 at 01:14:12PM -0700, Rick Edgecombe wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Drop the dedicated .link_external_spt() for linking non-leaf S-EPT pages,
> and instead funnel everything through .set_external_spte().  Using separate
> hooks doesn't help prevent TDP MMU details from bleeding into TDX, and vice
> versa; to the contrary, dedicated callbacks will result in _more_ pollution
> when hugepage support is added, e.g. will require the TDP MMU to know
> details about the splitting rules for TDX that aren't all that relevant to
> the TDP MMU.
> 
> Ideally, KVM would provide a single pair of hooks to set S-EPT entries,
> one hook for setting SPTEs under write-lock and another for settings SPTEs
> under read-lock (e.g. to ensure the entire operation is "atomic", to allow
> for failure, etc.).  Sadly, TDX's requirement that all child S-EPT entries
> are removed before the parent makes that impractical: the TDP MMU
> deliberately prunes non-leaf SPTEs and _then_ processes its children, thus
> making it quite important for the TDP MMU to differentiate between zapping
> leaf and non-leaf S-EPT entries.
> 
> However, that's the _only_ case that's truly special, and even that case
> could be shoehorned into a single hook; it's just wouldn't be a net
"it's" --> "it".

> positive.


* Re: [PATCH 09/17] KVM: TDX: Add helper to handle mapping leaf SPTE into S-EPT
       [not found] ` <20260327201421.2824383-10-rick.p.edgecombe@intel.com>
@ 2026-03-30  6:43   ` Yan Zhao
  0 siblings, 0 replies; 6+ messages in thread
From: Yan Zhao @ 2026-03-30  6:43 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: seanjc, pbonzini, kai.huang, kvm, kas, linux-kernel, x86,
	dave.hansen

On Fri, Mar 27, 2026 at 01:14:13PM -0700, Rick Edgecombe wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Add a helper, tdx_sept_map_leaf_spte(), to wrap and isolate PAGE.ADD and
> PAGE.AUG operations, and thus complete tdx_sept_set_private_spte()'s
> transition into a "dispatch" routine for setting/writing S-EPT entries.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
>  arch/x86/kvm/vmx/tdx.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 92a846b91bac..361a75b42ae7 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1689,18 +1689,12 @@ static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
>  	return 0;
>  }
>  
> -static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
> -				     enum pg_level level, u64 mirror_spte)
> +static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> +				  u64 mirror_spte)
>  {
>  	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
>  	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
>  
> -	if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm))
> -		return -EIO;
> -
> -	if (!is_last_spte(mirror_spte, level))
> -		return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte);
> -
>  	/* TODO: handle large pages. */
>  	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
>  		return -EIO;
> @@ -1725,7 +1719,18 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
>  	return tdx_mem_page_aug(kvm, gfn, level, pfn);
>  }
>  
> +static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
> +				     enum pg_level level, u64 mirror_spte)
> +{
>  
> +	if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm))
> +		return -EIO;
> +
> +	if (!is_last_spte(mirror_spte, level))
> +		return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte);
For symmetry, what about renaming tdx_sept_link_private_spt() to
tdx_sept_map_nonleaf_spte()?
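Concretely, the dispatch routine from the quoted hunk would then read
(illustrative sketch; tdx_sept_map_nonleaf_spte() is the hypothetical renamed
helper):

```
static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
				     enum pg_level level, u64 mirror_spte)
{
	if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm))
		return -EIO;

	/* Non-leaf: link an intermediate S-EPT page. */
	if (!is_last_spte(mirror_spte, level))
		return tdx_sept_map_nonleaf_spte(kvm, gfn, level, mirror_spte);

	/* Leaf: PAGE.ADD/PAGE.AUG a mapping. */
	return tdx_sept_map_leaf_spte(kvm, gfn, level, mirror_spte);
}
```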

> +
> +	return tdx_sept_map_leaf_spte(kvm, gfn, level, mirror_spte);
> +}
>  
>  /*
>   * Ensure shared and private EPTs to be flushed on all vCPUs.
> -- 
> 2.53.0
> 


* Re: [PATCH 14/17] KVM: x86/mmu: Remove KVM_BUG_ON() that checks lock when removing PTs
       [not found] ` <20260327201421.2824383-15-rick.p.edgecombe@intel.com>
@ 2026-03-30  7:01   ` Yan Zhao
  0 siblings, 0 replies; 6+ messages in thread
From: Yan Zhao @ 2026-03-30  7:01 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: seanjc, pbonzini, kai.huang, kvm, kas, linux-kernel, x86,
	dave.hansen

On Fri, Mar 27, 2026 at 01:14:18PM -0700, Rick Edgecombe wrote:
> As part of an ongoing effort to move TDX specific bits from the MMU into
> the TDX code, drop the KVM_BUG_ON() that checks the MMU lock is held for
> write while removing page tables.
> 
> Future changes forward PTE removal mirror EPT updates into the
> set_private_spte() and let TDX code parse the PTE to decide what S-EPT
> operations to take. This operations does not pass a shared bool for this
"operations" --> "operation" ?

> KVM_BUG_ON() to use in the logics future home.
By "in the logics future home", do you mean "when this logic is moved to its
future location"?
 
> But even today there are already MMU write lockdep asserts that mostly
> cover the case. Since the KVM_BUG_ON() is already a bit redundant, just
> remove it instead of trying to plumb the bool into TDX code.
> 
> Link: https://lore.kernel.org/kvm/aYUarHf3KEwHGuJe@google.com/
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 991870789863..5dc9633c866e 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -473,10 +473,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
>  		}
>  		handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared);
>  
> -		if (is_mirror_sp(sp)) {
> -			KVM_BUG_ON(shared, kvm);
> +		if (is_mirror_sp(sp))
>  			remove_external_spte(kvm, gfn, old_spte, level);
> -		}
>  	}
>  
>  	if (is_mirror_sp(sp) &&
> -- 
> 2.53.0
> 


* Re: [PATCH 17/17] KVM: TDX: Move external page table freeing to TDX code
       [not found] ` <20260327201421.2824383-18-rick.p.edgecombe@intel.com>
@ 2026-03-30  7:49   ` Yan Zhao
  0 siblings, 0 replies; 6+ messages in thread
From: Yan Zhao @ 2026-03-30  7:49 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: seanjc, pbonzini, kai.huang, kvm, kas, linux-kernel, x86,
	dave.hansen

On Fri, Mar 27, 2026 at 01:14:21PM -0700, Rick Edgecombe wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Move the freeing of external page tables into the reclaim operation that
> lives in TDX code.
> 
> The TDP MMU supports traversing the TDP without holding locks. Page
> tables needs to be freed via RCU to prevent walking one that gets
> freed.
> 
> While none of these lockless walk operations actually happen for the
> mirror EPT, the TDP MMU none-the-less frees the mirror EPT page tables in
> the same way, and because it’s a handy place to plug it in, the external
> page tables as well.
> 
> However, the external page tables definitely can’t be walked once they are
> reclaimed from the TDX module. The TDX module releases the page for the
> host VMM to use, so this RCU-time free is unnecessary for external page
> tables.
> 
> So move the free_page() call to TDX code. Create an
> tdp_mmu_free_unused_sp() to allow for freeing external page tables that
> have never left the TDP MMU code (i.e. don’t need freed in a special way.
> 
> Link: https://lore.kernel.org/kvm/aYpjNrtGmogNzqwT@google.com/
> Not-yet-Signed-off-by: Sean Christopherson <seanjc@google.com>
> [Based on a diff by Sean, added log]
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 16 +++++++++++-----
>  arch/x86/kvm/vmx/tdx.c     | 11 ++++++++++-
>  2 files changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 575033cc7fe4..18e11c1c7631 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -53,13 +53,18 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
>  	rcu_barrier();
>  }
>  
> -static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
> +static void __tdp_mmu_free_sp(struct kvm_mmu_page *sp)
>  {
> -	free_page((unsigned long)sp->external_spt);
>  	free_page((unsigned long)sp->spt);
>  	kmem_cache_free(mmu_page_header_cache, sp);
>  }
>  
> +static void tdp_mmu_free_unused_sp(struct kvm_mmu_page *sp)
> +{
> +	free_page((unsigned long)sp->external_spt);
> +	__tdp_mmu_free_sp(sp);
> +}
> +
>  /*
>   * This is called through call_rcu in order to free TDP page table memory
>   * safely with respect to other kernel threads that may be operating on
> @@ -73,7 +78,8 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
>  	struct kvm_mmu_page *sp = container_of(head, struct kvm_mmu_page,
>  					       rcu_head);
>  
> -	tdp_mmu_free_sp(sp);
> +	WARN_ON_ONCE(sp->external_spt);
> +	__tdp_mmu_free_sp(sp);
>  }
>  
>  void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
> @@ -1261,7 +1267,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  		 * failed, e.g. because a different task modified the SPTE.
>  		 */
>  		if (r) {
> -			tdp_mmu_free_sp(sp);
> +			tdp_mmu_free_unused_sp(sp);
>  			goto retry;
>  		}
>  
> @@ -1571,7 +1577,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
>  	 * installs its own sp in place of the last sp we tried to split.
>  	 */
>  	if (sp)
> -		tdp_mmu_free_sp(sp);
> +		tdp_mmu_free_unused_sp(sp);
>  
>  	return 0;
>  }
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index d064b40a6b31..1346e891ca94 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1782,7 +1782,16 @@ static void tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
>  	 */
>  	if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
>  	    tdx_reclaim_page(virt_to_page(sp->external_spt)))
> -		sp->external_spt = NULL;
> +		goto out;
> +
> +	/*
> +	 * Immediately free the S-EPT page as the TDX subsystem doesn't support
> +	 * freeing pages from RCU callbacks, and more importantly because
> +	 * TDH.PHYMEM.PAGE.RECLAIM ensures there are no outstanding readers.
> +	 */
> +	free_page((unsigned long)sp->external_spt);
Since sp->external_spt is allocated in kvm_mmu_alloc_external_spt() in a similar
way to sp->spt in tdp_mmu_alloc_sp(), why make the freeing of sp->external_spt
special? i.e., what's the downside of freeing it together with sp->spt in
tdp_mmu_free_sp()?

> +out:
> +	sp->external_spt = NULL;
>  }
>  
>  static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
> -- 
> 2.53.0
> 

