From: sashiko-bot@kernel.org
To: "Yan Zhao" <yan.y.zhao@intel.com>
Cc: kvm@vger.kernel.org
Subject: Re: [PATCH v3 09/15] KVM: x86/tdp_mmu: Centrally propagate to-present/atomic zap updates to external PTEs
Date: Thu, 28 May 2026 09:52:51 +0000 [thread overview]
Message-ID: <20260528095251.669351F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260528081225.10348-1-yan.y.zhao@intel.com>
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Critical] Mutually exclusive `else if` logic in `__handle_changed_spte()` prevents external page table updates for parent SPTEs, and incorrect ordering creates a use-after-free hazard on error rollback.
- [High] Propagating deterministic errors from `set_external_spte()` causes an infinite retry loop in callers of `tdp_mmu_set_spte_atomic()`.
--
commit e8401f58d2841529854a98f404633950aa40f76c
Author: Sean Christopherson <seanjc@google.com>
KVM: x86/tdp_mmu: Centrally propagate to-present/atomic zap updates to external PTEs
This commit moves the propagation of to-present changes and atomic zap
changes to external PTEs from __tdp_mmu_set_spte_atomic() to
__handle_changed_spte(), centralizing SPTE change handling.
Link: https://lore.kernel.org/lkml/aYYn0nf2cayYu8e7@google.com [1]
Link: https://lore.kernel.org/kvm/aY_Dq3riRLUNFZfr@google.com [2]
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
[ ... ]
> @@ -578,29 +576,51 @@ static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
[ ... ]
> /*
> * Recursively handle child PTs if the change removed a subtree from
> * the paging structure. Note the WARN on the PFN changing without the
> * SPTE being converted to a hugepage (leaf) or being zapped. Shadow
> * pages are kernel allocations and should never be migrated.
> + *
> + * For the mirror page table, propagate changes to present or changes of
> + * leaf SPTEs to !present under shared mmu_lock to the external SPTE via
> + * set_external_spte() op.
> */
> if (was_present && !was_leaf &&
> - (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
> + (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) {
> handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
> + } else if (is_mirror_sp(sp) && (is_present || shared)) {
> + int r;
> +
> + r = kvm_x86_call(set_external_spte)(kvm, gfn, old_spte, new_spte, level);
> + if (r)
> + return r;
> + }
[Severity: Critical]
Does this mutually exclusive else if logic skip external page table updates
for parent SPTEs?
When replacing a non-leaf mirror SPTE with a leaf SPTE (such as during a
hugepage collapse) or performing a shared atomic zap on a non-leaf SPTE, the
first condition is met and handle_removed_pt() is called. Because of the
else if, the set_external_spte() call is skipped entirely, leaving the
hardware external page table desynchronized.
Additionally, if this is changed to an independent if block, could the order
of operations create a use-after-free on error rollback?
If handle_removed_pt() executes first, it irreversibly unlinks and schedules
child page tables for RCU freeing. If set_external_spte() then runs and
fails (e.g. TDX rejects it), __handle_changed_spte() returns an error. The
caller tdp_mmu_set_spte_atomic() will attempt to roll back by restoring the
parent SPTE to its old value. However, since the child page tables were
already freed, the restored SPTE would point to RCU-freed memory.
Should set_external_spte() be called before handle_removed_pt() to allow a
safe rollback?
> @@ -673,14 +668,51 @@ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm,
[ ... ]
> + ret = __handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte,
> + new_spte, iter->level, true);
> + /*
> + * Unfreeze the mirror SPTE. If updating the external SPTE failed,
> + * restore the old value so that the mirror SPTE isn't frozen in
> + * perpetuity, otherwise set the mirror SPTE to the new desired value.
> + */
> + if (is_mirror_sptep(iter->sptep)) {
> + if (ret)
> + __kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte);
> + else
> + __kvm_tdp_mmu_write_spte(iter->sptep, new_spte);
> + } else {
> + /*
> + * Bug the VM if handling the change failed, as failure is only
> + * allowed if KVM couldn't update the external SPTE.
> + */
> + KVM_BUG_ON(ret, kvm);
> + }
> + return ret;
> }
[Severity: High]
Will propagating deterministic error codes from set_external_spte() cause
infinite retry loops in callers of tdp_mmu_set_spte_atomic()?
Callers of tdp_mmu_set_spte_atomic() generally assume that any non-zero
return value is a transient cmpxchg race (which typically returns -EBUSY),
and they implement an unconditional retry pattern on failure.
If set_external_spte() rejects an operation and returns a deterministic
error like -EINVAL (e.g. rejecting an atomic zap as mentioned in the commit
message) or -ENOMEM, this error is propagated back. Because callers do not
check the specific error code before retrying, won't this result in a
kernel livelock where the caller infinitely loops attempting the same doomed
operation?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260528080856.10141-1-yan.y.zhao@intel.com?part=9
next prev parent reply other threads:[~2026-05-28 9:52 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-28 8:08 [PATCH v3 00/15] TDX MMU refactors Yan Zhao
2026-05-28 8:10 ` [PATCH v3 01/15] KVM: TDX: Drop kvm_x86_ops.link_external_spt() Yan Zhao
2026-05-28 8:11 ` [PATCH v3 02/15] KVM: TDX: Wrap mapping of leaf and non-leaf S-EPT entries into helpers Yan Zhao
2026-05-28 8:11 ` [PATCH v3 03/15] KVM: x86/mmu: Fold set_external_spte_present() into its sole caller Yan Zhao
2026-05-28 8:11 ` [PATCH v3 04/15] KVM: x86/mmu: Plumb param "old_spte" into kvm_x86_ops.set_external_spte() Yan Zhao
2026-05-28 8:11 ` [PATCH v3 05/15] KVM: TDX: Move KVM_BUG_ON()s in __tdp_mmu_set_spte_atomic() to TDX code Yan Zhao
2026-05-28 9:45 ` sashiko-bot
2026-05-28 8:11 ` [PATCH v3 06/15] KVM: TDX: Move lockdep assert " Yan Zhao
2026-05-28 8:12 ` [PATCH v3 07/15] KVM: x86/tdp_mmu: Morph !is_frozen_spte() check into a KVM_MMU_WARN_ON() Yan Zhao
2026-05-28 8:12 ` [PATCH v3 08/15] KVM: x86/mmu: Plumb "sp" _pointer_ into the TDP MMU's handle_changed_spte() Yan Zhao
2026-05-28 8:12 ` [PATCH v3 09/15] KVM: x86/tdp_mmu: Centrally propagate to-present/atomic zap updates to external PTEs Yan Zhao
2026-05-28 9:52 ` sashiko-bot [this message]
2026-05-28 8:12 ` [PATCH v3 10/15] KVM: x86/mmu: Drop KVM_BUG_ON() on shared lock to zap child " Yan Zhao
2026-05-28 8:12 ` [PATCH v3 11/15] KVM: TDX: Hoist tdx_sept_remove_private_spte() above set_private_spte() Yan Zhao
2026-05-28 8:12 ` [PATCH v3 12/15] KVM: TDX: Drop kvm_x86_ops.remove_external_spte() Yan Zhao
2026-05-28 8:13 ` [PATCH v3 13/15] KVM: TDX: Rename tdx_sept_remove_private_spte() to show it's for leaf SPTEs Yan Zhao
2026-05-28 8:13 ` [PATCH v3 14/15] KVM: x86: Move error handling inside free_external_spt() Yan Zhao
2026-05-28 8:13 ` [PATCH v3 15/15] KVM: TDX: Move external page table freeing to TDX code Yan Zhao
2026-05-28 13:03 ` [PATCH v3 00/15] TDX MMU refactors Sean Christopherson
2026-05-29 5:34 ` Yan Zhao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260528095251.669351F000E9@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.