public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,  Kiryl Shutsemau <kas@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	linux-kernel@vger.kernel.org,  linux-coco@lists.linux.dev,
	kvm@vger.kernel.org,  Kai Huang <kai.huang@intel.com>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	 Vishal Annapurve <vannapurve@google.com>,
	Ackerley Tng <ackerleytng@google.com>,
	 Sagi Shahar <sagis@google.com>,
	Binbin Wu <binbin.wu@linux.intel.com>,
	 Xiaoyao Li <xiaoyao.li@intel.com>,
	Isaku Yamahata <isaku.yamahata@intel.com>
Subject: Re: [RFC PATCH v5 08/45] KVM: x86/mmu: Propagate mirror SPTE removal to S-EPT in handle_changed_spte()
Date: Thu, 5 Feb 2026 14:33:16 -0800	[thread overview]
Message-ID: <aYUarHf3KEwHGuJe@google.com> (raw)
In-Reply-To: <aYQtIK/Lq5T3ad6V@yzhao56-desk.sh.intel.com>

On Thu, Feb 05, 2026, Yan Zhao wrote:
> On Wed, Feb 04, 2026 at 06:23:38PM -0800, Sean Christopherson wrote:
> > On Wed, Feb 04, 2026, Yan Zhao wrote:
> > > On Wed, Jan 28, 2026 at 05:14:40PM -0800, Sean Christopherson wrote:
> > > > @@ -590,10 +566,21 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
> > > >  	 * the paging structure.  Note the WARN on the PFN changing without the
> > > >  	 * SPTE being converted to a hugepage (leaf) or being zapped.  Shadow
> > > >  	 * pages are kernel allocations and should never be migrated.
> > > > +	 *
> > > > +	 * When removing leaf entries from a mirror, immediately propagate the
> > > > +	 * changes to the external page tables.  Note, non-leaf mirror entries
> > > > +	 * are handled by handle_removed_pt(), as TDX requires that all leaf
> > > > +	 * entries are removed before the owning page table.  Note #2, writes
> > > > +	 * to make mirror PTEs shadow-present are propagated to external page
> > > > +	 * tables by __tdp_mmu_set_spte_atomic(), as KVM needs to ensure the
> > > > +	 * external page table was successfully updated before marking the
> > > > +	 * mirror SPTE present.
> > > >  	 */
> > > >  	if (was_present && !was_leaf &&
> > > >  	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
> > > >  		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
> > > > +	else if (was_leaf && is_mirror_sptep(sptep) && !is_leaf)
> > > Should we check !is_present instead of !is_leaf?
> > > e.g. a transition from a present leaf entry to a present non-leaf entry could
> > > also trigger this if case.
> > 
> > No, the !is_leaf check is very intentional.  At this point in the series, S-EPT
> > doesn't support hugepages.  If KVM manages to install a leaf SPTE and replaces
> > that SPTE with a non-leaf SPTE, then we absolutely want the KVM_BUG_ON() in
> > tdx_sept_remove_private_spte() to fire:
> > 
> > 	/* TODO: handle large pages. */
> > 	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
> > 		return -EIO;
> But the op is named remove_external_spte().
> And the check of "level != PG_LEVEL_4K" is for removing large leaf entries.

I agree that the naming at this point in the series is unfortunate, but I don't
see it as outright wrong.  That the TDP MMU could theoretically replace the leaf
SPTE with a non-leaf SPTE doesn't change the fact that the old leaf SPTE *is*
being removed.

> Relying on this check is tricky and confusing.

If it's still confusing at the end of the series, then I'm happy to discuss how
we can make it less confusion.  But as of this point in the series, I unfortunately
don't see a better way to achieve my end goals (reducing the number of kvm_x86_ops
hooks, and reducing how many TDX specific details bleed into common MMU code).

There are "different" ways to incrementally move from where were at today, to where
I want KVM to be, but I don't see them as "better".  I.e. AFAICT, there's no way
to move incrementally with reviewable patches while also maintaining perfect/ideal
naming and flow.

> > And then later on, when S-EPT gains support for hugepages, "KVM: TDX: Add core
> > support for splitting/demoting 2MiB S-EPT to 4KiB" doesn't need to touch code
> > outside of arch/x86/kvm/vmx/tdx.c, because everything has already been plumbed
> > in.
> I haven't looked at the later patches for huge pages,

Please do.  As above, I don't think it's realistic to completely avoid some amount
of "eww" in the intermediate stages.

> but plumbing here directly for splitting does not look right when it's
> invoked under shared mmu_lock.
> See the comment below.
>  
> > > Besides, need "KVM_BUG_ON(shared, kvm)" in this case.
> > 
> > Eh, we have lockdep_assert_held_write() in the S-EPT paths that require mmu_lock
> > to be held for write.  I don't think a KVM_BUG_ON() here would add meaningful
> > value.
> Hmm, I think KVM_BUG_ON(shared, kvm) is still useful.
> If KVM invokes remove_external_spte() under shared mmu_lock, it needs to freeze
> the entry first, similar to the sequence in __tdp_mmu_set_spte_atomic().
> 
> i.e., invoking external x86 ops in handle_changed_spte() for mirror roots should
> be !shared only.

Sure, but...

> Relying on the TDX code's lockdep_assert_held_write() for warning seems less
> clear than having an explicit check here.

...that's TDX's responsibility to enforce, and I don't see any justification for
something more than a lockdep assertion.  As I've said elsewhere, several times,
at some point we have to commit to getting the code right.  Adding KVM_BUG_ON() in
Every. Single. Call. does not yield more maintainable code.  There are myriad
things KVM can screw up, many of which have far, far more harmful impact than
calling an S-EPT hook with mmu_lock held for read instead of write.

The bar for adding a KVM_BUG_ON() is not simply "this shouldn't happen".  It's,
this shouldn't happen *and* at least one of (not a complete list):

  - we've either screwed this up badly more than once
  - it's really hard to get right
  - we might not notice if we do screw it up
  - KVM might corrupt data if we continue on

  reply	other threads:[~2026-02-05 22:33 UTC|newest]

Thread overview: 148+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-29  1:14 [RFC PATCH v5 00/45] TDX: Dynamic PAMT + S-EPT Hugepage Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 01/45] x86/tdx: Use pg_level in TDX APIs, not the TDX-Module's 0-based level Sean Christopherson
2026-01-29 17:37   ` Dave Hansen
2026-01-29  1:14 ` [RFC PATCH v5 02/45] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails" Sean Christopherson
2026-01-29 22:10   ` Edgecombe, Rick P
2026-01-29 22:23     ` Sean Christopherson
2026-01-29 22:48       ` Edgecombe, Rick P
2026-02-03  8:48   ` Yan Zhao
2026-02-03 10:30   ` Huang, Kai
2026-02-03 20:06     ` Sean Christopherson
2026-02-03 21:34       ` Huang, Kai
2026-01-29  1:14 ` [RFC PATCH v5 03/45] KVM: TDX: Account all non-transient page allocations for per-TD structures Sean Christopherson
2026-01-29 22:15   ` Edgecombe, Rick P
2026-02-03 10:36   ` Huang, Kai
2026-01-29  1:14 ` [RFC PATCH v5 04/45] KVM: x86: Make "external SPTE" ops that can fail RET0 static calls Sean Christopherson
2026-01-29 22:20   ` Edgecombe, Rick P
2026-01-30  1:28     ` Sean Christopherson
2026-01-30 17:32       ` Edgecombe, Rick P
2026-02-03 10:44         ` Huang, Kai
2026-02-04  1:16         ` Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 05/45] KVM: TDX: Drop kvm_x86_ops.link_external_spt(), use .set_external_spte() for all Sean Christopherson
2026-01-30 23:55   ` Edgecombe, Rick P
2026-02-03 10:19   ` Yan Zhao
2026-02-03 20:05     ` Sean Christopherson
2026-02-04  6:41       ` Yan Zhao
2026-02-05 23:14         ` Sean Christopherson
2026-02-06  2:27           ` Yan Zhao
2026-02-18 19:37       ` Edgecombe, Rick P
2026-02-20 17:36         ` Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 06/45] KVM: x86/mmu: Fold set_external_spte_present() into its sole caller Sean Christopherson
2026-02-04  7:38   ` Yan Zhao
2026-02-05 23:06     ` Sean Christopherson
2026-02-06  2:29       ` Yan Zhao
2026-01-29  1:14 ` [RFC PATCH v5 07/45] KVM: x86/mmu: Plumb the SPTE _pointer_ into the TDP MMU's handle_changed_spte() Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 08/45] KVM: x86/mmu: Propagate mirror SPTE removal to S-EPT in handle_changed_spte() Sean Christopherson
2026-02-04  9:06   ` Yan Zhao
2026-02-05  2:23     ` Sean Christopherson
2026-02-05  5:39       ` Yan Zhao
2026-02-05 22:33         ` Sean Christopherson [this message]
2026-02-06  2:17           ` Yan Zhao
2026-02-06 17:41             ` Sean Christopherson
2026-02-10 10:54               ` Yan Zhao
2026-02-10 19:52                 ` Sean Christopherson
2026-02-11  2:16                   ` Yan Zhao
2026-02-14  0:36                     ` Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 09/45] KVM: x86: Rework .free_external_spt() into .reclaim_external_sp() Sean Christopherson
2026-02-04  9:45   ` Yan Zhao
2026-02-05  7:04     ` Yan Zhao
2026-02-05 22:38       ` Sean Christopherson
2026-02-06  2:30         ` Yan Zhao
2026-01-29  1:14 ` [RFC PATCH v5 10/45] x86/tdx: Move all TDX error defines into <asm/shared/tdx_errno.h> Sean Christopherson
2026-01-29 18:13   ` Dave Hansen
2026-01-29  1:14 ` [RFC PATCH v5 11/45] x86/tdx: Add helpers to check return status codes Sean Christopherson
2026-01-29 18:58   ` Dave Hansen
2026-01-29 20:35     ` Sean Christopherson
2026-01-30  0:36       ` Edgecombe, Rick P
2026-02-03 20:32         ` Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 12/45] x86/virt/tdx: Simplify tdmr_get_pamt_sz() Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 13/45] x86/virt/tdx: Allocate page bitmap for Dynamic PAMT Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 14/45] x86/virt/tdx: Allocate reference counters for PAMT memory Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 15/45] x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 16/45] x86/virt/tdx: Add tdx_alloc/free_control_page() helpers Sean Christopherson
2026-01-30  1:30   ` Sean Christopherson
2026-02-05  6:11   ` Yan Zhao
2026-02-05 22:35     ` Sean Christopherson
2026-02-06  2:32       ` Yan Zhao
2026-02-10 17:44   ` Dave Hansen
2026-02-10 22:15     ` Edgecombe, Rick P
2026-02-10 22:19       ` Dave Hansen
2026-02-10 22:46         ` Huang, Kai
2026-02-10 22:50           ` Dave Hansen
2026-02-10 23:02             ` Huang, Kai
2026-02-11  0:50     ` Edgecombe, Rick P
2026-01-29  1:14 ` [RFC PATCH v5 17/45] x86/virt/tdx: Optimize " Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 18/45] KVM: TDX: Allocate PAMT memory for TD and vCPU control structures Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 19/45] KVM: Allow owner of kvm_mmu_memory_cache to provide a custom page allocator Sean Christopherson
2026-02-03 10:56   ` Huang, Kai
2026-02-03 20:12     ` Sean Christopherson
2026-02-03 20:33       ` Edgecombe, Rick P
2026-02-03 21:17         ` Sean Christopherson
2026-02-03 21:29       ` Huang, Kai
2026-02-04  2:16         ` Sean Christopherson
2026-02-04  6:45           ` Huang, Kai
2026-01-29  1:14 ` [RFC PATCH v5 20/45] KVM: x86/mmu: Allocate/free S-EPT pages using tdx_{alloc,free}_control_page() Sean Christopherson
2026-02-03 11:16   ` Huang, Kai
2026-02-03 20:17     ` Sean Christopherson
2026-02-03 21:18       ` Huang, Kai
2026-02-06  9:48   ` Yan Zhao
2026-02-06 15:01     ` Sean Christopherson
2026-02-09  9:25       ` Yan Zhao
2026-02-09 23:20         ` Sean Christopherson
2026-02-10  8:30           ` Yan Zhao
2026-02-10  0:07         ` Dave Hansen
2026-02-10  1:40           ` Yan Zhao
2026-02-09 10:41       ` Huang, Kai
2026-02-09 22:44         ` Sean Christopherson
2026-02-10 10:54           ` Huang, Kai
2026-02-09 23:40       ` Dave Hansen
2026-02-10  0:03         ` Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 21/45] x86/tdx: Add APIs to support get/put of DPAMT entries from KVM, under spinlock Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 22/45] KVM: TDX: Get/put PAMT pages when (un)mapping private memory Sean Christopherson
2026-02-06 10:20   ` Yan Zhao
2026-02-06 16:03     ` Sean Christopherson
2026-02-06 19:27       ` Edgecombe, Rick P
2026-02-06 23:18         ` Sean Christopherson
2026-02-06 23:19           ` Edgecombe, Rick P
2026-02-09 10:33           ` Huang, Kai
2026-02-09 17:08             ` Edgecombe, Rick P
2026-02-09 21:05               ` Huang, Kai
2026-01-29  1:14 ` [RFC PATCH v5 23/45] x86/virt/tdx: Enable Dynamic PAMT Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 24/45] Documentation/x86: Add documentation for TDX's " Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 25/45] *** DO NOT MERGE *** x86/virt/tdx: Don't assume guest memory is backed by struct page Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 26/45] x86/virt/tdx: Enhance tdh_mem_page_aug() to support huge pages Sean Christopherson
2026-01-29  1:14 ` [RFC PATCH v5 27/45] x86/virt/tdx: Enhance tdh_phymem_page_wbinvd_hkid() to invalidate " Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 28/45] x86/virt/tdx: Extend "reset page" quirk to support " Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 29/45] x86/virt/tdx: Get/Put DPAMT page pair if and only if mapping size is 4KB Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 30/45] x86/virt/tdx: Add API to demote a 2MB mapping to 512 4KB mappings Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 31/45] KVM: x86/mmu: Prevent hugepage promotion for mirror roots in fault path Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 32/45] KVM: x86/mmu: Plumb the old_spte into kvm_x86_ops.set_external_spte() Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 33/45] KVM: TDX: Hoist tdx_sept_remove_private_spte() above set_private_spte() Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 34/45] KVM: TDX: Handle removal of leaf SPTEs in .set_private_spte() Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 35/45] KVM: TDX: Add helper to handle mapping leaf SPTE into S-EPT Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 36/45] KVM: TDX: Move S-EPT page demotion TODO to tdx_sept_set_private_spte() Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 37/45] KVM: x86/tdp_mmu: Alloc external_spt page for mirror page table splitting Sean Christopherson
2026-02-06 10:07   ` Yan Zhao
2026-02-06 16:09     ` Sean Christopherson
2026-02-11  9:49       ` Yan Zhao
2026-01-29  1:15 ` [RFC PATCH v5 38/45] KVM: x86/mmu: Add Dynamic PAMT support in TDP MMU for vCPU-induced page split Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 39/45] KVM: TDX: Add core support for splitting/demoting 2MiB S-EPT to 4KiB Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 40/45] KVM: x86: Introduce hugepage_set_guest_inhibit() Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 41/45] KVM: TDX: Honor the guest's accept level contained in an EPT violation Sean Christopherson
2026-01-29 15:32   ` Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 42/45] KVM: guest_memfd: Add helpers to get start/end gfns give gmem+slot+pgoff Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 43/45] *** DO NOT MERGE *** KVM: guest_memfd: Add pre-zap arch hook for shared<=>private conversion Sean Christopherson
2026-02-13  7:23   ` Huang, Kai
2026-01-29  1:15 ` [RFC PATCH v5 44/45] KVM: x86/mmu: Add support for splitting S-EPT hugepages on conversion Sean Christopherson
2026-01-29 15:39   ` Sean Christopherson
2026-02-11  8:43     ` Yan Zhao
2026-02-13 15:09       ` Sean Christopherson
2026-02-06 10:14   ` Yan Zhao
2026-02-06 14:46     ` Sean Christopherson
2026-01-29  1:15 ` [RFC PATCH v5 45/45] KVM: TDX: Turn on PG_LEVEL_2M Sean Christopherson
2026-01-29 17:13 ` [RFC PATCH v5 00/45] TDX: Dynamic PAMT + S-EPT Hugepage Konrad Rzeszutek Wilk
2026-01-29 17:17   ` Dave Hansen
2026-02-04 14:38   ` Sean Christopherson
2026-02-04 15:09     ` Dave Hansen
2026-02-05 15:53       ` Sean Christopherson
2026-02-05 16:01         ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aYUarHf3KEwHGuJe@google.com \
    --to=seanjc@google.com \
    --cc=ackerleytng@google.com \
    --cc=binbin.wu@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kas@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=sagis@google.com \
    --cc=tglx@kernel.org \
    --cc=vannapurve@google.com \
    --cc=x86@kernel.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox