From: Sean Christopherson <seanjc@google.com>
To: Kai Huang <kai.huang@intel.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-coco@lists.linux.dev" <linux-coco@lists.linux.dev>,
Xiaoyao Li <xiaoyao.li@intel.com>,
Yan Y Zhao <yan.y.zhao@intel.com>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"kas@kernel.org" <kas@kernel.org>,
"mingo@redhat.com" <mingo@redhat.com>,
"binbin.wu@linux.intel.com" <binbin.wu@linux.intel.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"ackerleytng@google.com" <ackerleytng@google.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Isaku Yamahata <isaku.yamahata@intel.com>,
"sagis@google.com" <sagis@google.com>,
"tglx@kernel.org" <tglx@kernel.org>,
Rick P Edgecombe <rick.p.edgecombe@intel.com>,
"bp@alien8.de" <bp@alien8.de>,
Vishal Annapurve <vannapurve@google.com>,
"x86@kernel.org" <x86@kernel.org>
Subject: Re: [RFC PATCH v5 19/45] KVM: Allow owner of kvm_mmu_memory_cache to provide a custom page allocator
Date: Tue, 3 Feb 2026 18:16:13 -0800 [thread overview]
Message-ID: <aYKr7XODY-p6YLYa@google.com> (raw)
In-Reply-To: <a2bf6a8d9f9b61ae7264afc37d9925cf2e1f3ea9.camel@intel.com>
On Tue, Feb 03, 2026, Kai Huang wrote:
> On Tue, 2026-02-03 at 12:12 -0800, Sean Christopherson wrote:
> > On Tue, Feb 03, 2026, Kai Huang wrote:
> > > On Wed, 2026-01-28 at 17:14 -0800, Sean Christopherson wrote:
> > > > Extend "struct kvm_mmu_memory_cache" to support a custom page allocator
> > > > so that x86's TDX can update per-page metadata on allocation and free().
> > > >
> > > > Name the allocator page_get() to align with __get_free_page(), e.g. to
> > > > communicate that it returns an "unsigned long", not a "struct page", and
> > > > to avoid collisions with macros, e.g. with alloc_page.
> > > >
> > > > Suggested-by: Kai Huang <kai.huang@intel.com>
> > > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > >
> > > I thought it could be more generic for allocating an object, but not just a
> > > page.
> > >
> > > E.g., I thought we might be able to use it to allocate a structure which has
> > > "pair of DPAMT pages" so it could be assigned to 'struct kvm_mmu_page'. But
> > > it seems you abandoned this idea. May I ask why? Just want to understand
> > > the reasoning here.
> >
> > Because that requires more complexity and there's no known use case, and I don't
> > see an obvious way for a use case to come along. All of the motiviations for a
> > custom allocation scheme that I can think of apply only to full pages, or fit
> > nicely in a kmem_cache.
> >
> > Specifically, the "cache" logic is already bifurcated between "kmem_cache' and
> > "page" usage. Further splitting the "page" case doesn't require modifications to
> > the "kmem_cache" case, whereas providing a fully generic solution would require
> > additional changes, e.g. to handle this code:
> >
> > page = (void *)__get_free_page(gfp_flags);
> > if (page && mc->init_value)
> > memset64(page, mc->init_value, PAGE_SIZE / sizeof(u64));
> >
> > It certainly wouldn't be much complexity, but this code is already a bit awkward,
> > so I don't think it makes sense to add support for something that will probably
> > never be used.
>
> For this particular piece of code, we can add a helper for allocating normal
> page table pages, get rid of mc->init_value completely and hook mc-page_get()
> to that helper.
Hmm, I like the idea, but I don't think it would be a net positive. In practice,
x86's "normal" page tables stop being normal, because KVM now initializes all
SPTEs with BIT(63)=1 on x86-64. And that would also incur an extra RETPOLINE on
all those allocations.
> A bonus is we can then call that helper in all places when KVM needs to
> allocate a page for normal page table instead of just calling
> get_zerod_pages() directly, e.g., like the one in
> tdp_mmu_alloc_sp_for_split(),
Huh. Actually, that's a bug, but not the one you probably expect. At a glance,
it looks like KVM incorrectly zeroing the page instead of initializing it with
SHADOW_NONPRESENT_VALUE. But it's actually a "performance" bug, because KVM
doesn't actually need to pre-initialize the page: either the page will never be
used, or every SPTE will be initialized as a child SPTE.
So that one _should_ be different, e.g. should be:
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index a32192c35099..36afd67601fc 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1456,7 +1456,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
if (!sp)
return NULL;
- sp->spt = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+ sp->spt = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
if (!sp->spt)
goto err_spt;
> so that we can have a consistent way for allocating normal page table pages.
next prev parent reply other threads:[~2026-02-04 2:16 UTC|newest]
Thread overview: 152+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-29 1:14 [RFC PATCH v5 00/45] TDX: Dynamic PAMT + S-EPT Hugepage Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 01/45] x86/tdx: Use pg_level in TDX APIs, not the TDX-Module's 0-based level Sean Christopherson
2026-01-29 17:37 ` Dave Hansen
2026-01-29 1:14 ` [RFC PATCH v5 02/45] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails" Sean Christopherson
2026-01-29 22:10 ` Edgecombe, Rick P
2026-01-29 22:23 ` Sean Christopherson
2026-01-29 22:48 ` Edgecombe, Rick P
2026-02-03 8:48 ` Yan Zhao
2026-02-03 10:30 ` Huang, Kai
2026-02-03 20:06 ` Sean Christopherson
2026-02-03 21:34 ` Huang, Kai
2026-01-29 1:14 ` [RFC PATCH v5 03/45] KVM: TDX: Account all non-transient page allocations for per-TD structures Sean Christopherson
2026-01-29 22:15 ` Edgecombe, Rick P
2026-02-03 10:36 ` Huang, Kai
2026-01-29 1:14 ` [RFC PATCH v5 04/45] KVM: x86: Make "external SPTE" ops that can fail RET0 static calls Sean Christopherson
2026-01-29 22:20 ` Edgecombe, Rick P
2026-01-30 1:28 ` Sean Christopherson
2026-01-30 17:32 ` Edgecombe, Rick P
2026-02-03 10:44 ` Huang, Kai
2026-02-04 1:16 ` Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 05/45] KVM: TDX: Drop kvm_x86_ops.link_external_spt(), use .set_external_spte() for all Sean Christopherson
2026-01-30 23:55 ` Edgecombe, Rick P
2026-02-03 10:19 ` Yan Zhao
2026-02-03 20:05 ` Sean Christopherson
2026-02-04 6:41 ` Yan Zhao
2026-02-05 23:14 ` Sean Christopherson
2026-02-06 2:27 ` Yan Zhao
2026-02-18 19:37 ` Edgecombe, Rick P
2026-02-20 17:36 ` Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 06/45] KVM: x86/mmu: Fold set_external_spte_present() into its sole caller Sean Christopherson
2026-02-04 7:38 ` Yan Zhao
2026-02-05 23:06 ` Sean Christopherson
2026-02-06 2:29 ` Yan Zhao
2026-01-29 1:14 ` [RFC PATCH v5 07/45] KVM: x86/mmu: Plumb the SPTE _pointer_ into the TDP MMU's handle_changed_spte() Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 08/45] KVM: x86/mmu: Propagate mirror SPTE removal to S-EPT in handle_changed_spte() Sean Christopherson
2026-02-04 9:06 ` Yan Zhao
2026-02-05 2:23 ` Sean Christopherson
2026-02-05 5:39 ` Yan Zhao
2026-02-05 22:33 ` Sean Christopherson
2026-02-06 2:17 ` Yan Zhao
2026-02-06 17:41 ` Sean Christopherson
2026-02-10 10:54 ` Yan Zhao
2026-02-10 19:52 ` Sean Christopherson
2026-02-11 2:16 ` Yan Zhao
2026-02-14 0:36 ` Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 09/45] KVM: x86: Rework .free_external_spt() into .reclaim_external_sp() Sean Christopherson
2026-02-04 9:45 ` Yan Zhao
2026-02-05 7:04 ` Yan Zhao
2026-02-05 22:38 ` Sean Christopherson
2026-02-06 2:30 ` Yan Zhao
2026-01-29 1:14 ` [RFC PATCH v5 10/45] x86/tdx: Move all TDX error defines into <asm/shared/tdx_errno.h> Sean Christopherson
2026-01-29 18:13 ` Dave Hansen
2026-01-29 1:14 ` [RFC PATCH v5 11/45] x86/tdx: Add helpers to check return status codes Sean Christopherson
2026-01-29 18:58 ` Dave Hansen
2026-01-29 20:35 ` Sean Christopherson
2026-01-30 0:36 ` Edgecombe, Rick P
2026-02-03 20:32 ` Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 12/45] x86/virt/tdx: Simplify tdmr_get_pamt_sz() Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 13/45] x86/virt/tdx: Allocate page bitmap for Dynamic PAMT Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 14/45] x86/virt/tdx: Allocate reference counters for PAMT memory Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 15/45] x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 16/45] x86/virt/tdx: Add tdx_alloc/free_control_page() helpers Sean Christopherson
2026-01-30 1:30 ` Sean Christopherson
2026-02-05 6:11 ` Yan Zhao
2026-02-05 22:35 ` Sean Christopherson
2026-02-06 2:32 ` Yan Zhao
2026-02-10 17:44 ` Dave Hansen
2026-02-10 22:15 ` Edgecombe, Rick P
2026-02-10 22:19 ` Dave Hansen
2026-02-10 22:46 ` Huang, Kai
2026-02-10 22:50 ` Dave Hansen
2026-02-10 23:02 ` Huang, Kai
2026-02-11 0:50 ` Edgecombe, Rick P
2026-01-29 1:14 ` [RFC PATCH v5 17/45] x86/virt/tdx: Optimize " Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 18/45] KVM: TDX: Allocate PAMT memory for TD and vCPU control structures Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 19/45] KVM: Allow owner of kvm_mmu_memory_cache to provide a custom page allocator Sean Christopherson
2026-02-03 10:56 ` Huang, Kai
2026-02-03 20:12 ` Sean Christopherson
2026-02-03 20:33 ` Edgecombe, Rick P
2026-02-03 21:17 ` Sean Christopherson
2026-02-03 21:29 ` Huang, Kai
2026-02-04 2:16 ` Sean Christopherson [this message]
2026-02-04 6:45 ` Huang, Kai
2026-01-29 1:14 ` [RFC PATCH v5 20/45] KVM: x86/mmu: Allocate/free S-EPT pages using tdx_{alloc,free}_control_page() Sean Christopherson
2026-02-03 11:16 ` Huang, Kai
2026-02-03 20:17 ` Sean Christopherson
2026-02-03 21:18 ` Huang, Kai
2026-02-06 9:48 ` Yan Zhao
2026-02-06 15:01 ` Sean Christopherson
2026-02-09 9:25 ` Yan Zhao
2026-02-09 23:20 ` Sean Christopherson
2026-02-10 8:30 ` Yan Zhao
2026-02-10 0:07 ` Dave Hansen
2026-02-10 1:40 ` Yan Zhao
2026-02-09 10:41 ` Huang, Kai
2026-02-09 22:44 ` Sean Christopherson
2026-02-10 10:54 ` Huang, Kai
2026-02-09 23:40 ` Dave Hansen
2026-02-10 0:03 ` Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 21/45] x86/tdx: Add APIs to support get/put of DPAMT entries from KVM, under spinlock Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 22/45] KVM: TDX: Get/put PAMT pages when (un)mapping private memory Sean Christopherson
2026-02-06 10:20 ` Yan Zhao
2026-02-06 16:03 ` Sean Christopherson
2026-02-06 19:27 ` Edgecombe, Rick P
2026-02-06 23:18 ` Sean Christopherson
2026-02-06 23:19 ` Edgecombe, Rick P
2026-02-09 10:33 ` Huang, Kai
2026-02-09 17:08 ` Edgecombe, Rick P
2026-02-09 21:05 ` Huang, Kai
2026-01-29 1:14 ` [RFC PATCH v5 23/45] x86/virt/tdx: Enable Dynamic PAMT Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 24/45] Documentation/x86: Add documentation for TDX's " Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 25/45] *** DO NOT MERGE *** x86/virt/tdx: Don't assume guest memory is backed by struct page Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 26/45] x86/virt/tdx: Enhance tdh_mem_page_aug() to support huge pages Sean Christopherson
2026-01-29 1:14 ` [RFC PATCH v5 27/45] x86/virt/tdx: Enhance tdh_phymem_page_wbinvd_hkid() to invalidate " Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 28/45] x86/virt/tdx: Extend "reset page" quirk to support " Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 29/45] x86/virt/tdx: Get/Put DPAMT page pair if and only if mapping size is 4KB Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 30/45] x86/virt/tdx: Add API to demote a 2MB mapping to 512 4KB mappings Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 31/45] KVM: x86/mmu: Prevent hugepage promotion for mirror roots in fault path Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 32/45] KVM: x86/mmu: Plumb the old_spte into kvm_x86_ops.set_external_spte() Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 33/45] KVM: TDX: Hoist tdx_sept_remove_private_spte() above set_private_spte() Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 34/45] KVM: TDX: Handle removal of leaf SPTEs in .set_private_spte() Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 35/45] KVM: TDX: Add helper to handle mapping leaf SPTE into S-EPT Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 36/45] KVM: TDX: Move S-EPT page demotion TODO to tdx_sept_set_private_spte() Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 37/45] KVM: x86/tdp_mmu: Alloc external_spt page for mirror page table splitting Sean Christopherson
2026-02-06 10:07 ` Yan Zhao
2026-02-06 16:09 ` Sean Christopherson
2026-02-11 9:49 ` Yan Zhao
2026-01-29 1:15 ` [RFC PATCH v5 38/45] KVM: x86/mmu: Add Dynamic PAMT support in TDP MMU for vCPU-induced page split Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 39/45] KVM: TDX: Add core support for splitting/demoting 2MiB S-EPT to 4KiB Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 40/45] KVM: x86: Introduce hugepage_set_guest_inhibit() Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 41/45] KVM: TDX: Honor the guest's accept level contained in an EPT violation Sean Christopherson
2026-01-29 15:32 ` Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 42/45] KVM: guest_memfd: Add helpers to get start/end gfns give gmem+slot+pgoff Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 43/45] *** DO NOT MERGE *** KVM: guest_memfd: Add pre-zap arch hook for shared<=>private conversion Sean Christopherson
2026-02-13 7:23 ` Huang, Kai
2026-01-29 1:15 ` [RFC PATCH v5 44/45] KVM: x86/mmu: Add support for splitting S-EPT hugepages on conversion Sean Christopherson
2026-01-29 15:39 ` Sean Christopherson
2026-02-11 8:43 ` Yan Zhao
2026-02-13 15:09 ` Sean Christopherson
2026-02-06 10:14 ` Yan Zhao
2026-02-06 14:46 ` Sean Christopherson
2026-01-29 1:15 ` [RFC PATCH v5 45/45] KVM: TDX: Turn on PG_LEVEL_2M Sean Christopherson
2026-01-29 17:13 ` [RFC PATCH v5 00/45] TDX: Dynamic PAMT + S-EPT Hugepage Konrad Rzeszutek Wilk
2026-01-29 17:17 ` Dave Hansen
2026-02-04 14:38 ` Sean Christopherson
2026-02-04 15:09 ` Dave Hansen
2026-02-05 15:53 ` Sean Christopherson
2026-02-05 16:01 ` Dave Hansen
2026-04-15 21:48 ` Edgecombe, Rick P
2026-04-17 16:59 ` Sean Christopherson
2026-04-17 20:01 ` Edgecombe, Rick P
2026-05-19 0:40 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aYKr7XODY-p6YLYa@google.com \
--to=seanjc@google.com \
--cc=ackerleytng@google.com \
--cc=binbin.wu@linux.intel.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=isaku.yamahata@intel.com \
--cc=kai.huang@intel.com \
--cc=kas@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-coco@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=rick.p.edgecombe@intel.com \
--cc=sagis@google.com \
--cc=tglx@kernel.org \
--cc=vannapurve@google.com \
--cc=x86@kernel.org \
--cc=xiaoyao.li@intel.com \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.