From: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
To: "seanjc@google.com" <seanjc@google.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-coco@lists.linux.dev" <linux-coco@lists.linux.dev>,
"Huang, Kai" <kai.huang@intel.com>,
"Li, Xiaoyao" <xiaoyao.li@intel.com>,
"Hansen, Dave" <dave.hansen@intel.com>,
"Zhao, Yan Y" <yan.y.zhao@intel.com>,
"Wu, Binbin" <binbin.wu@intel.com>,
"kas@kernel.org" <kas@kernel.org>,
"binbin.wu@linux.intel.com" <binbin.wu@linux.intel.com>,
"mingo@redhat.com" <mingo@redhat.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"Yamahata, Isaku" <isaku.yamahata@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Annapurve, Vishal" <vannapurve@google.com>,
"Gao, Chao" <chao.gao@intel.com>, "bp@alien8.de" <bp@alien8.de>,
"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH v4 07/16] x86/virt/tdx: Add tdx_alloc/free_page() helpers
Date: Thu, 29 Jan 2026 17:18:58 +0000
Message-ID: <9096e7a47742f4a46a7f400aac467ac78e1dfe50.camel@intel.com>
In-Reply-To: <aXq1qPYTR8vpJfc9@google.com>
On Wed, 2026-01-28 at 17:19 -0800, Sean Christopherson wrote:
> Honestly, the entire scheme is a mess. Four days of staring at this
> and I finally understand what the code is doing. The whole "struct
> tdx_module_array_args" union is completely unnecessary, the resulting
> args.args crud is ugly, having a pile of duplicate accessors is
> brittle, the code obfuscates a simple concept, and the end result
> doesn't provide any actual protection since the kernel will happily
> overflow the buffer after the WARN.
The original sin here, as Nikolay spotted in v3, is that the whole
variable-length scheme was only intended to give the TDX module
flexibility *if* it ever wanted to grow the count in the future. It's
not required today. Worse, whether it would actually grow in the
specific way the code assumes is not covered by the spec; apparently it
was based on some past internal discussions. So the agreement on v3 was
to just support the fixed two page sizes in the spec.
Here was the end of that thread:
https://lore.kernel.org/kvm/da3701ea-08ea-45c9-94a8-355205a45f8e@intel.com/
This simplifies the whole thing: no union, no worst-case allocations,
etc. I'm just getting back and going through mail, so I will check out
your full solution. (Thanks!) But from the below I think the fixed
array size code will still be better.
>
> It's also relying on the developer to correctly copy+paste the same
> register in multiple locations: ~5 depending on how you want to
> count.
>
> static u64 *dpamt_args_array_ptr_r12(struct tdx_module_array_args *args)    #1
> {
> 	WARN_ON_ONCE(tdx_dpamt_entry_pages() > MAX_TDX_ARGS(r12));           #2
>
> 	return &args->args_array[TDX_ARG_INDEX(r12)];                        #3
>
>
> 	u64 guest_memory_pamt_page[MAX_TDX_ARGS(r12)];                       #4
>
>
> 	u64 *args_array = dpamt_args_array_ptr_r12(&args);                   #5
Yeah, it could probably use another define or two to make it less
error-prone. Vanilla DPAMT has four instances of rdx.
>
> After all of that boilerplate, the caller _still_ has to do the
> actual memcpy(), and for me at least, all of the above makes it
> _harder_ to understand what the code is doing.
>
> Drop the struct+union overlay and just provide a helper with wrappers
> to copy to/from a tdx_module_args structure. It's far from
> bulletproof, but it at least avoids an immediate buffer overflow, and
> defers to the kernel owner with respect to handling uninitialized
> stack data.
>
> /*
>  * For SEAMCALLs that pass a bundle of pages, the TDX spec treats the registers
>  * like an array, as they are ordered in the struct. The effective array size
>  * is (obviously) limited by the number of registers, relative to the starting
>  * register. Fill the register array at a given starting register, with sanity
>  * checks to avoid overflowing the args structure.
>  */
> static void dpamt_copy_regs_array(struct tdx_module_args *args, void *reg,
> 				  u64 *pamt_pa_array, bool copy_to_regs)
> {
> 	int size = tdx_dpamt_entry_pages() * sizeof(*pamt_pa_array);
>
> 	if (WARN_ON_ONCE(reg + size > (void *)args + sizeof(*args)))
> 		return;
>
> 	/* Copy PAMT page PAs to/from the struct per the TDX ABI. */
> 	if (copy_to_regs)
> 		memcpy(reg, pamt_pa_array, size);
> 	else
> 		memcpy(pamt_pa_array, reg, size);
> }
>
> #define dpamt_copy_from_regs(dst, args, reg) \
> 	dpamt_copy_regs_array(args, &(args)->reg, dst, false)
>
> #define dpamt_copy_to_regs(args, reg, src) \
> 	dpamt_copy_regs_array(args, &(args)->reg, src, true)
>
> As far as the on-stack allocations go, why bother being precise?  Except for
> paranoid setups which explicitly initialize the stack, "allocating" ~48
> unused bytes is literally free.  Not to mention the cost relative to the
> latency of a SEAMCALL is in the noise.
>
> /*
>  * When declaring PAMT arrays on the stack, use the maximum theoretical number
>  * of entries that can be squeezed into a SEAMCALL, as stack allocations are
>  * practically free, i.e. any wasted space is a non-issue.
>  */
> #define MAX_NR_DPAMT_ARGS	(sizeof(struct tdx_module_args) / sizeof(u64))
>
>
> With that, callers don't have to regurgitate the same register multiple
> times, and we don't need a new wrapper for every variation of SEAMCALL.
> E.g.
>
> 	u64 pamt_pa_array[MAX_NR_DPAMT_ARGS];
>
> 	...
>
> 	bool dpamt = tdx_supports_dynamic_pamt(&tdx_sysinfo) &&
> 		     level == PG_LEVEL_2M;
> 	u64 pamt_pa_array[MAX_NR_DPAMT_ARGS];
> 	struct tdx_module_args args = {
> 		.rcx = gpa | pg_level_to_tdx_sept_level(level),
> 		.rdx = tdx_tdr_pa(td),
> 		.r8 = page_to_phys(new_sp),
> 	};
> 	u64 ret;
>
> 	if (!tdx_supports_demote_nointerrupt(&tdx_sysinfo))
> 		return TDX_SW_ERROR;
>
> 	if (dpamt) {
> 		if (alloc_pamt_array(pamt_pa_array, pamt_cache))
> 			return TDX_SW_ERROR;
>
> 		dpamt_copy_to_regs(&args, r12, pamt_pa_array);
> 	}
>
> Which to me is easier to read and much more intuitive than:
>
> 	u64 guest_memory_pamt_page[MAX_TDX_ARGS(r12)];
> 	struct tdx_module_array_args args = {
> 		.args.rcx = gpa | pg_level_to_tdx_sept_level(level),
> 		.args.rdx = tdx_tdr_pa(td),
> 		.args.r8 = PFN_PHYS(page_to_pfn(new_sp)),
> 	};
> 	struct tdx_module_array_args retry_args;
> 	int i = 0;
> 	u64 ret;
>
> 	if (dpamt) {
> 		u64 *args_array = dpamt_args_array_ptr_r12(&args);
>
> 		if (alloc_pamt_array(guest_memory_pamt_page, pamt_cache))
> 			return TDX_SW_ERROR;
>
> 		/*
> 		 * Copy PAMT page PAs of the guest memory into the struct per
> 		 * the TDX ABI.
> 		 */
> 		memcpy(args_array, guest_memory_pamt_page,
> 		       tdx_dpamt_entry_pages() * sizeof(*args_array));
> 	}
What you have here is close to what I had done when I first took over
this series. But it ran afoul of FORTIFY_SOURCE and required some
horrible casting to trick it. I wonder if this code will hit that issue
too. Dave didn't like that solution and actually suggested the union:
https://lore.kernel.org/kvm/355ad607-52ed-42cc-9a48-63aaa49f4c68@intel.com/#t
I'm aware of your tendency to dislike union-based solutions, but since
this was purely contained to tip, I went with Dave's preference.
But I think it's all moot, because the fixed size-2 solution doesn't
need a union or array copying. The args can just be a normal struct
tdx_module_args.