Re: [PATCH v8 07/21] coco/tdx-host: Implement firmware upload sysfs ABI for TDX module updates

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

* Re: [PATCH v8 07/21] coco/tdx-host: Implement firmware upload sysfs ABI for TDX module updates
       [not found]   ` <06682111-2b63-444e-a431-b8dd0db2b0ab@intel.com>
@ 2026-05-06  2:35     ` Chao Gao
  0 siblings, 0 replies; 11+ messages in thread
From: Chao Gao @ 2026-05-06  2:35 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kvm, linux-coco, linux-kernel, x86, binbin.wu, dave.hansen, djbw,
	ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
	reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
	vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On Wed, Apr 29, 2026 at 04:17:10PM -0700, Dave Hansen wrote:
>On 4/27/26 08:28, Chao Gao wrote:
>> Linux kernel supports two primary firmware update mechanisms:
>>   - request_firmware()
>>   - firmware upload (or fw_upload)
>
>All the stuff here is good info, but it was hard to extract the
>implementation information from the background.
>
>I think this would do:
>
>	Select fw_upload for doing TDX module updates. The process of
>	selecting among available update images is complicated and
>	nuanced. Punt the selection policy out to userspace.

Agreed. I'll add that as a TL;DR.

>
>...
>> +static int seamldr_init(struct device *dev)
>> +{
>> +	struct fw_upload *tdx_fwl;
>> +
>> +	if (!can_expose_seamldr())
>> +		return 0;
>
>can_expose_seamldr() has a not great name.
>
>Why not just have naming that says:
>
>	if (supports_runtime_update())
>		...
>
>Why abstract this out to what can or can't be exposed?

Sure, I'll use supports_runtime_update().

I originally used can_expose_seamldr() because I was focused on the
VMCS-clobbering erratum, where SEAMLDR sysfs cannot be exposed to
userspace.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 10/21] x86/virt/seamldr: Shut down the current TDX module
       [not found]   ` <0523b07b-2df2-4cf4-bf98-6efe01780698@intel.com>
@ 2026-05-06  2:56     ` Chao Gao
  2026-05-06 20:49       ` Dave Hansen
  0 siblings, 1 reply; 11+ messages in thread
From: Chao Gao @ 2026-05-06  2:56 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kvm, linux-coco, linux-kernel, x86, binbin.wu, dave.hansen, djbw,
	ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
	reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
	vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On Thu, Apr 30, 2026 at 11:52:50AM -0700, Dave Hansen wrote:
>On 4/27/26 08:28, Chao Gao wrote:
>>  static int do_seamldr_install_module(void *seamldr_params)
>>  {
>>  	enum module_update_state newstate, curstate = MODULE_UPDATE_START;
>> +	int cpu = smp_processor_id();
>> +	bool primary;
>>  	int ret = 0;
>>  
>> +	primary = cpumask_first(cpu_online_mask) == cpu;
>
>Isn't cpumask_first(cpu_online_mask)==0, always? I thought CPU 0 could
>never be offlined.

Not always. On x86, CPU 0 can be offlined at runtime if the kernel is booted
with the cpu0_hotplug command-line option. See cpu_hotplug.rst.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 11/21] x86/virt/tdx: Reset software states during TDX module shutdown
       [not found]   ` <20f2d821-bfa6-4db2-a968-b5455c7b5007@intel.com>
@ 2026-05-06  6:21     ` Chao Gao
  0 siblings, 0 replies; 11+ messages in thread
From: Chao Gao @ 2026-05-06  6:21 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kvm, linux-coco, linux-kernel, x86, binbin.wu, dave.hansen, djbw,
	ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
	reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
	vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On Thu, Apr 30, 2026 at 11:58:12AM -0700, Dave Hansen wrote:
>On 4/27/26 08:28, Chao Gao wrote:
>> +	/*
>> +	 * Clear global and per-CPU initialization flags so the new module
>> +	 * can be fully re-initialized after a successful update.
>> +	 *
>> +	 * No locks needed as no concurrent accesses can occur here.
>> +	 */
>> +	tdx_module_initialized = false;
>> +	sysinit_done = false;
>> +	sysinit_ret = 0;
>> +	for_each_possible_cpu(cpu)
>> +		per_cpu(tdx_lp_initialized, cpu) = false;
>
>This speaks to needing refactoring.
>
>If there's global TDX state, it needs to be in a global TDX state
>structure, not scattered across random global variables.
>
>Imagine the structure is:
>
>struct tdx_module_config foo;
>
>That's 0's at boot. You have to init the TDX module to bring it out of
>0's to valid state. It actually means something if you do:
>
>	memset(&foo, 0, sizeof(foo));
>
>Because it takes it right back to its bss state. That ^ is also handy
>because it naturally just works if new state gets added.
>
>Guess what will happen the next time someone adds:
>
>	int sysinit_new_fancy_thing;
>
>Someone will forget to add it to this code. Then the module gets updated
>and breaks things in fun ways.

Makes sense. I will slot in a patch like this:

From 1578bd211c732f7773475819cbf145fe9fedfcb5 Mon Sep 17 00:00:00 2001
From: Chao Gao <chao.gao@intel.com>
Date: Tue, 5 May 2026 22:46:39 -0700
Subject: [PATCH] x86/virt/tdx: Consolidate TDX global initialization state

The kernel uses several global flags to guard one-time TDX initialization
flows and prevent them from being repeated.

When the TDX module is updated, all of this state must be reset so that
the module can be initialized again. Today those flags are kept as separate
global variables, which makes the reset path awkward and easy to miss when
new state is added.

Group the flags into a single structure so they can be reset together, for
example with memset(), and so future state that needs the same handling is
easier to manage.

Drop the __ro_after_init annotation from tdx_module_initialized to make it
consistent with the other two flags.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index c0c6281b08a5..0172b432f229 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -44,6 +44,13 @@
 #include <asm/virt.h>
 #include "tdx.h"
 
+struct tdx_module_state {
+	bool initialized;
+	bool sysinit_done;
+	int sysinit_ret;
+};
+
+static struct tdx_module_state tdx_module_state;
 static u32 tdx_global_keyid __ro_after_init;
 static u32 tdx_guest_keyid_start __ro_after_init;
 static u32 tdx_nr_guest_keyids __ro_after_init;
@@ -58,7 +65,6 @@ static struct tdmr_info_list tdx_tdmr_list;
 static LIST_HEAD(tdx_memlist);
 
 static struct tdx_sys_info tdx_sysinfo __ro_after_init;
-static bool tdx_module_initialized __ro_after_init;
 
 typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *args);
 
@@ -113,30 +119,28 @@ static int try_init_module_global(void)
 {
	struct tdx_module_args args = {};
	static DEFINE_RAW_SPINLOCK(sysinit_lock);
-	static bool sysinit_done;
-	static int sysinit_ret;
 
	raw_spin_lock(&sysinit_lock);
 
-	if (sysinit_done)
+	if (tdx_module_state.sysinit_done)
		goto out;
 
	/* RCX is module attributes and all bits are reserved */
	args.rcx = 0;
-	sysinit_ret = seamcall_prerr(TDH_SYS_INIT, &args);
+	tdx_module_state.sysinit_ret = seamcall_prerr(TDH_SYS_INIT, &args);
 
	/*
	 * The first SEAMCALL also detects the TDX module, thus
	 * it can fail due to the TDX module is not loaded.
	 * Dump message to let the user know.
	 */
-	if (sysinit_ret == -ENODEV)
+	if (tdx_module_state.sysinit_ret == -ENODEV)
		pr_err("module not loaded\n");
 
-	sysinit_done = true;
+	tdx_module_state.sysinit_done = true;
 out:
	raw_spin_unlock(&sysinit_lock);
-	return sysinit_ret;
+	return tdx_module_state.sysinit_ret;
 }
 
 /**
@@ -1299,7 +1303,7 @@ static __init int tdx_enable(void)
 
	register_syscore(&tdx_syscore);
 
-	tdx_module_initialized = true;
+	tdx_module_state.initialized = true;
	pr_info("TDX-Module initialized\n");
	return 0;
 }
@@ -1554,7 +1558,7 @@ void __init tdx_init(void)
 
 const struct tdx_sys_info *tdx_get_sysinfo(void)
 {
-	if (!tdx_module_initialized)
+	if (!tdx_module_state.initialized)
		return NULL;
 
	return (const struct tdx_sys_info *)&tdx_sysinfo;
-- 
2.52.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 15/21] x86/virt/tdx: Refresh TDX module version after update
       [not found]   ` <28be5180-ff74-4e4d-b392-7ba7a9b4c1c0@intel.com>
@ 2026-05-06 12:51     ` Chao Gao
  0 siblings, 0 replies; 11+ messages in thread
From: Chao Gao @ 2026-05-06 12:51 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kvm, linux-coco, linux-kernel, x86, binbin.wu, dave.hansen, djbw,
	ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
	reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
	vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On Thu, Apr 30, 2026 at 12:14:32PM -0700, Dave Hansen wrote:
>On 4/27/26 08:28, Chao Gao wrote:
>> The kernel exposes the TDX module version through sysfs so userspace can
>> check update compatibility. That information needs to remain accurate
>> across runtime updates.
>> 
>> A runtime update may change the module's update_version, so refresh the
>> cached version after a successful update and emit a log message to show
>> the version change.
>> 
>> Drop __ro_after_init from tdx_sysinfo because it is now updated at runtime.
>> 
>> Perform the refresh outside of stop_machine() since printk() within
>> stop_machine() would add significant latency.
>> 
>> Do not refresh the rest of tdx_sysinfo. Refreshing them at runtime could
>> disrupt running software that relies on the previously reported values.
>
>This needs more explanation. I think the reasoning is quite nuanced.
>
>tdx_sysinfo is just a cache of what the TDX module is and can do. If
>that changes, it means the TDX module changed. So you somehow need to
>argue why it's OK to hide those changes from the tdx_sysinfo users.
>
>Why would they be confused by tdx_sysinfo changes but not by the TDX
>module *itself* changing?

Good point. The key assumption here is that module updates are fully backward
compatible, so running software can continue to work without seeing the new
tdx_sysinfo. I will revise the changelog. See below.

>
>> Note that major and minor versions are not refreshed because runtime updates
>> are supported only between releases with identical major and minor versions.
>
>I'd rather have this in code than a changelog comment.
>
>If they can't change then warn if they do.

Maybe I can just drop the note as I don't want to add code to preemptively
catch theoretical module bugs.

I added it because Sashiko pointed out that assigning the whole version struct
outside stop_machine() could allow sysfs readers to observe a partially updated
version. As we don't need to print new module version, I will move that
assignment into stop_machine(), which addresses that issue. After that, there
is no need to mention that major/minor versions are identical across updates.

>
>> diff --git a/arch/x86/virt/vmx/tdx/seamldr.c b/arch/x86/virt/vmx/tdx/seamldr.c
>> index 98a8d9d3ae25..c81b26c4bac1 100644
>> --- a/arch/x86/virt/vmx/tdx/seamldr.c
>> +++ b/arch/x86/virt/vmx/tdx/seamldr.c
>> @@ -306,6 +306,8 @@ DEFINE_FREE(free_seamldr_params, struct seamldr_params *,
>>   */
>>  int seamldr_install_module(const u8 *data, u32 size)
>>  {
>> +	int ret;
>> +
>>  	struct seamldr_params *params __free(free_seamldr_params) =
>>  						init_seamldr_params(data, size);
>>  	if (IS_ERR(params))
>> @@ -314,6 +316,10 @@ int seamldr_install_module(const u8 *data, u32 size)
>>  	/* Ensure a stable set of online CPUs for the update process. */
>>  	guard(cpus_read_lock)();
>>  	set_target_state(MODULE_UPDATE_START + 1);
>> -	return stop_machine_cpuslocked(do_seamldr_install_module, params, cpu_online_mask);
>> +	ret = stop_machine_cpuslocked(do_seamldr_install_module, params, cpu_online_mask);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return tdx_module_refresh_version();
>>  }
>
>This is one reason I rather dislike guard().
>
>Does tdx_module_refresh_version() need to be guarded by 'cpus_read_lock'?

No. It can be moved out of 'cpus_read_lock'.

>
>?
>
>
>> +	/* Major/minor versions should not change across updates. */
>> +	tdx_sysinfo.version.update_version	= new.update_version;
>
>					   ^ very odd tab
>
>Also, how much of this do you *NEED*? You don't need to print the old
>version. You don't really need to _print_ the new version either.
>
>Isn't this arguably all fluff?

I initially kept tdx module update similar to the microcode update. But yes,
printing the new version is not strictly needed. Once the unnecessary
complexity is dropped, the patch becomes quite small:

commit 90e5a66b3f54af96d5895f6707ecdeef4bc4a3ed
Author: Chao Gao <chao.gao@intel.com>
Date:   Tue Mar 31 05:41:29 2026 -0700

    x86/virt/tdx: Refresh TDX module version after update
    
    The kernel exposes the TDX module version through sysfs so userspace can
    check update compatibility. That information needs to remain accurate
    across runtime updates.
    
    A runtime update may change the module's update_version, so refresh the
    cached version after a successful update.
    
    Drop __ro_after_init from tdx_sysinfo because it is now updated at runtime.
    
    Do not refresh the rest of tdx_sysinfo even if those values change. Current
    tdx_sysinfo users (e.g., KVM) can continue to work across module updates
    without seeing the new values because module updates are required to be
    backward compatible. And those users are not written to re-validate
    tdx_sysinfo values after an update, so refreshing them could be risky.
    
    For example, a user may decide both setup and teardown behavior purely
    based on the reported capabilities. If refreshed tdx_sysinfo starts
    reporting a newly gained capability, later code may assume the
    corresponding setup exists and try to use or tear it down, even though no
    such setup was done before the update.
    
    Signed-off-by: Chao Gao <chao.gao@intel.com>
    ---
    v9:
    - don't print old and new version [Dave]
    - explain why it's OK to hide changes from the tdx_sysinfo users [Dave]
    - update versions in stop_machine context
    - don't mention major/minor versions are idential across updates. That fact is
      not relevant here.

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index deb1470185ce..ab350b705910 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -66,7 +66,7 @@ static struct tdmr_info_list tdx_tdmr_list;
 /* All TDX-usable memory regions.  Protected by mem_hotplug_lock. */
 static LIST_HEAD(tdx_memlist);
 
-static struct tdx_sys_info tdx_sysinfo __ro_after_init;
+static struct tdx_sys_info tdx_sysinfo;
 
 /*
  * Do the module global initialization once and return its result.
@@ -1305,6 +1305,10 @@ int tdx_module_run_update(void)
	if (ret)
		return ret;
 
+	/* Shouldn't fail as the update has succeeded. */
+	ret = get_tdx_sys_info_version(&tdx_sysinfo.version);
+	WARN_ON_ONCE(ret);
+
	tdx_module_state.initialized = true;
	return 0;
 }
diff --git a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
index e793dec688ab..e49c300f23d4 100644
--- a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
+++ b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
@@ -7,7 +7,7 @@
  * Include this file to other C file instead.
  */
 
-static __init int get_tdx_sys_info_version(struct tdx_sys_info_version *sysinfo_version)
+static int get_tdx_sys_info_version(struct tdx_sys_info_version *sysinfo_version)
 {
	int ret = 0;
	u64 val;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 09/21] x86/virt/seamldr: Introduce skeleton for TDX module updates
       [not found]   ` <5dc70847-332d-496f-b0ab-03323eff7118@intel.com>
@ 2026-05-06 13:00     ` Chao Gao
  2026-05-06 20:43       ` Dave Hansen
  0 siblings, 1 reply; 11+ messages in thread
From: Chao Gao @ 2026-05-06 13:00 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kvm, linux-coco, linux-kernel, x86, binbin.wu, dave.hansen, djbw,
	ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
	reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
	vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On Thu, Apr 30, 2026 at 01:03:53PM -0700, Dave Hansen wrote:
>On 4/27/26 08:28, Chao Gao wrote:
>> +static struct {
>> +	enum module_update_state state;
>> +	int thread_ack;
>
>multi_stop_data has an atomic_t for this.
>
>You have an int.
>
>Which one is right?

You pointed out that using atomic_t and memory barriers for synchronization was
overly complicated. So, I switched to use a spinlock, and thread_ack can now be
a plain int.

See https://lore.kernel.org/kvm/31936a20-929f-489a-9dc6-0f8fcb9307f1@intel.com/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 09/21] x86/virt/seamldr: Introduce skeleton for TDX module updates
  2026-05-06 13:00     ` [PATCH v8 09/21] x86/virt/seamldr: Introduce skeleton for TDX module updates Chao Gao
@ 2026-05-06 20:43       ` Dave Hansen
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2026-05-06 20:43 UTC (permalink / raw)
  To: Chao Gao
  Cc: kvm, linux-coco, linux-kernel, x86, binbin.wu, dave.hansen, djbw,
	ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
	reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
	vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On 5/6/26 06:00, Chao Gao wrote:
> On Thu, Apr 30, 2026 at 01:03:53PM -0700, Dave Hansen wrote:
>> On 4/27/26 08:28, Chao Gao wrote:
>>> +static struct {
>>> +	enum module_update_state state;
>>> +	int thread_ack;
>>
>> multi_stop_data has an atomic_t for this.
>>
>> You have an int.
>>
>> Which one is right?
> 
> You pointed out that using atomic_t and memory barriers for synchronization was
> overly complicated. So, I switched to use a spinlock, and thread_ack can now be
> a plain int.

Good point.

Could you make this a bit more obvious, please?

I honestly don't like guard(). Maybe I'm old-fashioned, but I really,
really like critical sections to be, well, explicit *sections* of code.

Second, you have two functions defined next to each other with similar
names:

static void ack_state(void)

static void set_target_state(enum module_update_state state)

Both of which manipulate the same data. One takes the lock. One doesn't.
That could be fixed with comments.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 10/21] x86/virt/seamldr: Shut down the current TDX module
  2026-05-06  2:56     ` [PATCH v8 10/21] x86/virt/seamldr: Shut down the current TDX module Chao Gao
@ 2026-05-06 20:49       ` Dave Hansen
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2026-05-06 20:49 UTC (permalink / raw)
  To: Chao Gao
  Cc: kvm, linux-coco, linux-kernel, x86, binbin.wu, dave.hansen, djbw,
	ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
	reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
	vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On 5/5/26 19:56, Chao Gao wrote:
> On Thu, Apr 30, 2026 at 11:52:50AM -0700, Dave Hansen wrote:
>> On 4/27/26 08:28, Chao Gao wrote:
>>>  static int do_seamldr_install_module(void *seamldr_params)
>>>  {
>>>  	enum module_update_state newstate, curstate = MODULE_UPDATE_START;
>>> +	int cpu = smp_processor_id();
>>> +	bool primary;
>>>  	int ret = 0;
>>>  
>>> +	primary = cpumask_first(cpu_online_mask) == cpu;
>> Isn't cpumask_first(cpu_online_mask)==0, always? I thought CPU 0 could
>> never be offlined.
> Not always. On x86, CPU 0 can be offlined at runtime if the kernel is booted
> with the cpu0_hotplug command-line option. See cpu_hotplug.rst.

See e59e74dc48a309cb848ffc3d76a0d61aa6803c05.

Yes, the docs are stale.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 08/21] x86/virt/seamldr: Allocate and populate a module update request
       [not found]   ` <a52c4701-c99d-48d5-9b63-8eb1c0e589f0@intel.com>
@ 2026-05-07 13:19     ` Chao Gao
  2026-05-08 16:48       ` Dave Hansen
  0 siblings, 1 reply; 11+ messages in thread
From: Chao Gao @ 2026-05-07 13:19 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kvm, linux-coco, linux-kernel, x86, binbin.wu, dave.hansen, djbw,
	ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
	reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
	vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

>> header is consumed solely by the kernel to extract the sigstruct and
>> module, so validate it before processing to protect the kernel ABI. The
>> sigstruct and module are passed to and validated by P-SEAMLDR, so don't
>> duplicate any validation in the kernel.
>> 
>> Note: the sigstruct_pa field in SEAMLDR_PARAMS has been extended to
>> a 4-element array. The updated "SEAM Loader (SEAMLDR) Interface
>> Specification" will be published separately.
>
>These changelogs have all the right info, but I find them really hard to
>parse. For instance, if you're going to have a 'struct seamldr_params',
>then just stick with that name. Don't use the "SEAMLDR_PARAMS" name too.
>
>Start with the data structures:
>
>There are two important ABIs here:
>
>'struct tdx_blob'       - the on-disk and in-memory format for a TDX
> 		          module update image.
>'struct seamldr_params' - The in-memory ABI passed to the TDX module
>			  loader. Points to a single 'struct tdx_blob'

Thanks for the thorough review.

Your comments all make sense to me. I just want to confirm two points
below.


>> +	/*
>> +	 * Don't care about user passing the wrong file, but protect
>> +	 * kernel ABI by preventing accepting garbage.
>> +	 */
>> +	if (memcmp(blob->signature, "TDX-BLOB", 8))
>> +		return ERR_PTR(-EINVAL);
>
>Is there really no helper in the kernel anywhere that can safely do the
>8-byte compare against two known-to-the-compiler 8-byte-wide fields
>without hard-coding the 8?

I couldn't find a helper that automatically derives the comparison
length from the operands.  'strcmp()' is not suitable here because
'blob->signature' is not NUL-terminated.

Do you mean just avoiding the hard-coded 8, e.g.

	if (memcmp(blob->signature, "TDX-BLOB", sizeof(blob->signature)))
		return ERR_PTR(-EINVAL);

or define the 'u8 signature[8]' as a u64 and compare it with a constant, like

/* Little-endian encoding of "TDX-BLOB" string */
#define TDX_IMAGE_SIGNATURE	0x424f4c422d584454ULL

	if (blob->signature != TDX_IMAGE_SIGNATURE)
		return ERR_PTR(-EINVAL);

>> +	struct seamldr_params *params;
>> +	int module_pg_cnt, sig_pg_cnt;
>> +	const u8 *sig, *module;
>> +	int i;
>> +
>> +	params = (struct seamldr_params *)get_zeroed_page(GFP_KERNEL);
>> +	if (!params)
>> +		return ERR_PTR(-ENOMEM);
>
>kzmalloc(PAGE_SIZE, GFP_KERNEL) will save you a cast.

I noticed that 'kzalloc_obj()' can be used here, which avoids spelling out
the size and GFP flags explicitly.  So I ended up with:

    params = kzalloc_obj(*params);

If you would prefer 'kzalloc(PAGE_SIZE, GFP_KERNEL)', I can switch to that.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 17/21] x86/virt/seamldr: Abort updates on failure
       [not found]   ` <fc27ad0e-fceb-4eed-bb1c-dbfb5b913bf6@intel.com>
@ 2026-05-08  9:16     ` Chao Gao
  0 siblings, 0 replies; 11+ messages in thread
From: Chao Gao @ 2026-05-08  9:16 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kvm, linux-coco, linux-kernel, x86, binbin.wu, dave.hansen, djbw,
	ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
	reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
	vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On Thu, Apr 30, 2026 at 01:06:38PM -0700, Dave Hansen wrote:
>I don't like how this is being done.
>
> 1. Introduce this do{}while() loop
> 2. Do 20 other patches
> 3. Introduce a thing that can make it change
> 4. Change the fundamental flow of the loop, to fix #3
>
>I'd much rather have:
>
> 1. Introduce this do{}while() loop
> 2. Tweak fundamental flow of the loop from the last patch when I can
>    remember it. Allude to future failures.
> 3. Do 20 other patches
> 4. Introduce a thing that uses #2

OK, that makes sense. I'll reorder the series so this patch comes immediately
after the skeleton patch.

>
>
>> diff --git a/arch/x86/virt/vmx/tdx/seamldr.c b/arch/x86/virt/vmx/tdx/seamldr.c
>> index c81b26c4bac1..9b8f571eb03f 100644
>> --- a/arch/x86/virt/vmx/tdx/seamldr.c
>> +++ b/arch/x86/virt/vmx/tdx/seamldr.c
>> @@ -220,6 +220,7 @@ enum module_update_state {
>>  static struct {
>>  	enum module_update_state state;
>>  	int thread_ack;
>> +	bool failed;
>>  	/*
>>  	 * Protect update_data. Raw spinlock as it will be acquired from
>>  	 * interrupt-disabled contexts.
>> @@ -284,12 +285,15 @@ static int do_seamldr_install_module(void *seamldr_params)
>>  				break;
>>  			}
>>  
>> -			ack_state();
>> +			if (ret)
>> +				WRITE_ONCE(update_data.failed, true);
>> +			else
>> +				ack_state();
>>  		} else {
>>  			touch_nmi_watchdog();
>>  			rcu_momentary_eqs();
>>  		}
>
>I don't like how this is turning out either. I don't like all the nested
>conditions or ack_state() that hides its mucking with update data while
>its caller mucks with it directly. It's just all hacked together.
>
>Defer all of the acking, and *failed* acking to the ack_state() helper.

OK. I'll fold both normal and failed acking into ack_state().


>
>Also, I'm kinda peeved that you copied and pasted the  		
>touch_nmi_watchdog()/rcu_momentary_eqs() bits and none of the comments.
>This is a rather subtle use of both. If you want this to be a normal
>"spinning in stop machine" idiom, then create a helper and put the
>comments there.

Those two calls were added in stop_machine() to improve debuggability.

The issue they address is that a stop_machine() callback can hang on one
CPU. Without touch_nmi_watchdog() and rcu_momentary_eqs(), the other CPUs
that are merely spinning in the wait loop can also report hard lockup and
RCU stall warnings, which obscures the actual stuck CPU.

I agree that this behavior makes sense in stop_machine() as common
infrastructure. But this update path does not take an arbitrary callback
function, so that that debuggability is not strictly necessary here. I'll
drop those calls from this path unless there is an objection.

>
>Also, this is a case where:
>
>	do {
>		cpu_relax();
>		newstate = READ_ONCE(update_data.state);
>
>		if (newstate == curstate) {
>			// can cpu_relax() just go in here??
>			touch_nmi_watchdog();
>			rcu_momentary_eqs();
>			continue;
>		}
>	
>		switch() {
>			// state changing here
>		}
>	} while (...);
>
>is a much more sane setup. You're not paying the if() indentation cost
>for the entire state transition block. You're also putting the "shut up
>the warnings" code out of the way where you can forget about it.
>

Agreed. Will do.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 18/21] coco/tdx-host: Don't expose P-SEAMLDR features on CPUs with erratum
       [not found]   ` <abd48a30-8d51-4a86-b662-b09afb567dc5@intel.com>
@ 2026-05-08  9:50     ` Chao Gao
  0 siblings, 0 replies; 11+ messages in thread
From: Chao Gao @ 2026-05-08  9:50 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kvm, linux-coco, linux-kernel, x86, binbin.wu, dave.hansen, djbw,
	ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
	reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
	vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On Thu, Apr 30, 2026 at 01:09:30PM -0700, Dave Hansen wrote:
>On 4/27/26 08:28, Chao Gao wrote:
>> Some TDX-capable CPUs have an erratum, as documented in Intel® Trust
>> Domain CPU Architectural Extensions (May 2021 edition) Chapter 2.3:
>> 
>>   SEAMRET from the P-SEAMLDR clears the current VMCS structure pointed
>>   to by the current-VMCS pointer. A VMM that invokes the P-SEAMLDR using
>>   SEAMCALL must reload the current-VMCS, if required, using the VMPTRLD
>>   instruction.
>> 
>> Clearing the current VMCS behind KVM's back will break KVM.
>> 
>> This erratum is not present when IA32_VMX_BASIC[60] is set. Add a CPU
>> bug bit for this erratum and refuse to expose P-SEAMLDR features (e.g.,
>> TDX module updates) on affected CPUs.
>
>This seems totally random.
>
>Shouldn't this be way back when can_expose_seamldr() got defined in the
>first place?

I split this out because the erratum needs a longer changelog and some
discussion of alternatives. I also wanted the initial can_expose_seamldr()
patch to focus on introducing the gating mechanism, without bundling in
every detailed check from the start. The update do-while loop and the uABI
stuff are the core of this series, while this erratum check is not, so I
placed this patch later.

That said, I am perfectly fine with moving this patch to immediately follow
the patch that introduces can_expose_seamldr().

>> +#define X86_BUG_SEAMRET_INVD_VMCS	X86_BUG( 1*32+11) /* "seamret_invd_vmcs" SEAMRET from P-SEAMLDR clears the current VMCS */
>
>I find myself wondering if this is worth a bug bit.

The bug bit was added in v5:

https://lore.kernel.org/all/d664ac9445b1c7cc864dead103086341c374b094.camel@intel.com/#t

Kai suggested this approach for two reasons:

1. It is consistent with how X86_BUG_TDX_PW_MCE is handled.
2. It gives userspace a clue as to why the module update feature is
   unavailable.

That reasoning made sense to me, and I do not see a strong reason not to
use the "bug bit" infrastructure. If there is no objection to it, I will
add a short explanation to the changelog.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 08/21] x86/virt/seamldr: Allocate and populate a module update request
  2026-05-07 13:19     ` [PATCH v8 08/21] x86/virt/seamldr: Allocate and populate a module update request Chao Gao
@ 2026-05-08 16:48       ` Dave Hansen
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2026-05-08 16:48 UTC (permalink / raw)
  To: Chao Gao
  Cc: kvm, linux-coco, linux-kernel, x86, binbin.wu, dave.hansen, djbw,
	ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
	reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
	vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On 5/7/26 06:19, Chao Gao wrote:
...
>>> +	/*
>>> +	 * Don't care about user passing the wrong file, but protect
>>> +	 * kernel ABI by preventing accepting garbage.
>>> +	 */
>>> +	if (memcmp(blob->signature, "TDX-BLOB", 8))
>>> +		return ERR_PTR(-EINVAL);
>>
>> Is there really no helper in the kernel anywhere that can safely do the
>> 8-byte compare against two known-to-the-compiler 8-byte-wide fields
>> without hard-coding the 8?
> 
> I couldn't find a helper that automatically derives the comparison
> length from the operands.  'strcmp()' is not suitable here because
> 'blob->signature' is not NUL-terminated.
> 
> Do you mean just avoiding the hard-coded 8, e.g.
> 
> 	if (memcmp(blob->signature, "TDX-BLOB", sizeof(blob->signature)))
> 		return ERR_PTR(-EINVAL);
> 
> or define the 'u8 signature[8]' as a u64 and compare it with a constant, like
> 
> /* Little-endian encoding of "TDX-BLOB" string */
> #define TDX_IMAGE_SIGNATURE	0x424f4c422d584454ULL
> 
> 	if (blob->signature != TDX_IMAGE_SIGNATURE)
> 		return ERR_PTR(-EINVAL);

Either one of those is fine with me. I'd probably do the sizeof()
variant, but no strong preference.

>>> +	struct seamldr_params *params;
>>> +	int module_pg_cnt, sig_pg_cnt;
>>> +	const u8 *sig, *module;
>>> +	int i;
>>> +
>>> +	params = (struct seamldr_params *)get_zeroed_page(GFP_KERNEL);
>>> +	if (!params)
>>> +		return ERR_PTR(-ENOMEM);
>>
>> kzmalloc(PAGE_SIZE, GFP_KERNEL) will save you a cast.
> 
> I noticed that 'kzalloc_obj()' can be used here, which avoids spelling out
> the size and GFP flags explicitly.  So I ended up with:
> 
>     params = kzalloc_obj(*params);

That's fine too.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-05-08 16:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260427152854.101171-1-chao.gao@intel.com>
     [not found] ` <20260427152854.101171-8-chao.gao@intel.com>
     [not found]   ` <06682111-2b63-444e-a431-b8dd0db2b0ab@intel.com>
2026-05-06  2:35     ` [PATCH v8 07/21] coco/tdx-host: Implement firmware upload sysfs ABI for TDX module updates Chao Gao
     [not found] ` <20260427152854.101171-11-chao.gao@intel.com>
     [not found]   ` <0523b07b-2df2-4cf4-bf98-6efe01780698@intel.com>
2026-05-06  2:56     ` [PATCH v8 10/21] x86/virt/seamldr: Shut down the current TDX module Chao Gao
2026-05-06 20:49       ` Dave Hansen
     [not found] ` <20260427152854.101171-12-chao.gao@intel.com>
     [not found]   ` <20f2d821-bfa6-4db2-a968-b5455c7b5007@intel.com>
2026-05-06  6:21     ` [PATCH v8 11/21] x86/virt/tdx: Reset software states during TDX module shutdown Chao Gao
     [not found] ` <20260427152854.101171-16-chao.gao@intel.com>
     [not found]   ` <28be5180-ff74-4e4d-b392-7ba7a9b4c1c0@intel.com>
2026-05-06 12:51     ` [PATCH v8 15/21] x86/virt/tdx: Refresh TDX module version after update Chao Gao
     [not found] ` <20260427152854.101171-10-chao.gao@intel.com>
     [not found]   ` <5dc70847-332d-496f-b0ab-03323eff7118@intel.com>
2026-05-06 13:00     ` [PATCH v8 09/21] x86/virt/seamldr: Introduce skeleton for TDX module updates Chao Gao
2026-05-06 20:43       ` Dave Hansen
     [not found] ` <20260427152854.101171-9-chao.gao@intel.com>
     [not found]   ` <a52c4701-c99d-48d5-9b63-8eb1c0e589f0@intel.com>
2026-05-07 13:19     ` [PATCH v8 08/21] x86/virt/seamldr: Allocate and populate a module update request Chao Gao
2026-05-08 16:48       ` Dave Hansen
     [not found] ` <20260427152854.101171-18-chao.gao@intel.com>
     [not found]   ` <fc27ad0e-fceb-4eed-bb1c-dbfb5b913bf6@intel.com>
2026-05-08  9:16     ` [PATCH v8 17/21] x86/virt/seamldr: Abort updates on failure Chao Gao
     [not found] ` <20260427152854.101171-19-chao.gao@intel.com>
     [not found]   ` <abd48a30-8d51-4a86-b662-b09afb567dc5@intel.com>
2026-05-08  9:50     ` [PATCH v8 18/21] coco/tdx-host: Don't expose P-SEAMLDR features on CPUs with erratum Chao Gao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox