From: Chao Gao <chao.gao@intel.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-coco@lists.linux.dev" <linux-coco@lists.linux.dev>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Li, Xiaoyao" <xiaoyao.li@intel.com>,
"Huang, Kai" <kai.huang@intel.com>,
"Zhao, Yan Y" <yan.y.zhao@intel.com>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"kas@kernel.org" <kas@kernel.org>,
"Chatre, Reinette" <reinette.chatre@intel.com>,
"seanjc@google.com" <seanjc@google.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"binbin.wu@linux.intel.com" <binbin.wu@linux.intel.com>,
"Verma, Vishal L" <vishal.l.verma@intel.com>,
"nik.borisov@suse.com" <nik.borisov@suse.com>,
"mingo@redhat.com" <mingo@redhat.com>,
"Weiny, Ira" <ira.weiny@intel.com>,
"tony.lindgren@linux.intel.com" <tony.lindgren@linux.intel.com>,
"Annapurve, Vishal" <vannapurve@google.com>,
"Shahar, Sagi" <sagis@google.com>,
"djbw@kernel.org" <djbw@kernel.org>,
"tglx@kernel.org" <tglx@kernel.org>,
"paulmck@kernel.org" <paulmck@kernel.org>,
"hpa@zytor.com" <hpa@zytor.com>, "bp@alien8.de" <bp@alien8.de>,
"yilun.xu@linux.intel.com" <yilun.xu@linux.intel.com>,
"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH v9 13/23] x86/virt/seamldr: Abort updates after a failed step
Date: Tue, 19 May 2026 18:20:28 +0800 [thread overview]
Message-ID: <agw5bA4+355hQrt2@intel.com> (raw)
In-Reply-To: <c0c5edb3cf4dfeee986a478aa3db886aa8d5db38.camel@intel.com>
On Tue, May 19, 2026 at 10:34:31AM +0800, Edgecombe, Rick P wrote:
>On Wed, 2026-05-13 at 08:09 -0700, Chao Gao wrote:
>> A TDX module update is a multi-step process, and any step can fail.
>>
>> The current update flow continues to later steps after an error.
>> Continuing after a failure can leave the TDX module in an unrecoverable
>> state.
>
>I get what you are saying here, but "continuing" vs "leaving" is a tiny bit
>confusing to me. Maybe: Continuing with subsequent update steps after a failure
>can cause the TDX module to enter an unrecoverable state?
Yes. it is better.
>
>>
>> One failure case must remain recoverable: update contention with an ongoing
>> TD build. The agreed kernel behavior for this case [1] is to fail the
>> update with -EBUSY so userspace can retry later.
>
>The link to the discussion is nice, but the explanation of just that there was
>an agreement is not saying much. But the reasoning around AVOID_COMPAT_SENSITIVE
>*is* handled in later patch. So can we say future changes will want to return
>errors to userspace for certain update failures? Then we can discuss the
>specifics when code is actual error is added?
yes. the main point is certain update failures should be recoverable so userspace
can retry.
>
>And why talk about EBUSY specifically? It is not in this patch. Stale log?
Sure. there is no need to single out EBUSY.
>
>>
>> Abort the update on any failure. This also makes the TD-build contention
>> case recoverable, because that failure occurs before any TDX module state
>> is changed.
>>
>
>Oh, maybe I didn't get what you meant above actually. The contention case is
>only recoverable because we detect it at the first step? Does "Continuing after
>a failure can leave the TDX module in an unrecoverable" really mean that any
>failure after the first step is unrecoverable?
You are right. Any failure after the initial module shutdown step is
unrecoverable.
> Or can we put it in some other
>more specific terms like that. Terms which are more specific but still not
>overly complex description of TDX module update flows?
>
>> Apply the same rule to all errors instead of special-casing
>> -EBUSY.
>
>It seems like actually it is not special cased...? The error returned is
>whatever is returned from the step.
>
>>
>> Track per-step failures, stop the update loop once a failure is observed,
>> and do not advance the state machine to the next step.
>
>Hmm, so this is actually a bunch of generic handling for each step, that really
>only works for the first one? Is the generic handling really needed?
We could special-case the first step, but that would add step-specific
error handling to the update loop. I think the simpler rule is to abort the
update on the first observed failure, regardless of which step reports it.
how about:
A TDX module update is a multi-step process, and any step can fail.
The current update flow continues to later steps after an error.
Continuing after a failure can cause the TDX module to enter an
unrecoverable state.
But certain failures during the initial module shutdown step should
simply return an error to userspace, so the update can be retried cleanly.
To preserve that recoverability, one option would be to abort the update
only for those failures, since they occur before any TDX module state is
changed. But special-casing specific failures in specific steps would
complicate the do-while() update loop for no benefit.
Simply abort update on any failure, at any step.
Track failures for each step, stop the update loop once a failure is
observed, and do not advance the state machine to the next step.
>
>>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Reviewed-by: Xu Yilun <yilun.xu@linux.intel.com>
>> Reviewed-by: Tony Lindgren <tony.lindgren@linux.intel.com>
>> Reviewed-by: Kai Huang <kai.huang@intel.com>
>> Reviewed-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
>> Link: https://lore.kernel.org/linux-coco/aQFmOZCdw64z14cJ@google.com/ # [1]
>> ---
>> v9:
>> - Avoid nested if/else by deferring failure accounting to ack_state().
>> - Reduce indentation of the main flow.
>> - Convert the failed flag into a counter. This avoids a conditional
>> update of the flag; the counter can simply accumulate failures.
>> ---
>> arch/x86/virt/vmx/tdx/seamldr.c | 11 +++++++----
>> 1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/virt/vmx/tdx/seamldr.c b/arch/x86/virt/vmx/tdx/seamldr.c
>> index 7befe4a08f33..48fe71319fea 100644
>> --- a/arch/x86/virt/vmx/tdx/seamldr.c
>> +++ b/arch/x86/virt/vmx/tdx/seamldr.c
>> @@ -170,6 +170,7 @@ enum module_update_state {
>> static struct update_ctrl {
>> enum module_update_state state;
>> int num_ack;
>> + int num_failed;
>
>Was there past discussion on why it keeps a failed count? All we need to know is
>if anything failed right? So a bool is fine too?
Kiryl suggested a boolean, and I used that in earlier versions. In v9 I
moved the failure tracking next to the ack counting in ack_state(). A
boolean still works, but it needs an extra conditional to latch the failure
state.
static void ack_state(struct update_ctrl *ctrl, int result)
{
raw_spin_lock(&ctrl->lock);
- ctrl->num_failed += !!ret;
+ if (!ctrl->failed)
+ ctrl->failed = !!ret;
ctrl->num_ack++;
if (ctrl->num_ack == num_online_cpus())
if (ctrl->num_ack == num_online_cpus() && !ctrl->num_failed)
__set_target_state(ctrl, ctrl->state + 1);
Using an int mainly to keep the failure and ack tracking similar
and avoid the extra if. (I put a note under --- to explain this.)
If you prefer, I can switch it back to bool.
next prev parent reply other threads:[~2026-05-19 10:20 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-13 15:09 [PATCH v9 00/23] Runtime TDX module update support Chao Gao
2026-05-13 15:09 ` [PATCH v9 01/23] x86/virt/tdx: Consolidate TDX global initialization states Chao Gao
2026-05-15 16:14 ` Dave Hansen
2026-05-18 7:43 ` Chao Gao
2026-05-18 18:00 ` Edgecombe, Rick P
2026-05-18 18:09 ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 02/23] x86/virt/tdx: Move TDX_FEATURES0 bits to asm/tdx.h Chao Gao
2026-05-15 16:15 ` Dave Hansen
2026-05-18 7:52 ` Chao Gao
2026-05-18 16:57 ` Edgecombe, Rick P
2026-05-19 1:59 ` Edgecombe, Rick P
2026-05-19 10:24 ` Chao Gao
2026-05-13 15:09 ` [PATCH v9 03/23] x86/virt/tdx: Move low level SEAMCALL helpers out of <asm/tdx.h> Chao Gao
2026-05-13 15:09 ` [PATCH v9 04/23] coco/tdx-host: Introduce a "tdx_host" device Chao Gao
2026-05-15 16:21 ` Dave Hansen
2026-05-18 11:18 ` Chao Gao
2026-05-18 18:08 ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 05/23] coco/tdx-host: Expose TDX module version Chao Gao
2026-05-15 16:53 ` Dave Hansen
2026-05-18 11:29 ` Chao Gao
2026-05-13 15:09 ` [PATCH v9 06/23] x86/virt/seamldr: Introduce a wrapper for P-SEAMLDR SEAMCALLs Chao Gao
2026-05-15 17:02 ` Dave Hansen
2026-05-15 17:07 ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 07/23] x86/virt/seamldr: Add a helper to retrieve P-SEAMLDR information Chao Gao
2026-05-15 17:18 ` Dave Hansen
2026-05-18 11:51 ` Chao Gao
2026-05-13 15:09 ` [PATCH v9 08/23] coco/tdx-host: Expose P-SEAMLDR information via sysfs Chao Gao
2026-05-15 17:23 ` Dave Hansen
2026-05-18 12:35 ` Chao Gao
2026-05-15 17:28 ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 09/23] coco/tdx-host: Don't expose P-SEAMLDR information on CPUs with erratum Chao Gao
2026-05-15 17:26 ` Dave Hansen
2026-05-18 12:44 ` Chao Gao
2026-05-18 15:29 ` Dave Hansen
2026-05-19 1:22 ` Edgecombe, Rick P
2026-05-13 15:09 ` [PATCH v9 10/23] coco/tdx-host: Implement firmware upload sysfs ABI for TDX module updates Chao Gao
2026-05-15 18:05 ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 11/23] x86/virt/seamldr: Allocate and populate a module update request Chao Gao
2026-05-15 6:05 ` Chao Gao
2026-05-15 18:24 ` Dave Hansen
2026-05-15 18:44 ` Edgecombe, Rick P
2026-05-15 18:59 ` Dave Hansen
2026-05-18 14:15 ` Chao Gao
2026-05-18 15:12 ` Dave Hansen
2026-05-15 19:43 ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 12/23] x86/virt/seamldr: Introduce skeleton for TDX module updates Chao Gao
2026-05-18 15:36 ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 13/23] x86/virt/seamldr: Abort updates after a failed step Chao Gao
2026-05-19 2:34 ` Edgecombe, Rick P
2026-05-19 10:20 ` Chao Gao [this message]
2026-05-13 15:09 ` [PATCH v9 14/23] x86/virt/seamldr: Shut down the current TDX module Chao Gao
2026-05-19 3:00 ` Edgecombe, Rick P
2026-05-19 12:05 ` Chao Gao
2026-05-19 16:24 ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 15/23] x86/virt/tdx: Reset software states during TDX module shutdown Chao Gao
2026-05-13 15:09 ` [PATCH v9 16/23] x86/virt/seamldr: Install a new TDX module Chao Gao
2026-05-13 15:10 ` [PATCH v9 17/23] x86/virt/seamldr: Do TDX per-CPU initialization after module installation Chao Gao
2026-05-13 15:10 ` [PATCH v9 18/23] x86/virt/tdx: Restore TDX module state Chao Gao
2026-05-13 15:10 ` [PATCH v9 19/23] x86/virt/tdx: Refresh TDX module version after update Chao Gao
2026-05-19 3:16 ` Edgecombe, Rick P
2026-05-19 10:42 ` Chao Gao
2026-05-13 15:10 ` [PATCH v9 20/23] x86/virt/tdx: Reject updates during compatibility-sensitive operations Chao Gao
2026-05-15 6:12 ` Chao Gao
2026-05-13 15:10 ` [PATCH v9 21/23] x86/virt/tdx: Enable TDX module runtime updates Chao Gao
2026-05-13 15:10 ` [PATCH v9 22/23] coco/tdx-host: Document TDX module update compatibility criteria Chao Gao
2026-05-13 15:10 ` [PATCH v9 23/23] x86/virt/tdx: Document TDX module update Chao Gao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=agw5bA4+355hQrt2@intel.com \
--to=chao.gao@intel.com \
--cc=binbin.wu@linux.intel.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=djbw@kernel.org \
--cc=hpa@zytor.com \
--cc=ira.weiny@intel.com \
--cc=kai.huang@intel.com \
--cc=kas@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-coco@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=nik.borisov@suse.com \
--cc=paulmck@kernel.org \
--cc=pbonzini@redhat.com \
--cc=reinette.chatre@intel.com \
--cc=rick.p.edgecombe@intel.com \
--cc=sagis@google.com \
--cc=seanjc@google.com \
--cc=tglx@kernel.org \
--cc=tony.lindgren@linux.intel.com \
--cc=vannapurve@google.com \
--cc=vishal.l.verma@intel.com \
--cc=x86@kernel.org \
--cc=xiaoyao.li@intel.com \
--cc=yan.y.zhao@intel.com \
--cc=yilun.xu@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox