Re: [PATCH v9 13/23] x86/virt/seamldr: Abort updates after a failed step

Linux Confidential Computing Development
 help / color / mirror / Atom feed

From: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
To: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-coco@lists.linux.dev" <linux-coco@lists.linux.dev>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Gao, Chao" <chao.gao@intel.com>
Cc: "Li, Xiaoyao" <xiaoyao.li@intel.com>,
	"Huang, Kai" <kai.huang@intel.com>,
	"Zhao, Yan Y" <yan.y.zhao@intel.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"kas@kernel.org" <kas@kernel.org>,
	"Chatre, Reinette" <reinette.chatre@intel.com>,
	"seanjc@google.com" <seanjc@google.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"binbin.wu@linux.intel.com" <binbin.wu@linux.intel.com>,
	"Verma, Vishal L" <vishal.l.verma@intel.com>,
	"nik.borisov@suse.com" <nik.borisov@suse.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"Weiny, Ira" <ira.weiny@intel.com>,
	"tony.lindgren@linux.intel.com" <tony.lindgren@linux.intel.com>,
	"Annapurve, Vishal" <vannapurve@google.com>,
	"Shahar, Sagi" <sagis@google.com>,
	"djbw@kernel.org" <djbw@kernel.org>,
	"tglx@kernel.org" <tglx@kernel.org>,
	"paulmck@kernel.org" <paulmck@kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>, "bp@alien8.de" <bp@alien8.de>,
	"yilun.xu@linux.intel.com" <yilun.xu@linux.intel.com>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH v9 13/23] x86/virt/seamldr: Abort updates after a failed step
Date: Tue, 19 May 2026 02:34:31 +0000	[thread overview]
Message-ID: <c0c5edb3cf4dfeee986a478aa3db886aa8d5db38.camel@intel.com> (raw)
In-Reply-To: <20260513151045.1420990-14-chao.gao@intel.com>

On Wed, 2026-05-13 at 08:09 -0700, Chao Gao wrote:
> A TDX module update is a multi-step process, and any step can fail.
> 
> The current update flow continues to later steps after an error.
> Continuing after a failure can leave the TDX module in an unrecoverable
> state.

I get what you are saying here, but "continuing" vs "leaving" is a tiny bit
confusing to me. Maybe: Continuing with subsequent update steps after a failure
can cause the TDX module to enter an unrecoverable state?

> 
> One failure case must remain recoverable: update contention with an ongoing
> TD build. The agreed kernel behavior for this case [1] is to fail the
> update with -EBUSY so userspace can retry later.

The link to the discussion is nice, but the explanation of just that there was
an agreement is not saying much. But the reasoning around AVOID_COMPAT_SENSITIVE
*is* handled in later patch. So can we say future changes will want to return
errors to userspace for certain update failures? Then we can discuss the
specifics when code is actual error is added?

And why talk about EBUSY specifically? It is not in this patch. Stale log? 

> 
> Abort the update on any failure. This also makes the TD-build contention
> case recoverable, because that failure occurs before any TDX module state
> is changed. 
> 

Oh, maybe I didn't get what you meant above actually. The contention case is
only recoverable because we detect it at the first step? Does "Continuing after
a failure can leave the TDX module in an unrecoverable" really mean that any
failure after the first step is unrecoverable? Or can we put it in some other
more specific terms like that. Terms which are more specific but still not
overly complex description of TDX module update flows?

> Apply the same rule to all errors instead of special-casing
> -EBUSY.

It seems like actually it is not special cased...? The error returned is
whatever is returned from the step.

> 
> Track per-step failures, stop the update loop once a failure is observed,
> and do not advance the state machine to the next step.

Hmm, so this is actually a bunch of generic handling for each step, that really
only works for the first one? Is the generic handling really needed?

> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Xu Yilun <yilun.xu@linux.intel.com>
> Reviewed-by: Tony Lindgren <tony.lindgren@linux.intel.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> Link: https://lore.kernel.org/linux-coco/aQFmOZCdw64z14cJ@google.com/ # [1]
> ---
> v9:
>   - Avoid nested if/else by deferring failure accounting to ack_state().
>   - Reduce indentation of the main flow.
>   - Convert the failed flag into a counter. This avoids a conditional
>     update of the flag; the counter can simply accumulate failures.
> ---
>  arch/x86/virt/vmx/tdx/seamldr.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/virt/vmx/tdx/seamldr.c b/arch/x86/virt/vmx/tdx/seamldr.c
> index 7befe4a08f33..48fe71319fea 100644
> --- a/arch/x86/virt/vmx/tdx/seamldr.c
> +++ b/arch/x86/virt/vmx/tdx/seamldr.c
> @@ -170,6 +170,7 @@ enum module_update_state {
>  static struct update_ctrl {
>  	enum module_update_state state;
>  	int num_ack;
> +	int num_failed;

Was there past discussion on why it keeps a failed count? All we need to know is
if anything failed right? So a bool is fine too?

>  	/*
>  	 * Protect update_ctrl. Raw spinlock as it will be acquired from
>  	 * interrupt-disabled contexts.
> @@ -187,12 +188,13 @@ static void __set_target_state(struct update_ctrl *ctrl,
>  }
>  
>  /* Last one to ack a state moves to the next state. */
> -static void ack_state(struct update_ctrl *ctrl)
> +static void ack_state(struct update_ctrl *ctrl, int result)
>  {
>  	raw_spin_lock(&ctrl->lock);
>  
> +	ctrl->num_failed += !!result;
>  	ctrl->num_ack++;
> -	if (ctrl->num_ack == num_online_cpus())
> +	if (ctrl->num_ack == num_online_cpus() && !ctrl->num_failed)
>  		__set_target_state(ctrl, ctrl->state + 1);
>  
>  	raw_spin_unlock(&ctrl->lock);
> @@ -202,6 +204,7 @@ static void init_state(struct update_ctrl *ctrl)
>  {
>  	raw_spin_lock_init(&ctrl->lock);
>  	__set_target_state(ctrl, MODULE_UPDATE_START + 1);
> +	ctrl->num_failed = 0;
>  }
>  
>  /*
> @@ -228,8 +231,8 @@ static int do_seamldr_install_module(void *seamldr_params)
>  			break;
>  		}
>  
> -		ack_state(&update_ctrl);
> -	} while (curstate != MODULE_UPDATE_DONE);
> +		ack_state(&update_ctrl, ret);
> +	} while (curstate != MODULE_UPDATE_DONE && !READ_ONCE(update_ctrl.num_failed));
>  
>  	return ret;
>  }

next prev parent reply	other threads:[~2026-05-19  2:34 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-13 15:09 [PATCH v9 00/23] Runtime TDX module update support Chao Gao
2026-05-13 15:09 ` [PATCH v9 01/23] x86/virt/tdx: Consolidate TDX global initialization states Chao Gao
2026-05-15 16:14   ` Dave Hansen
2026-05-18  7:43     ` Chao Gao
2026-05-18 18:00       ` Edgecombe, Rick P
2026-05-18 18:09         ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 02/23] x86/virt/tdx: Move TDX_FEATURES0 bits to asm/tdx.h Chao Gao
2026-05-15 16:15   ` Dave Hansen
2026-05-18  7:52     ` Chao Gao
2026-05-18 16:57       ` Edgecombe, Rick P
2026-05-19  1:59         ` Edgecombe, Rick P
2026-05-19 10:24           ` Chao Gao
2026-05-13 15:09 ` [PATCH v9 03/23] x86/virt/tdx: Move low level SEAMCALL helpers out of <asm/tdx.h> Chao Gao
2026-05-13 15:09 ` [PATCH v9 04/23] coco/tdx-host: Introduce a "tdx_host" device Chao Gao
2026-05-15 16:21   ` Dave Hansen
2026-05-18 11:18     ` Chao Gao
2026-05-18 18:08       ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 05/23] coco/tdx-host: Expose TDX module version Chao Gao
2026-05-15 16:53   ` Dave Hansen
2026-05-18 11:29     ` Chao Gao
2026-05-13 15:09 ` [PATCH v9 06/23] x86/virt/seamldr: Introduce a wrapper for P-SEAMLDR SEAMCALLs Chao Gao
2026-05-15 17:02   ` Dave Hansen
2026-05-15 17:07   ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 07/23] x86/virt/seamldr: Add a helper to retrieve P-SEAMLDR information Chao Gao
2026-05-15 17:18   ` Dave Hansen
2026-05-18 11:51     ` Chao Gao
2026-05-13 15:09 ` [PATCH v9 08/23] coco/tdx-host: Expose P-SEAMLDR information via sysfs Chao Gao
2026-05-15 17:23   ` Dave Hansen
2026-05-18 12:35     ` Chao Gao
2026-05-15 17:28   ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 09/23] coco/tdx-host: Don't expose P-SEAMLDR information on CPUs with erratum Chao Gao
2026-05-15 17:26   ` Dave Hansen
2026-05-18 12:44     ` Chao Gao
2026-05-18 15:29       ` Dave Hansen
2026-05-19  1:22         ` Edgecombe, Rick P
2026-05-13 15:09 ` [PATCH v9 10/23] coco/tdx-host: Implement firmware upload sysfs ABI for TDX module updates Chao Gao
2026-05-15 18:05   ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 11/23] x86/virt/seamldr: Allocate and populate a module update request Chao Gao
2026-05-15  6:05   ` Chao Gao
2026-05-15 18:24   ` Dave Hansen
2026-05-15 18:44     ` Edgecombe, Rick P
2026-05-15 18:59       ` Dave Hansen
2026-05-18 14:15     ` Chao Gao
2026-05-18 15:12       ` Dave Hansen
2026-05-15 19:43   ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 12/23] x86/virt/seamldr: Introduce skeleton for TDX module updates Chao Gao
2026-05-18 15:36   ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 13/23] x86/virt/seamldr: Abort updates after a failed step Chao Gao
2026-05-19  2:34   ` Edgecombe, Rick P [this message]
2026-05-19 10:20     ` Chao Gao
2026-05-13 15:09 ` [PATCH v9 14/23] x86/virt/seamldr: Shut down the current TDX module Chao Gao
2026-05-19  3:00   ` Edgecombe, Rick P
2026-05-19 12:05     ` Chao Gao
2026-05-19 16:24       ` Dave Hansen
2026-05-13 15:09 ` [PATCH v9 15/23] x86/virt/tdx: Reset software states during TDX module shutdown Chao Gao
2026-05-13 15:09 ` [PATCH v9 16/23] x86/virt/seamldr: Install a new TDX module Chao Gao
2026-05-13 15:10 ` [PATCH v9 17/23] x86/virt/seamldr: Do TDX per-CPU initialization after module installation Chao Gao
2026-05-13 15:10 ` [PATCH v9 18/23] x86/virt/tdx: Restore TDX module state Chao Gao
2026-05-13 15:10 ` [PATCH v9 19/23] x86/virt/tdx: Refresh TDX module version after update Chao Gao
2026-05-19  3:16   ` Edgecombe, Rick P
2026-05-19 10:42     ` Chao Gao
2026-05-13 15:10 ` [PATCH v9 20/23] x86/virt/tdx: Reject updates during compatibility-sensitive operations Chao Gao
2026-05-15  6:12   ` Chao Gao
2026-05-13 15:10 ` [PATCH v9 21/23] x86/virt/tdx: Enable TDX module runtime updates Chao Gao
2026-05-13 15:10 ` [PATCH v9 22/23] coco/tdx-host: Document TDX module update compatibility criteria Chao Gao
2026-05-13 15:10 ` [PATCH v9 23/23] x86/virt/tdx: Document TDX module update Chao Gao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c0c5edb3cf4dfeee986a478aa3db886aa8d5db38.camel@intel.com \
    --to=rick.p.edgecombe@intel.com \
    --cc=binbin.wu@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=chao.gao@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=djbw@kernel.org \
    --cc=hpa@zytor.com \
    --cc=ira.weiny@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kas@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nik.borisov@suse.com \
    --cc=paulmck@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=reinette.chatre@intel.com \
    --cc=sagis@google.com \
    --cc=seanjc@google.com \
    --cc=tglx@kernel.org \
    --cc=tony.lindgren@linux.intel.com \
    --cc=vannapurve@google.com \
    --cc=vishal.l.verma@intel.com \
    --cc=x86@kernel.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yan.y.zhao@intel.com \
    --cc=yilun.xu@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox