public inbox for linux-coco@lists.linux.dev
 help / color / mirror / Atom feed
From: Dan Williams <djbw@kernel.org>
To: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	 "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	 "linux-coco@lists.linux.dev" <linux-coco@lists.linux.dev>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	 "Gao, Chao" <chao.gao@intel.com>
Cc: "Li, Xiaoyao" <xiaoyao.li@intel.com>,
	 "Huang, Kai" <kai.huang@intel.com>,
	 "Zhao, Yan Y" <yan.y.zhao@intel.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"kas@kernel.org" <kas@kernel.org>,
	 "Chatre, Reinette" <reinette.chatre@intel.com>,
	"seanjc@google.com" <seanjc@google.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"binbin.wu@linux.intel.com" <binbin.wu@linux.intel.com>,
	 "Verma, Vishal L" <vishal.l.verma@intel.com>,
	"nik.borisov@suse.com" <nik.borisov@suse.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	 "Weiny, Ira" <ira.weiny@intel.com>,
	"tony.lindgren@linux.intel.com" <tony.lindgren@linux.intel.com>,
	"Annapurve, Vishal" <vannapurve@google.com>,
	"sagis@google.com" <sagis@google.com>,
	"djbw@kernel.org" <djbw@kernel.org>,
	"tglx@kernel.org" <tglx@kernel.org>,
	"paulmck@kernel.org" <paulmck@kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>,  "bp@alien8.de" <bp@alien8.de>,
	"yilun.xu@linux.intel.com" <yilun.xu@linux.intel.com>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH v7 17/22] x86/virt/tdx: Avoid updates during update-sensitive operations
Date: Tue, 14 Apr 2026 14:43:31 -0700	[thread overview]
Message-ID: <69deb5032ff01_147c801004c@djbw-dev.notmuch> (raw)
In-Reply-To: <af7732a08fcce4a21ff14b2038acbb85200c6f34.camel@intel.com>

Edgecombe, Rick P wrote:
[..]
> > If an update races TD build, for example, TD measurement can become
> > incorrect and attestation can fail.
> > 
> > The TDX architecture exposes two approaches:
> > 
> > 1) Avoid updates during update-sensitive operations.
> > 2) Detect incompatibility after update and recover.
> > 
> > Post-update detection (option #2) is not a good fit: as discussed in [1],
> > future module behavior may expand update-sensitive operations in ways that
> > make KVM ABIs unstable and will break userspace.
> > 
> > "Do nothing" is also not preferred: while it keeps kernel code simple, it
> > lets the issue leak into the broader stack, where both detection and
> > recovery require significantly more effort.
> 
> This subject has had a lot of debate (as linked below in the log), but the way
> this is written leaves a lot of questions. "do nothing" is not an option it
> says, but the code does just that when UPDATE_COMPAT_SENSITIVE is not supported.

Right, but that is not the kernel's problem. If you run updates on
a module that does not support conflict detection, you get to keep the
pieces.

Like other cases of Linux not wanting to deal with the eccentricities of
early modules, just have userspace know about a minimum module version
where this protocol exists and accept the risk otherwise. No pressing
need to burden the kernel with carrying worry for early modules.

> > So, use option #1. Specifically, request "avoid update-sensitive" behavior
> > during TDX module shutdown and map the resulting failure to -EBUSY so
> > userspace can distinguish an update race from other failures.
> > 
> > When the "avoid update-sensitive" feature isn't supported, proceed with
> > updates. If a race occurs between module update and update-sensitive
> > operations, failures happen at a later stage (e.g., incorrect TD
> > measurements in attestation reports for TD build). Effectively, this
> > means "let userspace update at their own risk".
> > 
> 
> Above it says we can't just do nothing, we need the flag. And then this argues
> that we can do nothing because we can rely on userspace to deal with the
> issue... This log is maybe just trying to put a brave face on an imperfect
> compromise?

Not really, the log could be simplified to just say "module versions < X
are not safe for runtime update". In general if the kernel had a minimum
supported module version concept it could collect all of these early
deprecation conditions under one check.

> So, while I don't want to re-open the debate, I'm not sure the patch
> justification is going to pass scrutiny as is.
> 
> In the link [2], Dan says "Do not make Linux carry short lived one-off
> complexity", and also "Do not include logic to disable updates, document the
> expectation in the tool."
> 
> It seems this does not exclude the option to just to always pass the compat
> flag. Basically assume that the TDX module will always support
> UPDATE_COMPAT_SENSITIVE if it supports TDX module updates. Which I guess we
> should expect should eventually be true.

Right, assume a minimum module version.

> In [2] Dan was also against checking the UPDATE_COMPAT_SENSITIVE feature0 bit to
> gate the feature.

...because if it must be supported, why check?

> For the record, I don't like allowing the update without the compat bit set, and
> my concern has nothing to do with userspace roles and responsibilities. Instead
> it's because we are over budget on complexity for handling SEAMCALL errors
> within KVM and this makes things worse to keep track of.
> 
> tdh_mem_page_add() does a KVM_BUG_ON() if it sees a non-busy error. Imagine
> working on this code and considering if it is a valid KVM_BUG_ON()? After this
> patch, the answer is...well sometimes. It depends on the previous modules
> specific feature0 bits, an understanding on admins expectations, and the
> behavior of some far away code in arch/x86. Gah.

Why would it be variable? The user tried update on a module that the
kernel deemed unfit for update. "Doctor, it KVM_BUG_ON()s when I run
update".

> Actually, the diff Dan objected to was checking and printing a specific helpful
> error. Maybe he does not mind much more simply checking an extra bit in
> tdx_supports_runtime_update()? Otherwise, I'd think to just unconditionally pass
> UPDATE_COMPAT_SENSITIVE without checking for support. Essentially mandate that
> it is always supported if TDX module update is supported.

In the end all of the hand wringing we are doing is misplaced. The root
of the problem is the TDX Module claiming "runtime update supported" to
include these corner cases of "but we corrupt things if the update tries
to replace the crypto library at the wrong time".

That problem is also solvable by not classifying those updates as
runtime compatible, or defining a protocol to mitigate the collisions.

Neither of those was chosen. Instead the module implemented 2 additional
complications Linux chose one that fails update, other VMMs chose one
that maximizes updates at the cost of needing to restart TD
construction.

I think the changelog is a bit non-commital trying to be diplomatic
about the mess. Simply, Linux wanted the easy button, all runtime
updates are safe. Instead, module exported complexity and optionality.
KVM voted for one flavor of that optionality to accommodate the module
complexity.

The solution, make modules with the option KVM wants the min requirement
and move on.

  reply	other threads:[~2026-04-14 21:43 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-31 12:41 [PATCH v7 00/22] Runtime TDX module update support Chao Gao
2026-03-31 12:41 ` [PATCH v7 01/22] x86/virt/tdx: Move low level SEAMCALL helpers out of <asm/tdx.h> Chao Gao
2026-04-10 23:42   ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 02/22] coco/tdx-host: Introduce a "tdx_host" device Chao Gao
2026-03-31 12:41 ` [PATCH v7 03/22] coco/tdx-host: Expose TDX module version Chao Gao
2026-03-31 12:41 ` [PATCH v7 04/22] x86/virt/seamldr: Introduce a wrapper for P-SEAMLDR SEAMCALLs Chao Gao
2026-04-10 23:58   ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 05/22] x86/virt/seamldr: Add a helper to retrieve P-SEAMLDR information Chao Gao
2026-04-11  0:13   ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 06/22] coco/tdx-host: Expose P-SEAMLDR information via sysfs Chao Gao
2026-03-31 14:58   ` Dave Hansen
2026-04-01  1:57     ` Chao Gao
2026-03-31 14:58   ` Dave Hansen
2026-04-01  2:25     ` Chao Gao
2026-04-13 19:08   ` Edgecombe, Rick P
2026-04-14 11:20     ` Chao Gao
2026-04-14 17:02       ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 07/22] coco/tdx-host: Implement firmware upload sysfs ABI for TDX module updates Chao Gao
2026-03-31 15:04   ` Dave Hansen
2026-04-01  3:10     ` Chao Gao
2026-03-31 15:11   ` Dave Hansen
2026-04-01  7:49     ` Chao Gao
2026-04-11  0:26   ` Edgecombe, Rick P
2026-04-14  9:50     ` Chao Gao
2026-04-14 17:04       ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 08/22] x86/virt/seamldr: Allocate and populate a module update request Chao Gao
2026-03-31 15:44   ` Dave Hansen
2026-04-01  8:27     ` Chao Gao
2026-04-11  0:33       ` Edgecombe, Rick P
2026-04-11  1:14   ` Edgecombe, Rick P
2026-04-14  9:43     ` Chao Gao
2026-04-14 17:37       ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 09/22] x86/virt/seamldr: Introduce skeleton for TDX module updates Chao Gao
2026-04-07 11:49   ` Chao Gao
2026-04-07 15:55     ` Dave Hansen
2026-04-11  1:23   ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 10/22] x86/virt/seamldr: Abort updates if errors occurred midway Chao Gao
2026-04-11  1:26   ` Edgecombe, Rick P
2026-04-14  9:59     ` Chao Gao
2026-04-14 17:41       ` Edgecombe, Rick P
2026-04-15  2:59         ` Chao Gao
2026-03-31 12:41 ` [PATCH v7 11/22] x86/virt/seamldr: Shut down the current TDX module Chao Gao
2026-04-07 11:51   ` Chao Gao
2026-04-11  1:35   ` Edgecombe, Rick P
2026-04-11  1:36     ` Edgecombe, Rick P
2026-04-14 10:09     ` Chao Gao
2026-04-14 17:34       ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 12/22] x86/virt/tdx: Reset software states during TDX module shutdown Chao Gao
2026-04-07 12:02   ` Chao Gao
2026-04-11  1:56   ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 13/22] x86/virt/seamldr: Install a new TDX module Chao Gao
2026-04-11  2:01   ` Edgecombe, Rick P
2026-04-14 10:19     ` Chao Gao
2026-04-14 17:35       ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 14/22] x86/virt/seamldr: Do TDX per-CPU initialization after updates Chao Gao
2026-04-11  2:03   ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 15/22] x86/virt/tdx: Restore TDX module state Chao Gao
2026-04-07 12:07   ` Chao Gao
2026-04-11  2:06     ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 16/22] x86/virt/tdx: Update tdx_sysinfo and check features post-update Chao Gao
2026-04-07 12:15   ` Chao Gao
2026-04-07 15:53     ` Dave Hansen
2026-04-08 12:16       ` Chao Gao
2026-03-31 12:41 ` [PATCH v7 17/22] x86/virt/tdx: Avoid updates during update-sensitive operations Chao Gao
2026-04-06 22:29   ` Sean Christopherson
2026-04-14 19:58   ` Edgecombe, Rick P
2026-04-14 21:43     ` Dan Williams [this message]
2026-04-14 22:20       ` Edgecombe, Rick P
2026-04-15  0:36         ` Dan Williams
2026-04-15  0:52           ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 18/22] coco/tdx-host: Don't expose P-SEAMLDR features on CPUs with erratum Chao Gao
2026-04-13 19:28   ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 19/22] x86/virt/tdx: Enable TDX module runtime updates Chao Gao
2026-04-13 19:40   ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 20/22] coco/tdx-host: Document TDX module update compatibility criteria Chao Gao
2026-03-31 12:41 ` [PATCH v7 21/22] x86/virt/tdx: Document TDX module update Chao Gao
2026-04-13 19:54   ` Edgecombe, Rick P
2026-03-31 12:41 ` [PATCH v7 22/22] x86/virt/seamldr: Log TDX module update failures Chao Gao
2026-04-13 20:04   ` Edgecombe, Rick P
2026-04-14 10:25     ` Chao Gao
2026-04-14 17:39       ` Edgecombe, Rick P
2026-04-15  3:01         ` Chao Gao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=69deb5032ff01_147c801004c@djbw-dev.notmuch \
    --to=djbw@kernel.org \
    --cc=binbin.wu@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=chao.gao@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=ira.weiny@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kas@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nik.borisov@suse.com \
    --cc=paulmck@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=reinette.chatre@intel.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=sagis@google.com \
    --cc=seanjc@google.com \
    --cc=tglx@kernel.org \
    --cc=tony.lindgren@linux.intel.com \
    --cc=vannapurve@google.com \
    --cc=vishal.l.verma@intel.com \
    --cc=x86@kernel.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yan.y.zhao@intel.com \
    --cc=yilun.xu@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox