linux-coco.lists.linux.dev archive mirror
From: <dan.j.williams@intel.com>
To: Dave Hansen <dave.hansen@intel.com>, <dan.j.williams@intel.com>,
	Chao Gao <chao.gao@intel.com>
Cc: Vishal Annapurve <vannapurve@google.com>,
	"Reshetova, Elena" <elena.reshetova@intel.com>,
	"linux-coco@lists.linux.dev" <linux-coco@lists.linux.dev>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"Chatre, Reinette" <reinette.chatre@intel.com>,
	"Weiny, Ira" <ira.weiny@intel.com>,
	"Huang, Kai" <kai.huang@intel.com>,
	"yilun.xu@linux.intel.com" <yilun.xu@linux.intel.com>,
	"sagis@google.com" <sagis@google.com>,
	"paulmck@kernel.org" <paulmck@kernel.org>,
	"nik.borisov@suse.com" <nik.borisov@suse.com>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	"Kirill A. Shutemov" <kas@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH v2 00/21] Runtime TDX Module update support
Date: Fri, 24 Oct 2025 14:12:37 -0700
Message-ID: <68fbebc54e776_10e9100fd@dwillia2-mobl4.notmuch>
In-Reply-To: <2e49e80f-fab0-4248-8dae-76543e3c6ae3@intel.com>

Dave Hansen wrote:
> On 10/24/25 12:40, dan.j.williams@intel.com wrote:
> > Dave Hansen wrote:
> >> On 10/24/25 00:43, Chao Gao wrote:
> >> ...
> >>> Beyond "the kvm_tdx object gets torn down during a build," I see two potential
> >>> issues:
> >>>
> >>> 1. TD Build and TDX migration aren't purely kernel processes -- they span multiple
> >>>    KVM ioctls. Holding a read-write lock throughout the entire process would
> >>>    require exiting to userspace while the lock is held. I think this is
> >>>    irregular, but I'm not sure if it's acceptable for read-write semaphores.
> >>
> >> Sure, I guess it's irregular. But look at it this way: let's say we
> >> concocted some scheme to use a TD build refcount and a module update
> >> flag, had them both wait_event_interruptible() on each other, and then
> >> did wakeups. That would get the same semantics without an rwsem.
> > 
> > This sounds unworkable to me.
> > 
> > First, you cannot return to userspace while holding a lock. Lockdep will
> > rightfully scream:
> > 
> >     "WARNING: lock held when returning to user space!"
> 
> Well, yup, it sure does look that way for normal lockdep-annotated lock
> types. It does seem like a sane rule to have for most things.
> 
> But, just to be clear, this is a lockdep thing and a good, solid
> semantic to have. It's not a rule that no kernel locking structure can
> ever be held when returning to userspace.

Sure, but I would submit that the lesser-known cousin of the common
suggestion "do not write your own locking primitives" is "do not invent
locking schemes that involve holding locks over return to userspace". It
is so rarely a good idea that lockdep warns about it by default.

> > The complexity of ensuring that a multi-stage ABI transaction completes
> > from the kernel side is painful. If that process dies in the middle of
> > its ABI sequence who cleans up these references?
> 
> The 'struct kvm_tdx' has to get destroyed at some point.

Then a process that goes out to lunch and fails to destroy kvm_tdx in a
reasonable timeframe causes indefinite hangs, with knock-on effects.

[..]
> > The operational mechanism to make sure that one process flow does not
> > mess up another process flow is for those process to communicate with
> > *userspace* file locks, or for those process to check for failures after
> > the fact and retry. Unless you can make the build side an atomic ABI,
> > this is a documentation + userspace problem, not a kernel problem.
> 
> Yeah, that's a totally valid take on it.
> 
> My only worry is that the module update is going to be off in another
> world from the thing building TDs. We had a similar set of challenges
> around microcode updates, CPUSVN and SGX enclaves.
> 
> The guy doing "echo 1 > /sys/.../whatever" wasn't coordinating with
> every entity on the system that might run an SGX enclave. It certainly
> didn't help that enclave creation is typically done by unprivileged
> users. Maybe the KVM/TDX world is a _bit_ more narrow and they will be
> talking to each other, or the /dev/kvm permissions will be a nice funnel
> to get them talking to each other.
> 
> The SGX solution, btw, was to at least ensure forward progress (CPUSVN
> update) when the last enclave goes away. So new enclaves aren't
> *prevented* from starting but the window when the first one starts
> (enclave count going from 0->1) is leveraged to do the update.

The status quo does ensure forward progress. The TD does get built and
the update does complete; what remains is just the small matter of TD
attestation failures, right?

Note, we had a similar problem with the tsm_report interface which,
because it is configfs and not an ioctl, is a multi-stage ABI to build a
report. If 2 threads collide in building an object, userspace indeed
gets to keep the pieces, but there are:

1/ Documentation of the potential for collisions

2/ A mechanism to detect collisions. See
   /sys/kernel/config/tsm/report/$name/generation in
   Documentation/ABI/testing/configfs-tsm-report
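
The detection flow in 2/ can be sketched roughly as below. This is a
minimal Python sketch, not a reference client. It assumes the generation
attribute increments once per write to inblob (or to any option), and
that one inblob write is the only write this caller makes. The names
read_generation, collided, build_report and ReportCollision are
hypothetical, chosen only for illustration.

```python
import os


class ReportCollision(Exception):
    """Another writer touched the report entry during our transaction."""


def read_generation(entry):
    # 'generation' is the read-only counter attribute of the report entry.
    with open(os.path.join(entry, "generation")) as f:
        return int(f.read().strip())


def collided(gen_before, gen_after, writes):
    # Assumption: each of our writes bumps the counter by exactly one, so
    # any other delta means somebody else wrote to the same entry.
    return gen_after != gen_before + writes


def build_report(entry, inblob):
    gen_before = read_generation(entry)
    with open(os.path.join(entry, "inblob"), "wb") as f:
        f.write(inblob)
    with open(os.path.join(entry, "outblob"), "rb") as f:
        report = f.read()
    # Check after the fact; the caller is expected to retry on collision.
    if collided(gen_before, read_generation(entry), writes=1):
        raise ReportCollision(entry)
    return report
```

A caller would wrap build_report() in a retry loop, which is the "check
for failures after the fact and retry" model rather than a kernel lock.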

I really would not worry about the "off in another world" problem; it is
par for the course for datacenter operations. I encountered prolific use
of file locks in operations scripts during my time at Facebook. Think of
problems like coordinating disk partitioning across various provisioning
flows. The kernel happily lets 2 fdisk processes race to write a
partition table. The only way to ensure a consistent result in that case
is userspace sequencing, not a kernel lock while some process has a
partition table open.
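
For illustration, userspace sequencing of this kind can be as small as
an flock(2) wrapper shared by the cooperating tools. The Python sketch
below is an assumption about how such scripts might look, not something
taken from this series: the helper name host_lock and the idea that TD
builders take the lock shared while the module updater takes it
exclusive are hypothetical. It relies only on fcntl.flock() from the
standard library.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def host_lock(path, shared=False, blocking=True):
    """Hold an advisory flock on 'path' for the duration of the block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        op = fcntl.LOCK_SH if shared else fcntl.LOCK_EX
        if not blocking:
            # Fail with BlockingIOError instead of waiting for the holder.
            op |= fcntl.LOCK_NB
        fcntl.flock(fd, op)
        yield fd
    finally:
        # Closing the descriptor drops the lock. The same happens when the
        # process dies, so a crashed holder cannot wedge the other side.
        os.close(fd)
```

Under that assumed convention a TD launcher would run its build ioctls
inside "with host_lock(path, shared=True):" and the update tool would
write the new module inside "with host_lock(path):". Nothing in the
kernel enforces this; tools that skip the lock still race.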
