Linux Confidential Computing Development

Linux Confidential Computing Development
 help / color / mirror / Atom feed

* Re: [PATCH v2 5/6] KVM: x86: Track available/dirty register masks as "unsigned long" values
From: Xiaoyao Li @ 2026-04-15 11:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kai Huang, Chang Seok Bae, kvm@vger.kernel.org,
	pbonzini@redhat.com, kas@kernel.org, linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev, x86@kernel.org
In-Reply-To: <ad5JeT7UuiQI9Tqo@google.com>

On 4/14/2026 10:04 PM, Sean Christopherson wrote:
> On Tue, Apr 14, 2026, Xiaoyao Li wrote:
>> On 4/14/2026 7:03 AM, Huang, Kai wrote:
>>>> Because VMX and SVM make all GRPs available immediately, except
>>>> for RSP, KVM ignores avail/dirty for GPRs.  I.e. "fixing" TDX will just shift the
>>>> "bugs" elsewhere.
>>> Just want to understand:
>>>
>>> I thought the fix could be we simply remove the wrong GPRs from the list.
>>> Not sure how fixing TDX will shift bugs elsewhere?
>>
>> I'm curious too.
> 
> What I'm saying is that, _if_ there are bugs where KVM uses a register that isn't
> available, then modifying TDX's list won't actually fix anything (without more
> changes), it will just change which code is technically buggy (hence all the quotes
> above).
> 
>>>> More importantly, because the TDX-Module*requires* RCX (the GPR that holds the
>>>> mask of registers to expose to the VMM) to be hidden on TDVMCALL, KVM*can't*
>>>> do any kind of meaningful "available" tracking.
>>>>
>>> Hmm I think RCX conveys the shared GPRs and VMM can read.  Per "Table 5.323:
>>> TDH.VP.ENTER Output Operands Format #5 Definition: On TDCALL(TDG.VP.VMCALL)
>>> Following a TD Entry":
>>>
>>>     RCX   ...
>>> 	Bit(s) Name         Description
>>>
>>> 	31:0   PARAMS_MASK  Value as passed into TDCALL(TDG.VP.VMCALL) by
>>> 			    the guest TD: indicates which part of the guest
>>> 			    TD GPR and XMM state is passed as-is to the
>>> VMM
>>> 			    and back. For details, see the description of
>>> 			    TDG.VP.VMCALL in 5.5.26.
>>>
>>> I think the problem is, as said previously, currently KVM TDX code uses
>>> KVM's existing infrastructure to emulate MSR, KVM hypercall etc,  but
>>> TDVMCALL has a different ABI, thus there's a mismatch here.
>>
>> I once had patch for it internally.
>>
>> It adds back the available check for GPRs when accessing instead of assuming
>> they are always available. For normal VMX and SVM, all the GPRs are still
>> always available. But for TDX, only EXIT_INFO_1 and EXIT_INFO_2 are always
>> marked available, while others need to be explicitly set case by case.
>>
>> The good thing is it makes TDX safer that KVM won't consume invalid data
>> silently for TDX. But it adds additional overhead of checking the
>> unnecessary register availability for VMX and SVM case.
>>
>> -----------------------------&<-------------------------------------
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>> Date: Tue, 11 Mar 2025 07:13:29 -0400
>> Subject: [PATCH] KVM: x86: Add available check for GPRs
>>
>> Since commit de3cd117ed2f ("KVM: x86: Omit caching logic for
>> always-available GPRs"), KVM doesn't check the availability of GPRs
>> except RSP and RIP when accessing them, because they are always
>> available.
>>
>> However, it's not true when it comes to TDX. The GPRs are not available
>> after TD vcpu exits actually.
> 
>> And it relies on KVM manually sets the
>> GPRs value when needed, e.g.
>>
>>   - setting rax, rbx, rcx, rdx, rsi, for hypercall emulation in
>>     tdx_emulate_tdvmall();
>>
>>   - setting rax, rcx and rdx before MSR write emulation;
>>
>> Add the available check of GPRs read, and WARN_ON_ONCE() when unavailable.
>> It can help capture the cases of undesired GPRs consumption by TDX.
> 
> Sorry, but NAK.  I am strongly against adding any code to the GPR accessors/mutators
> just for TDX.  It's a _lot_ of code.  From commit de3cd117ed2f ("KVM: x86: Omit
> caching logic for always-available GPRs"):
> 
>      E.g. on x86_64, kvm_emulate_cpuid() is reduced from 342 to 182 bytes and
>      kvm_emulate_hypercall() from 1362 to 1143, with the total size of KVM
>      dropping by ~1000 bytes.  With CONFIG_RETPOLINE=y, the numbers are even
>      more pronounced, e.g.: 353->182, 1418->1172 and well over 2000 bytes.

yeah. I had the same feeling that bring up the reduced overhead is not 
acceptable.

> Note that updating only the "available" masks is wrong, as TDX needs to marshall
> written registers back to their correct location.

In what case it needs to marshall written registers back? Can you elaborate?

> In the end, the available/dirty tracking isn't about hardening against bugs, it's
> about deferring expensive VMREAD and VMWRITE (and guest memory) operations until
> action is required.
> 
> We could bury sanity checks behind a Kconfig of some kind, but I genuinely don't
> see much value in doing so.  These emulation flows are very static (all register
> usage is hardcoded), and so it's very much a "get it right once" sort of thing,
> i.e. the odds of a runtime check finding a bug after initial development are
> basically zero.

The initial purpose of writing the code was to find if any case/path in 
KVM that consumes the GPRs for TDX unexpectedly. Not only for the 
hypercall/MSR emulation paths. There are lots of paths in KVM consuming 
the GPRs. It's difficult to aduit every path to ensure either a) it 
won't be reachable by TDX or b) KVM syncs the valid data to the GPRs 
before accessed by TDX.

> An alternative for TDX would be to avoid bouncing through GPRs in the first place,
> e.g. by reworking __kvm_emulate_rdmsr() to not access any registers.  But I'm
> probably opposed to even that, because I doubt the end result would be an overall
> net positive for KVM.  We'd end up with duplicate code, harder to read common
> code (because of the new abstractions), and likely without meaningfully moving
> the needle in terms of finding/preventing bugs.  KVM still needs to get operands
> to/from the right parameters, though only difference is that for TDX, the parameters
> would be very "direct".


^ permalink raw reply

* Re: [PATCH v7 08/22] x86/virt/seamldr: Allocate and populate a module update request
From: Chao Gao @ 2026-04-15 11:04 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: kvm@vger.kernel.org, Li, Xiaoyao, linux-coco@lists.linux.dev,
	Huang, Kai, x86@kernel.org, Zhao, Yan Y,
	dave.hansen@linux.intel.com, kas@kernel.org, mingo@redhat.com,
	Weiny, Ira, pbonzini@redhat.com, Chatre, Reinette,
	Verma, Vishal L, nik.borisov@suse.com, seanjc@google.com,
	linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com,
	Annapurve, Vishal, hpa@zytor.com, sagis@google.com,
	tony.lindgren@linux.intel.com, paulmck@kernel.org,
	tglx@kernel.org, yilun.xu@linux.intel.com,
	dan.j.williams@intel.com, bp@alien8.de
In-Reply-To: <1016261178d32a5c6f20d1507f28388dd01a15df.camel@intel.com>

>> > > +static struct seamldr_params *init_seamldr_params(const u8 *data, u32 size)
>> > > +{
>> > > +	const struct tdx_blob *blob = (const void *)data;
>> > > +	int module_size, sig_size;
>> > > +	const void *sig, *module;
>> > > +
>> > > +	/*
>> > > +	 * Ensure the size is valid otherwise reading any field from the
>> > > +	 * blob may overflow.
>> > > +	 */
>> > > +	if (size <= sizeof(struct tdx_blob) || size <= blob->offset_of_module)
>> > > +		return ERR_PTR(-EINVAL);
>> > > +
>> > > +	if (blob->version != TDX_BLOB_VERSION_1)
>> > > +		return ERR_PTR(-EINVAL);
>> > > +
>> > > +	if (blob->reserved0 || memchr_inv(blob->reserved1, 0, sizeof(blob->reserved1)))
>> > > +		return ERR_PTR(-EINVAL);
>> > > +
>> > > +	/* Split the blob into a sigstruct and a module. */
>> > > +	sig		= blob->data;
>> > > +	sig_size	= blob->offset_of_module - sizeof(struct tdx_blob);
>> > > +	module		= data + blob->offset_of_module;
>> > > +	module_size	= size - blob->offset_of_module;
>> > 
>> > Did you consider just passing the tdx_blob into alloc_seamldr_params()?
>> > Basically, this function checks the blob fields, then alloc_seamldr_params()
>> > turns blob into  struct seamldr_params without checks. The way it is, the work
>> > seems kind of spread around two functions with various checks.
>> 
>> Fine with merging them. 
>> 
>
>I wasn't suggesting to merge them. I was suggesting to have them each do a
>dedicated thing.

Ok. See the code snippet below. Most checks are in init_seamldr_params(),
except the limit checks on module_size and sig_size. Moving them there would
require duplicating the module_size/sig_size calculations. So, I just keep the
checks next to the calculations.

static struct seamldr_params *alloc_seamldr_params(const struct tdx_blob *blob)
{
	struct seamldr_params *params;
	int module_size, sig_size;
	const void *sig, *module;
	const u8 *ptr;
	int i;

	/* Split the blob into a sigstruct and a module. */
	sig		= blob->data;
	sig_size	= blob->offset_of_module - sizeof(struct tdx_blob);
	module		= (u8 *)blob + blob->offset_of_module;
	module_size	= blob->length - blob->offset_of_module;

	if (module_size > SEAMLDR_MAX_NR_MODULE_4KB_PAGES * SZ_4K)
		return ERR_PTR(-EINVAL);

	if (sig_size > SEAMLDR_MAX_NR_SIG_4KB_PAGES * SZ_4K)
		return ERR_PTR(-EINVAL);

	params = (struct seamldr_params *)get_zeroed_page(GFP_KERNEL);
	if (!params)
		return ERR_PTR(-ENOMEM);

	/*
	 * Only use version 1 when required (sigstruct > 4KB) for backward
	 * compatibility with P-SEAMLDR that lacks version 1 support.
	 */
	params->version = sig_size > SZ_4K;
	params->scenario = SEAMLDR_SCENARIO_UPDATE;

	ptr = sig;
	for (i = 0; i < sig_size / SZ_4K; i++) {
		params->sigstruct_pa[i] = vmalloc_to_pfn(ptr) << PAGE_SHIFT;
		ptr += SZ_4K;
	}

	params->num_module_pages = module_size / SZ_4K;

	ptr = module;
	for (i = 0; i < params->num_module_pages; i++) {
		params->mod_pages_pa_list[i] = vmalloc_to_pfn(ptr) << PAGE_SHIFT;
		ptr += SZ_4K;
	}

	return params;
}

static struct seamldr_params *init_seamldr_params(const u8 *data, u32 size)
{
	const struct tdx_blob *blob = (const void *)data;

	/*
	 * Ensure the size is valid otherwise reading any field from the
	 * blob may overflow.
	 */
	if (size <= sizeof(struct tdx_blob))
		return ERR_PTR(-EINVAL);

	if (blob->version != TDX_BLOB_VERSION_1)
		return ERR_PTR(-EINVAL);

	if (blob->length != size)
		return ERR_PTR(-EINVAL);

	if (memcmp(blob->signature, "TDX-BLOB", 8))
		return ERR_PTR(-EINVAL);

	if (blob->reserved0 || memchr_inv(blob->reserved1, 0, sizeof(blob->reserved1)))
		return ERR_PTR(-EINVAL);

	/* Ensure the offset_of_module is within valid range and aligned. */
	if (blob->offset_of_module >= size ||
	    blob->offset_of_module <= sizeof(struct tdx_blob))
		return ERR_PTR(-EINVAL);

	if (!IS_ALIGNED(blob->offset_of_module, SZ_4K))
		return ERR_PTR(-EINVAL);

	return alloc_seamldr_params(blob);
}

^ permalink raw reply

* Re: [PATCH v13 00/48] arm64: Support for Arm CCA in KVM
From: Steven Price @ 2026-04-15 11:01 UTC (permalink / raw)
  To: Alper Gun
  Cc: kvm, kvmarm, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Gavin Shan, Shanker Donthineni, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve
In-Reply-To: <CABpDEukEO4Y_fg8fv5Nr1_Pw_qOK=2UmioXk=WyPEzoEprPcGA@mail.gmail.com>

On 14/04/2026 22:40, Alper Gun wrote:
> On Wed, Mar 18, 2026 at 8:54 AM Steven Price <steven.price@arm.com> wrote:
>>
>> This series adds support for running protected VMs using KVM under the
>> Arm Confidential Compute Architecture (CCA).
>>
>> New major version number! This now targets RMM v2.0-bet0[1]. And unlike
>> for Linux this represents a significant change.
>>
>> RMM v2.0 brings with it the ability to configure the RMM to have the
>> same page size as the host (so no more RMM_PAGE_SIZE and dealing with
>> granules being different from host pages). It also introduces range
>> based APIs for many operations which should be more efficient and
>> simplifies the code in places.
>>
>> The handling of the GIC has changed, so the system registers are used to
>> pass the GIC state rather than memory. This means fewer changes to the
>> KVM code as it looks much like a normal VM in this respect.
>>
>> And of course the new uAPI introduced in the previous v12 posting is
>> retained so that also remains simplified compared to earlier postings.
>>
>> The RMM support for v2.0 is still early and so this series includes a
>> few hacks to ease the integration. Of note are that there are some RMM
>> v1.0 SMCs added to paper over areas where the RMM implementation isn't
>> quite ready for v2.0, and "SROs" (see below) are deferred to the final
>> patch in the series.
>>
>> The PMU in RMM v2.0 requires more handling on the RMM-side (and
>> therefore simplifies the implementation on Linux), but this isn't quite
>> ready yet. The Linux side is implemented (but untested).
>>
>> PSCI still requires the VMM to provide the "target" REC for operations
>> that affect another vCPU. This is likely to change in a future version
>> of the specification. There's also a desire to force PSCI to be handled
>> in the VMM for realm guests - this isn't implemented yet as I'm waiting
>> for the dust to settle on the RMM interface first.
>>
>> Stateful RMI Operations
>> -----------------------
>>
>> The RMM v2.0 spec brings a new concept of Stateful RMI Operations (SROs)
>> which allow the RMM to complete an operation over several SMC calls and
>> requesting/returning memory to the host. This has the benefit of
>> allowing interrupts to be handled in the middle of an operation (by
>> returning to the host to handle the interrupt without completing the
>> operation) and enables the RMM to dynamically allocate memory for
>> internal tracking purposes. One example of this is RMI_REC_CREATE no
>> longer needs "auxiliary granules" provided upfront but can request the
>> memory needed during the RMI_REC_CREATE operation.
>>
>> There are a fairly large number of operations that are defined as SROs
>> in the specification, but current both Linux and RMM only have support
>> for RMI_REC_CREATE and RMI_REC_DESTROY. There a number of TODOs/FIXMEs
>> in the code where support is missing.
>>
>> Given the early stage support for this, the SRO handling is all confined
>> to the final patch. This patch can be dropped to return to a pre-SRO
>> state (albeit a mixture of RMM v1.0 and v2.0 APIs) for testing purposes.
>>
>> A future posting will reorder the series to move the generic SRO support
>> to an early patch and will implement the proper support for this in all
>> RMI SMCs.
>>
>> One aspect of SROs which is not yet well captured is that in some
>> circumstances the Linux kernel will need to call an SRO call in a
>> context where memory allocation is restricted (e.g. because a spinlock
>> is held). In this case the intention is that the SRO will be cancelled,
>> the spinlock dropped so the memory allocation can be completed, and then
>> the SRO restarted (obviously after rechecking the state that the
>> spinlock was protecting). For this reason the code stores the memory
>> allocations within a struct rmi_sro_state object - see the final patch
>> for more details.
>>
>> This series is based on v7.0-rc1. It is also available as a git
>> repository:
>>
>> https://gitlab.arm.com/linux-arm/linux-cca cca-host/v13
>>
>>
> 
> Hi Steven,
> 
> I have a question regarding host kexec and kdump scenarios, and
> whether there is any plan to make them work in this initial series.
> 
> Intel TDX and AMD SEV-SNP both have a firmware shutdown command that
> is invoked during the kexec or panic code paths to safely bypass
> hardware memory protections and boot into the new kernel. As far as
> I know, there is no similar global teardown command available for
> the RMM.

Correct, the RMM specification as it stands doesn't provide a mechanism
for the host to do this. The host would have to identify all the realm
guests in the system: specifically the address of the RDs (Realm
Descriptors) and RECs (Realm Execution Contexts). It needs this to tear
down the guests and be able to undelegate the memory.

It's an interesting point and I'll raise the idea of a "firmware
shutdown command" to make this more possible.

> What is the roadmap for supporting both general kexec and
> more specifically kdump (panic) scenarios with CCA?

I don't have a roadmap I'm afraid for these. kexec in theory would be
possible with KVM gracefully terminating all realms. For kdump/panic
that sort of graceful shutdown isn't really appropriate (or likely to
succeed).

There is also some RMM configuration which cannot be repeated (see
RMI_RMM_CONFIG_SET) - which implies that the kexec kernel must be
similar to the first kernel (i.e. same page size).

Thanks,
Steve

^ permalink raw reply

* Re: [PATCH kernel 4/9] dma/swiotlb: Stop forcing SWIOTLB for TDISP devices
From: Alexey Kardashevskiy @ 2026-04-15  6:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dan.j.williams, Robin Murphy, x86, linux-kernel, kvm, linux-pci,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Sean Christopherson, Paolo Bonzini,
	Andy Lutomirski, Peter Zijlstra, Bjorn Helgaas, Marek Szyprowski,
	Andrew Morton, Catalin Marinas, Michael Ellerman, Mike Rapoport,
	Tom Lendacky, Ard Biesheuvel, Ashish Kalra, Stefano Garzarella,
	Melody Wang, Seongman Lee, Joerg Roedel, Nikunj A Dadhania,
	Michael Roth, Suravee Suthikulpanit, Andi Kleen,
	Kuppuswamy Sathyanarayanan, Tony Luck, David Woodhouse,
	Greg Kroah-Hartman, Denis Efremov, Geliang Tang, Piotr Gregor,
	Michael S. Tsirkin, Alex Williamson, Arnd Bergmann, Jesse Barnes,
	Jacob Pan, Yinghai Lu, Kevin Brodsky, Jonathan Cameron,
	Aneesh Kumar K.V (Arm), Xu Yilun, Herbert Xu, Kim Phillips,
	Konrad Rzeszutek Wilk, Stefano Stabellini, Claire Chang,
	linux-coco, iommu
In-Reply-To: <074633eb-103a-4452-9aec-6cd61e82f0cb@amd.com>



On 3/4/26 23:40, Alexey Kardashevskiy wrote:
> 
> 
> On 4/3/26 23:43, Jason Gunthorpe wrote:
>> On Wed, Mar 04, 2026 at 05:45:31PM +1100, Alexey Kardashevskiy wrote:
>>
>>>> I suspect AMD needs to use their vTOM feature to allow shared memory
>>>> to remain available to TDISP RUN with a high/low address split.
>>>
>>> I could probably do something about it bit I wonder what is the real
>>> live use case which requires leaking SME mask, have a live example
>>> which I could try recreating?
>>
>> We need shared memory allocated through a DMABUF heap:
>>
>> https://lore.kernel.org/all/20260223095136.225277-1-jiri@resnulli.us/
>>
>> To work with all PCI devices in the system, TDISP or not.
>>
>> Without this the ability for a TDISP device to ingest (encrypted) data
>> requires all kinds of memcpy..
>>
>> So the DMA API should see the DMA_ATTR_CC_DECRYPTED and setup the
>> correct dma_dddr_t either by choosing the shared alias for the TDISP
>> device's vTOM, or setting the C bit in a vIOMMU S1.
> 
> Something like that?
> 
> https://github.com/AMDESE/linux-kvm/commit/266a41a1ea746557eb63debce886ce2c98820667
> 
> With some little hacks I can make this tree do TDISP DMA to private or shared (swiotlb) memory by steering via this vTOM thing. Thanks,


Ping? Thanks,


> 
> 
> 
>>
>> Jason
> 

-- 
Alexey


^ permalink raw reply

* Re: [PATCH v7 22/22] x86/virt/seamldr: Log TDX module update failures
From: Chao Gao @ 2026-04-15  3:01 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: kvm@vger.kernel.org, Li, Xiaoyao, linux-coco@lists.linux.dev,
	Huang, Kai, x86@kernel.org, Zhao, Yan Y,
	dave.hansen@linux.intel.com, kas@kernel.org, mingo@redhat.com,
	Weiny, Ira, pbonzini@redhat.com, Chatre, Reinette,
	Verma, Vishal L, nik.borisov@suse.com, seanjc@google.com,
	linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com,
	Annapurve, Vishal, hpa@zytor.com, sagis@google.com,
	tony.lindgren@linux.intel.com, paulmck@kernel.org,
	tglx@kernel.org, yilun.xu@linux.intel.com,
	dan.j.williams@intel.com, bp@alien8.de
In-Reply-To: <2b1741f544a395b75ab60617a9023651ca99a9c8.camel@intel.com>

On Wed, Apr 15, 2026 at 01:39:07AM +0800, Edgecombe, Rick P wrote:
>On Tue, 2026-04-14 at 18:25 +0800, Chao Gao wrote:
>> On Tue, Apr 14, 2026 at 04:04:05AM +0800, Edgecombe, Rick P wrote:
>> > On Tue, 2026-03-31 at 05:41 -0700, Chao Gao wrote:
>> > > Currently, there is no way to restore a TDX module from shutdown state to
>> > > running state. This means if errors occur after a successful module
>> > > shutdown, they are unrecoverable since the old module is gone but the new
>> > > module isn't installed. All subsequent SEAMCALLs to the TDX module will
>> > > fail, so TDs will be killed due to SEAMCALL failures.
>> > > 
>> > > Log a message to clarify that SEAMCALL errors are expected in this
>> > > scenario. This ensures that after update failures, the first message in
>> > > dmesg explains the situation rather than showing confusing call traces from
>> > > various code paths.
>> > 
>> > Why is this patch at the end? Is it not valid initial behavior?
>> 
>> It's not strictly required for the update to function. It just improves
>> debuggability by explaining expected SEAMCALL errors after a failure.
>> 
>> I placed it last so maintainers can easily drop it if they consider it
>> non-essential, similar to the guidance in *.
>> 
>> *: https://lore.kernel.org/kvm/63082cd1-15ab-4aaf-83ad-f72d94b9bb8e@intel.com/
>
>Dave says:
>   If you don't need it to "turn the lights on", I say kick it out.
>   
>To me that says to drop non critical pieces. Not put it last with no explanation
>that it is optional...

Sure. Will drop this patch.

^ permalink raw reply

* Re: [PATCH v7 10/22] x86/virt/seamldr: Abort updates if errors occurred midway
From: Chao Gao @ 2026-04-15  2:59 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: kvm@vger.kernel.org, Li, Xiaoyao, linux-coco@lists.linux.dev,
	Huang, Kai, x86@kernel.org, Zhao, Yan Y,
	dave.hansen@linux.intel.com, kas@kernel.org, mingo@redhat.com,
	Weiny, Ira, pbonzini@redhat.com, Chatre, Reinette,
	Verma, Vishal L, nik.borisov@suse.com, seanjc@google.com,
	linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com,
	Annapurve, Vishal, hpa@zytor.com, sagis@google.com,
	tony.lindgren@linux.intel.com, paulmck@kernel.org,
	tglx@kernel.org, yilun.xu@linux.intel.com,
	dan.j.williams@intel.com, bp@alien8.de
In-Reply-To: <80a01d40855136a5825e5f353a8cc2982dde385a.camel@intel.com>

On Wed, Apr 15, 2026 at 01:41:06AM +0800, Edgecombe, Rick P wrote:
>On Tue, 2026-04-14 at 17:59 +0800, Chao Gao wrote:
>> The main point is correctness, not saving time.
>> 
>> If shutdown fails midway, the update is still recoverable — TDs can continue
>> running. But if we proceed to seamldr.install anyway, it becomes destructive.
>> Aborting early on shutdown failure preserves recoverability (this is needed to
>> handle races between updates and TD build/migration).
>> 
>> If seamldr.install itself fails, it's already destructive, so aborting early
>> there just saves time. But using the same abort mechanism for both keeps the
>> error handling uniform.
>
>If it's non-required for "turning the lights on" it seems aligned with Dave's
>suggestion you highlighted to drop it from the series.

It is required for the shutdown case. Without early abort, a shutdown
failure (recoverable) proceeds to seamldr.install, escalating it to a
destructive failure. The race handling between TD build and updates
depends on shutdown failures being non-destructive so users can retry.

^ permalink raw reply

* Re: [PATCH v2 1/6] KVM: x86: Add dedicated storage for guest RIP
From: Xiaoyao Li @ 2026-04-15  1:28 UTC (permalink / raw)
  To: Sean Christopherson, Chang S. Bae
  Cc: Paolo Bonzini, Kiryl Shutsemau, kvm, x86, linux-coco,
	linux-kernel
In-Reply-To: <ad5fUya3W9BKO_iA@google.com>

On 4/14/2026 11:37 PM, Sean Christopherson wrote:
> On Tue, Apr 14, 2026, Chang S. Bae wrote:
>> On 4/14/2026 5:31 AM, Xiaoyao Li wrote:
>>> Even leave RIP in regs[], what is the problem by just allocating the
>>> index 16-31 to R16-R31 and making RIP the index 32?
>>
>> But why?
>>
>> Even though the array isn't explicitly labeled as GPRs, that's effectively
>> how it's being used, and RIP isn't part of that set.
>>
>> I don't think there is any benefit of leaving it in regs[].
> 
> +1.  Chang's earlier argument that RIP isn't a proper GPR swayed me over, e.g. RIP
> doesn't have an architectural index.
> 
> Keeping RIP in regs[] saves one line of code in arch/x86/include/asm/kvm_host.h,
> at the cost of making the code less readable (IMO) and incorrectly suggesting that
> RIP can be accessed like other regs[].

I'm not trying to object this patch. Instead, I'm trying to understand 
the justification of the change.

So I would expected an updated changelog with above justifications 
incorporated.

^ permalink raw reply

* Re: [PATCH v7 17/22] x86/virt/tdx: Avoid updates during update-sensitive operations
From: Edgecombe, Rick P @ 2026-04-15  0:52 UTC (permalink / raw)
  To: kvm@vger.kernel.org, linux-coco@lists.linux.dev, djbw@kernel.org,
	linux-kernel@vger.kernel.org, Gao, Chao
  Cc: Li, Xiaoyao, Huang, Kai, Zhao, Yan Y, dave.hansen@linux.intel.com,
	tony.lindgren@linux.intel.com, Chatre, Reinette,
	seanjc@google.com, kas@kernel.org, binbin.wu@linux.intel.com,
	Verma, Vishal L, nik.borisov@suse.com, mingo@redhat.com,
	Weiny, Ira, pbonzini@redhat.com, hpa@zytor.com, sagis@google.com,
	Annapurve, Vishal, tglx@kernel.org, paulmck@kernel.org,
	bp@alien8.de, yilun.xu@linux.intel.com, x86@kernel.org
In-Reply-To: <69dedd953ddc8_147c80100d@djbw-dev.notmuch>

On Tue, 2026-04-14 at 17:36 -0700, Dan Williams wrote:
> Yes. Just pass that flag unconditionally and treat "operand invalid" as
> another "can not do a compatible update" case.

Great, thanks Dan.

^ permalink raw reply

* Re: [PATCH v7 17/22] x86/virt/tdx: Avoid updates during update-sensitive operations
From: Dan Williams @ 2026-04-15  0:36 UTC (permalink / raw)
  To: Edgecombe, Rick P, kvm@vger.kernel.org,
	linux-coco@lists.linux.dev, djbw@kernel.org,
	linux-kernel@vger.kernel.org, Gao, Chao
  Cc: Li, Xiaoyao, Huang, Kai, Zhao, Yan Y, dave.hansen@linux.intel.com,
	kas@kernel.org, seanjc@google.com, binbin.wu@linux.intel.com,
	Weiny, Ira, mingo@redhat.com, Verma, Vishal L,
	nik.borisov@suse.com, Chatre, Reinette, pbonzini@redhat.com,
	tony.lindgren@linux.intel.com, sagis@google.com,
	Annapurve, Vishal, hpa@zytor.com, tglx@kernel.org,
	paulmck@kernel.org, bp@alien8.de, yilun.xu@linux.intel.com,
	x86@kernel.org
In-Reply-To: <c44a4aebca86584ec33a19fbd9ca464443e66718.camel@intel.com>

Edgecombe, Rick P wrote:
> On Tue, 2026-04-14 at 14:43 -0700, Dan Williams wrote:
> > > tdh_mem_page_add() does a KVM_BUG_ON() if it sees a non-busy error. Imagine
> > > working on this code and considering if it is a valid KVM_BUG_ON()? After
> > > this
> > > patch, the answer is...well sometimes. It depends on the previous modules
> > > specific feature0 bits, an understanding on admins expectations, and the
> > > behavior of some far away code in arch/x86. Gah.
> > 
> > Why would it be variable? The user tried update on a module that the
> > kernel deemed unfit for update. "Doctor, it KVM_BUG_ON()s when I run
> > update".
> 
> The objection is not an unprepared user having an issue. It's that this adds
> burden to the TDX MMU developers who have to keep track of which errors are
> valid and which are not. Doing that is maybe _the_ major challenge for
> maintaining that code, that code is the trickiest of TDX KVM, and we have an
> easy option here to not muddy it further.

I think we are in violent agreement. If a runtime update causes
tdh_mem_page_add() failure then it is simply not a runtime update.

[..]
> > I think the changelog is a bit non-commital trying to be diplomatic
> > about the mess. Simply, Linux wanted the easy button, all runtime
> > updates are safe. Instead, module exported complexity and optionality.
> > KVM voted for one flavor of that optionality to accommodate the module
> > complexity.
> 
> It doesn't pick a flavor.

Oh, you mean in the sense that it proceeds without compat support? Then
yes, agree.

> It tries to handle them both depending on TDX module support. See:
>  
> +	if (tdx_sysinfo.features.tdx_features0 & TDX_FEATURES0_UPDATE_COMPAT)
> +		args.rcx |= TDX_SYS_SHUTDOWN_AVOID_COMPAT_SENSITIVE;
> 
> So it sounds like you have no objection to only supporting the mode that is easy
> (TDX_SYS_SHUTDOWN_AVOID_COMPAT_SENSITIVE). If we always pass this flag, then the
> TDX MMU developers can have less details to keep in their heads.

Yes. Just pass that flag unconditionally and treat "operand invalid" as
another "can not do a compatible update" case.

> I think Chao took this as an objection to unsupporting
> !TDX_FEATURES0_UPDATE_COMPAT updates, when it really was an objection to the
> kernel trying too hard to help the admin:
> https://lore.kernel.org/kvm/69a0c3d24310_1cc5100d1@dwillia2-mobl4.notmuch/

Drop !TDX_FEATURES0_UPDATE_COMPAT accommodation is the intent. Be mean
to old modules if it saves comments and conditionals in the kernel.

^ permalink raw reply

* Re: [PATCH v2 5/6] KVM: x86: Track available/dirty register masks as "unsigned long" values
From: Huang, Kai @ 2026-04-14 22:21 UTC (permalink / raw)
  To: seanjc@google.com
  Cc: linux-coco@lists.linux.dev, kvm@vger.kernel.org, Bae, Chang Seok,
	pbonzini@redhat.com, kas@kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org
In-Reply-To: <ad5h5hO7_QmJlOVk@google.com>

On Tue, 2026-04-14 at 08:48 -0700, Sean Christopherson wrote:
> On Mon, Apr 13, 2026, Kai Huang wrote:
> > On Mon, 2026-04-13 at 07:54 -0700, Sean Christopherson wrote:
> > > More importantly, because the TDX-Module *requires* RCX (the GPR that holds the
> > > mask of registers to expose to the VMM) to be hidden on TDVMCALL, KVM *can't*
> > > do any kind of meaningful "available" tracking.  
> > > 
> > 
> > Hmm I think RCX conveys the shared GPRs and VMM can read.  Per "Table 5.323:
> > TDH.VP.ENTER Output Operands Format #5 Definition: On TDCALL(TDG.VP.VMCALL)
> > Following a TD Entry":
> > 
> >   RCX   ...
> > 	Bit(s) Name         Description
> > 
> > 	31:0   PARAMS_MASK  Value as passed into TDCALL(TDG.VP.VMCALL) by
> > 			    the guest TD: indicates which part of the guest
> > 			    TD GPR and XMM state is passed as-is to the
> > VMM 
> > 			    and back. For details, see the description of
> > 			    TDG.VP.VMCALL in 5.5.26.
> 
> The problem is that bit 1 in RCX is required to be '0'.  I.e. the guest *can't*
> expose RCX to the VMM.  From the spec:
> 
>   15:0    GPR Mask Controls the transfer of GPR values:
>   Bit 0:  RAX (must be 0)
>   Bit 1:  RCX (must be 0)
> 
> And the code:
> 
>   api_error_type tdg_vp_vmcall(uint64_t controller_value)
>   {
>     api_error_type retval = TDX_OPERAND_INVALID;
>     tdx_module_local_t* tdx_local_data_ptr = get_local_data();
> 
>     tdvmcall_control_t control = { .raw = controller_value };
> 
>     // Bits 0, 1 and 4 and 63:32 of RCX must be 0
>     if (((control.gpr_select & (uint16_t)(BIT(0) | BIT(1) | BIT(4))) != 0) ||  <==== sadness
>          (control.reserved != 0))
>     {
>         retval = api_error_with_operand_id(TDX_OPERAND_INVALID, OPERAND_ID_RCX);
>         TDX_ERROR("Unsupported bits in GPR_SELECT field = 0x%x\n", control.gpr_select)
>         goto EXIT_FAILURE;
>     }
> 
> Oh, dagnabbit.  The spec also says:
> 
>   The value of RCX itself is always passed to the host VMM.
> 
> and then in code:
> 
>     td_exit_qual.gpr_select = control.gpr_select;
>     td_exit_qual.xmm_select = control.xmm_select;
> 
>     tdx_local_data_ptr->vmm_regs.rcx = td_exit_qual.raw;
> 
>     // RAX is not copied, RCX filled above, start from RDX
> 
> I don't get why TDX requires bit 1 to be 0, but whatever.

Right.  It's a bit confusing unfortunately.  Maybe because they think RCX
has a special purpose and don't want to mix it with other registers.

> 
> So I was wrong, KVM can (and should!) validate the registers coming from the
> guest.  If we want to harden TDX, that's the obvious first step.

And by "harden TDX" you mean to validate the necessary GPRs are indeed
marked as shared in RCX for each GHCI-defined TDVMCALL, but otherwise return
error to TD immediately (basically like what sev_es_validate_vmgexit() does
IIUC)?

^ permalink raw reply

* Re: [PATCH v7 17/22] x86/virt/tdx: Avoid updates during update-sensitive operations
From: Edgecombe, Rick P @ 2026-04-14 22:20 UTC (permalink / raw)
  To: kvm@vger.kernel.org, linux-coco@lists.linux.dev, djbw@kernel.org,
	linux-kernel@vger.kernel.org, Gao, Chao
  Cc: Li, Xiaoyao, Huang, Kai, Zhao, Yan Y, dave.hansen@linux.intel.com,
	kas@kernel.org, seanjc@google.com, binbin.wu@linux.intel.com,
	Weiny, Ira, mingo@redhat.com, Verma, Vishal L,
	nik.borisov@suse.com, Chatre, Reinette, pbonzini@redhat.com,
	tony.lindgren@linux.intel.com, sagis@google.com,
	Annapurve, Vishal, hpa@zytor.com, tglx@kernel.org,
	paulmck@kernel.org, bp@alien8.de, yilun.xu@linux.intel.com,
	x86@kernel.org
In-Reply-To: <69deb5032ff01_147c801004c@djbw-dev.notmuch>

On Tue, 2026-04-14 at 14:43 -0700, Dan Williams wrote:
> > tdh_mem_page_add() does a KVM_BUG_ON() if it sees a non-busy error. Imagine
> > working on this code and considering if it is a valid KVM_BUG_ON()? After
> > this
> > patch, the answer is...well sometimes. It depends on the previous modules
> > specific feature0 bits, an understanding on admins expectations, and the
> > behavior of some far away code in arch/x86. Gah.
> 
> Why would it be variable? The user tried update on a module that the
> kernel deemed unfit for update. "Doctor, it KVM_BUG_ON()s when I run
> update".

The objection is not an unprepared user having an issue. It's that this adds
burden to the TDX MMU developers who have to keep track of which errors are
valid and which are not. Doing that is maybe _the_ major challenge for
maintaining that code, that code is the trickiest of TDX KVM, and we have an
easy option here to not muddy it further.

> 
> > Actually, the diff Dan objected to was checking and printing a specific
> > helpful
> > error. Maybe he does not mind much more simply checking an extra bit in
> > tdx_supports_runtime_update()? Otherwise, I'd think to just unconditionally
> > pass
> > UPDATE_COMPAT_SENSITIVE without checking for support. Essentially mandate
> > that
> > it is always supported if TDX module update is supported.
> 
> In the end all of the hand wringing we are doing is misplaced. The root
> of the problem is the TDX Module claiming "runtime update supported" to
> include these corner cases of "but we corrupt things if the update tries
> to replace the crypto library at the wrong time".
> 
> That problem is also solvable by not classifying those updates as
> runtime compatible, or defining a protocol to mitigate the collisions.
> 
> Neither of those was chosen. Instead the module implemented 2 additional
> complications Linux chose one that fails update, other VMMs chose one
> that maximizes updates at the cost of needing to restart TD
> construction.

Totally, agree. We might be talking past each other. Let's not put burden on the
kernel developers when we can assume some TDX module behavior (Compat is always
supported) that we need, right?

> 
> I think the changelog is a bit non-commital trying to be diplomatic
> about the mess. Simply, Linux wanted the easy button, all runtime
> updates are safe. Instead, module exported complexity and optionality.
> KVM voted for one flavor of that optionality to accommodate the module
> complexity.

It doesn't pick a flavor. It tries to handle them both depending on TDX module
support. See:
 
+	if (tdx_sysinfo.features.tdx_features0 & TDX_FEATURES0_UPDATE_COMPAT)
+		args.rcx |= TDX_SYS_SHUTDOWN_AVOID_COMPAT_SENSITIVE;

So it sounds like you have no objection to only supporting the mode that is easy
(TDX_SYS_SHUTDOWN_AVOID_COMPAT_SENSITIVE). If we always pass this flag, then the
TDX MMU developers can have less details to keep in their heads.

I think Chao took this as an objection to unsupporting
!TDX_FEATURES0_UPDATE_COMPAT updates, when it really was an objection to the
kernel trying too hard to help the admin:
https://lore.kernel.org/kvm/69a0c3d24310_1cc5100d1@dwillia2-mobl4.notmuch/


> 
> The solution, make modules with the option KVM wants the min requirement
> and move on.


^ permalink raw reply

* Re: [PATCH v7 17/22] x86/virt/tdx: Avoid updates during update-sensitive operations
From: Dan Williams @ 2026-04-14 21:43 UTC (permalink / raw)
  To: Edgecombe, Rick P, kvm@vger.kernel.org,
	linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org,
	Gao, Chao
  Cc: Li, Xiaoyao, Huang, Kai, Zhao, Yan Y, dave.hansen@linux.intel.com,
	kas@kernel.org, Chatre, Reinette, seanjc@google.com,
	pbonzini@redhat.com, binbin.wu@linux.intel.com, Verma, Vishal L,
	nik.borisov@suse.com, mingo@redhat.com, Weiny, Ira,
	tony.lindgren@linux.intel.com, Annapurve, Vishal,
	sagis@google.com, djbw@kernel.org, tglx@kernel.org,
	paulmck@kernel.org, hpa@zytor.com, bp@alien8.de,
	yilun.xu@linux.intel.com, x86@kernel.org
In-Reply-To: <af7732a08fcce4a21ff14b2038acbb85200c6f34.camel@intel.com>

Edgecombe, Rick P wrote:
[..]
> > If an update races TD build, for example, TD measurement can become
> > incorrect and attestation can fail.
> > 
> > The TDX architecture exposes two approaches:
> > 
> > 1) Avoid updates during update-sensitive operations.
> > 2) Detect incompatibility after update and recover.
> > 
> > Post-update detection (option #2) is not a good fit: as discussed in [1],
> > future module behavior may expand update-sensitive operations in ways that
> > make KVM ABIs unstable and will break userspace.
> > 
> > "Do nothing" is also not preferred: while it keeps kernel code simple, it
> > lets the issue leak into the broader stack, where both detection and
> > recovery require significantly more effort.
> 
> This subject has had a lot of debate (as linked below in the log), but the way
> this is written leaves a lot of questions. "do nothing" is not an option it
> says, but the code does just that when UPDATE_COMPAT_SENSITIVE is not supported.

Right, but that is not the kernel's problem. If you run updates on
a module that does not support conflict detection, you get to keep the
pieces.

Like other cases of Linux not wanting to deal with the eccentricities of
early modules, just have userspace know about a minimum module version
where this protocol exists and accept the risk otherwise. No pressing
need to burden the kernel with carrying worry for early modules.

> > So, use option #1. Specifically, request "avoid update-sensitive" behavior
> > during TDX module shutdown and map the resulting failure to -EBUSY so
> > userspace can distinguish an update race from other failures.
> > 
> > When the "avoid update-sensitive" feature isn't supported, proceed with
> > updates. If a race occurs between module update and update-sensitive
> > operations, failures happen at a later stage (e.g., incorrect TD
> > measurements in attestation reports for TD build). Effectively, this
> > means "let userspace update at their own risk".
> > 
> 
> Above it says we can't just do nothing, we need the flag. And then this argues
> that we can do nothing because we can rely on userspace to deal with the
> issue... This log is maybe just trying to put a brave face on an imperfect
> compromise?

Not really, the log could be simplified to just say "module versions < X
are not safe for runtime update". In general if the kernel had a minimum
supported module version concept it could collect all of these early
deprecation conditions under one check.

> So, while I don't want to re-open the debate, I'm not sure the patch
> justification is going to pass scrutiny as is.
> 
> In the link [2], Dan says "Do not make Linux carry short lived one-off
> complexity", and also "Do not include logic to disable updates, document the
> expectation in the tool."
> 
> It seems this does not exclude the option to just to always pass the compat
> flag. Basically assume that the TDX module will always support
> UPDATE_COMPAT_SENSITIVE if it supports TDX module updates. Which I guess we
> should expect should eventually be true.

Right, assume a minimum module version.

> In [2] Dan was also against checking the UPDATE_COMPAT_SENSITIVE feature0 bit to
> gate the feature.

...because if it must be supported, why check?

> For the record, I don't like allowing the update without the compat bit set, and
> my concern has nothing to do with userspace roles and responsibilities. Instead
> it's because we are over budget on complexity for handling SEAMCALL errors
> within KVM and this makes things worse to keep track of.
> 
> tdh_mem_page_add() does a KVM_BUG_ON() if it sees a non-busy error. Imagine
> working on this code and considering if it is a valid KVM_BUG_ON()? After this
> patch, the answer is...well sometimes. It depends on the previous modules
> specific feature0 bits, an understanding on admins expectations, and the
> behavior of some far away code in arch/x86. Gah.

Why would it be variable? The user tried update on a module that the
kernel deemed unfit for update. "Doctor, it KVM_BUG_ON()s when I run
update".

> Actually, the diff Dan objected to was checking and printing a specific helpful
> error. Maybe he does not mind much more simply checking an extra bit in
> tdx_supports_runtime_update()? Otherwise, I'd think to just unconditionally pass
> UPDATE_COMPAT_SENSITIVE without checking for support. Essentially mandate that
> it is always supported if TDX module update is supported.

In the end all of the hand wringing we are doing is misplaced. The root
of the problem is the TDX Module claiming "runtime update supported" to
include these corner cases of "but we corrupt things if the update tries
to replace the crypto library at the wrong time".

That problem is also solvable by not classifying those updates as
runtime compatible, or defining a protocol to mitigate the collisions.

Neither of those was chosen. Instead the module implemented 2 additional
complications Linux chose one that fails update, other VMMs chose one
that maximizes updates at the cost of needing to restart TD
construction.

I think the changelog is a bit non-commital trying to be diplomatic
about the mess. Simply, Linux wanted the easy button, all runtime
updates are safe. Instead, module exported complexity and optionality.
KVM voted for one flavor of that optionality to accommodate the module
complexity.

The solution, make modules with the option KVM wants the min requirement
and move on.

^ permalink raw reply

* Re: [PATCH v13 00/48] arm64: Support for Arm CCA in KVM
From: Alper Gun @ 2026-04-14 21:40 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Gavin Shan, Shanker Donthineni, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve
In-Reply-To: <20260318155413.793430-1-steven.price@arm.com>

On Wed, Mar 18, 2026 at 8:54 AM Steven Price <steven.price@arm.com> wrote:
>
> This series adds support for running protected VMs using KVM under the
> Arm Confidential Compute Architecture (CCA).
>
> New major version number! This now targets RMM v2.0-bet0[1]. And unlike
> for Linux this represents a significant change.
>
> RMM v2.0 brings with it the ability to configure the RMM to have the
> same page size as the host (so no more RMM_PAGE_SIZE and dealing with
> granules being different from host pages). It also introduces range
> based APIs for many operations which should be more efficient and
> simplifies the code in places.
>
> The handling of the GIC has changed, so the system registers are used to
> pass the GIC state rather than memory. This means fewer changes to the
> KVM code as it looks much like a normal VM in this respect.
>
> And of course the new uAPI introduced in the previous v12 posting is
> retained so that also remains simplified compared to earlier postings.
>
> The RMM support for v2.0 is still early and so this series includes a
> few hacks to ease the integration. Of note are that there are some RMM
> v1.0 SMCs added to paper over areas where the RMM implementation isn't
> quite ready for v2.0, and "SROs" (see below) are deferred to the final
> patch in the series.
>
> The PMU in RMM v2.0 requires more handling on the RMM-side (and
> therefore simplifies the implementation on Linux), but this isn't quite
> ready yet. The Linux side is implemented (but untested).
>
> PSCI still requires the VMM to provide the "target" REC for operations
> that affect another vCPU. This is likely to change in a future version
> of the specification. There's also a desire to force PSCI to be handled
> in the VMM for realm guests - this isn't implemented yet as I'm waiting
> for the dust to settle on the RMM interface first.
>
> Stateful RMI Operations
> -----------------------
>
> The RMM v2.0 spec brings a new concept of Stateful RMI Operations (SROs)
> which allow the RMM to complete an operation over several SMC calls and
> requesting/returning memory to the host. This has the benefit of
> allowing interrupts to be handled in the middle of an operation (by
> returning to the host to handle the interrupt without completing the
> operation) and enables the RMM to dynamically allocate memory for
> internal tracking purposes. One example of this is RMI_REC_CREATE no
> longer needs "auxiliary granules" provided upfront but can request the
> memory needed during the RMI_REC_CREATE operation.
>
> There are a fairly large number of operations that are defined as SROs
> in the specification, but current both Linux and RMM only have support
> for RMI_REC_CREATE and RMI_REC_DESTROY. There a number of TODOs/FIXMEs
> in the code where support is missing.
>
> Given the early stage support for this, the SRO handling is all confined
> to the final patch. This patch can be dropped to return to a pre-SRO
> state (albeit a mixture of RMM v1.0 and v2.0 APIs) for testing purposes.
>
> A future posting will reorder the series to move the generic SRO support
> to an early patch and will implement the proper support for this in all
> RMI SMCs.
>
> One aspect of SROs which is not yet well captured is that in some
> circumstances the Linux kernel will need to call an SRO call in a
> context where memory allocation is restricted (e.g. because a spinlock
> is held). In this case the intention is that the SRO will be cancelled,
> the spinlock dropped so the memory allocation can be completed, and then
> the SRO restarted (obviously after rechecking the state that the
> spinlock was protecting). For this reason the code stores the memory
> allocations within a struct rmi_sro_state object - see the final patch
> for more details.
>
> This series is based on v7.0-rc1. It is also available as a git
> repository:
>
> https://gitlab.arm.com/linux-arm/linux-cca cca-host/v13
>
>

Hi Steven,

I have a question regarding host kexec and kdump scenarios, and
whether there is any plan to make them work in this initial series.

Intel TDX and AMD SEV-SNP both have a firmware shutdown command that
is invoked during the kexec or panic code paths to safely bypass
hardware memory protections and boot into the new kernel. As far as
I know, there is no similar global teardown command available for
the RMM.

What is the roadmap for supporting both general kexec and
more specifically kdump (panic) scenarios with CCA?

Thanks,
Alper

^ permalink raw reply

* Re: [PATCH v7 17/22] x86/virt/tdx: Avoid updates during update-sensitive operations
From: Edgecombe, Rick P @ 2026-04-14 19:58 UTC (permalink / raw)
  To: kvm@vger.kernel.org, linux-coco@lists.linux.dev,
	linux-kernel@vger.kernel.org, Gao, Chao
  Cc: Li, Xiaoyao, Huang, Kai, Zhao, Yan Y, dave.hansen@linux.intel.com,
	kas@kernel.org, Chatre, Reinette, seanjc@google.com,
	pbonzini@redhat.com, binbin.wu@linux.intel.com, Verma, Vishal L,
	nik.borisov@suse.com, mingo@redhat.com, Weiny, Ira,
	tony.lindgren@linux.intel.com, Annapurve, Vishal,
	sagis@google.com, djbw@kernel.org, tglx@kernel.org,
	paulmck@kernel.org, hpa@zytor.com, bp@alien8.de,
	yilun.xu@linux.intel.com, x86@kernel.org
In-Reply-To: <20260331124214.117808-18-chao.gao@intel.com>

On Tue, 2026-03-31 at 05:41 -0700, Chao Gao wrote:
> A runtime TDX module update can conflict with TD lifecycle operations that
> are update-sensitive.
> 
> Today, update-sensitive operations include:
> 
> - TD build: TD measurement is accumulated across multiple
>   TDH.MEM.PAGE.ADD, TDH.MR.EXTEND, and TDH.MR.FINALIZE calls.

How exactly does this work? There is a new error I see:
TDX_INCOMPATIBLE_MRTD_CONTEXT. It gets returned by those seamcalls sometimes.

I think(?)... TDX_INCOMPATIBLE_MRTD_CONTEXT gets returned by those seamcalls if
the problematic type of tdx module update happens between TDH.MNG.INIT and
TDH.MR.FINALIZE. So if the compat flag is not passed into shutdown... then KVM
gets some new surprise error codes to handle deep in the MMU.

> 
> - TD migration: intermediate crypto state is saved/restored across
>   interrupted/resumed TDH.EXPORT.STATE.* and TDH.IMPORT.STATE.* flows.

We don't need to consider migration for this change.

> 
> If an update races TD build, for example, TD measurement can become
> incorrect and attestation can fail.
> 
> The TDX architecture exposes two approaches:
> 
> 1) Avoid updates during update-sensitive operations.
> 2) Detect incompatibility after update and recover.
> 
> Post-update detection (option #2) is not a good fit: as discussed in [1],
> future module behavior may expand update-sensitive operations in ways that
> make KVM ABIs unstable and will break userspace.
> 
> "Do nothing" is also not preferred: while it keeps kernel code simple, it
> lets the issue leak into the broader stack, where both detection and
> recovery require significantly more effort.

This subject has had a lot of debate (as linked below in the log), but the way
this is written leaves a lot of questions. "do nothing" is not an option it
says, but the code does just that when UPDATE_COMPAT_SENSITIVE is not supported.

> 
> So, use option #1. Specifically, request "avoid update-sensitive" behavior
> during TDX module shutdown and map the resulting failure to -EBUSY so
> userspace can distinguish an update race from other failures.
> 
> When the "avoid update-sensitive" feature isn't supported, proceed with
> updates. If a race occurs between module update and update-sensitive
> operations, failures happen at a later stage (e.g., incorrect TD
> measurements in attestation reports for TD build). Effectively, this
> means "let userspace update at their own risk".
> 

Above it says we can't just do nothing, we need the flag. And then this argues
that we can do nothing because we can rely on userspace to deal with the
issue... This log is maybe just trying to put a brave face on an imperfect
compromise?

So, while I don't want to re-open the debate, I'm not sure the patch
justification is going to pass scrutiny as is.

In the link [2], Dan says "Do not make Linux carry short lived one-off
complexity", and also "Do not include logic to disable updates, document the
expectation in the tool."

It seems this does not exclude the option to just to always pass the compat
flag. Basically assume that the TDX module will always support
UPDATE_COMPAT_SENSITIVE if it supports TDX module updates. Which I guess we
should expect should eventually be true.

In [2] Dan was also against checking the UPDATE_COMPAT_SENSITIVE feature0 bit to
gate the feature.

For the record, I don't like allowing the update without the compat bit set, and
my concern has nothing to do with userspace roles and responsibilities. Instead
it's because we are over budget on complexity for handling SEAMCALL errors
within KVM and this makes things worse to keep track of.

tdh_mem_page_add() does a KVM_BUG_ON() if it sees a non-busy error. Imagine
working on this code and considering if it is a valid KVM_BUG_ON()? After this
patch, the answer is...well sometimes. It depends on the previous modules
specific feature0 bits, an understanding on admins expectations, and the
behavior of some far away code in arch/x86. Gah.

Actually, the diff Dan objected to was checking and printing a specific helpful
error. Maybe he does not mind much more simply checking an extra bit in
tdx_supports_runtime_update()? Otherwise, I'd think to just unconditionally pass
UPDATE_COMPAT_SENSITIVE without checking for support. Essentially mandate that
it is always supported if TDX module update is supported.

>  Userspace can check if
> the feature is supported or not. The alternative of blocking updates
> entirely is rejected [2] as it introduces permanent kernel complexity to
> accommodate limitations in early TDX module releases that userspace can
> handle.
> 
> Note: this implementation is based on a reference patch by Vishal [3].
> Note2: moving "NO_RBP_MOD" is just to centralize bit definitions.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Tony Lindgren <tony.lindgren@linux.intel.com>
> Link: https://lore.kernel.org/linux-coco/aQIbM5m09G0FYTzE@google.com/ # [1]
> Link: https://lore.kernel.org/kvm/699fe97dc212f_2f4a100b@dwillia2-mobl4.notmuch/ # [2]
> Link: https://lore.kernel.org/linux-coco/CAGtprH_oR44Vx9Z0cfxvq5-QbyLmy_+Gn3tWm3wzHPmC1nC0eg@mail.gmail.com/ # [3]
> ---

^ permalink raw reply

* Re: [PATCH v7 10/22] x86/virt/seamldr: Abort updates if errors occurred midway
From: Edgecombe, Rick P @ 2026-04-14 17:41 UTC (permalink / raw)
  To: Gao, Chao
  Cc: kvm@vger.kernel.org, Li, Xiaoyao, linux-coco@lists.linux.dev,
	Huang, Kai, x86@kernel.org, Zhao, Yan Y,
	dave.hansen@linux.intel.com, kas@kernel.org, mingo@redhat.com,
	Weiny, Ira, pbonzini@redhat.com, Chatre, Reinette,
	Verma, Vishal L, nik.borisov@suse.com, seanjc@google.com,
	linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com,
	Annapurve, Vishal, hpa@zytor.com, sagis@google.com,
	tony.lindgren@linux.intel.com, paulmck@kernel.org,
	tglx@kernel.org, yilun.xu@linux.intel.com,
	dan.j.williams@intel.com, bp@alien8.de
In-Reply-To: <ad4P9HZEOnC3wUH3@intel.com>

On Tue, 2026-04-14 at 17:59 +0800, Chao Gao wrote:
> The main point is correctness, not saving time.
> 
> If shutdown fails midway, the update is still recoverable — TDs can continue
> running. But if we proceed to seamldr.install anyway, it becomes destructive.
> Aborting early on shutdown failure preserves recoverability (this is needed to
> handle races between updates and TD build/migration).
> 
> If seamldr.install itself fails, it's already destructive, so aborting early
> there just saves time. But using the same abort mechanism for both keeps the
> error handling uniform.

If it's non-required for "turning the lights on" it seems aligned with Dave's
suggestion you highlighted to drop it from the series.

^ permalink raw reply

* Re: [PATCH v7 22/22] x86/virt/seamldr: Log TDX module update failures
From: Edgecombe, Rick P @ 2026-04-14 17:39 UTC (permalink / raw)
  To: Gao, Chao
  Cc: kvm@vger.kernel.org, Li, Xiaoyao, linux-coco@lists.linux.dev,
	Huang, Kai, x86@kernel.org, Zhao, Yan Y,
	dave.hansen@linux.intel.com, kas@kernel.org, mingo@redhat.com,
	Weiny, Ira, pbonzini@redhat.com, Chatre, Reinette,
	Verma, Vishal L, nik.borisov@suse.com, seanjc@google.com,
	linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com,
	Annapurve, Vishal, hpa@zytor.com, sagis@google.com,
	tony.lindgren@linux.intel.com, paulmck@kernel.org,
	tglx@kernel.org, yilun.xu@linux.intel.com,
	dan.j.williams@intel.com, bp@alien8.de
In-Reply-To: <ad4WCJdJ2MrGvkWX@intel.com>

On Tue, 2026-04-14 at 18:25 +0800, Chao Gao wrote:
> On Tue, Apr 14, 2026 at 04:04:05AM +0800, Edgecombe, Rick P wrote:
> > On Tue, 2026-03-31 at 05:41 -0700, Chao Gao wrote:
> > > Currently, there is no way to restore a TDX module from shutdown state to
> > > running state. This means if errors occur after a successful module
> > > shutdown, they are unrecoverable since the old module is gone but the new
> > > module isn't installed. All subsequent SEAMCALLs to the TDX module will
> > > fail, so TDs will be killed due to SEAMCALL failures.
> > > 
> > > Log a message to clarify that SEAMCALL errors are expected in this
> > > scenario. This ensures that after update failures, the first message in
> > > dmesg explains the situation rather than showing confusing call traces from
> > > various code paths.
> > 
> > Why is this patch at the end? Is it not valid initial behavior?
> 
> It's not strictly required for the update to function. It just improves
> debuggability by explaining expected SEAMCALL errors after a failure.
> 
> I placed it last so maintainers can easily drop it if they consider it
> non-essential, similar to the guidance in *.
> 
> *: https://lore.kernel.org/kvm/63082cd1-15ab-4aaf-83ad-f72d94b9bb8e@intel.com/

Dave says:
   If you don't need it to "turn the lights on", I say kick it out.
   
To me that says to drop non critical pieces. Not put it last with no explanation
that it is optional...

^ permalink raw reply

* Re: [PATCH v7 08/22] x86/virt/seamldr: Allocate and populate a module update request
From: Edgecombe, Rick P @ 2026-04-14 17:37 UTC (permalink / raw)
  To: Gao, Chao
  Cc: kvm@vger.kernel.org, Li, Xiaoyao, linux-coco@lists.linux.dev,
	Huang, Kai, x86@kernel.org, Zhao, Yan Y,
	dave.hansen@linux.intel.com, kas@kernel.org, mingo@redhat.com,
	Weiny, Ira, pbonzini@redhat.com, Chatre, Reinette,
	Verma, Vishal L, nik.borisov@suse.com, seanjc@google.com,
	linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com,
	Annapurve, Vishal, hpa@zytor.com, sagis@google.com,
	tony.lindgren@linux.intel.com, paulmck@kernel.org,
	tglx@kernel.org, yilun.xu@linux.intel.com,
	dan.j.williams@intel.com, bp@alien8.de
In-Reply-To: <ad4MWR0DyOaI0C50@intel.com>

On Tue, 2026-04-14 at 17:43 +0800, Chao Gao wrote:
> On Sat, Apr 11, 2026 at 09:14:36AM +0800, Edgecombe, Rick P wrote:
> > On Tue, 2026-03-31 at 05:41 -0700, Chao Gao wrote:
> > > P-SEAMLDR uses the SEAMLDR_PARAMS structure to describe TDX module
> > > update requests. This structure contains physical addresses pointing to
> > > the module binary and its signature file (or sigstruct), along with an
> > > update scenario field.
> > > 
> > > TDX modules are distributed in the tdx_blob format defined in
> > > blob_structure.txt from the "Intel TDX module Binaries Repository". A
> > > tdx_blob contains a header, sigstruct, and module binary. This is also the
> > > format supplied by the userspace to the kernel.
> > > 
> > > Parse the tdx_blob format and populate a SEAMLDR_PARAMS structure
> > > accordingly. This structure will be passed to P-SEAMLDR to initiate the
> > > update.
> > 
> > The thing that confused me about this earlier was the exact reason why we are
> > checking all the fields. We discussed that we need to check the fields that
> > kernel processes, but we don't need to double check data that the TDX module
> > processes.
> > 
> > Should we explain it?
> 
> Yes. how about adding the following in the changelog:
> 
> Parse the tdx_blob format and populate a SEAMLDR_PARAMS structure. The
> header is consumed solely by the kernel to extract the sigstruct and
> module, so validate it before processing. The sigstruct and module are
> passed to and validated by P-SEAMLDR, so don't duplicate any validation
> in the kernel.

Looks good.

> 
> > And how it explains the checks below?
> 
> 
> > > +static void free_seamldr_params(struct seamldr_params *params)
> > > +{
> > > +	free_page((unsigned long)params);
> > > +}
> > 
> > Do we really need this helper? This doesn't work?
> 
> No. This works. Will do.
> 
> > 
> > DEFINE_FREE(free_seamldr_params, struct seamldr_params *,
> > 	    if (!IS_ERR_OR_NULL(_T)) free_page((unsigned long)_T))
> > 
> > 
> > > +
> > > +static struct seamldr_params *alloc_seamldr_params(const void *module, unsigned int module_size,
> > > +						   const void *sig, unsigned int sig_size)
> > > +{
> > > +	struct seamldr_params *params;
> > > +	const u8 *ptr;
> > > +	int i;
> > > +
> > > +	if (module_size > SEAMLDR_MAX_NR_MODULE_4KB_PAGES * SZ_4K)
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > > +	if (sig_size > SEAMLDR_MAX_NR_SIG_4KB_PAGES * SZ_4K)
> > > +		return ERR_PTR(-EINVAL);
> > 
> > I don't know if it's worth that much, but we could do a MIN thing here to
> > protect the loop, and lose the conditionals. If userspace passes a blob that is
> > out of spec they can deal with the module error, no?
> 
> I think we all agree: we need to do something to protect the loop. So, the
> question is just how.
> 
> looks like you prefer:
> 
> 	module_size = MIN(module_size, SEAMLDR_MAX_NR_MODULE_4KB_PAGES * SZ_4K);
> 	sig_size = MIN(sig_size, SEAMLDR_MAX_NR_SIG_4KB_PAGES * SZ_4K);
> 
> But I think the MIN approach actually requires more justification than a plain
> bounds check.
> 
> With MIN, the reasoning chain is: we don't want conditionals -> userspace
> can deal with the module error -> and that error won't be fatal to the
> system (or even if it is, it's admin-initiated). That's a lot of assumptions
> to validate.
> 
> With a bounds check and early return, the reasoning is simply: the input is out
> of range -> reject it as invalid.

Hmm, I'm convinced. I think a lot of tip code elides conditionals. I'm not sure
which they would prefer.

I guess the problem I see is that all these conditionals seem like a lot for
something that is mostly just getting passed through. So they need to come
across better somehow.

> 
> > 
> > > +
> > > +	/*
> > > +	 * Check that input buffers satisfy P-SEAMLDR's size and alignment
> > > +	 * constraints so they can be passed directly to P-SEAMLDR without
> > > +	 * relocation or copy.
> > > +	 */
> > > +	if (!IS_ALIGNED(module_size, SZ_4K) || !IS_ALIGNED(sig_size, SZ_4K) ||
> > > +	    !IS_ALIGNED((unsigned long)module, SZ_4K) ||
> > > +	    !IS_ALIGNED((unsigned long)sig, SZ_4K))
> > > +		return ERR_PTR(-EINVAL);
> > 
> > I thought you are going to reduce this checking to just to reject invalid input
> > that the kernel processes.
> > 
> > What happens if we don't check this?
> 
> If the blob->offset_of_module or blob->size is misaligned, the loops
> silently drop the unaligned portion. P-SEAMLDR will then reject the update
> after module shutdown and TDs will be killed. Since this is admin-initiated
> with an intentionally invalid blob, the consequence is acceptable.

Seems like we could drop it then.

> 
> > The vmallocs are all going to be page
> > aligned anyway. But even still, does it mess up the below loops somehow in a way
> > that hurts anything?
> 
> I agree addresses/sizes are page-aligned for valid blobs, but they are not
> guaranteed (e.g., if offset_of_module is misaligned) for all inputs. I
> removed the check once in v6 and Sashiko questioned it, so I thought the
> assumption isn't obviously correct to everyone.

Haha, as impressed as I am with AI code review... are we already at the point of
adding comments to help the AI not get confused?

> 
> Without the check, we'd need something like:
> 
> 	/*
> 	 * Assume sig/module base addresses and sizes are page-aligned.
> 	 * If violated, P-SEAMLDR will reject the update.
> 	 */
> 
> It's a few lines of straightforward validation vs. a comment that requires
> readers to verify the assumption chain (vmalloc internals, userspace
> contract, P-SEAMLDR behavior). Not sure the trade is worth it.
> 
> If you think the comment is sufficient, I'm fine dropping the checks.

The checks seem like too much. Any ideas on how to contain it more? It seems
like you are making the case for each one. But if we delete them all... does it
seem like a better tradeoff?

> 
> > 
> > I might be confused, but it seems different then we discussed.
> > 
> > > +
> > > +	params = (struct seamldr_params *)get_zeroed_page(GFP_KERNEL);
> > > +	if (!params)
> > > +		return ERR_PTR(-ENOMEM);
> > > +
> > > +	/*
> > > +	 * Only use version 1 when required (sigstruct > 4KB) for backward
> > > +	 * compatibility with P-SEAMLDR that lacks version 1 support.
> > > +	 */
> > > +	if (sig_size > SZ_4K)
> > > +		params->version = 1;
> > > +	else
> > > +		params->version = 0;
> > 
> > I'm a bit confused by this part. What does it mean to support old P-SEAMLDRs?
> 
> All current P-SEAMLDRs only support version 0. Version 1 is a planned
> extension but not yet implemented by any P-SEAMLDR. Always requiring
> version 1 just saves us a conditional but disables updates for all existing
> P-SEAMLDRs for no reason. I think it is not worthwhile.

Sorry, I still don't understand. Why does the kernel need to set this bit? It's
an opt-in for a format change? How many old TDX modules are we talking?

I think, connecting to the discussion in the later patches, that we should try
to keep the initial implementation as small as possible. Everything we add makes
things take a little bit longer. And while we spin new versions the kernel has
zero support. So I see that this is another non-critical enhancement.

> 
> > But also could it be:
> > 
> > params->version = sig_size > SZ_4K;
> 
> ok with me. I just wanted to make it match with "version 1" in the
> comment right above.
> 
> > > +static struct seamldr_params *init_seamldr_params(const u8 *data, u32 size)
> > > +{
> > > +	const struct tdx_blob *blob = (const void *)data;
> > > +	int module_size, sig_size;
> > > +	const void *sig, *module;
> > > +
> > > +	/*
> > > +	 * Ensure the size is valid otherwise reading any field from the
> > > +	 * blob may overflow.
> > > +	 */
> > > +	if (size <= sizeof(struct tdx_blob) || size <= blob->offset_of_module)
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > > +	if (blob->version != TDX_BLOB_VERSION_1)
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > > +	if (blob->reserved0 || memchr_inv(blob->reserved1, 0, sizeof(blob->reserved1)))
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > > +	/* Split the blob into a sigstruct and a module. */
> > > +	sig		= blob->data;
> > > +	sig_size	= blob->offset_of_module - sizeof(struct tdx_blob);
> > > +	module		= data + blob->offset_of_module;
> > > +	module_size	= size - blob->offset_of_module;
> > 
> > Did you consider just passing the tdx_blob into alloc_seamldr_params()?
> > Basically, this function checks the blob fields, then alloc_seamldr_params()
> > turns blob into  struct seamldr_params without checks. The way it is, the work
> > seems kind of spread around two functions with various checks.
> 
> Fine with merging them. 
> 

I wasn't suggesting to merge them. I was suggesting to have them each do a
dedicated thing.

> How about:
> 
> static struct seamldr_params *alloc_seamldr_params(const u8 *data, u32 size)
> {
> 	const struct tdx_blob *blob = (const void *)data;
> 	struct seamldr_params *params;
> 	int module_size, sig_size;
> 	const void *sig, *module;
> 	const u8 *ptr;
> 	int i;
> 
> 	/*
> 	 * Ensure the size is valid otherwise reading any field from the
> 	 * blob may overflow.
> 	 */
> 	if (size <= sizeof(struct tdx_blob))
> 		return ERR_PTR(-EINVAL);
> 
> 	if (blob->version != TDX_BLOB_VERSION_1)
> 		return ERR_PTR(-EINVAL);
> 	
> 	if (blob->length != size)
> 		return ERR_PTR(-EINVAL);
> 
> 	if (memcmp(blob->signature, "TDX-BLOB", 8))
> 		return ERR_PTR(-EINVAL);
> 
> 	if (blob->reserved0 || memchr_inv(blob->reserved1, 0, sizeof(blob->reserved1)))
> 		return ERR_PTR(-EINVAL);
> 
> 	/* Split the blob into a sigstruct and a module. */
> 	sig		= blob->data;
> 	sig_size	= blob->offset_of_module - sizeof(struct tdx_blob);
> 	module		= data + blob->offset_of_module;
> 	module_size	= size - blob->offset_of_module;
> 
> 	if (module_size > SEAMLDR_MAX_NR_MODULE_4KB_PAGES * SZ_4K || module_size <= 0)
> 		return ERR_PTR(-EINVAL);
> 
> 	if (sig_size > SEAMLDR_MAX_NR_SIG_4KB_PAGES * SZ_4K || sig_size <= 0)
> 		return ERR_PTR(-EINVAL);
> 
> 	params = (struct seamldr_params *)get_zeroed_page(GFP_KERNEL);
> 	if (!params)
> 		return ERR_PTR(-ENOMEM);
> 
> 	/*
> 	 * Only use version 1 when required (sigstruct > 4KB) for backward
> 	 * compatibility with P-SEAMLDR that lacks version 1 support.
> 	 */
> 	params->version = sig_size > SZ_4K;
> 	params->scenario = SEAMLDR_SCENARIO_UPDATE;
> 
> 	ptr = sig;
> 	/*
> 	 * Assume sig/module base addresses and sizes are page-aligned.
> 	 * If violated, P-SEAMLDR will reject the update.
> 	 */
> 	for (i = 0; i < sig_size / SZ_4K; i++) {
> 		params->sigstruct_pa[i] = vmalloc_to_pfn(ptr) << PAGE_SHIFT;
> 		ptr += SZ_4K;
> 	}
> 
> 	params->num_module_pages = module_size / SZ_4K;
> 
> 	ptr = module;
> 	for (i = 0; i < params->num_module_pages; i++) {
> 		params->mod_pages_pa_list[i] = vmalloc_to_pfn(ptr) << PAGE_SHIFT;
> 		ptr += SZ_4K;
> 	}
> 
> 	return params;
> }


^ permalink raw reply

* Re: [PATCH v7 13/22] x86/virt/seamldr: Install a new TDX module
From: Edgecombe, Rick P @ 2026-04-14 17:35 UTC (permalink / raw)
  To: Gao, Chao
  Cc: kvm@vger.kernel.org, Li, Xiaoyao, linux-coco@lists.linux.dev,
	Huang, Kai, x86@kernel.org, Zhao, Yan Y,
	dave.hansen@linux.intel.com, kas@kernel.org, mingo@redhat.com,
	Weiny, Ira, pbonzini@redhat.com, Chatre, Reinette,
	Verma, Vishal L, nik.borisov@suse.com, seanjc@google.com,
	linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com,
	Annapurve, Vishal, hpa@zytor.com, sagis@google.com,
	tony.lindgren@linux.intel.com, paulmck@kernel.org,
	tglx@kernel.org, yilun.xu@linux.intel.com,
	dan.j.williams@intel.com, bp@alien8.de
In-Reply-To: <ad4UsiCjY6V+Ox5D@intel.com>

On Tue, 2026-04-14 at 18:19 +0800, Chao Gao wrote:
> > Why the style difference? It would be good to standardize, but the existing
> > code
> > isn't standardized. What do you think about going with this style through
> > the
> > series for the one arg ones?
> 
> Sure, happy to standardize. To confirm, you prefer this style?
> 
> 	struct tdx_module_args args = { .rcx = __pa(params) };
> 
> it looks more common than the other way.

They both seem fine to me. And, yes, this is probably in the nit category. But
if it looks sloppy it can add doubts to the code for the final review.

^ permalink raw reply

* Re: [PATCH v7 11/22] x86/virt/seamldr: Shut down the current TDX module
From: Edgecombe, Rick P @ 2026-04-14 17:34 UTC (permalink / raw)
  To: Gao, Chao
  Cc: kvm@vger.kernel.org, Li, Xiaoyao, linux-coco@lists.linux.dev,
	Huang, Kai, x86@kernel.org, Zhao, Yan Y,
	dave.hansen@linux.intel.com, kas@kernel.org, mingo@redhat.com,
	Weiny, Ira, pbonzini@redhat.com, Chatre, Reinette,
	Verma, Vishal L, nik.borisov@suse.com, seanjc@google.com,
	linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com,
	Annapurve, Vishal, hpa@zytor.com, sagis@google.com,
	tony.lindgren@linux.intel.com, paulmck@kernel.org,
	tglx@kernel.org, yilun.xu@linux.intel.com,
	dan.j.williams@intel.com, bp@alien8.de
In-Reply-To: <ad4SZdlybWkLtLEm@intel.com>

On Tue, 2026-04-14 at 18:09 +0800, Chao Gao wrote:
> Retrieve the module's handoff version from TDX global metadata and add an
> update step to shut down the module. Module shutdown has global effect, so
> it only needs to run on one CPU.

Sure.

^ permalink raw reply

* Re: [PATCH v7 07/22] coco/tdx-host: Implement firmware upload sysfs ABI for TDX module updates
From: Edgecombe, Rick P @ 2026-04-14 17:04 UTC (permalink / raw)
  To: Gao, Chao
  Cc: kvm@vger.kernel.org, Li, Xiaoyao, linux-coco@lists.linux.dev,
	Huang, Kai, x86@kernel.org, Zhao, Yan Y,
	dave.hansen@linux.intel.com, kas@kernel.org, mingo@redhat.com,
	Weiny, Ira, pbonzini@redhat.com, Chatre, Reinette,
	Verma, Vishal L, nik.borisov@suse.com, seanjc@google.com,
	linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com,
	Annapurve, Vishal, hpa@zytor.com, sagis@google.com,
	tony.lindgren@linux.intel.com, paulmck@kernel.org,
	tglx@kernel.org, yilun.xu@linux.intel.com,
	dan.j.williams@intel.com, bp@alien8.de
In-Reply-To: <ad4N6giqSJcnZ/o6@intel.com>

On Tue, 2026-04-14 at 17:50 +0800, Chao Gao wrote:
> > It's hard to review whether these error codes match because the function
> > doesn't
> > return them yet. Why isn't this patch just done later in the series after
> > everything is in place?
> 
> Good point.
> 
> The series is ordered top-down (user interface first, implementation details
> later). I could either move this patch later, or add each error mapping in the
> patch that introduces the corresponding return value. I'd prefer the latter to
> keep the top-down ordering. Would that work for you?

Either way sounds ok to me.

^ permalink raw reply

* Re: [PATCH v7 06/22] coco/tdx-host: Expose P-SEAMLDR information via sysfs
From: Edgecombe, Rick P @ 2026-04-14 17:02 UTC (permalink / raw)
  To: Gao, Chao
  Cc: kvm@vger.kernel.org, Li, Xiaoyao, linux-coco@lists.linux.dev,
	Huang, Kai, x86@kernel.org, Zhao, Yan Y,
	dave.hansen@linux.intel.com, kas@kernel.org, mingo@redhat.com,
	Weiny, Ira, pbonzini@redhat.com, Chatre, Reinette,
	Verma, Vishal L, nik.borisov@suse.com, seanjc@google.com,
	linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com,
	Annapurve, Vishal, hpa@zytor.com, sagis@google.com,
	tony.lindgren@linux.intel.com, paulmck@kernel.org,
	tglx@kernel.org, yilun.xu@linux.intel.com,
	dan.j.williams@intel.com, bp@alien8.de
In-Reply-To: <ad4jFpCYbW5laTbr@intel.com>

On Tue, 2026-04-14 at 19:20 +0800, Chao Gao wrote:
> > Can you explain how all of these overlap?
> > - TDX module supports module update
> > - SEAMLDR supports NUM_REMAINING_UPDATES info
> > - SEAMLDR supports VERSION info
> > 
> > If the TDX module supports module update, do we know the SEAMLDR supports
> > this other stuff somehow? It might be worth a comment the reasoning.
> 
> VERSION and NUM_REMAINING_UPDATES are always available for any P-SEAMLDR. They
> don't depend on TDX module's update support.

The old seamldr's don't support them though, right? At least the docs I'm
looking at show NUM_REMAINING_UPDATES as a change in the spec. Oh this update is
from 2022. So quite old now.

^ permalink raw reply

* SVSM Development Call April 15, 2026
From: Jörg Rödel @ 2026-04-14 16:15 UTC (permalink / raw)
  To: coconut-svsm, linux-coco

Hi,

Here is the call for agenda items for this weeks SVSM development call.  Please
send any agenda items you have in mind as a reply to this email or raise them
in the meeting.

We will use the LF Zoom instance. Details of the meeting  can be found in our
governance repository at:

	https://github.com/coconut-svsm/governance

The link to the COCONUT-SVSM calendar is:

	https://zoom-lfx.platform.linuxfoundation.org/meetings/coconut-svsm?view=week

The meeting will be recorded and the recording eventually published.

Regards,

	Jörg

^ permalink raw reply

* Re: [PATCH v2 5/6] KVM: x86: Track available/dirty register masks as "unsigned long" values
From: Sean Christopherson @ 2026-04-14 15:48 UTC (permalink / raw)
  To: Kai Huang
  Cc: Chang Seok Bae, kvm@vger.kernel.org, pbonzini@redhat.com,
	kas@kernel.org, linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev, x86@kernel.org
In-Reply-To: <d6f05e5fa781ccb465d27a4fb1c7c1ac1e9e95ff.camel@intel.com>

On Mon, Apr 13, 2026, Kai Huang wrote:
> On Mon, 2026-04-13 at 07:54 -0700, Sean Christopherson wrote:
> > More importantly, because the TDX-Module *requires* RCX (the GPR that holds the
> > mask of registers to expose to the VMM) to be hidden on TDVMCALL, KVM *can't*
> > do any kind of meaningful "available" tracking.  
> > 
> 
> Hmm I think RCX conveys the shared GPRs and VMM can read.  Per "Table 5.323:
> TDH.VP.ENTER Output Operands Format #5 Definition: On TDCALL(TDG.VP.VMCALL)
> Following a TD Entry":
> 
>   RCX   ...
> 	Bit(s) Name         Description
> 
> 	31:0   PARAMS_MASK  Value as passed into TDCALL(TDG.VP.VMCALL) by
> 			    the guest TD: indicates which part of the guest
> 			    TD GPR and XMM state is passed as-is to the
> VMM 
> 			    and back. For details, see the description of
> 			    TDG.VP.VMCALL in 5.5.26.

The problem is that bit 1 in RCX is required to be '0'.  I.e. the guest *can't*
expose RCX to the VMM.  From the spec:

  15:0    GPR Mask Controls the transfer of GPR values:
  Bit 0:  RAX (must be 0)
  Bit 1:  RCX (must be 0)

And the code:

  api_error_type tdg_vp_vmcall(uint64_t controller_value)
  {
    api_error_type retval = TDX_OPERAND_INVALID;
    tdx_module_local_t* tdx_local_data_ptr = get_local_data();

    tdvmcall_control_t control = { .raw = controller_value };

    // Bits 0, 1 and 4 and 63:32 of RCX must be 0
    if (((control.gpr_select & (uint16_t)(BIT(0) | BIT(1) | BIT(4))) != 0) ||  <==== sadness
         (control.reserved != 0))
    {
        retval = api_error_with_operand_id(TDX_OPERAND_INVALID, OPERAND_ID_RCX);
        TDX_ERROR("Unsupported bits in GPR_SELECT field = 0x%x\n", control.gpr_select)
        goto EXIT_FAILURE;
    }

Oh, dagnabbit.  The spec also says:

  The value of RCX itself is always passed to the host VMM.

and then in code:

    td_exit_qual.gpr_select = control.gpr_select;
    td_exit_qual.xmm_select = control.xmm_select;

    tdx_local_data_ptr->vmm_regs.rcx = td_exit_qual.raw;

    // RAX is not copied, RCX filled above, start from RDX

I don't get why TDX requires bit 1 to be 0, but whatever.

So I was wrong, KVM can (and should!) validate the registers coming from the
guest.  If we want to harden TDX, that's the obvious first step.

^ permalink raw reply

* Re: [PATCH v2 1/6] KVM: x86: Add dedicated storage for guest RIP
From: Sean Christopherson @ 2026-04-14 15:37 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: Xiaoyao Li, Paolo Bonzini, Kiryl Shutsemau, kvm, x86, linux-coco,
	linux-kernel
In-Reply-To: <87e767b6-0324-44e6-92cb-f933002dec43@intel.com>

On Tue, Apr 14, 2026, Chang S. Bae wrote:
> On 4/14/2026 5:31 AM, Xiaoyao Li wrote:
> > Even leave RIP in regs[], what is the problem by just allocating the
> > index 16-31 to R16-R31 and making RIP the index 32?
> 
> But why?
> 
> Even though the array isn't explicitly labeled as GPRs, that's effectively
> how it's being used, and RIP isn't part of that set.
> 
> I don't think there is any benefit of leaving it in regs[]. 

+1.  Chang's earlier argument that RIP isn't a proper GPR swayed me over, e.g. RIP
doesn't have an architectural index.

Keeping RIP in regs[] saves one line of code in arch/x86/include/asm/kvm_host.h,
at the cost of making the code less readable (IMO) and incorrectly suggesting that
RIP can be accessed like other regs[].

^ permalink raw reply

* Re: [PATCH v2 5/6] KVM: x86: Track available/dirty register masks as "unsigned long" values
From: Sean Christopherson @ 2026-04-14 14:04 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Kai Huang, Chang Seok Bae, kvm@vger.kernel.org,
	pbonzini@redhat.com, kas@kernel.org, linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev, x86@kernel.org
In-Reply-To: <95a931f8-42cc-4834-953c-30c9167bfdc1@intel.com>

On Tue, Apr 14, 2026, Xiaoyao Li wrote:
> On 4/14/2026 7:03 AM, Huang, Kai wrote:
> > > Because VMX and SVM make all GRPs available immediately, except
> > > for RSP, KVM ignores avail/dirty for GPRs.  I.e. "fixing" TDX will just shift the
> > > "bugs" elsewhere.
> > Just want to understand:
> > 
> > I thought the fix could be we simply remove the wrong GPRs from the list.
> > Not sure how fixing TDX will shift bugs elsewhere?
> 
> I'm curious too.

What I'm saying is that, _if_ there are bugs where KVM uses a register that isn't
available, then modifying TDX's list won't actually fix anything (without more
changes), it will just change which code is technically buggy (hence all the quotes
above).

> > > More importantly, because the TDX-Module*requires* RCX (the GPR that holds the
> > > mask of registers to expose to the VMM) to be hidden on TDVMCALL, KVM*can't*
> > > do any kind of meaningful "available" tracking.
> > > 
> > Hmm I think RCX conveys the shared GPRs and VMM can read.  Per "Table 5.323:
> > TDH.VP.ENTER Output Operands Format #5 Definition: On TDCALL(TDG.VP.VMCALL)
> > Following a TD Entry":
> > 
> >    RCX   ...
> > 	Bit(s) Name         Description
> > 
> > 	31:0   PARAMS_MASK  Value as passed into TDCALL(TDG.VP.VMCALL) by
> > 			    the guest TD: indicates which part of the guest
> > 			    TD GPR and XMM state is passed as-is to the
> > VMM
> > 			    and back. For details, see the description of
> > 			    TDG.VP.VMCALL in 5.5.26.
> > 
> > I think the problem is, as said previously, currently KVM TDX code uses
> > KVM's existing infrastructure to emulate MSR, KVM hypercall etc,  but
> > TDVMCALL has a different ABI, thus there's a mismatch here.
> 
> I once had patch for it internally.
> 
> It adds back the available check for GPRs when accessing instead of assuming
> they are always available. For normal VMX and SVM, all the GPRs are still
> always available. But for TDX, only EXIT_INFO_1 and EXIT_INFO_2 are always
> marked available, while others need to be explicitly set case by case.
> 
> The good thing is it makes TDX safer that KVM won't consume invalid data
> silently for TDX. But it adds additional overhead of checking the
> unnecessary register availability for VMX and SVM case.
> 
> -----------------------------&<-------------------------------------
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Date: Tue, 11 Mar 2025 07:13:29 -0400
> Subject: [PATCH] KVM: x86: Add available check for GPRs
> 
> Since commit de3cd117ed2f ("KVM: x86: Omit caching logic for
> always-available GPRs"), KVM doesn't check the availability of GPRs
> except RSP and RIP when accessing them, because they are always
> available.
> 
> However, it's not true when it comes to TDX. The GPRs are not available
> after TD vcpu exits actually.

> And it relies on KVM manually sets the
> GPRs value when needed, e.g.
> 
>  - setting rax, rbx, rcx, rdx, rsi, for hypercall emulation in
>    tdx_emulate_tdvmall();
> 
>  - setting rax, rcx and rdx before MSR write emulation;
> 
> Add the available check of GPRs read, and WARN_ON_ONCE() when unavailable.
> It can help capture the cases of undesired GPRs consumption by TDX.

Sorry, but NAK.  I am strongly against adding any code to the GPR accessors/mutators
just for TDX.  It's a _lot_ of code.  From commit de3cd117ed2f ("KVM: x86: Omit
caching logic for always-available GPRs"):

    E.g. on x86_64, kvm_emulate_cpuid() is reduced from 342 to 182 bytes and
    kvm_emulate_hypercall() from 1362 to 1143, with the total size of KVM
    dropping by ~1000 bytes.  With CONFIG_RETPOLINE=y, the numbers are even
    more pronounced, e.g.: 353->182, 1418->1172 and well over 2000 bytes.

Note that updating only the "available" masks is wrong, as TDX needs to marshall
written registers back to their correct location.

In the end, the available/dirty tracking isn't about hardening against bugs, it's
about deferring expensive VMREAD and VMWRITE (and guest memory) operations until
action is required.

We could bury sanity checks behind a Kconfig of some kind, but I genuinely don't
see much value in doing so.  These emulation flows are very static (all register
usage is hardcoded), and so it's very much a "get it right once" sort of thing,
i.e. the odds of a runtime check finding a bug after initial development are
basically zero.

An alternative for TDX would be to avoid bouncing through GPRs in the first place,
e.g. by reworking __kvm_emulate_rdmsr() to not access any registers.  But I'm
probably opposed to even that, because I doubt the end result would be an overall
net positive for KVM.  We'd end up with duplicate code, harder to read common
code (because of the new abstractions), and likely without meaningfully moving
the needle in terms of finding/preventing bugs.  KVM still needs to get operands
to/from the right parameters, though only difference is that for TDX, the parameters
would be very "direct".

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox