Linux Confidential Computing Development
* Re: [PATCH 02/15] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Dan Williams (nvidia) @ 2026-06-12 23:49 UTC (permalink / raw)
  To: Xu Yilun, kas, djbw, rick.p.edgecombe, x86, peter.fang
  Cc: linux-coco, linux-kernel, kvm, sohil.mehta, yilun.xu, yilun.xu,
	baolu.lu, zhenzhong.duan, xiaoyao.li
In-Reply-To: <20260522034128.3144354-3-yilun.xu@linux.intel.com>

Xu Yilun wrote:
> TDX Module introduces a new concept called "TDX Module Extensions" to
> support long running / hard-irq preemptible flows inside. This makes TDX
> Module capable of handling complex tasks through "Extension SEAMCALLs".
> Adding more memory to TDX Module is the first step to enable Extensions.

Like I said on the cover, I think "long running hard-irq preemptible"
invites more questions than it answers. The service calls are not "long
running" on their own. I think it is sufficient to say they are
resumable unlike typical calls that run to completion while monopolizing
the CPU.

> Currently, TDX Module memory use is relatively static. But, the
> Extensions need to use memory more dynamically. While 'static' here
> means the kernel provides necessary amount of memory to TDX Module for
> its basic functionalities, 'dynamic' means extra memory is needed only
> if new add-on features are to be enabled. So add a new memory feeding
> process backed by a new SEAMCALL TDH.EXT.MEM.ADD.

Rick commented on this as well, but a simpler way to say it is
extensions receive a one time memory pool allocation at init time.  The
extension uses that pool as its baseline for its own internal state and
data for the service APIs it offers.

> The process is mostly the same as adding PAMT. The kernel queries TDX
> Module how much memory needed, allocates it, hands it over, and never
> gets it back.
> 
> TDH.EXT.MEM.ADD uses a new parameter type HPA_LIST_INFO to provide
> control (private) pages to TDX Module. This type represents a list of
> pages for TDX Module to access. It needs a 'root page' which contains
> the list of HPAs of the pages. It collapses the HPA of the root page
> and the number of valid HPAs into a 64 bit raw value for SEAMCALL
> parameters. The root page is always a medium, TDX Module never keeps
> the root page.

I mention below, but I do not think the reader cares that the TDX Module
calls an array of physical addresses a "root" page.
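
For anyone following along without the spec, here is a minimal userspace
sketch of the packing that to_hpa_list_info() does, assuming the field
layout in the quoted hunk (bits 11:3 first entry, bits 51:12 PFN, bits
63:55 last entry). The helper and macro names are made up for
illustration, they are not kernel API:

```c
#include <stdint.h>

/* Field positions taken from the GENMASK_U64() definitions in the patch. */
#define HPA_LIST_FIRST_SHIFT	3
#define HPA_LIST_PFN_SHIFT	12
#define HPA_LIST_LAST_SHIFT	55
#define HPA_LIST_IDX_MASK	0x1ffULL		/* 9-bit entry index */
#define HPA_LIST_PFN_MASK	((1ULL << 40) - 1)	/* 40-bit PFN */

/* Pack the list page's PFN and the number of valid entries (1..512). */
static uint64_t hpa_list_info(uint64_t list_pfn, unsigned int nr_entries)
{
	uint64_t first = 0;			/* always start at entry 0 */
	uint64_t last = nr_entries - 1;		/* index of the last valid entry */

	return ((first & HPA_LIST_IDX_MASK) << HPA_LIST_FIRST_SHIFT) |
	       ((list_pfn & HPA_LIST_PFN_MASK) << HPA_LIST_PFN_SHIFT) |
	       ((last & HPA_LIST_IDX_MASK) << HPA_LIST_LAST_SHIFT);
}

/* Unpack helpers, useful for checking the round trip. */
static uint64_t hpa_list_pfn(uint64_t info)
{
	return (info >> HPA_LIST_PFN_SHIFT) & HPA_LIST_PFN_MASK;
}

static unsigned int hpa_list_last_entry(uint64_t info)
{
	return (unsigned int)((info >> HPA_LIST_LAST_SHIFT) & HPA_LIST_IDX_MASK);
}
```

The 9-bit index fields are why a single list page tops out at 512
entries, i.e. one 4K page of u64 physical addresses.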

> 
> Introduce a tdx_clflush_hpa_list() helper to flush shared cache before
> SEAMCALL, to avoid shared cache writeback damaging these private pages.
> 
> For now, TDX Module Extensions consumes relatively large amount of
> memory (~50MB). Use contiguous page allocation to avoid permanently
> fragment too much memory. Print the allocation amount on TDX Module
> Extensions initialization for visibility.

To be clear, I believe there is a low chance of fragmentation given this
allocation happens early. However, at 10s of MB there is a benefit to
isolating blocks of PFNs that will never be returned, so it makes sense
to not use the buddy allocator for that.

> Co-developed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> ---
>  arch/x86/virt/vmx/tdx/tdx.h |   1 +
>  arch/x86/virt/vmx/tdx/tdx.c | 118 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 119 insertions(+)
> 
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index a5eec8e3cc71..2335f88bbb10 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -46,6 +46,7 @@
>  #define TDH_PHYMEM_PAGE_WBINVD		41
>  #define TDH_VP_WR			43
>  #define TDH_SYS_CONFIG			45
> +#define TDH_EXT_MEM_ADD			61
>  #define TDH_SYS_DISABLE			69
>  
>  /*
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index c0c6281b08a5..622399d8da68 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -31,6 +31,7 @@
>  #include <linux/syscore_ops.h>
>  #include <linux/idr.h>
>  #include <linux/kvm_types.h>
> +#include <linux/bitfield.h>
>  #include <asm/page.h>
>  #include <asm/special_insns.h>
>  #include <asm/msr-index.h>
> @@ -1179,6 +1180,123 @@ static __init int init_tdmrs(struct tdmr_info_list *tdmr_list)
>  	return 0;
>  }
>  
> +static void tdx_clflush_hpa_list(struct page *root, unsigned int nr_pages)
> +{
> +	u64 *entries = page_to_virt(root);
> +	int i;
> +
> +	for (i = 0; i < nr_pages; i++)
> +		clflush_cache_range(__va(entries[i]), PAGE_SIZE);
> +}
> +
> +#define HPA_LIST_INFO_FIRST_ENTRY	GENMASK_U64(11, 3)
> +#define HPA_LIST_INFO_PFN		GENMASK_U64(51, 12)
> +#define HPA_LIST_INFO_LAST_ENTRY	GENMASK_U64(63, 55)
> +
> +static u64 to_hpa_list_info(struct page *root, unsigned int nr_pages)
> +{
> +	return FIELD_PREP(HPA_LIST_INFO_FIRST_ENTRY, 0) |
> +	       FIELD_PREP(HPA_LIST_INFO_PFN, page_to_pfn(root)) |
> +	       FIELD_PREP(HPA_LIST_INFO_LAST_ENTRY, nr_pages - 1);
> +}
> +
> +static int tdx_ext_mem_add(struct page *root, unsigned int nr_pages)
> +{
> +	struct tdx_module_args args = {
> +		.rcx = to_hpa_list_info(root, nr_pages),
> +	};
> +	u64 r;
> +
> +	tdx_clflush_hpa_list(root, nr_pages);
> +
> +	do {
> +		/*
> +		 * TDH_EXT_MEM_ADD is designed to use output parameter RCX to
> +		 * override/update input parameter RCX, so the caller doesn't
> +		 * have to do manual parameter update on retry call.
> +		 */
> +		r = seamcall_ret(TDH_EXT_MEM_ADD, &args);
> +	} while (r == TDX_INTERRUPTED_RESUMABLE);
> +
> +	if (r != TDX_SUCCESS)
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +static int tdx_ext_mem_setup(void)
> +{
> +	unsigned int nr_pages;
> +	struct page *page;
> +	u64 *root;
> +	unsigned int i;
> +	int ret;
> +
> +	nr_pages = tdx_sysinfo.ext.memory_pool_required_pages;
> +	/*
> +	 * memory_pool_required_pages == 0 means no need to add pages,
> +	 * skip the memory setup.
> +	 */
> +	if (!nr_pages)
> +		return 0;
> +
> +	root = kzalloc(PAGE_SIZE, GFP_KERNEL);
> +	if (!root)
> +		return -ENOMEM;

I think this "root" term is a holdover from the complicated TDX Connect
case where it might sometimes be this odd "singleton" object? You could
just make it this for actual type safety.

struct tdx_hpa_list {
	u64 phys[PAGE_SIZE/sizeof(u64)];
};

> +
> +	page = alloc_contig_pages(nr_pages, GFP_KERNEL, numa_mem_id(),
> +				  &node_online_map);
> +	if (!page) {
> +		ret = -ENOMEM;
> +		goto out_free_root;
> +	}
> +
> +	for (i = 0; i < nr_pages;) {
> +		unsigned int nents = min(nr_pages - i,
> +					 PAGE_SIZE / sizeof(*root));

This looks wrong, sizeof(struct page)?, or size of physical address?

Becomes less error prone if you do:

min(nr_pages - i, ARRAY_SIZE(hpa_list->phys))
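
A rough userspace sketch of what I mean, assuming 4K pages. The
fake_mem_add() stub and feed_pages() are hypothetical stand-ins for
tdx_ext_mem_add() and the loop in tdx_ext_mem_setup(), only there to
show the chunking:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE	4096UL
#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* One page worth of physical addresses, typed instead of a bare u64 *. */
struct tdx_hpa_list {
	uint64_t phys[PAGE_SIZE / sizeof(uint64_t)];
};

/* Hypothetical stub for the SEAMCALL wrapper; it only counts calls. */
static unsigned int nr_calls;

static int fake_mem_add(const struct tdx_hpa_list *list, size_t nents)
{
	(void)list;
	(void)nents;
	nr_calls++;
	return 0;
}

/* Hand over nr_pages contiguous pages, one full list page at a time. */
static int feed_pages(struct tdx_hpa_list *list, uint64_t base_phys,
		      size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; ) {
		size_t nents = nr_pages - i;

		/* The array bound, not a sizeof() guess, caps the chunk. */
		if (nents > ARRAY_SIZE(list->phys))
			nents = ARRAY_SIZE(list->phys);

		for (size_t j = 0; j < nents; j++)
			list->phys[j] = base_phys + (i + j) * PAGE_SIZE;

		int ret = fake_mem_add(list, nents);
		if (ret)
			return ret;

		i += nents;
	}

	return 0;
}
```

With a 50MB pool (12800 pages) that works out to 25 calls of 512 entries
each.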

> +		int j;
> +
> +		for (j = 0; j < nents; j++)

You can declare j in the for loop.

> +			root[j] = page_to_phys(page + i + j);
> +
> +		ret = tdx_ext_mem_add(virt_to_page(root), nents);
> +		/*
> +		 * No SEAMCALLs to reclaim the added pages. For simple error
> +		 * handling, leak all pages.
> +		 */
> +		WARN_ON_ONCE(ret);

Perhaps to be friendlier to folks without the source code in front of
them drop the comment and do:

WARN(ret, "Fatal: TDX Module failed (%d) to accept memory, stranded %u pages\n", ret, nr_pages);

...the once flavor not needed, right? It's toast at this point.


* Re: [PATCH v14 10/44] arm64: RMI: Add support for SRO
From: Dan Williams (nvidia) @ 2026-06-12 23:07 UTC (permalink / raw)
  To: Steven Price, Gavin Shan, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Shanker Donthineni,
	Alper Gun, Aneesh Kumar K . V, Emi Kisanuki, Vishal Annapurve,
	WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <50d70588-2ebc-4c9b-98ec-68f3d04a9d21@arm.com>

Steven Price wrote:
[..]
> > alloc_pages_exact() will fail if the requested size exceeds the maximal
> > allowed
> > size (1 << MAX_PAGE_ORDER). The maximal size is usually smaller than
> > PUD_SIZE
> > but PUD_SIZE is allowed by the RMM.
> 
> This is an area where to be honest I'm really not sure what to do.
> Technically the RMM is allowed to ask for a contiguous range of 512GB
> pages (on a 4K system - larger with larger page sizes) - but clearly no
> real OS is going to be able to provide anything like that.
> 
> In practise we don't expect the RMM to do anything so crazy. It's not
> really clear to me whether even 2MB (PMD_SIZE) is needed. But the spec
> is written to be generic.
> 
> So my current approach is to calculate the required size and pass it
> into alloc_pages_exact(). For "stupidly large" values this will fail and
> Linux just doesn't support an RMM which attempts this. If there is ever
> a usecase which needs this then we'd need to find a different method of
> providing the memory (most likely some form of carveout to avoid
> fragmentation). But my view is we should wait for that usecase to be
> identified first.

Just some comparison comments as I am also going through the TDX patches
which enable "Extension SEAMCALLs". These new SEAMCALLs are similar to
the SRO mechanism [1].

TDX asks for an upfront delegation of memory at init time using
alloc_contig_pages() that is never returned until the entire module is
shut down. alloc_contig_pages() is not subject to the MAX_ORDER limit,
but I am not sure that alloc_contig_pages() is suitable for the
small+dynamic runtime memory add / release that SRO potentially wants to
do?
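
For scale, a back-of-the-envelope sketch of the buddy allocator ceiling
being discussed, assuming 4K pages and MAX_PAGE_ORDER of 10. That order
is an assumption on my part since it is configurable per architecture:

```c
#include <stdint.h>

#define PAGE_SHIFT	12		/* 4K pages */
#define MAX_PAGE_ORDER	10		/* assumed default, arch configurable */
#define PMD_SIZE	(1ULL << 21)	/* 2MB with 4K pages */
#define PUD_SIZE	(1ULL << 30)	/* 1GB with 4K pages */

/* Largest physically contiguous block the buddy allocator can return. */
static uint64_t max_buddy_bytes(void)
{
	return 1ULL << (MAX_PAGE_ORDER + PAGE_SHIFT);
}

/* Nonzero if a single buddy allocation could satisfy the request. */
static int fits_buddy(uint64_t size)
{
	return size <= max_buddy_bytes();
}
```

Under those assumptions the ceiling is 4MB, so a PMD_SIZE request fits
while PUD_SIZE and the ~50MB TDX pool do not.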

Does SRO always balance the size of RMI_OP_MEM_REQ_DONATE with
RMI_OP_MEM_REQ_RECLAIM, or might some donate requests be a one way
donation like TDX? Just poking to see if there is a path to preallocate
a pool vs the fine grained per-operation alloc/free.

[1]: http://lore.kernel.org/20260522034128.3144354-3-yilun.xu@linux.intel.com


* Re: [PATCH 01/15] x86/virt/tdx: Read global metadata for TDX Module Extensions
From: Dan Williams (nvidia) @ 2026-06-12 22:20 UTC (permalink / raw)
  To: Xu Yilun, kas, djbw, rick.p.edgecombe, x86, peter.fang
  Cc: linux-coco, linux-kernel, kvm, sohil.mehta, yilun.xu, yilun.xu,
	baolu.lu, zhenzhong.duan, xiaoyao.li
In-Reply-To: <20260522034128.3144354-2-yilun.xu@linux.intel.com>

Xu Yilun wrote:
> Add reading of the global metadata for TDX Module Extensions.
> 
> TDX Module Extensions is an add-on feature enumerated by TDX_FEATURES0.
> But for the Module's integrity, Linux requires that all features that a
> Module advertises must have a complete, valid set of metadata, and the
> validation must succeed at core TDX initialization time.
> 
> Check TDX_FEATURES0 before reading these metadata. If a feature is
> advertised, a failure in reading associated metadata causes the entire
> TDX initialization to fail, otherwise skip.

Others already commented on the patch ordering, so I will just comment
on the changelog to recommend referring back to the "any available
extension, all the time" implementation policy rather than saying "Linux
requires" which is ambiguous.

The patch reordering will make it clearer that
memory_pool_required_pages scales with the number of features that
Linux gains enabling for at configuration time.


* Re: [PATCH 00/15] Enable TDX Module Extensions and DICE-based TDX Quoting
From: Dan Williams (nvidia) @ 2026-06-12 22:03 UTC (permalink / raw)
  To: Xu Yilun, kas, djbw, rick.p.edgecombe, x86, peter.fang
  Cc: linux-coco, linux-kernel, kvm, sohil.mehta, yilun.xu, yilun.xu,
	baolu.lu, zhenzhong.duan, xiaoyao.li
In-Reply-To: <20260522034128.3144354-1-yilun.xu@linux.intel.com>

Xu Yilun wrote:
> This posting is just to collect initial review.
> 
> Sean, Paolo, Dave please feel free to ignore for now. Sean, especially
> the x86 KVM stuff is only here as an example for the init code, and not
> ready for review.
> 
> Kiryl and Dan, we are trying to get acks for the first 4 patches of the
> series so they can serve as a settled base for all the other work
> that uses Extensions. Please review the first 4 patches and treat the
> later ones as an example for the Extensions initialization.
> 
> == Why it's being posted ==
> 
> The TDX Module is introducing a new concept called "TDX Module
> Extensions", and several upcoming features depend on them. The
> Extensions need some extra setup at TDX module init time, and the code
> to do this is expected to be somewhat generic.
> 
> We want to get the basics of this TDX module extensions piece sorted so
> that all of the extension-based work can build on it. This series
> includes those basics, and an example usage called DICE-based TDX
> Quoting. Only the first 4 patches are about initializing the TDX module
> Extensions. I'd like some review on them. The later DICE patches are
> just included to serve as a usage example for the TDX module extension
> code.
> 
> The first 4 patches will eventually need an ack by an x86 maintainer, so
> please review with that in mind.
> 
> == Overview ==
> 
> TDX Module introduces the "TDX Module Extensions" to support long
> running / hard-irq preemptible flows inside. This makes TDX Module
> capable of handling complex tasks through "Extension SEAMCALLs".

The internal implementation details of extension SEAMCALLs bury the
lede on why this mechanism is important, why Linux should care, and why
this brings TDX in line with the other major CC architectures. Something
like:

===
To date, SEAMCALLs have been short lived routines that monopolize the
CPU for their duration. This limits their utility for implementing
higher order security protocols or pushes complexity into Linux. The
Linux appetite for ingesting complexity is low, so TDX now adds a new
class of SEAMCALLs that are preemptible and resumable. This capability
enables higher order service APIs to carry out a security protocol like
"establish an SPDM session".

The TDX "Extension SEAMCALL" capability is akin to ARM CCA's "Stateful
RMI Operations (SRO)", and achieves similar externalized complexity
relief as a dedicated hardware coprocessor like AMD SEV-SNP. The
mechanism is "give the service environment some memory", "invoke the
service API", and "continue invoking until complete". All protocol state
is internal to the service API.

The simplest class of extension SEAMCALLs to support are in support of
"DICE-based TDX Quoting", a service to turn guest launch attestation
reports into a document that can be externally verified.
===
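
To make the "continue invoking until complete" shape concrete, a toy
userspace model. Nothing here is a real TDX interface: svc_call(), the
status codes and struct svc_state are all hypothetical, with the state
struct standing in for protocol state that would live inside the
service:

```c
#include <stdint.h>

#define SVC_SUCCESS			0
#define SVC_INTERRUPTED_RESUMABLE	1

/* Hypothetical service-side state; the caller never interprets it. */
struct svc_state {
	unsigned int steps_left;
};

/* One bounded slice of work per invocation, then return to the caller. */
static uint64_t svc_call(struct svc_state *s)
{
	if (s->steps_left > 0)
		s->steps_left--;

	return s->steps_left ? SVC_INTERRUPTED_RESUMABLE : SVC_SUCCESS;
}

/* Caller side: re-invoke until the service reports completion. */
static unsigned int svc_run_to_completion(struct svc_state *s)
{
	unsigned int calls = 0;
	uint64_t r;

	do {
		r = svc_call(s);
		calls++;
	} while (r == SVC_INTERRUPTED_RESUMABLE);

	return calls;
}
```

The caller's only job is the loop. Between iterations the CPU is free
to take interrupts or reschedule, which is the contrast with a call
that runs to completion.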

> TDX Module allows some add-on features to use the Extension. The first
> feature to use Extensions is DICE-based TDX Quoting [1]. DICE is an
> industry-standard, certificate-backed attestation framework that layers
> evidence through a chain of certificates.
> 
> This series adds infrastructure to enable the Extensions and then
> implement DICE-based TDX Quoting.
> 
> The Extensions consumes relatively large amount of memory (~50MB). So it
> is designed to be off by default.

This confuses the TDX design with the Linux design, and sets up "50MB" as
something to be quibbled with. The Linux design is to turn on all the
features that Linux knows about all the time. Unless and until the "any
available, all the time" becomes untenable it just simplifies the init
flow to not play piecemeal games. Await evidence to change the simple
policy. Suffice to say the cost of this policy will burn 10s of
megabytes.

> It must be enabled after basic TDX
> Module initialization and when add-on features require it. To enable
> the Extensions, host first adds extra memory to TDX Module via a
> SEAMCALL (TDH.EXT.MEM.ADD), then uses another SEAMCALL (TDH.EXT.INIT) to
> initialize Extensions, and then some add-on features, e.g. DICE, could
> use Extension SEAMCALLs for work. Note that host can never get the added
> memory back.
> 
> Theoretically, the Extensions doesn't need to be enabled right after
> basic TDX initialization. It could be enabled right before the first
> Extension SEAMCALL is issued. That would save or postpone memory usage.
> But it isn't worth the complexity, the needs for the Extensions are vast
> but the savings are little for a typical TDX capable system (about
> 0.001% of memory). So the Linux decision is to just enable it along with
> the basic TDX.
> 
> This series has 2 distinct parts:
> 
>   Patches  1-4:  TDX Module Extensions enabling
>   Patches  5-15: DICE-based TDX Quoting, primarily Peter's work.
> 
> == Some history ==
> 
> The TDX Module Extensions part was first posted along with TDX
> Connect [2]. Now this part is remarkably smaller because we've removed
> the generic tdx_page_array abstraction for HPA_LIST_INFO. TDX Module
> Extensions is the first user of HPA_LIST_INFO, and doesn't use it in a
> typical way (HPA_LIST_INFO can only hold at most 2MB memory). There
> isn't enough justification to make the abstraction in this series. A
> possible plan is to rebuild tdx_page_array iteratively when more use
> cases arise.

No need to talk about details not in this series. I would maybe just
note that quoting is the simplest first consumer and was chosen as the
lead vehicle over TDX Connect previously posted in case anyone asks.


* RE: [RFC PATCH] mm/vmalloc: add vmalloc_decrypted() and vzalloc_decrypted()
From: Michael Kelley @ 2026-06-12 19:06 UTC (permalink / raw)
  To: Jason Gunthorpe, Catalin Marinas, Christoph Hellwig
  Cc: Kameron Carr, akpm@linux-foundation.org, urezki@gmail.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, rppt@kernel.org,
	Michael Kelley, linux-coco@lists.linux.dev, Suzuki K Poulose
In-Reply-To: <20260612181807.GP1066031@ziepe.ca>

From: Jason Gunthorpe <jgg@ziepe.ca> Sent: Friday, June 12, 2026 11:18 AM
> 
> On Fri, Jun 12, 2026 at 06:49:28PM +0100, Catalin Marinas wrote:
> > On Thu, Jun 11, 2026 at 08:49:54AM -0300, Jason Gunthorpe wrote:
> > > On Mon, Jun 08, 2026 at 04:37:02PM +0100, Catalin Marinas wrote:
> > > > > +/**
> > > > > + * vzalloc_decrypted - allocate zeroed virtually contiguous decrypted memory
> > > > > + * @size:    allocation size
> > > > > + *
> > > > > + * Like vmalloc_decrypted(), but the memory is set to zero.
> > > > > + *
> > > > > + * Return: pointer to the allocated memory or %NULL on error
> > > > > + */
> > > > > +void *vzalloc_decrypted_noprof(unsigned long size)
> > > > > +{
> > > > > +	void *addr;
> > > > > +
> > > > > +	addr = __vmalloc_node_range_noprof(size, 1, VMALLOC_START, VMALLOC_END,
> > > > > +					   GFP_KERNEL,
> > > > > +					   pgprot_decrypted(PAGE_KERNEL),
> > > > > +					   VM_DECRYPTED, NUMA_NO_NODE,
> > > > > +					   __builtin_return_address(0));
> > > > > +	if (addr)
> > > > > +		memset(addr, 0, size);
> > > >
> > > > Talking to Suzuki, the small window between set_memory_decrypted() and
> > > > memset() potentially exposing stale data is safe, at least for Arm CCA
> > > > as the memory would be scrubbed (there are other places in the kernel
> > > > where we do something similar). I assume that's also the case for other
> > > > architectures, although not sure what pKVM does.
> > >
> > > It seems like a poor practice though, this should probably be
> > > re-organized to use __GFP_ZERO so things are ordered sensibly.
> >
> > __GFP_ZERO doesn't work if the intermediate set_memory_decrypted()
> > mangles the data (e.g. changes encryption keys) and it no longer reads
> > as zeros.
> 
> I thought arches are either preserving the memory content or zeroing
> it, you are saying some arch leaves it as garbage? I'd argue that's an
> arch bug and they should clear it in their path.

AMD SEV-SNP leaves the memory contents as garbage after an encryption
or decryption state change. On the flip side, my understanding has been
that TDX zeroes the memory (or at least has an option to do so) after
such a state change, though a couple of AI chats say TDX also leaves
garbage. To be sure, I'd have to run an experiment to check in a TDX
guest on Hyper-V.

> 
> Otherwise this sharp edge is not documented and we have many other
> places getting it wrong, eg system_heap_allocate() doesn't re-zero the
> memory after decrypting it.

In the Hyper-V code that uses set_memory_decrypted()/encrypted(),
there's always an explicit call to set the memory to zero afterwards.

Michael

> 
> > > But what is the purpose of this? I guess some hyperv thing - but
> > > shouldn't we have a more structured way to "DMA map" things for the
> > > hypervisor instead of stuff like this? Why can't you use
> > > dma_alloc_coherent() which actually gives you an address that is
> > > sensible to pass to the hypervisor?
> >
> > IIRC netvsc_init_buf() uses vzalloc() to allocate some memory and that
> > buffer ends up in set_memory_decrypted() via vmbus_establish_gpadl().
> > arm64 does not support changing the decrypted/shared attribute of
> > vmalloc mappings and I don't think we should add it. Better to just
> > allocate it properly upfront.
> 
> Sure
> 
> > We might be able to use the DMA API but we won't get something like
> > vmalloc() - physically non-contiguous.
> 
> The entry point is dma_alloc_noncontiguous() and you get a scatterlist
> back.
> 
> > I think dma_alloc_noncontiguous() just falls back to
> > dma_direct_alloc_pages() in the absence of an iommu.
> 
> In all cases you get a scatterlist with a CPU list and a DMA
> list. iommu gives a smaller DMA list.
> 
> If you want a vmap then you can feed that CPU page list from the sgl
> into vmap().
> 
> A dma_alloc_noncontiguous_vmap() helper would not be hard to make, and
> IMHO, would make a lot more sense for hyperv to treat the memory access
> from the hypervisor as "DMA" instead of trying to re-invent the DMA
> API.. :\
> 
> HCH was already saying we should not be allowing drivers to use
> set_memory_decrypted() at all, and hyperv is the biggest non-core user
> right now...
> 
> Jason



* Re: [RFC PATCH] mm/vmalloc: add vmalloc_decrypted() and vzalloc_decrypted()
From: Jason Gunthorpe @ 2026-06-12 18:18 UTC (permalink / raw)
  To: Catalin Marinas, Christoph Hellwig
  Cc: Kameron Carr, akpm, urezki, linux-mm, linux-kernel, rppt,
	mhklinux, linux-coco, Suzuki K Poulose
In-Reply-To: <aixGqCqKkQeDfUST@arm.com>

On Fri, Jun 12, 2026 at 06:49:28PM +0100, Catalin Marinas wrote:
> On Thu, Jun 11, 2026 at 08:49:54AM -0300, Jason Gunthorpe wrote:
> > On Mon, Jun 08, 2026 at 04:37:02PM +0100, Catalin Marinas wrote:
> > > > +/**
> > > > + * vzalloc_decrypted - allocate zeroed virtually contiguous decrypted memory
> > > > + * @size:    allocation size
> > > > + *
> > > > + * Like vmalloc_decrypted(), but the memory is set to zero.
> > > > + *
> > > > + * Return: pointer to the allocated memory or %NULL on error
> > > > + */
> > > > +void *vzalloc_decrypted_noprof(unsigned long size)
> > > > +{
> > > > +	void *addr;
> > > > +
> > > > +	addr = __vmalloc_node_range_noprof(size, 1, VMALLOC_START, VMALLOC_END,
> > > > +					   GFP_KERNEL,
> > > > +					   pgprot_decrypted(PAGE_KERNEL),
> > > > +					   VM_DECRYPTED, NUMA_NO_NODE,
> > > > +					   __builtin_return_address(0));
> > > > +	if (addr)
> > > > +		memset(addr, 0, size);
> > > 
> > > Talking to Suzuki, the small window between set_memory_decrypted() and
> > > memset() potentially exposing stale data is safe, at least for Arm CCA
> > > as the memory would be scrubbed (there are other places in the kernel
> > > where we do something similar). I assume that's also the case for other
> > > architectures, although not sure what pKVM does.
> > 
> > It seems like a poor practice though, this should probably be
> > re-organized to use __GFP_ZERO so things are ordered sensibly.
> 
> __GFP_ZERO doesn't work if the intermediate set_memory_decrypted()
> mangles the data (e.g. changes encryption keys) and it no longer reads
> as zeros.

I thought arches are either preserving the memory content or zeroing
it, you are saying some arch leaves it as garbage? I'd argue that's an
arch bug and they should clear it in their path.

Otherwise this sharp edge is not documented and we have many other
places getting it wrong, eg system_heap_allocate() doesn't re-zero the
memory after decrypting it.

> > But what is the purpose of this? I guess some hyperv thing - but
> > shouldn't we have a more structured way to "DMA map" things for the
> > hypervisor instead of stuff like this? Why can't you use
> > dma_alloc_coherent() which actually gives you an address that is
> > sensible to pass to the hypervisor?
> 
> IIRC netvsc_init_buf() uses vzalloc() to allocate some memory and that
> buffer ends up in set_memory_decrypted() via vmbus_establish_gpadl().
> arm64 does not support changing the decrypted/shared attribute of
> vmalloc mappings and I don't think we should add it. Better to just
> allocate it properly upfront.

Sure
 
> We might be able to use the DMA API but we won't get something like
> vmalloc() - physically non-contiguous. 

The entry point is dma_alloc_noncontiguous() and you get a scatterlist
back.

> I think dma_alloc_noncontiguous() just falls back to
> dma_direct_alloc_pages() in the absence of an iommu.

In all cases you get a scatterlist with a CPU list and a DMA
list. iommu gives a smaller DMA list.

If you want a vmap then you can feed that CPU page list from the sgl
into vmap().

A dma_alloc_noncontiguous_vmap() helper would not be hard to make, and
IMHO, would make a lot more sense for hyperv to treat the memory access
from the hypervisor as "DMA" instead of trying to re-invent the DMA
API.. :\

HCH was already saying we should not be allowing drivers to use
set_memory_decrypted() at all, and hyperv is the biggest non-core user
right now...

Jason


* Re: [RFC PATCH] mm/vmalloc: add vmalloc_decrypted() and vzalloc_decrypted()
From: Catalin Marinas @ 2026-06-12 17:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Kameron Carr, akpm, urezki, linux-mm, linux-kernel, rppt,
	mhklinux, linux-coco, Suzuki K Poulose
In-Reply-To: <20260611114954.GC1066031@ziepe.ca>

On Thu, Jun 11, 2026 at 08:49:54AM -0300, Jason Gunthorpe wrote:
> On Mon, Jun 08, 2026 at 04:37:02PM +0100, Catalin Marinas wrote:
> > > +/**
> > > + * vzalloc_decrypted - allocate zeroed virtually contiguous decrypted memory
> > > + * @size:    allocation size
> > > + *
> > > + * Like vmalloc_decrypted(), but the memory is set to zero.
> > > + *
> > > + * Return: pointer to the allocated memory or %NULL on error
> > > + */
> > > +void *vzalloc_decrypted_noprof(unsigned long size)
> > > +{
> > > +	void *addr;
> > > +
> > > +	addr = __vmalloc_node_range_noprof(size, 1, VMALLOC_START, VMALLOC_END,
> > > +					   GFP_KERNEL,
> > > +					   pgprot_decrypted(PAGE_KERNEL),
> > > +					   VM_DECRYPTED, NUMA_NO_NODE,
> > > +					   __builtin_return_address(0));
> > > +	if (addr)
> > > +		memset(addr, 0, size);
> > 
> > Talking to Suzuki, the small window between set_memory_decrypted() and
> > memset() potentially exposing stale data is safe, at least for Arm CCA
> > as the memory would be scrubbed (there are other places in the kernel
> > where we do something similar). I assume that's also the case for other
> > architectures, although not sure what pKVM does.
> 
> It seems like a poor practice though, this should probably be
> re-organized to use __GFP_ZERO so things are ordered sensibly.

__GFP_ZERO doesn't work if the intermediate set_memory_decrypted()
mangles the data (e.g. changes encryption keys) and it no longer reads
as zeros.
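
A toy userspace model of the ordering problem, in case it helps.
model_set_memory_decrypted() is hypothetical: it just XORs the buffer
to mimic a conversion that does not preserve plaintext, which is an
assumption about the worst-case architecture and not a claim about any
particular one:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a state change that scrambles contents. */
static void model_set_memory_decrypted(unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= 0xA5;
}

static int is_zeroed(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i])
			return 0;
	return 1;
}

/* __GFP_ZERO style: zero at allocation, convert afterwards. */
static int zero_then_decrypt(unsigned char *buf, size_t len)
{
	memset(buf, 0, len);
	model_set_memory_decrypted(buf, len);
	return is_zeroed(buf, len);
}

/* vzalloc_decrypted() style: convert first, memset() last. */
static int decrypt_then_zero(unsigned char *buf, size_t len)
{
	model_set_memory_decrypted(buf, len);
	memset(buf, 0, len);
	return is_zeroed(buf, len);
}
```

Only the second ordering is guaranteed to read back as zeros in this
model, at the cost of the small window before the memset().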

> But what is the purpose of this? I guess some hyperv thing - but
> shouldn't we have a more structured way to "DMA map" things for the
> hypervisor instead of stuff like this? Why can't you use
> dma_alloc_coherent() which actually gives you an address that is
> sensible to pass to the hypervisor?

IIRC netvsc_init_buf() uses vzalloc() to allocate some memory and that
buffer ends up in set_memory_decrypted() via vmbus_establish_gpadl().
arm64 does not support changing the decrypted/shared attribute of
vmalloc mappings and I don't think we should add it. Better to just
allocate it properly upfront.

We might be able to use the DMA API but we won't get something like
vmalloc() - physically non-contiguous. I think dma_alloc_noncontiguous()
just falls back to dma_direct_alloc_pages() in the absence of an iommu.

-- 
Catalin


* Re: [PATCH 1/2] x86/tdx: Add helper to query maximum TD Quote size
From: Xiaoyao Li @ 2026-06-12 14:25 UTC (permalink / raw)
  To: Peter Fang, Dave Hansen, Kiryl Shutsemau, Rick Edgecombe,
	Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel, linux-coco, kvm
In-Reply-To: <20260612110853.3188196-2-peter.fang@intel.com>

On 6/12/2026 7:08 PM, Peter Fang wrote:
> TDX attestation blob ("TD Quote") sizes can grow with newer
> cryptographic schemes, so guests can no longer rely on a fixed-size
> buffer for the Quote.
> 
> Newer TDX modules report the maximum TD Quote size via a TD-scope
> metadata field. Add a helper to query it instead of exposing tdg_vm_rd()
> directly, as it can read arbitrary metadata fields.
> 
> Thanks to Xu Yilun for suggesting this.
> 
> Assisted-by: Claude:claude-opus-4-7
> Assisted-by: GitHub Copilot:gpt-5.4
> Signed-off-by: Peter Fang <peter.fang@intel.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

I have another nit other than Kiryl's

> ---
>   arch/x86/coco/tdx/tdx.c           | 19 +++++++++++++++++++
>   arch/x86/include/asm/shared/tdx.h |  1 +
>   arch/x86/include/asm/tdx.h        |  2 ++
>   3 files changed, 22 insertions(+)
> 
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 186915a17c50..88c66c46e70a 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -197,6 +197,25 @@ u64 tdx_hcall_get_quote(u8 *buf, size_t size)
>   }
>   EXPORT_SYMBOL_GPL(tdx_hcall_get_quote);
>   
> +/**
> + * tdx_get_max_quote_size() - Get the maximum TD Quote size
> + *
> + * Read the maximum size of a TD Quote from a 4-byte TD metadata field. The TDX
> + * guest driver uses it to size the buffer for Quote retrieval. Older TDX
> + * modules do not support this field and return an error.
> + *
> + * Return: Maximum Quote size in bytes on success, or 0 on failure.
> + */
> +u32 tdx_get_max_quote_size(void)
> +{
> +	u64 val, ret;
> +
> +	ret = tdg_vm_rd(TDCS_QUOTE_MAX_SIZE, &val);
> +
> +	return ret ? 0 : (u32)val;
> +}
> +EXPORT_SYMBOL_GPL(tdx_get_max_quote_size);

Do we need to start to use

EXPORT_SYMBOL_FOR_MODULES(tdx_get_max_quote_size, "tdx-guest") ?

> +
>   static void __noreturn tdx_panic(const char *msg)
>   {
>   	struct tdx_module_args args = {
> diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
> index 049638e3da74..2880f493a8e5 100644
> --- a/arch/x86/include/asm/shared/tdx.h
> +++ b/arch/x86/include/asm/shared/tdx.h
> @@ -49,6 +49,7 @@
>   /* TDX TD-Scope Metadata. To be used by TDG.VM.WR and TDG.VM.RD */
>   #define TDCS_CONFIG_FLAGS		0x1110000300000016
>   #define TDCS_TD_CTLS			0x1110000300000017
> +#define TDCS_QUOTE_MAX_SIZE		0x9010000200000008
>   #define TDCS_NOTIFY_ENABLES		0x9100000000000010
>   #define TDCS_TOPOLOGY_ENUM_CONFIGURED	0x9100000000000019
>   
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index a149740b24e8..ac39674c9479 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -72,6 +72,8 @@ int tdx_mcall_extend_rtmr(u8 index, u8 *data);
>   
>   u64 tdx_hcall_get_quote(u8 *buf, size_t size);
>   
> +u32 tdx_get_max_quote_size(void);
> +
>   void __init tdx_dump_attributes(u64 td_attr);
>   void __init tdx_dump_td_ctls(u64 td_ctls);
>   



* Re: [PATCH 2/2] virt: tdx-guest: Allocate Quote buffer dynamically
From: Kiryl Shutsemau @ 2026-06-12 12:37 UTC (permalink / raw)
  To: Peter Fang
  Cc: Dave Hansen, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel, linux-coco, kvm
In-Reply-To: <20260612110853.3188196-3-peter.fang@intel.com>

On Fri, Jun 12, 2026 at 04:08:49AM -0700, Peter Fang wrote:
> @@ -171,7 +171,7 @@ static void tdx_mr_deinit(const struct attribute_group *mr_grp)
>  #define GET_QUOTE_SUCCESS		0
>  #define GET_QUOTE_IN_FLIGHT		0xffffffffffffffff
>  
> -#define TDX_QUOTE_MAX_LEN		(GET_QUOTE_BUF_SIZE - sizeof(struct tdx_quote_buf))
> +#define TDX_QUOTE_BUF_LEN(n)		(offsetof(struct tdx_quote_buf, data) + (n))

I got confused by this offsetof(). It is valid, but why not a plain
sizeof()?
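
For reference, a minimal user-space sketch of how the two can differ for a
struct with a trailing flexible array. The field layout of struct quote_buf
is assumed to mirror struct tdx_quote_buf, and struct padded_hdr is a
hypothetical type added only to show tail padding; neither is taken from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout assumed to mirror struct tdx_quote_buf: header fields, then data[]. */
struct quote_buf {
	uint64_t version;
	uint64_t status;
	uint32_t in_len;
	uint32_t out_len;
	uint8_t data[];
};

/* Hypothetical header whose last fixed member leaves tail padding. */
struct padded_hdr {
	uint64_t version;
	uint32_t len;
	uint8_t data[];
};

/* data[] can never start beyond the end of the struct. */
static_assert(offsetof(struct padded_hdr, data) <= sizeof(struct padded_hdr),
	      "flexible array starts inside the struct");

/* Bytes needed for the header plus n payload bytes, counted from data[0]. */
static size_t quote_buf_len(size_t n)
{
	return offsetof(struct quote_buf, data) + n;
}

/* Same computation for the padded header, using offsetof(). */
static size_t padded_len_offsetof(size_t n)
{
	return offsetof(struct padded_hdr, data) + n;
}

/* Same computation using sizeof(), which also counts any tail padding. */
static size_t padded_len_sizeof(size_t n)
{
	return sizeof(struct padded_hdr) + n;
}
```

With the assumed tdx_quote_buf layout both forms give the same header size,
so the choice there is about intent. On a typical 64-bit ABI the padded
variant differs, because sizeof() rounds the struct up to its alignment
while data[] starts right after the last fixed member.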

Otherwise looks okay to me:

Reviewed-by: Kiryl Shutsemau (Meta) <kas@kernel.org>

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply

* Re: [PATCH 1/2] x86/tdx: Add helper to query maximum TD Quote size
From: Kiryl Shutsemau @ 2026-06-12 12:36 UTC (permalink / raw)
  To: Peter Fang
  Cc: Dave Hansen, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel, linux-coco, kvm
In-Reply-To: <20260612110853.3188196-2-peter.fang@intel.com>

On Fri, Jun 12, 2026 at 04:08:48AM -0700, Peter Fang wrote:
> TDX attestation blob ("TD Quote") sizes can grow with newer
> cryptographic schemes, so guests can no longer rely on a fixed-size
> buffer for the Quote.
> 
> Newer TDX modules report the maximum TD Quote size via a TD-scope
> metadata field. Add a helper to query it instead of exposing tdg_vm_rd()
> directly, as it can read arbitrary metadata fields.
> 
> Thanks to Xu Yilun for suggesting this.
> 
> Assisted-by: Claude:claude-opus-4-7
> Assisted-by: GitHub Copilot:gpt-5.4

These are supposed to be on the same line, no?

Documentation/process/coding-assistants.rst:  Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]

> Signed-off-by: Peter Fang <peter.fang@intel.com>

One nit below, otherwise:

Reviewed-by: Kiryl Shutsemau (Meta) <kas@kernel.org>

> ---
>  arch/x86/coco/tdx/tdx.c           | 19 +++++++++++++++++++
>  arch/x86/include/asm/shared/tdx.h |  1 +
>  arch/x86/include/asm/tdx.h        |  2 ++
>  3 files changed, 22 insertions(+)
> 
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 186915a17c50..88c66c46e70a 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -197,6 +197,25 @@ u64 tdx_hcall_get_quote(u8 *buf, size_t size)
>  }
>  EXPORT_SYMBOL_GPL(tdx_hcall_get_quote);
>  
> +/**
> + * tdx_get_max_quote_size() - Get the maximum TD Quote size
> + *
> + * Read the maximum size of a TD Quote from a 4-byte TD metadata field. The TDX
> + * guest driver uses it to size the buffer for Quote retrieval. Older TDX
> + * modules do not support this field and return an error.
> + *
> + * Return: Maximum Quote size in bytes on success, or 0 on failure.
> + */
> +u32 tdx_get_max_quote_size(void)
> +{
> +	u64 val, ret;
> +
> +	ret = tdg_vm_rd(TDCS_QUOTE_MAX_SIZE, &val);
> +
> +	return ret ? 0 : (u32)val;

Cast is redundant.
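
A small user-space sketch of why the cast changes nothing: the return
statement already converts the u64 result of the conditional expression to
the u32 return type. The function names are hypothetical, and stdint types
stand in for the kernel's u32/u64.

```c
#include <assert.h>
#include <stdint.h>

/* Explicit cast, as written in the patch. */
static uint32_t max_size_cast(uint64_t ret, uint64_t val)
{
	return ret ? 0 : (uint32_t)val;
}

/* No cast: the return statement converts the 64-bit result implicitly. */
static uint32_t max_size_nocast(uint64_t ret, uint64_t val)
{
	return ret ? 0 : val;
}
```

Both variants truncate to the low 32 bits in the same way, since conversion
to an unsigned type is well defined in C.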

> +}
> +EXPORT_SYMBOL_GPL(tdx_get_max_quote_size);
> +
>  static void __noreturn tdx_panic(const char *msg)
>  {
>  	struct tdx_module_args args = {

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply

* Re: [RFC PATCH 0/6] Support virtio-mem memory hotplug in TDX guests
From: Kiryl Shutsemau @ 2026-06-12 12:16 UTC (permalink / raw)
  To: Zhenzhong Duan
  Cc: marcandre.lureau, david, rick.p.edgecombe, prsampat, pbonzini,
	mst, peterx, chenyi.qiang, elena.reshetova, michaeluth,
	ackerleytng, linux-kernel, linux-coco, virtualization, x86,
	yilun.xu, xiaoyao.li, chao.p.peng
In-Reply-To: <20260604093551.1511079-1-zhenzhong.duan@intel.com>

On Thu, Jun 04, 2026 at 05:35:45AM -0400, Zhenzhong Duan wrote:
> 2. Re-accepting already-accepted memory returns errors. Ignoring these errors
> can mislead the guest into believing re-accepted memory is zeroed when it
> contains stale data.

The re-accepting concern is valid, but often overblown. Re-accepting memory
that never got allocated is fine.

> == About this series ==
> 
> This series takes a different direction, supporting start-private memory
> and addressing the limitations of previous series [1] by implementing a
> callback-based infrastructure that integrates TDX memory acceptance and
> release operations with proper subblock granularity.

You are presenting these callbacks as a generic memory hotplug thingy, but
they are only plugged into virtio-mem. ACPI hotplug won't accept/release
memory unless I'm missing something. Are you expecting them to cover
non-virtio cases too?

And these callbacks feel like a very ad-hoc solution.

> See Rick and Paolo's
> discussion about using TDG.MEM.PAGE.RELEASE in [1].

Having RELEASE in the hotplug path without addressing private->shared
conversion first is odd. That's the most obvious path that has to be
covered first.

Hm?

> == Future work ==
> support lazy accept

It would be nice to have some outline of how we will get there, to
understand if this patchset is a stepping stone or a dead end that has to
be thrown away later on.

Hot[un]plug is often used to manage an overcommitted host. Eager accept
might be counter-productive.


-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply

* [PATCH 1/2] x86/tdx: Add helper to query maximum TD Quote size
From: Peter Fang @ 2026-06-12 11:08 UTC (permalink / raw)
  To: Dave Hansen, Kiryl Shutsemau, Rick Edgecombe,
	Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel, linux-coco, kvm, Peter Fang
In-Reply-To: <20260612110853.3188196-1-peter.fang@intel.com>

TDX attestation blob ("TD Quote") sizes can grow with newer
cryptographic schemes, so guests can no longer rely on a fixed-size
buffer for the Quote.

Newer TDX modules report the maximum TD Quote size via a TD-scope
metadata field. Add a helper to query it instead of exposing tdg_vm_rd()
directly, as it can read arbitrary metadata fields.

Thanks to Xu Yilun for suggesting this.

Assisted-by: Claude:claude-opus-4-7
Assisted-by: GitHub Copilot:gpt-5.4
Signed-off-by: Peter Fang <peter.fang@intel.com>
---
 arch/x86/coco/tdx/tdx.c           | 19 +++++++++++++++++++
 arch/x86/include/asm/shared/tdx.h |  1 +
 arch/x86/include/asm/tdx.h        |  2 ++
 3 files changed, 22 insertions(+)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 186915a17c50..88c66c46e70a 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -197,6 +197,25 @@ u64 tdx_hcall_get_quote(u8 *buf, size_t size)
 }
 EXPORT_SYMBOL_GPL(tdx_hcall_get_quote);
 
+/**
+ * tdx_get_max_quote_size() - Get the maximum TD Quote size
+ *
+ * Read the maximum size of a TD Quote from a 4-byte TD metadata field. The TDX
+ * guest driver uses it to size the buffer for Quote retrieval. Older TDX
+ * modules do not support this field and return an error.
+ *
+ * Return: Maximum Quote size in bytes on success, or 0 on failure.
+ */
+u32 tdx_get_max_quote_size(void)
+{
+	u64 val, ret;
+
+	ret = tdg_vm_rd(TDCS_QUOTE_MAX_SIZE, &val);
+
+	return ret ? 0 : (u32)val;
+}
+EXPORT_SYMBOL_GPL(tdx_get_max_quote_size);
+
 static void __noreturn tdx_panic(const char *msg)
 {
 	struct tdx_module_args args = {
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 049638e3da74..2880f493a8e5 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -49,6 +49,7 @@
 /* TDX TD-Scope Metadata. To be used by TDG.VM.WR and TDG.VM.RD */
 #define TDCS_CONFIG_FLAGS		0x1110000300000016
 #define TDCS_TD_CTLS			0x1110000300000017
+#define TDCS_QUOTE_MAX_SIZE		0x9010000200000008
 #define TDCS_NOTIFY_ENABLES		0x9100000000000010
 #define TDCS_TOPOLOGY_ENUM_CONFIGURED	0x9100000000000019
 
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index a149740b24e8..ac39674c9479 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -72,6 +72,8 @@ int tdx_mcall_extend_rtmr(u8 index, u8 *data);
 
 u64 tdx_hcall_get_quote(u8 *buf, size_t size);
 
+u32 tdx_get_max_quote_size(void);
+
 void __init tdx_dump_attributes(u64 td_attr);
 void __init tdx_dump_td_ctls(u64 td_ctls);
 
-- 
2.53.0


^ permalink raw reply related

* [PATCH 0/2] tdx-guest: Make Quote buffer size dynamic
From: Peter Fang @ 2026-06-12 11:08 UTC (permalink / raw)
  To: Dave Hansen, Kiryl Shutsemau, Rick Edgecombe,
	Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel, linux-coco, kvm, Peter Fang

Hi,

This series changes the TDX attestation driver's Quote buffer size from
a fixed constant to a value queried from the TDX module. So effectively:

  s/FIXED_BUF_SIZE/queried_buf_size/g

...in the TDX guest driver.

Terminology
===========

A "TD Quote" is an attestation structure signed with a platform key. It
contains information about a TDX guest and the platform it's running on.

The "Quote buffer" in the TDX guest driver is a memory buffer shared
between the TDX guest and the host VMM to retrieve TD Quotes. It has a
header defined in the GHCI spec [1].

Device Identifier Composition Engine ("DICE") provides a framework for
layering attestation evidence. This replaces the SGX model of contacting
an Intel server to obtain a certificate.

Problem
=======

The fixed-size Quote buffer approach is not sustainable. As
cryptographic algorithms evolve, TD Quote sizes also grow. A previous
commit [2] increased the guest driver's fixed-size Quote buffer to 128
KB to accommodate DICE Quotes, but it may still be insufficient when
those Quotes use post-quantum cryptography (PQC). PQC certificate chains
are roughly 10x-15x larger than conventional ones, which can increase
Quote sizes to several megabytes.

What's in this series
=====================

To avoid changing the driver whenever the Quote buffer becomes too
small, newer TDX modules report their maximum Quote size via a metadata
field. The guest driver uses this value for its Quote buffer when
available. Older TDX modules continue to use the 128 KB buffer.
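
The sizing rule above can be sketched in user space as follows. This is an
illustration only, not the driver code: the 4 KB page size, the 24-byte
GetQuote header size, and all names are assumptions, and a queried value of
0 stands in for an older TDX module that does not report the field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES		4096u		/* assumed 4 KB pages */
#define DEFAULT_BUF_SIZE	(128u * 1024u)	/* fallback for older modules */
#define QUOTE_HDR_SIZE		24u		/* assumed GetQuote header size */

/* Round len up to the next multiple of the page size. */
static size_t page_align(size_t len)
{
	return (len + PAGE_SIZE_BYTES - 1) & ~(size_t)(PAGE_SIZE_BYTES - 1);
}

/* max_quote_size == 0 models a module that does not report the field. */
static size_t quote_buf_size(uint32_t max_quote_size)
{
	size_t len = DEFAULT_BUF_SIZE;

	/* The reported size excludes the header, so add it back. */
	if (max_quote_size)
		len = QUOTE_HDR_SIZE + (size_t)max_quote_size;

	return page_align(len);
}
```

Under these assumptions a module reporting a 2 MB maximum Quote would need
one extra page for the header, while a module reporting nothing keeps the
128 KB buffer.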

The changes do not affect configfs-tsm-report ABIs.

Patch 1/2: Add a helper to read the QUOTE_MAX_SIZE metadata field.
Patch 2/2: Replace the fixed Quote buffer size with the queried value,
           when available.

AI use
======

I used AI tools (Claude:claude-opus-4-7, GitHub Copilot:gpt-5.4) to
proofread this cover letter and the changelogs. The series also
underwent AI code review (Claude:claude-opus-4-7), but the feedback was
limited to style suggestions.

[1] Guest Hypervisor Communication Interface (GHCI) Specification,
    Version 1.5, Section "TDG.VP.VMCALL<GetQuote>"
[2] 43185067c6fd ("configfs-tsm-report: tdx_guest: Increase Quote buffer
    size to 128KB")

Kuppuswamy Sathyanarayanan (1):
  virt: tdx-guest: Allocate Quote buffer dynamically

Peter Fang (1):
  x86/tdx: Add helper to query maximum TD Quote size

 arch/x86/coco/tdx/tdx.c                 | 19 +++++++++
 arch/x86/include/asm/shared/tdx.h       |  1 +
 arch/x86/include/asm/tdx.h              |  2 +
 drivers/virt/coco/tdx-guest/tdx-guest.c | 52 ++++++++++++++++++-------
 4 files changed, 60 insertions(+), 14 deletions(-)


base-commit: 4549871118cf616eecdd2d939f78e3b9e1dddc48
-- 
2.53.0


^ permalink raw reply

* [PATCH 2/2] virt: tdx-guest: Allocate Quote buffer dynamically
From: Peter Fang @ 2026-06-12 11:08 UTC (permalink / raw)
  To: Dave Hansen, Kiryl Shutsemau, Rick Edgecombe,
	Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel, linux-coco, kvm, Peter Fang
In-Reply-To: <20260612110853.3188196-1-peter.fang@intel.com>

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

The TDX attestation driver currently uses a fixed 128 KB Quote buffer
shared with the host VMM. This may be too small for Quotes using schemes
such as post-quantum cryptography (PQC), where certificate chains can
increase the Quote size to several megabytes.

Allocate the Quote buffer based on the size reported by the TDX module
instead of always reserving a fixed-size buffer. This avoids wasting
memory on platforms that do not require larger Quotes. Older platforms
fall back to the default 128 KB buffer.

Because the Quote buffer must be physically contiguous, its size is
bound by the buddy allocator's maximum page order (4 MB), which should
be sufficient for current attestation needs.
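
As a worked example of that limit, the sketch below computes the largest
contiguous block and the order needed for a given size. It assumes 4 KB
pages and a maximum order of 10, as stated above; the macro and function
names are hypothetical and this is not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT_ASSUMED	12	/* 4 KB pages */
#define MAX_ORDER_ASSUMED	10	/* order-10, as stated above */

/* Largest physically contiguous allocation, in bytes. */
static size_t max_contig_bytes(void)
{
	return (size_t)1 << (PAGE_SHIFT_ASSUMED + MAX_ORDER_ASSUMED);
}

/* Nonzero if a buffer of len bytes fits in one maximum-order block. */
static int fits_buddy(size_t len)
{
	return len <= max_contig_bytes();
}

/* Smallest order whose block covers len bytes; len must fit. */
static unsigned int order_for(size_t len)
{
	unsigned int order = 0;

	assert(fits_buddy(len));
	while (((size_t)1 << (PAGE_SHIFT_ASSUMED + order)) < len)
		order++;
	return order;
}
```

With these values the limit is 2^(12 + 10) bytes = 4 MB, and the default
128 KB buffer corresponds to order 5.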

struct tdx_quote_buf has a trailing flexible array, so use offsetof()
instead of sizeof() to calculate the header size.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Assisted-by: Claude:claude-opus-4-7
Assisted-by: GitHub Copilot:gpt-5.4
Signed-off-by: Peter Fang <peter.fang@intel.com>
---
 drivers/virt/coco/tdx-guest/tdx-guest.c | 52 ++++++++++++++++++-------
 1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/drivers/virt/coco/tdx-guest/tdx-guest.c b/drivers/virt/coco/tdx-guest/tdx-guest.c
index a9ecc46df187..162fb47f3fae 100644
--- a/drivers/virt/coco/tdx-guest/tdx-guest.c
+++ b/drivers/virt/coco/tdx-guest/tdx-guest.c
@@ -163,7 +163,7 @@ static void tdx_mr_deinit(const struct attribute_group *mr_grp)
  * DICE-based attestation uses layered evidence that requires
  * larger Quote size (~100K).
  */
-#define GET_QUOTE_BUF_SIZE		SZ_128K
+#define GET_QUOTE_DEFAULT_BUF_SIZE	SZ_128K
 
 #define GET_QUOTE_CMD_VER		1
 
@@ -171,7 +171,7 @@ static void tdx_mr_deinit(const struct attribute_group *mr_grp)
 #define GET_QUOTE_SUCCESS		0
 #define GET_QUOTE_IN_FLIGHT		0xffffffffffffffff
 
-#define TDX_QUOTE_MAX_LEN		(GET_QUOTE_BUF_SIZE - sizeof(struct tdx_quote_buf))
+#define TDX_QUOTE_BUF_LEN(n)		(offsetof(struct tdx_quote_buf, data) + (n))
 
 /* struct tdx_quote_buf: Format of Quote request buffer.
  * @version: Quote format version, filled by TD.
@@ -192,8 +192,9 @@ struct tdx_quote_buf {
 	u8 data[];
 };
 
-/* Quote data buffer */
+/* Quote data buffer and size */
 static void *quote_data;
+static size_t quote_data_size;
 
 /* Lock to streamline quote requests */
 static DEFINE_MUTEX(quote_lock);
@@ -210,9 +211,8 @@ static long tdx_get_report0(struct tdx_report_req __user *req)
 			     USER_SOCKPTR(req->tdreport));
 }
 
-static void free_quote_buf(void *buf)
+static void free_quote_buf(void *buf, size_t len)
 {
-	size_t len = PAGE_ALIGN(GET_QUOTE_BUF_SIZE);
 	unsigned int count = len >> PAGE_SHIFT;
 
 	if (set_memory_encrypted((unsigned long)buf, count)) {
@@ -223,19 +223,43 @@ static void free_quote_buf(void *buf)
 	free_pages_exact(buf, len);
 }
 
-static void *alloc_quote_buf(void)
+static size_t get_quote_buf_size(void)
 {
-	size_t len = PAGE_ALIGN(GET_QUOTE_BUF_SIZE);
-	unsigned int count = len >> PAGE_SHIFT;
+	size_t buf_sz = GET_QUOTE_DEFAULT_BUF_SIZE;
+	u32 quote_sz;
+
+	quote_sz = tdx_get_max_quote_size();
+
+	if (quote_sz)
+		/* Reported size does not include GetQuote header */
+		buf_sz = TDX_QUOTE_BUF_LEN(quote_sz);
+
+	return PAGE_ALIGN(buf_sz);
+}
+
+static void *alloc_quote_buf(size_t *buflen)
+{
+	unsigned int count;
+	size_t len;
 	void *addr;
 
+	len = get_quote_buf_size();
+
+	/*
+	 * This fails if the requested size exceeds the buddy allocator's
+	 * maximum order (order-10, 4MB).
+	 */
 	addr = alloc_pages_exact(len, GFP_KERNEL | __GFP_ZERO);
 	if (!addr)
 		return NULL;
 
+	count = len >> PAGE_SHIFT;
+
 	if (set_memory_decrypted((unsigned long)addr, count))
 		return NULL;
 
+	*buflen = len;
+
 	return addr;
 }
 
@@ -286,7 +310,7 @@ static int tdx_report_new_locked(struct tsm_report *report, void *data)
 	if (desc->inblob_len != TDX_REPORTDATA_LEN)
 		return -EINVAL;
 
-	memset(quote_data, 0, GET_QUOTE_BUF_SIZE);
+	memset(quote_data, 0, quote_data_size);
 
 	/* Update Quote buffer header */
 	quote_buf->version = GET_QUOTE_CMD_VER;
@@ -297,7 +321,7 @@ static int tdx_report_new_locked(struct tsm_report *report, void *data)
 	if (ret)
 		return ret;
 
-	err = tdx_hcall_get_quote(quote_data, GET_QUOTE_BUF_SIZE);
+	err = tdx_hcall_get_quote(quote_data, quote_data_size);
 	if (err) {
 		pr_err("GetQuote hypercall failed, status:%llx\n", err);
 		return -EIO;
@@ -316,7 +340,7 @@ static int tdx_report_new_locked(struct tsm_report *report, void *data)
 
 	out_len = READ_ONCE(quote_buf->out_len);
 
-	if (out_len > TDX_QUOTE_MAX_LEN)
+	if (TDX_QUOTE_BUF_LEN(out_len) > quote_data_size)
 		return -EFBIG;
 
 	buf = kvmemdup(quote_buf->data, out_len, GFP_KERNEL);
@@ -418,7 +442,7 @@ static int __init tdx_guest_init(void)
 	if (ret)
 		goto deinit_mr;
 
-	quote_data = alloc_quote_buf();
+	quote_data = alloc_quote_buf(&quote_data_size);
 	if (!quote_data) {
 		pr_err("Failed to allocate Quote buffer\n");
 		ret = -ENOMEM;
@@ -432,7 +456,7 @@ static int __init tdx_guest_init(void)
 	return 0;
 
 free_quote:
-	free_quote_buf(quote_data);
+	free_quote_buf(quote_data, quote_data_size);
 free_misc:
 	misc_deregister(&tdx_misc_dev);
 deinit_mr:
@@ -445,7 +469,7 @@ module_init(tdx_guest_init);
 static void __exit tdx_guest_exit(void)
 {
 	tsm_report_unregister(&tdx_tsm_ops);
-	free_quote_buf(quote_data);
+	free_quote_buf(quote_data, quote_data_size);
 	misc_deregister(&tdx_misc_dev);
 	tdx_mr_deinit(tdx_attr_groups[0]);
 }
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH v7 6/6] coco: guest: arm64: Replace dummy CCA device with sysfs ABI
From: Aneesh Kumar K.V @ 2026-06-12  6:07 UTC (permalink / raw)
  To: Dan Williams (nvidia), linux-coco, linux-arm-kernel, linux-kernel
  Cc: Catalin Marinas, Greg KH, Jeremy Linton, Jonathan Cameron,
	Lorenzo Pieralisi, Mark Rutland, Sudeep Holla, Will Deacon,
	Steven Price, Suzuki K Poulose, Andre Przywara
In-Reply-To: <6a2b103d77596_344af1000@djbw-dev.notmuch>

"Dan Williams (nvidia)" <djbw@kernel.org> writes:

> Aneesh Kumar K.V (Arm) wrote:
>> The SMCCC firmware driver now creates the arm-smccc platform device and
>> instantiates the CCA RSI auxiliary devices once the RSI ABI is discovered.
>> The arm64-specific arm-cca-dev platform device stub is therefore no longer
>> needed.
>> 
>> However, userspace has used the arm-cca-dev platform device to detect Arm
>> CCA Realm guests [1]. Removing it without a replacement would break that
>> detection and would also leave userspace depending on kernel device-model
>> details.
>> 
>> Add /sys/firmware/cca/realm_guest as a stable, architecture-provided ABI
>> for detecting whether the kernel is running as an Arm CCA Realm guest. The
>> file returns 1 in Realm world and 0 otherwise, similar to the existing s390
>> /sys/firmware/uv/prot_virt_guest interface for protected virtualization
>> guests.
>> 
>> Remove the dummy arm-cca-dev registration now that userspace has a
>> dedicated CCA Realm guest indicator, and document the new ABI in
>> Documentation/ABI/testing/sysfs-firmware-cca.
>
> I would have expected an attribute in /sys/class/tsm/tsmX to be the
> common protected guest indicator. Then, if you need to distinguish the
> architecture that registered that tsm it would be in the name of the
> parent device for the tsm class device.
>

It is not clear whether we need this capability early, for example in an
initrd configuration before loading the TSM driver, since
systemd-detect-virt reports the CC architecture.

Also, the general feedback was not to depend on device names or paths to
identify a confidential computing guest. Hence, parsing paths such as
../../devices/arm-rmi-dev-1/tsm/tsm0 may not be advisable.

>
> That also gives you the property that a uevent has signalled the arrival
> of tsm guest services. Otherwise, userspace still needs some custom
> device-model details to know when it can start issuing tsm requests.
>
> Is auxiliary device arrival too late in the flow for what systemd
> needs?

Systemd uses that to build part of its trust model.

static int import_credentials_qemu(ImportCredentialsContext *c) {

        if (detect_container() > 0) /* don't access /sys/ in a container */
                return 0;

        if (detect_confidential_virtualization() > 0) /* don't trust firmware if confidential VMs */
                return 0;
....

It also uses that to build environment settings:

cv = detect_confidential_virtualization();
if (cv > 0) {
        r = strv_env_assign(&nl, "SYSTEMD_CONFIDENTIAL_VIRTUALIZATION", confidential_virtualization_to_string(cv));

IIUC, this would require the facility to be present even before we can
load the full set of modules.
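
For illustration, an early userspace check against such a file could look
roughly like the sketch below. This is not systemd's code: the helper name
is hypothetical, and the path it would be given,
/sys/firmware/cca/realm_guest, is the ABI proposed in this patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Return 1 if the file at path starts with '1', 0 if it starts with
 * anything else, and -1 if it cannot be read. The caller passes the path,
 * e.g. "/sys/firmware/cca/realm_guest" from the proposed ABI.
 */
static int read_guest_flag(const char *path)
{
	FILE *f;
	int c;

	assert(path);

	f = fopen(path, "r");
	if (!f)
		return -1;

	c = fgetc(f);
	fclose(f);

	if (c == EOF)
		return -1;

	return c == '1';
}
```

A check like this needs only sysfs to be mounted, so it does not depend on
any module having been loaded or on a device name.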

-aneesh

^ permalink raw reply

* Re: [PATCH v7 5/6] firmware: smccc: arm-cca-guest: Bind the TSM provider to an SMCCC device
From: Aneesh Kumar K.V @ 2026-06-12  5:47 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-coco, linux-arm-kernel, linux-kernel
  Cc: Catalin Marinas, Greg KH, Jeremy Linton, Jonathan Cameron,
	Lorenzo Pieralisi, Mark Rutland, Sudeep Holla, Will Deacon,
	Steven Price, Andre Przywara
In-Reply-To: <b1d4b888-bdbe-4a45-8561-4f27e0e9a1de@arm.com>

Suzuki K Poulose <suzuki.poulose@arm.com> writes:

..

>> diff --git a/include/linux/arm-smccc-rsi.h b/include/linux/arm-smccc-rsi.h
>> index fddb77986f70..ae663aa8fd7f 100644
>> --- a/include/linux/arm-smccc-rsi.h
>> +++ b/include/linux/arm-smccc-rsi.h
>> @@ -8,6 +8,8 @@
>>   
>>   #include <linux/arm-smccc.h>
>>   
>> +#define RSI_DEV_NAME "arm-rsi-dev"
>
> This shouldn't be here ? This is not part of the SMCCC RSI standard, but
> a linux thing. May be in drivers/firmware/../rsi.h ?
>

The name is used by the Arm SMCCC firmware driver
(drivers/firmware/smccc/smccc.c) and the arm-cca-guest driver.

Since it is used by the Arm SMCCC firmware driver, I used the above
header. We do not currently have a generic placeholder for RSI/RMI
definitions under drivers/.

-aneesh

^ permalink raw reply

* Re: [RFC PATCH 14/15] x86/virt/tdx: Embed version info in SEAMCALL leaf function definitions
From: Adrian Hunter @ 2026-06-12  5:47 UTC (permalink / raw)
  To: Xu Yilun, kas, djbw, rick.p.edgecombe, x86, peter.fang
  Cc: linux-coco, linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan, xiaoyao.li
In-Reply-To: <20260522034128.3144354-15-yilun.xu@linux.intel.com>

On 22/05/2026 06:41, Xu Yilun wrote:
> Embed version information in SEAMCALL leaf function definitions rather
> than let the caller open code them. For now, only TDH.VP.INIT is
> involved.

> @@ -31,7 +44,7 @@
>  #define TDH_VP_CREATE			10
>  #define TDH_MNG_KEY_FREEID		20
>  #define TDH_MNG_INIT			21
> -#define TDH_VP_INIT			22
> +#define TDH_VP_INIT			SEAMCALL_LEAF_VER(22, 1)

FWIW I find the macro a bit ugly, and hiding the version number in
the leaf number macro a little counter-intuitive compared with setting
it at the call site.  It anyway needs some explanation at the call site.

> @@ -2217,8 +2217,8 @@ u64 tdh_vp_init(struct tdx_vp *vp, u64 initial_rcx, u32 x2apicid)
>  		.r8 = x2apicid,
>  	};
>  
> -	/* apicid requires version == 1. */
> -	return seamcall(TDH_VP_INIT | (1ULL << TDX_VERSION_SHIFT), &args);
> +	/* apicid requires version == 1. See TDH_VP_INIT definition.*/
> +	return seamcall(TDH_VP_INIT, &args);

Now the reader has to go look at TDH_VP_INIT.
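
To make the comparison concrete, here is a user-space sketch of the two
styles. The SEAMCALL_LEAF_VER() definition below is a guess, since the
series' actual macro is not quoted here, and the TDX_VERSION_SHIFT value of
16 is likewise an assumption; the other names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TDX_VERSION_SHIFT	16	/* assumed bit position of the version */
#define TDH_VP_INIT_LEAF	22	/* leaf number from the quoted hunk */

/* Hypothetical definition; the series' actual macro is not shown here. */
#define SEAMCALL_LEAF_VER(leaf, ver) \
	((uint64_t)(leaf) | ((uint64_t)(ver) << TDX_VERSION_SHIFT))

/* Style A: the version is embedded in the leaf definition. */
#define TDH_VP_INIT_V1		SEAMCALL_LEAF_VER(TDH_VP_INIT_LEAF, 1)

static uint64_t fn_embedded(void)
{
	return TDH_VP_INIT_V1;
}

/* Style B: the version is visible at the call site. */
static uint64_t fn_call_site(void)
{
	return TDH_VP_INIT_LEAF | (1ULL << TDX_VERSION_SHIFT);
}
```

Both produce the same function number; the difference is only where a
reader finds out that version 1 is being requested.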


^ permalink raw reply

* Re: [PATCH v7 3/6] firmware: smccc: Move RSI definitions to include/linux
From: Aneesh Kumar K.V @ 2026-06-12  5:41 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-coco, linux-arm-kernel, linux-kernel
  Cc: Catalin Marinas, Greg KH, Jeremy Linton, Jonathan Cameron,
	Lorenzo Pieralisi, Mark Rutland, Sudeep Holla, Will Deacon,
	Steven Price, Andre Przywara
In-Reply-To: <b009e840-6b79-415c-a3da-705ea569af38@arm.com>

Suzuki K Poulose <suzuki.poulose@arm.com> writes:

> On 11/06/2026 14:04, Aneesh Kumar K.V (Arm) wrote:
>> The RSI SMCCC function IDs describe a firmware ABI and are not arm64
>> architecture specific definitions. Follow-up changes need to use them from
>> non-arch code, including drivers/firmware/smccc and the Arm CCA guest
>> driver.
>> 
>> Move the RSI SMCCC definitions from arch/arm64/include/asm/ to
>> include/linux/ so they can be shared with the driver code. This also
>> keeps the firmware interface outside architecture code, as requested [1].
>
> Please could we also mention about moving the "wrappers" only used by
> drivers accordingly ?
>

Added this:

Not all helpers in rsi_cmds.h are used by architecture code. The
attestation token helper wrappers are only used by the Arm CCA guest
driver, so move them to a driver-private header under
drivers/virt/coco/arm-cca-guest/. Keep the remaining RSI command helpers,
which are shared by architecture code and drivers, in the arm64 header.


>
>> 
>> [1] https://lore.kernel.org/all/agsNO9cc7H-b0H8L@willie-the-truck
>> 
>> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
>> ---
>>   arch/arm64/include/asm/rsi_cmds.h             | 74 +---------------
>>   .../virt/coco/arm-cca-guest/arm-cca-guest.c   |  2 +
>>   drivers/virt/coco/arm-cca-guest/rsi.h         | 84 +++++++++++++++++++
>>   .../linux/arm-smccc-rsi.h                     |  6 +-
>>   4 files changed, 90 insertions(+), 76 deletions(-)
>>   create mode 100644 drivers/virt/coco/arm-cca-guest/rsi.h
>>   rename arch/arm64/include/asm/rsi_smc.h => include/linux/arm-smccc-rsi.h (98%)
>> 
>> diff --git a/arch/arm64/include/asm/rsi_cmds.h b/arch/arm64/include/asm/rsi_cmds.h
>> index 2c8763876dfb..633123a4e5d5 100644
>> --- a/arch/arm64/include/asm/rsi_cmds.h
>> +++ b/arch/arm64/include/asm/rsi_cmds.h
>> @@ -8,10 +8,9 @@
>>   
>>   #include <linux/arm-smccc.h>
>>   #include <linux/string.h>
>> +#include <linux/arm-smccc-rsi.h>
>
> super minor nit: Please keep them in the alphabetical order.
>
> With that:
>
> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>

Thanks
-aneesh

^ permalink raw reply

* Re: [PATCH v7 6/6] coco: guest: arm64: Replace dummy CCA device with sysfs ABI
From: Dan Williams (nvidia) @ 2026-06-11 19:45 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm), linux-coco, linux-arm-kernel,
	linux-kernel
  Cc: Aneesh Kumar K.V (Arm), Catalin Marinas, Greg KH, Jeremy Linton,
	Jonathan Cameron, Lorenzo Pieralisi, Mark Rutland, Sudeep Holla,
	Will Deacon, Steven Price, Suzuki K Poulose, Andre Przywara
In-Reply-To: <20260611130429.295516-7-aneesh.kumar@kernel.org>

Aneesh Kumar K.V (Arm) wrote:
> The SMCCC firmware driver now creates the arm-smccc platform device and
> instantiates the CCA RSI auxiliary devices once the RSI ABI is discovered.
> The arm64-specific arm-cca-dev platform device stub is therefore no longer
> needed.
> 
> However, userspace has used the arm-cca-dev platform device to detect Arm
> CCA Realm guests [1]. Removing it without a replacement would break that
> detection and would also leave userspace depending on kernel device-model
> details.
> 
> Add /sys/firmware/cca/realm_guest as a stable, architecture-provided ABI
> for detecting whether the kernel is running as an Arm CCA Realm guest. The
> file returns 1 in Realm world and 0 otherwise, similar to the existing s390
> /sys/firmware/uv/prot_virt_guest interface for protected virtualization
> guests.
> 
> Remove the dummy arm-cca-dev registration now that userspace has a
> dedicated CCA Realm guest indicator, and document the new ABI in
> Documentation/ABI/testing/sysfs-firmware-cca.

I would have expected an attribute in /sys/class/tsm/tsmX to be the
common protected guest indicator. Then, if you need to distinguish the
architecture that registered that tsm it would be in the name of the
parent device for the tsm class device. 

That also gives you the property that a uevent has signalled the arrival
of tsm guest services. Otherwise, userspace still needs some custom
device-model details to know when it can start issuing tsm requests.

Is auxiliary device arrival too late in the flow for what systemd
needs?

^ permalink raw reply

* Re: [RFC PATCH 13/15] KVM: TDX: Support event-notify interrupts only with userspace quoting
From: Adrian Hunter @ 2026-06-11 19:36 UTC (permalink / raw)
  To: Xu Yilun, kas, djbw, rick.p.edgecombe, x86, peter.fang
  Cc: linux-coco, linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan, xiaoyao.li
In-Reply-To: <20260522034128.3144354-14-yilun.xu@linux.intel.com>

On 22/05/2026 06:41, Xu Yilun wrote:
> From: Peter Fang <peter.fang@intel.com>
> 
> Tie userspace SetupEventNotifyInterrupt support to userspace Quote
> generation. Delivering event-notify interrupts via userspace breaks if
> KVM never exits to userspace in the first place.

Breaks how exactly?

Seems like a TDX guest has no way to know whether the VMM will use
the Event Notify Interrupt anyway, so it cannot rely upon it, so
it should already handle the case when the interrupt does not fire.

> 
> No known guest currently requires event-notify interrupt support, so
> defer adding in-kernel support for now. Linux TDX guests use polling
> only.

If no guest is using it, then why does it need special treatment?

> 
> Update the KVM API Documentation to reflect the change.
> 
> Signed-off-by: Peter Fang <peter.fang@intel.com>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> ---
>  Documentation/virt/kvm/api.rst |  8 +++++++-
>  arch/x86/kvm/vmx/tdx.c         | 20 +++++++++++++++++---
>  2 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 52bbbb553ce1..8a02745a36ee 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7335,6 +7335,9 @@ inputs and outputs of the TDVMCALL.  Currently the following values of
>     queued successfully, the TDX guest can poll the status field in the
>     shared-memory area to check whether the Quote generation is completed or
>     not. When completed, the generated Quote is returned via the same buffer.
> +   If the host kernel generates Quotes through the TDX Quoting service provided
> +   by the TDX module, KVM processes the GetQuote request and it will not appear
> +   in userspace.

There is an Attestation section in Documentation/virt/kvm/x86/intel-tdx.rst
that could be updated too.

> +                  KVM only supports version 1 of the GetQuote request.

Is that relevant here?

>  
>   * ``TDVMCALL_GET_TD_VM_CALL_INFO``: the guest has requested the support
>     status of TDVMCALLs.  The output values for the given leaf should be
> @@ -7342,7 +7345,10 @@ inputs and outputs of the TDVMCALL.  Currently the following values of
>     field of the union.
>  
>   * ``TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT``: the guest has requested to
> -   set up a notification interrupt for vector ``vector``.
> +   set up a notification interrupt for vector ``vector``.  Since this TDVMCALL
> +   is used to optimize ``TDVMCALL_GET_QUOTE``, KVM disables this support in
> +   userspace VMM if ``TDVMCALL_GET_QUOTE`` is completely handled in the kernel.
> +   KVM may add kernel support for this in the future.

Is that really necessary?

>  
>  KVM may add support for more values in the future that may cause a userspace
>  exit, even without calls to ``KVM_ENABLE_CAP`` or similar.  In this case,

^ permalink raw reply

* Re: [PATCH v6 02/11] x86/virt/tdx: Allocate page bitmap for Dynamic PAMT
From: Vishal Annapurve @ 2026-06-11 18:47 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: bp, dave.hansen, hpa, kas, kvm, linux-coco, linux-doc,
	linux-kernel, mingo, nik.borisov, pbonzini, seanjc, tglx, x86,
	chao.gao, yan.y.zhao, kai.huang, Kirill A. Shutemov, Binbin Wu
In-Reply-To: <20260526023515.288829-3-rick.p.edgecombe@intel.com>

On Mon, May 25, 2026 at 7:35 PM Rick Edgecombe
<rick.p.edgecombe@intel.com> wrote:
>
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> The TDX Physical Address Metadata Table (PAMT) holds data about the
> physical memory used by TDX, and must be allocated by the kernel during
> TDX module initialization.
>
> The exact size of the required PAMT memory is determined by the TDX module
> and may vary between TDX module versions. Currently it is approximately
> 0.4% of the system memory. This is a significant commitment, especially if
> it is not known upfront whether the machine will run any TDX guests.
>
> Each memory region that the TDX module might use needs three separate PAMT
> allocations. One for each supported page size (1GB, 2MB, 4KB). The
> TDX module supports a new feature designed to reduce PAMT overhead called
> Dynamic PAMT. At a high level, Dynamic PAMT still has the 1GB and 2MB
> levels allocated on TDX module initialization, but the 4KB level is
> allocated dynamically during runtime.
>
> However, in the details, Dynamic PAMT still needs some smaller per 4KB
> page scoped data (currently it is 1 bit per page). The TDX module exposes
> the number of bits as a piece of metadata separate from the 4KB static
> allocation for regular PAMT. Although the size is enumerated differently,
> it is handed to the TDX module in the same way the 4KB page size PAMT
> allocation is for regular, non-dynamic PAMT.
>
> Begin to implement Dynamic PAMT in the kernel by reading the bits-per-page
> needed for Dynamic PAMT. Calculate the size needed for the bitmap,
> and use it instead of the 4KB size determined for normal PAMT, in the case
> of Dynamic PAMT.
>
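
As a rough illustration of the sizing described above, the bitmap size follows
from the number of 4KB pages in a region and the enumerated bits-per-page
value. This is only a sketch under stated assumptions: the helper name
dpamt_bitmap_bytes() and SKETCH_PAGE_SHIFT are hypothetical, and any alignment
or rounding beyond whole bytes that the real code applies is not modeled.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* 4KB pages */

/*
 * Hypothetical helper (not from the patch): bytes needed to hold
 * bits_per_page bits for every 4KB page in a region of tdmr_size bytes.
 */
static uint64_t dpamt_bitmap_bytes(uint64_t tdmr_size, uint64_t bits_per_page)
{
	uint64_t nr_pages = tdmr_size >> SKETCH_PAGE_SHIFT;
	uint64_t nr_bits = nr_pages * bits_per_page;

	/* Round up to a whole number of bytes. */
	return (nr_bits + 7) / 8;
}
```

With 1 bit per page, a 1GB region has 262144 4KB pages and so needs 32768
bytes (32KB) of bitmap under this model.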
> Unlike the existing metadata reading code, this code is not generated by a
> script. So adjust the comment to be more generic. Also, start to adopt a
> more normal kernel code style without the ternary statements and
> assignments inside if conditionals that the auto-generated code has.
>
> Assisted-by: Sashiko:claude-opus-4-6
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

Kirill's comment makes sense to me.

Reviewed-by: Vishal Annapurve <vannapurve@google.com>

^ permalink raw reply

* Re: [PATCH v6 01/11] x86/virt/tdx: Simplify tdmr_get_pamt_sz()
From: Vishal Annapurve @ 2026-06-11 18:25 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: bp, dave.hansen, hpa, kas, kvm, linux-coco, linux-doc,
	linux-kernel, mingo, nik.borisov, pbonzini, seanjc, tglx, x86,
	chao.gao, yan.y.zhao, kai.huang, Binbin Wu
In-Reply-To: <20260526023515.288829-2-rick.p.edgecombe@intel.com>

On Mon, May 25, 2026 at 7:35 PM Rick Edgecombe
<rick.p.edgecombe@intel.com> wrote:
>
> For each memory region that the TDX module might use (called TDMR), three
> separate traditional PAMT allocations are needed. One for each supported
> page size (1GB, 2MB, 4KB). These store information on each page in the
> TDMR. In Linux, they are allocated out of one physically contiguous block,
> in order to more efficiently use some internal TDX module bookkeeping
> resources. So some simple math is needed to break the single large
> allocation into three smaller allocations for each page size.
>
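
The "simple math" above can be sketched as follows. This is a minimal
illustration, not the code from the patch: struct pamt_layout and pamt_split()
are hypothetical names, and the 4KB, 2MB, 1GB ordering inside the block is an
assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical container for the three per-page-size PAMT ranges. */
struct pamt_layout {
	uint64_t base_4k, size_4k;
	uint64_t base_2m, size_2m;
	uint64_t base_1g, size_1g;
};

/*
 * Carve one physically contiguous block starting at 'base' into three
 * back-to-back ranges, one per page size. The ordering is assumed.
 */
static struct pamt_layout pamt_split(uint64_t base, uint64_t size_4k,
				     uint64_t size_2m, uint64_t size_1g)
{
	struct pamt_layout l;

	l.base_4k = base;
	l.size_4k = size_4k;

	l.base_2m = l.base_4k + size_4k;
	l.size_2m = size_2m;

	l.base_1g = l.base_2m + size_2m;
	l.size_1g = size_1g;

	return l;
}
```

Written out this way, each of the three calculations is a single addition,
which is the argument for duplicating it rather than looping over page sizes.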
> There are some commonalities in the math needed to calculate the base and
> size for each smaller allocation, and so an effort was made to share logic
> across the three. Unfortunately doing this turned out unnaturally tortured,
> with a loop iterating over the three page sizes, only to call into a
> function with a case statement for each page size. In the future Dynamic
> PAMT will add more logic that is special to the 4KB page size, making the
> benefit of the math sharing even more questionable.
>
> Three is not a very high number, so get rid of the loop and just duplicate
> the small calculation three times. In doing so, set up for future Dynamic
> PAMT changes.
>
> Since the loop that iterates over it is gone, further simplify the code by
> dropping the array of intermediate size and base storage. Just store the
> values to their final locations. Accept the small complication of having
> to clear tdmr->pamt_4k_base in the error path, so that tdmr_do_pamt_func()
> will not try to operate on the TDMR struct when attempting to free it.
>
> Assisted-by: GitHub Copilot:claude-opus-4-6 Claude:claude-opus-4-7
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

Reviewed-by: Vishal Annapurve <vannapurve@google.com>

^ permalink raw reply

* Re: [RFC PATCH 10/15] x86/tdx: Move and rename Quote request structure
From: Adrian Hunter @ 2026-06-11 17:16 UTC (permalink / raw)
  To: Xu Yilun, kas, djbw, rick.p.edgecombe, x86, peter.fang
  Cc: linux-coco, linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan, xiaoyao.li
In-Reply-To: <20260522034128.3144354-11-yilun.xu@linux.intel.com>

On 22/05/2026 06:41, Xu Yilun wrote:
> From: Peter Fang <peter.fang@intel.com>
> 
> struct tdx_quote_buf is currently used only by the guest, but the Quote
> buffer format will also be needed by the host for in-kernel Quote
> generation. Move the definition to tdx.h so it can be shared by both.
> 
> Rename the struct to tdx_quote_req to better reflect its purpose.
> 
> Signed-off-by: Peter Fang <peter.fang@intel.com>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> ---
> -static int wait_for_quote_completion(struct tdx_quote_buf *quote_buf, u32 timeout)
> +static int wait_for_quote_completion(struct tdx_quote_req *quote_buf, u32 timeout)

It seems inconsistent to rename the struct but not the variable names.

>  {
>  	int i = 0;

Please note, the timeout condition in wait_for_quote_completion() is
broken, in that the final value of i on timeout is timeout + 1, not
timeout. Since you are in the same area, that needs fixing too.
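
To illustrate the off-by-one, here is a userspace model of that kind of loop.
It is only a sketch: the loop body is not quoted in this patch, so the shape
of wait_buggy() (a post-increment in the loop condition followed by an
"i == timeout" test) is an assumption, and in_flight_fn, wait_buggy(),
wait_fixed() and the two stub callbacks are hypothetical names.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for checking whether the Quote is still in flight. */
typedef bool (*in_flight_fn)(void *ctx);

static bool always_in_flight(void *ctx) { (void)ctx; return true; }
static bool never_in_flight(void *ctx) { (void)ctx; return false; }

/* Assumed shape of the broken pattern: post-increment in the condition. */
static int wait_buggy(in_flight_fn in_flight, void *ctx, uint32_t timeout)
{
	uint32_t i = 0;

	while (in_flight(ctx) && i++ < timeout)
		;	/* the real code would sleep here */

	/* On timeout i is timeout + 1, so the timeout is never reported. */
	return (i == timeout) ? -ETIMEDOUT : 0;
}

/* One possible fix: decide on the status itself, not on the counter. */
static int wait_fixed(in_flight_fn in_flight, void *ctx, uint32_t timeout)
{
	uint32_t i;

	for (i = 0; i < timeout; i++) {
		if (!in_flight(ctx))
			return 0;
		/* the real code would sleep here */
	}

	return in_flight(ctx) ? -ETIMEDOUT : 0;
}
```

In this model wait_buggy() returns 0 for a request that never completes,
while wait_fixed() returns -ETIMEDOUT.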

>  
> @@ -269,7 +250,7 @@ static int wait_for_quote_completion(struct tdx_quote_buf *quote_buf, u32 timeou
>  static int tdx_report_new_locked(struct tsm_report *report, void *data)
>  {
>  	u8 *buf;
> -	struct tdx_quote_buf *quote_buf = quote_data;
> +	struct tdx_quote_req *quote_buf = quote_data;
>  	struct tsm_report_desc *desc = &report->desc;
>  	u32 out_len;
>  	int ret;


^ permalink raw reply

* Re: [RFC PATCH 09/15] x86/virt/tdx: Add interface to generate a Quote
From: Adrian Hunter @ 2026-06-11 17:15 UTC (permalink / raw)
  To: Xu Yilun, kas, djbw, rick.p.edgecombe, x86, peter.fang
  Cc: linux-coco, linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan, xiaoyao.li
In-Reply-To: <20260522034128.3144354-10-yilun.xu@linux.intel.com>

On 22/05/2026 06:41, Xu Yilun wrote:
> From: Peter Fang <peter.fang@intel.com>
> 
> Use the TDX Quoting extension's TDH.QUOTE.GET SEAMCALL to generate a
> Quote. Since the interface is shared across all KVM instances,
> serialize access to the SEAMCALL buffer with a mutex.

Isn't the concurrency configurable, so supporting only 1 instance
is a decision of the software implementation, not a TDX limitation?

> +static u64 tdx_quote_get(struct tdx_td *td, u64 in_data_pa, u64 in_data_len,
> +			 u64 hpa_list_pa, u64 total_len, u64 *quote_len)
> +{
> +	struct tdx_module_args args = {
> +		.rcx = tdx_tdr_pa(td),
> +		/* Don't bother specifying the quote id */

The comment needs to explain why.

> +		.rdx = QUOTE_ID_MASK & (u64)-1,
> +		.r8 = in_data_pa,
> +		.r9 = in_data_len,
> +		.r10 = hpa_list_pa,
> +		.r11 = total_len,
> +	};
> +	u64 r;
> +
> +	do {
> +		r = seamcall_ret(TDH_QUOTE_GET, &args);
> +	} while (r == TDX_INTERRUPTED_RESUMABLE);
> +
> +	*quote_len = args.rcx;
> +
> +	return r;
> +}

...

> +	r = tdx_quote_get(td, quote_data.hpa_list[0], (u64)in_data_len,
> +			  quote_data.hpa_list_pa, quote_data.buf_len, &out_len);
> +	if (r || !out_len || out_len > quote_data.buf_len)

Would r != TDX_SUCCESS be more consistent?

> +		goto out;

^ permalink raw reply

* Re: [PATCH v7 5/6] firmware: smccc: arm-cca-guest: Bind the TSM provider to an SMCCC device
From: Suzuki K Poulose @ 2026-06-11 17:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm), linux-coco, linux-arm-kernel,
	linux-kernel
  Cc: Catalin Marinas, Greg KH, Jeremy Linton, Jonathan Cameron,
	Lorenzo Pieralisi, Mark Rutland, Sudeep Holla, Will Deacon,
	Steven Price, Andre Przywara
In-Reply-To: <20260611130429.295516-6-aneesh.kumar@kernel.org>

On 11/06/2026 14:04, Aneesh Kumar K.V (Arm) wrote:
> The Arm CCA guest TSM provider currently binds through the arm-cca-dev
> platform device. Like arm-smccc-trng, this device is not an independent
> platform resource; it is a software representation of the RSI firmware
> service discovered through SMCCC.
> 
> Move RSI discovery into the SMCCC firmware driver. When the SMCCC conduit
> is SMC and if RSI ABI version call is supported, create an arm-rsi-dev
> SMCCC device. Convert the Arm CCA guest TSM provider to an SMCCC driver so
> it binds to that discovered RSI service and keeps module autoloading
> through the SMCCC device id table.
> 
> Keep the old arm-cca-dev platform-device registration for now. Userspace
> has used that device as a Realm-guest indicator, so removing it is left to
> a follow-up patch that adds a replacement sysfs ABI.
> 
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
>   arch/arm64/include/asm/rsi.h              |  2 -
>   arch/arm64/kernel/rsi.c                   |  2 +-
>   drivers/firmware/smccc/smccc.c            |  7 +++
>   drivers/virt/coco/arm-cca-guest/Kconfig   |  1 +
>   drivers/virt/coco/arm-cca-guest/arm-cca.c | 56 +++++++++++------------
>   include/linux/arm-smccc-rsi.h             |  2 +
>   6 files changed, 39 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/rsi.h b/arch/arm64/include/asm/rsi.h
> index 88b50d660e85..5f9c8623183d 100644
> --- a/arch/arm64/include/asm/rsi.h
> +++ b/arch/arm64/include/asm/rsi.h
> @@ -10,8 +10,6 @@
>   #include <linux/jump_label.h>
>   #include <asm/rsi_cmds.h>
>   
> -#define RSI_PDEV_NAME "arm-cca-dev"
> -
>   DECLARE_STATIC_KEY_FALSE(rsi_present);
>   
>   void __init arm64_rsi_init(void);
> diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
> index 92160f2e57ff..da440f71bb64 100644
> --- a/arch/arm64/kernel/rsi.c
> +++ b/arch/arm64/kernel/rsi.c
> @@ -161,7 +161,7 @@ void __init arm64_rsi_init(void)
>   }
>   
>   static struct platform_device rsi_dev = {
> -	.name = RSI_PDEV_NAME,
> +	.name = "arm-cca-dev",
>   	.id = PLATFORM_DEVID_NONE
>   };
>   
> diff --git a/drivers/firmware/smccc/smccc.c b/drivers/firmware/smccc/smccc.c
> index a47696f3a5de..7127af3dbe5c 100644
> --- a/drivers/firmware/smccc/smccc.c
> +++ b/drivers/firmware/smccc/smccc.c
> @@ -10,6 +10,7 @@
>   #include <linux/arm-smccc.h>
>   #include <linux/kernel.h>
>   #include <linux/arm-smccc-bus.h>
> +#include <linux/arm-smccc-rsi.h>
>   
>   #include <asm/archrandom.h>
>   
> @@ -94,6 +95,12 @@ static const struct smccc_device_info smccc_devices[] __initconst = {
>   		.requires_smc   = false,
>   		.device_name    = "arm-smccc-trng",
>   	},
> +
> +	{
> +		.func_id        = SMC_RSI_ABI_VERSION,
> +		.requires_smc   = true,
> +		.device_name    = RSI_DEV_NAME,
> +	},
>   };
>   
>   static bool __init smccc_probe_smccc_device(const struct smccc_device_info *smccc_dev)
> diff --git a/drivers/virt/coco/arm-cca-guest/Kconfig b/drivers/virt/coco/arm-cca-guest/Kconfig
> index 3f0f013f03f1..ad7538750c5a 100644
> --- a/drivers/virt/coco/arm-cca-guest/Kconfig
> +++ b/drivers/virt/coco/arm-cca-guest/Kconfig
> @@ -1,6 +1,7 @@
>   config ARM_CCA_GUEST
>   	tristate "Arm CCA Guest driver"
>   	depends on ARM64
> +	depends on HAVE_ARM_SMCCC_DISCOVERY
>   	select TSM_REPORTS
>   	help
>   	  The driver provides userspace interface to request and
> diff --git a/drivers/virt/coco/arm-cca-guest/arm-cca.c b/drivers/virt/coco/arm-cca-guest/arm-cca.c
> index 0bbd1fa53ee4..4f9289ccf498 100644
> --- a/drivers/virt/coco/arm-cca-guest/arm-cca.c
> +++ b/drivers/virt/coco/arm-cca-guest/arm-cca.c
> @@ -4,6 +4,7 @@
>    */
>   
>   #include <linux/arm-smccc.h>
> +#include <linux/arm-smccc-bus.h>
>   #include <linux/cc_platform.h>
>   #include <linux/kernel.h>
>   #include <linux/mod_devicetable.h>
> @@ -189,16 +190,12 @@ static const struct tsm_report_ops arm_cca_tsm_report_ops = {
>   	.report_new = arm_cca_report_new,
>   };
>   
> -/**
> - * arm_cca_guest_init - Register with the Trusted Security Module (TSM)
> - * interface.
> - *
> - * Return:
> - * * %0        - Registered successfully with the TSM interface.
> - * * %-ENODEV  - The execution context is not an Arm Realm.
> - * * %-EBUSY   - Already registered.
> - */
> -static int __init arm_cca_guest_init(void)
> +static void unregister_cca_tsm_report(void *data)
> +{
> +	tsm_report_unregister(&arm_cca_tsm_report_ops);
> +}
> +
> +static int cca_tsm_probe(struct arm_smccc_device *sdev)
>   {
>   	int ret;
>   
> @@ -206,30 +203,33 @@ static int __init arm_cca_guest_init(void)
>   		return -ENODEV;
>   
>   	ret = tsm_report_register(&arm_cca_tsm_report_ops, NULL);
> -	if (ret < 0)
> -		pr_err("Error %d registering with TSM\n", ret);
> +	if (ret < 0) {
> +		dev_err_probe(&sdev->dev, ret, "Error registering with TSM\n");
> +		return ret;
> +	}
>   
> -	return ret;
> -}
> -module_init(arm_cca_guest_init);
> +	ret = devm_add_action_or_reset(&sdev->dev, unregister_cca_tsm_report,
> +				       NULL);
> +	if (ret < 0) {
> +		dev_err_probe(&sdev->dev, ret, "Error registering devm action\n");
> +		return ret;
> +	}
>   
> -/**
> - * arm_cca_guest_exit - unregister with the Trusted Security Module (TSM)
> - * interface.
> - */
> -static void __exit arm_cca_guest_exit(void)
> -{
> -	tsm_report_unregister(&arm_cca_tsm_report_ops);
> +	return 0;
>   }
> -module_exit(arm_cca_guest_exit);
>   
> -/* modalias, so userspace can autoload this module when RSI is available */
> -static const struct platform_device_id arm_cca_match[] __maybe_unused = {
> -	{ RSI_PDEV_NAME, 0},
> -	{ }
> +static const struct arm_smccc_device_id cca_tsm_id_table[] = {
> +	{ .name = RSI_DEV_NAME },
> +	{}
>   };
> +MODULE_DEVICE_TABLE(arm_smccc, cca_tsm_id_table);
>   
> -MODULE_DEVICE_TABLE(platform, arm_cca_match);
> +static struct arm_smccc_driver cca_tsm_driver = {
> +	.name = KBUILD_MODNAME,
> +	.probe = cca_tsm_probe,
> +	.id_table = cca_tsm_id_table,
> +};
> +module_arm_smccc_driver(cca_tsm_driver);
>   MODULE_AUTHOR("Sami Mujawar <sami.mujawar@arm.com>");
>   MODULE_DESCRIPTION("Arm CCA Guest TSM Driver");
>   MODULE_LICENSE("GPL");
> diff --git a/include/linux/arm-smccc-rsi.h b/include/linux/arm-smccc-rsi.h
> index fddb77986f70..ae663aa8fd7f 100644
> --- a/include/linux/arm-smccc-rsi.h
> +++ b/include/linux/arm-smccc-rsi.h
> @@ -8,6 +8,8 @@
>   
>   #include <linux/arm-smccc.h>
>   
> +#define RSI_DEV_NAME "arm-rsi-dev"

Should this be here? It is not part of the SMCCC RSI standard, but
a Linux thing. Maybe in drivers/firmware/../rsi.h?

Rest looks fine.

Suzuki


> +
>   /*
>    * This file describes the Realm Services Interface (RSI) Application Binary
>    * Interface (ABI) for SMC calls made from within the Realm to the RMM and


^ permalink raw reply

