Linux Confidential Computing Development
 help / color / mirror / Atom feed
* Re: [PATCH v8 6/7] KVM: SEV: Perform RMP optimizations on SNP guest shutdown
From: Dave Hansen @ 2026-06-18 21:42 UTC (permalink / raw)
  To: Ashish Kalra, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	peterz, thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, ackerleytng, jackyli, pgonda, rientjes, jacobhxu,
	xin, pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen,
	darwi, linux-kernel, linux-crypto, kvm, linux-coco
In-Reply-To: <25c3693a59c8f00796e84f1ffa668df6e3b734b5.1781419998.git.ashish.kalra@amd.com>

On 6/15/26 12:50, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Pages are converted from shared to private as SNP guests are launched.
> This destroys exisiting RMPOPT optimizations in the regions where
> pages are converted.
> 
> Conversely, guest pages are converted back to shared during SNP guest
> termination and their region may become eligible for RMPOPT
> optimization.

Oh, actually that would be good text for the *previous* patch too. You
might want to move some of it there.

^ permalink raw reply

* Re: [PATCH v8 5/7] x86/sev: Add interface to re-enable RMP optimizations.
From: Dave Hansen @ 2026-06-18 21:41 UTC (permalink / raw)
  To: Ashish Kalra, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	peterz, thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, ackerleytng, jackyli, pgonda, rientjes, jacobhxu,
	xin, pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen,
	darwi, linux-kernel, linux-crypto, kvm, linux-coco
In-Reply-To: <cdb8098074de8e150dcf534ab806e38744325a57.1781419998.git.ashish.kalra@amd.com>

On 6/15/26 12:49, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> RMPOPT table is a per-CPU table which indicates if 1GB regions of
> physical memory are entirely hypervisor-owned or not.
> 
> When performing host memory accesses in hypervisor mode as well as
> non-SNP guest mode, the processor may consult the RMPOPT table to
> potentially skip an RMP access and improve performance.
> 
> Events such as RMPUPDATE can clear RMP optimizations. Add an interface
> to re-enable those optimizations.

This doesn't really help me understand when or how this function might
be called.

	Normal guest evens like splitting and collapsing large pages can
	clear RMP optimizations. Without some intervention, all RMP
	optimizations would eventually be lost. Periodically re-optimize
	the system.

> The interface uses mod_delayed_work() instead of queue_delayed_work()
> so that the delay timer is reset on each call. This provides proper
> batching semantics: re-optimization runs 10 seconds after the *last*
> VM termination rather than after the first. mod_delayed_work() also
> re-queues work that is already in-flight, so a re-scan request
> during an active scan is not silently dropped.

This seems sane.

> +void snp_rmpopt_all_physmem(void)
> +{
> +	if (!cpu_feature_enabled(X86_FEATURE_RMPOPT) || !rmpopt_configured)
> +		return;
> +
> +	guard(mutex)(&rmpopt_wq_mutex);
> +
> +	if (!rmpopt_wq)
> +		return;
> +
> +	mod_delayed_work(rmpopt_wq, &rmpopt_delayed_work,
> +			 msecs_to_jiffies(RMPOPT_WORK_TIMEOUT));
> +}
> +EXPORT_SYMBOL_GPL(snp_rmpopt_all_physmem);

Does this need to be globally exported? Or can it be exported to a
single module namespace?

I'm close to being able to ack this, but it's still got a few too many
nits to ack.

^ permalink raw reply

* Re: [PATCH v8 3/7] crypto/ccp: Disable CPU hotplug while SNP is active
From: Dave Hansen @ 2026-06-18 21:35 UTC (permalink / raw)
  To: Ashish Kalra, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	peterz, thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, ackerleytng, jackyli, pgonda, rientjes, jacobhxu,
	xin, pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen,
	darwi, linux-kernel, linux-crypto, kvm, linux-coco
In-Reply-To: <1feccf6e2a56d949b30f403c0ca7949f580e5982.1781419998.git.ashish.kalra@amd.com>

On 6/15/26 12:49, Ashish Kalra wrote:
> +	/*
> +	 * Disable CPU hotplug while SNP is active.  Guard against stacking
> +	 * the disable count: the legacy SNP_SHUTDOWN_EX path clears
> +	 * snp_initialized without re-enabling hotplug, so this can run
> +	 * again while hotplug is already disabled.
> +	 */
> +	if (!snp_cpu_hotplug_disabled) {
> +		cpu_hotplug_disable();
> +		snp_cpu_hotplug_disabled = true;
> +	}

This seems like a hack, guys.

cpu_hotplug_disable() seems like more of a temporary lock than enforcing
basically permanent system state.

This seems like it would be better implemented by registering a CPU
hotplug callback and then refusing to offline if sev->snp_initialized is
set.

snp_setup_rmpopt() can be run any time, right? It doesn't need to be
after sev->snp_initialized=1.

^ permalink raw reply

* Re: [PATCH v8 2/7] x86/sev: Initialize RMPOPT configuration MSRs
From: Tom Lendacky @ 2026-06-18 21:08 UTC (permalink / raw)
  To: Ashish Kalra, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	peterz, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, ackerleytng, jackyli, pgonda, rientjes, jacobhxu,
	xin, pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen,
	darwi, linux-kernel, linux-crypto, kvm, linux-coco
In-Reply-To: <6a4d0ec9e037d91c0fdba9c525942ca288e1b1b2.1781419998.git.ashish.kalra@amd.com>

On 6/15/26 14:48, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> The new RMPOPT instruction helps manage per-CPU RMP optimization
> structures inside the CPU. It takes a 1GB-aligned physical address
> and either returns the status of the optimizations or tries to enable
> the optimizations.
> 
> Per-CPU RMPOPT tables support at most 2 TB of addressable memory for
> RMP optimizations.
> 
> Initialize the per-CPU RMPOPT table base to the starting physical
> address. This enables RMP optimization for up to 2 TB of system RAM on
> all CPUs.
> 
> Additionally, add support to setup and enable RMPOPT once SNP is
> enabled and initialized.
> 
> Suggested-by: Thomas Lendacky <thomas.lendacky@amd.com>
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  arch/x86/coco/core.c             |  2 +
>  arch/x86/include/asm/msr-index.h |  3 ++
>  arch/x86/include/asm/sev.h       |  4 ++
>  arch/x86/virt/svm/sev.c          | 70 ++++++++++++++++++++++++++++++++
>  drivers/crypto/ccp/sev-dev.c     |  3 ++
>  5 files changed, 82 insertions(+)
> 
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index 989ca9f72ba3..8c1393ddc5df 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -16,6 +16,7 @@
>  #include <asm/archrandom.h>
>  #include <asm/coco.h>
>  #include <asm/processor.h>
> +#include <asm/sev.h>
>  
>  enum cc_vendor cc_vendor __ro_after_init = CC_VENDOR_NONE;
>  SYM_PIC_ALIAS(cc_vendor);
> @@ -172,6 +173,7 @@ static void amd_cc_platform_clear(enum cc_attr attr)
>  	switch (attr) {
>  	case CC_ATTR_HOST_SEV_SNP:
>  		cc_flags.host_sev_snp = 0;
> +		snp_clear_rmpopt_configured();
>  		break;
>  	default:
>  		break;
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 86554de9a3f5..28540744f1eb 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -761,6 +761,9 @@
>  #define MSR_AMD64_SEG_RMP_ENABLED_BIT	0
>  #define MSR_AMD64_SEG_RMP_ENABLED	BIT_ULL(MSR_AMD64_SEG_RMP_ENABLED_BIT)
>  #define MSR_AMD64_RMP_SEGMENT_SHIFT(x)	(((x) & GENMASK_ULL(13, 8)) >> 8)
> +#define MSR_AMD64_RMPOPT_BASE		0xc0010139
> +#define MSR_AMD64_RMPOPT_ENABLE_BIT	0
> +#define MSR_AMD64_RMPOPT_ENABLE		BIT_ULL(MSR_AMD64_RMPOPT_ENABLE_BIT)
>  
>  #define MSR_SVSM_CAA			0xc001f000
>  
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 594cfa19cbd4..0d662221615a 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -662,6 +662,8 @@ static inline void snp_leak_pages(u64 pfn, unsigned int pages)
>  	__snp_leak_pages(pfn, pages, true);
>  }
>  int snp_prepare(void);
> +void snp_setup_rmpopt(void);
> +void snp_clear_rmpopt_configured(void);
>  void snp_shutdown(void);
>  #else
>  static inline bool snp_probe_rmptable_info(void) { return false; }
> @@ -680,6 +682,8 @@ static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
>  static inline void kdump_sev_callback(void) { }
>  static inline void snp_fixup_e820_tables(void) {}
>  static inline int snp_prepare(void) { return -ENODEV; }
> +static inline void snp_setup_rmpopt(void) {}
> +static inline void snp_clear_rmpopt_configured(void) {}
>  static inline void snp_shutdown(void) {}
>  #endif
>  
> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> index 8bcdce98f6dc..1b5c18408f0b 100644
> --- a/arch/x86/virt/svm/sev.c
> +++ b/arch/x86/virt/svm/sev.c
> @@ -124,6 +124,10 @@ static void *rmp_bookkeeping __ro_after_init;
>  
>  static u64 probed_rmp_base, probed_rmp_size;
>  
> +static cpumask_t rmpopt_cpumask;
> +static phys_addr_t rmpopt_pa_start;
> +static bool rmpopt_configured;

The usage of this isn't doesn't imply what the name says. How about
changing it to rmpopt_capable ?

> +
>  static LIST_HEAD(snp_leaked_pages_list);
>  static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
>  
> @@ -490,7 +494,12 @@ static bool __init setup_rmptable(void)
>  	if (rmp_cfg & MSR_AMD64_SEG_RMP_ENABLED) {
>  		if (!setup_segmented_rmptable())
>  			return false;
> +		rmpopt_configured = true;
>  	} else {
> +		/*
> +		 * RMPOPT requires a segmented RMP table, so leave
> +		 * rmpopt_configured clear on contiguous RMP systems.
> +		 */

I think this comment should be above where rmpopt_configured is set,
slightly changed to

	RMPOPT requires a segmented RMP, so indicate that the system
	is capable of configuring and running RMPOPT.

Thanks,
Tom
>  		if (!setup_contiguous_rmptable())
>  			return false;
>  	}
> @@ -555,6 +564,21 @@ int snp_prepare(void)
>  }
>  EXPORT_SYMBOL_FOR_MODULES(snp_prepare, "ccp");
>  
> +static void rmpopt_cleanup(void)
> +{
> +	int cpu;
> +
> +	cpus_read_lock();
> +
> +	for_each_cpu(cpu, &rmpopt_cpumask)
> +		WARN_ON_ONCE(wrmsrq_on_cpu(cpu, MSR_AMD64_RMPOPT_BASE, 0));
> +
> +	cpus_read_unlock();
> +
> +	cpumask_clear(&rmpopt_cpumask);
> +	rmpopt_pa_start = 0;
> +}
> +
>  void snp_shutdown(void)
>  {
>  	u64 syscfg;
> @@ -563,11 +587,57 @@ void snp_shutdown(void)
>  	if (syscfg & MSR_AMD64_SYSCFG_SNP_EN)
>  		return;
>  
> +	rmpopt_cleanup();
> +
>  	clear_rmp();
>  	on_each_cpu(mfd_reconfigure, NULL, 1);
>  }
>  EXPORT_SYMBOL_FOR_MODULES(snp_shutdown, "ccp");
>  
> +void snp_clear_rmpopt_configured(void)
> +{
> +	rmpopt_configured = false;
> +}
> +
> +void snp_setup_rmpopt(void)
> +{
> +	u64 rmpopt_base;
> +	int cpu;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_RMPOPT) || !rmpopt_configured)
> +		return;
> +
> +	cpus_read_lock();
> +
> +	/*
> +	 * The RMPOPT_BASE MSR is per-core, so only one thread per core needs
> +	 * to set up the RMPOPT_BASE MSR.
> +	 *
> +	 * Note: only online primary threads are included.  If a core's
> +	 * primary thread is offline, that core is not covered.  CPU hotplug
> +	 * is not currently supported with SNP enabled.
> +	 */
> +
> +	for_each_online_cpu(cpu)
> +		if (topology_is_primary_thread(cpu))
> +			cpumask_set_cpu(cpu, &rmpopt_cpumask);
> +
> +	rmpopt_pa_start = ALIGN_DOWN(PFN_PHYS(min_low_pfn), SZ_1G);
> +	rmpopt_base = rmpopt_pa_start | MSR_AMD64_RMPOPT_ENABLE;
> +
> +	/*
> +	 * Per-CPU RMPOPT tables support at most 2 TB of addressable memory
> +	 * for RMP optimizations. Initialize the per-CPU RMPOPT table base
> +	 * to the starting physical address to enable RMP optimizations for
> +	 * up to 2 TB of system RAM on all CPUs.
> +	 */
> +	for_each_cpu(cpu, &rmpopt_cpumask)
> +		WARN_ON_ONCE(wrmsrq_on_cpu(cpu, MSR_AMD64_RMPOPT_BASE, rmpopt_base));
> +
> +	cpus_read_unlock();
> +}
> +EXPORT_SYMBOL_FOR_MODULES(snp_setup_rmpopt, "ccp");
> +
>  /*
>   * Do the necessary preparations which are verified by the firmware as
>   * described in the SNP_INIT_EX firmware command description in the SNP
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 78f98aee7a66..217b6b19802e 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -1478,6 +1478,9 @@ static int __sev_snp_init_locked(int *error, unsigned int max_snp_asid)
>  	}
>  
>  	snp_hv_fixed_pages_state_update(sev, HV_FIXED);
> +
> +	snp_setup_rmpopt();
> +
>  	sev->snp_initialized = true;
>  	dev_dbg(sev->dev, "SEV-SNP firmware initialized, SEV-TIO is %s\n",
>  		data.tio_en ? "enabled" : "disabled");


^ permalink raw reply

* Re: [PATCH v8 7/7] x86/sev: Add debugfs support for RMPOPT
From: Borislav Petkov @ 2026-06-18 20:10 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: tglx, mingo, dave.hansen, x86, hpa, seanjc, peterz,
	thomas.lendacky, herbert, davem, ardb, pbonzini, aik,
	Michael.Roth, KPrateek.Nayak, Tycho.Andersen, Nathan.Fontenot,
	ackerleytng, jackyli, pgonda, rientjes, jacobhxu, xin,
	pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen, darwi,
	linux-kernel, linux-crypto, kvm, linux-coco
In-Reply-To: <5849645c-f701-4768-8cdf-1f9032e3226f@amd.com>

On Thu, Jun 18, 2026 at 02:57:45PM -0500, Kalra, Ashish wrote:
> Maybe i can add a line to this patch's commit message stating it's a debug-only interface
> with no stability guarantee.

Sounds to me like you didn't really read that article.

> We have to provide some method/interface for users to verify if RMP optimizations
> are enabled for a GB range of memory.

Sounds to me like this wants to be a facility which is present in the kernel
and it is going to be an ABI.

I am unclear on the real use case but I'm open to being persuaded otherwise.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [PATCH v8 7/7] x86/sev: Add debugfs support for RMPOPT
From: Kalra, Ashish @ 2026-06-18 19:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tglx, mingo, dave.hansen, x86, hpa, seanjc, peterz,
	thomas.lendacky, herbert, davem, ardb, pbonzini, aik,
	Michael.Roth, KPrateek.Nayak, Tycho.Andersen, Nathan.Fontenot,
	ackerleytng, jackyli, pgonda, rientjes, jacobhxu, xin,
	pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen, darwi,
	linux-kernel, linux-crypto, kvm, linux-coco
In-Reply-To: <20260618180814.GCajQ0Dv0CoRMJxbP0@fat_crate.local>


On 6/18/2026 1:08 PM, Borislav Petkov wrote:
> On Mon, Jun 15, 2026 at 07:50:56PM +0000, Ashish Kalra wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> Add a debugfs interface to report per-CPU RMPOPT status across all
>> system RAM.
>>
>> To dump the per-CPU RMPOPT status for all system RAM:
>>
>> /sys/kernel/debug/rmpopt# cat rmpopt-table
>>
>> Memory @  0GB: CPU(s): none
>> Memory @  1GB: CPU(s): none
>> Memory @  2GB: CPU(s): 0-1023
>> Memory @  3GB: CPU(s): 0-1023
>> Memory @  4GB: CPU(s): none
>> Memory @  5GB: CPU(s): 0-1023
>> Memory @  6GB: CPU(s): 0-1023
>> Memory @  7GB: CPU(s): 0-1023
>> ...
>> Memory @1025GB: CPU(s): 0-1023
>> Memory @1026GB: CPU(s): 0-1023
>> Memory @1027GB: CPU(s): 0-1023
>> Memory @1028GB: CPU(s): 0-1023
>> Memory @1029GB: CPU(s): 0-1023
>> Memory @1030GB: CPU(s): 0-1023
>> Memory @1031GB: CPU(s): 0-1023
>> Memory @1032GB: CPU(s): 0-1023
>> Memory @1033GB: CPU(s): 0-1023
>> Memory @1034GB: CPU(s): 0-1023
>> Memory @1035GB: CPU(s): 0-1023
>> Memory @1036GB: CPU(s): 0-1023
>> Memory @1037GB: CPU(s): 0-1023
>> Memory @1038GB: CPU(s): none
>>
>> Suggested-by: Thomas Lendacky <thomas.lendacky@amd.com>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> ---
>>  arch/x86/virt/svm/sev.c | 128 ++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 128 insertions(+)
> 
> https://lwn.net/Articles/309298/
> 

Since the RMPOPT file is a diagnostic (verify the optimization took effect), debugfs is
arguably the right home for it and we are not claiming it to be an API (there is no
Documentation/ABI entry for it) and we are not presenting it as something tools should
depend on, it is a self-contained diagnostic/debug interface.

Maybe i can add a line to this patch's commit message stating it's a debug-only interface
with no stability guarantee.

We have to provide some method/interface for users to verify if RMP optimizations
are enabled for a GB range of memory.

Thanks,
Ashish

^ permalink raw reply

* Re: [PATCH v8 2/7] x86/sev: Initialize RMPOPT configuration MSRs
From: Kalra, Ashish @ 2026-06-18 18:23 UTC (permalink / raw)
  To: K Prateek Nayak, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	peterz, thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, Tycho.Andersen, Nathan.Fontenot,
	ackerleytng, jackyli, pgonda, rientjes, jacobhxu, xin,
	pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen, darwi,
	linux-kernel, linux-crypto, kvm, linux-coco
In-Reply-To: <fb2f1105-3bef-4197-bccd-865c013ce712@amd.com>

Hello Prateek,

On 6/16/2026 1:03 AM, K Prateek Nayak wrote:
> Hello Ashish,
> 
> On 6/16/2026 1:18 AM, Ashish Kalra wrote:
>> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
>> index 8bcdce98f6dc..1b5c18408f0b 100644
>> --- a/arch/x86/virt/svm/sev.c
>> +++ b/arch/x86/virt/svm/sev.c
>> @@ -124,6 +124,10 @@ static void *rmp_bookkeeping __ro_after_init;
>>  
>>  static u64 probed_rmp_base, probed_rmp_size;
>>  
>> +static cpumask_t rmpopt_cpumask;
> 
> nit.
> 
> I believe you can use cpumask_var_t here and do a zalloc_cpumask_var()
> during snp_setup_rmpopt(). That way !X86_FEATURE_RMPOPT configs don't
> have to needlessly waste space to keep a redundant cpumask around.
> 
> Same comment for rmpopt_report_cpumask in Patch 7 which can be
> allocated dynamically during rmpopt_debugfs_setup().
> 

Yes.

>> +static phys_addr_t rmpopt_pa_start;
>> +static bool rmpopt_configured;
>> +
>>  static LIST_HEAD(snp_leaked_pages_list);
>>  static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
>>  
>> @@ -490,7 +494,12 @@ static bool __init setup_rmptable(void)
>>  	if (rmp_cfg & MSR_AMD64_SEG_RMP_ENABLED) {
>>  		if (!setup_segmented_rmptable())
>>  			return false;
>> +		rmpopt_configured = true;
>>  	} else {
>> +		/*
>> +		 * RMPOPT requires a segmented RMP table, so leave
>> +		 * rmpopt_configured clear on contiguous RMP systems.
>> +		 */
>>  		if (!setup_contiguous_rmptable())
>>  			return false;
>>  	}
>> @@ -555,6 +564,21 @@ int snp_prepare(void)
>>  }
>>  EXPORT_SYMBOL_FOR_MODULES(snp_prepare, "ccp");
>>  
>> +static void rmpopt_cleanup(void)
>> +{
>> +	int cpu;
>> +
>> +	cpus_read_lock();
> 
> nit.
> 
> You can use guard(cpus_read_lock)() unless there is a complicated
> locking pattern where you need to drop and re-acquire the read lock.

But if i use guard(cpus_read_lock)(), cpus_read_lock stays held across as it is
function-scope, so it will be still held for code following the wrmsrq_on_cpu(),
which is harmless but still changes code behavior.

Probably, the other option is to use scoped_guard form ? 

Thanks,
Ashish

> 
>> +
>> +	for_each_cpu(cpu, &rmpopt_cpumask)
>> +		WARN_ON_ONCE(wrmsrq_on_cpu(cpu, MSR_AMD64_RMPOPT_BASE, 0));
>> +
>> +	cpus_read_unlock();
>> +
>> +	cpumask_clear(&rmpopt_cpumask);
>> +	rmpopt_pa_start = 0;
>> +}
>> +
>>  void snp_shutdown(void)
>>  {
>>  	u64 syscfg;
> 

^ permalink raw reply

* Re: [PATCH v8 7/7] x86/sev: Add debugfs support for RMPOPT
From: Borislav Petkov @ 2026-06-18 18:08 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: tglx, mingo, dave.hansen, x86, hpa, seanjc, peterz,
	thomas.lendacky, herbert, davem, ardb, pbonzini, aik,
	Michael.Roth, KPrateek.Nayak, Tycho.Andersen, Nathan.Fontenot,
	ackerleytng, jackyli, pgonda, rientjes, jacobhxu, xin,
	pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen, darwi,
	linux-kernel, linux-crypto, kvm, linux-coco
In-Reply-To: <cc9aa9b6cfa2ce826f2ad53f8a13d3b7bf0790b6.1781419998.git.ashish.kalra@amd.com>

On Mon, Jun 15, 2026 at 07:50:56PM +0000, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Add a debugfs interface to report per-CPU RMPOPT status across all
> system RAM.
> 
> To dump the per-CPU RMPOPT status for all system RAM:
> 
> /sys/kernel/debug/rmpopt# cat rmpopt-table
> 
> Memory @  0GB: CPU(s): none
> Memory @  1GB: CPU(s): none
> Memory @  2GB: CPU(s): 0-1023
> Memory @  3GB: CPU(s): 0-1023
> Memory @  4GB: CPU(s): none
> Memory @  5GB: CPU(s): 0-1023
> Memory @  6GB: CPU(s): 0-1023
> Memory @  7GB: CPU(s): 0-1023
> ...
> Memory @1025GB: CPU(s): 0-1023
> Memory @1026GB: CPU(s): 0-1023
> Memory @1027GB: CPU(s): 0-1023
> Memory @1028GB: CPU(s): 0-1023
> Memory @1029GB: CPU(s): 0-1023
> Memory @1030GB: CPU(s): 0-1023
> Memory @1031GB: CPU(s): 0-1023
> Memory @1032GB: CPU(s): 0-1023
> Memory @1033GB: CPU(s): 0-1023
> Memory @1034GB: CPU(s): 0-1023
> Memory @1035GB: CPU(s): 0-1023
> Memory @1036GB: CPU(s): 0-1023
> Memory @1037GB: CPU(s): 0-1023
> Memory @1038GB: CPU(s): none
> 
> Suggested-by: Thomas Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  arch/x86/virt/svm/sev.c | 128 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 128 insertions(+)

https://lwn.net/Articles/309298/

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [PATCH v6 00/20] dma-mapping: Use DMA_ATTR_CC_SHARED through direct, pool and swiotlb paths
From: Jason Gunthorpe @ 2026-06-18 15:37 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Alexey Kardashevskiy, Catalin Marinas, iommu, linux-arm-kernel,
	linux-kernel, linux-coco, Robin Murphy, Marek Szyprowski,
	Will Deacon, Marc Zyngier, Steven Price, Suzuki K Poulose,
	Jiri Pirko, Mostafa Saleh, Petr Tesarik, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <yq5aqzm4dz25.fsf@kernel.org>

On Thu, Jun 18, 2026 at 09:37:22AM +0100, Aneesh Kumar K.V wrote:
> Alexey Kardashevskiy <aik@amd.com> writes:
> 
> > On 10/6/26 00:47, Jason Gunthorpe wrote:
> >> On Tue, Jun 09, 2026 at 02:43:08PM +0100, Catalin Marinas wrote:
> >>> On Thu, Jun 04, 2026 at 02:09:39PM +0530, Aneesh Kumar K.V (Arm) wrote:
> >>>> This series propagates DMA_ATTR_CC_SHARED through the dma-direct,
> >>>> dma-pool, and swiotlb paths so that encrypted and decrypted DMA buffers
> >>>> are handled consistently.
> >>>>
> >>>> Today, the direct DMA path mostly relies on force_dma_unencrypted() for
> >>>> shared/decrypted buffer handling. This series consolidates the
> >>>> force_dma_unencrypted() checks in the top-level functions and ensures
> >>>> that the remaining DMA interfaces use DMA attributes to make the correct
> >>>> decisions.
> >>>
> >>> Please check Sashiko's reports, it has some good points:
> >>>
> >>> https://sashiko.dev/#/patchset/20260604083959.1265923-1-aneesh.kumar@kernel.org
> >>>
> >>> I think the main one is the swiotlb_tbl_map_single() changes which break
> >>> AMD SME host support. There cc_platform_has(CC_ATTR_MEM_ENCRYPT) is true
> >>> but force_dma_unencrypted() is false. Normally you'd not end up on this
> >>> path but you can have swiotlb=force.
> >> 
> >> IMHO that's an AMD issue, not with the design of this series..
> >> 
> >> The series is right, a device that is !force_dma_decrypted() must be
> >> considerd to be a trusted device and we must never place any DMA
> >> mappings for a trusted device into shared memory.
> >
> > swiotlb=force forces swiotlb, not decryption.

If force_dma_decrypted() == true then swiotlb must allocate from a
decrypted memory pool. It is right there in the name!

The hypervisor environment should *never* set force_dma_decrypted()
because all devices can access all hypervisor memory, up to their IOVA
limits.

> > So when I try "mem_encrypt=on iommu=pt swiotlb=force" with this
> > patchset, it fails to boot. But it boots with a hack like this:

On the host side I expect this to cause swiotlb to allocate encrypted
memory and bounce to it.

>  		u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
>  		u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
>  						dev->bus_dma_limit);
> +		/*
> +		 * With memory encryption enabled, SWIOTLB is marked decrypted.
> +		 * If SWIOTLB bouncing is forced, treat the device as requiring
> +		 * decrypted DMA.
> +		 */

And this is more insane logic. The right fix is to allocate the
swiotlb bounce from the *encrypted* pools when running on the
hypervisor which requires undoing this abuse of force_dma_decrypted().

Jason

^ permalink raw reply

* Re: [PATCH v2 02/17] x86/virt/tdx: Configure add-on features on TDX module init and update
From: Dave Hansen @ 2026-06-18 15:04 UTC (permalink / raw)
  To: Xu Yilun, x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, xiaoyao.li, sohil.mehta,
	adrian.hunter, kishen.maloor, tony.lindgren, peter.fang, baolu.lu,
	zhenzhong.duan, dave.hansen, seanjc
In-Reply-To: <20260618081355.3253581-3-yilun.xu@linux.intel.com>

On 6/18/26 01:13, Xu Yilun wrote:
>  int tdx_module_run_update(void)
>  {
> +	u64 seamcall_fn = TDH_SYS_UPDATE_V0;
>  	struct tdx_module_args args = {};
>  	int ret;
>  
> -	ret = seamcall_prerr(TDH_SYS_UPDATE, &args);
> +	if (tdx_addon_feature0) {
> +		args.r9 = tdx_addon_feature0;
> +		seamcall_fn = TDH_SYS_UPDATE;
> +	}

Heh, and it falls apart into craziness immediately. See how it
immediately loses the logical information that there's a version 1 and a
version 0? The "1" isn't even visible. It's hidden in "TDH_SYS_UPDATE".

Isn't this a million times more sane?

	struct tdx_module_args args = {};
	u64 version;
  	int ret;

	if (tdx_addon_feature0) {
		args.r9 = tdx_addon_feature0;
		version = 1;
	} else {
		version = 0;
	}

	ret = seamcall_prerr(TDH_SYS_UPDATE, version, &args);


There's also zero stopping us from putting version in args:

	struct tdx_module_args args = {};
  	int ret;

	if (tdx_addon_feature0) {
		args.r9 = tdx_addon_feature0;
		args.version = 1;
	}

	ret = seamcall_prerr(TDH_SYS_UPDATE, &args);

Eh?

That gives args.version==0 in all the normal cases which just happens to
be the exact behavior we want. It also avoids having to plumb version
through all the seamcall*() wrappers.

But this is *exactly* the kind of thing that shouldn't be a part of an
attestation patch series. This could very much have been a separate
discussion and happened a month or a year ago. But now it is blocking
this DICE thing from getting done <grumble>.

^ permalink raw reply

* Re: [PATCH v2 01/17] x86/virt/tdx: Embed version info in SEAMCALL leaf function definitions
From: Dave Hansen @ 2026-06-18 14:45 UTC (permalink / raw)
  To: Xu Yilun, x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, xiaoyao.li, sohil.mehta,
	adrian.hunter, kishen.maloor, tony.lindgren, peter.fang, baolu.lu,
	zhenzhong.duan, dave.hansen, seanjc
In-Reply-To: <20260618081355.3253581-2-yilun.xu@linux.intel.com>

On 6/18/26 01:13, Xu Yilun wrote:
> Embed version information in SEAMCALL leaf function definitions rather
> than let the caller open code them. For now, only TDH.VP.INIT is
> involved.

This is jumping the gun a bit in the changelog.

What is a SEAMCALL leaf function?

How does the version fit in?

> Don't bother the caller to choose the SEAMCALL version if unnecessary.

I think I see what you are trying to say here but it's more than that.

The question is whether there should be a base seamcall() API that takes
an explicit version or whether the version should be passed in by callers.

One wrinkle is that the naming of all of these things is around
"function", "func" and "fn":

u64 __seamcall(u64 fn, struct tdx_module_args *args);

A "function" is TDH.SYS.INIT or TDH.SYS.INFO, not 'TDH.SYS.INFO v123'.

But the low-level calls could be:

	u64 __seamcall(u64 fn, u64 version, ...);
	
or

	u64 __seamcall(u64 fn, ...);

Where 'fn' encodes the function *and* version.

> The old TDX modules that only support TDH.VP.INIT v0 are all deprecated,
> so only provide the latest (v1) definition.

No, this isn't how this is going to work.

What do we *NEED* from v1? Why churn the code if we don't *NEED*
something from v1 and can live with v0? It has *ZERO* to do with the TDX
module being deprecated or whatever.

Linux stays on the old interface until we need a new interface. We are
*not* going to bump version numbers just because.


>  /*
>   * TDX module SEAMCALL leaf functions
>   */
> @@ -31,7 +44,7 @@
>  #define TDH_VP_CREATE			10
>  #define TDH_MNG_KEY_FREEID		20
>  #define TDH_MNG_INIT			21
> -#define TDH_VP_INIT			22
> +#define TDH_VP_INIT			SEAMCALL_LEAF_VER(22, 1)
>  #define TDH_PHYMEM_PAGE_RDMD		24
>  #define TDH_VP_RD			26
>  #define TDH_PHYMEM_PAGE_RECLAIM		28
> @@ -50,14 +63,6 @@
>  #define TDH_SYS_UPDATE			53
>  #define TDH_SYS_DISABLE			69

That is unreadable and patterns can't be seen. This is better:

#define TDH_MNG_INIT			SEAMCALL_LEAF_VER(21, 0)
#define TDH_VP_INIT			SEAMCALL_LEAF_VER(22, 1)
#define TDH_PHYMEM_PAGE_RDMD		SEAMCALL_LEAF_VER(24, 0)

> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -1903,8 +1903,7 @@ u64 tdh_vp_init(struct tdx_vp *vp, u64 initial_rcx, u32 x2apicid)
>  		.r8 = x2apicid,
>  	};
>  
> -	/* apicid requires version == 1. */
> -	return seamcall(TDH_VP_INIT | (1ULL << TDX_VERSION_SHIFT), &args);
> +	return seamcall(TDH_VP_INIT, &args);
>  }
>  EXPORT_SYMBOL_FOR_KVM(tdh_vp_init);

But that whole scheme falls apart the first time the kernel needs
functionality from v2. You'll need:

#define TDH_VP_INIT_V0			SEAMCALL_LEAF_VER(22, 0)
#define TDH_VP_INIT_V1			SEAMCALL_LEAF_VER(22, 1)

and then the calls will do:

	if (foo)
		return seamcall(TDH_VP_INIT_V0, &args);
	else
		return seamcall(TDH_VP_INIT_V1, &args);

So this 100% goes down the road of needing #defines for *EACH* version.
That's the real implication here and the real choice.

That said, I don't particularly like:

	if (foo)
		return seamcall(TDH_VP_INIT, 0, &args);
	else
		return seamcall(TDH_VP_INIT, 1, &args);

all that much either because of the seemingly magic numbers.

The whole seamcall RAX thing is one step too clever. I think Linux did
the right thing:

5	common	open				sys_open
288	common	openat				sys_openat
437	common	openat2				sys_openat2

New ABI gets a new base number. No need for the other end of the ABI to
know that 288 is arguably a later version of 5.

Ugh. But why is this patch even in here in the first place? Why is this
even *ASSOCIATED* with DICE-based attestation? Isn't this completely
orthogonal?

^ permalink raw reply

* [PATCH v2 17/17] KVM: TDX: Support event-notify interrupts only with userspace Quoting
From: Xu Yilun @ 2026-06-18  8:13 UTC (permalink / raw)
  To: x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, yilun.xu, xiaoyao.li,
	sohil.mehta, adrian.hunter, kishen.maloor, tony.lindgren,
	peter.fang, baolu.lu, zhenzhong.duan, dave.hansen, dave.hansen,
	seanjc
In-Reply-To: <20260618081355.3253581-1-yilun.xu@linux.intel.com>

From: Peter Fang <peter.fang@intel.com>

Tie userspace SetupEventNotifyInterrupt support to userspace Quote
generation. Delivering event-notify interrupts via userspace breaks if
KVM never exits to userspace in the first place.

This is an optional capability to notify the guest when Quoting has
completed. No known guest currently uses it, so defer adding in-kernel
support for now. The Linux TDX guest relies on polling only.

Signed-off-by: Peter Fang <peter.fang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 arch/x86/kvm/vmx/tdx.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 20558b0185b6..25146da3933f 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -185,7 +185,7 @@ static void td_init_cpuid_entry2(struct kvm_cpuid_entry2 *entry, unsigned char i
 	tdx_clear_unsupported_cpuid(entry);
 }
 
-#define TDVMCALLINFO_SETUP_EVENT_NOTIFY_INTERRUPT	BIT(1)
+#define TDVMCALLINFO_SETUP_EVENT_NOTIFY_INTERRUPT	BIT_ULL(1)
 
 static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf,
 			     struct kvm_tdx_capabilities *caps)
@@ -202,8 +202,15 @@ static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf,
 
 	caps->cpuid.nent = td_conf->num_cpuid_config;
 
-	caps->user_tdvmcallinfo_1_r11 =
-		TDVMCALLINFO_SETUP_EVENT_NOTIFY_INTERRUPT;
+	/*
+	 * Don't advertise userspace event-notify interrupt support if TDX
+	 * quoting service is enabled, as quote generation will be handled
+	 * entirely in the kernel. Support in the kernel can be added later.
+	 */
+	if (!tdx_quote_enabled()) {
+		caps->user_tdvmcallinfo_1_r11 |=
+			TDVMCALLINFO_SETUP_EVENT_NOTIFY_INTERRUPT;
+	}
 
 	for (i = 0; i < td_conf->num_cpuid_config; i++)
 		td_init_cpuid_entry2(&caps->cpuid.entries[i], i);
@@ -1684,9 +1691,16 @@ static int tdx_get_quote(struct kvm_vcpu *vcpu)
 
 static int tdx_setup_event_notify_interrupt(struct kvm_vcpu *vcpu)
 {
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
 	u64 vector = tdx->vp_enter_args.r12;
 
+	/* See comment in init_kvm_tdx_caps() */
+	if (kvm_tdx->get_quote_in_kernel) {
+		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED);
+		return 1;
+	}
+
 	if (vector < 32 || vector > 255) {
 		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
 		return 1;
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 16/17] KVM: TDX: Add in-kernel Quote generation
From: Xu Yilun @ 2026-06-18  8:13 UTC (permalink / raw)
  To: x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, yilun.xu, xiaoyao.li,
	sohil.mehta, adrian.hunter, kishen.maloor, tony.lindgren,
	peter.fang, baolu.lu, zhenzhong.duan, dave.hansen, dave.hansen,
	seanjc
In-Reply-To: <20260618081355.3253581-1-yilun.xu@linux.intel.com>

From: Peter Fang <peter.fang@intel.com>

Provide an in-kernel path for Quote generation when handling
TDG.VP.VMCALL<GetQuote>, without requiring an exit to userspace.

Use the core TDX API for Quote generation when the Quoting extension is
available. For simplicity, KVM checks its availability once per guest
during initialization. KVM does not handle Quoting service disruptions
or switch between the in-kernel and userspace paths.

Update the KVM API and TDX documentation to describe this new Quoting
capability.

Signed-off-by: Peter Fang <peter.fang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 Documentation/arch/x86/tdx.rst |  19 ++---
 Documentation/virt/kvm/api.rst |   3 +
 arch/x86/include/asm/tdx.h     |   9 +++
 arch/x86/kvm/vmx/tdx.h         |   6 ++
 arch/x86/kvm/vmx/tdx.c         | 135 ++++++++++++++++++++++++++++++++-
 virt/kvm/kvm_main.c            |   1 +
 6 files changed, 163 insertions(+), 10 deletions(-)

diff --git a/Documentation/arch/x86/tdx.rst b/Documentation/arch/x86/tdx.rst
index 3303499ad4c6..f02bb6919d91 100644
--- a/Documentation/arch/x86/tdx.rst
+++ b/Documentation/arch/x86/tdx.rst
@@ -522,15 +522,16 @@ provided by attestation service so the TDREPORT can be verified uniquely.
 More details about the TDREPORT can be found in Intel TDX Module
 specification, section titled "TDG.MR.REPORT Leaf".
 
-After getting the TDREPORT, the second step of the attestation process
-is to send it to the Quoting Enclave (QE) to generate the Quote. TDREPORT
-by design can only be verified on the local platform as the MAC key is
-bound to the platform. To support remote verification of the TDREPORT,
-TDX leverages Intel SGX Quoting Enclave to verify the TDREPORT locally
-and convert it to a remotely verifiable Quote. Method of sending TDREPORT
-to QE is implementation specific. Attestation software can choose
-whatever communication channel available (i.e. vsock or TCP/IP) to
-send the TDREPORT to QE and receive the Quote.
+After getting the TDREPORT, the second step of the attestation process is to
+convert it to a Quote. A TDREPORT by design can only be verified on the local
+platform, as the MAC key is bound to the platform. A Quote makes the TDREPORT
+remotely verifiable. It can be generated either through a Quoting Enclave
+(QE) in userspace or through the Quoting service in kernel space. In
+userspace, the Intel SGX Quoting Enclave verifies the TDREPORT locally and
+converts it to a Quote. The method of sending the TDREPORT to the QE and
+receiving the Quote is implementation-specific. If the TDX module supports the
+Quoting service, the kernel can convert a TDREPORT to a Quote directly through
+a SEAMCALL. In this case, the Quote is generated entirely by the TDX module.
 
 References
 ==========
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 52bbbb553ce1..4a3b69b2e602 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7335,6 +7335,9 @@ inputs and outputs of the TDVMCALL.  Currently the following values of
    queued successfully, the TDX guest can poll the status field in the
    shared-memory area to check whether the Quote generation is completed or
    not. When completed, the generated Quote is returned via the same buffer.
+   If the host kernel generates Quotes through the Quoting service provided by
+   the TDX module, KVM processes the GetQuote request and it will not appear in
+   userspace.
 
  * ``TDVMCALL_GET_TD_VM_CALL_INFO``: the guest has requested the support
    status of TDVMCALLs.  The output values for the given leaf should be
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 24bce7512de3..b9a24104415c 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -86,6 +86,15 @@ struct tdx_quote_req {
 	u8 data[];
 };
 
+#define TDX_QUOTE_REQ_HDR_SIZE		(offsetof(struct tdx_quote_req, data))
+
+/*
+ * TDG.VP.VMCALL<GetQuote> Status Codes
+ */
+#define TDX_QUOTE_STATUS_SUCCESS	0x0000000000000000ULL
+#define TDX_QUOTE_STATUS_ERROR		0x8000000000000000ULL
+#define TDX_QUOTE_STATUS_UNAVAILABLE	0x8000000000000001ULL
+
 #ifdef CONFIG_INTEL_TDX_GUEST
 
 void __init tdx_early_init(void);
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index ac8323a68b16..5e4b3aee0577 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -47,6 +47,12 @@ struct kvm_tdx {
 	 * Set/unset is protected with kvm->mmu_lock.
 	 */
 	bool wait_for_sept_zap;
+
+	/*
+	 * Whether to get the quote directly in kernel, without exiting to
+	 * userspace.
+	 */
+	bool get_quote_in_kernel;
 };
 
 /* TDX module vCPU states */
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 9f7c39e0d4b5..20558b0185b6 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1538,11 +1538,133 @@ static int tdx_get_quote_user(struct kvm_vcpu *vcpu, u64 gpa, u64 size)
 	return 0;
 }
 
+static bool write_quote_status_to_guest(struct kvm_vcpu *vcpu, u64 status,
+					gpa_t gpa)
+{
+	if (kvm_vcpu_write_guest(vcpu,
+				 gpa + offsetof(struct tdx_quote_req, status),
+				 &status, sizeof(status)))
+		return false;
+
+	return true;
+}
+
+static bool write_quote_to_guest(struct kvm_vcpu *vcpu, void *quote_data,
+				 u32 quote_len, gpa_t gpa)
+{
+	if (kvm_vcpu_write_guest(vcpu,
+				 gpa + TDX_QUOTE_REQ_HDR_SIZE,
+				 quote_data, quote_len))
+		return false;
+
+	if (kvm_vcpu_write_guest(vcpu,
+				 gpa + offsetof(struct tdx_quote_req, out_len),
+				 &quote_len, sizeof(quote_len)))
+		return false;
+
+	return true;
+}
+
+static u64 get_quote_kernel(struct kvm_vcpu *vcpu, struct tdx_quote_req *req,
+			    gpa_t req_gpa, size_t total_len)
+{
+	struct tdx_td *td = &to_kvm_tdx(vcpu->kvm)->td;
+
+	/* Only support version 1 as defined in the GHCI spec */
+	if (req->version != 1)
+		return TDX_QUOTE_STATUS_ERROR;
+
+	/* Header + input data must fit in the page read from guest memory */
+	if ((size_t)req->in_len + TDX_QUOTE_REQ_HDR_SIZE > PAGE_SIZE)
+		return TDX_QUOTE_STATUS_ERROR;
+
+	/* Caller owns the requested quote */
+	void *quote_data __free(kvfree) =
+		tdx_quote_generate(td, req->data, req->in_len, &req->out_len);
+
+	if (!quote_data)
+		return TDX_QUOTE_STATUS_UNAVAILABLE;
+
+	if ((size_t)req->out_len + TDX_QUOTE_REQ_HDR_SIZE > total_len)
+		return TDX_QUOTE_STATUS_ERROR;
+
+	if (!write_quote_to_guest(vcpu, quote_data, req->out_len, req_gpa))
+		return TDX_QUOTE_STATUS_ERROR;
+
+	return TDX_QUOTE_STATUS_SUCCESS;
+}
+
+static u64 tdx_get_quote_check_args(struct kvm_vcpu *vcpu, u64 gpa, u64 size)
+{
+	gfn_t gfn_start, gfn_end;
+	u64 end;
+
+	if (!size)
+		return TDVMCALL_STATUS_INVALID_OPERAND;
+
+	if (!PAGE_ALIGNED(gpa) || !PAGE_ALIGNED(size))
+		return TDVMCALL_STATUS_ALIGN_ERROR;
+
+	if (check_add_overflow(gpa, size, &end))
+		return TDVMCALL_STATUS_INVALID_OPERAND;
+
+	gfn_start = gpa_to_gfn(gpa);
+	gfn_end = gpa_to_gfn(end);
+
+	/*
+	 * Reject if the guest didn't explicitly convert its quote pages to
+	 * shared.
+	 */
+	if (!kvm_range_has_memory_attributes(vcpu->kvm, gfn_start, gfn_end,
+					     KVM_MEMORY_ATTRIBUTE_PRIVATE, 0))
+		return TDVMCALL_STATUS_INVALID_OPERAND;
+
+	return TDVMCALL_STATUS_SUCCESS;
+}
+
+static int tdx_get_quote_kernel(struct kvm_vcpu *vcpu, u64 gpa, u64 size)
+{
+	void *first_page = NULL;
+	u64 err, qerr;
+
+	err = tdx_get_quote_check_args(vcpu, gpa, size);
+	if (err != TDVMCALL_STATUS_SUCCESS)
+		goto out;
+
+	err = TDVMCALL_STATUS_INVALID_OPERAND;
+
+	first_page = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!first_page)
+		goto out;
+
+	/*
+	 * Read the first GetQuote page for its header + input data. The check
+	 * above ensures that this GetQuote message is at least one page in
+	 * size. in_data spanning more than a page is not supported.
+	 */
+	if (kvm_vcpu_read_guest(vcpu, gpa, first_page, PAGE_SIZE))
+		goto out;
+
+	qerr = get_quote_kernel(vcpu, first_page, (gpa_t)gpa, size);
+
+	if (write_quote_status_to_guest(vcpu, qerr, (gpa_t)gpa) &&
+	    qerr == TDX_QUOTE_STATUS_SUCCESS)
+		err = TDVMCALL_STATUS_SUCCESS;
+
+out:
+	kfree(first_page);
+	tdvmcall_set_return_code(vcpu, err);
+
+	return 1;
+}
+
 static int tdx_get_quote(struct kvm_vcpu *vcpu)
 {
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
 	u64 gpa = tdx->vp_enter_args.r12;
 	u64 size = tdx->vp_enter_args.r13;
+	int ret;
 
 	/* The gpa of buffer must have shared bit set. */
 	if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) {
@@ -1552,7 +1674,12 @@ static int tdx_get_quote(struct kvm_vcpu *vcpu)
 
 	gpa &= ~gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
 
-	return tdx_get_quote_user(vcpu, gpa, size);
+	if (kvm_tdx->get_quote_in_kernel)
+		ret = tdx_get_quote_kernel(vcpu, gpa, size);
+	else
+		ret = tdx_get_quote_user(vcpu, gpa, size);
+
+	return ret;
 }
 
 static int tdx_setup_event_notify_interrupt(struct kvm_vcpu *vcpu)
@@ -2751,6 +2878,12 @@ static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
 	else
 		kvm->arch.gfn_direct_bits = TDX_SHARED_BIT_PWL_4;
 
+	/*
+	 * Check only once at TD creation. Switching between userspace and
+	 * in-kernel quoting is not supported.
+	 */
+	kvm_tdx->get_quote_in_kernel = tdx_quote_enabled();
+
 	kvm_tdx->state = TD_STATE_INITIALIZED;
 out:
 	/* kfree() accepts NULL. */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 89489996fbc1..599f88a13071 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2461,6 +2461,7 @@ bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 
 	return true;
 }
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_range_has_memory_attributes);
 
 static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
 						 struct kvm_mmu_notifier_range *range)
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 15/17] KVM: TDX: Factor out userspace return path from tdx_get_quote()
From: Xu Yilun @ 2026-06-18  8:13 UTC (permalink / raw)
  To: x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, yilun.xu, xiaoyao.li,
	sohil.mehta, adrian.hunter, kishen.maloor, tony.lindgren,
	peter.fang, baolu.lu, zhenzhong.duan, dave.hansen, dave.hansen,
	seanjc
In-Reply-To: <20260618081355.3253581-1-yilun.xu@linux.intel.com>

From: Peter Fang <peter.fang@intel.com>

Separate the logic that returns the GetQuote TDVMCALL exit to userspace
so that tdx_get_quote() can be extended to support in-kernel Quote
generation.

No functional change intended.

Signed-off-by: Peter Fang <peter.fang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 arch/x86/kvm/vmx/tdx.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index ed12805bbb44..9f7c39e0d4b5 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1524,6 +1524,20 @@ static int tdx_complete_simple(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_get_quote_user(struct kvm_vcpu *vcpu, u64 gpa, u64 size)
+{
+	vcpu->run->exit_reason = KVM_EXIT_TDX;
+	vcpu->run->tdx.flags = 0;
+	vcpu->run->tdx.nr = TDVMCALL_GET_QUOTE;
+	vcpu->run->tdx.get_quote.ret = TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED;
+	vcpu->run->tdx.get_quote.gpa = gpa;
+	vcpu->run->tdx.get_quote.size = size;
+
+	vcpu->arch.complete_userspace_io = tdx_complete_simple;
+
+	return 0;
+}
+
 static int tdx_get_quote(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
@@ -1536,16 +1550,9 @@ static int tdx_get_quote(struct kvm_vcpu *vcpu)
 		return 1;
 	}
 
-	vcpu->run->exit_reason = KVM_EXIT_TDX;
-	vcpu->run->tdx.flags = 0;
-	vcpu->run->tdx.nr = TDVMCALL_GET_QUOTE;
-	vcpu->run->tdx.get_quote.ret = TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED;
-	vcpu->run->tdx.get_quote.gpa = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
-	vcpu->run->tdx.get_quote.size = size;
-
-	vcpu->arch.complete_userspace_io = tdx_complete_simple;
+	gpa &= ~gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
 
-	return 0;
+	return tdx_get_quote_user(vcpu, gpa, size);
 }
 
 static int tdx_setup_event_notify_interrupt(struct kvm_vcpu *vcpu)
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 14/17] x86/tdx: Move and rename Quote request structure
From: Xu Yilun @ 2026-06-18  8:13 UTC (permalink / raw)
  To: x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, yilun.xu, xiaoyao.li,
	sohil.mehta, adrian.hunter, kishen.maloor, tony.lindgren,
	peter.fang, baolu.lu, zhenzhong.duan, dave.hansen, dave.hansen,
	seanjc
In-Reply-To: <20260618081355.3253581-1-yilun.xu@linux.intel.com>

From: Peter Fang <peter.fang@intel.com>

Move struct tdx_quote_buf to tdx.h so it can be shared by the guest
driver and core TDX code, as the host will also need the Quote buffer
format for in-kernel Quote generation.

Rename the struct to tdx_quote_req to better reflect its purpose, and
replace "quote_buf" with "quote_req" in tdx-guest.c.

Signed-off-by: Peter Fang <peter.fang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Reviewed-by: Dan Williams <djbw@kernel.org>
---
 arch/x86/include/asm/tdx.h              | 20 +++++++++++
 drivers/virt/coco/tdx-guest/tdx-guest.c | 47 ++++++++-----------------
 2 files changed, 34 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 34764838f132..24bce7512de3 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -66,6 +66,26 @@ struct ve_info {
 	u32 instr_info;
 };
 
+/**
+ * struct tdx_quote_req - Format of Quote request message
+ * @version: Quote format version, filled by TD.
+ * @status: Status code of Quote request, filled by VMM.
+ * @in_len: Length of TDREPORT, filled by TD.
+ * @out_len: Length of Quote data, filled by VMM.
+ * @data: Quote data on output or TDREPORT on input.
+ *
+ * More details of Quote request message can be found in TDX
+ * Guest-Host Communication Interface (GHCI) for Intel TDX 1.0,
+ * section titled "TDG.VP.VMCALL<GetQuote>"
+ */
+struct tdx_quote_req {
+	u64 version;
+	u64 status;
+	u32 in_len;
+	u32 out_len;
+	u8 data[];
+};
+
 #ifdef CONFIG_INTEL_TDX_GUEST
 
 void __init tdx_early_init(void);
diff --git a/drivers/virt/coco/tdx-guest/tdx-guest.c b/drivers/virt/coco/tdx-guest/tdx-guest.c
index a9ecc46df187..c84ace1cbe99 100644
--- a/drivers/virt/coco/tdx-guest/tdx-guest.c
+++ b/drivers/virt/coco/tdx-guest/tdx-guest.c
@@ -171,26 +171,7 @@ static void tdx_mr_deinit(const struct attribute_group *mr_grp)
 #define GET_QUOTE_SUCCESS		0
 #define GET_QUOTE_IN_FLIGHT		0xffffffffffffffff
 
-#define TDX_QUOTE_MAX_LEN		(GET_QUOTE_BUF_SIZE - sizeof(struct tdx_quote_buf))
-
-/* struct tdx_quote_buf: Format of Quote request buffer.
- * @version: Quote format version, filled by TD.
- * @status: Status code of Quote request, filled by VMM.
- * @in_len: Length of TDREPORT, filled by TD.
- * @out_len: Length of Quote data, filled by VMM.
- * @data: Quote data on output or TDREPORT on input.
- *
- * More details of Quote request buffer can be found in TDX
- * Guest-Host Communication Interface (GHCI) for Intel TDX 1.0,
- * section titled "TDG.VP.VMCALL<GetQuote>"
- */
-struct tdx_quote_buf {
-	u64 version;
-	u64 status;
-	u32 in_len;
-	u32 out_len;
-	u8 data[];
-};
+#define TDX_QUOTE_MAX_LEN		(GET_QUOTE_BUF_SIZE - sizeof(struct tdx_quote_req))
 
 /* Quote data buffer */
 static void *quote_data;
@@ -241,7 +222,7 @@ static void *alloc_quote_buf(void)
 
 /*
  * wait_for_quote_completion() - Wait for Quote request completion
- * @quote_buf: Address of Quote buffer.
+ * @quote_req: Address of Quote buffer.
  * @timeout: Timeout in seconds to wait for the Quote generation.
  *
  * As per TDX GHCI v1.0 specification, sec titled "TDG.VP.VMCALL<GetQuote>",
@@ -250,7 +231,7 @@ static void *alloc_quote_buf(void)
  * or error code after processing is complete. So wait till the status
  * changes from GET_QUOTE_IN_FLIGHT or the request being timed out.
  */
-static int wait_for_quote_completion(struct tdx_quote_buf *quote_buf, u32 timeout)
+static int wait_for_quote_completion(struct tdx_quote_req *quote_req, u32 timeout)
 {
 	int i = 0;
 
@@ -258,7 +239,7 @@ static int wait_for_quote_completion(struct tdx_quote_buf *quote_buf, u32 timeou
 	 * Quote requests usually take a few seconds to complete, so waking up
 	 * once per second to recheck the status is fine for this use case.
 	 */
-	while (quote_buf->status == GET_QUOTE_IN_FLIGHT && i++ < timeout) {
+	while (quote_req->status == GET_QUOTE_IN_FLIGHT && i++ < timeout) {
 		if (msleep_interruptible(MSEC_PER_SEC))
 			return -EINTR;
 	}
@@ -269,7 +250,7 @@ static int wait_for_quote_completion(struct tdx_quote_buf *quote_buf, u32 timeou
 static int tdx_report_new_locked(struct tsm_report *report, void *data)
 {
 	u8 *buf;
-	struct tdx_quote_buf *quote_buf = quote_data;
+	struct tdx_quote_req *quote_req = quote_data;
 	struct tsm_report_desc *desc = &report->desc;
 	u32 out_len;
 	int ret;
@@ -280,7 +261,7 @@ static int tdx_report_new_locked(struct tsm_report *report, void *data)
 	 * Quote buf status is still in GET_QUOTE_IN_FLIGHT (owned by
 	 * VMM), don't permit any new request.
 	 */
-	if (quote_buf->status == GET_QUOTE_IN_FLIGHT)
+	if (quote_req->status == GET_QUOTE_IN_FLIGHT)
 		return -EBUSY;
 
 	if (desc->inblob_len != TDX_REPORTDATA_LEN)
@@ -289,11 +270,11 @@ static int tdx_report_new_locked(struct tsm_report *report, void *data)
 	memset(quote_data, 0, GET_QUOTE_BUF_SIZE);
 
 	/* Update Quote buffer header */
-	quote_buf->version = GET_QUOTE_CMD_VER;
-	quote_buf->in_len = TDX_REPORT_LEN;
+	quote_req->version = GET_QUOTE_CMD_VER;
+	quote_req->in_len = TDX_REPORT_LEN;
 
 	ret = tdx_do_report(KERNEL_SOCKPTR(desc->inblob),
-			    KERNEL_SOCKPTR(quote_buf->data));
+			    KERNEL_SOCKPTR(quote_req->data));
 	if (ret)
 		return ret;
 
@@ -303,23 +284,23 @@ static int tdx_report_new_locked(struct tsm_report *report, void *data)
 		return -EIO;
 	}
 
-	ret = wait_for_quote_completion(quote_buf, getquote_timeout);
+	ret = wait_for_quote_completion(quote_req, getquote_timeout);
 	if (ret) {
 		pr_err("GetQuote request timedout\n");
 		return ret;
 	}
 
-	if (quote_buf->status != GET_QUOTE_SUCCESS) {
-		pr_debug("GetQuote request failed, status:%llx\n", quote_buf->status);
+	if (quote_req->status != GET_QUOTE_SUCCESS) {
+		pr_debug("GetQuote request failed, status:%llx\n", quote_req->status);
 		return -EIO;
 	}
 
-	out_len = READ_ONCE(quote_buf->out_len);
+	out_len = READ_ONCE(quote_req->out_len);
 
 	if (out_len > TDX_QUOTE_MAX_LEN)
 		return -EFBIG;
 
-	buf = kvmemdup(quote_buf->data, out_len, GFP_KERNEL);
+	buf = kvmemdup(quote_req->data, out_len, GFP_KERNEL);
 	if (!buf)
 		return -ENOMEM;
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 13/17] x86/virt/tdx: Enable Quoting extension
From: Xu Yilun @ 2026-06-18  8:13 UTC (permalink / raw)
  To: x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, yilun.xu, xiaoyao.li,
	sohil.mehta, adrian.hunter, kishen.maloor, tony.lindgren,
	peter.fang, baolu.lu, zhenzhong.duan, dave.hansen, dave.hansen,
	seanjc
In-Reply-To: <20260618081355.3253581-1-yilun.xu@linux.intel.com>

From: Peter Fang <peter.fang@intel.com>

The Quoting extension generates TDX attestation Quotes in the TDX
module, without using a discrete Quoting engine. Enable this feature by
requesting it in TDH.SYS.CONFIG and TDH.SYS.UPDATE.

Signed-off-by: Peter Fang <peter.fang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 81e7b6b1dacb..01fb01313077 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1029,10 +1029,8 @@ static __init int construct_tdmrs(struct list_head *tmb_list,
 
 static __init void set_tdx_addon_features(void)
 {
-	/*
-	 * To add DICE-based TDX Quoting feature bit in tdx_addon_feature0 when
-	 * kernel is ready.
-	 */
+	if (tdx_sysinfo.features.tdx_features0 & TDX_FEATURES0_QUOTE)
+		tdx_addon_feature0 |= TDX_FEATURES0_QUOTE;
 }
 
 static __init int config_tdx_module(struct tdmr_info_list *tdmr_list,
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 12/17] x86/virt/tdx: Reinitialize the Quoting extension after TDX module update
From: Xu Yilun @ 2026-06-18  8:13 UTC (permalink / raw)
  To: x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, yilun.xu, xiaoyao.li,
	sohil.mehta, adrian.hunter, kishen.maloor, tony.lindgren,
	peter.fang, baolu.lu, zhenzhong.duan, dave.hansen, dave.hansen,
	seanjc
In-Reply-To: <20260618081355.3253581-1-yilun.xu@linux.intel.com>

From: Peter Fang <peter.fang@intel.com>

Invoke TDH.QUOTE.INIT again after a runtime module update to trigger the
necessary rekey procedure in the TDX module.

Keep the existing Quote buffer since memory allocation is not permitted
during the update. Compatible TDX module updates must not increase the
Quote buffer size, or an undersized buffer might cause Quote generation
to fail. See [1] for module update details.

[1] Documentation/arch/x86/tdx.rst, Section "TDX module Runtime Update"

Signed-off-by: Peter Fang <peter.fang@intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index ac0da4966697..81e7b6b1dacb 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1353,8 +1353,11 @@ static __init int tdx_quote_create_buf(unsigned int npages,
 	return -ENOMEM;
 }
 
-/* Initialize quoting extension */
-static __init int tdx_quote_init(void)
+/*
+ * Initialize quoting extension.
+ * It also rekeys the TDX module after a runtime module update.
+ */
+static int tdx_quote_init(void)
 {
 	struct tdx_module_args args = {};
 	u64 r;
@@ -1539,6 +1542,22 @@ static __init int init_tdx_module_extensions(void)
 	return 0;
 }
 
+static void update_tdx_quoting_extension(void)
+{
+	int ret;
+
+	if (tdx_addon_feature0 & TDX_FEATURES0_QUOTE) {
+		/*
+		 * The TDH.QUOTE.INIT call renews the quoting keys.
+		 *
+		 * A module update must not increase the quote buffer size, or
+		 * quote generation may fail and break attestation.
+		 */
+		ret = tdx_quote_init();
+		WARN_ON(ret);
+	}
+}
+
 /*
  * Mostly the same flow as init_tdx_module_extensions(), but rejects adding
  * more memory.
@@ -1561,7 +1580,13 @@ static int update_tdx_module_extensions(void)
 	if (sysinfo_ext.memory_pool_required_pages)
 		return -EFAULT;
 
-	return tdx_ext_init();
+	ret = tdx_ext_init();
+	if (ret)
+		return ret;
+
+	update_tdx_quoting_extension();
+
+	return 0;
 }
 
 static __init int init_tdx_module(void)
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 11/17] x86/virt/tdx: Add interface to generate a Quote
From: Xu Yilun @ 2026-06-18  8:13 UTC (permalink / raw)
  To: x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, yilun.xu, xiaoyao.li,
	sohil.mehta, adrian.hunter, kishen.maloor, tony.lindgren,
	peter.fang, baolu.lu, zhenzhong.duan, dave.hansen, dave.hansen,
	seanjc
In-Reply-To: <20260618081355.3253581-1-yilun.xu@linux.intel.com>

From: Peter Fang <peter.fang@intel.com>

Provide an interface to generate a Quote via the TDH.QUOTE.GET
Extension-SEAMCALL. Although the TDX module may support concurrent Quote
generation, use a single shared buffer for simplicity and serialize
access with a mutex. TDX bringup code already prepares the buffer in the
format required by the TDX module.

Return a per-call buffer containing the Quote so callers don't need to
size the buffer themselves. The caller is responsible for freeing the
returned buffer.

Signed-off-by: Peter Fang <peter.fang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 arch/x86/include/asm/tdx.h  |  2 +
 arch/x86/virt/vmx/tdx/tdx.h |  1 +
 arch/x86/virt/vmx/tdx/tdx.c | 77 +++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 9432a736855e..34764838f132 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -148,6 +148,8 @@ struct tdx_vp {
 };
 
 bool tdx_quote_enabled(void);
+void *tdx_quote_generate(struct tdx_td *td, void *in_data, u32 in_data_len,
+			 u32 *quote_len);
 
 static inline u64 mk_keyed_paddr(u16 hkid, struct page *page)
 {
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 1afa0b10dfc9..32b13b0c85f9 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -66,6 +66,7 @@
 #define TDH_EXT_INIT			60
 #define TDH_EXT_MEM_ADD			61
 #define TDH_SYS_DISABLE			69
+#define TDH_QUOTE_GET			98
 #define TDH_QUOTE_INIT			100
 
 /* TDX page types */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 1e2c7a33c7a9..ac0da4966697 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -72,6 +72,8 @@ static LIST_HEAD(tdx_memlist);
 
 static struct tdx_sys_info tdx_sysinfo;
 
+static DEFINE_MUTEX(tdx_quote_lock);
+
 /*
  * Quote buffer shared with the TDX module for quote generation, in HPA linked
  * list format.
@@ -1208,6 +1210,81 @@ bool tdx_quote_enabled(void)
 }
 EXPORT_SYMBOL_FOR_KVM(tdx_quote_enabled);
 
+static u64 tdx_quote_get(struct tdx_td *td, u64 in_data_pa, u64 in_data_len,
+			 u64 hpa_entries_pa, u64 total_len, u64 *quote_len)
+{
+	struct tdx_module_args args = {
+		.rcx = tdx_tdr_pa(td),
+		/* [47:32] QUOTE_ID: All-1s selects the default quote format */
+		.rdx = GENMASK_U64(47, 32),
+		.r8 = in_data_pa,
+		.r9 = in_data_len,
+		.r10 = hpa_entries_pa,
+		.r11 = total_len,
+	};
+	u64 r;
+
+	do {
+		r = seamcall_ret(TDH_QUOTE_GET, &args);
+	} while (r == TDX_INTERRUPTED_RESUMABLE);
+
+	*quote_len = args.rcx;
+
+	return r;
+}
+
+/**
+ * tdx_quote_generate() - Generate a quote for a TD
+ * @td: The TD to generate the quote for.
+ * @in_data: Input data for the quote request.
+ * @in_data_len: Size of @in_data in bytes. Must not exceed one page.
+ * @quote_len: Returned size of the generated quote in bytes.
+ *
+ * Generate a quote using the TDX module. Pass the input data through the quote
+ * buffer and return the quote.
+ *
+ * Return: Newly allocated quote buffer or %NULL on failure.
+ * The caller must free the returned buffer with kvfree().
+ */
+void *tdx_quote_generate(struct tdx_td *td, void *in_data, u32 in_data_len,
+			 u32 *quote_len)
+{
+	struct tdx_quote_data *qdata = &tdx_quote;
+	void *quote_dup = NULL;
+	u64 r, out_len;
+
+	if (!tdx_quote_enabled())
+		return NULL;
+
+	mutex_lock(&tdx_quote_lock);
+
+	/*
+	 * Use the first page of the quote buffer for input data. The buffer
+	 * must be at least one page in size. @in_data may not be page-aligned,
+	 * but TDH.QUOTE.GET expects page-aligned addresses.
+	 */
+	memcpy(qdata->buf, in_data, in_data_len);
+
+	r = tdx_quote_get(td, qdata->hpa_entries[0], in_data_len,
+			  qdata->hpa_entries_pa, qdata->buf_len, &out_len);
+	if (r != TDX_SUCCESS || !out_len || out_len > qdata->buf_len)
+		goto out;
+
+	/*
+	 * The quote buffer is a shared resource, so use it only for the
+	 * SEAMCALL and copy the data out as soon as possible.
+	 */
+	quote_dup = kvmemdup(qdata->buf, out_len, GFP_KERNEL);
+
+	*quote_len = (u32)out_len;
+
+out:
+	mutex_unlock(&tdx_quote_lock);
+
+	return quote_dup;
+}
+EXPORT_SYMBOL_FOR_KVM(tdx_quote_generate);
+
 #define HPAS_PER_NODE			(PAGE_SIZE / sizeof(u64))
 
 /*
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 10/17] x86/virt/tdx: Move tdx_tdr_pa() up in the file
From: Xu Yilun @ 2026-06-18  8:13 UTC (permalink / raw)
  To: x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, yilun.xu, xiaoyao.li,
	sohil.mehta, adrian.hunter, kishen.maloor, tony.lindgren,
	peter.fang, baolu.lu, zhenzhong.duan, dave.hansen, dave.hansen,
	seanjc
In-Reply-To: <20260618081355.3253581-1-yilun.xu@linux.intel.com>

From: Peter Fang <peter.fang@intel.com>

Move tdx_tdr_pa() earlier in the file to prepare for upcoming changes
that add a new Extension-SEAMCALL.

No functional change intended.

Signed-off-by: Peter Fang <peter.fang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index da55c1aeeeb8..1e2c7a33c7a9 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1186,6 +1186,11 @@ static __init int init_tdmrs(struct tdmr_info_list *tdmr_list)
 	return 0;
 }
 
+static inline u64 tdx_tdr_pa(struct tdx_td *td)
+{
+	return page_to_phys(td->tdr_page);
+}
+
 static inline phys_addr_t tdx_vmalloc_to_pa(const void *addr)
 {
 	unsigned long pfn = vmalloc_to_pfn(addr);
@@ -1966,11 +1971,6 @@ void tdx_guest_keyid_free(unsigned int keyid)
 }
 EXPORT_SYMBOL_FOR_KVM(tdx_guest_keyid_free);
 
-static inline u64 tdx_tdr_pa(struct tdx_td *td)
-{
-	return page_to_phys(td->tdr_page);
-}
-
 /*
  * The TDX module exposes a CLFLUSH_BEFORE_ALLOC bit to specify whether
  * a CLFLUSH of pages is required before handing them to the TDX module.
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 09/17] x86/virt/tdx: Add interface to check Quoting availability
From: Xu Yilun @ 2026-06-18  8:13 UTC (permalink / raw)
  To: x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, yilun.xu, xiaoyao.li,
	sohil.mehta, adrian.hunter, kishen.maloor, tony.lindgren,
	peter.fang, baolu.lu, zhenzhong.duan, dave.hansen, dave.hansen,
	seanjc
In-Reply-To: <20260618081355.3253581-1-yilun.xu@linux.intel.com>

From: Peter Fang <peter.fang@intel.com>

KVM needs to know if the Quoting extension is available to determine
whether userspace must be involved in Quote generation.

Since the Quote buffer is always created during Quoting extension
bringup, checking whether the buffer exists is sufficient.

Signed-off-by: Peter Fang <peter.fang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 arch/x86/include/asm/tdx.h  |  2 ++
 arch/x86/virt/vmx/tdx/tdx.c | 10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 741fd97cc199..9432a736855e 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -147,6 +147,8 @@ struct tdx_vp {
 	struct page **tdcx_pages;
 };
 
+bool tdx_quote_enabled(void);
+
 static inline u64 mk_keyed_paddr(u16 hkid, struct page *page)
 {
 	u64 ret;
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 9716424a301f..da55c1aeeeb8 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1193,6 +1193,16 @@ static inline phys_addr_t tdx_vmalloc_to_pa(const void *addr)
 	return PFN_PHYS(pfn);
 }
 
+bool tdx_quote_enabled(void)
+{
+	/*
+	 * No need for locking here. The quote buffer is initialized as part of
+	 * core TDX bringup, which comes before KVM is ready for userspace.
+	 */
+	return !!tdx_quote.buf;
+}
+EXPORT_SYMBOL_FOR_KVM(tdx_quote_enabled);
+
 #define HPAS_PER_NODE			(PAGE_SIZE / sizeof(u64))
 
 /*
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 08/17] x86/virt/tdx: Prepare Quote buffer during extension bringup
From: Xu Yilun @ 2026-06-18  8:13 UTC (permalink / raw)
  To: x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, yilun.xu, xiaoyao.li,
	sohil.mehta, adrian.hunter, kishen.maloor, tony.lindgren,
	peter.fang, baolu.lu, zhenzhong.duan, dave.hansen, dave.hansen,
	seanjc
In-Reply-To: <20260618081355.3253581-1-yilun.xu@linux.intel.com>

From: Peter Fang <peter.fang@intel.com>

During TDX attestation, the TDX guest asks the host to generate a
signed, verifiable structure (a "Quote"). With the Quoting extension,
the TDX module returns the Quote in pages that the host shares via an
Extension-SEAMCALL.

The SEAMCALL accepts the host buffer pages as a linked list of 4KB
"HPA_LINKED_LIST" nodes. Each entry holds the physical address of a 4KB
data page, except for the last entry, which points to the next node. The
TDX module reports the required Quote buffer size through a global
metadata field. See [1] for details.

For simplicity, let all guests share a global buffer. Build the buffer's
HPA_LINKED_LIST at Quoting extension bringup. This saves a bunch of
va-to-pa conversions at runtime.

[1] Intel TDX Module ABI specification, Section "Physical Memory
    Management Types"

Signed-off-by: Peter Fang <peter.fang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 arch/x86/include/asm/tdx_global_metadata.h  |   4 +
 arch/x86/virt/vmx/tdx/tdx.c                 | 115 +++++++++++++++++++-
 arch/x86/virt/vmx/tdx/tdx_global_metadata.c |  14 +++
 3 files changed, 129 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/tdx_global_metadata.h b/arch/x86/include/asm/tdx_global_metadata.h
index b3442b7c88bb..17cb13a1bb40 100644
--- a/arch/x86/include/asm/tdx_global_metadata.h
+++ b/arch/x86/include/asm/tdx_global_metadata.h
@@ -57,4 +57,8 @@ struct tdx_sys_info_ext {
 	bool ext_required;
 };
 
+struct tdx_sys_info_quote {
+	u32 max_quote_size;
+};
+
 #endif
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 06c42b86b05e..9716424a301f 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -32,6 +32,7 @@
 #include <linux/idr.h>
 #include <linux/kvm_types.h>
 #include <linux/bitfield.h>
+#include <linux/vmalloc.h>
 #include <asm/page.h>
 #include <asm/special_insns.h>
 #include <asm/msr-index.h>
@@ -71,6 +72,24 @@ static LIST_HEAD(tdx_memlist);
 
 static struct tdx_sys_info tdx_sysinfo;
 
+/*
+ * Quote buffer shared with the TDX module for quote generation, in HPA linked
+ * list format.
+ *
+ * @buf: Virtual address of the quote buffer.
+ * @buf_len: Size of @buf in bytes.
+ * @hpa_entries: HPA entries, starting at the first list node.
+ * @hpa_entries_pa: Physical address for @hpa_entries.
+ */
+struct tdx_quote_data {
+	void		*buf;
+	u64		buf_len;
+	u64		*hpa_entries;
+	phys_addr_t	hpa_entries_pa;
+};
+
+static struct tdx_quote_data tdx_quote;
+
 static DEFINE_RAW_SPINLOCK(sysinit_lock);
 
 /*
@@ -1167,6 +1186,81 @@ static __init int init_tdmrs(struct tdmr_info_list *tdmr_list)
 	return 0;
 }
 
+static inline phys_addr_t tdx_vmalloc_to_pa(const void *addr)
+{
+	unsigned long pfn = vmalloc_to_pfn(addr);
+
+	return PFN_PHYS(pfn);
+}
+
+#define HPAS_PER_NODE			(PAGE_SIZE / sizeof(u64))
+
+/*
+ * Pass the quote buffer to the TDX module as an HPA linked list, where each
+ * node holds 4KB page HPAs and the last entry points to the next node.
+ */
+static __init int tdx_quote_create_buf(unsigned int npages,
+				       struct tdx_quote_data *qdata)
+{
+	unsigned int nnodes;
+	u64 *hpas;
+	void *qbuf;
+	int i, j;
+
+	if (!npages)
+		return -EINVAL;
+
+	/*
+	 * Each node holds up to (HPAS_PER_NODE - 1) 4KB page HPAs.
+	 * The last entry of the node points to the next node.
+	 */
+	nnodes = DIV_ROUND_UP(npages, HPAS_PER_NODE - 1);
+
+	hpas = vmalloc_array(nnodes, PAGE_SIZE);
+	if (!hpas)
+		return -ENOMEM;
+
+	/*
+	 * ~0ULL is the list terminator for HPA_LINKED_LIST.
+	 *
+	 * Pre-fill the last node with 0xff bytes so that unused entries are
+	 * terminators. Overwrite populated entries later.
+	 */
+	memset((u8 *)hpas + (nnodes - 1) * PAGE_SIZE, 0xff, PAGE_SIZE);
+
+	qbuf = vcalloc(npages, PAGE_SIZE);
+	if (!qbuf)
+		goto out_nomem;
+
+	/* Populate the linked list */
+	for (i = 0, j = 0; j < npages; i++) {
+		if ((i % HPAS_PER_NODE) == HPAS_PER_NODE - 1) {
+			/*
+			 * The last node entry always points to the next node.
+			 * The address of the following entry must be on next
+			 * node's page boundary.
+			 */
+			hpas[i] = tdx_vmalloc_to_pa(&hpas[i + 1]);
+			continue;
+		}
+
+		hpas[i] = tdx_vmalloc_to_pa((u8 *)qbuf + j * PAGE_SIZE);
+		j++;
+	}
+
+	qdata->buf = qbuf;
+	qdata->buf_len = (u64)npages * PAGE_SIZE;
+	qdata->hpa_entries = hpas;
+	qdata->hpa_entries_pa = tdx_vmalloc_to_pa(hpas);
+
+	return 0;
+
+out_nomem:
+	vfree(hpas);
+
+	return -ENOMEM;
+}
+
 /* Initialize quoting extension */
 static __init int tdx_quote_init(void)
 {
@@ -1185,12 +1279,25 @@ static __init int tdx_quote_init(void)
 
 static __init void init_tdx_quoting_extension(void)
 {
-	int ret;
+	struct tdx_sys_info_quote sysinfo_quote;
+	unsigned int nr_quote_pages;
+
+	if (!(tdx_addon_feature0 & TDX_FEATURES0_QUOTE))
+		return;
 
-	if (tdx_addon_feature0 & TDX_FEATURES0_QUOTE) {
-		ret = tdx_quote_init();
-		WARN_ON_ONCE(ret);
+	if (tdx_quote_init()) {
+		WARN_ON_ONCE(1);
+		return;
 	}
+
+	/* Quoting metadata is valid only after initialization */
+	if (get_tdx_sys_info_quote(&sysinfo_quote))
+		return;
+
+	nr_quote_pages = PAGE_ALIGN(sysinfo_quote.max_quote_size) /
+			 PAGE_SIZE;
+	if (tdx_quote_create_buf(nr_quote_pages, &tdx_quote))
+		pr_err("Failed to create quote buffer\n");
 }
 
 /* Initialize TDX module extensions for extension SEAMCALLs */
diff --git a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
index 84364da89649..1eb2985307c6 100644
--- a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
+++ b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
@@ -151,3 +151,17 @@ static int get_tdx_sys_info_ext(struct tdx_sys_info_ext *sysinfo_ext)
 
 	return 0;
 }
+
+static __init int get_tdx_sys_info_quote(struct tdx_sys_info_quote *sysinfo_quote)
+{
+	int ret;
+	u64 val;
+
+	ret = read_sys_metadata_field(0x2300000200000002, &val);
+	if (ret)
+		return ret;
+
+	sysinfo_quote->max_quote_size = val;
+
+	return 0;
+}
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 07/17] x86/virt/tdx: Initialize Quoting extension
From: Xu Yilun @ 2026-06-18  8:13 UTC (permalink / raw)
  To: x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, yilun.xu, xiaoyao.li,
	sohil.mehta, adrian.hunter, kishen.maloor, tony.lindgren,
	peter.fang, baolu.lu, zhenzhong.duan, dave.hansen, dave.hansen,
	seanjc
In-Reply-To: <20260618081355.3253581-1-yilun.xu@linux.intel.com>

From: Peter Fang <peter.fang@intel.com>

Initialize the Quoting extension during TDX bringup, after enabling TDX
module Extension.

Because Quoting is an optional TDX feature, do not let initialization
failures cause TDX bringup to fail. In that case, TDX can fall back to
the existing userspace flow via a KVM return code.

Only lay the groundwork for TDX Quoting support. Leave the opt-in
portion of the initialization to a follow-up patch after fully
implementing the feature.

Signed-off-by: Peter Fang <peter.fang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 arch/x86/include/asm/tdx.h  |  1 +
 arch/x86/virt/vmx/tdx/tdx.h |  1 +
 arch/x86/virt/vmx/tdx/tdx.c | 34 +++++++++++++++++++++++++++++++++-
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 5fbf89d5317c..741fd97cc199 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -36,6 +36,7 @@
 #define TDX_FEATURES0_TD_PRESERVING	BIT_ULL(1)
 #define TDX_FEATURES0_NO_RBP_MOD	BIT_ULL(18)
 #define TDX_FEATURES0_EXT		BIT_ULL(39)
+#define TDX_FEATURES0_QUOTE		BIT_ULL(50)
 
 #ifndef __ASSEMBLER__
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 2deb0a5c902e..1afa0b10dfc9 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -66,6 +66,7 @@
 #define TDH_EXT_INIT			60
 #define TDH_EXT_MEM_ADD			61
 #define TDH_SYS_DISABLE			69
+#define TDH_QUOTE_INIT			100
 
 /* TDX page types */
 #define	PT_NDA		0x0
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 4d2940f4538a..06c42b86b05e 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1167,6 +1167,32 @@ static __init int init_tdmrs(struct tdmr_info_list *tdmr_list)
 	return 0;
 }
 
+/* Initialize quoting extension */
+static __init int tdx_quote_init(void)
+{
+	struct tdx_module_args args = {};
+	u64 r;
+
+	do {
+		r = seamcall(TDH_QUOTE_INIT, &args);
+	} while (r == TDX_INTERRUPTED_RESUMABLE);
+
+	if (r != TDX_SUCCESS)
+		return -EFAULT;
+
+	return 0;
+}
+
+static __init void init_tdx_quoting_extension(void)
+{
+	int ret;
+
+	if (tdx_addon_feature0 & TDX_FEATURES0_QUOTE) {
+		ret = tdx_quote_init();
+		WARN_ON_ONCE(ret);
+	}
+}
+
 /* Initialize TDX module extensions for extension SEAMCALLs */
 static int tdx_ext_init(void)
 {
@@ -1305,7 +1331,13 @@ static __init int init_tdx_module_extensions(void)
 	if (ret)
 		return ret;
 
-	return tdx_ext_init();
+	ret = tdx_ext_init();
+	if (ret)
+		return ret;
+
+	init_tdx_quoting_extension();
+
+	return 0;
 }
 
 /*
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 06/17] x86/virt/tdx: Re-initialize the extensions on runtime TDX module update
From: Xu Yilun @ 2026-06-18  8:13 UTC (permalink / raw)
  To: x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, yilun.xu, xiaoyao.li,
	sohil.mehta, adrian.hunter, kishen.maloor, tony.lindgren,
	peter.fang, baolu.lu, zhenzhong.duan, dave.hansen, dave.hansen,
	seanjc
In-Reply-To: <20260618081355.3253581-1-yilun.xu@linux.intel.com>

Runtime TDX module update introduces a mechanism to update the module
firmware while preserving and restoring TDX operations. As part of the
restoration process, the host must re-execute all extensions
initialization steps to restore extension SEAMCALL functionality.

However, Linux runs the update in stop_machine() context, which prevents
memory allocation. This introduces a hard restriction that the updated
TDX environment must not consume more memory for the extensions.
Consequently, the post-update initialization for the extensions is
implemented as:

  1. Detect if the extensions are supported and required.
  2. Detect if the extensions require additional memory. If yes, fail
     the update.
  3. Initialize the extensions via TDH.EXT.INIT.

The memory allocation problem is greatly mitigated since Linux applies
a policy that configures the same add-on features for boot and for
update. This policy minimizes the chance of increased memory demand. So
now the restriction only affects the compatibility rule for choosing the
update image.

Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.c                 | 31 ++++++++++++++++++++-
 arch/x86/virt/vmx/tdx/tdx_global_metadata.c |  2 +-
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 900928de373a..4d2940f4538a 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1168,7 +1168,7 @@ static __init int init_tdmrs(struct tdmr_info_list *tdmr_list)
 }
 
 /* Initialize TDX module extensions for extension SEAMCALLs */
-static __init int tdx_ext_init(void)
+static int tdx_ext_init(void)
 {
 	struct tdx_module_args args = {};
 	u64 r;
@@ -1308,6 +1308,31 @@ static __init int init_tdx_module_extensions(void)
 	return tdx_ext_init();
 }
 
+/*
+ * Mostly the same flow as init_tdx_module_extensions(), but rejects adding
+ * more memory.
+ */
+static int update_tdx_module_extensions(void)
+{
+	struct tdx_sys_info_ext sysinfo_ext;
+	int ret;
+
+	if (!(tdx_sysinfo.features.tdx_features0 & TDX_FEATURES0_EXT))
+		return 0;
+
+	ret = get_tdx_sys_info_ext(&sysinfo_ext);
+	if (ret)
+		return ret;
+
+	if (!sysinfo_ext.ext_required)
+		return 0;
+
+	if (sysinfo_ext.memory_pool_required_pages)
+		return -EFAULT;
+
+	return tdx_ext_init();
+}
+
 static __init int init_tdx_module(void)
 {
 	int ret;
@@ -1498,6 +1523,10 @@ int tdx_module_run_update(void)
 	 */
 	WARN_ON_ONCE(ret);
 
+	ret = update_tdx_module_extensions();
+	if (ret)
+		return ret;
+
 	tdx_module_state.initialized = true;
 	return 0;
 }
diff --git a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
index 720cdaf76492..84364da89649 100644
--- a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
+++ b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
@@ -132,7 +132,7 @@ static __init int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
 	return ret;
 }
 
-static __init int get_tdx_sys_info_ext(struct tdx_sys_info_ext *sysinfo_ext)
+static int get_tdx_sys_info_ext(struct tdx_sys_info_ext *sysinfo_ext)
 {
 	int ret;
 	u64 val;
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 05/17] x86/virt/tdx: Make TDX module initialize the extensions
From: Xu Yilun @ 2026-06-18  8:13 UTC (permalink / raw)
  To: x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, yilun.xu, xiaoyao.li,
	sohil.mehta, adrian.hunter, kishen.maloor, tony.lindgren,
	peter.fang, baolu.lu, zhenzhong.duan, dave.hansen, dave.hansen,
	seanjc
In-Reply-To: <20260618081355.3253581-1-yilun.xu@linux.intel.com>

After providing all required memory to TDX module, initialize TDX
module extensions via TDH.EXT.INIT, so extension SEAMCALLs can be used.

Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Tony Lindgren <tony.lindgren@linux.intel.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.h |  1 +
 arch/x86/virt/vmx/tdx/tdx.c | 22 +++++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index a100634087e7..2deb0a5c902e 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -63,6 +63,7 @@
 #define TDH_SYS_SHUTDOWN		52
 #define TDH_SYS_UPDATE_V0		53
 #define TDH_SYS_UPDATE			SEAMCALL_LEAF_VER(TDH_SYS_UPDATE_V0, 1)
+#define TDH_EXT_INIT			60
 #define TDH_EXT_MEM_ADD			61
 #define TDH_SYS_DISABLE			69
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index dab17822c1c6..900928de373a 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1167,6 +1167,22 @@ static __init int init_tdmrs(struct tdmr_info_list *tdmr_list)
 	return 0;
 }
 
+/* Initialize TDX module extensions for extension SEAMCALLs */
+static __init int tdx_ext_init(void)
+{
+	struct tdx_module_args args = {};
+	u64 r;
+
+	do {
+		r = seamcall(TDH_EXT_INIT, &args);
+	} while (r == TDX_INTERRUPTED_RESUMABLE);
+
+	if (r != TDX_SUCCESS)
+		return -EFAULT;
+
+	return 0;
+}
+
 #define HPA_LIST_INFO_FIRST_ENTRY	GENMASK_U64(11, 3)
 #define HPA_LIST_INFO_PFN		GENMASK_U64(51, 12)
 #define HPA_LIST_INFO_LAST_ENTRY	GENMASK_U64(63, 55)
@@ -1285,7 +1301,11 @@ static __init int init_tdx_module_extensions(void)
 	if (!sysinfo_ext.ext_required)
 		return 0;
 
-	return tdx_ext_mem_setup(sysinfo_ext.memory_pool_required_pages);
+	ret = tdx_ext_mem_setup(sysinfo_ext.memory_pool_required_pages);
+	if (ret)
+		return ret;
+
+	return tdx_ext_init();
 }
 
 static __init int init_tdx_module(void)
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 04/17] x86/virt/tdx: Add extra memory to TDX module for the extensions
From: Xu Yilun @ 2026-06-18  8:13 UTC (permalink / raw)
  To: x86, kvm, linux-coco, linux-kernel
  Cc: djbw, kas, rick.p.edgecombe, yilun.xu, yilun.xu, xiaoyao.li,
	sohil.mehta, adrian.hunter, kishen.maloor, tony.lindgren,
	peter.fang, baolu.lu, zhenzhong.duan, dave.hansen, dave.hansen,
	seanjc
In-Reply-To: <20260618081355.3253581-1-yilun.xu@linux.intel.com>

TDX module extensions receive a one-time memory allocation at
initialization time. The extensions use this memory as the baseline for
their internal states and data required by the service APIs they offer.

Add a new memory feeding process backed by a new SEAMCALL
TDH.EXT.MEM.ADD. The process is mostly the same as adding PAMT. The
kernel queries TDX module how much memory needed by reading the
memory_pool_required_pages, allocates it, hands it over to the module,
and never gets it back.

TDH.EXT.MEM.ADD uses a new parameter type, HPA_LIST_INFO, to provide
this memory. This type represents a list of pages for TDX module to
access. It references an 'hpa_list page' which contains the list of
target HPAs. It collapses the HPA of the hpa_list page and the number
of valid target HPAs into a 64 bit raw value for SEAMCALL parameters.
The hpa_list page is always a medium, TDX module never keeps the
hpa_list page.

Don't CLFLUSH the pages handed to the TDX module, as is done for some
other SEAMCALLs. The flushing operation is not expected to be needed for
current and known future architectures. As more and more page feeding
interfaces to come, the conservative flushing operation becomes a
maintenance burden.

For now, TDX module extensions consume tens of megabytes memory that
will never be returned to host. Use contiguous page allocation to
isolate these large blocks entirely, avoiding permanent memory
fragmentation and reducing buddy allocator efficiency. Print the
allocation amount on TDX module extensions initialization for
visibility.

Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 arch/x86/include/asm/tdx_global_metadata.h  |   1 +
 arch/x86/virt/vmx/tdx/tdx.h                 |   1 +
 arch/x86/virt/vmx/tdx/tdx.c                 | 107 +++++++++++++++++++-
 arch/x86/virt/vmx/tdx/tdx_global_metadata.c |   6 ++
 4 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/tdx_global_metadata.h b/arch/x86/include/asm/tdx_global_metadata.h
index 83fc657a438e..b3442b7c88bb 100644
--- a/arch/x86/include/asm/tdx_global_metadata.h
+++ b/arch/x86/include/asm/tdx_global_metadata.h
@@ -53,6 +53,7 @@ struct tdx_sys_info {
 };
 
 struct tdx_sys_info_ext {
+	u32 memory_pool_required_pages;
 	bool ext_required;
 };
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index a47e872480c7..a100634087e7 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -63,6 +63,7 @@
 #define TDH_SYS_SHUTDOWN		52
 #define TDH_SYS_UPDATE_V0		53
 #define TDH_SYS_UPDATE			SEAMCALL_LEAF_VER(TDH_SYS_UPDATE_V0, 1)
+#define TDH_EXT_MEM_ADD			61
 #define TDH_SYS_DISABLE			69
 
 /* TDX page types */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 6f3596f11d25..dab17822c1c6 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -31,6 +31,7 @@
 #include <linux/syscore_ops.h>
 #include <linux/idr.h>
 #include <linux/kvm_types.h>
+#include <linux/bitfield.h>
 #include <asm/page.h>
 #include <asm/special_insns.h>
 #include <asm/msr-index.h>
@@ -1166,6 +1167,108 @@ static __init int init_tdmrs(struct tdmr_info_list *tdmr_list)
 	return 0;
 }
 
+#define HPA_LIST_INFO_FIRST_ENTRY	GENMASK_U64(11, 3)
+#define HPA_LIST_INFO_PFN		GENMASK_U64(51, 12)
+#define HPA_LIST_INFO_LAST_ENTRY	GENMASK_U64(63, 55)
+
+static __init u64 to_hpa_list_info(struct page *hpa_list_page,
+				   unsigned int nr_pages)
+{
+	return FIELD_PREP(HPA_LIST_INFO_FIRST_ENTRY, 0) |
+	       FIELD_PREP(HPA_LIST_INFO_PFN, page_to_pfn(hpa_list_page)) |
+	       FIELD_PREP(HPA_LIST_INFO_LAST_ENTRY, nr_pages - 1);
+}
+
+static __init int tdx_ext_mem_add(struct page *hpa_list_page,
+				  unsigned int nr_pages)
+{
+	struct tdx_module_args args = {
+		.rcx = to_hpa_list_info(hpa_list_page, nr_pages),
+	};
+	u64 r;
+
+	do {
+		/*
+		 * TDH_EXT_MEM_ADD is designed to use output parameter RCX to
+		 * override/update input parameter RCX, so the caller doesn't
+		 * have to do manual parameter update on retry call.
+		 */
+		r = seamcall_ret(TDH_EXT_MEM_ADD, &args);
+	} while (r == TDX_INTERRUPTED_RESUMABLE);
+
+	if (r != TDX_SUCCESS)
+		return -EFAULT;
+
+	return 0;
+}
+
+struct tdx_hpa_list {
+	u64 phys[PAGE_SIZE / sizeof(u64)];
+};
+
+static_assert(sizeof(struct tdx_hpa_list) == PAGE_SIZE);
+
+static __init int tdx_ext_mem_setup(unsigned int required_pages)
+{
+	struct tdx_hpa_list *hpa_list;
+	struct page *page;
+	unsigned int i;
+	int ret;
+
+	/*
+	 * memory_pool_required_pages == 0 means no need to add pages,
+	 * skip the memory setup.
+	 */
+	if (!required_pages)
+		return 0;
+
+	hpa_list = kzalloc_obj(*hpa_list);
+	if (!hpa_list)
+		return -ENOMEM;
+
+	page = alloc_contig_pages(required_pages, GFP_KERNEL, numa_mem_id(),
+				  &node_online_map);
+	if (!page) {
+		ret = -ENOMEM;
+		goto out_free_hpa_list;
+	}
+
+	i = 0;
+	while (i < required_pages) {
+		unsigned int nents = min(required_pages - i,
+					 ARRAY_SIZE(hpa_list->phys));
+		unsigned int j;
+
+		for (j = 0; j < nents; j++)
+			hpa_list->phys[j] = page_to_phys(page + i + j);
+
+		ret = tdx_ext_mem_add(virt_to_page(hpa_list), nents);
+		/*
+		 * No SEAMCALLs to reclaim the added pages. For simple error
+		 * handling, leak all pages.
+		 */
+		WARN(ret, "Fatal: TDX module rejected (%d) memory for extensions, stranded all pages\n",
+		     ret);
+		if (ret)
+			break;
+
+		i += nents;
+	}
+
+	/*
+	 * Memory for extensions can't be reclaimed once added, print out the
+	 * amount, stop tracking it and free the hpa_list page, no matter
+	 * success or failure.
+	 */
+	pr_info("%lu KB consumed for TDX module extensions\n",
+		required_pages * PAGE_SIZE / 1024);
+
+out_free_hpa_list:
+	kfree(hpa_list);
+
+	return ret;
+}
+
 static __init int init_tdx_module_extensions(void)
 {
 	struct tdx_sys_info_ext sysinfo_ext;
@@ -1182,9 +1285,7 @@ static __init int init_tdx_module_extensions(void)
 	if (!sysinfo_ext.ext_required)
 		return 0;
 
-	/* TODO: add the extensions enabling steps here */
-
-	return 0;
+	return tdx_ext_mem_setup(sysinfo_ext.memory_pool_required_pages);
 }
 
 static __init int init_tdx_module(void)
diff --git a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
index b9e1c011a990..720cdaf76492 100644
--- a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
+++ b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
@@ -137,6 +137,12 @@ static __init int get_tdx_sys_info_ext(struct tdx_sys_info_ext *sysinfo_ext)
 	int ret;
 	u64 val;
 
+	ret = read_sys_metadata_field(0x3100000200000000, &val);
+	if (ret)
+		return ret;
+
+	sysinfo_ext->memory_pool_required_pages = val;
+
 	ret = read_sys_metadata_field(0x3100000000000001, &val);
 	if (ret)
 		return ret;
-- 
2.25.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox