Linux Confidential Computing Development
* Re: [PATCH 01/15] x86/virt/tdx: Read global metadata for TDX Module Extensions
From: Xu Yilun @ 2026-05-29 16:59 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: Fang, Peter, kas@kernel.org, djbw@kernel.org, x86@kernel.org,
	Xu, Yilun, Duan, Zhenzhong, baolu.lu@linux.intel.com, Li, Xiaoyao,
	linux-kernel@vger.kernel.org, Mehta, Sohil, kvm@vger.kernel.org,
	linux-coco@lists.linux.dev
In-Reply-To: <fd3f9e1f70babe97f98852f2a705341b86ed1132.camel@intel.com>

On Thu, May 28, 2026 at 09:00:12PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2026-05-22 at 11:41 +0800, Xu Yilun wrote:
> > +struct tdx_sys_info_ext {
> > +	u16 memory_pool_required_pages;
> 
> > +	u8 ext_required;
> 
> The docs say this is a bool.

Hmm, OK.  We don't have to follow the auto-generated format now, so bool
is fine with me.

> 
> > +};
> > +
> 


* Re: [PATCH 04/15] x86/virt/tdx: Enable the Extensions right after basic TDX Module init
From: Xu Yilun @ 2026-05-29 17:19 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: Fang, Peter, kas@kernel.org, djbw@kernel.org, x86@kernel.org,
	Xu, Yilun, Duan, Zhenzhong, baolu.lu@linux.intel.com, Li, Xiaoyao,
	linux-kernel@vger.kernel.org, Mehta, Sohil, kvm@vger.kernel.org,
	linux-coco@lists.linux.dev
In-Reply-To: <280fdea480922ad843e738b14f0b32cd977734a3.camel@intel.com>

On Thu, May 28, 2026 at 09:32:08PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2026-05-22 at 11:41 +0800, Xu Yilun wrote:
> > The detailed initialization flow for TDX Module Extensions has been
> > fully implemented.
> > 
> 
> I'm not sure what this means exactly. Why "detailed". Is that important?

It's not important. I should rephrase it to "The entire initialization flow..."

> 
> >  Enable the flow after basic TDX Module
> > initialization.
> > 
> > Theoretically, the Extensions doesn't need to be enabled right after
> > basic TDX initialization. It could be enabled right before the first
> > Extension SEAMCALL is issued. That would save or postpone memory usage.
> > But it isn't worth the complexity, the needs for the Extensions are vast
> > but the savings are little for a typical TDX capable system (about
> > 0.001% of memory). So the Linux decision is to just enable it along with
> > the basic TDX.
> 
> The Linux decision is whatever this patch turns out to be after community
> review. So for the patch log we just need to justify why it's a good idea, not
> make an argument to defer to authority.

Understood. I'll re-phrase this paragraph according to all the comments,
especially the last sentence.

> 
> > 
> > Note that the Extensions initialization flow will still not start if no
> > add-on features require Extensions. The enabling of add-on features will
> > be in later patches. Until then, the system hasn't consumed extra memory.
> 
> Hmm, this patch reads like we are finally doing the initialization up until this
> point. Then it turns out we don't actually light up the new code yet... 
> 
> A lot of this diff is adding __init to the function added in the earlier
> patches. Do we need to do this? Why not add them as __init in the original
> patches?
> 
> 
> I think we maybe want to say instead that we are setting up to enable extensions
> at TDX module init time, and do the explanation of why. Then without the __init
> stuff, the patch is just about the init time decision. Which seems about right
> sized.

Yes. Since the patch doesn't actually light up anything new, I think it
could just be the first patch of the Extensions series, so that __init is
added in the first place.


* Re: [PATCH v4 0/2] Extend KVM_HC_MAP_GPA_RANGE api to allow retry
From: Sean Christopherson @ 2026-05-29 22:47 UTC (permalink / raw)
  To: Sean Christopherson, Vishal Annapurve, Paolo Bonzini, Dave Hansen,
	Kiryl Shutsemau, Rick Edgecombe, Sagi Shahar
  Cc: Thomas Gleixner, Borislav Petkov, H. Peter Anvin, Michael Roth,
	Tom Lendacky, x86, kvm, linux-kernel, linux-coco
In-Reply-To: <20260305222627.4193305-1-sagis@google.com>

On Thu, 05 Mar 2026 22:26:25 +0000, Sagi Shahar wrote:
> In some cases, userspace might decide to split MAP_GPA requests and
> retry them the next time the guest runs. One common case is MAP_GPA
> requests received right before intrahost migration when userspace
> might decide to complete the request after the migration is complete
> to reduce blackout time.
> 
> This is v4 of the series.
> 
> [...]

Applied to kvm-x86 misc, thanks!

[1/2] KVM: TDX: Allow userspace to return errors to guest for MAPGPA
      https://github.com/kvm-x86/linux/commit/3e2dec1ede0a
[2/2] KVM: SEV: Restrict userspace return codes for KVM_HC_MAP_GPA_RANGE
      https://github.com/kvm-x86/linux/commit/5d40e5b49442

--
https://github.com/kvm-x86/linux/tree/next


* Re: [PATCH v4 01/47] x86/tsc: Never re-calibrate TSC frequency if its exact timing is known
From: Borislav Petkov @ 2026-05-30  3:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Dave Hansen, x86,
	Kiryl Shutsemau, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov, Jan Kiszka,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	John Stultz, H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
	virtualization, xen-devel, David Woodhouse, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, Michael Kelley,
	Thomas Gleixner
In-Reply-To: <20260529144435.704127-2-seanjc@google.com>

On Fri, May 29, 2026 at 07:43:48AM -0700, Sean Christopherson wrote:
> Don't re-calibrate the TSC frequency if the TSC is known to run at a fixed
> frequency.  In practice, this is likely one big nop, as re-calibration is
> used only for SMP=n kernels, and only for hardware that is 20+ years old,
> i.e. is extremely unlikely to collide with TSC_KNOWN_FREQ.

Why do we care?

So what if it recalibrates once on UP?

Look where it is called - all old rust which no one uses anymore.

> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kernel/tsc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index c5110eb554bc..08cf6625d484 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -946,7 +946,8 @@ void recalibrate_cpu_khz(void)
>  		return;
>  
>  	cpu_khz = x86_platform.calibrate_cpu();
> -	tsc_khz = x86_platform.calibrate_tsc();
> +	if (!boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ))

cpu_feature_enabled() everywhere please.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v4 15/47] KVM: x86: Officially define CPUID 0x40000010 as PV Timing Info (TSC and Bus)
From: Christian Ludloff @ 2026-05-30 16:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Kiryl Shutsemau, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz, H. Peter Anvin,
	Rick Edgecombe, Vitaly Kuznetsov, Boris Ostrovsky, Stephen Boyd,
	kvm, linux-kernel, linux-coco, linux-hyperv, virtualization,
	xen-devel, David Woodhouse, Tom Lendacky, Nikunj A Dadhania,
	David Woodhouse, Michael Kelley, Thomas Gleixner,
	bcm-kernel-feedback-list

> + *  # EAX: (Virtual) TSC frequency in kHz.
> + *  # EBX: (Virtual) Bus (local APIC timer) frequency in kHz.
> + *  # ECX, EDX: Reserved (must be zero).

Can someone from Broadcom please speak up as to
what a non-zero ECX value signifies for their HV? (Asking
because I see a value of 2, not a must-be-zero.)

--
C.
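
For reference, the register layout quoted above can be sketched as a small decoder. This is only an illustration under stated assumptions: struct pv_timing_info and pv_timing_decode() are hypothetical names, not part of the patch, and the caller is assumed to have already read the four registers of leaf 0x40000010.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the fields described in the quoted comment. */
struct pv_timing_info {
	uint32_t tsc_khz;        /* EAX: (virtual) TSC frequency in kHz */
	uint32_t apic_bus_khz;   /* EBX: (virtual) bus/local APIC timer frequency in kHz */
	bool     reserved_clear; /* true only if ECX and EDX are both zero */
};

/*
 * Decode the raw register values of CPUID leaf 0x40000010.
 * The reserved check follows the "must be zero" wording of the quoted
 * comment; it does not interpret a non-zero ECX or EDX in any way.
 */
static struct pv_timing_info pv_timing_decode(uint32_t eax, uint32_t ebx,
					      uint32_t ecx, uint32_t edx)
{
	struct pv_timing_info info = {
		.tsc_khz        = eax,
		.apic_bus_khz   = ebx,
		.reserved_clear = (ecx == 0 && edx == 0),
	};

	return info;
}
```

With such a decoder, the ECX value of 2 reported above would show up as reserved_clear == false while leaving the two frequency fields usable.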


* Re: [PATCH 00/15] Enable TDX Module Extensions and DICE-based TDX Quoting
From: Xu Yilun @ 2026-06-01  9:36 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, yilun.xu, baolu.lu, zhenzhong.duan, xiaoyao.li
In-Reply-To: <7fdc27cc-22a8-4442-9c9b-4bace9ee0d23@intel.com>

On Thu, May 28, 2026 at 12:50:34PM -0700, Sohil Mehta wrote:
> On 5/27/2026 9:52 PM, Xu Yilun wrote:
> 
> > No, the memory needed varies depending on the feature or the number of
> > features. But currently I see the total requirement is ~50MB.
> > 
> This is important consideration when defining the default policy. Could
> you please elaborate on how this will scale in the future?
> 
> How are the memory requirements expected to grow with additional features?

I queried the TDX module team, and the answer is that they grow almost
linearly. I re-measured the only feature I have on hand, PCIe Link
encryption (SPDM), and the precise memory consumption is now 35M.

In the foreseeable future, the features are SPDM, DICE & TD Migration,
so they will cost ~105M at most. I think the number still works with the
default policy.

> 
> Let's say a future platform has a lot more features and needs
> significantly more memory. Wouldn't loading a legacy kernel with this
> default policy lead to excessive wastage?

A legacy kernel won't consume Extensions memory. The Extensions memory
is only required by the TDX module when add-on features are explicitly
configured via TDH.SYS.CONFIG [1]. For a legacy kernel, no add-on features
are configured, so there is no memory consumption.

But yes, if the features grow rapidly beyond expectation, we may need new
options to switch something off. I think we can discuss that later, when
the need actually arises.

[1]: https://lore.kernel.org/all/20260522034128.3144354-16-yilun.xu@linux.intel.com/

> 
> Maybe I am missing something obvious. The struct in patch 1,
> memory_pool_required_pages is u16. So, will the Extensions support never
> require more than 256MB?

Good catch. The TDX module team admitted this is an issue. They want to
increase the size to 4 bytes in the future.
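
The 256MB ceiling in the question follows from the field width. A minimal sketch of the arithmetic, assuming the pool is counted in 4 KiB pages (the thread does not state the unit explicitly) and using hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: memory_pool_required_pages counts 4 KiB pages. */
#define EXT_PAGE_SIZE 4096ULL

/*
 * Largest pool size, in bytes, that a page-count field of the given
 * width (in bits) can describe.
 */
static uint64_t ext_pool_max_bytes(unsigned int field_bits)
{
	assert(field_bits >= 1 && field_bits <= 32);

	uint64_t max_pages = (1ULL << field_bits) - 1;

	return max_pages * EXT_PAGE_SIZE;
}

/* Round a byte count up to whole MiB. */
static uint64_t bytes_to_mib_round_up(uint64_t bytes)
{
	return (bytes + (1ULL << 20) - 1) >> 20;
}
```

Under that assumption a u16 field tops out just under 256 MiB (65535 pages), which is above the ~105M estimate, while a 4-byte field would allow just under 16 TiB.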


* Re: [PATCH v5 4/7] x86/sev: Add support to perform RMP optimizations asynchronously
From: Kalra, Ashish @ 2026-06-01 18:03 UTC (permalink / raw)
  To: Ackerley Tng, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	peterz, thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, jackyli, pgonda, rientjes, jacobhxu, xin,
	pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen, darwi,
	linux-kernel, linux-crypto, kvm, linux-coco
In-Reply-To: <8b7f6c93-ad5a-45e1-aa70-945518d29ddc@amd.com>


On 5/28/2026 6:52 PM, Kalra, Ashish wrote:
> Hello Ackerley,

>>> +	/*
>>> +	 * RMPOPT scans the RMP table, stores the result of the scan in the
>>> +	 * reserved processor memory. The RMP scan is the most expensive
>>> +	 * part. If a second RMPOPT occurs, it can skip the expensive scan
>>> +	 * if they can see a cached result in the reserved processor memory.
>>> +	 *
>>> +	 * Do RMPOPT on one CPU alone. Then, follow that up with RMPOPT
>>> +	 * on every other primary thread. This potentially allows the
>>
>> I like the leader and follower comments below, thanks! With this
>> leader/follower setup, will the followers definitely see the cached scan
>> results, or might the followers still potentially not benefit from the
>> caching? If it's still only "potentially", why?
> 
> I am verifying with the H/W architects if this is always going to be true or not,
> will the followers always benefit from the scan results cached by the leader (first CPU)
> or there is a possibility that the followers cannot see/access/get the cached results
> and instead do full RMP scanning ?
> 

Following up on this, I have checked with the H/W architects, and the feedback is that
the followers are "designed to" skip the scan if they see a cached result.

Thanks,
Ashish


* Re: [PATCH 00/15] Enable TDX Module Extensions and DICE-based TDX Quoting
From: Sohil Mehta @ 2026-06-01 20:17 UTC (permalink / raw)
  To: Xu Yilun
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, yilun.xu, baolu.lu, zhenzhong.duan, xiaoyao.li
In-Reply-To: <ah1SnuEHuFeX873m@yilunxu-OptiPlex-7050>


>>
>> Let's say a future platform has a lot more features and needs
>> significantly more memory. Wouldn't loading a legacy kernel with this
>> default policy lead to excessive wastage?
> 
> A legacy kernel won't consume Extensions memory. The Extensions memory
> is only required by the TDX module when add-on features are explicitly
> configured via TDH.SYS.CONFIG [1].

So, the TDX module will only report memory_pool_required_pages for
add-on features that have been configured by the kernel? This would be
good to clarify in the cover letter.

> For a legacy kernel, no add-on features are configured, so there is no
> memory consumption.
> 

I was referring to the first kernel that has support for one TDX
extension. I am mainly trying to ensure that a kernel with support for
one TDX extension only consumes memory for that feature (even when it is
loaded on a hardware platform that supports multiple TDX extensions).

> But yes, if the features grow rapidly beyond expectation, we may need new
> options to switch something off. I think we can discuss that later, when
> the need actually arises.
> 



* Re: [PATCH v4 1/47] x86/tsc: Never re-calibrate TSC frequency if its exact timing is known
From: David Woodhouse @ 2026-06-01 21:46 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529144435.704127-2-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 473 bytes --]

On Fri, 29 May 2026 07:43:48 -0700, Sean Christopherson wrote:
> Don't re-calibrate the TSC frequency if the TSC is known to run at a fixed
> frequency.  In practice, this is likely one big nop, as re-calibration is
> used only for SMP=n kernels, and only for hardware that is 20+ years old,
> i.e. is extremely unlikely to collide with TSC_KNOWN_FREQ.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v4 8/47] x86/tsc: Add dedicated hypervisor hooks for getting known TSC/CPU frequencies
From: David Woodhouse @ 2026-06-01 21:49 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529144435.704127-9-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 740 bytes --]

On Fri, 29 May 2026 07:43:55 -0700, Sean Christopherson wrote:
> Add dedicated hypervisor hooks for getting known TSC/CPU frequencies
> instead of overriding seemingly generic platform hooks, and explicitly
> prioritize hypervisor-provided frequencies over native methods, but do NOT
> clobber the frequency obtained from trusted firmware.  While shuffling the
> hooks around is arguably "six of one, half dozen of the other", scoping
> them to x86_hyper_init makes their purpose more obvious, and allows for
> explicitly defining the priority of sources (as is done here).
>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v4 11/47] x86/tsc: Kill off x86_platform_ops.calibrate_{cpu,tsc}() hooks
From: David Woodhouse @ 2026-06-01 21:51 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529144435.704127-12-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 634 bytes --]

On Fri, 29 May 2026 07:43:58 -0700, Sean Christopherson wrote:
> Now that getting the CPU and/or TSC frequencies from the hypervisor uses
> dedicated hooks, drop x86_platform_ops.calibrate_{cpu,tsc}() and instead
> directly invoke the correct helper at each phase of (re)calibration.  In
> addition to eliminating unnecessary code, this makes it a bit more obvious
> when the "late" path invokes pit_hpet_ptimer_calibrate_cpu() instead of
> x86_platform_ops.calibrate_cpu().
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v4 13/47] x86/tsc: Fold native_calibrate_cpu() into recalibrate_cpu_khz()
From: David Woodhouse @ 2026-06-01 21:52 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529144435.704127-14-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 503 bytes --]

On Fri, 29 May 2026 07:44:00 -0700, Sean Christopherson wrote:
> Fold the guts of native_calibrate_cpu() into its sole remaining caller,
> recalibrate_cpu_khz() to eliminate the extra SMP=n #ifdef, and so that it's
> more obvious that directly invoking the early vs. late calibration routines
> in determine_cpu_tsc_frequencies() is intentional.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v4 12/47] x86/tsc: Rename pit_hpet_ptimer_calibrate_cpu() => native_calibrate_cpu_late()
From: David Woodhouse @ 2026-06-01 21:52 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529144435.704127-13-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 339 bytes --]

On Fri, 29 May 2026 07:43:59 -0700, Sean Christopherson wrote:
> Rename the late CPU calibration routine so that its relationship to the
> early routine is more obvious and intuitive.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v4 14/47] x86/kvmclock: Rename kvm_get_tsc_khz() to kvmclock_get_tsc_khz()
From: David Woodhouse @ 2026-06-01 21:53 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529144435.704127-15-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 400 bytes --]

On Fri, 29 May 2026 07:44:01 -0700, Sean Christopherson wrote:
> Rename kvm_get_tsc_khz() to kvmclock_get_tsc_khz() in anticipation of
> adding support for getting TSC info from PV CPUID, i.e. in a KVM specific
> way, but without non-kvmclock.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v4 17/47] x86/kvm: Mark TSC as reliable when it's constant and nonstop
From: David Woodhouse @ 2026-06-01 22:02 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx, sashiko-reviews
In-Reply-To: <ahnhnjvfIblFxTFX@google.com>

[-- Attachment #1: Type: text/plain, Size: 2028 bytes --]

On Fri, 29 May 2026 11:57:34 -0700, Sean Christopherson wrote:
> On Fri, May 29, 2026, sashiko-bot@kernel.org wrote:
> > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > index 909d3e5e5bcd5..4fe9c69bf40b3 100644
> > > --- a/arch/x86/kernel/kvm.c
> > > +++ b/arch/x86/kernel/kvm.c
> > [ ... ]
> > > @@ -1040,7 +1041,20 @@ static void __init kvm_init_platform(void)
> > [ ... ]
> > > -	kvmclock_init();
> > > +        /*
> > > +         * If the TSC counts at a constant frequency across P/T states, counts
> > > +         * in deep C-states, and the TSC hasn't been marked unstable, treat the
> > > +         * TSC reliable, as guaranteed by KVM.  Note, the TSC unstable check
> > > +         * exists purely to honor the TSC being marked unstable via command
> > > +         * line, any runtime detection of an unstable will happen after this.
> > > +         */
> > > +	tsc_is_reliable = boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> > > +			  boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> > > +			  !check_tsc_unstable();
> > 
> > [Severity: High]
> > Does this evaluate check_tsc_unstable() too early to catch the command line
> > parameter?
> 
> Huh, it does indeed.
> 
> > It looks like kvm_init_platform() is called from setup_arch(), but the
> > tsc=unstable kernel parameter is parsed via __setup() later during
> > parse_args() in start_kernel().
> > 
> > If check_tsc_unstable() evaluates to 0 here because the parameter hasn't
> > been parsed yet, wouldn't it incorrectly force X86_FEATURE_TSC_RELIABLE
> > and set prefer_tsc to true?
> 
> Yep, but this is a pre-existing problem that goes all the way back to the original
> commit 7539b174aef4 ("x86: kvmguest: use TSC clocksource if invariant TSC is exposed").
> 
> We could try to fix that, but I'm _very_ strongly inclined to add (yet another)
> patch to simply drop the check_tsc_unstable() since it has always been dead code.

Yeah, kill it with fire.

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
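
The ordering problem discussed above can be modeled in a few lines of userspace C. This is a toy model only: the model_* functions are hypothetical stand-ins for the boot steps named in the thread (kvm_init_platform() running from setup_arch(), and the tsc=unstable handler running later from parse_args()), not kernel code.

```c
#include <stdbool.h>

/* Toy state standing in for the kernel's "TSC marked unstable" flag. */
static bool tsc_unstable;
static bool tsc_is_reliable;

static bool check_tsc_unstable(void)
{
	return tsc_unstable;
}

/* Models kvm_init_platform(), which runs early, from setup_arch(). */
static void model_kvm_init_platform(bool constant_tsc, bool nonstop_tsc)
{
	tsc_is_reliable = constant_tsc && nonstop_tsc && !check_tsc_unstable();
}

/* Models the tsc=unstable __setup() handler, which runs later. */
static void model_parse_tsc_unstable(void)
{
	tsc_unstable = true;
}

/* Runs the two steps in the order described in the thread. */
static bool model_boot(bool cmdline_tsc_unstable)
{
	tsc_unstable = false;
	tsc_is_reliable = false;

	model_kvm_init_platform(true, true);   /* the check happens here ... */
	if (cmdline_tsc_unstable)
		model_parse_tsc_unstable();    /* ... the flag is set only here */

	return tsc_is_reliable;
}
```

In this model the result is the same with or without the command line parameter, which is the sense in which the check is described as dead code.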



* Re: [PATCH v4 18/47] x86/kvm: Get local APIC bus frequency from PV CPUID Timing Info
From: David Woodhouse @ 2026-06-01 22:06 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx, sashiko-reviews
In-Reply-To: <ahnZ8tuHFVFTeofW@google.com>

[-- Attachment #1: Type: text/plain, Size: 1734 bytes --]

On Fri, 29 May 2026 11:24:50 -0700, Sean Christopherson wrote:
> On Fri, May 29, 2026, sashiko-bot@kernel.org wrote:
> > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > index 4fe9c69bf40b3..c1139182121d5 100644
> > > --- a/arch/x86/kernel/kvm.c
> > > +++ b/arch/x86/kernel/kvm.c
> > 
> > [ ... ]
> > 
> > > @@ -1039,6 +1040,13 @@ static void __init kvm_init_platform(void)
> > >                     x86_init.hyper.get_tsc_khz = kvm_get_tsc_khz;
> > >                     x86_init.hyper.get_cpu_khz = kvm_get_tsc_khz;
> > >             }
> > > +
> > > +#ifdef CONFIG_X86_LOCAL_APIC
> > > +           /* The leaf also includes the local APIC bus/timer frequency.*/
> > > +           apic_khz = cpuid_ebx(timing_info_leaf);
> > > +           if (apic_khz)
> > > +                  lapic_timer_period = apic_khz * 1000 / HZ;
> > 
> > [Severity: High]
> > Could the intermediate multiplication of apic_khz * 1000 overflow here?
> > 
> > Since apic_khz is a u32, the multiplication is evaluated in 32-bit unsigned
> > arithmetic. If the hypervisor reports an APIC timer frequency greater than
> > ~4.29 GHz (apic_khz > 4294967), this calculation will silently overflow.
> 
> Hmm, easy enough to use mul_u64_u32_div() (I think that's the right helper for
> this?).

Yep.

> But this problem pre-exists in almost every other path that sets lapic_timer_period.
> So while I tried to avoid doing yet more tangentially related cleanup, it seems
> like adding a helper to set lapic_timer_period is the way to go.  That would also
> allow making lapic_timer_period local to arch/x86/kernel/apic/apic.c.
> 
> *sigh*

Yay, more patches!

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
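
The overflow flagged above can be reproduced in userspace. The sketch below is an illustration, not the eventual fix: both function names are hypothetical, and the 64-bit variant is only a plain-C stand-in for what a helper such as mul_u64_u32_div() would provide in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* The quoted expression, evaluated entirely in 32-bit unsigned arithmetic. */
static uint32_t lapic_period_32bit(uint32_t apic_khz, uint32_t hz)
{
	assert(hz != 0);

	/* apic_khz * 1000 wraps modulo 2^32 once apic_khz exceeds 4294967. */
	return apic_khz * 1000u / hz;
}

/* Widen to 64 bits before multiplying so the intermediate cannot wrap. */
static uint64_t lapic_period_64bit(uint32_t apic_khz, uint32_t hz)
{
	assert(hz != 0);

	return (uint64_t)apic_khz * 1000u / hz;
}
```

For example, with hz = 1000 the two variants agree at apic_khz = 4294967, but at apic_khz = 4294968 the 32-bit version wraps and yields 0 while the widened version yields 4294968.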



* Re: [PATCH v4 31/47] x86/vmware: NOP-ify save/restore hooks when using VMware's sched_clock
From: David Woodhouse @ 2026-06-01 22:09 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529150753.714296-1-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 628 bytes --]

On Fri, 29 May 2026 08:07:52 -0700, Sean Christopherson wrote:
> NOP-ify the sched_clock save/restore hooks when using VMware's version of
> sched_clock.  This will allow extending paravirt_set_sched_clock() to set
> the save/restore hooks, without having to simultaneously change the
> behavior of VMware guests.
>
> Note, it's not at all obvious that it's safe/correct for VMware guests to
> do nothing on suspend/resume, but that's a pre-existing problem.  Leave it
> for a VMware expert to sort out.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v4 30/47] x86/xen/time: NOP-ify x86_platform's sched_clock save/restore hooks
From: David Woodhouse @ 2026-06-01 22:09 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529150741.714145-1-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 403 bytes --]

On Fri, 29 May 2026 08:07:41 -0700, Sean Christopherson wrote:
> NOP-ify the x86_platform sched_clock save/restore hooks when setting up
> Xen's PV clock to make it somewhat obvious the hooks aren't used when
> running as a Xen guest (Xen uses a paravirtualized suspend/resume flow).
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v4 46/47] x86/kvmclock: Plumb in AP-online and BSP-resume to kvmlock, for documentation
From: David Woodhouse @ 2026-06-01 22:09 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529150833.715042-1-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 543 bytes --]

On Fri, 29 May 2026 08:08:33 -0700, Sean Christopherson wrote:
> Invoke kvmclock_cpu_action() with AP_ONLINE and BSP_RESUME, even though
> kvmclock doesn't need to do anything in either case, so that the asymmetry
> of kvmclock is a detail buried in kvmclock, and to explicitly document
> that doing nothing during those phases is intentional and correct.
>
> For all intents and purposes, no functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v7 09/42] KVM: guest_memfd: Add base support for KVM_SET_MEMORY_ATTRIBUTES2
From: Michael Roth @ 2026-06-01 23:14 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, tabba, willy,
	wyihan, yan.y.zhao, forkloop, pratyush, suzuki.poulose,
	aneesh.kumar, liam, Paolo Bonzini, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <20260522-gmem-inplace-conversion-v7-9-2f0fae496530@google.com>

On Fri, May 22, 2026 at 05:17:51PM -0700, Ackerley Tng via B4 Relay wrote:
> From: Ackerley Tng <ackerleytng@google.com>
> 
> Introduce base support for KVM_SET_MEMORY_ATTRIBUTES2 in guest_memfd, which
> just updates attributes tracked by guest_memfd.
> 
> Validate input fields in general. Guard usage of KVM_SET_MEMORY_ATTRIBUTES2
> by making sure requested attributes are supported for this instance of kvm.
> 
> A new KVM_SET_MEMORY_ATTRIBUTES2 is defined to support writes (unlike
> KVM_SET_MEMORY_ATTRIBUTES) in addition to reads so it can provide error
> details to userspace. This will be used in a later patch.
> 
> The two ioctls use their corresponding structs with no overlap, but
> backward compatibility is baked in for future support of
> KVM_SET_MEMORY_ATTRIBUTES2 and struct kvm_memory_attributes2 in the VM
> ioctl.
> 
> The process of setting memory attributes is set up such that the later half
> will not fail due to allocation. Any necessary checks are performed before
> the point of no return.
> 
> Co-developed-by: Vishal Annapurve <vannapurve@google.com>
> Signed-off-by: Vishal Annapurve <vannapurve@google.com>
> Co-developed-by: Sean Christoperson <seanjc@google.com>
> Signed-off-by: Sean Christoperson <seanjc@google.com>

Typo on the "person".

(Sent this earlier but looks like some of my emails never hit the
list so re-sending. Apologies if this is a dupe).

Thanks,

Mike

> Reviewed-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---

^ permalink raw reply

* Re: [PATCH v4 02/47] x86/tsc: Add a standalone helpers for getting TSC info from CPUID.0x15
From: Borislav Petkov @ 2026-06-02  3:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Dave Hansen, x86,
	Kiryl Shutsemau, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov, Jan Kiszka,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	John Stultz, H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
	virtualization, xen-devel, David Woodhouse, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, Michael Kelley,
	Thomas Gleixner
In-Reply-To: <20260529144435.704127-3-seanjc@google.com>

On Fri, May 29, 2026 at 07:43:49AM -0700, Sean Christopherson wrote:
> +static int cpuid_get_tsc_info(struct cpuid_tsc_info *info)
> +{
> +	unsigned int ecx_hz, edx;
> +
> +	memset(info, 0, sizeof(*info));

Let's not clear this unnecessarily...

> +
> +	if (boot_cpu_data.cpuid_level < CPUID_LEAF_TSC)
> +		return -ENOENT;

... just to return here...

> +
> +	/* CPUID 15H TSC/Crystal ratio, plus optionally Crystal Hz */
> +	cpuid(CPUID_LEAF_TSC, &info->denominator, &info->numerator, &ecx_hz, &edx);
> +
> +	if (!info->denominator || !info->numerator)
> +		return -ENOENT;

... or here.

We wanna clear it here, when we'll return success.

> +
> +	/*
> +	 * Note, some CPUs provide the multiplier information, but not the core

	Note: some CPUs...

> +	 * crystal frequency.  The multiplier information is still useful for
> +	 * such CPUs, as the crystal frequency can be gleaned from CPUID.0x16.
> +	 */
> +	info->crystal_khz = ecx_hz / 1000;
> +	return 0;
> +}
> +
> +int __init cpuid_get_tsc_freq(struct cpuid_tsc_info *info)
> +{
> +	if (cpuid_get_tsc_info(info) || !info->crystal_khz)
> +		return -ENOENT;
> +
> +	info->tsc_khz = info->crystal_khz * info->numerator / info->denominator;
> +	return 0;
> +}

Unused here. Add it with its first user pls.
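
One way to read the memset() comments above, as a user-space sketch rather
than the actual patch: read the CPUID output into locals, bail out early
without touching @info, and only clear and fill @info once success is
certain. The struct layouts, the cpuid15_regs stand-in and the
tsc_info_from_regs() name are assumptions made for illustration; only the
field names come from the quoted code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* User-space mirror of the fields the quoted patch touches (types assumed). */
struct cpuid_tsc_info {
	unsigned int denominator;
	unsigned int numerator;
	unsigned int crystal_khz;
	unsigned int tsc_khz;
};

/* Hypothetical stand-in for the raw CPUID.0x15 register output. */
struct cpuid15_regs {
	unsigned int eax;	/* ratio denominator */
	unsigned int ebx;	/* ratio numerator */
	unsigned int ecx;	/* crystal frequency in Hz, may be 0 */
};

#define SKETCH_LEAF_TSC 0x15

static int tsc_info_from_regs(unsigned int max_leaf,
			      const struct cpuid15_regs *regs,
			      struct cpuid_tsc_info *info)
{
	assert(regs && info);

	/* Both failure paths return before @info is written. */
	if (max_leaf < SKETCH_LEAF_TSC)
		return -ENOENT;

	if (!regs->eax || !regs->ebx)
		return -ENOENT;

	/* Clear only on the success path, then fill from the locals. */
	memset(info, 0, sizeof(*info));
	info->denominator = regs->eax;
	info->numerator = regs->ebx;
	info->crystal_khz = regs->ecx / 1000;
	return 0;
}
```

A side effect of this shape is that a caller's @info is left untouched on
-ENOENT, which may or may not be what the callers in the series expect.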

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [PATCH v5 5/5] iommufd/vdevice: add TSM request ioctl
From: Aneesh Kumar K.V @ 2026-06-02  5:10 UTC (permalink / raw)
  To: Dan Williams (nvidia), Dan Williams (nvidia),
	Alexey Kardashevskiy, linux-coco, iommu, linux-kernel, kvm
  Cc: Bjorn Helgaas, Dan Williams, Jason Gunthorpe, Joerg Roedel,
	Jonathan Cameron, Kevin Tian, Nicolin Chen, Samuel Ortiz,
	Steven Price, Suzuki K Poulose, Will Deacon, Xu Yilun,
	Shameer Kolothum, Paolo Bonzini, Tony Krowiak, Halil Pasic,
	Jason Herne, Harald Freudenberger, Holger Dengler, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Alex Williamson, Matthew Rosato, Farhan Ali,
	Eric Farman, linux-s390
In-Reply-To: <6a1774dd80f74_19737610095@djbw-dev.notmuch>

"Dan Williams (nvidia)" <djbw@kernel.org> writes:

> Aneesh Kumar K.V wrote:
>> >> I am leaning towards the latter at this point.
>> >
>> > But we already have struct pci_tsm_ops::guest_req, which is specific to
>> > the underlying CC architecture. From the above, pci_tsm_req_scope also
>> > appears to carry the same information. Is that useful?
>> >
>> 
>> I think there is value in having the VMM express the guest’s
>> confidential computing architecture, so that the TSM backend can
>> validate whether it should handle that guest request.
>
> Yes, that is the idea.
>
>> So it would not be the IOMMU validating the scope value, but rather
>> pci_tsm_ops::guest_req.
>> 
>> static ssize_t cca_tsm_guest_req(struct pci_tdi *tdi, enum pci_tsm_req_scope scope,
>> 		sockptr_t req, size_t req_len, sockptr_t resp,
>> 		size_t resp_len, u64 *tsm_code)
>> {
>> 	struct pci_dev *pdev = tdi->pdev;
>> 
>> 	/*
>> 	 * Reject the guest request if the VMM is using the link TSM wrongly,
>> 	 * i.e. the guest uses a different CC architecture than this link TSM.
>> 	 */
>> 	if (scope != TSM_REQ_TYPE_CCA)
>> 		return -EINVAL;
>
> Right, iommufd is tunneling TSM requests. The tunnel should have an
> envelope of TSM_REQ_TYPE_* and an @op field. The TSM driver gets those
> from iommufd, validates the envelope and then processes @req.
>
> This self-consistency and explicitness also buys some future-proofing.
> It allows for alternate command sets within an arch, shared commands across
> TSM implementations, and IOMMUFD-to-TSM requests outside of guest requests.
>
>> Jason Gunthorpe <jgg@ziepe.ca> writes:
>> 
>> > On Tue, May 26, 2026 at 11:17:50PM -0700, Dan Williams (nvidia) wrote:
>> >
>> >> In that case pci_tsm_req_scope becomes tsm_req_type and is just:
>> >> 
>> >> TSM_REQ_TYPE_CCA
>> >> TSM_REQ_TYPE_SEV
>> >> TSM_REQ_TYPE_TDX
>> >> 
>> >> I am leaning towards the latter at this point.
>> >
>> > Yeah, this sounds good. I would also include a common op field that
>> > can be decoded by the TSM driver based on the TYPE above, and the
>> > usual in/out message buffers.
>> 
>> We already have iommufd_vdevice_tsm_op_ioctl() to handle common
>> operations.
>
> Per above, I believe this is about an @op value in a common location
> that iommufd can forward to the backend for validation of guest
> requests.
>
>> Right now, it handles IOMMU_VDEVICE_TSM_BIND and
>> IOMMU_VDEVICE_TSM_UNBIND. I guess we should move TSM_REQ_SET_TDI_STATE
>> operations to that as well?
>
> I think we can wait to move it to its own IOMMU operation unless/until
> there is a need to set RUN outside of an explicit guest request, right?

Something like the below? (the diff against this series)

I have not yet integrated this into the full CCA patchset for testing,
but I wanted to make sure we are aligned on the UAPI.

diff --git a/drivers/iommu/iommufd/tsm.c b/drivers/iommu/iommufd/tsm.c
index 56bb499ba7a9..345efba2e66e 100644
--- a/drivers/iommu/iommufd/tsm.c
+++ b/drivers/iommu/iommufd/tsm.c
@@ -61,17 +61,30 @@ int iommufd_vdevice_tsm_op_ioctl(struct iommufd_ucmd *ucmd)
 	return ret;
 }
 
-static bool iommufd_vdevice_tsm_req_scope_valid(u32 scope)
+static bool iommufd_vdevice_tsm_req_arch_valid(u32 tvm_arch)
 {
-	if (scope > IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_LAST)
+	switch (tvm_arch) {
+	case IOMMU_VDEVICE_TSM_TVM_ARCH_CCA:
+	case IOMMU_VDEVICE_TSM_TVM_ARCH_SEV:
+	case IOMMU_VDEVICE_TSM_TVM_ARCH_TDX:
+		return true;
+	default:
 		return false;
+	}
+}
 
-	switch (scope) {
-	case IOMMU_VDEVICE_TSM_REQ_PCI_INFO:
-	case IOMMU_VDEVICE_TSM_REQ_PCI_STATE_CHANGE:
-	case IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_READ:
-	case IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_WRITE:
+static bool iommufd_vdevice_tsm_req_op_valid(u32 op, u32 tvm_arch)
+{
+	switch (op) {
+	case TSM_REQ_READ_OBJECT:
+	case TSM_REQ_REGEN_OBJECT:
+	case TSM_REQ_OBJECT_INFO:
+	case TSM_REQ_VALIDATE_MMIO:
+	case TSM_REQ_SET_TDI_STATE:
 		return true;
+	case TSM_REQ_SEV_ENABLE_DMA:
+	case TSM_REQ_SEV_DISABLE_DMA:
+		return tvm_arch == IOMMU_VDEVICE_TSM_TVM_ARCH_SEV;
 	default:
 		return false;
 	}
@@ -99,7 +112,8 @@ int iommufd_vdevice_tsm_req_ioctl(struct iommufd_ucmd *ucmd)
 	struct iommufd_vdevice *vdev;
 	struct iommu_vdevice_tsm_req *cmd = ucmd->cmd;
 	struct tsm_guest_req_info info = {
-		.scope = cmd->scope,
+		.op = cmd->op,
+		.tvm_arch = cmd->tvm_arch,
 		.req   = {
 			.user = u64_to_user_ptr(cmd->req_uptr),
 			.is_kernel = false,
@@ -112,10 +126,10 @@ int iommufd_vdevice_tsm_req_ioctl(struct iommufd_ucmd *ucmd)
 		.resp_len = cmd->resp_len,
 	};
 
-	if (cmd->__reserved)
-		return -EOPNOTSUPP;
+	if (!iommufd_vdevice_tsm_req_arch_valid(cmd->tvm_arch))
+		return -EINVAL;
 
-	if (!iommufd_vdevice_tsm_req_scope_valid(cmd->scope))
+	if (!iommufd_vdevice_tsm_req_op_valid(cmd->op, cmd->tvm_arch))
 		return -EINVAL;
 
 	vdev = iommufd_get_vdevice(ucmd->ictx, cmd->vdevice_id);
diff --git a/drivers/pci/tsm.c b/drivers/pci/tsm.c
index 5fdcd7f2e820..439241c756fd 100644
--- a/drivers/pci/tsm.c
+++ b/drivers/pci/tsm.c
@@ -378,7 +378,8 @@ EXPORT_SYMBOL_GPL(pci_tsm_bind);
 /**
  * pci_tsm_guest_req() - helper to marshal guest requests to the TSM driver
  * @pdev: @pdev representing a bound tdi
- * @scope: caller asserts this passthrough request is limited to TDISP operations
+ * @op: guest-initiated request operation
+ * @tvm_arch: guest TVM architecture
  * @req_in: Input payload forwarded from the guest
  * @in_len: Length of @req_in
  * @req_out: Output payload buffer response to the guest
@@ -387,7 +388,7 @@ EXPORT_SYMBOL_GPL(pci_tsm_bind);
  *
  * This is a common entry point for requests triggered by userspace KVM-exit
  * service handlers responding to TDI information or state change requests. The
- * scope parameter limits requests to TDISP state management, or limited debug.
+ * operation parameter limits requests to guest-initiated TSM operations.
  * This path is only suitable for commands and results that are the host kernel
  * has no use, the host is only facilitating guest to TSM communication.
  *
@@ -400,7 +401,9 @@ EXPORT_SYMBOL_GPL(pci_tsm_bind);
  * Context: Caller is responsible for calling this within the pci_tsm_bind()
  * state of the TDI.
  */
-ssize_t pci_tsm_guest_req(struct pci_dev *pdev, enum pci_tsm_req_scope scope,
+ssize_t pci_tsm_guest_req(struct pci_dev *pdev,
+			  enum iommu_vdevice_tsm_guest_req_op op,
+			  enum iommu_vdevice_tsm_guest_tvm_arch tvm_arch,
 			  sockptr_t req_in, size_t in_len, sockptr_t req_out,
 			  size_t out_len, u64 *tsm_code)
 {
@@ -408,9 +411,30 @@ ssize_t pci_tsm_guest_req(struct pci_dev *pdev, enum pci_tsm_req_scope scope,
 	struct pci_tdi *tdi;
 	int rc;
 
-	/* Forbid requests that are not directly related to TDISP operations */
-	if (scope > PCI_TSM_REQ_STATE_CHANGE)
+	switch (tvm_arch) {
+	case IOMMU_VDEVICE_TSM_TVM_ARCH_CCA:
+	case IOMMU_VDEVICE_TSM_TVM_ARCH_SEV:
+	case IOMMU_VDEVICE_TSM_TVM_ARCH_TDX:
+		break;
+	default:
 		return -EINVAL;
+	}
+
+	switch (op) {
+	case TSM_REQ_READ_OBJECT:
+	case TSM_REQ_REGEN_OBJECT:
+	case TSM_REQ_OBJECT_INFO:
+	case TSM_REQ_VALIDATE_MMIO:
+	case TSM_REQ_SET_TDI_STATE:
+		break;
+	case TSM_REQ_SEV_ENABLE_DMA:
+	case TSM_REQ_SEV_DISABLE_DMA:
+		if (tvm_arch == IOMMU_VDEVICE_TSM_TVM_ARCH_SEV)
+			break;
+		fallthrough;
+	default:
+		return -EINVAL;
+	}
 
 	ACQUIRE(rwsem_read_intr, lock)(&pci_tsm_rwsem);
 	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &lock)))
@@ -430,8 +454,9 @@ ssize_t pci_tsm_guest_req(struct pci_dev *pdev, enum pci_tsm_req_scope scope,
 	tdi = pdev->tsm->tdi;
 	if (!tdi)
 		return -ENXIO;
-	return to_pci_tsm_ops(pdev->tsm)->guest_req(tdi, scope, req_in, in_len,
-						    req_out, out_len, tsm_code);
+	return to_pci_tsm_ops(pdev->tsm)->guest_req(tdi, op, tvm_arch, req_in,
+						    in_len, req_out, out_len,
+						    tsm_code);
 }
 EXPORT_SYMBOL_GPL(pci_tsm_guest_req);
 
diff --git a/drivers/virt/coco/tsm-core.c b/drivers/virt/coco/tsm-core.c
index ce01b19990f5..88cb168d8120 100644
--- a/drivers/virt/coco/tsm-core.c
+++ b/drivers/virt/coco/tsm-core.c
@@ -128,42 +128,15 @@ int tsm_unbind(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(tsm_unbind);
 
-static int tsm_pci_req_scope(u32 scope, enum pci_tsm_req_scope *pci_scope)
-{
-	switch (scope) {
-	case IOMMU_VDEVICE_TSM_REQ_PCI_INFO:
-		*pci_scope = PCI_TSM_REQ_INFO;
-		return 0;
-	case IOMMU_VDEVICE_TSM_REQ_PCI_STATE_CHANGE:
-		*pci_scope = PCI_TSM_REQ_STATE_CHANGE;
-		return 0;
-	case IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_READ:
-		*pci_scope = PCI_TSM_REQ_DEBUG_READ;
-		return 0;
-	case IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_WRITE:
-		*pci_scope = PCI_TSM_REQ_DEBUG_WRITE;
-		return 0;
-	default:
-		return -EINVAL;
-	}
-}
-
 ssize_t tsm_guest_req(struct device *dev,
 		struct tsm_guest_req_info *info, u64 *tsm_code)
 {
-	int ret;
-	enum pci_tsm_req_scope pci_scope;
-
 	if (!dev_is_pci(dev))
 		return -EINVAL;
 
-	ret = tsm_pci_req_scope(info->scope, &pci_scope);
-	if (ret)
-		return ret;
-
-	return pci_tsm_guest_req(to_pci_dev(dev), pci_scope, info->req,
-				 info->req_len, info->resp, info->resp_len,
-				 tsm_code);
+	return pci_tsm_guest_req(to_pci_dev(dev), info->op, info->tvm_arch,
+				 info->req, info->req_len, info->resp,
+				 info->resp_len, tsm_code);
 }
 EXPORT_SYMBOL_GPL(tsm_guest_req);
 
diff --git a/include/linux/pci-tsm.h b/include/linux/pci-tsm.h
index ec2236a7a279..30a60551fcf5 100644
--- a/include/linux/pci-tsm.h
+++ b/include/linux/pci-tsm.h
@@ -9,7 +9,6 @@
 struct pci_tsm;
 struct tsm_dev;
 struct kvm;
-enum pci_tsm_req_scope;
 
 /*
  * struct pci_tsm_ops - manage confidential links and security state
@@ -55,7 +54,8 @@ struct pci_tsm_ops {
 					struct kvm *kvm, u32 tdi_id);
 		void (*unbind)(struct pci_tdi *tdi);
 		ssize_t (*guest_req)(struct pci_tdi *tdi,
-				     enum pci_tsm_req_scope scope,
+				     enum iommu_vdevice_tsm_guest_req_op op,
+				     enum iommu_vdevice_tsm_guest_tvm_arch tvm_arch,
 				     sockptr_t req_in, size_t in_len,
 				     sockptr_t req_out, size_t out_len,
 				     u64 *tsm_code);
@@ -160,46 +160,6 @@ static inline bool is_pci_tsm_pf0(struct pci_dev *pdev)
 	return PCI_FUNC(pdev->devfn) == 0;
 }
 
-/**
- * enum pci_tsm_req_scope - Scope of guest requests to be validated by TSM
- *
- * Guest requests are a transport for a TVM to communicate with a TSM + DSM for
- * a given TDI. A TSM driver is responsible for maintaining the kernel security
- * model and limit commands that may affect the host, or are otherwise outside
- * the typical TDISP operational model.
- */
-enum pci_tsm_req_scope {
-	/**
-	 * @PCI_TSM_REQ_INFO: Read-only, without side effects, request for
-	 * typical TDISP collateral information like Device Interface Reports.
-	 * No device secrets are permitted, and no device state is changed.
-	 */
-	PCI_TSM_REQ_INFO = IOMMU_VDEVICE_TSM_REQ_PCI_INFO,
-	/**
-	 * @PCI_TSM_REQ_STATE_CHANGE: Request to change the TDISP state from
-	 * UNLOCKED->LOCKED, LOCKED->RUN, or other architecture specific state
-	 * changes to support those transitions for a TDI. No other (unrelated
-	 * to TDISP) device / host state, configuration, or data change is
-	 * permitted.
-	 */
-	PCI_TSM_REQ_STATE_CHANGE = IOMMU_VDEVICE_TSM_REQ_PCI_STATE_CHANGE,
-	/**
-	 * @PCI_TSM_REQ_DEBUG_READ: Read-only request for debug information
-	 *
-	 * A method to facilitate TVM information retrieval outside of typical
-	 * TDISP operational requirements. No device secrets are permitted.
-	 */
-	PCI_TSM_REQ_DEBUG_READ = IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_READ,
-	/**
-	 * @PCI_TSM_REQ_DEBUG_WRITE: Device state changes for debug purposes
-	 *
-	 * The request may affect the operational state of the device outside of
-	 * the TDISP operational model. If allowed, requires CAP_SYS_RAW_IO, and
-	 * will taint the kernel.
-	 */
-	PCI_TSM_REQ_DEBUG_WRITE = IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_WRITE,
-};
-
 #ifdef CONFIG_PCI_TSM
 int pci_tsm_register(struct tsm_dev *tsm_dev);
 void pci_tsm_unregister(struct tsm_dev *tsm_dev);
@@ -214,7 +174,9 @@ int pci_tsm_bind(struct pci_dev *pdev, struct kvm *kvm, u32 tdi_id);
 void pci_tsm_unbind(struct pci_dev *pdev);
 void pci_tsm_tdi_constructor(struct pci_dev *pdev, struct pci_tdi *tdi,
 			     struct kvm *kvm, u32 tdi_id);
-ssize_t pci_tsm_guest_req(struct pci_dev *pdev, enum pci_tsm_req_scope scope,
+ssize_t pci_tsm_guest_req(struct pci_dev *pdev,
+			  enum iommu_vdevice_tsm_guest_req_op op,
+			  enum iommu_vdevice_tsm_guest_tvm_arch tvm_arch,
 			  sockptr_t req_in, size_t in_len, sockptr_t req_out,
 			  size_t out_len, u64 *tsm_code);
 #else
@@ -233,7 +195,8 @@ static inline void pci_tsm_unbind(struct pci_dev *pdev)
 {
 }
 static inline ssize_t pci_tsm_guest_req(struct pci_dev *pdev,
-					enum pci_tsm_req_scope scope,
+					enum iommu_vdevice_tsm_guest_req_op op,
+					enum iommu_vdevice_tsm_guest_tvm_arch tvm_arch,
 					sockptr_t req_in, size_t in_len,
 					sockptr_t req_out, size_t out_len,
 					u64 *tsm_code)
diff --git a/include/linux/tsm.h b/include/linux/tsm.h
index b83b72bbf5e3..cba0ada5f4cb 100644
--- a/include/linux/tsm.h
+++ b/include/linux/tsm.h
@@ -7,6 +7,7 @@
 #include <linux/uuid.h>
 #include <linux/device.h>
 #include <linux/sockptr.h>
+#include <uapi/linux/iommufd.h>
 
 #define TSM_REPORT_INBLOB_MAX 64
 #define TSM_REPORT_OUTBLOB_MAX SZ_16M
@@ -132,14 +133,16 @@ int tsm_unbind(struct device *dev);
 
 /**
  * struct tsm_guest_req_info - parameter for tsm_guest_req()
- * @scope: iommufd allocated scope for tsm guest request
+ * @op: operation for the guest-initiated request
+ * @tvm_arch: guest TVM architecture
  * @req: request data buffer filled by guest
  * @req_len: the size of @req filled by guest
  * @resp: response data buffer filled by host
  * @resp_len: the size of @resp buffer filled by guest
  */
 struct tsm_guest_req_info {
-	u32 scope;
+	enum iommu_vdevice_tsm_guest_req_op op;
+	enum iommu_vdevice_tsm_guest_tvm_arch tvm_arch;
 	sockptr_t req;
 	size_t req_len;
 	sockptr_t resp;
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 70c2927c18bc..0789a705bb07 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -1375,54 +1375,46 @@ struct iommu_hw_queue_alloc {
 };
 #define IOMMU_HW_QUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HW_QUEUE_ALLOC)
 
-/*
- * TSM request scope values are allocated by iommufd. Each device-bus transport
- * gets a range from this number space.
+/**
+ * enum iommu_vdevice_tsm_guest_tvm_arch - guest TVM architecture
+ * @IOMMU_VDEVICE_TSM_TVM_ARCH_CCA: Arm CCA TVM
+ * @IOMMU_VDEVICE_TSM_TVM_ARCH_SEV: AMD SEV TVM
+ * @IOMMU_VDEVICE_TSM_TVM_ARCH_TDX: Intel TDX TVM
  */
-#define IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_BASE	0
+enum iommu_vdevice_tsm_guest_tvm_arch {
+	IOMMU_VDEVICE_TSM_TVM_ARCH_CCA = 1,
+	IOMMU_VDEVICE_TSM_TVM_ARCH_SEV,
+	IOMMU_VDEVICE_TSM_TVM_ARCH_TDX,
+};
 
-enum iommu_vdevice_tsm_req_scope {
-	/*
-	 * Read-only, without side effects, request for typical TDISP
-	 * collateral information like Device Interface Reports. No device
-	 * secrets are permitted, and no device state is changed.
-	 */
-	IOMMU_VDEVICE_TSM_REQ_PCI_INFO =
-		IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_BASE,
-	/*
-	 * Request to change the TDISP state from UNLOCKED->LOCKED,
-	 * LOCKED->RUN, or other architecture specific state changes to
-	 * support those transitions for a TDI. No other device or host state,
-	 * configuration, or data change is permitted.
-	 */
-	IOMMU_VDEVICE_TSM_REQ_PCI_STATE_CHANGE =
-		IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_BASE + 1,
-	/*
-	 * Read-only request for debug information outside of typical TDISP
-	 * operational requirements. No device secrets are permitted.
-	 */
-	IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_READ =
-		IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_BASE + 2,
-	/*
-	 * Device state changes for debug purposes. The request may affect the
-	 * operational state of the device outside of the TDISP operational
-	 * model. If allowed, this requires CAP_SYS_RAW_IO and taints the
-	 * kernel.
-	 */
-	IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_WRITE =
-		IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_BASE + 3,
-	IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_LAST =
-		IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_WRITE,
+/**
+ * enum iommu_vdevice_tsm_guest_req_op - operation for guest TSM requests
+ * @TSM_REQ_READ_OBJECT: Read a TSM object
+ * @TSM_REQ_REGEN_OBJECT: Regenerate a TSM object
+ * @TSM_REQ_OBJECT_INFO: Read TSM object information
+ * @TSM_REQ_VALIDATE_MMIO: Validate MMIO for the TDI
+ * @TSM_REQ_SET_TDI_STATE: Set TDI state
+ * @TSM_REQ_SEV_ENABLE_DMA: Enable SEV DMA
+ * @TSM_REQ_SEV_DISABLE_DMA: Disable SEV DMA
+ */
+enum iommu_vdevice_tsm_guest_req_op {
+	TSM_REQ_READ_OBJECT = 1,
+	TSM_REQ_REGEN_OBJECT,
+	TSM_REQ_OBJECT_INFO,
+	TSM_REQ_VALIDATE_MMIO,
+	TSM_REQ_SET_TDI_STATE,
+	TSM_REQ_SEV_ENABLE_DMA,
+	TSM_REQ_SEV_DISABLE_DMA,
 };
 
 /**
  * struct iommu_vdevice_tsm_req - ioctl(IOMMU_VDEVICE_TSM_REQ)
  * @size: sizeof(struct iommu_vdevice_tsm_req)
  * @vdevice_id: vDevice ID the guest request is for
- * @scope: One of enum iommu_vdevice_tsm_req_scope
+ * @op: One of enum iommu_vdevice_tsm_guest_req_op
+ * @tvm_arch: One of enum iommu_vdevice_tsm_guest_tvm_arch
  * @req_len: Size in bytes of the input payload at @req_uptr
  * @resp_len: Size in bytes of the output buffer at @resp_uptr
- * @__reserved: Must be 0
  * @req_uptr: Userspace pointer to the guest-provided request payload
  * @resp_uptr: Userspace pointer to the guest response buffer
  * @tsm_code: TSM-specific result code returned by the TSM implementation
@@ -1431,9 +1423,9 @@ enum iommu_vdevice_tsm_req_scope {
  * guest TSM/TDISP message transport where the host kernel only marshals
  * bytes between userspace and the TSM implementation.
  *
- * Requests outside the iommufd allocated scope values are rejected. Lower
- * layers may reject scope values that are valid in the global iommufd
- * namespace, but not permitted for a specific bus.
+ * The request operation is guest initiated. Operations that may also be host
+ * initiated are handled through IOMMU_VDEVICE_TSM_OP instead. The TSM backend
+ * validates @tvm_arch against its bound TVM architecture assumptions.
  *
  * The request payload is read from @req_uptr/@req_len. If a response is
  * expected, userspace provides @resp_uptr/@resp_len as writable storage for
@@ -1445,10 +1437,10 @@ enum iommu_vdevice_tsm_req_scope {
 struct iommu_vdevice_tsm_req {
 	__u32 size;
 	__u32 vdevice_id;
-	__u32 scope;
+	__u32 op;
+	__u32 tvm_arch;
 	__u32 req_len;
 	__u32 resp_len;
-	__u32 __reserved;
 	__aligned_u64 req_uptr;
 	__aligned_u64 resp_uptr;
 	__aligned_u64 tsm_code;
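
For reference, the envelope checks in the diff above can be modelled in
user space roughly as follows. This is only a sketch: the enum values are
copied from the proposed UAPI, while the helper names and the "bound
architecture" check (modelled on the cca_tsm_guest_req() snippet quoted
earlier) are assumptions, not code from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Values copied from the proposed UAPI enums in the diff. */
enum iommu_vdevice_tsm_guest_tvm_arch {
	IOMMU_VDEVICE_TSM_TVM_ARCH_CCA = 1,
	IOMMU_VDEVICE_TSM_TVM_ARCH_SEV,
	IOMMU_VDEVICE_TSM_TVM_ARCH_TDX,
};

enum iommu_vdevice_tsm_guest_req_op {
	TSM_REQ_READ_OBJECT = 1,
	TSM_REQ_REGEN_OBJECT,
	TSM_REQ_OBJECT_INFO,
	TSM_REQ_VALIDATE_MMIO,
	TSM_REQ_SET_TDI_STATE,
	TSM_REQ_SEV_ENABLE_DMA,
	TSM_REQ_SEV_DISABLE_DMA,
};

/* Both enums start at 1, so a zero-filled envelope is never valid. */
static_assert(IOMMU_VDEVICE_TSM_TVM_ARCH_CCA != 0 && TSM_REQ_READ_OBJECT != 0,
	      "0 must stay an invalid arch/op value");

static bool sketch_req_arch_valid(unsigned int tvm_arch)
{
	switch (tvm_arch) {
	case IOMMU_VDEVICE_TSM_TVM_ARCH_CCA:
	case IOMMU_VDEVICE_TSM_TVM_ARCH_SEV:
	case IOMMU_VDEVICE_TSM_TVM_ARCH_TDX:
		return true;
	default:
		return false;
	}
}

static bool sketch_req_op_valid(unsigned int op, unsigned int tvm_arch)
{
	switch (op) {
	case TSM_REQ_READ_OBJECT:
	case TSM_REQ_REGEN_OBJECT:
	case TSM_REQ_OBJECT_INFO:
	case TSM_REQ_VALIDATE_MMIO:
	case TSM_REQ_SET_TDI_STATE:
		return true;
	case TSM_REQ_SEV_ENABLE_DMA:
	case TSM_REQ_SEV_DISABLE_DMA:
		/* Arch-specific ops are only valid for the matching arch. */
		return tvm_arch == IOMMU_VDEVICE_TSM_TVM_ARCH_SEV;
	default:
		return false;
	}
}

/*
 * Hypothetical backend-side check: the generic envelope must be valid and
 * the claimed guest architecture must match the one this TSM driver serves.
 */
static int sketch_backend_check(unsigned int bound_arch, unsigned int op,
				unsigned int tvm_arch)
{
	if (!sketch_req_arch_valid(tvm_arch) ||
	    !sketch_req_op_valid(op, tvm_arch))
		return -EINVAL;
	if (tvm_arch != bound_arch)
		return -EINVAL;
	return 0;
}
```

The split mirrors the discussion: the generic layer only validates the
envelope, and the TSM backend decides whether the claimed architecture is
one it should handle.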

^ permalink raw reply related

* Re: [PATCH 00/15] Enable TDX Module Extensions and DICE-based TDX Quoting
From: Xu Yilun @ 2026-06-02  5:36 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, yilun.xu, baolu.lu, zhenzhong.duan, xiaoyao.li
In-Reply-To: <9e6107a9-71b1-4764-96f7-2d8e68060173@intel.com>

On Mon, Jun 01, 2026 at 01:17:59PM -0700, Sohil Mehta wrote:
> 
> >>
> >> Let's say a future platform has a lot more features and needs
> >> significantly more memory. Wouldn't loading a legacy kernel with this
> >> default policy lead to excessive wastage?
> > 
> > A legacy kernel won't consume Extensions memory. The Extensions memory
> > is only required by the TDX module when add-on features are explicitly
> > configured via TDH.SYS.CONFIG [1]. 
> 
> So, the TDX module will only report memory_pool_required_pages for
> add-on features that have been configured by the kernel? This would be

Correct.

> good to clarify in the cover letter.

Will do.

> 
> > For a legacy kernel, no add-on features are configured, so there is no
> > memory consumption.
> > 
> 
> I was referring to the first kernel that has support for one TDX
> extension. I am mainly trying to ensure that a kernel with support for
> one TDX extension only consumes memory for that feature (even when it is
> loaded on a hardware platform that supports multiple TDX extensions).

Yes. The first kernel that supports one add-on feature will only
consume memory for that feature. The other HW/FW-supported features
will not be configured, so they will not consume extra memory.

I think I should refactor the cover letter and changelogs based on all
these comments. Thanks for all the input; it helped me see what I missed.

> 
> > But yes, if the features grow faster than expected, we may need new
> > options to switch something off. I think we can discuss that later, when
> > the need actually arises.
> > 
> 

^ permalink raw reply

* RE: [PATCH v5 05/20] dma-pool: track decrypted atomic pools and select them via attrs
From: Aneesh Kumar K.V @ 2026-06-02  6:05 UTC (permalink / raw)
  To: Michael Kelley, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev
  Cc: Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
	Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
	Jason Gunthorpe, Mostafa Saleh, Petr Tesarik,
	Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy (CS GROUP), Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Sven Schnelle, x86@kernel.org, Jiri Pirko
In-Reply-To: <SN6PR02MB415754E94A9505C2B9739E4DD4092@SN6PR02MB4157.namprd02.prod.outlook.com>

Michael Kelley <mhklinux@outlook.com> writes:

> From: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Sent: Thursday, May 21, 2026 9:28 PM
>> 
>> Teach the atomic DMA pool code to distinguish between encrypted and
>> unencrypted pools, and make pool allocation select the matching pool based
>> on DMA attributes.
>> 
>> Introduce a dma_gen_pool wrapper that records whether a pool is
>> unencrypted, initialize that state when the atomic pools are created, and
>> use it when expanding and resizing the pools. Update dma_alloc_from_pool()
>> to take attrs and skip pools whose encrypted state does not match
>> DMA_ATTR_CC_SHARED. Update dma_free_from_pool() accordingly.
>> 
>> Also pass DMA_ATTR_CC_SHARED from the swiotlb atomic allocation path so
>> decrypted swiotlb allocations are taken from the correct atomic pool.
>> 
>> Tested-by: Jiri Pirko <jiri@nvidia.com>
>> Reviewed-by: Mostafa Saleh <smostafa@google.com>
>> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
>> ---
>>  drivers/iommu/dma-iommu.c   |   2 +-
>>  include/linux/dma-map-ops.h |   2 +-
>>  kernel/dma/direct.c         |  11 ++-
>>  kernel/dma/pool.c           | 167 +++++++++++++++++++++++-------------
>>  kernel/dma/swiotlb.c        |   7 +-
>>  5 files changed, 123 insertions(+), 66 deletions(-)
>>
>
> [snip]
>  
>> +static __init struct dma_gen_pool *__dma_atomic_pool_init(struct dma_gen_pool *dma_pool,
>> +		size_t pool_size, gfp_t gfp)
>>  {
>> -	struct gen_pool *pool;
>>  	int ret;
>> 
>> -	pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
>> -	if (!pool)
>> +	dma_pool->pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
>> +	if (!dma_pool->pool)
>>  		return NULL;
>> 
>> -	gen_pool_set_algo(pool, gen_pool_first_fit_order_align, NULL);
>> +	gen_pool_set_algo(dma_pool->pool, gen_pool_first_fit_order_align, NULL);
>> +
>> +	/* if platform is using memory encryption atomic pools are by default decrypted. */
>> +	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
>> +		dma_pool->unencrypted = true;
>> +	else
>> +		dma_pool->unencrypted = false;
>
> I'm curious about the name of the "unencrypted" field in struct dma_gen_pool,
> and similarly in Patch 7 of the series for the swiotlb struct io_tlb_pool and
> struct io_tlb_mem. Up through v3 of this series, you used "decrypted", but
> starting in v4 switched to "unencrypted".
>
> To me, the above "if" statement has some cognitive dissonance in that if
> CC_ATTR_MEM_ENCRYPT is false (i.e., a normal VM), "unencrypted" is set
> to false. But I think of memory in a normal VM as "unencrypted" since it
> was never encrypted. A similar "if" statement occurs in your swiotlb changes.
>
> Two related concepts are captured by the field:
> 1) Is some action needed to put the memory into the unencrypted state,
> and to remove it from that state? This applies when assigning memory to the
> pool, or freeing the memory in the pool.
> 2) Is the memory currently in the unencrypted state? This applies when
> allocating memory from the pool to a caller.
>
> It's hard to capture all that in a short field name. But I think I prefer "decrypted"
> over "unencrypted".  The former implies that some action was taken. It's a
> little easier to think of a normal VM as *not* having decrypted memory. The
> memory was never encrypted in the first place, so no decryption action was taken.
>
> Throughout the kernel, "decrypted" occurs much more frequently than
> "unencrypted".  We have set_memory_encrypted() and set_memory_decrypted()
> that are "take action" names.  But we also have force_dma_unencrypted(),
> phys_to_dma_unencrypted(), and dma_addr_unencrypted(). So it's a bit
> of a mess.
>
>
> But maybe there's more background here that led to the change
> between your v3 and v4.
>
> Michael

The current APIs, phys_to_dma_unencrypted() and dma_addr_unencrypted(),
are the reason I changed the pool attribute name from decrypted to
unencrypted. The rationale was that nobody actually decrypted the
memory; the memory was already in an unencrypted state.

In other words, the DMA pool did not contain encrypted content that was
later decrypted. Rather, the DMA pool itself was in an unencrypted
state.

IMHO, set_memory_decrypted()/set_memory_encrypted() is the right naming
because those APIs describe an operation that transitions memory between
states. In contrast, the pool attribute describes the state of the
memory itself, which is why I used unencrypted rather than decrypted.
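
To make the state-versus-operation distinction concrete, here is a small
user-space sketch of pool selection keyed on a state flag. It is an
illustration under assumptions, not the code in the series: the sketch_*
names and the attribute bit value are made up, and only the idea of
skipping pools whose state does not match DMA_ATTR_CC_SHARED comes from the
changelog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit for illustration, not the kernel's DMA_ATTR_CC_SHARED. */
#define SKETCH_ATTR_CC_SHARED	(1UL << 0)

struct sketch_pool {
	const char *name;
	bool unencrypted;	/* state of the backing memory, not an action */
};

/* Return the first pool whose memory state matches what @attrs asks for. */
static const struct sketch_pool *
sketch_pick_pool(const struct sketch_pool *pools, size_t nr,
		 unsigned long attrs)
{
	bool want_unencrypted = attrs & SKETCH_ATTR_CC_SHARED;
	size_t i;

	assert(pools || !nr);

	for (i = 0; i < nr; i++) {
		if (pools[i].unencrypted == want_unencrypted)
			return &pools[i];
	}
	return NULL;
}
```

The selection never transitions memory; it only compares the requested
state with the recorded one, which is the reading behind the field name.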

-aneesh

^ permalink raw reply

* RE: [PATCH v5 10/20] dma-direct: make dma_direct_map_phys() honor DMA_ATTR_CC_SHARED
From: Aneesh Kumar K.V @ 2026-06-02  6:10 UTC (permalink / raw)
  To: Michael Kelley, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev
  Cc: Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
	Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
	Jason Gunthorpe, Mostafa Saleh, Petr Tesarik,
	Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy (CS GROUP), Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Sven Schnelle, x86@kernel.org, Jiri Pirko
In-Reply-To: <SN6PR02MB41574064D14D4A2734222C51D40B2@SN6PR02MB4157.namprd02.prod.outlook.com>

Michael Kelley <mhklinux@outlook.com> writes:

> From: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Sent: Thursday, May 21, 2026 9:28 PM
>> 
>> Teach dma_direct_map_phys() to select the DMA address encoding based on
>> DMA_ATTR_CC_SHARED.
>> 
>> Use phys_to_dma_unencrypted() for decrypted mappings and
>> phys_to_dma_encrypted() otherwise. If a device requires unencrypted DMA
>> but the source physical address is still encrypted, force the mapping
>> through swiotlb so the DMA address and backing memory attributes remain
>> consistent.
>> 
>> Update the arm64, x86, s390 and powerpc secure-guest setup to not use
>> the swiotlb force option.
>> 
>> Tested-by: Jiri Pirko <jiri@nvidia.com>
>> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>

...

> With this patch removing SWIOTLB_FORCE from four places in
> kernel code, there are no remaining places where it is set.
> The test of SWIOTLB_FORCE could be removed from
> swiotlb_init_remap(), and its definition could be deleted
> from include/linux/swiotlb.h.
>

Sure, I’ll add that as a separate patch in the series.
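
For context on why the global force option is no longer needed, the
per-mapping decision described in the changelog can be sketched in user
space as below. This is one reading of the changelog under stated
assumptions: the sketch_* types, the attribute bit and the single
requires_unencrypted flag are hypothetical simplifications, not the
dma_direct_map_phys() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit for illustration, not the kernel's DMA_ATTR_CC_SHARED. */
#define SKETCH_ATTR_CC_SHARED	(1UL << 0)

enum sketch_map_path {
	SKETCH_MAP_DIRECT_ENCRYPTED,	/* encrypted DMA address encoding */
	SKETCH_MAP_DIRECT_UNENCRYPTED,	/* unencrypted DMA address encoding */
	SKETCH_MAP_BOUNCE_SWIOTLB,	/* bounce through shared swiotlb memory */
};

struct sketch_dev {
	bool requires_unencrypted;	/* device can only do unencrypted DMA */
};

static enum sketch_map_path
sketch_choose_path(const struct sketch_dev *dev, unsigned long attrs)
{
	assert(dev);

	/* The source is already shared: map it directly as unencrypted. */
	if (attrs & SKETCH_ATTR_CC_SHARED)
		return SKETCH_MAP_DIRECT_UNENCRYPTED;

	/* Encrypted source but the device needs unencrypted DMA: bounce. */
	if (dev->requires_unencrypted)
		return SKETCH_MAP_BOUNCE_SWIOTLB;

	return SKETCH_MAP_DIRECT_ENCRYPTED;
}
```

With the bounce decision made per mapping like this, a global force flag
has nothing left to express.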

-aneesh

^ permalink raw reply

