* Re: [PATCH 01/15] x86/virt/tdx: Read global metadata for TDX Module Extensions
From: Xu Yilun @ 2026-06-10 3:20 UTC (permalink / raw)
To: Adrian Hunter
Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
zhenzhong.duan, xiaoyao.li
In-Reply-To: <1783e7bd-3759-41f7-93e3-2f9e21264bd4@intel.com>
On Tue, Jun 09, 2026 at 04:06:50PM +0300, Adrian Hunter wrote:
> On 22/05/2026 06:41, Xu Yilun wrote:
> > Add reading of the global metadata for TDX Module Extensions.
>
> For tip, isn't the expectation to explain the context first. The
> very first patch, might be a good place to explain a bit about
> TDX Module Extensions in general.
Yes. I tried to add a long context to the first patch, but it was
suggested that I move it to the cover letter. I think I can add a brief
introduction at the beginning:
The TDX module introduces a new concept called "TDX Module Extension" to
support long-running / hard-irq preemptible flows inside. ...
* Re: [PATCH v4 02/47] x86/tsc: Add a standalone helpers for getting TSC info from CPUID.0x15
From: Sean Christopherson @ 2026-06-09 19:28 UTC (permalink / raw)
To: Borislav Petkov
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Dave Hansen, x86,
Kiryl Shutsemau, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov, Jan Kiszka,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
John Stultz, H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley,
Thomas Gleixner
In-Reply-To: <20260602034916.GGah5SvARd77mkvxe3@fat_crate.local>
On Mon, Jun 01, 2026, Borislav Petkov wrote:
> On Fri, May 29, 2026 at 07:43:49AM -0700, Sean Christopherson wrote:
> > +static int cpuid_get_tsc_info(struct cpuid_tsc_info *info)
> > +{
> > + unsigned int ecx_hz, edx;
> > +
> > + memset(info, 0, sizeof(*info));
>
> Let's not clear this unnecessarily...
>
> > +
> > + if (boot_cpu_data.cpuid_level < CPUID_LEAF_TSC)
> > + return -ENOENT;
>
> ... just to return here...
>
> > +
> > + /* CPUID 15H TSC/Crystal ratio, plus optionally Crystal Hz */
> > + cpuid(CPUID_LEAF_TSC, &info->denominator, &info->numerator, &ecx_hz, &edx);
> > +
> > + if (!info->denominator || !info->numerator)
> > + return -ENOENT;
>
> ... or here.
>
> We wanna clear it here, when we'll return success.
Actually, if we take the approach of relying on the user to check the return
code, then there's no need to zero the struct since all fields will be explicitly
written, especially if we drop the "tsc_khz" field. I was zeroing the struct
purely as defense in depth.
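A minimal user-space sketch of the approach described above: the helper writes every field only on the success path, and the caller relies on the return code rather than on a zeroed struct. The `fake_cpuid` input struct and the `crystal_hz` field are assumptions made for illustration; the real helper reads CPUID.0x15 directly, and its final field set is still being discussed in this thread.

```c
#include <errno.h>

/* Assumed layout, modelled on the quoted patch; not the final kernel struct. */
struct cpuid_tsc_info {
	unsigned int denominator;
	unsigned int numerator;
	unsigned int crystal_hz;
};

/* Hypothetical stand-in for the register values CPUID.0x15 would return. */
struct fake_cpuid {
	unsigned int max_leaf;
	unsigned int eax, ebx, ecx;
};

static int cpuid_get_tsc_info(const struct fake_cpuid *cpu,
			      struct cpuid_tsc_info *info)
{
	if (cpu->max_leaf < 0x15)
		return -ENOENT;		/* *info is left untouched */

	if (!cpu->eax || !cpu->ebx)
		return -ENOENT;		/* *info is left untouched */

	/* Success: every field is written explicitly, so no memset() is needed. */
	info->denominator = cpu->eax;
	info->numerator = cpu->ebx;
	info->crystal_hz = cpu->ecx;
	return 0;
}
```

With this shape, a caller that ignores the return value reads uninitialized data, which is the trade-off against the defensive memset().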
* Re: [PATCH v4 01/47] x86/tsc: Never re-calibrate TSC frequency if its exact timing is known
From: Thomas Gleixner @ 2026-06-09 19:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Kiryl Shutsemau, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov, Jan Kiszka,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
John Stultz, H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley
In-Reply-To: <aihKj-0nP7bUbNHH@google.com>
On Tue, Jun 09 2026 at 10:17, Sean Christopherson wrote:
> On Fri, Jun 05, 2026, Thomas Gleixner wrote:
>> On Fri, Jun 05 2026 at 11:04, Sean Christopherson wrote:
>> But we also should have a check in the TSC init code somewhere which
>> validates that X86_FEATURE_CONSTANT_TSC is set when
>> X86_FEATURE_TSC_KNOWN_FREQ is set. X86_FEATURE_TSC_KNOWN_FREQ is useless
>> w/o X86_FEATURE_CONSTANT_TSC.
>
> Ugh, any objection to punting on this for now? KVM and Xen guests will trigger
> TSC_KNOWN_FREQ without CONSTANT_TSC, thanks to commits:
>
> e10f78050323 ("kvmclock: fix TSC calibration for nested guests")
> 898ec52d2ba0 ("x86/xen/time: Set the X86_FEATURE_TSC_KNOWN_FREQ flag in xen_tsc_khz()")
>
> Hyper-V guests might as well? Hyper-V's handling of TSC is weird, even for a
> hypervisor.
Hypervisors are ranked by weirdness? I ranked them by insanity so far.
> Even when the frequency is provided in CPUID by the hypervisor, QEMU at least
> requires a fairly explicit opt-in to advertise CONSTANT_TSC, presumably to try
> to prevent users from shooting themselves in the foot.
Bah. We really should have enforced the dependency when we introduced
KNOWN_FREQ. But that ship has sailed.
Though for correctness' sake this should be fixed at some point in the
foreseeable future.
Thanks,
tglx
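The dependency discussed above can be sketched in user space as a simple consistency check. The bit values and the name `tsc_flags_consistent()` are hypothetical; the kernel uses its own `X86_FEATURE_*` definitions, and according to this thread no such check exists yet.

```c
#include <stdbool.h>

/* Hypothetical bit positions; the kernel's X86_FEATURE_* encoding differs. */
#define FAKE_FEAT_CONSTANT_TSC		(1u << 0)
#define FAKE_FEAT_TSC_KNOWN_FREQ	(1u << 1)

/*
 * A known TSC frequency is only meaningful if the TSC ticks at a constant
 * rate, so KNOWN_FREQ without CONSTANT_TSC is reported as inconsistent.
 */
static bool tsc_flags_consistent(unsigned int features)
{
	if ((features & FAKE_FEAT_TSC_KNOWN_FREQ) &&
	    !(features & FAKE_FEAT_CONSTANT_TSC))
		return false;

	return true;
}
```

Per the discussion, KVM and Xen guests would currently fail such a check, which is why enforcing it is deferred.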
* Re: [PATCH v4 01/47] x86/tsc: Never re-calibrate TSC frequency if its exact timing is known
From: Sean Christopherson @ 2026-06-09 17:17 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Paolo Bonzini, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Kiryl Shutsemau, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov, Jan Kiszka,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
John Stultz, H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley
In-Reply-To: <87a4t86a0l.ffs@fw13>
On Fri, Jun 05, 2026, Thomas Gleixner wrote:
> On Fri, Jun 05 2026 at 11:04, Sean Christopherson wrote:
> But we also should have a check in the TSC init code somewhere which
> validates that X86_FEATURE_CONSTANT_TSC is set when
> X86_FEATURE_TSC_KNOWN_FREQ is set. X86_FEATURE_TSC_KNOWN_FREQ is useless
> w/o X86_FEATURE_CONSTANT_TSC.
Ugh, any objection to punting on this for now? KVM and Xen guests will trigger
TSC_KNOWN_FREQ without CONSTANT_TSC, thanks to commits:
e10f78050323 ("kvmclock: fix TSC calibration for nested guests")
898ec52d2ba0 ("x86/xen/time: Set the X86_FEATURE_TSC_KNOWN_FREQ flag in xen_tsc_khz()")
Hyper-V guests might as well? Hyper-V's handling of TSC is weird, even for a
hypervisor.
Even when the frequency is provided in CPUID by the hypervisor, QEMU at least
requires a fairly explicit opt-in to advertise CONSTANT_TSC, presumably to try
to prevent users from shooting themselves in the foot.
* Re: [PATCH 03/15] x86/virt/tdx: Make TDX Module initialize Extensions
From: Adrian Hunter @ 2026-06-09 15:14 UTC (permalink / raw)
To: Xu Yilun, kas, djbw, rick.p.edgecombe, x86, peter.fang
Cc: linux-coco, linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
zhenzhong.duan, xiaoyao.li
In-Reply-To: <20260522034128.3144354-4-yilun.xu@linux.intel.com>
On 22/05/2026 06:41, Xu Yilun wrote:
> +/* Initialize the TDX Module Extensions then Extension-SEAMCALLs can be used */
Reads slightly better without "the", so taking Tony's suggestion
one word less:
"Initialize TDX Module Extensions for Extension-SEAMCALLs"
> +static int tdx_ext_init(void)
> +{
> + struct tdx_module_args args = {};
> + u64 r;
> +
> + do {
> + r = seamcall(TDH_EXT_INIT, &args);
> + } while (r == TDX_INTERRUPTED_RESUMABLE);
> +
> + if (r != TDX_SUCCESS)
There seems to be TDX_PREV_FEATURES_ENABLED which is unused,
but could it turn up here?
> + return -EFAULT;
> +
> + return 0;
> +}
Otherwise:
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
* SVSM Development Call June 10th, 2026
From: Jörg Rödel @ 2026-06-09 14:47 UTC (permalink / raw)
To: coconut-svsm, linux-coco
Hi,
Here is the call for agenda items for this week's SVSM development call. Please
send any agenda items you have in mind as a reply to this email or raise them
in the meeting.
We will use the LF Zoom instance. Details of the meeting can be found in our
governance repository at:
https://github.com/coconut-svsm/governance
The link to the COCONUT-SVSM calendar is:
https://zoom-lfx.platform.linuxfoundation.org/meetings/coconut-svsm?view=week
The meeting will be recorded and the recording eventually published.
Regards,
Jörg
* Re: [PATCH v6 00/20] dma-mapping: Use DMA_ATTR_CC_SHARED through direct, pool and swiotlb paths
From: Jason Gunthorpe @ 2026-06-09 14:47 UTC (permalink / raw)
To: Catalin Marinas, Alexey Kardashevskiy
Cc: Aneesh Kumar K.V (Arm), iommu, linux-arm-kernel, linux-kernel,
linux-coco, Robin Murphy, Marek Szyprowski, Will Deacon,
Marc Zyngier, Steven Price, Suzuki K Poulose, Jiri Pirko,
Mostafa Saleh, Petr Tesarik, Dan Williams, Xu Yilun, linuxppc-dev,
linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <aigYbK12D8uKQvJF@arm.com>
On Tue, Jun 09, 2026 at 02:43:08PM +0100, Catalin Marinas wrote:
> On Thu, Jun 04, 2026 at 02:09:39PM +0530, Aneesh Kumar K.V (Arm) wrote:
> > This series propagates DMA_ATTR_CC_SHARED through the dma-direct,
> > dma-pool, and swiotlb paths so that encrypted and decrypted DMA buffers
> > are handled consistently.
> >
> > Today, the direct DMA path mostly relies on force_dma_unencrypted() for
> > shared/decrypted buffer handling. This series consolidates the
> > force_dma_unencrypted() checks in the top-level functions and ensures
> > that the remaining DMA interfaces use DMA attributes to make the correct
> > decisions.
>
> Please check Sashiko's reports, it has some good points:
>
> https://sashiko.dev/#/patchset/20260604083959.1265923-1-aneesh.kumar@kernel.org
>
> I think the main one is the swiotlb_tbl_map_single() changes which break
> AMD SME host support. There cc_platform_has(CC_ATTR_MEM_ENCRYPT) is true
> but force_dma_unencrypted() is false. Normally you'd not end up on this
> path but you can have swiotlb=force.
IMHO that's an AMD issue, not an issue with the design of this series.
The series is right: a device that is !force_dma_unencrypted() must be
considered a trusted device, and we must never place any DMA
mappings for a trusted device into shared memory.
That AMD has done something insane:
	bool force_dma_unencrypted(struct device *dev)
	{
		/*
		 * For SEV, all DMA must be to unencrypted addresses.
		 */
		if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
			return true;

		/*
		 * For SME, all DMA must be to unencrypted addresses if the
		 * device does not support DMA to addresses that include the
		 * encryption mask.
		 */
		if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) {
			u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
			u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
							dev->bus_dma_limit);

			if (dma_dev_mask <= dma_enc_mask)
				return true;
		}
Is an AMD issue. We already have an address mask limit system built
into the DMA API, arch code should not be co-opting the CC mechanism
to create a special pool for address limited devices.
The correct thing is to ensure the DMA API is checking any address
limits on the actual true dma_addr_t, not on an intermediate like a
phys_addr before it is adjusted with any C bit. Then it is a normal
low address swiotlb bounce like any other.
I think we can ignore this Sashiko remark, in real systems the use of
swiotlb for 64 bit devices is very rare. Though it would be good to
remove this code from AMD...
Jason
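A user-space sketch of the point about checking address limits on the final DMA address rather than on an intermediate physical address. The helper names, the way the C bit is folded into the address, and the bit positions used in the usage note are assumptions for illustration only; `size` is assumed to be at least 1 and overflow is not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low n bits set, 1 <= n <= 64 (same idea as DMA_BIT_MASK). */
static uint64_t bit_mask(unsigned int n)
{
	return n >= 64 ? ~0ULL : (1ULL << n) - 1;
}

/*
 * Hypothetical translation: the bus address carries the encryption (C) bit
 * when the mapping is encrypted, and is the plain physical address otherwise.
 */
static uint64_t phys_to_dma_addr(uint64_t phys, uint64_t c_bit, bool encrypted)
{
	return encrypted ? (phys | c_bit) : phys;
}

/* Limit check applied to the address the device will actually see. */
static bool dma_addr_ok(uint64_t dma_addr, uint64_t size, uint64_t dev_mask)
{
	return dma_addr + size - 1 <= dev_mask;
}
```

For example, with a C bit at position 47 and a device limited to 40 address bits, a low physical address passes the check on its own but fails once the C bit is applied, so the mapping would take the ordinary address-limit bounce path without any CC-specific special case.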
* Re: [PATCH v6 04/20] dma-pool: track decrypted atomic pools and select them via attrs
From: Jason Gunthorpe @ 2026-06-09 14:32 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Mostafa Saleh,
Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Jiri Pirko,
Michael Kelley
In-Reply-To: <20260604083959.1265923-5-aneesh.kumar@kernel.org>
On Thu, Jun 04, 2026 at 02:09:43PM +0530, Aneesh Kumar K.V (Arm) wrote:
> struct page *dma_alloc_from_pool(struct device *dev, size_t size,
> - void **cpu_addr, gfp_t gfp,
> + void **cpu_addr, gfp_t gfp, unsigned long attrs,
> bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t))
> {
> - struct gen_pool *pool = NULL;
> + struct dma_gen_pool *dma_pool = NULL;
> struct page *page;
> bool pool_found = false;
>
> - while ((pool = dma_guess_pool(pool, gfp))) {
> + while ((dma_pool = dma_guess_pool(dma_pool, gfp))) {
> +
> + if (dma_pool->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
> + continue;
I don't think you should be overloading DMA_ATTR_CC_SHARED like this.
	/*
	 * DMA_ATTR_CC_SHARED is not a caller-visible dma_alloc_*()
	 * attribute. The direct allocator uses it internally after it has
	 * decided that the backing pages must be shared/decrypted, so the
	 * rest of the allocation path can consistently select DMA addresses,
	 * choose compatible pools and restore encryption on free.
	 */
	if (attrs & DMA_ATTR_CC_SHARED)
		return NULL;

	if (force_dma_unencrypted(dev)) {
		attrs |= DMA_ATTR_CC_SHARED;
		mark_mem_decrypt = true;
	}
It is fine to have a bit inside the attrs that is only used by the
internal logic, but it needs to have a clearer name
__DMA_ATTR_REQUIRE_CC_SHARED perhaps.
The sashiko note does look legit though:
	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
	    !gfpflags_allow_blocking(gfp) && !coherent) {
		page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr,
					   gfp, attrs, NULL);
		if (!page)
			return NULL;
I don't see anything doing the force_dma_unencrypted test along this
callchain..
I guess it should be done one step up in dma_alloc_attrs() instead of
in dma_direct_alloc()?
Jason
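A user-space sketch of the two suggestions above: an internal-only attribute bit that callers may not pass, set once at the top-level entry point, and pool selection that matches the pool's encryption state against that bit. `FAKE_DMA_ATTR_REQUIRE_CC_SHARED` stands in for the proposed `__DMA_ATTR_REQUIRE_CC_SHARED`; the structs and function names are hypothetical and only model the decision flow, not the kernel allocator.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the proposed internal-only attribute bit. */
#define FAKE_DMA_ATTR_REQUIRE_CC_SHARED	(1UL << 0)

struct fake_device {
	bool force_unencrypted;	/* models force_dma_unencrypted(dev) */
};

struct fake_pool {
	bool unencrypted;
	size_t free_bytes;
};

/* Pick the first pool whose encryption state matches attrs and has room. */
static struct fake_pool *pool_select(struct fake_pool *pools, size_t nr,
				     size_t size, unsigned long attrs)
{
	bool want_shared = !!(attrs & FAKE_DMA_ATTR_REQUIRE_CC_SHARED);
	size_t i;

	for (i = 0; i < nr; i++) {
		if (pools[i].unencrypted != want_shared)
			continue;
		if (pools[i].free_bytes >= size)
			return &pools[i];
	}
	return NULL;
}

/* Top-level entry: reject the internal bit from callers, derive it here. */
static struct fake_pool *alloc_attrs(struct fake_device *dev,
				     struct fake_pool *pools, size_t nr,
				     size_t size, unsigned long attrs)
{
	if (attrs & FAKE_DMA_ATTR_REQUIRE_CC_SHARED)
		return NULL;

	if (dev->force_unencrypted)
		attrs |= FAKE_DMA_ATTR_REQUIRE_CC_SHARED;

	return pool_select(pools, nr, size, attrs);
}
```

Because the decision is made once at the top, every path underneath sees a consistent attribute, including paths that never reach the direct allocator.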
* Re: [PATCH v6 14/20] dma-direct: return struct page from dma_direct_alloc_from_pool()
From: Jason Gunthorpe @ 2026-06-09 14:15 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Mostafa Saleh,
Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, stable, Michael Kelley
In-Reply-To: <20260604083959.1265923-15-aneesh.kumar@kernel.org>
On Thu, Jun 04, 2026 at 02:09:53PM +0530, Aneesh Kumar K.V (Arm) wrote:
> @@ -270,9 +270,12 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> * the atomic pools instead if we aren't allowed block.
> */
> if ((remap || (attrs & DMA_ATTR_CC_SHARED)) &&
> - dma_direct_use_pool(dev, gfp))
> - return dma_direct_alloc_from_pool(dev, size, dma_handle,
> - gfp, attrs);
> + dma_direct_use_pool(dev, gfp)) {
> + page = dma_direct_alloc_from_pool(dev, size,
> + dma_handle, &cpu_addr,
> + gfp, attrs);
> + return page ? cpu_addr : NULL;
> + }
You should probably put this at the start of the series so it can be
backported
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
To Petr's question I think this just shows nobody is really stressing
the PCI dma paths on CC VMs today.
	if (force_dma_unencrypted(dev) && dma_direct_use_pool(dev, gfp))
		return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
For instance the places even calling dma_alloc_pages() don't look like
things people would use in a CC VM.
Jason
* Re: [PATCH v6 14/20] dma-direct: return struct page from dma_direct_alloc_from_pool()
From: Catalin Marinas @ 2026-06-09 13:45 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Jiri Pirko, Jason Gunthorpe, Mostafa Saleh,
Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, stable, Michael Kelley
In-Reply-To: <20260604083959.1265923-15-aneesh.kumar@kernel.org>
On Thu, Jun 04, 2026 at 02:09:53PM +0530, Aneesh Kumar K.V (Arm) wrote:
> Commit 5b138c534fda ("dma-direct: factor out a dma_direct_alloc_from_pool
> helper") changed dma_direct_alloc_from_pool() to return the CPU address
> from dma_alloc_from_pool(). That fits dma_direct_alloc(), but
> dma_direct_alloc_pages() also uses the helper and expects a struct page *.
>
> Fix this by making dma_direct_alloc_from_pool() return the struct page *
> again, and pass the CPU address back through an out-parameter for the
> dma_direct_alloc() caller.
>
> Fixes: 5b138c534fda ("dma-direct: factor out a dma_direct_alloc_from_pool helper")
> Cc: stable@vger.kernel.org
>
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Mostafa Saleh <smostafa@google.com>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Nit: remove the empty line after Cc: stable. It may confuse tooling.
--
Catalin
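The return convention described in the quoted commit message can be sketched in user space as follows. The one-entry pool, `struct fake_page`, and all function names are hypothetical; the sketch only shows one helper serving a caller that wants a CPU address and a caller that wants the page.

```c
#include <stddef.h>

struct fake_page {
	int id;
};

/* Hypothetical one-entry pool: a page plus the CPU address it is mapped at. */
static struct fake_page pool_page = { 1 };
static char pool_buf[64];

/* Returns the page; the CPU address travels back through the out-parameter. */
static struct fake_page *alloc_from_pool(size_t size, void **cpu_addr)
{
	if (size > sizeof(pool_buf))
		return NULL;

	*cpu_addr = pool_buf;
	return &pool_page;
}

/* Caller that wants a CPU address (models dma_direct_alloc()). */
static void *alloc_cpu_addr(size_t size)
{
	void *cpu_addr = NULL;
	struct fake_page *page = alloc_from_pool(size, &cpu_addr);

	return page ? cpu_addr : NULL;
}

/* Caller that wants the page itself (models dma_direct_alloc_pages()). */
static struct fake_page *alloc_page_only(size_t size)
{
	void *cpu_addr = NULL;

	return alloc_from_pool(size, &cpu_addr);
}
```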
* Re: [PATCH v6 20/20] swiotlb: remove unused SWIOTLB_FORCE flag
From: Petr Tesarik @ 2026-06-09 13:44 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
Mostafa Saleh, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <20260604083959.1265923-21-aneesh.kumar@kernel.org>
On Thu, 4 Jun 2026 14:09:59 +0530
"Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:
> SWIOTLB_FORCE has no remaining in-tree users. Forced bouncing is now
> controlled through the swiotlb=force command line option via
> swiotlb_force_bounce.
>
> Remove the unused flag and simplify the force_bounce initialization.
>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
> include/linux/swiotlb.h | 1 -
> kernel/dma/swiotlb.c | 3 +--
> 2 files changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 526f82e9da45..af88ca7182f4 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -15,7 +15,6 @@ struct page;
> struct scatterlist;
>
> #define SWIOTLB_VERBOSE (1 << 0) /* verbose initialization */
> -#define SWIOTLB_FORCE (1 << 1) /* force bounce buffering */
> #define SWIOTLB_ANY (1 << 2) /* allow any memory for the buffer */
These constants are kernel-internal, so let's not leave a hole in the
bitmask... I mean, what about changing SWIOTLB_ANY to (1 << 1) after
you remove SWIOTLB_FORCE?
Other than that, LGTM.
I consider this whole series a big step towards saner handling of
encrypted/decrypted memory for DMA buffers. Thank you for your effort!
Petr T
>
> /*
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index e4bd8c9eaeda..81cc4928e949 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -400,8 +400,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
> if (swiotlb_force_disable)
> return;
>
> - io_tlb_default_mem.force_bounce =
> - swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
> + io_tlb_default_mem.force_bounce = swiotlb_force_bounce;
>
> #ifdef CONFIG_SWIOTLB_DYNAMIC
> if (!remap)
* Re: [PATCH v6 01/20] s390: Expose protected virtualization through cc_platform_has()
From: Catalin Marinas @ 2026-06-09 13:44 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Jiri Pirko, Jason Gunthorpe, Mostafa Saleh,
Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Halil Pasic,
Matthew Rosato, Jaehoon Kim
In-Reply-To: <20260604083959.1265923-2-aneesh.kumar@kernel.org>
On Thu, Jun 04, 2026 at 02:09:40PM +0530, Aneesh Kumar K.V (Arm) wrote:
> Protected virtualization guests use memory encryption, so advertise that to
> the rest of the kernel through cc_platform_has(CC_ATTR_MEM_ENCRYPT).
>
> s390 already forces DMA mappings to be unencrypted for protected
> virtualization guests through force_dma_unencrypted(). Add
> ARCH_HAS_CC_PLATFORM and provide the matching cc_platform_has()
> implementation
>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
Nit: just drop the --- line if you did intend to cc those people.
Nothing wrong for them to end up in the commit log (proof that they've
been cc'ed if they did not reply ;)).
--
Catalin
* Re: [PATCH v6 00/20] dma-mapping: Use DMA_ATTR_CC_SHARED through direct, pool and swiotlb paths
From: Catalin Marinas @ 2026-06-09 13:43 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Jiri Pirko, Jason Gunthorpe, Mostafa Saleh,
Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <20260604083959.1265923-1-aneesh.kumar@kernel.org>
On Thu, Jun 04, 2026 at 02:09:39PM +0530, Aneesh Kumar K.V (Arm) wrote:
> This series propagates DMA_ATTR_CC_SHARED through the dma-direct,
> dma-pool, and swiotlb paths so that encrypted and decrypted DMA buffers
> are handled consistently.
>
> Today, the direct DMA path mostly relies on force_dma_unencrypted() for
> shared/decrypted buffer handling. This series consolidates the
> force_dma_unencrypted() checks in the top-level functions and ensures
> that the remaining DMA interfaces use DMA attributes to make the correct
> decisions.
Please check Sashiko's reports, it has some good points:
https://sashiko.dev/#/patchset/20260604083959.1265923-1-aneesh.kumar@kernel.org
I think the main one is the swiotlb_tbl_map_single() changes which break
AMD SME host support. There cc_platform_has(CC_ATTR_MEM_ENCRYPT) is true
but force_dma_unencrypted() is false. Normally you'd not end up on this
path but you can have swiotlb=force.
> Aneesh Kumar K.V (Arm) (20):
> s390: Expose protected virtualization through cc_platform_has()
> dma-direct: swiotlb: handle swiotlb alloc/free outside
> __dma_direct_alloc_pages
> dma-direct: use DMA_ATTR_CC_SHARED in alloc/free paths
> dma-pool: track decrypted atomic pools and select them via attrs
> dma: swiotlb: pass mapping attributes by reference
> dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
> dma-mapping: make dma_pgprot() honor DMA_ATTR_CC_SHARED
> dma-direct: pass attrs to dma_capable() for DMA_ATTR_CC_SHARED checks
> dma-direct: make dma_direct_map_phys() honor DMA_ATTR_CC_SHARED
> dma-direct: set decrypted flag for remapped DMA allocations
Patch 10 above...
> dma-direct: select DMA address encoding from DMA_ATTR_CC_SHARED
> dma-pool: fix page leak in atomic_pool_expand() cleanup
Patch 12...
> dma-direct: rename ret to cpu_addr in alloc helpers
> dma-direct: return struct page from dma_direct_alloc_from_pool()
> iommu/dma: Check atomic pool allocation result directly
and I think patches 14, 15 are independent fixes. Some of them even have
Fixes: tags and Cc: stable. Please move them to the beginning of the
series to avoid inadvertent dependencies that would make them harder to
backport. It's also easier to follow the series without random fixes for
mainline in the middle.
> dma: swiotlb: free dynamic pools from process context
> dma: swiotlb: handle set_memory_decrypted() failures
> dma: free atomic pool pages by physical address
> swiotlb: Preserve allocation virtual address for dynamic pools
> swiotlb: remove unused SWIOTLB_FORCE flag
--
Catalin
* Re: [PATCH v6 19/20] swiotlb: Preserve allocation virtual address for dynamic pools
From: Petr Tesarik @ 2026-06-09 13:40 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
Mostafa Saleh, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Michael Kelley
In-Reply-To: <20260604083959.1265923-20-aneesh.kumar@kernel.org>
On Thu, 4 Jun 2026 14:09:58 +0530
"Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:
> swiotlb_alloc_tlb() can allocate from the DMA atomic pool when a decrypted
> pool is needed from atomic context. With CONFIG_DMA_DIRECT_REMAP, the
> atomic pool is backed by remapped virtual addresses, which are not the same
> as the direct-map addresses returned by phys_to_virt().
>
> swiotlb_init_io_tlb_pool() currently reconstructs the pool virtual address
> from the physical start address. For atomic-pool backed allocations this
> stores the wrong address in pool->vaddr. Later, swiotlb_free_tlb() passes
> that address to dma_free_from_pool(), which will fail to recognize the
> chunk
>
> Pass the virtual address returned by the allocation path into
> swiotlb_init_io_tlb_pool(), and store that address in pool->vaddr. This
> keeps the pool free path using the same virtual address as the allocator.
>
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Mostafa Saleh <smostafa@google.com>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Hm, so the old code was broken; you may want to add:
Fixes: 79636caad361 ("swiotlb: if swiotlb is full, fall back to a transient memory pool")
And of course:
Reviewed-by: Petr Tesarik <ptesarik@suse.com>
Thank you!
Petr T
> ---
> kernel/dma/swiotlb.c | 32 +++++++++++++++++++-------------
> 1 file changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 14d834ca298b..e4bd8c9eaeda 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -302,9 +302,9 @@ void __init swiotlb_update_mem_attributes(void)
> }
>
> static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
> - unsigned long nslabs, bool late_alloc, unsigned int nareas)
> + void *vaddr, unsigned long nslabs, bool late_alloc,
> + unsigned int nareas)
> {
> - void *vaddr = phys_to_virt(start);
> unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
>
> mem->nslabs = nslabs;
> @@ -445,7 +445,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
> return;
> }
>
> - swiotlb_init_io_tlb_pool(mem, __pa(tlb), nslabs, false, nareas);
> + swiotlb_init_io_tlb_pool(mem, __pa(tlb), tlb, nslabs, false, nareas);
> add_mem_pool(&io_tlb_default_mem, mem);
>
> if (flags & SWIOTLB_VERBOSE)
> @@ -553,7 +553,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> }
> }
>
> - swiotlb_init_io_tlb_pool(mem, virt_to_phys(vstart), nslabs, true,
> + swiotlb_init_io_tlb_pool(mem, virt_to_phys(vstart), vstart, nslabs, true,
> nareas);
> add_mem_pool(&io_tlb_default_mem, mem);
>
> @@ -664,25 +664,26 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes,
> * @phys_limit: Maximum allowed physical address of the buffer.
> * @attrs: DMA attributes for the allocation.
> * @gfp: GFP flags for the allocation.
> + * @vaddr: Receives the virtual address for the allocated buffer.
> *
> * Return: Allocated pages, or %NULL on allocation failure.
> */
> static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> - u64 phys_limit, unsigned long attrs, gfp_t gfp)
> + u64 phys_limit, unsigned long attrs, gfp_t gfp, void **vaddr)
> {
> struct page *page;
>
> + *vaddr = NULL;
> +
> /*
> * Allocate from the atomic pools if memory is encrypted and
> * the allocation is atomic, because decrypting may block.
> */
> if (!gfpflags_allow_blocking(gfp) && (attrs & DMA_ATTR_CC_SHARED)) {
> - void *vaddr;
> -
> if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
> return NULL;
>
> - return dma_alloc_from_pool(dev, bytes, &vaddr, gfp,
> + return dma_alloc_from_pool(dev, bytes, vaddr, gfp,
> attrs, dma_coherent_ok);
> }
>
> @@ -705,6 +706,8 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> return NULL;
> }
>
> + if (page)
> + *vaddr = phys_to_virt(page_to_phys(page));
> return page;
> }
>
> @@ -750,6 +753,7 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> {
> struct io_tlb_pool *pool;
> unsigned int slot_order;
> + void *tlb_vaddr;
> struct page *tlb;
> size_t pool_size;
> size_t tlb_size;
> @@ -767,7 +771,8 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> pool->unencrypted = !!(attrs & DMA_ATTR_CC_SHARED);
>
> tlb_size = nslabs << IO_TLB_SHIFT;
> - while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, attrs, gfp))) {
> + while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, attrs, gfp,
> + &tlb_vaddr))) {
> if (nslabs <= minslabs)
> goto error_tlb;
> nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
> @@ -781,12 +786,12 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> if (!pool->slots)
> goto error_slots;
>
> - swiotlb_init_io_tlb_pool(pool, page_to_phys(tlb), nslabs, true, nareas);
> + swiotlb_init_io_tlb_pool(pool, page_to_phys(tlb), tlb_vaddr, nslabs,
> + true, nareas);
> return pool;
>
> error_slots:
> - swiotlb_free_tlb(page_address(tlb), tlb_size,
> - !!(attrs & DMA_ATTR_CC_SHARED));
> + swiotlb_free_tlb(tlb_vaddr, tlb_size, !!(attrs & DMA_ATTR_CC_SHARED));
> error_tlb:
> kfree(pool);
> error:
> @@ -1995,7 +2000,8 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> mem->unencrypted = false;
> }
>
> - swiotlb_init_io_tlb_pool(pool, rmem->base, nslabs,
> + swiotlb_init_io_tlb_pool(pool, rmem->base, phys_to_virt(rmem->base),
> + nslabs,
> false, nareas);
> mem->force_bounce = true;
> mem->for_alloc = true;
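A user-space sketch of why the pool has to remember the allocator's virtual address instead of recomputing it from the physical address. The linear-map offset, the remapped address in the usage note, and all struct and function names are hypothetical; the model only captures that the free path recognizes a chunk by the address that was handed out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical linear-map offset; models phys_to_virt() in this sketch only. */
#define FAKE_LINEAR_OFFSET	0x1000000000ULL

static uint64_t fake_phys_to_virt(uint64_t phys)
{
	return phys + FAKE_LINEAR_OFFSET;
}

/* A remapped atomic-pool chunk: its vaddr is not the linear-map address. */
struct fake_chunk {
	uint64_t phys;
	uint64_t vaddr;
	bool in_use;
};

struct fake_io_tlb_pool {
	uint64_t start;
	uint64_t vaddr;
};

/* Mirrors the fix: store the vaddr passed in by the allocation path. */
static void pool_init(struct fake_io_tlb_pool *pool, uint64_t start,
		      uint64_t vaddr)
{
	pool->start = start;
	pool->vaddr = vaddr;
}

/* The free path only recognizes the chunk by the vaddr it handed out. */
static bool chunk_free(struct fake_chunk *chunk, uint64_t vaddr)
{
	if (!chunk->in_use || chunk->vaddr != vaddr)
		return false;

	chunk->in_use = false;
	return true;
}
```

Initializing the pool with `fake_phys_to_virt(chunk.phys)` models the old behavior, where the later free is rejected; initializing it with `chunk.vaddr` models the patched behavior.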
* Re: [PATCH 02/15] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Adrian Hunter @ 2026-06-09 13:38 UTC (permalink / raw)
To: Xu Yilun, kas, djbw, rick.p.edgecombe, x86, peter.fang
Cc: linux-coco, linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
zhenzhong.duan, xiaoyao.li
In-Reply-To: <20260522034128.3144354-3-yilun.xu@linux.intel.com>
On 22/05/2026 06:41, Xu Yilun wrote:
> +static int tdx_ext_mem_add(struct page *root, unsigned int nr_pages)
> +{
> + struct tdx_module_args args = {
> + .rcx = to_hpa_list_info(root, nr_pages),
> + };
> + u64 r;
> +
> + tdx_clflush_hpa_list(root, nr_pages);
> +
> + do {
> + /*
> + * TDH_EXT_MEM_ADD is designed to use output parameter RCX to
> + * override/update input parameter RCX, so the caller doesn't
> + * have to do manual parameter update on retry call.
> + */
> + r = seamcall_ret(TDH_EXT_MEM_ADD, &args);
> + } while (r == TDX_INTERRUPTED_RESUMABLE);
Kishon already mentioned checking only the status
> +
> + if (r != TDX_SUCCESS)
Similarly could this also be TDX_EXT_MEMORY_POOL_FULL?
> + return -EFAULT;
> +
> + return 0;
> +}
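A user-space sketch of the retry convention described in the quoted comment, where the callee writes its progress back into the same argument so the caller can simply re-issue the call. The status values, the per-call `budget`, and the use of RCX as a plain page index are simplifying assumptions; the real RCX encodes HPA list information and the real status codes come from the TDX module ABI. `budget` is assumed to be greater than zero.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical status values for this sketch only. */
#define FAKE_TDX_SUCCESS		0ULL
#define FAKE_TDX_INTERRUPTED_RESUMABLE	1ULL

struct fake_args {
	uint64_t rcx;	/* in: next page to add; out: updated by the callee */
};

/* Hypothetical module: adds at most 'budget' pages per call. */
static uint64_t fake_mem_add(struct fake_args *args, uint64_t nr_pages,
			     uint64_t budget)
{
	uint64_t todo = nr_pages - args->rcx;
	uint64_t step = todo < budget ? todo : budget;

	args->rcx += step;	/* output RCX overrides input RCX */
	return args->rcx == nr_pages ? FAKE_TDX_SUCCESS
				     : FAKE_TDX_INTERRUPTED_RESUMABLE;
}

static int ext_mem_add(uint64_t nr_pages, uint64_t budget, unsigned int *calls)
{
	struct fake_args args = { .rcx = 0 };
	uint64_t r;

	*calls = 0;
	do {
		/* No manual update between retries: the callee advanced RCX. */
		r = fake_mem_add(&args, nr_pages, budget);
		(*calls)++;
	} while (r == FAKE_TDX_INTERRUPTED_RESUMABLE);

	return r == FAKE_TDX_SUCCESS ? 0 : -EFAULT;
}
```

How other non-success statuses, such as the TDX_EXT_MEMORY_POOL_FULL case raised above, should be mapped is left open here, as it is in the review.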
* Re: [PATCH v6 17/20] dma: swiotlb: handle set_memory_decrypted() failures
From: Petr Tesarik @ 2026-06-09 13:32 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
Mostafa Saleh, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Michael Kelley
In-Reply-To: <20260604083959.1265923-18-aneesh.kumar@kernel.org>
On Thu, 4 Jun 2026 14:09:56 +0530
"Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:
> Check the return value when converting swiotlb pools between encrypted and
> decrypted mappings. If the default pool cannot be decrypted after early
> initialization, mark the pool fully used so it cannot satisfy future bounce
> allocations.
>
> For late initialization, return the `set_memory_decrypted()` failure. For
> restricted DMA pools, fail device initialization if the reserved pool
> cannot be decrypted.
>
> This prevents swiotlb from using pools whose encryption attributes do not
> match their metadata, and avoids returning pages with uncertain encryption
> state back to the allocator.
This works fine, but instead of effectively leaking the memory, we
could return it to the buddy allocator and reset nslabs to zero as if
SWIOTLB was not even initialized.
OTOH I don't want to overthink this, because the system is probably not
too useful after such a boot-time failure, so unless you _want_ to
improve the error path, you can simply add:
Reviewed-by: Petr Tesarik <ptesarik@suse.com>
Petr T
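
As a rough user-space model of the "mark the pool fully used" approach from
the quoted commit message: once every area reports all of its slots as used,
a capacity check of the kind sketched here can no longer succeed. The
sketch_* names and the simplified capacity check are assumptions for this
example and do not reproduce the real swiotlb search logic.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NAREAS 2

/* Simplified stand-ins for the per-area and per-pool bookkeeping. */
struct sketch_area {
	unsigned long used;		/* slots currently in use */
};

struct sketch_pool {
	unsigned long area_nslabs;	/* slots per area */
	struct sketch_area areas[SKETCH_NAREAS];
};

/* Pretend every slot in every area is already allocated. */
static void sketch_mark_pool_used(struct sketch_pool *pool)
{
	unsigned int i;

	for (i = 0; i < SKETCH_NAREAS; i++)
		pool->areas[i].used = pool->area_nslabs;
}

/* A request fits only if the area still has enough unused slots. */
static bool sketch_area_can_fit(const struct sketch_pool *pool,
				unsigned int area, unsigned long nslots)
{
	return nslots <= pool->area_nslabs - pool->areas[area].used;
}
```

In this model the backing memory stays allocated but is never handed out,
which is the "effectively leaking" behaviour discussed above.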
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Mostafa Saleh <smostafa@google.com>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
> kernel/dma/swiotlb.c | 80 +++++++++++++++++++++++++++++++++++---------
> 1 file changed, 65 insertions(+), 15 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 4c56f64602ea..14d834ca298b 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -248,6 +248,23 @@ static inline unsigned long nr_slots(u64 val)
> return DIV_ROUND_UP(val, IO_TLB_SIZE);
> }
>
> +static void swiotlb_mark_pool_used(struct io_tlb_pool *pool)
> +{
> + unsigned long i;
> +
> + for (i = 0; i < pool->nareas; i++) {
> + pool->areas[i].index = 0;
> + pool->areas[i].used = pool->area_nslabs;
> + }
> +
> + for (i = 0; i < pool->nslabs; i++) {
> + pool->slots[i].list = 0;
> + pool->slots[i].orig_addr = INVALID_PHYS_ADDR;
> + pool->slots[i].alloc_size = 0;
> + pool->slots[i].pad_slots = 0;
> + }
> +}
> +
> /*
> * Early SWIOTLB allocation may be too early to allow an architecture to
> * perform the desired operations. This function allows the architecture to
> @@ -272,8 +289,16 @@ void __init swiotlb_update_mem_attributes(void)
> return;
> bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
>
> - if (io_tlb_default_mem.unencrypted)
> - set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
> + if (io_tlb_default_mem.unencrypted) {
> + int ret;
> +
> + ret = set_memory_decrypted((unsigned long)mem->vaddr,
> + bytes >> PAGE_SHIFT);
> + if (ret) {
> + pr_warn("Failed to decrypt default memory pool, disabling it\n");
> + swiotlb_mark_pool_used(mem);
> + }
> + }
> }
>
> static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
> @@ -442,9 +467,10 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> {
> struct io_tlb_pool *mem = &io_tlb_default_mem.defpool;
> unsigned long nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
> + unsigned int order, area_order, slot_order;
> + bool leak_pages = false;
> unsigned int nareas;
> unsigned char *vstart = NULL;
> - unsigned int order, area_order;
> bool retried = false;
> int rc = 0;
>
> @@ -504,6 +530,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> (PAGE_SIZE << order) >> 20);
> }
>
> + rc = -ENOMEM;
> nareas = limit_nareas(default_nareas, nslabs);
> area_order = get_order(array_size(sizeof(*mem->areas), nareas));
> mem->areas = (struct io_tlb_area *)
> @@ -511,14 +538,20 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> if (!mem->areas)
> goto error_area;
>
> + slot_order = get_order(array_size(sizeof(*mem->slots), nslabs));
> mem->slots = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> - get_order(array_size(sizeof(*mem->slots), nslabs)));
> + slot_order);
> if (!mem->slots)
> goto error_slots;
>
> - if (io_tlb_default_mem.unencrypted)
> - set_memory_decrypted((unsigned long)vstart,
> - (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
> + if (io_tlb_default_mem.unencrypted) {
> + rc = set_memory_decrypted((unsigned long)vstart,
> + (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
> + if (rc) {
> + leak_pages = true;
> + goto error_decrypt;
> + }
> + }
>
> swiotlb_init_io_tlb_pool(mem, virt_to_phys(vstart), nslabs, true,
> nareas);
> @@ -527,16 +560,20 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> swiotlb_print_info();
> return 0;
>
> +error_decrypt:
> + free_pages((unsigned long)mem->slots, slot_order);
> error_slots:
> free_pages((unsigned long)mem->areas, area_order);
> error_area:
> - free_pages((unsigned long)vstart, order);
> - return -ENOMEM;
> + if (!leak_pages)
> + free_pages((unsigned long)vstart, order);
> + return rc;
> }
>
> void __init swiotlb_exit(void)
> {
> struct io_tlb_pool *mem = &io_tlb_default_mem.defpool;
> + bool leak_pages = false;
> unsigned long tbl_vaddr;
> size_t tbl_size, slots_size;
> unsigned int area_order;
> @@ -552,19 +589,23 @@ void __init swiotlb_exit(void)
> tbl_size = PAGE_ALIGN(mem->end - mem->start);
> slots_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), mem->nslabs));
>
> - if (io_tlb_default_mem.unencrypted)
> - set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
> + if (io_tlb_default_mem.unencrypted) {
> + if (set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT))
> + leak_pages = true;
> + }
>
> if (mem->late_alloc) {
> area_order = get_order(array_size(sizeof(*mem->areas),
> mem->nareas));
> free_pages((unsigned long)mem->areas, area_order);
> - free_pages(tbl_vaddr, get_order(tbl_size));
> + if (!leak_pages)
> + free_pages(tbl_vaddr, get_order(tbl_size));
> free_pages((unsigned long)mem->slots, get_order(slots_size));
> } else {
> memblock_free(mem->areas,
> array_size(sizeof(*mem->areas), mem->nareas));
> - memblock_phys_free(mem->start, tbl_size);
> + if (!leak_pages)
> + memblock_phys_free(mem->start, tbl_size);
> memblock_free(mem->slots, slots_size);
> }
>
> @@ -1938,9 +1979,18 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> * restricted mem pool is decrypted by default
> */
> if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
> + int ret;
> +
> mem->unencrypted = true;
> - set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
> - rmem->size >> PAGE_SHIFT);
> + ret = set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
> + rmem->size >> PAGE_SHIFT);
> + if (ret) {
> + dev_err(dev, "Failed to decrypt restricted DMA pool\n");
> + kfree(pool->areas);
> + kfree(pool->slots);
> + kfree(mem);
> + return ret;
> + }
> } else {
> mem->unencrypted = false;
> }
^ permalink raw reply
* Re: [PATCH v6 15/20] iommu/dma: Check atomic pool allocation result directly
From: Petr Tesarik @ 2026-06-09 13:13 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
Mostafa Saleh, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Michael Kelley
In-Reply-To: <20260604083959.1265923-16-aneesh.kumar@kernel.org>
On Thu, 4 Jun 2026 14:09:54 +0530
"Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:
> The non-blocking, non-coherent allocation path uses dma_alloc_from_pool(),
> which returns the allocated page and fills cpu_addr only on success.
>
> Do not rely on cpu_addr to detect allocation failure in this path. Check
> the returned page directly before using it for the IOMMU mapping.
>
> Fixes: 9420139f516d ("dma-pool: fix coherent pool allocations for IOMMU mappings")
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Mostafa Saleh <smostafa@google.com>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Reviewed-by: Petr Tesarik <ptesarik@suse.com>
Petr T
> ---
> drivers/iommu/dma-iommu.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 725c7adb0a8d..52c599f4472c 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1671,13 +1671,16 @@ void *iommu_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
> }
>
> if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
> - !gfpflags_allow_blocking(gfp) && !coherent)
> + !gfpflags_allow_blocking(gfp) && !coherent) {
> page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr,
> gfp, attrs, NULL);
> - else
> + if (!page)
> + return NULL;
> + } else {
> cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
> - if (!cpu_addr)
> - return NULL;
> + if (!cpu_addr)
> + return NULL;
> + }
>
> *handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot,
> dev->coherent_dma_mask);
^ permalink raw reply
* Re: [PATCH v6 16/20] dma: swiotlb: free dynamic pools from process context
From: Petr Tesarik @ 2026-06-09 13:23 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
Mostafa Saleh, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Michael Kelley
In-Reply-To: <20260604083959.1265923-17-aneesh.kumar@kernel.org>
On Thu, 4 Jun 2026 14:09:55 +0530
"Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:
> swiotlb_dyn_free() is used after removing a dynamic swiotlb pool from
> RCU-protected lists. It can call swiotlb_free_tlb(), which may need to
> restore the encryption state of an unencrypted pool with
> set_memory_encrypted() before freeing the pages.
>
> RCU callbacks run in atomic context, but set_memory_encrypted() is not
> guaranteed to be atomic-safe on all architectures. For example, page
> attribute updates may allocate page tables or take sleeping locks.
Good catch!
> Use queue_rcu_work() for dynamic pool freeing instead. This keeps the RCU
> grace period before freeing a published pool, while running the actual pool
> teardown from workqueue context. Use the same helper for the transient-pool
> error path, since that path may also be reached from atomic DMA mapping
> context.
Strictly speaking, it's not necessary, because this is in the error
path just after allocating a transient pool. There are only two
possible scenarios:
a. The transient buffer was allocated from a sleeping context, and then
it's also OK to decrypt memory.
b. The transient buffer was allocated in atomic context, but then it was
allocated from a coherent pool and it is returned to that pool
rather than decrypted.
However, it's also fine to queue an RCU work. The logic is definitely
cleaner and easier to maintain.
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Mostafa Saleh <smostafa@google.com>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Reviewed-by: Petr Tesarik <ptesarik@suse.com>
Petr T
> ---
> include/linux/swiotlb.h | 4 ++--
> kernel/dma/swiotlb.c | 19 +++++++++++--------
> 2 files changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 4dcbf3931be1..526f82e9da45 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -64,7 +64,7 @@ extern void __init swiotlb_update_mem_attributes(void);
> * @areas: Array of memory area descriptors.
> * @slots: Array of slot descriptors.
> * @node: Member of the IO TLB memory pool list.
> - * @rcu: RCU head for swiotlb_dyn_free().
> + * @dyn_free: RCU work item used to free the pool from process context.
> * @transient: %true if transient memory pool.
> */
> struct io_tlb_pool {
> @@ -79,7 +79,7 @@ struct io_tlb_pool {
> struct io_tlb_slot *slots;
> #ifdef CONFIG_SWIOTLB_DYNAMIC
> struct list_head node;
> - struct rcu_head rcu;
> + struct rcu_work dyn_free;
> bool transient;
> bool unencrypted;
> #endif
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index f4e8b241a1c4..4c56f64602ea 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -774,13 +774,10 @@ static void swiotlb_dyn_alloc(struct work_struct *work)
> add_mem_pool(mem, pool);
> }
>
> -/**
> - * swiotlb_dyn_free() - RCU callback to free a memory pool
> - * @rcu: RCU head in the corresponding struct io_tlb_pool.
> - */
> -static void swiotlb_dyn_free(struct rcu_head *rcu)
> +static void swiotlb_dyn_free_work(struct work_struct *work)
> {
> - struct io_tlb_pool *pool = container_of(rcu, struct io_tlb_pool, rcu);
> + struct io_tlb_pool *pool =
> + container_of(to_rcu_work(work), struct io_tlb_pool, dyn_free);
> size_t slots_size = array_size(sizeof(*pool->slots), pool->nslabs);
> size_t tlb_size = pool->end - pool->start;
>
> @@ -789,6 +786,12 @@ static void swiotlb_dyn_free(struct rcu_head *rcu)
> kfree(pool);
> }
>
> +static void swiotlb_schedule_dyn_free(struct io_tlb_pool *pool)
> +{
> + INIT_RCU_WORK(&pool->dyn_free, swiotlb_dyn_free_work);
> + queue_rcu_work(system_wq, &pool->dyn_free);
> +}
> +
> /**
> * __swiotlb_find_pool() - find the IO TLB pool for a physical address
> * @dev: Device which has mapped the DMA buffer.
> @@ -835,7 +838,7 @@ static void swiotlb_del_pool(struct device *dev, struct io_tlb_pool *pool)
> list_del_rcu(&pool->node);
> spin_unlock_irqrestore(&dev->dma_io_tlb_lock, flags);
>
> - call_rcu(&pool->rcu, swiotlb_dyn_free);
> + swiotlb_schedule_dyn_free(pool);
> }
>
> #endif /* CONFIG_SWIOTLB_DYNAMIC */
> @@ -1276,7 +1279,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
> index = swiotlb_search_pool_area(dev, pool, 0, orig_addr, tbl_dma_addr,
> alloc_size, alloc_align_mask);
> if (index < 0) {
> - swiotlb_dyn_free(&pool->rcu);
> + swiotlb_schedule_dyn_free(pool);
> return -1;
> }
>
^ permalink raw reply
* Re: [PATCH v6 14/20] dma-direct: return struct page from dma_direct_alloc_from_pool()
From: Petr Tesarik @ 2026-06-09 13:12 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
Mostafa Saleh, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, stable, Michael Kelley
In-Reply-To: <20260604083959.1265923-15-aneesh.kumar@kernel.org>
On Thu, 4 Jun 2026 14:09:53 +0530
"Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:
> Commit 5b138c534fda ("dma-direct: factor out a dma_direct_alloc_from_pool
> helper") changed dma_direct_alloc_from_pool() to return the CPU address
> from dma_alloc_from_pool(). That fits dma_direct_alloc(), but
> dma_direct_alloc_pages() also uses the helper and expects a struct page *.
>
> Fix this by making dma_direct_alloc_from_pool() return the struct page *
> again, and pass the CPU address back through an out-parameter for the
> dma_direct_alloc() caller.
>
> Fixes: 5b138c534fda ("dma-direct: factor out a dma_direct_alloc_from_pool helper")
> Cc: stable@vger.kernel.org
While I totally agree with the reasoning and the fix, it's interesting
that this bug has apparently been present in the kernel for 5+ years
without anybody hitting nasty memory corruption bugs.
How can it be? Is the buggy code path never actually used in practice?
Does it hint at a missed opportunity to simplify the code?
Anyway, these thoughts are intended for a possible future
cleanup. For now, let's apply the fix as is, of course.
Petr T
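
The shape of the fix described in the quoted commit message can be modelled
in user space as follows: the helper returns the page and hands the CPU
address back through an out-parameter, so each caller picks the value it
needs. This is only a sketch; struct sketch_page, the static pool, and all
sketch_* functions are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct page. */
struct sketch_page {
	int id;
};

/* A single pretend pool entry: one page and its CPU mapping. */
static struct sketch_page pool_page = { 7 };
static char pool_mapping[64];

/*
 * Return the page on success and NULL on failure. The CPU address is
 * written to *cpu_addr only on success.
 */
static struct sketch_page *sketch_alloc_from_pool(size_t size, void **cpu_addr)
{
	if (size > sizeof(pool_mapping))
		return NULL;

	*cpu_addr = pool_mapping;
	return &pool_page;
}

/* Caller that wants the CPU address: check the page, return the mapping. */
static void *sketch_alloc(size_t size)
{
	void *cpu_addr;
	struct sketch_page *page = sketch_alloc_from_pool(size, &cpu_addr);

	return page ? cpu_addr : NULL;
}

/* Caller that wants the page: pass the helper's return value through. */
static struct sketch_page *sketch_alloc_pages(size_t size)
{
	void *cpu_addr;

	return sketch_alloc_from_pool(size, &cpu_addr);
}
```

Because both callers test the returned page rather than the out-parameter,
neither depends on cpu_addr being initialized on the failure path.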
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Mostafa Saleh <smostafa@google.com>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
> kernel/dma/direct.c | 21 ++++++++++++---------
> 1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 4e446aa4130e..e0ab9ff3f1d6 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -157,24 +157,24 @@ static bool dma_direct_use_pool(struct device *dev, gfp_t gfp)
> return !gfpflags_allow_blocking(gfp) && !is_swiotlb_for_alloc(dev);
> }
>
> -static void *dma_direct_alloc_from_pool(struct device *dev, size_t size,
> - dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
> +static struct page *dma_direct_alloc_from_pool(struct device *dev, size_t size,
> + dma_addr_t *dma_handle, void **cpu_addr, gfp_t gfp,
> + unsigned long attrs)
> {
> struct page *page;
> u64 phys_limit;
> - void *ret;
>
> if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_DMA_COHERENT_POOL)))
> return NULL;
>
> gfp |= dma_direct_optimal_gfp_mask(dev, &phys_limit);
> - page = dma_alloc_from_pool(dev, size, &ret, gfp, attrs,
> + page = dma_alloc_from_pool(dev, size, cpu_addr, gfp, attrs,
> dma_coherent_ok);
> if (!page)
> return NULL;
> *dma_handle = phys_to_dma_direct(dev, page_to_phys(page),
> !!(attrs & DMA_ATTR_CC_SHARED));
> - return ret;
> + return page;
> }
>
> static void *dma_direct_alloc_no_mapping(struct device *dev, size_t size,
> @@ -270,9 +270,12 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> * the atomic pools instead if we aren't allowed block.
> */
> if ((remap || (attrs & DMA_ATTR_CC_SHARED)) &&
> - dma_direct_use_pool(dev, gfp))
> - return dma_direct_alloc_from_pool(dev, size, dma_handle,
> - gfp, attrs);
> + dma_direct_use_pool(dev, gfp)) {
> + page = dma_direct_alloc_from_pool(dev, size,
> + dma_handle, &cpu_addr,
> + gfp, attrs);
> + return page ? cpu_addr : NULL;
> + }
>
> if (is_swiotlb_for_alloc(dev)) {
> page = dma_direct_alloc_swiotlb(dev, size, attrs);
> @@ -445,7 +448,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
>
> if ((attrs & DMA_ATTR_CC_SHARED) && dma_direct_use_pool(dev, gfp))
> return dma_direct_alloc_from_pool(dev, size, dma_handle,
> - gfp, attrs);
> + &cpu_addr, gfp, attrs);
>
> if (is_swiotlb_for_alloc(dev)) {
> page = dma_direct_alloc_swiotlb(dev, size, attrs);
^ permalink raw reply
* Re: [PATCH 01/15] x86/virt/tdx: Read global metadata for TDX Module Extensions
From: Adrian Hunter @ 2026-06-09 13:06 UTC (permalink / raw)
To: Xu Yilun, kas, djbw, rick.p.edgecombe, x86, peter.fang
Cc: linux-coco, linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
zhenzhong.duan, xiaoyao.li
In-Reply-To: <20260522034128.3144354-2-yilun.xu@linux.intel.com>
On 22/05/2026 06:41, Xu Yilun wrote:
> Add reading of the global metadata for TDX Module Extensions.
For tip, isn't the expectation to explain the context first? The
very first patch might be a good place to explain a bit about
TDX Module Extensions in general.
>
> TDX Module Extensions is an add-on feature enumerated by TDX_FEATURES0.
> But for the Module's integrity, Linux requires that all features that a
> Module advertises must have a complete, valid set of metadata, and the
> validation must succeed at core TDX initialization time.
>
> Check TDX_FEATURES0 before reading these metadata. If a feature is
> advertised, a failure in reading associated metadata causes the entire
> TDX initialization to fail, otherwise skip.
>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> ---
> arch/x86/include/asm/tdx_global_metadata.h | 6 ++++++
> arch/x86/virt/vmx/tdx/tdx.h | 1 +
> arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 16 ++++++++++++++++
> 3 files changed, 23 insertions(+)
>
> diff --git a/arch/x86/include/asm/tdx_global_metadata.h b/arch/x86/include/asm/tdx_global_metadata.h
> index 40689c8dc67e..533afe50a3f1 100644
> --- a/arch/x86/include/asm/tdx_global_metadata.h
> +++ b/arch/x86/include/asm/tdx_global_metadata.h
> @@ -40,12 +40,18 @@ struct tdx_sys_info_td_conf {
> u64 cpuid_config_values[128][2];
> };
>
> +struct tdx_sys_info_ext {
> + u16 memory_pool_required_pages;
> + u8 ext_required;
> +};
> +
> struct tdx_sys_info {
> struct tdx_sys_info_version version;
> struct tdx_sys_info_features features;
> struct tdx_sys_info_tdmr tdmr;
> struct tdx_sys_info_td_ctrl td_ctrl;
> struct tdx_sys_info_td_conf td_conf;
> + struct tdx_sys_info_ext ext;
> };
>
> #endif
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index e2cf2dd48755..a5eec8e3cc71 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -87,6 +87,7 @@ struct tdmr_info {
>
> /* Bit definitions of TDX_FEATURES0 metadata field */
> #define TDX_FEATURES0_NO_RBP_MOD BIT(18)
> +#define TDX_FEATURES0_EXT BIT_ULL(39)
>
> /*
> * Do not put any hardware-defined TDX structure representations below
> diff --git a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
> index c7db393a9cfb..3d3b56ef3d2f 100644
> --- a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
> +++ b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
> @@ -100,6 +100,19 @@ static __init int get_tdx_sys_info_td_conf(struct tdx_sys_info_td_conf *sysinfo_
> return ret;
> }
>
> +static __init int get_tdx_sys_info_ext(struct tdx_sys_info_ext *sysinfo_ext)
> +{
> + int ret = 0;
> + u64 val;
> +
> + if (!ret && !(ret = read_sys_metadata_field(0x3100000100000000, &val)))
> + sysinfo_ext->memory_pool_required_pages = val;
> + if (!ret && !(ret = read_sys_metadata_field(0x3100000000000001, &val)))
> + sysinfo_ext->ext_required = val;
> +
> + return ret;
> +}
> +
> static __init int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
> {
> int ret = 0;
> @@ -116,5 +129,8 @@ static __init int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
> ret = ret ?: get_tdx_sys_info_td_ctrl(&sysinfo->td_ctrl);
> ret = ret ?: get_tdx_sys_info_td_conf(&sysinfo->td_conf);
>
> + if (sysinfo->features.tdx_features0 & TDX_FEATURES0_EXT)
> + ret = ret ?: get_tdx_sys_info_ext(&sysinfo->ext);
> +
> return ret;
> }
^ permalink raw reply
* Re: [PATCH v6 13/20] dma-direct: rename ret to cpu_addr in alloc helpers
From: Petr Tesarik @ 2026-06-09 12:54 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
Mostafa Saleh, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Michael Kelley
In-Reply-To: <20260604083959.1265923-14-aneesh.kumar@kernel.org>
On Thu, 4 Jun 2026 14:09:52 +0530
"Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:
> ret in dma_direct_alloc() and dma_direct_alloc_pages() holds the returned
> CPU mapping, not a generic return value. Rename it to cpu_addr and update
> the remaining uses to match.
>
> This makes the allocation paths easier to follow and keeps the local naming
> consistent with what the variable actually represents.
>
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Mostafa Saleh <smostafa@google.com>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
I wondered if cpu_addr is descriptive enough (a CPU address could
theoretically be virtual or physical), but I can see that a few other
places already use cpu_addr to hold virtual addresses, so yeah, let's
keep this name.
Reviewed-by: Petr Tesarik <ptesarik@suse.com>
Petr T
> ---
> kernel/dma/direct.c | 31 +++++++++++++++----------------
> 1 file changed, 15 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index aa3489aa10a0..4e446aa4130e 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -204,7 +204,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> bool mark_mem_decrypt = false;
> bool allow_highmem = true;
> struct page *page;
> - void *ret;
> + void *cpu_addr;
>
> /*
> * DMA_ATTR_CC_SHARED is not a caller-visible dma_alloc_*()
> @@ -318,34 +318,33 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> arch_dma_prep_coherent(page, size);
>
> /* create a coherent mapping */
> - ret = dma_common_contiguous_remap(page, size, prot,
> - __builtin_return_address(0));
> - if (!ret)
> + cpu_addr = dma_common_contiguous_remap(page, size, prot,
> + __builtin_return_address(0));
> + if (!cpu_addr)
> goto out_encrypt_pages;
> } else {
> - ret = page_address(page);
> + cpu_addr = page_address(page);
> }
>
> - memset(ret, 0, size);
> + memset(cpu_addr, 0, size);
>
> if (set_uncached) {
> void *uncached_cpu_addr;
>
> arch_dma_prep_coherent(page, size);
> - uncached_cpu_addr = arch_dma_set_uncached(ret, size);
> + uncached_cpu_addr = arch_dma_set_uncached(cpu_addr, size);
> if (IS_ERR(uncached_cpu_addr))
> goto out_free_remap_pages;
> - ret = uncached_cpu_addr;
> + cpu_addr = uncached_cpu_addr;
> }
>
> *dma_handle = phys_to_dma_direct(dev, page_to_phys(page),
> !!(attrs & DMA_ATTR_CC_SHARED));
> - return ret;
> -
> + return cpu_addr;
>
> out_free_remap_pages:
> if (remap)
> - dma_common_free_remap(ret, size);
> + dma_common_free_remap(cpu_addr, size);
>
> out_encrypt_pages:
> if (mark_mem_decrypt &&
> @@ -439,7 +438,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
> {
> unsigned long attrs = 0;
> struct page *page;
> - void *ret;
> + void *cpu_addr;
>
> if (force_dma_unencrypted(dev))
> attrs |= DMA_ATTR_CC_SHARED;
> @@ -453,7 +452,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
> if (!page)
> return NULL;
>
> - ret = page_address(page);
> + cpu_addr = page_address(page);
> goto setup_page;
> }
>
> @@ -461,11 +460,11 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
> if (!page)
> return NULL;
>
> - ret = page_address(page);
> - if ((attrs & DMA_ATTR_CC_SHARED) && dma_set_decrypted(dev, ret, size))
> + cpu_addr = page_address(page);
> + if ((attrs & DMA_ATTR_CC_SHARED) && dma_set_decrypted(dev, cpu_addr, size))
> goto out_leak_pages;
> setup_page:
> - memset(ret, 0, size);
> + memset(cpu_addr, 0, size);
> *dma_handle = phys_to_dma_direct(dev, page_to_phys(page),
> !!(attrs & DMA_ATTR_CC_SHARED));
> return page;
^ permalink raw reply
* Re: [PATCH v6 08/20] dma-direct: pass attrs to dma_capable() for DMA_ATTR_CC_SHARED checks
From: Petr Tesarik @ 2026-06-09 12:50 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
Mostafa Saleh, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Jiri Pirko,
Michael Kelley
In-Reply-To: <20260604083959.1265923-9-aneesh.kumar@kernel.org>
On Thu, 4 Jun 2026 14:09:47 +0530
"Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:
> Teach dma_capable() about DMA_ATTR_CC_SHARED so the capability
> check can reject encrypted DMA addresses for devices that require
> unencrypted/shared DMA.
>
> Also propagate DMA_ATTR_CC_SHARED in swiotlb_map() when the selected
> SWIOTLB pool is decrypted so the capability check sees the correct DMA
> address attribute.
>
> Tested-by: Jiri Pirko <jiri@nvidia.com>
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Mostafa Saleh <smostafa@google.com>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Reviewed-by: Petr Tesarik <ptesarik@suse.com>
Petr T
> ---
> arch/x86/kernel/amd_gart_64.c | 30 ++++++++++++++++--------------
> drivers/xen/swiotlb-xen.c | 6 +++---
> include/linux/dma-direct.h | 10 +++++++++-
> kernel/dma/direct.h | 6 +++---
> kernel/dma/swiotlb.c | 2 +-
> 5 files changed, 32 insertions(+), 22 deletions(-)
>
> diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
> index e8000a56732e..b5f1f031d45b 100644
> --- a/arch/x86/kernel/amd_gart_64.c
> +++ b/arch/x86/kernel/amd_gart_64.c
> @@ -180,22 +180,23 @@ static void iommu_full(struct device *dev, size_t size, int dir)
> }
>
> static inline int
> -need_iommu(struct device *dev, unsigned long addr, size_t size)
> +need_iommu(struct device *dev, unsigned long addr, size_t size, unsigned long attrs)
> {
> - return force_iommu || !dma_capable(dev, addr, size, true);
> + return force_iommu || !dma_capable(dev, addr, size, true, attrs);
> }
>
> static inline int
> -nonforced_iommu(struct device *dev, unsigned long addr, size_t size)
> +nonforced_iommu(struct device *dev, unsigned long addr, size_t size,
> + unsigned long attrs)
> {
> - return !dma_capable(dev, addr, size, true);
> + return !dma_capable(dev, addr, size, true, attrs);
> }
>
> /* Map a single continuous physical area into the IOMMU.
> * Caller needs to check if the iommu is needed and flush.
> */
> static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem,
> - size_t size, int dir, unsigned long align_mask)
> + size_t size, int dir, unsigned long align_mask, unsigned long attrs)
> {
> unsigned long npages = iommu_num_pages(phys_mem, size, PAGE_SIZE);
> unsigned long iommu_page;
> @@ -206,7 +207,7 @@ static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem,
>
> iommu_page = alloc_iommu(dev, npages, align_mask);
> if (iommu_page == -1) {
> - if (!nonforced_iommu(dev, phys_mem, size))
> + if (!nonforced_iommu(dev, phys_mem, size, attrs))
> return phys_mem;
> if (panic_on_overflow)
> panic("dma_map_area overflow %lu bytes\n", size);
> @@ -231,10 +232,10 @@ static dma_addr_t gart_map_phys(struct device *dev, phys_addr_t paddr,
> if (unlikely(attrs & DMA_ATTR_MMIO))
> return DMA_MAPPING_ERROR;
>
> - if (!need_iommu(dev, paddr, size))
> + if (!need_iommu(dev, paddr, size, attrs))
> return paddr;
>
> - bus = dma_map_area(dev, paddr, size, dir, 0);
> + bus = dma_map_area(dev, paddr, size, dir, 0, attrs);
> flush_gart();
>
> return bus;
> @@ -289,7 +290,7 @@ static void gart_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
>
> /* Fallback for dma_map_sg in case of overflow */
> static int dma_map_sg_nonforce(struct device *dev, struct scatterlist *sg,
> - int nents, int dir)
> + int nents, int dir, unsigned long attrs)
> {
> struct scatterlist *s;
> int i;
> @@ -301,8 +302,8 @@ static int dma_map_sg_nonforce(struct device *dev, struct scatterlist *sg,
> for_each_sg(sg, s, nents, i) {
> unsigned long addr = sg_phys(s);
>
> - if (nonforced_iommu(dev, addr, s->length)) {
> - addr = dma_map_area(dev, addr, s->length, dir, 0);
> + if (nonforced_iommu(dev, addr, s->length, attrs)) {
> + addr = dma_map_area(dev, addr, s->length, dir, 0, attrs);
> if (addr == DMA_MAPPING_ERROR) {
> if (i > 0)
> gart_unmap_sg(dev, sg, i, dir, 0);
> @@ -401,7 +402,7 @@ static int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents,
> s->dma_address = addr;
> BUG_ON(s->length == 0);
>
> - nextneed = need_iommu(dev, addr, s->length);
> + nextneed = need_iommu(dev, addr, s->length, attrs);
>
> /* Handle the previous not yet processed entries */
> if (i > start) {
> @@ -449,7 +450,7 @@ static int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents,
>
> /* When it was forced or merged try again in a dumb way */
> if (force_iommu || iommu_merge) {
> - out = dma_map_sg_nonforce(dev, sg, nents, dir);
> + out = dma_map_sg_nonforce(dev, sg, nents, dir, attrs);
> if (out > 0)
> return out;
> }
> @@ -473,7 +474,8 @@ gart_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_addr,
> return vaddr;
>
> *dma_addr = dma_map_area(dev, virt_to_phys(vaddr), size,
> - DMA_BIDIRECTIONAL, (1UL << get_order(size)) - 1);
> + DMA_BIDIRECTIONAL,
> + (1UL << get_order(size)) - 1, attrs);
> flush_gart();
> if (unlikely(*dma_addr == DMA_MAPPING_ERROR))
> goto out_free;
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 8c4abe65cd49..e2538824ef52 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -212,7 +212,7 @@ static dma_addr_t xen_swiotlb_map_phys(struct device *dev, phys_addr_t phys,
> BUG_ON(dir == DMA_NONE);
>
> if (attrs & DMA_ATTR_MMIO) {
> - if (unlikely(!dma_capable(dev, phys, size, false))) {
> + if (unlikely(!dma_capable(dev, phys, size, false, attrs))) {
> dev_err_once(
> dev,
> "DMA addr %pa+%zu overflow (mask %llx, bus limit %llx).\n",
> @@ -231,7 +231,7 @@ static dma_addr_t xen_swiotlb_map_phys(struct device *dev, phys_addr_t phys,
> * we can safely return the device addr and not worry about bounce
> * buffering it.
> */
> - if (dma_capable(dev, dev_addr, size, true) &&
> + if (dma_capable(dev, dev_addr, size, true, attrs) &&
> !dma_kmalloc_needs_bounce(dev, size, dir) &&
> !range_straddles_page_boundary(phys, size) &&
> !xen_arch_need_swiotlb(dev, phys, dev_addr) &&
> @@ -253,7 +253,7 @@ static dma_addr_t xen_swiotlb_map_phys(struct device *dev, phys_addr_t phys,
> /*
> * Ensure that the address returned is DMA'ble
> */
> - if (unlikely(!dma_capable(dev, dev_addr, size, true))) {
> + if (unlikely(!dma_capable(dev, dev_addr, size, true, attrs))) {
> __swiotlb_tbl_unmap_single(dev, map, size, dir,
> attrs | DMA_ATTR_SKIP_CPU_SYNC,
> swiotlb_find_pool(dev, map));
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index 94fad4e7c11e..daa31a1adf7b 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -135,12 +135,20 @@ static inline bool force_dma_unencrypted(struct device *dev)
> #endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */
>
> static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size,
> - bool is_ram)
> + bool is_ram, unsigned long attrs)
> {
> dma_addr_t end = addr + size - 1;
>
> if (addr == DMA_MAPPING_ERROR)
> return false;
> + /*
> + * The DMA address was derived from encrypted RAM, but this device
> + * requires unencrypted DMA addresses. Treat it as not DMA-capable
> + * so the caller can fall back to a suitable SWIOTLB pool.
> + */
> + if (!(attrs & DMA_ATTR_CC_SHARED) && force_dma_unencrypted(dev))
> + return false;
> +
> if (is_ram && !IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT) &&
> min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
> return false;
> diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
> index 7140c208c123..e05dc7649366 100644
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -101,15 +101,15 @@ static inline dma_addr_t dma_direct_map_phys(struct device *dev,
>
> if (attrs & DMA_ATTR_MMIO) {
> dma_addr = phys;
> - if (unlikely(!dma_capable(dev, dma_addr, size, false)))
> + if (unlikely(!dma_capable(dev, dma_addr, size, false, attrs)))
> goto err_overflow;
> } else if (attrs & DMA_ATTR_CC_SHARED) {
> dma_addr = phys_to_dma_unencrypted(dev, phys);
> - if (unlikely(!dma_capable(dev, dma_addr, size, false)))
> + if (unlikely(!dma_capable(dev, dma_addr, size, false, attrs)))
> goto err_overflow;
> } else {
> dma_addr = phys_to_dma(dev, phys);
> - if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
> + if (unlikely(!dma_capable(dev, dma_addr, size, true, attrs)) ||
> dma_kmalloc_needs_bounce(dev, size, dir)) {
> if (is_swiotlb_active(dev) &&
> !(attrs & DMA_ATTR_REQUIRE_COHERENT))
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 2bf3981db35d..f4e8b241a1c4 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -1678,7 +1678,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
> else
> dma_addr = phys_to_dma_encrypted(dev, swiotlb_addr);
>
> - if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
> + if (unlikely(!dma_capable(dev, dma_addr, size, true, attrs))) {
> __swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, dir,
> attrs | DMA_ATTR_SKIP_CPU_SYNC,
> swiotlb_find_pool(dev, swiotlb_addr));
* Re: [PATCH v6 06/20] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Petr Tesarik @ 2026-06-09 12:48 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
Mostafa Saleh, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Jiri Pirko,
Michael Kelley
In-Reply-To: <20260604083959.1265923-7-aneesh.kumar@kernel.org>
On Thu, 4 Jun 2026 14:09:45 +0530
"Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:
> Teach swiotlb to distinguish between encrypted and decrypted bounce
> buffer pools, and make allocation and mapping paths select a pool whose
> state matches the requested DMA attributes.
>
> Add an unencrypted flag to io_tlb_mem, initialize it for the default and
> restricted pools, and propagate DMA_ATTR_CC_SHARED into swiotlb pool
> allocation. Reject swiotlb alloc/map requests when the selected pool does
> not match the required encrypted/decrypted state.
>
> Also return DMA addresses with the matching phys_to_dma_{encrypted,
> unencrypted} helper so the DMA address encoding stays consistent with the
> chosen pool.
>
> Tested-by: Jiri Pirko <jiri@nvidia.com>
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Mostafa Saleh <smostafa@google.com>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
> include/linux/dma-direct.h | 10 +++
> include/linux/swiotlb.h | 8 +-
> kernel/dma/direct.c | 13 +++-
> kernel/dma/swiotlb.c | 154 ++++++++++++++++++++++++++++---------
> 4 files changed, 142 insertions(+), 43 deletions(-)
>
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index c249912456f9..94fad4e7c11e 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -77,6 +77,10 @@ static inline dma_addr_t dma_range_map_max(const struct bus_dma_region *map)
> #ifndef phys_to_dma_unencrypted
> #define phys_to_dma_unencrypted phys_to_dma
> #endif
> +
> +#ifndef phys_to_dma_encrypted
> +#define phys_to_dma_encrypted phys_to_dma
> +#endif
> #else
> static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
> {
> @@ -90,6 +94,12 @@ static inline dma_addr_t phys_to_dma_unencrypted(struct device *dev,
> {
> return dma_addr_unencrypted(__phys_to_dma(dev, paddr));
> }
> +
> +static inline dma_addr_t phys_to_dma_encrypted(struct device *dev,
> + phys_addr_t paddr)
> +{
> + return dma_addr_encrypted(__phys_to_dma(dev, paddr));
> +}
> /*
> * If memory encryption is supported, phys_to_dma will set the memory encryption
> * bit in the DMA address, and dma_to_phys will clear it.
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 29187cec90d8..4dcbf3931be1 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -81,6 +81,7 @@ struct io_tlb_pool {
> struct list_head node;
> struct rcu_head rcu;
> bool transient;
> + bool unencrypted;
IIUC this is a copy of the unencrypted member in the corresponding
struct io_tlb_mem. In other words, if pools are allocated dynamically,
all pools must have the same encryption state, correct?
> #endif
> };
>
> @@ -111,6 +112,7 @@ struct io_tlb_mem {
> struct dentry *debugfs;
> bool force_bounce;
> bool for_alloc;
> + bool unencrypted;
> #ifdef CONFIG_SWIOTLB_DYNAMIC
> bool can_grow;
> u64 phys_limit;
> @@ -282,7 +284,8 @@ static inline void swiotlb_sync_single_for_cpu(struct device *dev,
> extern void swiotlb_print_info(void);
>
> #ifdef CONFIG_DMA_RESTRICTED_POOL
> -struct page *swiotlb_alloc(struct device *dev, size_t size);
> +struct page *swiotlb_alloc(struct device *dev, size_t size,
> + unsigned long attrs);
> bool swiotlb_free(struct device *dev, struct page *page, size_t size);
> void swiotlb_free_from_pool(struct device *dev, phys_addr_t tlb_addr,
> size_t size, struct io_tlb_pool *pool);
> @@ -292,7 +295,8 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
> return dev->dma_io_tlb_mem->for_alloc;
> }
> #else
> -static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
> +static inline struct page *swiotlb_alloc(struct device *dev, size_t size,
> + unsigned long attrs)
> {
> return NULL;
> }
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 681f16a984ab..0b4a26c6b6fd 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -96,9 +96,10 @@ static int dma_set_encrypted(struct device *dev, void *vaddr, size_t size)
> return ret;
> }
>
> -static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size)
> +static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size,
> + unsigned long attrs)
> {
> - struct page *page = swiotlb_alloc(dev, size);
> + struct page *page = swiotlb_alloc(dev, size, attrs);
>
> if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
> swiotlb_free(dev, page, size);
> @@ -258,8 +259,12 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> gfp, attrs);
>
> if (is_swiotlb_for_alloc(dev)) {
> - page = dma_direct_alloc_swiotlb(dev, size);
> + page = dma_direct_alloc_swiotlb(dev, size, attrs);
> if (page) {
> + /*
> + * swiotlb allocations comes from pool already marked
> + * decrypted
> + */
> mark_mem_decrypt = false;
> goto setup_page;
> }
> @@ -407,7 +412,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
> gfp, attrs);
>
> if (is_swiotlb_for_alloc(dev)) {
> - page = dma_direct_alloc_swiotlb(dev, size);
> + page = dma_direct_alloc_swiotlb(dev, size, attrs);
> if (!page)
> return NULL;
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 78ce05857c00..2bf3981db35d 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -259,10 +259,21 @@ void __init swiotlb_update_mem_attributes(void)
> struct io_tlb_pool *mem = &io_tlb_default_mem.defpool;
> unsigned long bytes;
>
> + /*
> + * if platform support memory encryption, swiotlb buffers are
> + * decrypted by default.
> + */
> + if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> + io_tlb_default_mem.unencrypted = true;
> + else
> + io_tlb_default_mem.unencrypted = false;
> +
> if (!mem->nslabs || mem->late_alloc)
> return;
> bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
> - set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
> +
> + if (io_tlb_default_mem.unencrypted)
> + set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
> }
>
> static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
> @@ -505,8 +516,10 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> if (!mem->slots)
> goto error_slots;
>
> - set_memory_decrypted((unsigned long)vstart,
> - (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
> + if (io_tlb_default_mem.unencrypted)
> + set_memory_decrypted((unsigned long)vstart,
> + (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
> +
> swiotlb_init_io_tlb_pool(mem, virt_to_phys(vstart), nslabs, true,
> nareas);
> add_mem_pool(&io_tlb_default_mem, mem);
> @@ -539,7 +552,9 @@ void __init swiotlb_exit(void)
> tbl_size = PAGE_ALIGN(mem->end - mem->start);
> slots_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), mem->nslabs));
>
> - set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
> + if (io_tlb_default_mem.unencrypted)
> + set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
> +
> if (mem->late_alloc) {
> area_order = get_order(array_size(sizeof(*mem->areas),
> mem->nareas));
> @@ -563,6 +578,7 @@ void __init swiotlb_exit(void)
> * @gfp: GFP flags for the allocation.
> * @bytes: Size of the buffer.
> * @phys_limit: Maximum allowed physical address of the buffer.
> + * @unencrypted: true to allocate unencrypted memory, false for encrypted memory
> *
> * Allocate pages from the buddy allocator. If successful, make the allocated
> * pages decrypted that they can be used for DMA.
> @@ -570,7 +586,8 @@ void __init swiotlb_exit(void)
> * Return: Decrypted pages, %NULL on allocation failure, or ERR_PTR(-EAGAIN)
> * if the allocated physical address was above @phys_limit.
> */
> -static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
> +static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes,
> + u64 phys_limit, bool unencrypted)
> {
> unsigned int order = get_order(bytes);
> struct page *page;
> @@ -588,13 +605,13 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
> }
>
> vaddr = phys_to_virt(paddr);
> - if (set_memory_decrypted((unsigned long)vaddr, PFN_UP(bytes)))
> + if (unencrypted && set_memory_decrypted((unsigned long)vaddr, PFN_UP(bytes)))
> goto error;
> return page;
>
> error:
> /* Intentional leak if pages cannot be encrypted again. */
> - if (!set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> + if (unencrypted && !set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> __free_pages(page, order);
> return NULL;
> }
> @@ -604,30 +621,26 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
> * @dev: Device for which a memory pool is allocated.
> * @bytes: Size of the buffer.
> * @phys_limit: Maximum allowed physical address of the buffer.
> + * @attrs: DMA attributes for the allocation.
> * @gfp: GFP flags for the allocation.
> *
> * Return: Allocated pages, or %NULL on allocation failure.
> */
> static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> - u64 phys_limit, gfp_t gfp)
> + u64 phys_limit, unsigned long attrs, gfp_t gfp)
If my assumption above is correct, then I'd prefer adding a struct
io_tlb_mem *mem parameter here and calculating the allocation attributes
inside this function, so you don't have to repeat that in the callers.
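To illustrate the suggestion, here is a minimal userspace sketch (not
kernel code) of deriving the allocation attributes from the io_tlb_mem
inside the allocator instead of in each caller. The io_tlb_mem_model
type, the tlb_alloc_attrs() and plan_tlb_alloc() helpers, and the
placeholder value of DMA_ATTR_CC_SHARED are assumptions made for the
example; only the mapping from mem->unencrypted to DMA_ATTR_CC_SHARED
is taken from the patch.

```c
/* Userspace model of the proposed parameter change; not kernel code. */
#include <assert.h>
#include <stdbool.h>

#define DMA_ATTR_CC_SHARED (1UL << 0)	/* placeholder bit for this model */

/* Hypothetical stand-in for struct io_tlb_mem. */
struct io_tlb_mem_model {
	bool unencrypted;
};

enum tlb_source { TLB_FROM_ATOMIC_POOL, TLB_FROM_BUDDY };

struct tlb_alloc_plan {
	enum tlb_source source;
	bool decrypt;	/* would the pages be decrypted after allocation? */
};

/* Single place that turns the pool state into DMA attributes. */
static inline unsigned long tlb_alloc_attrs(const struct io_tlb_mem_model *mem)
{
	return mem->unencrypted ? DMA_ATTR_CC_SHARED : 0;
}

/* Models the decision in swiotlb_alloc_tlb() when it is given mem. */
static inline struct tlb_alloc_plan
plan_tlb_alloc(const struct io_tlb_mem_model *mem, bool may_block)
{
	unsigned long attrs = tlb_alloc_attrs(mem);
	struct tlb_alloc_plan plan;

	if (!may_block && (attrs & DMA_ATTR_CC_SHARED)) {
		/* Decrypting may block, so take memory from an atomic pool. */
		plan.source = TLB_FROM_ATOMIC_POOL;
		plan.decrypt = false;	/* assumed already decrypted in this model */
	} else {
		plan.source = TLB_FROM_BUDDY;
		plan.decrypt = (attrs & DMA_ATTR_CC_SHARED) != 0;
	}
	return plan;
}
```

With this shape, swiotlb_dyn_alloc() and swiotlb_find_slots() would
pass mem and would not need to open-code
"mem->unencrypted ? DMA_ATTR_CC_SHARED : 0".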
> {
> struct page *page;
> - unsigned long attrs = 0;
>
> /*
> * Allocate from the atomic pools if memory is encrypted and
> * the allocation is atomic, because decrypting may block.
> */
> - if (!gfpflags_allow_blocking(gfp) && dev && force_dma_unencrypted(dev)) {
> + if (!gfpflags_allow_blocking(gfp) && (attrs & DMA_ATTR_CC_SHARED)) {
You're removing the check that dev is non-NULL. This is fine, because
the only call with dev == NULL is from swiotlb_dyn_alloc(), and that one
uses GFP_KERNEL (i.e. allows blocking). However, if this is an intended
optimization, I'd rather have it in a separate commit, with this
explanation of why it's OK to do it.
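A rough userspace model of the argument, in case it helps the commit
message: the names device_model, use_atomic_pool_old() and
use_atomic_pool_new() are made up for this sketch, and
DMA_ATTR_CC_SHARED is given a placeholder value. It only models the
two conditions, not the allocation itself.

```c
/* Userspace model of the old and new conditions; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DMA_ATTR_CC_SHARED (1UL << 0)	/* placeholder bit for this model */

/* Hypothetical stand-in for what force_dma_unencrypted(dev) reports. */
struct device_model {
	bool force_unencrypted;
};

/* Old condition: must guard against dev == NULL before using it. */
static inline bool use_atomic_pool_old(bool may_block,
				       const struct device_model *dev)
{
	return !may_block && dev && dev->force_unencrypted;
}

/* New condition: depends only on the gfp flags and attrs, never on dev. */
static inline bool use_atomic_pool_new(bool may_block, unsigned long attrs)
{
	return !may_block && (attrs & DMA_ATTR_CC_SHARED);
}
```

For the dev == NULL caller, may_block is true, so both variants return
false before dev or attrs is ever looked at.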
The rest of the patch looks good to me.
Petr T
> void *vaddr;
>
> if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
> return NULL;
>
> - /* swiotlb considered decrypted by default */
> - if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> - attrs = DMA_ATTR_CC_SHARED;
> -
> return dma_alloc_from_pool(dev, bytes, &vaddr, gfp,
> attrs, dma_coherent_ok);
> }
> @@ -638,7 +651,8 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> else if (phys_limit <= DMA_BIT_MASK(32))
> gfp |= __GFP_DMA32;
>
> - while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit))) {
> + while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit,
> + !!(attrs & DMA_ATTR_CC_SHARED)))) {
> if (IS_ENABLED(CONFIG_ZONE_DMA32) &&
> phys_limit < DMA_BIT_MASK(64) &&
> !(gfp & (__GFP_DMA32 | __GFP_DMA)))
> @@ -657,15 +671,18 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> * swiotlb_free_tlb() - free a dynamically allocated IO TLB buffer
> * @vaddr: Virtual address of the buffer.
> * @bytes: Size of the buffer.
> + * @unencrypted: true if @vaddr was allocated decrypted and must be
> + * re-encrypted before being freed
> */
> -static void swiotlb_free_tlb(void *vaddr, size_t bytes)
> +static void swiotlb_free_tlb(void *vaddr, size_t bytes, bool unencrypted)
> {
> if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
> dma_free_from_pool(NULL, vaddr, bytes))
> return;
>
> /* Intentional leak if pages cannot be encrypted again. */
> - if (!set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> + if (!unencrypted ||
> + !set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> __free_pages(virt_to_page(vaddr), get_order(bytes));
> }
>
> @@ -676,6 +693,7 @@ static void swiotlb_free_tlb(void *vaddr, size_t bytes)
> * @nslabs: Desired (maximum) number of slabs.
> * @nareas: Number of areas.
> * @phys_limit: Maximum DMA buffer physical address.
> + * @attrs: DMA attributes for the allocation.
> * @gfp: GFP flags for the allocations.
> *
> * Allocate and initialize a new IO TLB memory pool. The actual number of
> @@ -686,7 +704,8 @@ static void swiotlb_free_tlb(void *vaddr, size_t bytes)
> */
> static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> unsigned long minslabs, unsigned long nslabs,
> - unsigned int nareas, u64 phys_limit, gfp_t gfp)
> + unsigned int nareas, u64 phys_limit,
> + unsigned long attrs, gfp_t gfp)
> {
> struct io_tlb_pool *pool;
> unsigned int slot_order;
> @@ -704,9 +723,10 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> if (!pool)
> goto error;
> pool->areas = (void *)pool + sizeof(*pool);
> + pool->unencrypted = !!(attrs & DMA_ATTR_CC_SHARED);
>
> tlb_size = nslabs << IO_TLB_SHIFT;
> - while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, gfp))) {
> + while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, attrs, gfp))) {
> if (nslabs <= minslabs)
> goto error_tlb;
> nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
> @@ -724,7 +744,8 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> return pool;
>
> error_slots:
> - swiotlb_free_tlb(page_address(tlb), tlb_size);
> + swiotlb_free_tlb(page_address(tlb), tlb_size,
> + !!(attrs & DMA_ATTR_CC_SHARED));
> error_tlb:
> kfree(pool);
> error:
> @@ -742,7 +763,9 @@ static void swiotlb_dyn_alloc(struct work_struct *work)
> struct io_tlb_pool *pool;
>
> pool = swiotlb_alloc_pool(NULL, IO_TLB_MIN_SLABS, default_nslabs,
> - default_nareas, mem->phys_limit, GFP_KERNEL);
> + default_nareas, mem->phys_limit,
> + mem->unencrypted ? DMA_ATTR_CC_SHARED : 0,
> + GFP_KERNEL);
> if (!pool) {
> pr_warn_ratelimited("Failed to allocate new pool");
> return;
> @@ -762,7 +785,7 @@ static void swiotlb_dyn_free(struct rcu_head *rcu)
> size_t tlb_size = pool->end - pool->start;
>
> free_pages((unsigned long)pool->slots, get_order(slots_size));
> - swiotlb_free_tlb(pool->vaddr, tlb_size);
> + swiotlb_free_tlb(pool->vaddr, tlb_size, pool->unencrypted);
> kfree(pool);
> }
>
> @@ -1037,13 +1060,11 @@ static void dec_transient_used(struct io_tlb_mem *mem, unsigned int nslots)
> * Return: Index of the first allocated slot, or -1 on error.
> */
> static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool,
> - int area_index, phys_addr_t orig_addr, size_t alloc_size,
> - unsigned int alloc_align_mask)
> + int area_index, phys_addr_t orig_addr, dma_addr_t tbl_dma_addr,
> + size_t alloc_size, unsigned int alloc_align_mask)
> {
> struct io_tlb_area *area = pool->areas + area_index;
> unsigned long boundary_mask = dma_get_seg_boundary(dev);
> - dma_addr_t tbl_dma_addr =
> - phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
> unsigned long max_slots = get_max_slots(boundary_mask);
> unsigned int iotlb_align_mask = dma_get_min_align_mask(dev);
> unsigned int nslots = nr_slots(alloc_size), stride;
> @@ -1056,6 +1077,8 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> BUG_ON(!nslots);
> BUG_ON(area_index >= pool->nareas);
>
> + tbl_dma_addr &= boundary_mask;
> +
> /*
> * Historically, swiotlb allocations >= PAGE_SIZE were guaranteed to be
> * page-aligned in the absence of any other alignment requirements.
> @@ -1167,6 +1190,7 @@ static int swiotlb_search_area(struct device *dev, int start_cpu,
> {
> struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> struct io_tlb_pool *pool;
> + dma_addr_t tbl_dma_addr;
> int area_index;
> int index = -1;
>
> @@ -1175,9 +1199,15 @@ static int swiotlb_search_area(struct device *dev, int start_cpu,
> if (cpu_offset >= pool->nareas)
> continue;
> area_index = (start_cpu + cpu_offset) & (pool->nareas - 1);
> +
> + if (mem->unencrypted)
> + tbl_dma_addr = phys_to_dma_unencrypted(dev, pool->start);
> + else
> + tbl_dma_addr = phys_to_dma_encrypted(dev, pool->start);
> +
> index = swiotlb_search_pool_area(dev, pool, area_index,
> - orig_addr, alloc_size,
> - alloc_align_mask);
> + orig_addr, tbl_dma_addr,
> + alloc_size, alloc_align_mask);
> if (index >= 0) {
> *retpool = pool;
> break;
> @@ -1207,6 +1237,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
> {
> struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> struct io_tlb_pool *pool;
> + dma_addr_t tbl_dma_addr;
> unsigned long nslabs;
> unsigned long flags;
> u64 phys_limit;
> @@ -1232,11 +1263,17 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
> nslabs = nr_slots(alloc_size);
> phys_limit = min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
> pool = swiotlb_alloc_pool(dev, nslabs, nslabs, 1, phys_limit,
> + mem->unencrypted ? DMA_ATTR_CC_SHARED : 0,
> GFP_NOWAIT);
> if (!pool)
> return -1;
>
> - index = swiotlb_search_pool_area(dev, pool, 0, orig_addr,
> + if (mem->unencrypted)
> + tbl_dma_addr = phys_to_dma_unencrypted(dev, pool->start);
> + else
> + tbl_dma_addr = phys_to_dma_encrypted(dev, pool->start);
> +
> + index = swiotlb_search_pool_area(dev, pool, 0, orig_addr, tbl_dma_addr,
> alloc_size, alloc_align_mask);
> if (index < 0) {
> swiotlb_dyn_free(&pool->rcu);
> @@ -1281,15 +1318,23 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
> size_t alloc_size, unsigned int alloc_align_mask,
> struct io_tlb_pool **retpool)
> {
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> struct io_tlb_pool *pool;
> + dma_addr_t tbl_dma_addr;
> int start, i;
> int index;
>
> - *retpool = pool = &dev->dma_io_tlb_mem->defpool;
> + *retpool = pool = &mem->defpool;
> + if (mem->unencrypted)
> + tbl_dma_addr = phys_to_dma_unencrypted(dev, pool->start);
> + else
> + tbl_dma_addr = phys_to_dma_encrypted(dev, pool->start);
> +
> i = start = raw_smp_processor_id() & (pool->nareas - 1);
> do {
> index = swiotlb_search_pool_area(dev, pool, i, orig_addr,
> - alloc_size, alloc_align_mask);
> + tbl_dma_addr, alloc_size,
> + alloc_align_mask);
> if (index >= 0)
> return index;
> if (++i >= pool->nareas)
> @@ -1372,9 +1417,19 @@ static unsigned long mem_used(struct io_tlb_mem *mem)
> * any pre- or post-padding for alignment
> * @alloc_align_mask: Required start and end alignment of the allocated buffer
> * @dir: DMA direction
> - * @attrs: Optional DMA attributes for the map operation
> + * @attrs: Optional DMA attributes for the map operation, updated
> + * to match the selected SWIOTLB pool
> *
> * Find and allocate a suitable sequence of IO TLB slots for the request.
> + * The device's SWIOTLB pool must match the device's current DMA encryption
> + * requirements. If the device requires decrypted DMA, bouncing is done through
> + * an unencrypted pool and the mapping is marked shared. If the device can DMA
> + * to encrypted memory, bouncing is done through an encrypted pool even when the
> + * original DMA address was unencrypted. Enabling encrypted DMA for a device is
> + * therefore expected to update its default io_tlb_mem to an encrypted pool, so
> + * later bounce mappings for both encrypted and decrypted original memory use
> + * that encrypted pool.
> + *
> * The allocated space starts at an alignment specified by alloc_align_mask,
> * and the size of the allocated space is rounded up so that the total amount
> * of allocated space is a multiple of (alloc_align_mask + 1). If
> @@ -1411,6 +1466,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
>
> + /* swiotlb pool is incorrect for this device */
> + if (unlikely(mem->unencrypted != force_dma_unencrypted(dev)))
> + return (phys_addr_t)DMA_MAPPING_ERROR;
> +
> + /* Force attrs to match the kind of memory in the pool */
> + if (mem->unencrypted)
> + *attrs |= DMA_ATTR_CC_SHARED;
> + else
> + *attrs &= ~DMA_ATTR_CC_SHARED;
> +
> /*
> * The default swiotlb memory pool is allocated with PAGE_SIZE
> * alignment. If a mapping is requested with larger alignment,
> @@ -1608,8 +1673,11 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
> if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
> return DMA_MAPPING_ERROR;
>
> - /* Ensure that the address returned is DMA'ble */
> - dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
> + if (attrs & DMA_ATTR_CC_SHARED)
> + dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
> + else
> + dma_addr = phys_to_dma_encrypted(dev, swiotlb_addr);
> +
> if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
> __swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, dir,
> attrs | DMA_ATTR_SKIP_CPU_SYNC,
> @@ -1773,7 +1841,7 @@ static inline void swiotlb_create_debugfs_files(struct io_tlb_mem *mem,
>
> #ifdef CONFIG_DMA_RESTRICTED_POOL
>
> -struct page *swiotlb_alloc(struct device *dev, size_t size)
> +struct page *swiotlb_alloc(struct device *dev, size_t size, unsigned long attrs)
> {
> struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> struct io_tlb_pool *pool;
> @@ -1784,6 +1852,9 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
> if (!mem)
> return NULL;
>
> + if (mem->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
> + return NULL;
> +
> align = (1 << (get_order(size) + PAGE_SHIFT)) - 1;
> index = swiotlb_find_slots(dev, 0, size, align, &pool);
> if (index == -1)
> @@ -1859,9 +1930,18 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> kfree(mem);
> return -ENOMEM;
> }
> + /*
> + * if platform supports memory encryption,
> + * restricted mem pool is decrypted by default
> + */
> + if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
> + mem->unencrypted = true;
> + set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
> + rmem->size >> PAGE_SHIFT);
> + } else {
> + mem->unencrypted = false;
> + }
>
> - set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
> - rmem->size >> PAGE_SHIFT);
> swiotlb_init_io_tlb_pool(pool, rmem->base, nslabs,
> false, nareas);
> mem->force_bounce = true;
* Re: [PATCH v4 10/47] x86/tsc: Consolidate forcing of X86_FEATURE_TSC_KNOWN_FREQ for PV code
From: Sean Christopherson @ 2026-06-09 12:28 UTC (permalink / raw)
To: Thomas Gleixner
Cc: David Woodhouse, Paolo Bonzini, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Kiryl Shutsemau, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
Juergen Gross, Daniel Lezcano, John Stultz, H. Peter Anvin,
Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, Tom Lendacky, Nikunj A Dadhania,
Michael Kelley
In-Reply-To: <87a4t440js.ffs@fw13>
On Tue, Jun 09, 2026, Thomas Gleixner wrote:
> On Mon, Jun 08 2026 at 15:38, Sean Christopherson wrote:
> > On Sat, Jun 06, 2026, David Woodhouse wrote:
> >> > Along with:
> >> >
> >> > if (!hypervisor_is_type(X86_HYPER_NATIVE)) {
> >> > if (tsc_khz_early)
> >> > pr_warn("Ignoring non-sensical tsc_early_khz command line argument\n");
> >> >
> >> > or something daft like that.
> >
> > Ya, I ended up in the same place once Sashiko pointed out that skipping the SNP/TDX
> > setup was hazardous[*], and also once I realized that tsc_khz_early *complemented*
> > the refinement instead of replacing it.
> >
> > This is what I have locally:
> >
> > if (cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC))
> > known_tsc_khz = snp_secure_tsc_init();
> > else if (boot_cpu_has(X86_FEATURE_TDX_GUEST))
> > known_tsc_khz = tdx_tsc_init();
> >
> > /*
> > * If the TSC frequency wasn't provided by trusted firmware, try to get
> > * it from the hypervisor (which is untrusted when running as a CoCo guest).
> > */
> > if (!known_tsc_khz && x86_init.hyper.get_tsc_khz)
> > known_tsc_khz = x86_init.hyper.get_tsc_khz();
> >
> > /*
> > * Mark the TSC frequency as known if it was obtained from a hypervisor
> > * or trusted firmware. Don't mark the frequency as known if the user
> > * specified the frequency, as the user-provided frequency is intended
> > * as a "starting point", not a known, guaranteed frequency.
> > */
> > if (known_tsc_khz && !tsc_early_khz)
> > setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
>
> If the frequency is known via the above then you want to set the
> KNOWN_FREQ feature bit unconditionally. SNP/TDX/hypervisor override the
> command line argument as you print below.
Doh, forgot to remove that check when I shuffled things around. Thank you!
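For reference, a userspace sketch of the decision with the
tsc_early_khz condition dropped. This is only a model: the
tsc_sources_model struct and both helpers are hypothetical, and the
SNP and TDX paths are folded into a single firmware_khz field.

```c
/* Userspace model of the decision only; not kernel code. */
#include <assert.h>
#include <stdbool.h>

struct tsc_sources_model {
	unsigned int firmware_khz;	/* SNP Secure TSC or TDX, 0 if absent */
	unsigned int hypervisor_khz;	/* hypervisor-provided value, 0 if absent */
	unsigned int early_khz;		/* tsc_early_khz command line, 0 if unset */
};

/* Trusted firmware wins; otherwise fall back to the hypervisor. */
static inline unsigned int
known_tsc_khz_model(const struct tsc_sources_model *s)
{
	return s->firmware_khz ? s->firmware_khz : s->hypervisor_khz;
}

/* Whether X86_FEATURE_TSC_KNOWN_FREQ would be forced. */
static inline bool force_tsc_known_freq(const struct tsc_sources_model *s)
{
	/* early_khz deliberately does not participate in the decision. */
	return known_tsc_khz_model(s) != 0;
}
```

A command line value alone never marks the frequency as known, and it
no longer blocks the flag when firmware or the hypervisor provides one.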
* Re: [PATCH v6 04/20] dma-pool: track decrypted atomic pools and select them via attrs
From: Petr Tesarik @ 2026-06-09 12:23 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
Mostafa Saleh, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Jiri Pirko,
Michael Kelley
In-Reply-To: <20260604083959.1265923-5-aneesh.kumar@kernel.org>
On Thu, 4 Jun 2026 14:09:43 +0530
"Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:
> Teach the atomic DMA pool code to distinguish between encrypted and
> unencrypted pools, and make pool allocation select the matching pool based
> on DMA attributes.
>
> Introduce a dma_gen_pool wrapper that records whether a pool is
> unencrypted, initialize that state when the atomic pools are created, and
> use it when expanding and resizing the pools. Update dma_alloc_from_pool()
> to take attrs and skip pools whose encrypted state does not match
> DMA_ATTR_CC_SHARED. Update dma_free_from_pool() accordingly.
>
> Also pass DMA_ATTR_CC_SHARED from the swiotlb atomic allocation path so
> decrypted swiotlb allocations are taken from the correct atomic pool.
>
> Tested-by: Jiri Pirko <jiri@nvidia.com>
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Mostafa Saleh <smostafa@google.com>
> Reviewed-by: Mostafa Saleh <smostafa@google.com>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
FWIW this also looks good to me, but I don't think I'm the best person
to review changes to the generic DMA pools.
Petr T
> ---
> drivers/iommu/dma-iommu.c | 2 +-
> include/linux/dma-map-ops.h | 2 +-
> kernel/dma/direct.c | 11 ++-
> kernel/dma/pool.c | 167 +++++++++++++++++++++++-------------
> kernel/dma/swiotlb.c | 7 +-
> 5 files changed, 123 insertions(+), 66 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 54d96e847f16..c2595bee3d41 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1673,7 +1673,7 @@ void *iommu_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
> if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
> !gfpflags_allow_blocking(gfp) && !coherent)
> page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr,
> - gfp, NULL);
> + gfp, attrs, NULL);
> else
> cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
> if (!cpu_addr)
> diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
> index 6a1832a73cad..696b2c3a2305 100644
> --- a/include/linux/dma-map-ops.h
> +++ b/include/linux/dma-map-ops.h
> @@ -212,7 +212,7 @@ void *dma_common_pages_remap(struct page **pages, size_t size, pgprot_t prot,
> void dma_common_free_remap(void *cpu_addr, size_t size);
>
> struct page *dma_alloc_from_pool(struct device *dev, size_t size,
> - void **cpu_addr, gfp_t flags,
> + void **cpu_addr, gfp_t flags, unsigned long attrs,
> bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t));
> bool dma_free_from_pool(struct device *dev, void *start, size_t size);
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 90dc5057a0c0..681f16a984ab 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -154,7 +154,7 @@ static bool dma_direct_use_pool(struct device *dev, gfp_t gfp)
> }
>
> static void *dma_direct_alloc_from_pool(struct device *dev, size_t size,
> - dma_addr_t *dma_handle, gfp_t gfp)
> + dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
> {
> struct page *page;
> u64 phys_limit;
> @@ -164,7 +164,8 @@ static void *dma_direct_alloc_from_pool(struct device *dev, size_t size,
> return NULL;
>
> gfp |= dma_direct_optimal_gfp_mask(dev, &phys_limit);
> - page = dma_alloc_from_pool(dev, size, &ret, gfp, dma_coherent_ok);
> + page = dma_alloc_from_pool(dev, size, &ret, gfp, attrs,
> + dma_coherent_ok);
> if (!page)
> return NULL;
> *dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
> @@ -253,7 +254,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> */
> if ((remap || (attrs & DMA_ATTR_CC_SHARED)) &&
> dma_direct_use_pool(dev, gfp))
> - return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
> + return dma_direct_alloc_from_pool(dev, size, dma_handle,
> + gfp, attrs);
>
> if (is_swiotlb_for_alloc(dev)) {
> page = dma_direct_alloc_swiotlb(dev, size);
> @@ -401,7 +403,8 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
> attrs |= DMA_ATTR_CC_SHARED;
>
> if ((attrs & DMA_ATTR_CC_SHARED) && dma_direct_use_pool(dev, gfp))
> - return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
> + return dma_direct_alloc_from_pool(dev, size, dma_handle,
> + gfp, attrs);
>
> if (is_swiotlb_for_alloc(dev)) {
> page = dma_direct_alloc_swiotlb(dev, size);
> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index 2b2fbb709242..be78474a6c49 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -12,12 +12,18 @@
> #include <linux/set_memory.h>
> #include <linux/slab.h>
> #include <linux/workqueue.h>
> +#include <linux/cc_platform.h>
>
> -static struct gen_pool *atomic_pool_dma __ro_after_init;
> +struct dma_gen_pool {
> + bool unencrypted;
> + struct gen_pool *pool;
> +};
> +
> +static struct dma_gen_pool atomic_pool_dma __ro_after_init;
> static unsigned long pool_size_dma;
> -static struct gen_pool *atomic_pool_dma32 __ro_after_init;
> +static struct dma_gen_pool atomic_pool_dma32 __ro_after_init;
> static unsigned long pool_size_dma32;
> -static struct gen_pool *atomic_pool_kernel __ro_after_init;
> +static struct dma_gen_pool atomic_pool_kernel __ro_after_init;
> static unsigned long pool_size_kernel;
>
> /* Size can be defined by the coherent_pool command line */
> @@ -76,11 +82,12 @@ static bool cma_in_zone(gfp_t gfp)
> return true;
> }
>
> -static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
> +static int atomic_pool_expand(struct dma_gen_pool *dma_pool, size_t pool_size,
> gfp_t gfp)
> {
> unsigned int order;
> struct page *page = NULL;
> + bool leak_pages = false;
> void *addr;
> int ret = -ENOMEM;
>
> @@ -113,12 +120,17 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
> * Memory in the atomic DMA pools must be unencrypted, the pools do not
> * shrink so no re-encryption occurs in dma_direct_free().
> */
> - ret = set_memory_decrypted((unsigned long)page_to_virt(page),
> - 1 << order);
> - if (ret)
> - goto remove_mapping;
> - ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
> - pool_size, NUMA_NO_NODE);
> + if (dma_pool->unencrypted) {
> + ret = set_memory_decrypted((unsigned long)page_to_virt(page),
> + 1 << order);
> + if (ret) {
> + leak_pages = true;
> + goto remove_mapping;
> + }
> + }
> +
> + ret = gen_pool_add_virt(dma_pool->pool, (unsigned long)addr,
> + page_to_phys(page), pool_size, NUMA_NO_NODE);
> if (ret)
> goto encrypt_mapping;
>
> @@ -126,62 +138,67 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
> return 0;
>
> encrypt_mapping:
> - ret = set_memory_encrypted((unsigned long)page_to_virt(page),
> - 1 << order);
> - if (WARN_ON_ONCE(ret)) {
> - /* Decrypt succeeded but encrypt failed, purposely leak */
> - goto out;
> - }
> + if (dma_pool->unencrypted &&
> + set_memory_encrypted((unsigned long)page_to_virt(page), 1 << order))
> + leak_pages = true;
> +
> remove_mapping:
> #ifdef CONFIG_DMA_DIRECT_REMAP
> dma_common_free_remap(addr, pool_size);
> free_page:
> - __free_pages(page, order);
> + if (!leak_pages)
> + __free_pages(page, order);
> #endif
> out:
> return ret;
> }
>
> -static void atomic_pool_resize(struct gen_pool *pool, gfp_t gfp)
> +static void atomic_pool_resize(struct dma_gen_pool *dma_pool, gfp_t gfp)
> {
> - if (pool && gen_pool_avail(pool) < atomic_pool_size)
> - atomic_pool_expand(pool, gen_pool_size(pool), gfp);
> + if (dma_pool->pool && gen_pool_avail(dma_pool->pool) < atomic_pool_size)
> + atomic_pool_expand(dma_pool, gen_pool_size(dma_pool->pool), gfp);
> }
>
> static void atomic_pool_work_fn(struct work_struct *work)
> {
> if (IS_ENABLED(CONFIG_ZONE_DMA))
> - atomic_pool_resize(atomic_pool_dma,
> + atomic_pool_resize(&atomic_pool_dma,
> GFP_KERNEL | GFP_DMA);
> if (IS_ENABLED(CONFIG_ZONE_DMA32))
> - atomic_pool_resize(atomic_pool_dma32,
> + atomic_pool_resize(&atomic_pool_dma32,
> GFP_KERNEL | GFP_DMA32);
> - atomic_pool_resize(atomic_pool_kernel, GFP_KERNEL);
> + atomic_pool_resize(&atomic_pool_kernel, GFP_KERNEL);
> }
>
> -static __init struct gen_pool *__dma_atomic_pool_init(size_t pool_size,
> - gfp_t gfp)
> +static __init struct dma_gen_pool *__dma_atomic_pool_init(struct dma_gen_pool *dma_pool,
> + size_t pool_size, gfp_t gfp)
> {
> - struct gen_pool *pool;
> int ret;
>
> - pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
> - if (!pool)
> + dma_pool->pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
> + if (!dma_pool->pool)
> return NULL;
>
> - gen_pool_set_algo(pool, gen_pool_first_fit_order_align, NULL);
> + gen_pool_set_algo(dma_pool->pool, gen_pool_first_fit_order_align, NULL);
> +
> + /* if platform is using memory encryption atomic pools are by default decrypted. */
> + if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> + dma_pool->unencrypted = true;
> + else
> + dma_pool->unencrypted = false;
>
> - ret = atomic_pool_expand(pool, pool_size, gfp);
> + ret = atomic_pool_expand(dma_pool, pool_size, gfp);
> if (ret) {
> - gen_pool_destroy(pool);
> + gen_pool_destroy(dma_pool->pool);
> + dma_pool->pool = NULL;
> pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic allocation\n",
> pool_size >> 10, &gfp);
> return NULL;
> }
>
> pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n",
> - gen_pool_size(pool) >> 10, &gfp);
> - return pool;
> + gen_pool_size(dma_pool->pool) >> 10, &gfp);
> + return dma_pool;
> }
>
> #ifdef CONFIG_ZONE_DMA32
> @@ -207,21 +224,22 @@ static int __init dma_atomic_pool_init(void)
>
> /* All memory might be in the DMA zone(s) to begin with */
> if (has_managed_zone(ZONE_NORMAL)) {
> - atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,
> - GFP_KERNEL);
> - if (!atomic_pool_kernel)
> + __dma_atomic_pool_init(&atomic_pool_kernel, atomic_pool_size, GFP_KERNEL);
> + if (!atomic_pool_kernel.pool)
> ret = -ENOMEM;
> }
> +
> if (has_managed_dma()) {
> - atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
> - GFP_KERNEL | GFP_DMA);
> - if (!atomic_pool_dma)
> + __dma_atomic_pool_init(&atomic_pool_dma, atomic_pool_size,
> + GFP_KERNEL | GFP_DMA);
> + if (!atomic_pool_dma.pool)
> ret = -ENOMEM;
> }
> +
> if (has_managed_dma32) {
> - atomic_pool_dma32 = __dma_atomic_pool_init(atomic_pool_size,
> - GFP_KERNEL | GFP_DMA32);
> - if (!atomic_pool_dma32)
> + __dma_atomic_pool_init(&atomic_pool_dma32, atomic_pool_size,
> + GFP_KERNEL | GFP_DMA32);
> + if (!atomic_pool_dma32.pool)
> ret = -ENOMEM;
> }
>
> @@ -230,19 +248,44 @@ static int __init dma_atomic_pool_init(void)
> }
> postcore_initcall(dma_atomic_pool_init);
>
> -static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
> +static inline struct dma_gen_pool *__dma_guess_pool(struct dma_gen_pool *first,
> + struct dma_gen_pool *second, struct dma_gen_pool *third)
> {
> - if (prev == NULL) {
> + if (first->pool)
> + return first;
> + if (second && second->pool)
> + return second;
> + if (third && third->pool)
> + return third;
> + return NULL;
> +}
> +
> +static inline struct dma_gen_pool *dma_guess_pool(struct dma_gen_pool *prev,
> + gfp_t gfp)
> +{
> + if (!prev) {
> if (gfp & GFP_DMA)
> - return atomic_pool_dma ?: atomic_pool_dma32 ?: atomic_pool_kernel;
> + return __dma_guess_pool(&atomic_pool_dma,
> + &atomic_pool_dma32,
> + &atomic_pool_kernel);
> +
> if (gfp & GFP_DMA32)
> - return atomic_pool_dma32 ?: atomic_pool_dma ?: atomic_pool_kernel;
> - return atomic_pool_kernel ?: atomic_pool_dma32 ?: atomic_pool_dma;
> + return __dma_guess_pool(&atomic_pool_dma32,
> + &atomic_pool_dma,
> + &atomic_pool_kernel);
> +
> + return __dma_guess_pool(&atomic_pool_kernel,
> + &atomic_pool_dma32,
> + &atomic_pool_dma);
> }
> - if (prev == atomic_pool_kernel)
> - return atomic_pool_dma32 ? atomic_pool_dma32 : atomic_pool_dma;
> - if (prev == atomic_pool_dma32)
> - return atomic_pool_dma;
> +
> + if (prev == &atomic_pool_kernel)
> + return __dma_guess_pool(&atomic_pool_dma32,
> + &atomic_pool_dma, NULL);
> +
> + if (prev == &atomic_pool_dma32)
> + return __dma_guess_pool(&atomic_pool_dma, NULL, NULL);
> +
> return NULL;
> }
>
> @@ -272,16 +315,20 @@ static struct page *__dma_alloc_from_pool(struct device *dev, size_t size,
> }
>
> struct page *dma_alloc_from_pool(struct device *dev, size_t size,
> - void **cpu_addr, gfp_t gfp,
> + void **cpu_addr, gfp_t gfp, unsigned long attrs,
> bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t))
> {
> - struct gen_pool *pool = NULL;
> + struct dma_gen_pool *dma_pool = NULL;
> struct page *page;
> bool pool_found = false;
>
> - while ((pool = dma_guess_pool(pool, gfp))) {
> + while ((dma_pool = dma_guess_pool(dma_pool, gfp))) {
> +
> + if (dma_pool->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
> + continue;
> +
> pool_found = true;
> - page = __dma_alloc_from_pool(dev, size, pool, cpu_addr,
> + page = __dma_alloc_from_pool(dev, size, dma_pool->pool, cpu_addr,
> phys_addr_ok);
> if (page)
> return page;
> @@ -296,12 +343,14 @@ struct page *dma_alloc_from_pool(struct device *dev, size_t size,
>
> bool dma_free_from_pool(struct device *dev, void *start, size_t size)
> {
> - struct gen_pool *pool = NULL;
> + struct dma_gen_pool *dma_pool = NULL;
> +
> + while ((dma_pool = dma_guess_pool(dma_pool, 0))) {
>
> - while ((pool = dma_guess_pool(pool, 0))) {
> - if (!gen_pool_has_addr(pool, (unsigned long)start, size))
> + if (!gen_pool_has_addr(dma_pool->pool, (unsigned long)start, size))
> continue;
> - gen_pool_free(pool, (unsigned long)start, size);
> +
> + gen_pool_free(dma_pool->pool, (unsigned long)start, size);
> return true;
> }
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index ac03a6856c2e..be4d418d92ac 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -612,6 +612,7 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> u64 phys_limit, gfp_t gfp)
> {
> struct page *page;
> + unsigned long attrs = 0;
>
> /*
> * Allocate from the atomic pools if memory is encrypted and
> @@ -623,8 +624,12 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
> return NULL;
>
> + /* swiotlb considered decrypted by default */
> + if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> + attrs = DMA_ATTR_CC_SHARED;
> +
> return dma_alloc_from_pool(dev, bytes, &vaddr, gfp,
> - dma_coherent_ok);
> + attrs, dma_coherent_ok);
> }
>
> gfp &= ~GFP_ZONEMASK;
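
For readers following the thread, here is a minimal userspace sketch of the
selection rule the commit message describes: a pool is skipped when its
encrypted state does not match the shared attribute in the request. It is not
kernel code. The names model_pool, model_alloc_from_pools and
MODEL_ATTR_CC_SHARED are hypothetical stand-ins for struct dma_gen_pool,
dma_alloc_from_pool() and DMA_ATTR_CC_SHARED, the attribute bit value is a
placeholder, and the pool is reduced to a byte counter instead of a real
gen_pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bit; the real DMA_ATTR_CC_SHARED value is defined by the kernel. */
#define MODEL_ATTR_CC_SHARED (1UL << 0)

/* Hypothetical userspace stand-in for the patch's struct dma_gen_pool. */
struct model_pool {
	bool present;      /* models dma_pool->pool != NULL */
	bool unencrypted;  /* models dma_pool->unencrypted */
	size_t avail;      /* bytes left; stands in for the gen_pool itself */
};

/*
 * Walk the candidate pools in order and take the first one that exists,
 * matches the requested encryption state, and has enough room.
 * Returns the index of the chosen pool, or -1 if none fits.
 */
static int model_alloc_from_pools(struct model_pool *pools, size_t n,
				  size_t size, unsigned long attrs)
{
	bool want_unencrypted = !!(attrs & MODEL_ATTR_CC_SHARED);

	assert(pools != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!pools[i].present)
			continue;
		/* Same comparison as in the patch: skip on a state mismatch. */
		if (pools[i].unencrypted != want_unencrypted)
			continue;
		if (pools[i].avail < size)
			continue;
		pools[i].avail -= size;
		return (int)i;
	}
	return -1;
}
```

In this model a request carrying the shared attribute can only be served from
a pool marked unencrypted, and a request without it can only be served from an
encrypted pool, which is the behaviour the patch adds to the allocation loop.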
^ permalink raw reply