* [RFC PATCH v3 0/5] dma-mapping: Fixes for memory encryption
@ 2026-04-08 19:47 Mostafa Saleh
2026-04-08 19:47 ` [RFC PATCH v3 1/5] swiotlb: Return state of memory from swiotlb_alloc() Mostafa Saleh
` (5 more replies)
0 siblings, 6 replies; 9+ messages in thread
From: Mostafa Saleh @ 2026-04-08 19:47 UTC (permalink / raw)
To: iommu, linux-kernel
Cc: robin.murphy, m.szyprowski, will, maz, suzuki.poulose,
catalin.marinas, jiri, jgg, aneesh.kumar, Mostafa Saleh
Introduction
============
This is the third version of the fixes for direct-dma dealing with
memory encryption and restricted-dma.
Changes in v3:
- Instead of extending the logic by using is_swiotlb_for_alloc(),
follow Jason’s suggestion and propagate the state of the memory
allocated.
- Remove the checks from dma_set_*(), based on Jason's suggestion
- Remove documentation for now until we are close to the final
proposal and add it later if needed.
Background
==========
At the moment the following hypervisor guests will need to deal with
memory encryption:
- pKVM (ARM): Documentation/virt/kvm/arm/hypercalls.rst
- ARM CCA: Documentation/arch/arm64/arm-cca.rst
- Intel TDX: Documentation/arch/x86/tdx.rst
- AMD SEV: Documentation/arch/x86/amd-memory-encryption.rst
- PPC SVM: Documentation/arch/powerpc/ultravisor.rst
- Hyper-V: Documentation/virt/hyperv/coco.rst
AFAICT, all (confidential) guests running under those have their
memory encrypted by default, and guests then explicitly share memory
back if needed.
The main use cases for decrypting (sharing) memory are:
- Sharing memory back to the host through SWIOTLB (for virtio...)
- Hypervisor specific communication (ex: snp_msg, GHCB, VMBUS...)
- Shared/emulated resources: VGARAM (x86-SEV), GIC ITS tables (arm64)
Encrypting memory, on the other hand, is typically used to revert
set_memory_decrypted(), either in error handling or when freeing
shared resources back to the kernel.
Design
======
This series focuses mainly on dma-direct interaction with memory
encryption, which is the complicated case.
At the moment memory encryption and dma-direct interact in 2 ways:
1) force_dma_unencrypted(): if true, memory will be decrypted by
default on allocation.
2) Restricted DMA: where memory is pre-decrypted and managed by
SWIOTLB.
With a third possible usage on the way [1] where the DMA-API allows
an attr for decrypted memory.
Instead of open coding many checks with is_swiotlb_for_alloc() and
force_dma_unencrypted(), make __dma_direct_alloc_pages() return the
state of the allocated memory, encapsulated in the new internal type
dma_page.
Based on the memory state, dma-direct can then decide what to do:
- Memory needs to be decrypted but is not: dma-direct will decrypt
the memory and use the proper phys address conversions and page
table prot.
- Memory is already decrypted: dma-direct will not decrypt the memory
but it will use the proper phys address conversions and page table
prot.
The free path is trickier as the allocation information is already
lost, so each allocator has to be checked separately;
swiotlb_is_decrypted() is added for SWIOTLB, which is the only
allocator that can return decrypted memory.
Testing
=======
I was able to test this only under pKVM (arm64) as I have no
access to other systems.
Future work
===========
I am also looking at two other things related to restricted DMA
pools; they should be a different series:
1) Private pools: Currently all restricted DMA pools are decrypted
(shared) by default. Having private pools would be useful for
device assignment when bouncing is needed (as for non-coherent
devices)
2) Optimizations for memory sharing. In some cases, allocations from
restricted dma-pools are page aligned. For CoCo cases, that means
that it will be cheaper to share memory in-place instead of
bouncing.
Both of these add new semantics which need to be done carefully to
avoid regressions, and might be a good candidate for a topic in the
next LPC.
Patches
=======
- 1 Extend swiotlb
- 2-4 Refactoring
- 5 Fixes
v1: https://lore.kernel.org/all/20260305170335.963568-1-smostafa@google.com/
v2: https://lore.kernel.org/all/20260330145043.1586623-1-smostafa@google.com/
[1] https://lore.kernel.org/all/20260305123641.164164-1-jiri@resnulli.us/
Mostafa Saleh (5):
swiotlb: Return state of memory from swiotlb_alloc()
dma-mapping: Move encryption in __dma_direct_free_pages()
dma-mapping: Decrypt memory on remap
dma-mapping: Encapsulate memory state during allocation
dma-mapping: Fix memory decryption issues
include/linux/swiotlb.h | 25 +++++++-
kernel/dma/direct.c | 134 +++++++++++++++++++++++++++-------------
kernel/dma/swiotlb.c | 23 ++++++-
3 files changed, 135 insertions(+), 47 deletions(-)
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC PATCH v3 1/5] swiotlb: Return state of memory from swiotlb_alloc()
2026-04-08 19:47 [RFC PATCH v3 0/5] dma-mapping: Fixes for memory encryption Mostafa Saleh
@ 2026-04-08 19:47 ` Mostafa Saleh
2026-04-08 19:47 ` [RFC PATCH v3 2/5] dma-mapping: Move encryption in __dma_direct_free_pages() Mostafa Saleh
` (4 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Mostafa Saleh @ 2026-04-08 19:47 UTC (permalink / raw)
To: iommu, linux-kernel
Cc: robin.murphy, m.szyprowski, will, maz, suzuki.poulose,
catalin.marinas, jiri, jgg, aneesh.kumar, Mostafa Saleh
Make swiotlb_alloc() return the state of the allocated memory; at
the moment all the pools are decrypted, but that will change soon.
In the next patches dma-direct will use the returned state to
determine whether to decrypt the memory and use the proper memory
decryption/encryption related functions.
Also, add swiotlb_is_decrypted(), which will be used before calling
swiotlb_free() to check whether the memory needs to be encrypted
by the caller.
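The optional out-parameter pattern used here can be sketched in
isolation as follows (a hypothetical, kernel-independent illustration;
pool_alloc() and the pool flag stand in for swiotlb_alloc() and
io_tlb_pool::decrypted):

```c
#include <stddef.h>
#include <stdlib.h>

enum page_state { PAGE_DEFAULT, PAGE_DECRYPTED };

/* Stand-in for io_tlb_pool::decrypted. */
static int pool_decrypted = 1;

/*
 * Allocate and report the memory state through an optional
 * out-parameter; callers that don't care may pass NULL.
 */
static void *pool_alloc(size_t size, enum page_state *state)
{
	void *p = malloc(size);

	if (p && state)
		*state = pool_decrypted ? PAGE_DECRYPTED : PAGE_DEFAULT;
	return p;
}
```

Tolerating a NULL state pointer is what lets existing callers (such as
dma_direct_alloc_swiotlb() before patch 4) migrate incrementally.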
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
include/linux/swiotlb.h | 25 +++++++++++++++++++++++--
kernel/dma/direct.c | 2 +-
kernel/dma/swiotlb.c | 23 ++++++++++++++++++++++-
3 files changed, 46 insertions(+), 4 deletions(-)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 3dae0f592063..24be65494ce8 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -63,6 +63,7 @@ extern void __init swiotlb_update_mem_attributes(void);
* @area_nslabs: Number of slots in each area.
* @areas: Array of memory area descriptors.
* @slots: Array of slot descriptors.
+ * @decrypted: Whether the pool was decrypted or left in default state.
* @node: Member of the IO TLB memory pool list.
* @rcu: RCU head for swiotlb_dyn_free().
* @transient: %true if transient memory pool.
@@ -77,6 +78,7 @@ struct io_tlb_pool {
unsigned int area_nslabs;
struct io_tlb_area *areas;
struct io_tlb_slot *slots;
+ bool decrypted;
#ifdef CONFIG_SWIOTLB_DYNAMIC
struct list_head node;
struct rcu_head rcu;
@@ -281,16 +283,31 @@ static inline void swiotlb_sync_single_for_cpu(struct device *dev,
extern void swiotlb_print_info(void);
+/*
+ * This contains the state of pages returned by swiotlb_alloc()
+ * A page can either be:
+ * SWIOTLB_PAGE_DEFAULT: The page was not decrypted by the pool.
+ * SWIOTLB_PAGE_DECRYPTED: The page was decrypted by the pool.
+ */
+enum swiotlb_page_state {
+ SWIOTLB_PAGE_DEFAULT,
+ SWIOTLB_PAGE_DECRYPTED,
+};
+
#ifdef CONFIG_DMA_RESTRICTED_POOL
-struct page *swiotlb_alloc(struct device *dev, size_t size);
+struct page *swiotlb_alloc(struct device *dev, size_t size,
+ enum swiotlb_page_state *state);
bool swiotlb_free(struct device *dev, struct page *page, size_t size);
+bool swiotlb_is_decrypted(struct device *dev, struct page *page, size_t size);
+
static inline bool is_swiotlb_for_alloc(struct device *dev)
{
return dev->dma_io_tlb_mem->for_alloc;
}
#else
-static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
+static inline struct page *swiotlb_alloc(struct device *dev, size_t size,
+ enum swiotlb_page_state *state)
{
return NULL;
}
@@ -299,6 +316,10 @@ static inline bool swiotlb_free(struct device *dev, struct page *page,
{
return false;
}
+static inline bool swiotlb_is_decrypted(struct device *dev, struct page *page, size_t size)
+{
+ return false;
+}
static inline bool is_swiotlb_for_alloc(struct device *dev)
{
return false;
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 8f43a930716d..6efb5973fbd3 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -106,7 +106,7 @@ static void __dma_direct_free_pages(struct device *dev, struct page *page,
static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size)
{
- struct page *page = swiotlb_alloc(dev, size);
+ struct page *page = swiotlb_alloc(dev, size, NULL);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
swiotlb_free(dev, page, size);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 9fd73700ddcf..8468ee5d3ff2 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -1763,7 +1763,8 @@ static inline void swiotlb_create_debugfs_files(struct io_tlb_mem *mem,
#ifdef CONFIG_DMA_RESTRICTED_POOL
-struct page *swiotlb_alloc(struct device *dev, size_t size)
+struct page *swiotlb_alloc(struct device *dev, size_t size,
+ enum swiotlb_page_state *state)
{
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
struct io_tlb_pool *pool;
@@ -1787,6 +1788,8 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
return NULL;
}
+ if (state)
+ *state = pool->decrypted ? SWIOTLB_PAGE_DECRYPTED : SWIOTLB_PAGE_DEFAULT;
return pfn_to_page(PFN_DOWN(tlb_addr));
}
@@ -1804,6 +1807,18 @@ bool swiotlb_free(struct device *dev, struct page *page, size_t size)
return true;
}
+bool swiotlb_is_decrypted(struct device *dev, struct page *page, size_t size)
+{
+ phys_addr_t tlb_addr = page_to_phys(page);
+ struct io_tlb_pool *pool;
+
+ pool = swiotlb_find_pool(dev, tlb_addr);
+ if (!pool)
+ return false;
+
+ return pool->decrypted;
+}
+
static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
struct device *dev)
{
@@ -1844,6 +1859,12 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
return -ENOMEM;
}
+ /*
+ * At the moment all restricted dma pools are always decrypted,
+ * although that should change soon with CCA solutions introducing
+ * device passthrough.
+ */
+ pool->decrypted = true;
set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
rmem->size >> PAGE_SHIFT);
swiotlb_init_io_tlb_pool(pool, rmem->base, nslabs,
--
2.53.0.1213.gd9a14994de-goog
* [RFC PATCH v3 2/5] dma-mapping: Move encryption in __dma_direct_free_pages()
2026-04-08 19:47 [RFC PATCH v3 0/5] dma-mapping: Fixes for memory encryption Mostafa Saleh
2026-04-08 19:47 ` [RFC PATCH v3 1/5] swiotlb: Return state of memory from swiotlb_alloc() Mostafa Saleh
@ 2026-04-08 19:47 ` Mostafa Saleh
2026-04-10 17:45 ` Jason Gunthorpe
2026-04-08 19:47 ` [RFC PATCH v3 3/5] dma-mapping: Decrypt memory on remap Mostafa Saleh
` (3 subsequent siblings)
5 siblings, 1 reply; 9+ messages in thread
From: Mostafa Saleh @ 2026-04-08 19:47 UTC (permalink / raw)
To: iommu, linux-kernel
Cc: robin.murphy, m.szyprowski, will, maz, suzuki.poulose,
catalin.marinas, jiri, jgg, aneesh.kumar, Mostafa Saleh
In the next patches, we will need to avoid encrypting memory allocated
from SWIOTLB, so instead of calling dma_set_encrypted() before
__dma_direct_free_pages(), call it inside, conditional on the memory
state passed to the function.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
kernel/dma/direct.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 6efb5973fbd3..ce74f213ec40 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -97,8 +97,11 @@ static int dma_set_encrypted(struct device *dev, void *vaddr, size_t size)
}
static void __dma_direct_free_pages(struct device *dev, struct page *page,
- size_t size)
+ size_t size, bool encrypt)
{
+ if (encrypt && dma_set_encrypted(dev, page_address(page), size))
+ return;
+
if (swiotlb_free(dev, page, size))
return;
dma_free_contiguous(dev, page, size);
@@ -203,7 +206,7 @@ static void *dma_direct_alloc_no_mapping(struct device *dev, size_t size,
void *dma_direct_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
{
- bool remap = false, set_uncached = false;
+ bool remap = false, set_uncached = false, encrypt = false;
struct page *page;
void *ret;
@@ -298,10 +301,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return ret;
out_encrypt_pages:
- if (dma_set_encrypted(dev, page_address(page), size))
- return NULL;
+ encrypt = true;
out_free_pages:
- __dma_direct_free_pages(dev, page, size);
+ __dma_direct_free_pages(dev, page, size, encrypt);
return NULL;
out_leak_pages:
return NULL;
@@ -311,6 +313,7 @@ void dma_direct_free(struct device *dev, size_t size,
void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs)
{
unsigned int page_order = get_order(size);
+ bool encrypt = false;
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
!force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) {
@@ -343,11 +346,10 @@ void dma_direct_free(struct device *dev, size_t size,
} else {
if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_CLEAR_UNCACHED))
arch_dma_clear_uncached(cpu_addr, size);
- if (dma_set_encrypted(dev, cpu_addr, size))
- return;
+ encrypt = true;
}
- __dma_direct_free_pages(dev, dma_direct_to_page(dev, dma_addr), size);
+ __dma_direct_free_pages(dev, dma_direct_to_page(dev, dma_addr), size, encrypt);
}
struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
@@ -384,9 +386,7 @@ void dma_direct_free_pages(struct device *dev, size_t size,
dma_free_from_pool(dev, vaddr, size))
return;
- if (dma_set_encrypted(dev, vaddr, size))
- return;
- __dma_direct_free_pages(dev, page, size);
+ __dma_direct_free_pages(dev, page, size, true);
}
#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
--
2.53.0.1213.gd9a14994de-goog
* [RFC PATCH v3 3/5] dma-mapping: Decrypt memory on remap
2026-04-08 19:47 [RFC PATCH v3 0/5] dma-mapping: Fixes for memory encryption Mostafa Saleh
2026-04-08 19:47 ` [RFC PATCH v3 1/5] swiotlb: Return state of memory from swiotlb_alloc() Mostafa Saleh
2026-04-08 19:47 ` [RFC PATCH v3 2/5] dma-mapping: Move encryption in __dma_direct_free_pages() Mostafa Saleh
@ 2026-04-08 19:47 ` Mostafa Saleh
2026-04-08 19:47 ` [RFC PATCH v3 4/5] dma-mapping: Encapsulate memory state during allocation Mostafa Saleh
` (2 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Mostafa Saleh @ 2026-04-08 19:47 UTC (permalink / raw)
To: iommu, linux-kernel
Cc: robin.murphy, m.szyprowski, will, maz, suzuki.poulose,
catalin.marinas, jiri, jgg, aneesh.kumar, Mostafa Saleh
When memory needs to be remapped on systems with
force_dma_unencrypted(), and that memory is not allocated from a
restricted-dma pool, decryption was skipped: only the decrypted
pgprot was set on the remapped alias.
The memory still needs to be decrypted in that case.
With memory decryption, highmem allocations are no longer allowed,
but that shouldn't be a problem on such modern systems.
Also, move the force_dma_unencrypted() check outside of dma_set_*()
to make it possible to use more generic logic to decide the memory
state.
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: f3c962226dbe ("dma-direct: clean up the remapping checks in dma_direct_alloc")
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
kernel/dma/direct.c | 31 ++++++++++++++-----------------
1 file changed, 14 insertions(+), 17 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index ce74f213ec40..de63e0449700 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -79,8 +79,6 @@ bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
static int dma_set_decrypted(struct device *dev, void *vaddr, size_t size)
{
- if (!force_dma_unencrypted(dev))
- return 0;
return set_memory_decrypted((unsigned long)vaddr, PFN_UP(size));
}
@@ -88,8 +86,6 @@ static int dma_set_encrypted(struct device *dev, void *vaddr, size_t size)
{
int ret;
- if (!force_dma_unencrypted(dev))
- return 0;
ret = set_memory_encrypted((unsigned long)vaddr, PFN_UP(size));
if (ret)
pr_warn_ratelimited("leaking DMA memory that can't be re-encrypted\n");
@@ -206,7 +202,7 @@ static void *dma_direct_alloc_no_mapping(struct device *dev, size_t size,
void *dma_direct_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
{
- bool remap = false, set_uncached = false, encrypt = false;
+ bool remap = false, set_uncached = false, decrypt = force_dma_unencrypted(dev);
struct page *page;
void *ret;
@@ -215,7 +211,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
- !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev))
+ !decrypt && !is_swiotlb_for_alloc(dev))
return dma_direct_alloc_no_mapping(dev, size, dma_handle, gfp);
if (!dev_is_dma_coherent(dev)) {
@@ -249,12 +245,15 @@ void *dma_direct_alloc(struct device *dev, size_t size,
* Remapping or decrypting memory may block, allocate the memory from
* the atomic pools instead if we aren't allowed block.
*/
- if ((remap || force_dma_unencrypted(dev)) &&
+ if ((remap || decrypt) &&
dma_direct_use_pool(dev, gfp))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
- /* we always manually zero the memory once we are done */
- page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO, true);
+ /*
+ * we always manually zero the memory once we are done, and only allow
+ * high mem if pages doesn't need decryption.
+ */
+ page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO, !decrypt);
if (!page)
return NULL;
@@ -268,10 +267,12 @@ void *dma_direct_alloc(struct device *dev, size_t size,
set_uncached = false;
}
+ if (decrypt && dma_set_decrypted(dev, page_address(page), size))
+ goto out_leak_pages;
if (remap) {
pgprot_t prot = dma_pgprot(dev, PAGE_KERNEL, attrs);
- if (force_dma_unencrypted(dev))
+ if (decrypt)
prot = pgprot_decrypted(prot);
/* remove any dirty cache lines on the kernel alias */
@@ -281,11 +282,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
ret = dma_common_contiguous_remap(page, size, prot,
__builtin_return_address(0));
if (!ret)
- goto out_free_pages;
+ goto out_encrypt_pages;
} else {
ret = page_address(page);
- if (dma_set_decrypted(dev, ret, size))
- goto out_leak_pages;
}
memset(ret, 0, size);
@@ -301,9 +300,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return ret;
out_encrypt_pages:
- encrypt = true;
-out_free_pages:
- __dma_direct_free_pages(dev, page, size, encrypt);
+ __dma_direct_free_pages(dev, page, size, decrypt);
return NULL;
out_leak_pages:
return NULL;
@@ -366,7 +363,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
return NULL;
ret = page_address(page);
- if (dma_set_decrypted(dev, ret, size))
+ if (force_dma_unencrypted(dev) && dma_set_decrypted(dev, ret, size))
goto out_leak_pages;
memset(ret, 0, size);
*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
--
2.53.0.1213.gd9a14994de-goog
* [RFC PATCH v3 4/5] dma-mapping: Encapsulate memory state during allocation
2026-04-08 19:47 [RFC PATCH v3 0/5] dma-mapping: Fixes for memory encryption Mostafa Saleh
` (2 preceding siblings ...)
2026-04-08 19:47 ` [RFC PATCH v3 3/5] dma-mapping: Decrypt memory on remap Mostafa Saleh
@ 2026-04-08 19:47 ` Mostafa Saleh
2026-04-10 18:05 ` Jason Gunthorpe
2026-04-08 19:47 ` [RFC PATCH v3 5/5] dma-mapping: Fix memory decryption issues Mostafa Saleh
2026-04-10 17:43 ` [RFC PATCH v3 0/5] dma-mapping: Fixes for memory encryption Jason Gunthorpe
5 siblings, 1 reply; 9+ messages in thread
From: Mostafa Saleh @ 2026-04-08 19:47 UTC (permalink / raw)
To: iommu, linux-kernel
Cc: robin.murphy, m.szyprowski, will, maz, suzuki.poulose,
catalin.marinas, jiri, jgg, aneesh.kumar, Mostafa Saleh
Introduce a new dma-direct internal type, dma_page, which is a
"struct page" pointer plus a bit indicating whether the memory has
been decrypted or not.
This is useful to pass that information, encapsulated, through the
allocation functions; it is currently set from swiotlb_alloc().
No functional changes.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
kernel/dma/direct.c | 58 +++++++++++++++++++++++++++++++++++----------
1 file changed, 46 insertions(+), 12 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index de63e0449700..204bc566480c 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -16,6 +16,33 @@
#include <linux/pci-p2pdma.h>
#include "direct.h"
+/*
+ * Represent a DMA allocation and a 1-bit flag for its state
+ */
+struct dma_page {
+ unsigned long val;
+};
+
+#define DMA_PAGE_DECRYPTED_FLAG BIT(0)
+
+#define DMA_PAGE_NULL ((struct dma_page){ .val = 0 })
+
+static inline struct dma_page page_to_dma_page(struct page *page, bool decrypted)
+{
+ struct dma_page dma_page;
+
+ dma_page.val = (unsigned long)page;
+ if (decrypted)
+ dma_page.val |= DMA_PAGE_DECRYPTED_FLAG;
+
+ return dma_page;
+}
+
+static inline struct page *dma_page_to_page(struct dma_page dma_page)
+{
+ return (struct page *)(dma_page.val & ~DMA_PAGE_DECRYPTED_FLAG);
+}
+
/*
* Most architectures use ZONE_DMA for the first 16 Megabytes, but some use
* it for entirely different regions. In that case the arch code needs to
@@ -103,20 +130,21 @@ static void __dma_direct_free_pages(struct device *dev, struct page *page,
dma_free_contiguous(dev, page, size);
}
-static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size)
+static struct dma_page dma_direct_alloc_swiotlb(struct device *dev, size_t size)
{
- struct page *page = swiotlb_alloc(dev, size, NULL);
+ enum swiotlb_page_state state;
+ struct page *page = swiotlb_alloc(dev, size, &state);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
swiotlb_free(dev, page, size);
- return NULL;
+ return DMA_PAGE_NULL;
}
- return page;
+ return page_to_dma_page(page, state == SWIOTLB_PAGE_DECRYPTED);
}
-static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
- gfp_t gfp, bool allow_highmem)
+static struct dma_page __dma_direct_alloc_pages(struct device *dev, size_t size,
+ gfp_t gfp, bool allow_highmem)
{
int node = dev_to_node(dev);
struct page *page;
@@ -132,7 +160,7 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
if (page) {
if (dma_coherent_ok(dev, page_to_phys(page), size) &&
(allow_highmem || !PageHighMem(page)))
- return page;
+ return page_to_dma_page(page, false);
dma_free_contiguous(dev, page, size);
}
@@ -148,10 +176,10 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
else if (IS_ENABLED(CONFIG_ZONE_DMA) && !(gfp & GFP_DMA))
gfp = (gfp & ~GFP_DMA32) | GFP_DMA;
else
- return NULL;
+ return DMA_PAGE_NULL;
}
- return page;
+ return page_to_dma_page(page, false);
}
/*
@@ -184,9 +212,11 @@ static void *dma_direct_alloc_from_pool(struct device *dev, size_t size,
static void *dma_direct_alloc_no_mapping(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp)
{
+ struct dma_page dma_page;
struct page *page;
- page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO, true);
+ dma_page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO, true);
+ page = dma_page_to_page(dma_page);
if (!page)
return NULL;
@@ -203,6 +233,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
{
bool remap = false, set_uncached = false, decrypt = force_dma_unencrypted(dev);
+ struct dma_page dma_page;
struct page *page;
void *ret;
@@ -253,7 +284,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
* we always manually zero the memory once we are done, and only allow
* high mem if pages doesn't need decryption.
*/
- page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO, !decrypt);
+ dma_page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO, !decrypt);
+ page = dma_page_to_page(dma_page);
if (!page)
return NULL;
@@ -352,13 +384,15 @@ void dma_direct_free(struct device *dev, size_t size,
struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
{
+ struct dma_page dma_page;
struct page *page;
void *ret;
if (force_dma_unencrypted(dev) && dma_direct_use_pool(dev, gfp))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
- page = __dma_direct_alloc_pages(dev, size, gfp, false);
+ dma_page = __dma_direct_alloc_pages(dev, size, gfp, false);
+ page = dma_page_to_page(dma_page);
if (!page)
return NULL;
--
2.53.0.1213.gd9a14994de-goog
* [RFC PATCH v3 5/5] dma-mapping: Fix memory decryption issues
2026-04-08 19:47 [RFC PATCH v3 0/5] dma-mapping: Fixes for memory encryption Mostafa Saleh
` (3 preceding siblings ...)
2026-04-08 19:47 ` [RFC PATCH v3 4/5] dma-mapping: Encapsulate memory state during allocation Mostafa Saleh
@ 2026-04-08 19:47 ` Mostafa Saleh
2026-04-10 17:43 ` [RFC PATCH v3 0/5] dma-mapping: Fixes for memory encryption Jason Gunthorpe
5 siblings, 0 replies; 9+ messages in thread
From: Mostafa Saleh @ 2026-04-08 19:47 UTC (permalink / raw)
To: iommu, linux-kernel
Cc: robin.murphy, m.szyprowski, will, maz, suzuki.poulose,
catalin.marinas, jiri, jgg, aneesh.kumar, Mostafa Saleh
Fix 2 existing issues:
1) In case a device has a restricted DMA pool, memory will be
decrypted (which is now returned in the state from swiotlb_alloc()).
Later, the main function will attempt to decrypt the memory again
if force_dma_unencrypted() is true, which results in the memory
being decrypted twice.
Change that to only encrypt/decrypt memory that is not already
decrypted, as indicated in the new dma_page struct.
2) phys_to_dma_direct() is not aware of already-decrypted memory
and will pick the wrong address conversion for it.
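Both fixes boil down to a small decision table over two bits of state.
A hypothetical, kernel-independent sketch of that logic (the helper
names here are illustrative, not the actual kernel functions):

```c
#include <stdbool.h>

/*
 * dma-direct must call set_memory_decrypted() itself only when
 * decryption is forced AND the allocator (e.g. a restricted SWIOTLB
 * pool) has not already decrypted the memory -- fix 1.
 */
static bool need_decrypt(bool force_unencrypted, bool already_decrypted)
{
	return force_unencrypted && !already_decrypted;
}

/*
 * The unencrypted phys->dma conversion must be used for any memory
 * that is shared with the host, whether dma-direct decrypted it or
 * the allocator handed it out already decrypted -- fix 2.
 */
static bool use_unencrypted_conversion(bool force_unencrypted,
				       bool already_decrypted)
{
	return force_unencrypted || already_decrypted;
}
```

The restricted-pool double-decryption bug is the (true, true) cell of
the first table; the wrong-conversion bug is the (false, true) cell of
the second.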
Fixes: f4111e39a52a ("swiotlb: Add restricted DMA alloc/free support")
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
kernel/dma/direct.c | 41 ++++++++++++++++++++++++++++-------------
1 file changed, 28 insertions(+), 13 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 204bc566480c..26611d5e5757 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -43,6 +43,11 @@ static inline struct page *dma_page_to_page(struct dma_page dma_page)
return (struct page *)(dma_page.val & ~DMA_PAGE_DECRYPTED_FLAG);
}
+static inline bool is_dma_page_decrypted(struct dma_page dma_page)
+{
+ return dma_page.val & DMA_PAGE_DECRYPTED_FLAG;
+}
+
/*
* Most architectures use ZONE_DMA for the first 16 Megabytes, but some use
* it for entirely different regions. In that case the arch code needs to
@@ -51,9 +56,9 @@ static inline struct page *dma_page_to_page(struct dma_page dma_page)
u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
static inline dma_addr_t phys_to_dma_direct(struct device *dev,
- phys_addr_t phys)
+ phys_addr_t phys, bool already_decrypted)
{
- if (force_dma_unencrypted(dev))
+ if (already_decrypted || force_dma_unencrypted(dev))
return phys_to_dma_unencrypted(dev, phys);
return phys_to_dma(dev, phys);
}
@@ -67,7 +72,7 @@ static inline struct page *dma_direct_to_page(struct device *dev,
u64 dma_direct_get_required_mask(struct device *dev)
{
phys_addr_t phys = (phys_addr_t)(max_pfn - 1) << PAGE_SHIFT;
- u64 max_dma = phys_to_dma_direct(dev, phys);
+ u64 max_dma = phys_to_dma_direct(dev, phys, false);
return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
}
@@ -96,7 +101,7 @@ static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 *phys_limit)
bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
{
- dma_addr_t dma_addr = phys_to_dma_direct(dev, phys);
+ dma_addr_t dma_addr = phys_to_dma_direct(dev, phys, false);
if (dma_addr == DMA_MAPPING_ERROR)
return false;
@@ -122,11 +127,14 @@ static int dma_set_encrypted(struct device *dev, void *vaddr, size_t size)
static void __dma_direct_free_pages(struct device *dev, struct page *page,
size_t size, bool encrypt)
{
- if (encrypt && dma_set_encrypted(dev, page_address(page), size))
+ bool keep_encrypted = swiotlb_is_decrypted(dev, page, size);
+
+ if (!keep_encrypted && encrypt && dma_set_encrypted(dev, page_address(page), size))
return;
if (swiotlb_free(dev, page, size))
return;
+
dma_free_contiguous(dev, page, size);
}
@@ -205,7 +213,7 @@ static void *dma_direct_alloc_from_pool(struct device *dev, size_t size,
page = dma_alloc_from_pool(dev, size, &ret, gfp, dma_coherent_ok);
if (!page)
return NULL;
- *dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
+ *dma_handle = phys_to_dma_direct(dev, page_to_phys(page), false);
return ret;
}
@@ -225,7 +233,8 @@ static void *dma_direct_alloc_no_mapping(struct device *dev, size_t size,
arch_dma_prep_coherent(page, size);
/* return the page pointer as the opaque cookie */
- *dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
+ *dma_handle = phys_to_dma_direct(dev, page_to_phys(page),
+ is_dma_page_decrypted(dma_page));
return page;
}
@@ -234,6 +243,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
{
bool remap = false, set_uncached = false, decrypt = force_dma_unencrypted(dev);
struct dma_page dma_page;
+ bool already_decrypted;
struct page *page;
void *ret;
@@ -289,6 +299,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
if (!page)
return NULL;
+ already_decrypted = is_dma_page_decrypted(dma_page);
/*
* dma_alloc_contiguous can return highmem pages depending on a
* combination the cma= arguments and per-arch setup. These need to be
@@ -299,12 +310,13 @@ void *dma_direct_alloc(struct device *dev, size_t size,
set_uncached = false;
}
- if (decrypt && dma_set_decrypted(dev, page_address(page), size))
+ if (!already_decrypted && decrypt &&
+ dma_set_decrypted(dev, page_address(page), size))
goto out_leak_pages;
if (remap) {
pgprot_t prot = dma_pgprot(dev, PAGE_KERNEL, attrs);
- if (decrypt)
+ if (decrypt || already_decrypted)
prot = pgprot_decrypted(prot);
/* remove any dirty cache lines on the kernel alias */
@@ -328,11 +340,11 @@ void *dma_direct_alloc(struct device *dev, size_t size,
goto out_encrypt_pages;
}
- *dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
+ *dma_handle = phys_to_dma_direct(dev, page_to_phys(page), already_decrypted);
return ret;
out_encrypt_pages:
- __dma_direct_free_pages(dev, page, size, decrypt);
+ __dma_direct_free_pages(dev, page, size, decrypt && !already_decrypted);
return NULL;
out_leak_pages:
return NULL;
@@ -385,6 +397,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
{
struct dma_page dma_page;
+ bool already_decrypted;
struct page *page;
void *ret;
@@ -396,11 +409,13 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
if (!page)
return NULL;
+ already_decrypted = is_dma_page_decrypted(dma_page);
ret = page_address(page);
- if (force_dma_unencrypted(dev) && dma_set_decrypted(dev, ret, size))
+ if (!already_decrypted && force_dma_unencrypted(dev) &&
+ dma_set_decrypted(dev, ret, size))
goto out_leak_pages;
memset(ret, 0, size);
- *dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
+ *dma_handle = phys_to_dma_direct(dev, page_to_phys(page), already_decrypted);
return page;
out_leak_pages:
return NULL;
--
2.53.0.1213.gd9a14994de-goog
* Re: [RFC PATCH v3 0/5] dma-mapping: Fixes for memory encryption
2026-04-08 19:47 [RFC PATCH v3 0/5] dma-mapping: Fixes for memory encryption Mostafa Saleh
` (4 preceding siblings ...)
2026-04-08 19:47 ` [RFC PATCH v3 5/5] dma-mapping: Fix memory decryption issues Mostafa Saleh
@ 2026-04-10 17:43 ` Jason Gunthorpe
5 siblings, 0 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2026-04-10 17:43 UTC (permalink / raw)
To: Mostafa Saleh
Cc: iommu, linux-kernel, robin.murphy, m.szyprowski, will, maz,
suzuki.poulose, catalin.marinas, jiri, aneesh.kumar
On Wed, Apr 08, 2026 at 07:47:37PM +0000, Mostafa Saleh wrote:
> Introduction
> ============
> This is the third version of the fixes for direct-dma dealing with
> memory encryption and restricted-dma.
>
> Changes in v3:
> - Instead of extending the logic by using is_swiotlb_for_alloc(),
> follow Jason’s suggestion and propagate the state of the memory
> allocated.
> - Remove checks out of dma_set_*() based on Jason suggestion
> - Remove documentation for now until we are close to the final
> proposal and add it later if needed.
There are a number of Sashiko remarks that look plausible and should
be investigated:
https://sashiko.dev/#/patchset/20260408194750.2280873-1-smostafa%40google.com
> Design
> ======
> This series focuses mainly on dma-direct's interaction with memory
> encryption, which is the complicated case.
> At the moment memory encryption and dma-direct interact in two ways:
> 1) force_dma_direct(): if true, memory will be decrypted by default
> on allocation.
> 2) Restricted DMA: where memory is pre-decrypted and managed by
> SWIOTLB.
>
> With a third possible usage on the way [1] where the DMA-API allows
> an attr for decrypted memory.
This [1] has been merged now.
Jason
* Re: [RFC PATCH v3 2/5] dma-mapping: Move encryption in __dma_direct_free_pages()
2026-04-08 19:47 ` [RFC PATCH v3 2/5] dma-mapping: Move encryption in __dma_direct_free_pages() Mostafa Saleh
@ 2026-04-10 17:45 ` Jason Gunthorpe
0 siblings, 0 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2026-04-10 17:45 UTC (permalink / raw)
To: Mostafa Saleh
Cc: iommu, linux-kernel, robin.murphy, m.szyprowski, will, maz,
suzuki.poulose, catalin.marinas, jiri, aneesh.kumar
On Wed, Apr 08, 2026 at 07:47:39PM +0000, Mostafa Saleh wrote:
> In the next patches, we will need to avoid encrypting memory allocated
> from SWIOTLB, so instead of calling dma_set_encrypted() before
> __dma_direct_free_pages(), call it inside, conditional on the memory
> state passed to the function.
>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---
> kernel/dma/direct.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 6efb5973fbd3..ce74f213ec40 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -97,8 +97,11 @@ static int dma_set_encrypted(struct device *dev, void *vaddr, size_t size)
> }
>
> static void __dma_direct_free_pages(struct device *dev, struct page *page,
> - size_t size)
> + size_t size, bool encrypt)
> {
This feels like it would be nicer if it could be the
swiotlb_page_state instead of a bool; maybe the enum needs a different
name.
Jason
* Re: [RFC PATCH v3 4/5] dma-mapping: Encapsulate memory state during allocation
2026-04-08 19:47 ` [RFC PATCH v3 4/5] dma-mapping: Encapsulate memory state during allocation Mostafa Saleh
@ 2026-04-10 18:05 ` Jason Gunthorpe
0 siblings, 0 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2026-04-10 18:05 UTC (permalink / raw)
To: Mostafa Saleh
Cc: iommu, linux-kernel, robin.murphy, m.szyprowski, will, maz,
suzuki.poulose, catalin.marinas, jiri, aneesh.kumar
On Wed, Apr 08, 2026 at 07:47:41PM +0000, Mostafa Saleh wrote:
> Introduce a new dma-direct internal type, dma_page, which is a
> "struct page" plus a bit indicating whether the memory has been
> decrypted or not.
> This is useful for passing this information, encapsulated, through the
> allocation functions; it is currently set from swiotlb_alloc().
>
> No functional changes.
>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---
> kernel/dma/direct.c | 58 +++++++++++++++++++++++++++++++++++----------
> 1 file changed, 46 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index de63e0449700..204bc566480c 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -16,6 +16,33 @@
> #include <linux/pci-p2pdma.h>
> #include "direct.h"
>
> +/*
> + * Represent a DMA allocation and a 1-bit flag for its state
> + */
I'd explain that this wraps a pointer and uses the low PAGE_SHIFT bits
for flags.
> +struct dma_page {
> + unsigned long val;
uintptr_t?
> @@ -103,20 +130,21 @@ static void __dma_direct_free_pages(struct device *dev, struct page *page,
> dma_free_contiguous(dev, page, size);
> }
>
> -static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size)
> +static struct dma_page dma_direct_alloc_swiotlb(struct device *dev, size_t size)
> {
> - struct page *page = swiotlb_alloc(dev, size, NULL);
> + enum swiotlb_page_state state;
> + struct page *page = swiotlb_alloc(dev, size, &state);
>
> if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
> swiotlb_free(dev, page, size);
> - return NULL;
> + return DMA_PAGE_NULL;
> }
>
> - return page;
> + return page_to_dma_page(page, state == SWIOTLB_PAGE_DECRYPTED);
Should the struct dma_page have been introduced earlier instead of the
swiotlb_page_state? It seems a bit odd to have both.
If these are actually internally allocated struct pages, could you use
the struct page memory itself to record the decrypted state? That
would require more significant changes to the allocator calls.
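The encoding discussed here (a pointer with flags packed into the low
alignment bits, which are always zero for an aligned object) can be
sketched in standalone C. This is an illustrative userspace model, not
kernel code; only the names page_to_dma_page(), dma_page_to_page() and
is_dma_page_decrypted() come from the quoted patch, and the stand-in
struct page and flag constant are assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in: in the kernel this is the real struct page,
 * whose pointers are always at least word-aligned. */
struct page { void *compound_head; };

/* Low bit of the packed word records the decrypted state. */
#define DMA_PAGE_DECRYPTED 0x1UL

struct dma_page { uintptr_t val; };

/* Pack a page pointer and the decrypted flag into a single word.
 * Pointer alignment guarantees the low bits are free for flags. */
static struct dma_page page_to_dma_page(struct page *page, int decrypted)
{
	struct dma_page dp = {
		(uintptr_t)page | (decrypted ? DMA_PAGE_DECRYPTED : 0)
	};
	return dp;
}

/* Strip the flag bits to recover the plain pointer. */
static struct page *dma_page_to_page(struct dma_page dp)
{
	return (struct page *)(dp.val & ~DMA_PAGE_DECRYPTED);
}

static int is_dma_page_decrypted(struct dma_page dp)
{
	return (dp.val & DMA_PAGE_DECRYPTED) != 0;
}
```

The round trip loses nothing because the flag lives in bits that
alignment forces to zero, the same trick the kernel already uses in
several places (page table entries, for instance).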
> @@ -184,9 +212,11 @@ static void *dma_direct_alloc_from_pool(struct device *dev, size_t size,
> static void *dma_direct_alloc_no_mapping(struct device *dev, size_t size,
> dma_addr_t *dma_handle, gfp_t gfp)
> {
> + struct dma_page dma_page;
> struct page *page;
>
> - page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO, true);
> + dma_page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO, true);
> + page = dma_page_to_page(dma_page);
> if (!page)
> return NULL;
I would expect to see more usage of the dma_page here. For instance, I
don't think this is really right:
*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
Does page_to_phys(page) really work on decrypted memory? On CCA it
will return the protected alias which doesn't seem like something
useful?
static inline dma_addr_t phys_to_dma_direct(struct device *dev,
phys_addr_t phys)
{
if (force_dma_unencrypted(dev))
return phys_to_dma_unencrypted(dev, phys);
return phys_to_dma(dev, phys);
The above is all nonsense now that you have a direct indication of
whether the address is decrypted memory or not; that indication should
also be used right here directly.
if (is_dma_page_decrypted(dma_page))
*dma_handle = phys_to_dma_unencrypted(..)
else
*dma_handle = phys_to_dma(..);
The later patch just makes it worse by adding even more confusing
flags to phys_to_dma_direct().
I think it should work out that everyone already knows what memory
type they are working with before they call down to
phys_to_dma_direct() - the calls to force_dma_unencrypted() here are
just hacks because previously they did not.
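The selection sketched above can be modelled in standalone C. The two
address transforms below are stubs (a single illustrative "encryption"
address bit stands in for the real architecture-specific aliasing), and
dma_handle_for() is a hypothetical helper, not a kernel function; the
point is only that the caller passes down the state it already knows
instead of re-deriving it from force_dma_unencrypted():

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;
typedef uint64_t dma_addr_t;
struct device { int dummy; };

/* Illustrative only: some memory-encryption schemes tag the encrypted
 * alias with a high address bit; bit 47 stands in for that here. */
#define ENC_BIT (1ULL << 47)

/* Stub: DMA address of the encrypted (protected) alias. */
static dma_addr_t phys_to_dma(struct device *dev, phys_addr_t phys)
{
	(void)dev;
	return phys | ENC_BIT;
}

/* Stub: DMA address of the shared/decrypted alias. */
static dma_addr_t phys_to_dma_unencrypted(struct device *dev,
					  phys_addr_t phys)
{
	(void)dev;
	return phys;
}

/* Hypothetical helper: the caller supplies the state it tracked at
 * allocation time rather than querying force_dma_unencrypted() again. */
static dma_addr_t dma_handle_for(struct device *dev, phys_addr_t phys,
				 int decrypted)
{
	return decrypted ? phys_to_dma_unencrypted(dev, phys)
			 : phys_to_dma(dev, phys);
}
```

With this shape there is a single decision point per allocation, driven
by the recorded state rather than by repeated global checks.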
Anyhow, I think this series is a lot better than the previous one. If
you work a little harder to make it so there is only one
force_dma_unencrypted() per high-level DMA API call, that would be
perfect.
Jason