* [RFC PATCH v2 0/5] dma-mapping: Fixes for memory encryption
@ 2026-03-30 14:50 Mostafa Saleh
2026-03-30 14:50 ` [RFC PATCH v2 1/5] dma-mapping: Avoid double decrypting with DMA_RESTRICTED_POOL Mostafa Saleh
` (4 more replies)
0 siblings, 5 replies; 17+ messages in thread
From: Mostafa Saleh @ 2026-03-30 14:50 UTC (permalink / raw)
To: iommu, linux-kernel
Cc: robin.murphy, m.szyprowski, will, maz, suzuki.poulose,
catalin.marinas, jiri, jgg, aneesh.kumar, Mostafa Saleh
Introduction
============
This is the second version of the fixes for direct-dma dealing with
memory encryption and restricted-dma.
In this version, one more fix is included, and I go a step further
and attempt to clean up the code to consolidate it and make it less
error-prone.
This matters especially with more users coming, such as decrypted
dma-bufs [1], which also interact with dma-direct.
Lastly, I added new documentation for memory encryption based on my
conclusions (Documentation/core-api/dma-direct-memory-encryption.rst).
Background
==========
At the moment, guests running under the following hypervisors need to
deal with memory encryption:
- pKVM (ARM): Documentation/virt/kvm/arm/hypercalls.rst
- ARM CCA: Documentation/arch/arm64/arm-cca.rst
- Intel TDX: Documentation/arch/x86/tdx.rst
- AMD SEV: Documentation/arch/x86/amd-memory-encryption.rst
- PPC SVM: Documentation/arch/powerpc/ultravisor.rst
- Hyper-V: Documentation/virt/hyperv/coco.rst
AFAICT, all (confidential) guests running under those have the memory
encrypted by default and guests will then explicitly share the memory
back if needed.
The main use cases for decrypting (sharing) memory are:
- Sharing memory back to the host through SWIOTLB (for virtio...)
- Hypervisor specific communication (ex: snp_msg, GHCB, VMBUS...)
- Shared/emulated resources: VGARAM (x86-SEV), GIC ITS tables (arm64)
Encrypting memory, on the other hand, is typically used to revert
set_memory_decrypted(), either in error handling or when freeing
shared resources back to the kernel.
Design
======
This series focuses mainly on dma-direct interaction with memory
encryption which is the complicated case.
At the moment, memory encryption and dma-direct interact in two ways:
1) force_dma_unencrypted(): if true, memory will be decrypted by default
   on allocation.
2) Restricted DMA: where memory is pre-decrypted and managed by
SWIOTLB.
A third possible usage is on the way [1], where the DMA API allows
an attr for decrypted memory.
Instead of open-coding many checks with is_swiotlb_for_alloc() and
force_dma_unencrypted(), which do not have exactly the same semantics,
add some helpers to abstract that logic in the code:
* dma_external_decryption(dev): Returns true if the pages are decrypted
but managed externally.
* dma_owns_decryption(dev): Returns true if the pages need to be
explicitly decrypted and managed by the `dma-direct` layer (as the
architecture forces unencrypted DMA).
* is_dma_decrypted(dev): Returns true if the memory being used is in
a decrypted state, regardless of who manages it.
Testing
=======
I was able to test this only under pKVM (arm64) as I have no
access to other systems.
Future work
===========
I am also looking at two other things related to restricted DMA
pools, which should be a different series.
1) Private pools: Currently all restricted DMA pools are decrypted
(shared) by default. Having private pools would be useful for
device assignment when bouncing is needed (as for non-coherent
devices)
2) Optimizations for memory sharing. In some cases, allocations from
   restricted DMA pools are page-aligned. For CoCo cases, that means
   it will be cheaper to share memory in place instead of
   bouncing.
Both of these add new semantics which need to be done carefully to
avoid regressions, and might be a good candidate for a topic in the
next LPC.
Patches
=======
- 1-3 Fixes
- 4 Refactoring
- 5 Documentation
v1: https://lore.kernel.org/all/20260305170335.963568-1-smostafa@google.com/
[1] https://lore.kernel.org/all/20260305123641.164164-1-jiri@resnulli.us/
Mostafa Saleh (5):
dma-mapping: Avoid double decrypting with DMA_RESTRICTED_POOL
dma-mapping: Use the correct phys_to_dma() for DMA_RESTRICTED_POOL
dma-mapping: Decrypt memory on remap
dma-mapping: Refactor memory encryption usage
dma-mapping: Add doc for memory encryption
.../core-api/dma-direct-memory-encryption.rst | 77 +++++++++++++++++++
kernel/dma/direct.c | 51 ++++++++----
2 files changed, 114 insertions(+), 14 deletions(-)
create mode 100644 Documentation/core-api/dma-direct-memory-encryption.rst
--
2.53.0.1185.g05d4b7b318-goog
^ permalink raw reply [flat|nested] 17+ messages in thread
* [RFC PATCH v2 1/5] dma-mapping: Avoid double decrypting with DMA_RESTRICTED_POOL
2026-03-30 14:50 [RFC PATCH v2 0/5] dma-mapping: Fixes for memory encryption Mostafa Saleh
@ 2026-03-30 14:50 ` Mostafa Saleh
2026-03-30 15:06 ` Jason Gunthorpe
2026-03-30 14:50 ` [RFC PATCH v2 2/5] dma-mapping: Use the correct phys_to_dma() for DMA_RESTRICTED_POOL Mostafa Saleh
` (3 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Mostafa Saleh @ 2026-03-30 14:50 UTC (permalink / raw)
To: iommu, linux-kernel
Cc: robin.murphy, m.szyprowski, will, maz, suzuki.poulose,
catalin.marinas, jiri, jgg, aneesh.kumar, Mostafa Saleh
In case a device has a restricted DMA pool, the pool memory will be
decrypted by default.
However, in the dma_direct_alloc() path, memory can be allocated
from this pool via __dma_direct_alloc_pages() =>
dma_direct_alloc_swiotlb().
After that, the same function will attempt to decrypt it again
using dma_set_decrypted() if force_dma_unencrypted() is true,
which results in the memory being decrypted twice.
It's not clear how the realm world/hypervisors deal with that,
for example:
- CCA: Clear a bit in the page table and call realm IPA_STATE_SET.
- TDX: Issue a hypercall.
- pKVM: Doesn't implement force_dma_unencrypted() at the moment;
  it uses a share hypercall.
Change that to only encrypt/decrypt memory that is not allocated
from the restricted DMA pools.
Fixes: f4111e39a52a ("swiotlb: Add restricted DMA alloc/free support")
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
kernel/dma/direct.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 8f43a930716d..27d804f0473f 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -79,7 +79,7 @@ bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
static int dma_set_decrypted(struct device *dev, void *vaddr, size_t size)
{
- if (!force_dma_unencrypted(dev))
+ if (!force_dma_unencrypted(dev) || is_swiotlb_for_alloc(dev))
return 0;
return set_memory_decrypted((unsigned long)vaddr, PFN_UP(size));
}
@@ -88,7 +88,7 @@ static int dma_set_encrypted(struct device *dev, void *vaddr, size_t size)
{
int ret;
- if (!force_dma_unencrypted(dev))
+ if (!force_dma_unencrypted(dev) || is_swiotlb_for_alloc(dev))
return 0;
ret = set_memory_encrypted((unsigned long)vaddr, PFN_UP(size));
if (ret)
--
2.53.0.1185.g05d4b7b318-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC PATCH v2 2/5] dma-mapping: Use the correct phys_to_dma() for DMA_RESTRICTED_POOL
2026-03-30 14:50 [RFC PATCH v2 0/5] dma-mapping: Fixes for memory encryption Mostafa Saleh
2026-03-30 14:50 ` [RFC PATCH v2 1/5] dma-mapping: Avoid double decrypting with DMA_RESTRICTED_POOL Mostafa Saleh
@ 2026-03-30 14:50 ` Mostafa Saleh
2026-03-30 15:09 ` Jason Gunthorpe
2026-03-30 14:50 ` [RFC PATCH v2 3/5] dma-mapping: Decrypt memory on remap Mostafa Saleh
` (2 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Mostafa Saleh @ 2026-03-30 14:50 UTC (permalink / raw)
To: iommu, linux-kernel
Cc: robin.murphy, m.szyprowski, will, maz, suzuki.poulose,
catalin.marinas, jiri, jgg, aneesh.kumar, Mostafa Saleh
As restricted DMA pools are always decrypted, swiotlb.c uses
phys_to_dma_unencrypted() for address conversion.
However, in dma-direct, calls to phys_to_dma_direct() with
force_dma_unencrypted() returning false will fall back to
phys_to_dma(), which is inconsistent for memory allocated from
restricted DMA pools.
Fixes: f4111e39a52a ("swiotlb: Add restricted DMA alloc/free support")
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
kernel/dma/direct.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 27d804f0473f..1a402bb956d9 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -26,7 +26,7 @@ u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
static inline dma_addr_t phys_to_dma_direct(struct device *dev,
phys_addr_t phys)
{
- if (force_dma_unencrypted(dev))
+ if (force_dma_unencrypted(dev) || is_swiotlb_for_alloc(dev))
return phys_to_dma_unencrypted(dev, phys);
return phys_to_dma(dev, phys);
}
--
2.53.0.1185.g05d4b7b318-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC PATCH v2 3/5] dma-mapping: Decrypt memory on remap
2026-03-30 14:50 [RFC PATCH v2 0/5] dma-mapping: Fixes for memory encryption Mostafa Saleh
2026-03-30 14:50 ` [RFC PATCH v2 1/5] dma-mapping: Avoid double decrypting with DMA_RESTRICTED_POOL Mostafa Saleh
2026-03-30 14:50 ` [RFC PATCH v2 2/5] dma-mapping: Use the correct phys_to_dma() for DMA_RESTRICTED_POOL Mostafa Saleh
@ 2026-03-30 14:50 ` Mostafa Saleh
2026-03-30 15:19 ` Jason Gunthorpe
2026-03-30 14:50 ` [RFC PATCH v2 4/5] dma-mapping: Refactor memory encryption usage Mostafa Saleh
2026-03-30 14:50 ` [RFC PATCH v2 5/5] dma-mapping: Add doc for memory encryption Mostafa Saleh
4 siblings, 1 reply; 17+ messages in thread
From: Mostafa Saleh @ 2026-03-30 14:50 UTC (permalink / raw)
To: iommu, linux-kernel
Cc: robin.murphy, m.szyprowski, will, maz, suzuki.poulose,
catalin.marinas, jiri, jgg, aneesh.kumar, Mostafa Saleh
In case memory needs to be remapped on systems with
force_dma_unencrypted(), where this memory is not allocated
from a restricted DMA pool, the decryption was being skipped;
only the decrypted pgprot was set in the remapped alias.
The memory still needs to be decrypted in that case.
With memory decryption, highmem allocations are no longer allowed,
but that shouldn't be a problem on such modern systems.
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: f3c962226dbe ("dma-direct: clean up the remapping checks in dma_direct_alloc")
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
kernel/dma/direct.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 1a402bb956d9..a4260689bcc8 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -203,6 +203,7 @@ static void *dma_direct_alloc_no_mapping(struct device *dev, size_t size,
void *dma_direct_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
{
+ bool allow_highmem = !force_dma_unencrypted(dev);
bool remap = false, set_uncached = false;
struct page *page;
void *ret;
@@ -251,7 +252,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
/* we always manually zero the memory once we are done */
- page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO, true);
+ page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO, allow_highmem);
if (!page)
return NULL;
@@ -265,6 +266,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
set_uncached = false;
}
+ if (dma_set_decrypted(dev, page_address(page), size))
+ goto out_leak_pages;
+
if (remap) {
pgprot_t prot = dma_pgprot(dev, PAGE_KERNEL, attrs);
@@ -278,11 +282,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
ret = dma_common_contiguous_remap(page, size, prot,
__builtin_return_address(0));
if (!ret)
- goto out_free_pages;
+ goto out_encrypt_pages;
} else {
ret = page_address(page);
- if (dma_set_decrypted(dev, ret, size))
- goto out_leak_pages;
}
memset(ret, 0, size);
@@ -300,7 +302,6 @@ void *dma_direct_alloc(struct device *dev, size_t size,
out_encrypt_pages:
if (dma_set_encrypted(dev, page_address(page), size))
return NULL;
-out_free_pages:
__dma_direct_free_pages(dev, page, size);
return NULL;
out_leak_pages:
@@ -339,7 +340,12 @@ void dma_direct_free(struct device *dev, size_t size,
return;
if (is_vmalloc_addr(cpu_addr)) {
+ void *vaddr = page_address(dma_direct_to_page(dev, dma_addr));
+
vunmap(cpu_addr);
+
+ if (dma_set_encrypted(dev, vaddr, size))
+ return;
} else {
if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_CLEAR_UNCACHED))
arch_dma_clear_uncached(cpu_addr, size);
--
2.53.0.1185.g05d4b7b318-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC PATCH v2 4/5] dma-mapping: Refactor memory encryption usage
2026-03-30 14:50 [RFC PATCH v2 0/5] dma-mapping: Fixes for memory encryption Mostafa Saleh
` (2 preceding siblings ...)
2026-03-30 14:50 ` [RFC PATCH v2 3/5] dma-mapping: Decrypt memory on remap Mostafa Saleh
@ 2026-03-30 14:50 ` Mostafa Saleh
2026-03-30 15:27 ` Jason Gunthorpe
2026-03-30 14:50 ` [RFC PATCH v2 5/5] dma-mapping: Add doc for memory encryption Mostafa Saleh
4 siblings, 1 reply; 17+ messages in thread
From: Mostafa Saleh @ 2026-03-30 14:50 UTC (permalink / raw)
To: iommu, linux-kernel
Cc: robin.murphy, m.szyprowski, will, maz, suzuki.poulose,
catalin.marinas, jiri, jgg, aneesh.kumar, Mostafa Saleh
At the moment, dma-direct deals with memory encryption in two cases:
- Pre-decrypted restricted DMA pools
- Arch code through force_dma_unencrypted()
In the first case, the memory is owned by the pool and the decryption
is not managed by dma-direct.
However, dma-direct should be aware of it in order to use the
appropriate phys_to_dma*() variant and page table prot.
For the second case, it's the job of dma-direct to manage the
decryption of the allocated memory.
As there have been bugs in this code due to wrong or missing
checks and there are more use cases coming for memory decryption,
we need more robust checks in the code to abstract the core logic,
so introduce some local helpers:
- dma_external_decryption(): For pages decrypted but managed externally
- dma_owns_decryption(): For pages that need to be decrypted and managed
  by dma-direct
- is_dma_decrypted(): To check if memory is decrypted
Note that this patch is not a no-op as there are some subtle changes
which are actually theoretical bug fixes in dma_direct_mmap() and
dma_direct_alloc() where the wrong prot might be used for remap.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
kernel/dma/direct.c | 37 +++++++++++++++++++++++++++----------
1 file changed, 27 insertions(+), 10 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index a4260689bcc8..1078e1b38a34 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -23,10 +23,27 @@
*/
u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
+/* Memory is decrypted and managed externally. */
+static inline bool dma_external_decryption(struct device *dev)
+{
+ return is_swiotlb_for_alloc(dev);
+}
+
+/* Memory needs to be decrypted by the dma-direct layer. */
+static inline bool dma_owns_decryption(struct device *dev)
+{
+ return force_dma_unencrypted(dev) && !dma_external_decryption(dev);
+}
+
+static inline bool is_dma_decrypted(struct device *dev)
+{
+ return force_dma_unencrypted(dev) || dma_external_decryption(dev);
+}
+
static inline dma_addr_t phys_to_dma_direct(struct device *dev,
phys_addr_t phys)
{
- if (force_dma_unencrypted(dev) || is_swiotlb_for_alloc(dev))
+ if (is_dma_decrypted(dev))
return phys_to_dma_unencrypted(dev, phys);
return phys_to_dma(dev, phys);
}
@@ -79,7 +96,7 @@ bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
static int dma_set_decrypted(struct device *dev, void *vaddr, size_t size)
{
- if (!force_dma_unencrypted(dev) || is_swiotlb_for_alloc(dev))
+ if (!dma_owns_decryption(dev))
return 0;
return set_memory_decrypted((unsigned long)vaddr, PFN_UP(size));
}
@@ -88,7 +105,7 @@ static int dma_set_encrypted(struct device *dev, void *vaddr, size_t size)
{
int ret;
- if (!force_dma_unencrypted(dev) || is_swiotlb_for_alloc(dev))
+ if (!dma_owns_decryption(dev))
return 0;
ret = set_memory_encrypted((unsigned long)vaddr, PFN_UP(size));
if (ret)
@@ -203,7 +220,7 @@ static void *dma_direct_alloc_no_mapping(struct device *dev, size_t size,
void *dma_direct_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
{
- bool allow_highmem = !force_dma_unencrypted(dev);
+ bool allow_highmem = !dma_owns_decryption(dev);
bool remap = false, set_uncached = false;
struct page *page;
void *ret;
@@ -213,7 +230,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
- !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev))
+ !is_dma_decrypted(dev))
return dma_direct_alloc_no_mapping(dev, size, dma_handle, gfp);
if (!dev_is_dma_coherent(dev)) {
@@ -247,7 +264,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
* Remapping or decrypting memory may block, allocate the memory from
* the atomic pools instead if we aren't allowed block.
*/
- if ((remap || force_dma_unencrypted(dev)) &&
+ if ((remap || dma_owns_decryption(dev)) &&
dma_direct_use_pool(dev, gfp))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
@@ -272,7 +289,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
if (remap) {
pgprot_t prot = dma_pgprot(dev, PAGE_KERNEL, attrs);
- if (force_dma_unencrypted(dev))
+ if (is_dma_decrypted(dev))
prot = pgprot_decrypted(prot);
/* remove any dirty cache lines on the kernel alias */
@@ -314,7 +331,7 @@ void dma_direct_free(struct device *dev, size_t size,
unsigned int page_order = get_order(size);
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
- !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) {
+ !is_dma_decrypted(dev)) {
/* cpu_addr is a struct page cookie, not a kernel address */
dma_free_contiguous(dev, cpu_addr, size);
return;
@@ -362,7 +379,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
struct page *page;
void *ret;
- if (force_dma_unencrypted(dev) && dma_direct_use_pool(dev, gfp))
+ if (dma_owns_decryption(dev) && dma_direct_use_pool(dev, gfp))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
page = __dma_direct_alloc_pages(dev, size, gfp, false);
@@ -530,7 +547,7 @@ int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma,
int ret = -ENXIO;
vma->vm_page_prot = dma_pgprot(dev, vma->vm_page_prot, attrs);
- if (force_dma_unencrypted(dev))
+ if (is_dma_decrypted(dev))
vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
--
2.53.0.1185.g05d4b7b318-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC PATCH v2 5/5] dma-mapping: Add doc for memory encryption
2026-03-30 14:50 [RFC PATCH v2 0/5] dma-mapping: Fixes for memory encryption Mostafa Saleh
` (3 preceding siblings ...)
2026-03-30 14:50 ` [RFC PATCH v2 4/5] dma-mapping: Refactor memory encryption usage Mostafa Saleh
@ 2026-03-30 14:50 ` Mostafa Saleh
4 siblings, 0 replies; 17+ messages in thread
From: Mostafa Saleh @ 2026-03-30 14:50 UTC (permalink / raw)
To: iommu, linux-kernel
Cc: robin.murphy, m.szyprowski, will, maz, suzuki.poulose,
catalin.marinas, jiri, jgg, aneesh.kumar, Mostafa Saleh
Add a document for memory encryption usage with dma-direct.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
.../core-api/dma-direct-memory-encryption.rst | 77 +++++++++++++++++++
1 file changed, 77 insertions(+)
create mode 100644 Documentation/core-api/dma-direct-memory-encryption.rst
diff --git a/Documentation/core-api/dma-direct-memory-encryption.rst b/Documentation/core-api/dma-direct-memory-encryption.rst
new file mode 100644
index 000000000000..a780279292b5
--- /dev/null
+++ b/Documentation/core-api/dma-direct-memory-encryption.rst
@@ -0,0 +1,77 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================================
+DMA Direct and Memory Encryption Integration
+============================================
+
+Introduction
+------------
+Modern platforms introduce memory encryption features (e.g., AMD SEV, Intel TDX,
+ARM CCA, and pKVM), typically for confidential computing (CoCo) VMs.
+
+These guests typically boot with their memory encrypted by default.
+
+In some cases this memory needs to be accessed by the untrusted host or the
+VMM which then requires this memory to be decrypted. One typical case is
+dealing with emulated device (e.g., virtio) which are handled by direct-dma
+code as these devices are not behind an IOMMU.
+
+That means the memory used by these devices must be decrypted before it is
+accessed by the untrusted host.
+
+It must be clarified that encrypted/decrypted may not always be
+cryptographic; in a broader sense, a decrypted page means that it is
+accessible or "shared" with the untrusted host.
+
+Ownership
+---------
+The direct-dma layer deals with memory encryption in two distinct scenarios:
+
+1. **Externally Managed Decryption (e.g., Restricted DMA Pools)**
+ In some setups (like a device restricted to a specific SWIOTLB pool, i.e.,
+ `DMA_RESTRICTED_POOL`), an entire region of memory is pre-decrypted during
+ boot or pool initialization. The memory is owned by the pool, and the
+ transitions (encryption/decryption) are **not** managed by direct-dma on
+ a per-allocation basis.
+ See Documentation/core-api/swiotlb.rst
+
+2. **DMA Direct Managed Decryption (e.g., `force_dma_unencrypted()`)**
+ For standard coherent DMA allocations the direct-dma layer is explicitly
+ responsible for managing the decryption. It must decrypt the pages upon
+ allocation and re-encrypt them upon freeing.
+
+To cleanly separate these concerns, the core logic is abstracted via three
+internal helpers:
+
+* ``dma_external_decryption(dev)``: Returns true if the pages are decrypted but
+ managed externally. For example, if the device allocates from a restricted
+ DMA pool.
+* ``dma_owns_decryption(dev)``: Returns true if the pages need to be explicitly
+ decrypted and managed by the direct-dma layer (i.e., the architecture forces
+ unencrypted DMA, and it's not handled by an external pool).
+* ``is_dma_decrypted(dev)``: Returns true if the memory being used is in a
+ decrypted state, regardless of who manages it.
+
+Addressing and Page Protections
+-------------------------------
+When memory is decrypted (whether externally or by direct-dma), the layer must
+adjust physical-to-DMA address conversions and page protections:
+
+* **DMA Address Conversion:**
+ Decrypted memory often requires a specific bit to be cleared or set in the DMA
+ address (e.g., stripping the encryption bit). If ``is_dma_decrypted(dev)`` is
+ true, the conversion uses ``phys_to_dma_unencrypted()`` instead of the standard
+ ``phys_to_dma()``.
+
+* **Page Protections (Remap and Mmap):**
+ When remapping decrypted pages into the kernel virtual address space (vmalloc)
+ or mapping them to user space via ``mmap()``, the page protection attributes
+ must reflect the decrypted state. If ``is_dma_decrypted(dev)`` is true, the
+ layer applies ``pgprot_decrypted(prot)`` to ensure the CPU accesses the memory
+ with the correct encryption attributes.
+
+Notes
+-----
+In many cases when memory encryption/decryption fails, the page will be leaked;
+this behavior was added for TDX, where ``set_memory_encrypted()`` or
+``set_memory_decrypted()`` may fail while the page remains shared.
--
2.53.0.1185.g05d4b7b318-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [RFC PATCH v2 1/5] dma-mapping: Avoid double decrypting with DMA_RESTRICTED_POOL
2026-03-30 14:50 ` [RFC PATCH v2 1/5] dma-mapping: Avoid double decrypting with DMA_RESTRICTED_POOL Mostafa Saleh
@ 2026-03-30 15:06 ` Jason Gunthorpe
2026-03-30 20:43 ` Mostafa Saleh
0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2026-03-30 15:06 UTC (permalink / raw)
To: Mostafa Saleh
Cc: iommu, linux-kernel, robin.murphy, m.szyprowski, will, maz,
suzuki.poulose, catalin.marinas, jiri, aneesh.kumar
On Mon, Mar 30, 2026 at 02:50:39PM +0000, Mostafa Saleh wrote:
> In case a device have a restricted DMA pool, it will be decrypted
> by default.
>
> However, in the path of dma_direct_alloc() memory can be allocated
> from this pool using, __dma_direct_alloc_pages() =>
> dma_direct_alloc_swiotlb()
>
> After that from the same function, it will attempt to decrypt it
> using dma_set_decrypted() if force_dma_unencrypted().
>
> Which results in the memory being decrypted twice.
>
> It's not clear how the does realm world/hypervisors deal with that,
> for example:
> - CCA: Clear a bit in the page table and call realm IPA_STATE_SET.
> - TDX: Issue a hypercall.
> - pKVM: Which doesn't implement force_dma_unencrypted() at the moment,
> uses a share hypercall.
>
> Change that to only encrypt/decrypt memory that are not allocated
> from the restricted dma pools.
>
> Fixes: f4111e39a52a ("swiotlb: Add restricted DMA alloc/free support")
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---
> kernel/dma/direct.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 8f43a930716d..27d804f0473f 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -79,7 +79,7 @@ bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
>
> static int dma_set_decrypted(struct device *dev, void *vaddr, size_t size)
> {
> - if (!force_dma_unencrypted(dev))
> + if (!force_dma_unencrypted(dev) || is_swiotlb_for_alloc(dev))
> return 0;
This seems really obtuse, I would expect the decryption state of the
memory to be known by the caller. If dma_direct_alloc_swiotlb() can
return decrypted or encrypted memory it needs to return a flag saying
that. It shouldn't be deduced by checking dev flags in random places
like this.
Double decryption is certainly a bug, I do not expect that to work.
Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH v2 2/5] dma-mapping: Use the correct phys_to_dma() for DMA_RESTRICTED_POOL
2026-03-30 14:50 ` [RFC PATCH v2 2/5] dma-mapping: Use the correct phys_to_dma() for DMA_RESTRICTED_POOL Mostafa Saleh
@ 2026-03-30 15:09 ` Jason Gunthorpe
2026-03-30 20:47 ` Mostafa Saleh
0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2026-03-30 15:09 UTC (permalink / raw)
To: Mostafa Saleh
Cc: iommu, linux-kernel, robin.murphy, m.szyprowski, will, maz,
suzuki.poulose, catalin.marinas, jiri, aneesh.kumar
On Mon, Mar 30, 2026 at 02:50:40PM +0000, Mostafa Saleh wrote:
> As restricted dma pools are always decrypted, in swiotlb.c it uses
> phys_to_dma_unencrypted() for address conversion.
>
> However, in DMA-direct, calls to phys_to_dma_direct() with
> force_dma_unencrypted() returning false, will fallback to
> phys_to_dma() which is inconsistent for memory allocated from
> restricted dma pools.
>
> Fixes: f4111e39a52a ("swiotlb: Add restricted DMA alloc/free support")
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---
> kernel/dma/direct.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 27d804f0473f..1a402bb956d9 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -26,7 +26,7 @@ u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
> static inline dma_addr_t phys_to_dma_direct(struct device *dev,
> phys_addr_t phys)
> {
> - if (force_dma_unencrypted(dev))
> + if (force_dma_unencrypted(dev) || is_swiotlb_for_alloc(dev))
> return phys_to_dma_unencrypted(dev, phys);
Same remark, I think the force_dma_unencrypted() was a hack to make up
for a flag here. In these lower layers we need to annotate if phys is
encrypted/decrypted and take the proper action universally.
The force_dma_unencrypted() should only be done way up the call chain
where we decide to get a phys that is decrypted. Once we have a
decrypted phys it should be carried with an annotation throughout all
the other places.
Then this is more like:
if (flags & FLAG_DECRYPTED)
return phys_to_dma_unencrypted(dev, phys);
Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH v2 3/5] dma-mapping: Decrypt memory on remap
2026-03-30 14:50 ` [RFC PATCH v2 3/5] dma-mapping: Decrypt memory on remap Mostafa Saleh
@ 2026-03-30 15:19 ` Jason Gunthorpe
2026-03-30 20:49 ` Mostafa Saleh
0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2026-03-30 15:19 UTC (permalink / raw)
To: Mostafa Saleh
Cc: iommu, linux-kernel, robin.murphy, m.szyprowski, will, maz,
suzuki.poulose, catalin.marinas, jiri, aneesh.kumar
On Mon, Mar 30, 2026 at 02:50:41PM +0000, Mostafa Saleh wrote:
> @@ -265,6 +266,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> set_uncached = false;
> }
>
> + if (dma_set_decrypted(dev, page_address(page), size))
> + goto out_leak_pages;
> +
> if (remap) {
> pgprot_t prot = dma_pgprot(dev, PAGE_KERNEL, attrs);
>
> if (force_dma_unencrypted(dev))
> prot = pgprot_decrypted(prot);
It seems confusing, why do we unconditionally call something called
dma_set_decrypted() and then conditionally call pgprot_decrypted()?
So, I think the same remark, lets not sprinkle these tests all over
the place and risk them becoming inconsistent. It should be much more
direct, like:
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO,
allow_highmem, &flags);
if (!dev_can_dma_from_encrypted(dev) && !(flags & FLAG_DECRYPTED)) {
dma_set_decrypted(dev, page_address(page));
flags = FLAG_DECRYPTED;
}
if (flags & FLAG_DECRYPTED)
prot = pgprot_decrypted(prot);
And so on.
The one place we should see a force_dma_unencrypted() is directly
before setting the flag.
Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH v2 4/5] dma-mapping: Refactor memory encryption usage
2026-03-30 14:50 ` [RFC PATCH v2 4/5] dma-mapping: Refactor memory encryption usage Mostafa Saleh
@ 2026-03-30 15:27 ` Jason Gunthorpe
0 siblings, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2026-03-30 15:27 UTC (permalink / raw)
To: Mostafa Saleh
Cc: iommu, linux-kernel, robin.murphy, m.szyprowski, will, maz,
suzuki.poulose, catalin.marinas, jiri, aneesh.kumar
On Mon, Mar 30, 2026 at 02:50:42PM +0000, Mostafa Saleh wrote:
> At the moment dma-direct deals with memory encryption in 2 cases
> - Pre-decrypted restricted dma-pools
> - Arch code through force_dma_unencrypted()
>
> In the first case, the memory is owned by the pool and the decryption
> is not managed by the dma-direct.
>
> However, it should be aware of it to use the appropriate phys_to_dma*
> and page table prot.
>
> For the second case, it’s the job of the dma-direct to manage the
> decryption of the allocated memory.
>
> As there have been bugs in this code due to wrong or missing
> checks and there are more use cases coming for memory decryption,
> we need more robust checks in the code to abstract the core logic,
> so introduce some local helpers:
> - dma_external_decryption(): For pages decrypted but managed externally
> - dma_owns_decryption(): For pages need to be decrypted and managed
> by dma-direct
> - is_dma_decrypted(): To check if memory is decrypted
I can't even make sense of what this is trying to explain, talking
about page management along with 'dev' is nonsense. The management of
pages is intrinsic to the API, it doesn't change.
I think start with adding a direct flags annotation and then come back
to figure out if we need some kind of helpers.
I would expect any helper taking in dev to only be answering two dev
questions:
'dev can dma from encrypted(dev)'
'dev can dma from unencrytped(dev)'
At each of the points in the API flow the phys under consideration is
known to be encrypted or decrypted, and those two helpers tell
everything needed.
So I'd expect the restricted flow to look more like
- phys comes in to be dma mapped, it is encrypted
- 'dev can dma from encrypted(dev)' fails so we go to swiotlb
- swiotlb allocates from a restricted pool, and learns through flags
that the new phys is decrypted
- decrypted phys flows through the rest of the machinery.
We never check 'dev can dma from encrypted(dev)' a second time.
Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH v2 1/5] dma-mapping: Avoid double decrypting with DMA_RESTRICTED_POOL
2026-03-30 15:06 ` Jason Gunthorpe
@ 2026-03-30 20:43 ` Mostafa Saleh
2026-03-31 11:34 ` Suzuki K Poulose
0 siblings, 1 reply; 17+ messages in thread
From: Mostafa Saleh @ 2026-03-30 20:43 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, linux-kernel, robin.murphy, m.szyprowski, will, maz,
suzuki.poulose, catalin.marinas, jiri, aneesh.kumar
On Mon, Mar 30, 2026 at 12:06:54PM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 30, 2026 at 02:50:39PM +0000, Mostafa Saleh wrote:
> > In case a device has a restricted DMA pool, it will be decrypted
> > by default.
> >
> > However, in the dma_direct_alloc() path, memory can be allocated
> > from this pool via __dma_direct_alloc_pages() =>
> > dma_direct_alloc_swiotlb().
> >
> > After that, the same function will attempt to decrypt it again
> > using dma_set_decrypted() if force_dma_unencrypted() is true,
> > which results in the memory being decrypted twice.
> >
> > It's not clear how the realm world/hypervisors deal with that,
> > for example:
> > - CCA: Clear a bit in the page table and call realm IPA_STATE_SET.
> > - TDX: Issue a hypercall.
> > - pKVM: Which doesn't implement force_dma_unencrypted() at the moment,
> > uses a share hypercall.
> >
> > Change that to only encrypt/decrypt memory that is not allocated
> > from the restricted DMA pools.
> >
> > Fixes: f4111e39a52a ("swiotlb: Add restricted DMA alloc/free support")
> > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > ---
> > kernel/dma/direct.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > index 8f43a930716d..27d804f0473f 100644
> > --- a/kernel/dma/direct.c
> > +++ b/kernel/dma/direct.c
> > @@ -79,7 +79,7 @@ bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
> >
> > static int dma_set_decrypted(struct device *dev, void *vaddr, size_t size)
> > {
> > - if (!force_dma_unencrypted(dev))
> > + if (!force_dma_unencrypted(dev) || is_swiotlb_for_alloc(dev))
> > return 0;
>
> This seems really obtuse, I would expect the decryption state of the
> memory to be known by the caller. If dma_direct_alloc_swiotlb() can
> return decrypted or encrypted memory it needs to return a flag saying
> that. It shouldn't be deduced by checking dev flags in random places
> like this.
At the moment restricted DMA is always decrypted, and it's a per-device
property, so we don't have to check this per allocation.
I can change the signature of __dma_direct_alloc_pages() to make it
return an extra flag, but that feels more complicated as it changes
dma_direct_alloc_swiotlb() and swiotlb_alloc() along with their callers.
I can investigate this approach further.
Thanks,
Mostafa
>
> Double decryption is certainly a bug, I do not expect that to work.
>
> Jason
* Re: [RFC PATCH v2 2/5] dma-mapping: Use the correct phys_to_dma() for DMA_RESTRICTED_POOL
2026-03-30 15:09 ` Jason Gunthorpe
@ 2026-03-30 20:47 ` Mostafa Saleh
2026-03-30 22:28 ` Jason Gunthorpe
0 siblings, 1 reply; 17+ messages in thread
From: Mostafa Saleh @ 2026-03-30 20:47 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, linux-kernel, robin.murphy, m.szyprowski, will, maz,
suzuki.poulose, catalin.marinas, jiri, aneesh.kumar
On Mon, Mar 30, 2026 at 12:09:03PM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 30, 2026 at 02:50:40PM +0000, Mostafa Saleh wrote:
> > As restricted dma pools are always decrypted, in swiotlb.c it uses
> > phys_to_dma_unencrypted() for address conversion.
> >
> > However, in DMA-direct, calls to phys_to_dma_direct() with
> > force_dma_unencrypted() returning false, will fallback to
> > phys_to_dma() which is inconsistent for memory allocated from
> > restricted dma pools.
> >
> > Fixes: f4111e39a52a ("swiotlb: Add restricted DMA alloc/free support")
> > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > ---
> > kernel/dma/direct.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > index 27d804f0473f..1a402bb956d9 100644
> > --- a/kernel/dma/direct.c
> > +++ b/kernel/dma/direct.c
> > @@ -26,7 +26,7 @@ u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
> > static inline dma_addr_t phys_to_dma_direct(struct device *dev,
> > phys_addr_t phys)
> > {
> > - if (force_dma_unencrypted(dev))
> > + if (force_dma_unencrypted(dev) || is_swiotlb_for_alloc(dev))
> > return phys_to_dma_unencrypted(dev, phys);
>
> Same remark, I think the force_dma_unencrypted() was a hack to make up
> for a flag here. In these lower layers we need to annotate whether phys
> is encrypted/decrypted and take the proper action universally.
>
> The force_dma_unencrypted() should only be done way up the call chain
> where we decide to get a phys that is decrypted. Once we have a
> decrypted phys it should be carried with an annotation throughout all
> the other places.
Can you please clarify what you mean by annotation in this context,
as I believe any tracking in the vmemmap is a big NO?
As replied to the first patch I can attempt to implement this approach
(by passing a flag around) and see how intrusive it would be.
Thanks,
Mostafa
>
> Then this is more like:
>
> if (flags & FLAG_DECRYPTED)
> return phys_to_dma_unencrypted(dev, phys);
>
> Jason
* Re: [RFC PATCH v2 3/5] dma-mapping: Decrypt memory on remap
2026-03-30 15:19 ` Jason Gunthorpe
@ 2026-03-30 20:49 ` Mostafa Saleh
2026-03-30 22:30 ` Jason Gunthorpe
0 siblings, 1 reply; 17+ messages in thread
From: Mostafa Saleh @ 2026-03-30 20:49 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, linux-kernel, robin.murphy, m.szyprowski, will, maz,
suzuki.poulose, catalin.marinas, jiri, aneesh.kumar
On Mon, Mar 30, 2026 at 12:19:00PM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 30, 2026 at 02:50:41PM +0000, Mostafa Saleh wrote:
>
> > @@ -265,6 +266,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> > set_uncached = false;
> > }
> >
> > + if (dma_set_decrypted(dev, page_address(page), size))
> > + goto out_leak_pages;
> > +
> > if (remap) {
> > pgprot_t prot = dma_pgprot(dev, PAGE_KERNEL, attrs);
> >
> > if (force_dma_unencrypted(dev))
> > prot = pgprot_decrypted(prot);
>
> It seems confusing, why do we unconditionally call something called
> dma_set_decrypted() and then conditionally call pgprot_decrypted()?
dma_set_decrypted() itself calls force_dma_unencrypted(), so the check
is consistent.
>
> So, I think the same remark, lets not sprinkle these tests all over
> the place and risk them becoming inconsistent. It should be much more
> direct, like:
I agree we shouldn’t be sprinkling all these random calls all over the code,
that’s why I was trying to consolidate the logic in the next patch.
>
> page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO,
> allow_highmem, &flags);
>
> if (!dev_can_dma_from_encrypted(dev) && !(flags & FLAG_DECRYPTED)) {
> dma_set_decrypted(dev, page_address(page));
> flags = FLAG_DECRYPTED;
> }
>
> if (flags & FLAG_DECRYPTED)
> prot = pgprot_decrypted(prot);
>
> And so on.
>
> The one place we should see a force_dma_unencrypted() is directly
> before setting the flag.
I will look more into this, but my main worry would be
phys_to_dma_direct() and its callers, as I am not sure if it is
possible to preserve the allocation origin in all contexts.
Thanks,
Mostafa
>
> Jason
* Re: [RFC PATCH v2 2/5] dma-mapping: Use the correct phys_to_dma() for DMA_RESTRICTED_POOL
2026-03-30 20:47 ` Mostafa Saleh
@ 2026-03-30 22:28 ` Jason Gunthorpe
0 siblings, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2026-03-30 22:28 UTC (permalink / raw)
To: Mostafa Saleh
Cc: iommu, linux-kernel, robin.murphy, m.szyprowski, will, maz,
suzuki.poulose, catalin.marinas, jiri, aneesh.kumar
On Mon, Mar 30, 2026 at 08:47:41PM +0000, Mostafa Saleh wrote:
> > The force_dma_unencrypted() should only be done way up the call chain
> > where we decide to get a phys that is decrypted. Once we have a
> > decrypted phys it should be carried with an annotation throughout all
> > the other places.
>
> Can you please clarify what you mean by annotation in this context,
> as I believe any tracking in the vmemmap is a big NO?
It would have to be a flag pass along the phys, or phys & flag in a
struct or some kind of approach like that.
> As replied to the first patch I can attempt to implement this approach
> (by passing a flag around) and see how intrusive it would be.
I'm less concerned about intrusive and more about making this
understandable.
When we reach a function with a phys, it should know what that phys is,
not call a bunch of random helpers to hopefully guess correctly what
it is; that's unmaintainable spaghetti even if it is fewer changes.
Jason
* Re: [RFC PATCH v2 3/5] dma-mapping: Decrypt memory on remap
2026-03-30 20:49 ` Mostafa Saleh
@ 2026-03-30 22:30 ` Jason Gunthorpe
0 siblings, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2026-03-30 22:30 UTC (permalink / raw)
To: Mostafa Saleh
Cc: iommu, linux-kernel, robin.murphy, m.szyprowski, will, maz,
suzuki.poulose, catalin.marinas, jiri, aneesh.kumar
On Mon, Mar 30, 2026 at 08:49:31PM +0000, Mostafa Saleh wrote:
> > The one place we should see a force_dma_unencrypted() is directly
> > before setting the flag.
>
> I will look more into this, but my main worry would be
> phys_to_dma_direct() and its callers, as I am not sure if it is
> possible to preserve the allocation origin in all contexts.
I'd rather have a very small number of places that recover the phys &
flag through some convoluted logic than have it sprinkled all over the
place in every layer.
The helpers were not helpful, they were much more confusing.
Jason
* Re: [RFC PATCH v2 1/5] dma-mapping: Avoid double decrypting with DMA_RESTRICTED_POOL
2026-03-30 20:43 ` Mostafa Saleh
@ 2026-03-31 11:34 ` Suzuki K Poulose
2026-03-31 12:50 ` Mostafa Saleh
0 siblings, 1 reply; 17+ messages in thread
From: Suzuki K Poulose @ 2026-03-31 11:34 UTC (permalink / raw)
To: Mostafa Saleh, Jason Gunthorpe
Cc: iommu, linux-kernel, robin.murphy, m.szyprowski, will, maz,
catalin.marinas, jiri, aneesh.kumar
On 30/03/2026 21:43, Mostafa Saleh wrote:
> On Mon, Mar 30, 2026 at 12:06:54PM -0300, Jason Gunthorpe wrote:
>> On Mon, Mar 30, 2026 at 02:50:39PM +0000, Mostafa Saleh wrote:
>>> In case a device has a restricted DMA pool, it will be decrypted
>>> by default.
>>>
>>> However, in the path of dma_direct_alloc() memory can be allocated
>>> from this pool using, __dma_direct_alloc_pages() =>
>>> dma_direct_alloc_swiotlb()
>>>
>>> After that from the same function, it will attempt to decrypt it
>>> using dma_set_decrypted() if force_dma_unencrypted().
>>>
>>> Which results in the memory being decrypted twice.
>>>
>>> It's not clear how the realm world/hypervisors deal with that,
>>> for example:
>>> - CCA: Clear a bit in the page table and call realm IPA_STATE_SET.
>>> - TDX: Issue a hypercall.
>>> - pKVM: Which doesn't implement force_dma_unencrypted() at the moment,
>>> uses a share hypercall.
>>>
>>> Change that to only encrypt/decrypt memory that is not allocated
>>> from the restricted dma pools.
>>>
>>> Fixes: f4111e39a52a ("swiotlb: Add restricted DMA alloc/free support")
>>> Signed-off-by: Mostafa Saleh <smostafa@google.com>
>>> ---
>>> kernel/dma/direct.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>>> index 8f43a930716d..27d804f0473f 100644
>>> --- a/kernel/dma/direct.c
>>> +++ b/kernel/dma/direct.c
>>> @@ -79,7 +79,7 @@ bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
>>>
>>> static int dma_set_decrypted(struct device *dev, void *vaddr, size_t size)
>>> {
>>> - if (!force_dma_unencrypted(dev))
>>> + if (!force_dma_unencrypted(dev) || is_swiotlb_for_alloc(dev))
>>> return 0;
>>
>> This seems really obtuse, I would expect the decryption state of the
>> memory to be known by the caller. If dma_direct_alloc_swiotlb() can
>> return decrypted or encrypted memory it needs to return a flag saying
>> that. It shouldn't be deduced by checking dev flags in random places
>> like this.
>
> At the moment restricted DMA is always decrypted, and it's a per-device
> property, so we don't have to check this per allocation.
Doesn't the initial state depend on the platform? For CCA, the Realm
must decide how it wants to use a given region; for the restricted DMA
pool, it can be made decrypted. Could the VM OS decide to make this
decrypted at boot?
Suzuki
> I can change the signature of __dma_direct_alloc_pages() to make it
> return an extra flag, but that feels more complicated as it changes
> dma_direct_alloc_swiotlb() and swiotlb_alloc() along with their callers.
>
> I can investigate this approach further.
>
> Thanks,
> Mostafa
>
>>
>> Double decryption is certainly a bug, I do not expect that to work.
>>
>> Jason
* Re: [RFC PATCH v2 1/5] dma-mapping: Avoid double decrypting with DMA_RESTRICTED_POOL
2026-03-31 11:34 ` Suzuki K Poulose
@ 2026-03-31 12:50 ` Mostafa Saleh
0 siblings, 0 replies; 17+ messages in thread
From: Mostafa Saleh @ 2026-03-31 12:50 UTC (permalink / raw)
To: Suzuki K Poulose
Cc: Jason Gunthorpe, iommu, linux-kernel, robin.murphy, m.szyprowski,
will, maz, catalin.marinas, jiri, aneesh.kumar
On Tue, Mar 31, 2026 at 12:34:20PM +0100, Suzuki K Poulose wrote:
> On 30/03/2026 21:43, Mostafa Saleh wrote:
> > On Mon, Mar 30, 2026 at 12:06:54PM -0300, Jason Gunthorpe wrote:
> > > On Mon, Mar 30, 2026 at 02:50:39PM +0000, Mostafa Saleh wrote:
> > > > In case a device has a restricted DMA pool, it will be decrypted
> > > > by default.
> > > >
> > > > However, in the path of dma_direct_alloc() memory can be allocated
> > > > from this pool using, __dma_direct_alloc_pages() =>
> > > > dma_direct_alloc_swiotlb()
> > > >
> > > > After that from the same function, it will attempt to decrypt it
> > > > using dma_set_decrypted() if force_dma_unencrypted().
> > > >
> > > > Which results in the memory being decrypted twice.
> > > >
> > > > It's not clear how the realm world/hypervisors deal with that,
> > > > for example:
> > > > - CCA: Clear a bit in the page table and call realm IPA_STATE_SET.
> > > > - TDX: Issue a hypercall.
> > > > - pKVM: Which doesn't implement force_dma_unencrypted() at the moment,
> > > > uses a share hypercall.
> > > >
> > > > Change that to only encrypt/decrypt memory that is not allocated
> > > > from the restricted dma pools.
> > > >
> > > > Fixes: f4111e39a52a ("swiotlb: Add restricted DMA alloc/free support")
> > > > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > > > ---
> > > > kernel/dma/direct.c | 4 ++--
> > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > > > index 8f43a930716d..27d804f0473f 100644
> > > > --- a/kernel/dma/direct.c
> > > > +++ b/kernel/dma/direct.c
> > > > @@ -79,7 +79,7 @@ bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
> > > > static int dma_set_decrypted(struct device *dev, void *vaddr, size_t size)
> > > > {
> > > > - if (!force_dma_unencrypted(dev))
> > > > + if (!force_dma_unencrypted(dev) || is_swiotlb_for_alloc(dev))
> > > > return 0;
> > >
> > > This seems really obtuse, I would expect the decryption state of the
> > > memory to be known by the caller. If dma_direct_alloc_swiotlb() can
> > > return decrypted or encrypted memory it needs to return a flag saying
> > > that. It shouldn't be deduced by checking dev flags in random places
> > > like this.
> >
> > At the moment restricted DMA is always decrypted, and it's a per-device
> > property, so we don't have to check this per allocation.
>
> Doesn't the initial state depend on the platform? For CCA, the Realm
> must decide how it wants to use a given region; for the restricted DMA
> pool, it can be made decrypted. Could the VM OS decide to make this
> decrypted at boot?
>
At the moment no [1], the pool is decrypted unconditionally.
As mentioned in the cover letter under "Future work", I believe
giving the OS the ability to have undecrypted pools is important for
confidential DMA.
Initially, I thought that could be a per-device property (so the
platform would keep the memory encrypted for physical devices and
decrypt it for emulated ones).
But that might cause runtime issues, as a pool can be shared between
multiple devices. So I believe it's better to have this as a per-pool
property (from the device tree, for example) and then the platform can
do any validation of its assumptions at runtime.
This is a bit hairy; as I mentioned, it would be a good topic to
discuss at the next LPC.
Thanks,
Mostafa
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/dma/swiotlb.c#n1847
> Suzuki
>
>
> > I can change the signature of __dma_direct_alloc_pages() to make it
> > return an extra flag, but that feels more complicated as it changes
> > dma_direct_alloc_swiotlb() and swiotlb_alloc() along with their callers.
> >
> > I can investigate this approach further.
> >
> > Thanks,
> > Mostafa
> >
> > >
> > > Double decryption is certainly a bug, I do not expect that to work.
> > >
> > > Jason
>
end of thread, other threads:[~2026-03-31 12:51 UTC | newest]
Thread overview: 17+ messages
2026-03-30 14:50 [RFC PATCH v2 0/5] dma-mapping: Fixes for memory encryption Mostafa Saleh
2026-03-30 14:50 ` [RFC PATCH v2 1/5] dma-mapping: Avoid double decrypting with DMA_RESTRICTED_POOL Mostafa Saleh
2026-03-30 15:06 ` Jason Gunthorpe
2026-03-30 20:43 ` Mostafa Saleh
2026-03-31 11:34 ` Suzuki K Poulose
2026-03-31 12:50 ` Mostafa Saleh
2026-03-30 14:50 ` [RFC PATCH v2 2/5] dma-mapping: Use the correct phys_to_dma() for DMA_RESTRICTED_POOL Mostafa Saleh
2026-03-30 15:09 ` Jason Gunthorpe
2026-03-30 20:47 ` Mostafa Saleh
2026-03-30 22:28 ` Jason Gunthorpe
2026-03-30 14:50 ` [RFC PATCH v2 3/5] dma-mapping: Decrypt memory on remap Mostafa Saleh
2026-03-30 15:19 ` Jason Gunthorpe
2026-03-30 20:49 ` Mostafa Saleh
2026-03-30 22:30 ` Jason Gunthorpe
2026-03-30 14:50 ` [RFC PATCH v2 4/5] dma-mapping: Refactor memory encryption usage Mostafa Saleh
2026-03-30 15:27 ` Jason Gunthorpe
2026-03-30 14:50 ` [RFC PATCH v2 5/5] dma-mapping: Add doc for memory encryption Mostafa Saleh