* [PATCH v6 14/20] dma-direct: return struct page from dma_direct_alloc_from_pool()
From: Aneesh Kumar K.V (Arm) @ 2026-06-04 8:39 UTC (permalink / raw)
To: iommu, linux-arm-kernel, linux-kernel, linux-coco
Cc: Aneesh Kumar K.V (Arm), Robin Murphy, Marek Szyprowski,
Will Deacon, Marc Zyngier, Steven Price, Suzuki K Poulose,
Catalin Marinas, Jiri Pirko, Jason Gunthorpe, Mostafa Saleh,
Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, stable, Michael Kelley
In-Reply-To: <20260604083959.1265923-1-aneesh.kumar@kernel.org>
Commit 5b138c534fda ("dma-direct: factor out a dma_direct_alloc_from_pool
helper") changed dma_direct_alloc_from_pool() to return the CPU address
from dma_alloc_from_pool(). That fits dma_direct_alloc(), but
dma_direct_alloc_pages() also uses the helper and expects a struct page *.
Fix this by making dma_direct_alloc_from_pool() return the struct page *
again, and pass the CPU address back through an out-parameter for the
dma_direct_alloc() caller.
Fixes: 5b138c534fda ("dma-direct: factor out a dma_direct_alloc_from_pool helper")
Cc: stable@vger.kernel.org
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
kernel/dma/direct.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 4e446aa4130e..e0ab9ff3f1d6 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -157,24 +157,24 @@ static bool dma_direct_use_pool(struct device *dev, gfp_t gfp)
return !gfpflags_allow_blocking(gfp) && !is_swiotlb_for_alloc(dev);
}
-static void *dma_direct_alloc_from_pool(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+static struct page *dma_direct_alloc_from_pool(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, void **cpu_addr, gfp_t gfp,
+ unsigned long attrs)
{
struct page *page;
u64 phys_limit;
- void *ret;
if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_DMA_COHERENT_POOL)))
return NULL;
gfp |= dma_direct_optimal_gfp_mask(dev, &phys_limit);
- page = dma_alloc_from_pool(dev, size, &ret, gfp, attrs,
+ page = dma_alloc_from_pool(dev, size, cpu_addr, gfp, attrs,
dma_coherent_ok);
if (!page)
return NULL;
*dma_handle = phys_to_dma_direct(dev, page_to_phys(page),
!!(attrs & DMA_ATTR_CC_SHARED));
- return ret;
+ return page;
}
static void *dma_direct_alloc_no_mapping(struct device *dev, size_t size,
@@ -270,9 +270,12 @@ void *dma_direct_alloc(struct device *dev, size_t size,
* the atomic pools instead if we aren't allowed block.
*/
if ((remap || (attrs & DMA_ATTR_CC_SHARED)) &&
- dma_direct_use_pool(dev, gfp))
- return dma_direct_alloc_from_pool(dev, size, dma_handle,
- gfp, attrs);
+ dma_direct_use_pool(dev, gfp)) {
+ page = dma_direct_alloc_from_pool(dev, size,
+ dma_handle, &cpu_addr,
+ gfp, attrs);
+ return page ? cpu_addr : NULL;
+ }
if (is_swiotlb_for_alloc(dev)) {
page = dma_direct_alloc_swiotlb(dev, size, attrs);
@@ -445,7 +448,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
if ((attrs & DMA_ATTR_CC_SHARED) && dma_direct_use_pool(dev, gfp))
return dma_direct_alloc_from_pool(dev, size, dma_handle,
- gfp, attrs);
+ &cpu_addr, gfp, attrs);
if (is_swiotlb_for_alloc(dev)) {
page = dma_direct_alloc_swiotlb(dev, size, attrs);
--
2.43.0
^ permalink raw reply related
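The calling convention restored by patch 14/20 above can be modelled in plain user-space C. This is only a sketch: struct fake_page, pool_page, pool_mapping and every function name below are hypothetical stand-ins, not kernel symbols. It shows a helper that returns the page and hands the CPU address back through an out-parameter, so that a caller wanting the page and a caller wanting the CPU address can share it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel type. */
struct fake_page {
	int id;
};

static struct fake_page pool_page = { .id = 1 };
static char pool_mapping[64];	/* stands in for the pool's CPU mapping */

/*
 * Model of the fixed helper: the page is the return value and the CPU
 * address is written through an out-parameter, only on success.
 */
struct fake_page *alloc_from_pool(int pool_has_memory, void **cpu_addr)
{
	if (!pool_has_memory)
		return NULL;
	*cpu_addr = pool_mapping;
	return &pool_page;
}

/* Caller that wants the CPU address, in the role of dma_direct_alloc(). */
void *alloc_cpu_addr(int pool_has_memory)
{
	void *cpu_addr = NULL;
	struct fake_page *page = alloc_from_pool(pool_has_memory, &cpu_addr);

	return page ? cpu_addr : NULL;
}

/* Caller that wants the page, in the role of dma_direct_alloc_pages(). */
struct fake_page *alloc_page_only(int pool_has_memory)
{
	void *cpu_addr = NULL;

	return alloc_from_pool(pool_has_memory, &cpu_addr);
}
```

With this shape neither caller has to convert between a page and a CPU address, which matters when the two are not related by the linear map.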
* [PATCH v6 15/20] iommu/dma: Check atomic pool allocation result directly
From: Aneesh Kumar K.V (Arm) @ 2026-06-04 8:39 UTC (permalink / raw)
To: iommu, linux-arm-kernel, linux-kernel, linux-coco
Cc: Aneesh Kumar K.V (Arm), Robin Murphy, Marek Szyprowski,
Will Deacon, Marc Zyngier, Steven Price, Suzuki K Poulose,
Catalin Marinas, Jiri Pirko, Jason Gunthorpe, Mostafa Saleh,
Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Michael Kelley
In-Reply-To: <20260604083959.1265923-1-aneesh.kumar@kernel.org>
The non-blocking, non-coherent allocation path uses dma_alloc_from_pool(),
which returns the allocated page and fills cpu_addr only on success.
Do not rely on cpu_addr to detect allocation failure in this path. Check
the returned page directly before using it for the IOMMU mapping.
Fixes: 9420139f516d ("dma-pool: fix coherent pool allocations for IOMMU mappings")
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
drivers/iommu/dma-iommu.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 725c7adb0a8d..52c599f4472c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1671,13 +1671,16 @@ void *iommu_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
}
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
- !gfpflags_allow_blocking(gfp) && !coherent)
+ !gfpflags_allow_blocking(gfp) && !coherent) {
page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr,
gfp, attrs, NULL);
- else
+ if (!page)
+ return NULL;
+ } else {
cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
- if (!cpu_addr)
- return NULL;
+ if (!cpu_addr)
+ return NULL;
+ }
*handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot,
dev->coherent_dma_mask);
--
2.43.0
^ permalink raw reply related
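The reasoning in patch 15/20 above can be illustrated with a small user-space model. Everything here is an assumption made for illustration (fake_page, pool_alloc and the two checker functions are hypothetical); the only property taken from the commit message is that the out-parameter is written on success only. Under that property, a caller that tests the out-parameter sees whatever the variable held before the call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct fake_page {
	int id;
};

static struct fake_page pool_page;
static char pool_mapping[64];

/* Writes *cpu_addr only on success, as the commit message describes. */
struct fake_page *pool_alloc(bool succeed, void **cpu_addr)
{
	if (!succeed)
		return NULL;
	*cpu_addr = pool_mapping;
	return &pool_page;
}

/* Fragile: infers failure from the out-parameter. */
bool alloc_failed_by_cpu_addr(bool succeed, void *stale)
{
	void *cpu_addr = stale;	/* whatever the variable held beforehand */

	pool_alloc(succeed, &cpu_addr);
	return cpu_addr == NULL;
}

/* Robust: infers failure from the returned page. */
bool alloc_failed_by_page(bool succeed, void *stale)
{
	void *cpu_addr = stale;

	return pool_alloc(succeed, &cpu_addr) == NULL;
}
```

If the variable is non-NULL before the call, the first checker reports success for a failed allocation, while the second one does not depend on the variable's prior contents.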
* [PATCH v6 16/20] dma: swiotlb: free dynamic pools from process context
From: Aneesh Kumar K.V (Arm) @ 2026-06-04 8:39 UTC (permalink / raw)
To: iommu, linux-arm-kernel, linux-kernel, linux-coco
Cc: Aneesh Kumar K.V (Arm), Robin Murphy, Marek Szyprowski,
Will Deacon, Marc Zyngier, Steven Price, Suzuki K Poulose,
Catalin Marinas, Jiri Pirko, Jason Gunthorpe, Mostafa Saleh,
Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Michael Kelley
In-Reply-To: <20260604083959.1265923-1-aneesh.kumar@kernel.org>
swiotlb_dyn_free() is used after removing a dynamic swiotlb pool from
RCU-protected lists. It can call swiotlb_free_tlb(), which may need to
restore the encryption state of an unencrypted pool with
set_memory_encrypted() before freeing the pages.
RCU callbacks run in atomic context, but set_memory_encrypted() is not
guaranteed to be atomic-safe on all architectures. For example, page
attribute updates may allocate page tables or take sleeping locks.
Use queue_rcu_work() for dynamic pool freeing instead. This keeps the RCU
grace period before freeing a published pool, while running the actual pool
teardown from workqueue context. Use the same helper for the transient-pool
error path, since that path may also be reached from atomic DMA mapping
context.
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
include/linux/swiotlb.h | 4 ++--
kernel/dma/swiotlb.c | 19 +++++++++++--------
2 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 4dcbf3931be1..526f82e9da45 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -64,7 +64,7 @@ extern void __init swiotlb_update_mem_attributes(void);
* @areas: Array of memory area descriptors.
* @slots: Array of slot descriptors.
* @node: Member of the IO TLB memory pool list.
- * @rcu: RCU head for swiotlb_dyn_free().
+ * @dyn_free: RCU work item used to free the pool from process context.
* @transient: %true if transient memory pool.
*/
struct io_tlb_pool {
@@ -79,7 +79,7 @@ struct io_tlb_pool {
struct io_tlb_slot *slots;
#ifdef CONFIG_SWIOTLB_DYNAMIC
struct list_head node;
- struct rcu_head rcu;
+ struct rcu_work dyn_free;
bool transient;
bool unencrypted;
#endif
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f4e8b241a1c4..4c56f64602ea 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -774,13 +774,10 @@ static void swiotlb_dyn_alloc(struct work_struct *work)
add_mem_pool(mem, pool);
}
-/**
- * swiotlb_dyn_free() - RCU callback to free a memory pool
- * @rcu: RCU head in the corresponding struct io_tlb_pool.
- */
-static void swiotlb_dyn_free(struct rcu_head *rcu)
+static void swiotlb_dyn_free_work(struct work_struct *work)
{
- struct io_tlb_pool *pool = container_of(rcu, struct io_tlb_pool, rcu);
+ struct io_tlb_pool *pool =
+ container_of(to_rcu_work(work), struct io_tlb_pool, dyn_free);
size_t slots_size = array_size(sizeof(*pool->slots), pool->nslabs);
size_t tlb_size = pool->end - pool->start;
@@ -789,6 +786,12 @@ static void swiotlb_dyn_free(struct rcu_head *rcu)
kfree(pool);
}
+static void swiotlb_schedule_dyn_free(struct io_tlb_pool *pool)
+{
+ INIT_RCU_WORK(&pool->dyn_free, swiotlb_dyn_free_work);
+ queue_rcu_work(system_wq, &pool->dyn_free);
+}
+
/**
* __swiotlb_find_pool() - find the IO TLB pool for a physical address
* @dev: Device which has mapped the DMA buffer.
@@ -835,7 +838,7 @@ static void swiotlb_del_pool(struct device *dev, struct io_tlb_pool *pool)
list_del_rcu(&pool->node);
spin_unlock_irqrestore(&dev->dma_io_tlb_lock, flags);
- call_rcu(&pool->rcu, swiotlb_dyn_free);
+ swiotlb_schedule_dyn_free(pool);
}
#endif /* CONFIG_SWIOTLB_DYNAMIC */
@@ -1276,7 +1279,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
index = swiotlb_search_pool_area(dev, pool, 0, orig_addr, tbl_dma_addr,
alloc_size, alloc_align_mask);
if (index < 0) {
- swiotlb_dyn_free(&pool->rcu);
+ swiotlb_schedule_dyn_free(pool);
return -1;
}
--
2.43.0
^ permalink raw reply related
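The deferral idea behind patch 16/20 above can be sketched in user space. This is a toy model, not the kernel mechanism: there is no RCU grace period and no workqueue here, and fake_pool, in_atomic_context and the function names are all hypothetical. It only shows the split between a scheduling step that is safe from any context and a teardown step that runs later from a context that may sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_pool {
	struct fake_pool *next;	/* link in the deferred-free queue */
	bool freed;
};

static struct fake_pool *deferred_head;
static bool in_atomic_context;	/* toy flag, set by the caller */
static int atomic_violations;

/* Teardown that may sleep, like a page-attribute change. */
void pool_teardown(struct fake_pool *pool)
{
	if (in_atomic_context)
		atomic_violations++;
	pool->freed = true;
}

/* Safe from any context: only links the pool into a queue. */
void schedule_pool_free(struct fake_pool *pool)
{
	pool->next = deferred_head;
	deferred_head = pool;
}

/* Worker: runs later, in a context that is allowed to sleep. */
void run_deferred_frees(void)
{
	while (deferred_head) {
		struct fake_pool *pool = deferred_head;

		deferred_head = pool->next;
		pool_teardown(pool);
	}
}
```

In the patch, queue_rcu_work() plays both roles of the queue: it waits for the grace period and then runs the work function from workqueue context.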
* [PATCH v6 17/20] dma: swiotlb: handle set_memory_decrypted() failures
From: Aneesh Kumar K.V (Arm) @ 2026-06-04 8:39 UTC (permalink / raw)
To: iommu, linux-arm-kernel, linux-kernel, linux-coco
Cc: Aneesh Kumar K.V (Arm), Robin Murphy, Marek Szyprowski,
Will Deacon, Marc Zyngier, Steven Price, Suzuki K Poulose,
Catalin Marinas, Jiri Pirko, Jason Gunthorpe, Mostafa Saleh,
Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Michael Kelley
In-Reply-To: <20260604083959.1265923-1-aneesh.kumar@kernel.org>
Check the return value when converting swiotlb pools between encrypted and
decrypted mappings. If the default pool cannot be decrypted after early
initialization, mark the pool fully used so it cannot satisfy future bounce
allocations.
For late initialization, return the error from set_memory_decrypted(). For
restricted DMA pools, fail device initialization if the reserved pool
cannot be decrypted.
This prevents swiotlb from using pools whose encryption attributes do not
match their metadata, and avoids returning pages with uncertain encryption
state back to the allocator.
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
kernel/dma/swiotlb.c | 80 +++++++++++++++++++++++++++++++++++---------
1 file changed, 65 insertions(+), 15 deletions(-)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 4c56f64602ea..14d834ca298b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -248,6 +248,23 @@ static inline unsigned long nr_slots(u64 val)
return DIV_ROUND_UP(val, IO_TLB_SIZE);
}
+static void swiotlb_mark_pool_used(struct io_tlb_pool *pool)
+{
+ unsigned long i;
+
+ for (i = 0; i < pool->nareas; i++) {
+ pool->areas[i].index = 0;
+ pool->areas[i].used = pool->area_nslabs;
+ }
+
+ for (i = 0; i < pool->nslabs; i++) {
+ pool->slots[i].list = 0;
+ pool->slots[i].orig_addr = INVALID_PHYS_ADDR;
+ pool->slots[i].alloc_size = 0;
+ pool->slots[i].pad_slots = 0;
+ }
+}
+
/*
* Early SWIOTLB allocation may be too early to allow an architecture to
* perform the desired operations. This function allows the architecture to
@@ -272,8 +289,16 @@ void __init swiotlb_update_mem_attributes(void)
return;
bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
- if (io_tlb_default_mem.unencrypted)
- set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
+ if (io_tlb_default_mem.unencrypted) {
+ int ret;
+
+ ret = set_memory_decrypted((unsigned long)mem->vaddr,
+ bytes >> PAGE_SHIFT);
+ if (ret) {
+ pr_warn("Failed to decrypt default memory pool, disabling it\n");
+ swiotlb_mark_pool_used(mem);
+ }
+ }
}
static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
@@ -442,9 +467,10 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
{
struct io_tlb_pool *mem = &io_tlb_default_mem.defpool;
unsigned long nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
+ unsigned int order, area_order, slot_order;
+ bool leak_pages = false;
unsigned int nareas;
unsigned char *vstart = NULL;
- unsigned int order, area_order;
bool retried = false;
int rc = 0;
@@ -504,6 +530,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
(PAGE_SIZE << order) >> 20);
}
+ rc = -ENOMEM;
nareas = limit_nareas(default_nareas, nslabs);
area_order = get_order(array_size(sizeof(*mem->areas), nareas));
mem->areas = (struct io_tlb_area *)
@@ -511,14 +538,20 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
if (!mem->areas)
goto error_area;
+ slot_order = get_order(array_size(sizeof(*mem->slots), nslabs));
mem->slots = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
- get_order(array_size(sizeof(*mem->slots), nslabs)));
+ slot_order);
if (!mem->slots)
goto error_slots;
- if (io_tlb_default_mem.unencrypted)
- set_memory_decrypted((unsigned long)vstart,
- (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
+ if (io_tlb_default_mem.unencrypted) {
+ rc = set_memory_decrypted((unsigned long)vstart,
+ (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
+ if (rc) {
+ leak_pages = true;
+ goto error_decrypt;
+ }
+ }
swiotlb_init_io_tlb_pool(mem, virt_to_phys(vstart), nslabs, true,
nareas);
@@ -527,16 +560,20 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
swiotlb_print_info();
return 0;
+error_decrypt:
+ free_pages((unsigned long)mem->slots, slot_order);
error_slots:
free_pages((unsigned long)mem->areas, area_order);
error_area:
- free_pages((unsigned long)vstart, order);
- return -ENOMEM;
+ if (!leak_pages)
+ free_pages((unsigned long)vstart, order);
+ return rc;
}
void __init swiotlb_exit(void)
{
struct io_tlb_pool *mem = &io_tlb_default_mem.defpool;
+ bool leak_pages = false;
unsigned long tbl_vaddr;
size_t tbl_size, slots_size;
unsigned int area_order;
@@ -552,19 +589,23 @@ void __init swiotlb_exit(void)
tbl_size = PAGE_ALIGN(mem->end - mem->start);
slots_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), mem->nslabs));
- if (io_tlb_default_mem.unencrypted)
- set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
+ if (io_tlb_default_mem.unencrypted) {
+ if (set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT))
+ leak_pages = true;
+ }
if (mem->late_alloc) {
area_order = get_order(array_size(sizeof(*mem->areas),
mem->nareas));
free_pages((unsigned long)mem->areas, area_order);
- free_pages(tbl_vaddr, get_order(tbl_size));
+ if (!leak_pages)
+ free_pages(tbl_vaddr, get_order(tbl_size));
free_pages((unsigned long)mem->slots, get_order(slots_size));
} else {
memblock_free(mem->areas,
array_size(sizeof(*mem->areas), mem->nareas));
- memblock_phys_free(mem->start, tbl_size);
+ if (!leak_pages)
+ memblock_phys_free(mem->start, tbl_size);
memblock_free(mem->slots, slots_size);
}
@@ -1938,9 +1979,18 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
* restricted mem pool is decrypted by default
*/
if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
+ int ret;
+
mem->unencrypted = true;
- set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
- rmem->size >> PAGE_SHIFT);
+ ret = set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
+ rmem->size >> PAGE_SHIFT);
+ if (ret) {
+ dev_err(dev, "Failed to decrypt restricted DMA pool\n");
+ kfree(pool->areas);
+ kfree(pool->slots);
+ kfree(mem);
+ return ret;
+ }
} else {
mem->unencrypted = false;
}
--
2.43.0
^ permalink raw reply related
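The "mark the pool fully used" approach from patch 17/20 above can be modelled with a simplified allocator. This is a sketch under stated assumptions: toy_pool, toy_area and pool_alloc_slots() are hypothetical, and the real swiotlb slot search is considerably more involved. The point is only that an allocator which checks per-area usage counters will refuse every request once each counter equals the area capacity.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_AREAS 4

struct toy_area {
	unsigned int used;	/* slots currently allocated in this area */
};

struct toy_pool {
	unsigned int nareas;
	unsigned int area_nslabs;	/* capacity of each area */
	struct toy_area areas[TOY_MAX_AREAS];
};

/* Make every area look full so later allocations cannot succeed. */
void mark_pool_used(struct toy_pool *pool)
{
	unsigned int i;

	for (i = 0; i < pool->nareas; i++)
		pool->areas[i].used = pool->area_nslabs;
}

/* Returns the index of the area that served the request, or -1. */
int pool_alloc_slots(struct toy_pool *pool, unsigned int nslots)
{
	unsigned int i;

	for (i = 0; i < pool->nareas; i++) {
		if (pool->area_nslabs - pool->areas[i].used >= nslots) {
			pool->areas[i].used += nslots;
			return (int)i;
		}
	}
	return -1;
}
```

Disabling the pool this way needs no extra flag in the allocation path, since the existing capacity check already rejects the request.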
* [PATCH v6 18/20] dma: free atomic pool pages by physical address
From: Aneesh Kumar K.V (Arm) @ 2026-06-04 8:39 UTC (permalink / raw)
To: iommu, linux-arm-kernel, linux-kernel, linux-coco
Cc: Aneesh Kumar K.V (Arm), Robin Murphy, Marek Szyprowski,
Will Deacon, Marc Zyngier, Steven Price, Suzuki K Poulose,
Catalin Marinas, Jiri Pirko, Jason Gunthorpe, Mostafa Saleh,
Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Michael Kelley
In-Reply-To: <20260604083959.1265923-1-aneesh.kumar@kernel.org>
dma_direct_alloc_pages() may satisfy atomic allocations from the coherent
atomic pools. The pool allocation is keyed by the virtual address stored in
the gen_pool, but the pages API returns only the backing struct page.
On architectures with CONFIG_DMA_DIRECT_REMAP, atomic pool chunks are added
to the gen_pool using their remapped virtual address.
dma_direct_free_pages() reconstructs a linear-map address with
page_address(page) and passes that to dma_free_from_pool(). That address
does not match the gen_pool virtual range, so the pool lookup can fail and
the code can fall through to freeing a pool-owned page through the normal
page allocator path.
Add a page-based pool free helper that looks up the owning pool chunk by
physical address, translates it back to the gen_pool virtual address, and
frees that address to the pool. Use it from dma_direct_free_pages() while
keeping the existing virtual-address helper for coherent allocation frees.
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
include/linux/dma-map-ops.h | 1 +
kernel/dma/direct.c | 4 +--
kernel/dma/pool.c | 54 +++++++++++++++++++++++++++++++++++++
3 files changed, 57 insertions(+), 2 deletions(-)
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 696b2c3a2305..8be059e69935 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -215,6 +215,7 @@ struct page *dma_alloc_from_pool(struct device *dev, size_t size,
void **cpu_addr, gfp_t flags, unsigned long attrs,
bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t));
bool dma_free_from_pool(struct device *dev, void *start, size_t size);
+bool dma_free_from_pool_page(struct device *dev, struct page *page, size_t size);
int dma_direct_set_offset(struct device *dev, phys_addr_t cpu_start,
dma_addr_t dma_start, u64 size);
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index e0ab9ff3f1d6..58f7ea1be963 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -488,9 +488,9 @@ void dma_direct_free_pages(struct device *dev, size_t size,
*/
bool mark_mem_encrypted = force_dma_unencrypted(dev);
- /* If cpu_addr is not from an atomic pool, dma_free_from_pool() fails */
+ /* If page is not from an atomic pool, dma_free_from_pool_page() fails */
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
- dma_free_from_pool(dev, vaddr, size))
+ dma_free_from_pool_page(dev, page, size))
return;
phys = page_to_phys(page);
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index e7df8d279e75..43b8101d860f 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -356,3 +356,57 @@ bool dma_free_from_pool(struct device *dev, void *start, size_t size)
return false;
}
+
+struct dma_pool_phys_match {
+ phys_addr_t phys;
+ size_t size;
+ unsigned long addr;
+ bool found;
+};
+
+static void dma_pool_find_phys(struct gen_pool *pool, struct gen_pool_chunk *chunk,
+ void *data)
+{
+ struct dma_pool_phys_match *match = data;
+ phys_addr_t end = match->phys + match->size - 1;
+ phys_addr_t chunk_end;
+
+ if (match->found)
+ return;
+
+ chunk_end = chunk->phys_addr + (chunk->end_addr - chunk->start_addr);
+ if (match->phys < chunk->phys_addr || end > chunk_end)
+ return;
+
+ match->addr = chunk->start_addr + (match->phys - chunk->phys_addr);
+ match->found = true;
+}
+
+static bool dma_free_from_pool_phys(struct dma_gen_pool *dma_pool, phys_addr_t phys,
+ size_t size)
+{
+ struct dma_pool_phys_match match = {
+ .phys = phys,
+ .size = size,
+ };
+
+ gen_pool_for_each_chunk(dma_pool->pool, dma_pool_find_phys, &match);
+ if (!match.found)
+ return false;
+
+ gen_pool_free(dma_pool->pool, match.addr, size);
+ return true;
+}
+
+bool dma_free_from_pool_page(struct device *dev, struct page *page, size_t size)
+{
+ struct dma_gen_pool *dma_pool = NULL;
+ phys_addr_t phys = page_to_phys(page);
+
+ while ((dma_pool = dma_guess_pool(dma_pool, 0))) {
+ if (dma_free_from_pool_phys(dma_pool, phys, size))
+ return true;
+ }
+
+ return false;
+}
--
2.43.0
^ permalink raw reply related
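The physical-address lookup added by patch 18/20 above can be tried out in user space with a simplified chunk table. The struct toy_chunk layout and both function names are assumptions for illustration; the sketch mirrors only the arithmetic, treating end_addr as the last valid virtual address of the chunk (inclusive). It adds a zero-size guard that the patch does not have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_chunk {
	uintptr_t start_addr;	/* first virtual address in the chunk */
	uintptr_t end_addr;	/* last virtual address (inclusive) */
	uint64_t phys_addr;	/* physical address backing start_addr */
};

/* Translate [phys, phys + size) to the chunk's virtual address, if inside. */
bool chunk_phys_to_virt(const struct toy_chunk *c, uint64_t phys,
			size_t size, uintptr_t *vaddr)
{
	uint64_t chunk_last = c->phys_addr + (c->end_addr - c->start_addr);
	uint64_t last;

	if (size == 0)
		return false;

	last = phys + size - 1;
	if (phys < c->phys_addr || last > chunk_last)
		return false;

	*vaddr = c->start_addr + (uintptr_t)(phys - c->phys_addr);
	return true;
}

/* Walk all chunks and stop at the first one that contains the range. */
bool pool_phys_to_virt(const struct toy_chunk *chunks, size_t nchunks,
		       uint64_t phys, size_t size, uintptr_t *vaddr)
{
	size_t i;

	for (i = 0; i < nchunks; i++) {
		if (chunk_phys_to_virt(&chunks[i], phys, size, vaddr))
			return true;
	}
	return false;
}
```

A range that starts inside a chunk but runs past its end is rejected rather than partially matched, so a free can never be applied to an address range the pool does not fully own.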
* [PATCH v6 19/20] swiotlb: Preserve allocation virtual address for dynamic pools
From: Aneesh Kumar K.V (Arm) @ 2026-06-04 8:39 UTC (permalink / raw)
To: iommu, linux-arm-kernel, linux-kernel, linux-coco
Cc: Aneesh Kumar K.V (Arm), Robin Murphy, Marek Szyprowski,
Will Deacon, Marc Zyngier, Steven Price, Suzuki K Poulose,
Catalin Marinas, Jiri Pirko, Jason Gunthorpe, Mostafa Saleh,
Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86, Michael Kelley
In-Reply-To: <20260604083959.1265923-1-aneesh.kumar@kernel.org>
swiotlb_alloc_tlb() can allocate from the DMA atomic pool when a decrypted
pool is needed from atomic context. With CONFIG_DMA_DIRECT_REMAP, the
atomic pool is backed by remapped virtual addresses, which are not the same
as the direct-map addresses returned by phys_to_virt().
swiotlb_init_io_tlb_pool() currently reconstructs the pool virtual address
from the physical start address. For atomic-pool backed allocations this
stores the wrong address in pool->vaddr. Later, swiotlb_free_tlb() passes
that address to dma_free_from_pool(), which will fail to recognize the
chunk.
Pass the virtual address returned by the allocation path into
swiotlb_init_io_tlb_pool(), and store that address in pool->vaddr. This
keeps the pool free path using the same virtual address as the allocator.
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
kernel/dma/swiotlb.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 14d834ca298b..e4bd8c9eaeda 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -302,9 +302,9 @@ void __init swiotlb_update_mem_attributes(void)
}
static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
- unsigned long nslabs, bool late_alloc, unsigned int nareas)
+ void *vaddr, unsigned long nslabs, bool late_alloc,
+ unsigned int nareas)
{
- void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
mem->nslabs = nslabs;
@@ -445,7 +445,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
return;
}
- swiotlb_init_io_tlb_pool(mem, __pa(tlb), nslabs, false, nareas);
+ swiotlb_init_io_tlb_pool(mem, __pa(tlb), tlb, nslabs, false, nareas);
add_mem_pool(&io_tlb_default_mem, mem);
if (flags & SWIOTLB_VERBOSE)
@@ -553,7 +553,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
}
}
- swiotlb_init_io_tlb_pool(mem, virt_to_phys(vstart), nslabs, true,
+ swiotlb_init_io_tlb_pool(mem, virt_to_phys(vstart), vstart, nslabs, true,
nareas);
add_mem_pool(&io_tlb_default_mem, mem);
@@ -664,25 +664,26 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes,
* @phys_limit: Maximum allowed physical address of the buffer.
* @attrs: DMA attributes for the allocation.
* @gfp: GFP flags for the allocation.
+ * @vaddr: Receives the virtual address for the allocated buffer.
*
* Return: Allocated pages, or %NULL on allocation failure.
*/
static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
- u64 phys_limit, unsigned long attrs, gfp_t gfp)
+ u64 phys_limit, unsigned long attrs, gfp_t gfp, void **vaddr)
{
struct page *page;
+ *vaddr = NULL;
+
/*
* Allocate from the atomic pools if memory is encrypted and
* the allocation is atomic, because decrypting may block.
*/
if (!gfpflags_allow_blocking(gfp) && (attrs & DMA_ATTR_CC_SHARED)) {
- void *vaddr;
-
if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
return NULL;
- return dma_alloc_from_pool(dev, bytes, &vaddr, gfp,
+ return dma_alloc_from_pool(dev, bytes, vaddr, gfp,
attrs, dma_coherent_ok);
}
@@ -705,6 +706,8 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
return NULL;
}
+ if (page)
+ *vaddr = phys_to_virt(page_to_phys(page));
return page;
}
@@ -750,6 +753,7 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
{
struct io_tlb_pool *pool;
unsigned int slot_order;
+ void *tlb_vaddr;
struct page *tlb;
size_t pool_size;
size_t tlb_size;
@@ -767,7 +771,8 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
pool->unencrypted = !!(attrs & DMA_ATTR_CC_SHARED);
tlb_size = nslabs << IO_TLB_SHIFT;
- while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, attrs, gfp))) {
+ while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, attrs, gfp,
+ &tlb_vaddr))) {
if (nslabs <= minslabs)
goto error_tlb;
nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
@@ -781,12 +786,12 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
if (!pool->slots)
goto error_slots;
- swiotlb_init_io_tlb_pool(pool, page_to_phys(tlb), nslabs, true, nareas);
+ swiotlb_init_io_tlb_pool(pool, page_to_phys(tlb), tlb_vaddr, nslabs,
+ true, nareas);
return pool;
error_slots:
- swiotlb_free_tlb(page_address(tlb), tlb_size,
- !!(attrs & DMA_ATTR_CC_SHARED));
+ swiotlb_free_tlb(tlb_vaddr, tlb_size, !!(attrs & DMA_ATTR_CC_SHARED));
error_tlb:
kfree(pool);
error:
@@ -1995,7 +2000,8 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
mem->unencrypted = false;
}
- swiotlb_init_io_tlb_pool(pool, rmem->base, nslabs,
+ swiotlb_init_io_tlb_pool(pool, rmem->base, phys_to_virt(rmem->base),
+ nslabs,
false, nareas);
mem->force_bounce = true;
mem->for_alloc = true;
--
2.43.0
^ permalink raw reply related
* [PATCH v6 20/20] swiotlb: remove unused SWIOTLB_FORCE flag
From: Aneesh Kumar K.V (Arm) @ 2026-06-04 8:39 UTC (permalink / raw)
To: iommu, linux-arm-kernel, linux-kernel, linux-coco
Cc: Aneesh Kumar K.V (Arm), Robin Murphy, Marek Szyprowski,
Will Deacon, Marc Zyngier, Steven Price, Suzuki K Poulose,
Catalin Marinas, Jiri Pirko, Jason Gunthorpe, Mostafa Saleh,
Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <20260604083959.1265923-1-aneesh.kumar@kernel.org>
SWIOTLB_FORCE has no remaining in-tree users. Forced bouncing is now
controlled through the swiotlb=force command line option via
swiotlb_force_bounce.
Remove the unused flag and simplify the force_bounce initialization.
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
include/linux/swiotlb.h | 1 -
kernel/dma/swiotlb.c | 3 +--
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 526f82e9da45..af88ca7182f4 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -15,7 +15,6 @@ struct page;
struct scatterlist;
#define SWIOTLB_VERBOSE (1 << 0) /* verbose initialization */
-#define SWIOTLB_FORCE (1 << 1) /* force bounce buffering */
#define SWIOTLB_ANY (1 << 2) /* allow any memory for the buffer */
/*
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e4bd8c9eaeda..81cc4928e949 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -400,8 +400,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
if (swiotlb_force_disable)
return;
- io_tlb_default_mem.force_bounce =
- swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
+ io_tlb_default_mem.force_bounce = swiotlb_force_bounce;
#ifdef CONFIG_SWIOTLB_DYNAMIC
if (!remap)
--
2.43.0
^ permalink raw reply related
* Re: [PATCH v6 3/4] firmware: smccc: arm-cca-guest: Bind the TSM provider to an SMCCC device
From: Suzuki K Poulose @ 2026-06-04 9:18 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm), linux-coco, linux-arm-kernel,
linux-kernel
Cc: Catalin Marinas, Greg KH, Jeremy Linton, Jonathan Cameron,
Lorenzo Pieralisi, Mark Rutland, Sudeep Holla, Will Deacon,
Steven Price
In-Reply-To: <20260527100233.428018-4-aneesh.kumar@kernel.org>
On 27/05/2026 11:02, Aneesh Kumar K.V (Arm) wrote:
> The Arm CCA guest TSM provider currently binds through the arm-cca-dev
> platform device. Like arm-smccc-trng, this device is not an independent
> platform resource; it is a software representation of the RSI firmware
> service discovered through SMCCC.
>
> Move RSI discovery into the SMCCC firmware driver. When the SMCCC conduit
> is SMC and the RSI ABI version check succeeds, create an arm-rsi-dev SMCCC
> device. Convert the Arm CCA guest TSM provider to an SMCCC driver so it
> binds to that discovered RSI service and keeps module autoloading through
> the SMCCC device id table.
>
> Keep the old arm-cca-dev platform-device registration for now. Userspace
> has used that device as a Realm-guest indicator, so removing it is left to
> a follow-up patch that adds a replacement sysfs ABI.
>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
> arch/arm64/include/asm/rsi.h | 2 +-
> arch/arm64/kernel/rsi.c | 2 +-
> drivers/firmware/smccc/Makefile | 4 ++
> drivers/firmware/smccc/rmm.c | 25 ++++++++
> drivers/firmware/smccc/rmm.h | 17 ++++++
> drivers/firmware/smccc/smccc.c | 8 +++
> drivers/virt/coco/arm-cca-guest/Kconfig | 1 +
> drivers/virt/coco/arm-cca-guest/Makefile | 2 +
> .../{arm-cca-guest.c => arm-cca.c} | 60 +++++++++----------
> 9 files changed, 89 insertions(+), 32 deletions(-)
> create mode 100644 drivers/firmware/smccc/rmm.c
> create mode 100644 drivers/firmware/smccc/rmm.h
> rename drivers/virt/coco/arm-cca-guest/{arm-cca-guest.c => arm-cca.c} (85%)
>
> diff --git a/arch/arm64/include/asm/rsi.h b/arch/arm64/include/asm/rsi.h
> index 88b50d660e85..2d2d363aaaee 100644
> --- a/arch/arm64/include/asm/rsi.h
> +++ b/arch/arm64/include/asm/rsi.h
> @@ -10,7 +10,7 @@
> #include <linux/jump_label.h>
> #include <asm/rsi_cmds.h>
>
> -#define RSI_PDEV_NAME "arm-cca-dev"
> +#define RSI_DEV_NAME "arm-rsi-dev"
>
> DECLARE_STATIC_KEY_FALSE(rsi_present);
>
> diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
> index 92160f2e57ff..da440f71bb64 100644
> --- a/arch/arm64/kernel/rsi.c
> +++ b/arch/arm64/kernel/rsi.c
> @@ -161,7 +161,7 @@ void __init arm64_rsi_init(void)
> }
>
> static struct platform_device rsi_dev = {
> - .name = RSI_PDEV_NAME,
> + .name = "arm-cca-dev",
> .id = PLATFORM_DEVID_NONE
> };
>
> diff --git a/drivers/firmware/smccc/Makefile b/drivers/firmware/smccc/Makefile
> index 40d19144a860..33c850aaff4d 100644
> --- a/drivers/firmware/smccc/Makefile
> +++ b/drivers/firmware/smccc/Makefile
> @@ -2,3 +2,7 @@
> #
> obj-$(CONFIG_HAVE_ARM_SMCCC_DISCOVERY) += smccc.o kvm_guest.o
> obj-$(CONFIG_ARM_SMCCC_SOC_ID) += soc_id.o
> +
> +ifeq ($(CONFIG_HAVE_ARM_SMCCC_DISCOVERY),y)
> +obj-$(CONFIG_ARM64) += rmm.o
> +endif
> diff --git a/drivers/firmware/smccc/rmm.c b/drivers/firmware/smccc/rmm.c
> new file mode 100644
> index 000000000000..d572f47e955c
> --- /dev/null
> +++ b/drivers/firmware/smccc/rmm.c
> @@ -0,0 +1,25 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2026 Arm Limited
> + */
> +
> +#include <linux/arm-smccc-bus.h>
> +#include <linux/err.h>
> +#include <linux/printk.h>
> +
> +#include "rmm.h"
> +
> +void __init register_rsi_device(void)
minor nit: Could we rename this global symbol to scope it under rmm ?
perhaps, rmm_register_rsi_device()?
> +{
> + unsigned long ret;
> +
> + if (arm_smccc_1_1_get_conduit() != SMCCC_CONDUIT_SMC)
> + return;
> +
> + ret = rsi_request_version(RSI_ABI_VERSION, NULL, NULL);
> + if (ret != RSI_SUCCESS)
> + return;
> +
> + if (IS_ERR(arm_smccc_device_register(RSI_DEV_NAME)))
> + pr_err("%s: could not register device\n", RSI_DEV_NAME);
> +}
> diff --git a/drivers/firmware/smccc/rmm.h b/drivers/firmware/smccc/rmm.h
> new file mode 100644
> index 000000000000..627098e2ae1f
> --- /dev/null
> +++ b/drivers/firmware/smccc/rmm.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _SMCCC_RMM_H
> +#define _SMCCC_RMM_H
> +
> +#include <linux/init.h>
> +
> +#ifdef CONFIG_ARM64
> +#include <linux/arm-smccc-bus.h>
> +#include <asm/rsi_cmds.h>
minor nit: Could the header files be moved to rmm.c ?
> +void __init register_rsi_device(void);
> +#else
> +
> +static inline void __init register_rsi_device(void)
> +{
> +}
> +#endif
> +#endif
> diff --git a/drivers/firmware/smccc/smccc.c b/drivers/firmware/smccc/smccc.c
> index 6d260354d0f9..888e7f1d6f86 100644
> --- a/drivers/firmware/smccc/smccc.c
> +++ b/drivers/firmware/smccc/smccc.c
> @@ -15,6 +15,8 @@
>
> #include <asm/archrandom.h>
>
> +#include "rmm.h"
> +
> static u32 smccc_version = ARM_SMCCC_VERSION_1_0;
> static enum arm_smccc_conduit smccc_conduit = SMCCC_CONDUIT_NONE;
> static DEFINE_IDA(arm_smccc_bus_id);
> @@ -240,6 +242,12 @@ subsys_initcall(arm_smccc_bus_init);
>
> static int __init smccc_devices_init(void)
> {
> + /*
> + * Register the RMI and RSI devices only when firmware exposes
> + * the required SMCCC function IDs at a supported revision.
> + */
> + register_rsi_device();
nit: We don't have RMI devices yet ? Do we want to make it
rmm_register_devices();
instead ?
> +
> if (smccc_trng_available) {
> struct arm_smccc_device *sdev;
>
> diff --git a/drivers/virt/coco/arm-cca-guest/Kconfig b/drivers/virt/coco/arm-cca-guest/Kconfig
> index 3f0f013f03f1..ad7538750c5a 100644
> --- a/drivers/virt/coco/arm-cca-guest/Kconfig
> +++ b/drivers/virt/coco/arm-cca-guest/Kconfig
> @@ -1,6 +1,7 @@
> config ARM_CCA_GUEST
> tristate "Arm CCA Guest driver"
> depends on ARM64
> + depends on HAVE_ARM_SMCCC_DISCOVERY
> select TSM_REPORTS
> help
> The driver provides userspace interface to request and
> diff --git a/drivers/virt/coco/arm-cca-guest/Makefile b/drivers/virt/coco/arm-cca-guest/Makefile
> index 69eeba08e98a..75a120e24fda 100644
> --- a/drivers/virt/coco/arm-cca-guest/Makefile
> +++ b/drivers/virt/coco/arm-cca-guest/Makefile
> @@ -1,2 +1,4 @@
> # SPDX-License-Identifier: GPL-2.0-only
> obj-$(CONFIG_ARM_CCA_GUEST) += arm-cca-guest.o
> +
> +arm-cca-guest-y += arm-cca.o
> diff --git a/drivers/virt/coco/arm-cca-guest/arm-cca-guest.c b/drivers/virt/coco/arm-cca-guest/arm-cca.c
> similarity index 85%
> rename from drivers/virt/coco/arm-cca-guest/arm-cca-guest.c
> rename to drivers/virt/coco/arm-cca-guest/arm-cca.c
> index 66d00b6ceb78..8d5a09bd772a 100644
> --- a/drivers/virt/coco/arm-cca-guest/arm-cca-guest.c
> +++ b/drivers/virt/coco/arm-cca-guest/arm-cca.c
> @@ -4,6 +4,7 @@
> */
>
> #include <linux/arm-smccc.h>
> +#include <linux/arm-smccc-bus.h>
> #include <linux/cc_platform.h>
> #include <linux/kernel.h>
> #include <linux/mod_devicetable.h>
> @@ -182,52 +183,51 @@ static int arm_cca_report_new(struct tsm_report *report, void *data)
> return ret;
> }
>
> -static const struct tsm_report_ops arm_cca_tsm_ops = {
> +static const struct tsm_report_ops arm_cca_tsm_report_ops = {
> .name = KBUILD_MODNAME,
> .report_new = arm_cca_report_new,
> };
>
Would you like to either :
1) Call out renaming the existing cca_tsm to reflect cca_tsm_report
in the commit description ?
OR
2) Split the renaming of the "report" stuff in a follow up patch ?
Rest looks fine by me.
Suzuki
> -/**
> - * arm_cca_guest_init - Register with the Trusted Security Module (TSM)
> - * interface.
> - *
> - * Return:
> - * * %0 - Registered successfully with the TSM interface.
> - * * %-ENODEV - The execution context is not an Arm Realm.
> - * * %-EBUSY - Already registered.
> - */
> -static int __init arm_cca_guest_init(void)
> +static void unregister_cca_tsm_report(void *data)
> +{
> + tsm_report_unregister(&arm_cca_tsm_report_ops);
> +}
> +
> +static int cca_tsm_probe(struct arm_smccc_device *sdev)
> {
> int ret;
>
> if (!is_realm_world())
> return -ENODEV;
>
> - ret = tsm_report_register(&arm_cca_tsm_ops, NULL);
> - if (ret < 0)
> - pr_err("Error %d registering with TSM\n", ret);
> + ret = tsm_report_register(&arm_cca_tsm_report_ops, NULL);
> + if (ret < 0) {
> + dev_err_probe(&sdev->dev, ret, "Error registering with TSM\n");
> + return ret;
> + }
>
> - return ret;
> -}
> -module_init(arm_cca_guest_init);
> + ret = devm_add_action_or_reset(&sdev->dev, unregister_cca_tsm_report,
> + NULL);
> + if (ret < 0) {
> + dev_err_probe(&sdev->dev, ret, "Error registering devm action\n");
> + return ret;
> + }
>
> -/**
> - * arm_cca_guest_exit - unregister with the Trusted Security Module (TSM)
> - * interface.
> - */
> -static void __exit arm_cca_guest_exit(void)
> -{
> - tsm_report_unregister(&arm_cca_tsm_ops);
> + return 0;
> }
> -module_exit(arm_cca_guest_exit);
>
> -/* modalias, so userspace can autoload this module when RSI is available */
> -static const struct platform_device_id arm_cca_match[] __maybe_unused = {
> - { RSI_PDEV_NAME, 0},
> - { }
> +static const struct arm_smccc_device_id cca_tsm_id_table[] = {
> + { .name = RSI_DEV_NAME },
> + {}
> };
> +MODULE_DEVICE_TABLE(arm_smccc, cca_tsm_id_table);
>
> -MODULE_DEVICE_TABLE(platform, arm_cca_match);
> +static struct arm_smccc_driver cca_tsm_driver = {
> + .name = KBUILD_MODNAME,
> + .probe = cca_tsm_probe,
> + .id_table = cca_tsm_id_table,
> +};
> +module_arm_smccc_driver(cca_tsm_driver);
> MODULE_AUTHOR("Sami Mujawar <sami.mujawar@arm.com>");
> MODULE_DESCRIPTION("Arm CCA Guest TSM Driver");
> MODULE_LICENSE("GPL");
* Re: [PATCH v6 3/4] firmware: smccc: arm-cca-guest: Bind the TSM provider to an SMCCC device
From: Sudeep Holla @ 2026-06-04 9:21 UTC
To: Aneesh Kumar K.V (Arm)
Cc: linux-coco, linux-arm-kernel, linux-kernel, Catalin Marinas,
Sudeep Holla, Greg KH, Jeremy Linton, Jonathan Cameron,
Lorenzo Pieralisi, Mark Rutland, Will Deacon, Steven Price,
Suzuki K Poulose
In-Reply-To: <20260527100233.428018-4-aneesh.kumar@kernel.org>
On Wed, May 27, 2026 at 03:32:32PM +0530, Aneesh Kumar K.V (Arm) wrote:
> The Arm CCA guest TSM provider currently binds through the arm-cca-dev
> platform device. Like arm-smccc-trng, this device is not an independent
> platform resource; it is a software representation of the RSI firmware
> service discovered through SMCCC.
>
> Move RSI discovery into the SMCCC firmware driver. When the SMCCC conduit
> is SMC and the RSI ABI version check succeeds, create an arm-rsi-dev SMCCC
> device. Convert the Arm CCA guest TSM provider to an SMCCC driver so it
> binds to that discovered RSI service and keeps module autoloading through
> the SMCCC device id table.
>
> Keep the old arm-cca-dev platform-device registration for now. Userspace
> has used that device as a Realm-guest indicator, so removing it is left to
> a follow-up patch that adds a replacement sysfs ABI.
>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
> arch/arm64/include/asm/rsi.h | 2 +-
> arch/arm64/kernel/rsi.c | 2 +-
> drivers/firmware/smccc/Makefile | 4 ++
> drivers/firmware/smccc/rmm.c | 25 ++++++++
> drivers/firmware/smccc/rmm.h | 17 ++++++
> drivers/firmware/smccc/smccc.c | 8 +++
> drivers/virt/coco/arm-cca-guest/Kconfig | 1 +
> drivers/virt/coco/arm-cca-guest/Makefile | 2 +
> .../{arm-cca-guest.c => arm-cca.c} | 60 +++++++++----------
> 9 files changed, 89 insertions(+), 32 deletions(-)
> create mode 100644 drivers/firmware/smccc/rmm.c
> create mode 100644 drivers/firmware/smccc/rmm.h
> rename drivers/virt/coco/arm-cca-guest/{arm-cca-guest.c => arm-cca.c} (85%)
>
> diff --git a/arch/arm64/include/asm/rsi.h b/arch/arm64/include/asm/rsi.h
> index 88b50d660e85..2d2d363aaaee 100644
> --- a/arch/arm64/include/asm/rsi.h
> +++ b/arch/arm64/include/asm/rsi.h
> @@ -10,7 +10,7 @@
> #include <linux/jump_label.h>
> #include <asm/rsi_cmds.h>
>
> -#define RSI_PDEV_NAME "arm-cca-dev"
> +#define RSI_DEV_NAME "arm-rsi-dev"
>
> DECLARE_STATIC_KEY_FALSE(rsi_present);
>
> diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
> index 92160f2e57ff..da440f71bb64 100644
> --- a/arch/arm64/kernel/rsi.c
> +++ b/arch/arm64/kernel/rsi.c
> @@ -161,7 +161,7 @@ void __init arm64_rsi_init(void)
> }
>
> static struct platform_device rsi_dev = {
> - .name = RSI_PDEV_NAME,
> + .name = "arm-cca-dev",
> .id = PLATFORM_DEVID_NONE
> };
>
> diff --git a/drivers/firmware/smccc/Makefile b/drivers/firmware/smccc/Makefile
> index 40d19144a860..33c850aaff4d 100644
> --- a/drivers/firmware/smccc/Makefile
> +++ b/drivers/firmware/smccc/Makefile
> @@ -2,3 +2,7 @@
> #
> obj-$(CONFIG_HAVE_ARM_SMCCC_DISCOVERY) += smccc.o kvm_guest.o
> obj-$(CONFIG_ARM_SMCCC_SOC_ID) += soc_id.o
> +
> +ifeq ($(CONFIG_HAVE_ARM_SMCCC_DISCOVERY),y)
> +obj-$(CONFIG_ARM64) += rmm.o
> +endif
> diff --git a/drivers/firmware/smccc/rmm.c b/drivers/firmware/smccc/rmm.c
> new file mode 100644
> index 000000000000..d572f47e955c
> --- /dev/null
> +++ b/drivers/firmware/smccc/rmm.c
> @@ -0,0 +1,25 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2026 Arm Limited
> + */
> +
> +#include <linux/arm-smccc-bus.h>
> +#include <linux/err.h>
> +#include <linux/printk.h>
> +
> +#include "rmm.h"
> +
> +void __init register_rsi_device(void)
> +{
> + unsigned long ret;
> +
> + if (arm_smccc_1_1_get_conduit() != SMCCC_CONDUIT_SMC)
> + return;
> +
> + ret = rsi_request_version(RSI_ABI_VERSION, NULL, NULL);
> + if (ret != RSI_SUCCESS)
> + return;
> +
> + if (IS_ERR(arm_smccc_device_register(RSI_DEV_NAME)))
> + pr_err("%s: could not register device\n", RSI_DEV_NAME);
> +}
OK, I had something else in mind when I started looking at 1/4. I didn't
expect each device added on this bus to come up with its own way to
enumerate itself. IMO, that defeats the purpose of building the SMCCC bus.
The specs for each feature may deviate a bit, but we can still have a
generic probe. Let's try that before exploring per-feature probe functions.
I have a brief sketch of what I think we should aim for (uncompiled/untested)
below. Let me know if that makes sense. I just based it on your bus code.
Regards,
Sudeep
-->8
diff --git c/drivers/firmware/smccc/smccc.c w/drivers/firmware/smccc/smccc.c
index 695c920a8087..450605ddfab6 100644
--- c/drivers/firmware/smccc/smccc.c
+++ w/drivers/firmware/smccc/smccc.c
@@ -9,21 +9,58 @@
#include <linux/init.h>
#include <linux/arm-smccc.h>
#include <linux/kernel.h>
-#include <linux/platform_device.h>
#include <linux/arm-smccc-bus.h>
#include <linux/idr.h>
#include <linux/slab.h>
-#include <asm/archrandom.h>
-
static u32 smccc_version = ARM_SMCCC_VERSION_1_0;
static enum arm_smccc_conduit smccc_conduit = SMCCC_CONDUIT_NONE;
static DEFINE_IDA(arm_smccc_bus_id);
-bool __ro_after_init smccc_trng_available = false;
+struct smccc_device_info {
+ u32 func_id;
+ bool requires_smc;
+ unsigned long min_return;
+ const char *device_name;
+};
+
+bool __ro_after_init smccc_trng_available;
s32 __ro_after_init smccc_soc_id_version = SMCCC_RET_NOT_SUPPORTED;
s32 __ro_after_init smccc_soc_id_revision = SMCCC_RET_NOT_SUPPORTED;
+static const struct smccc_device_info smccc_devices[] __initconst = {
+ {
+ .func_id = ARM_SMCCC_TRNG_VERSION,
+ .requires_smc = false,
+ .min_return = ARM_SMCCC_TRNG_MIN_VERSION,
+ .device_name = "arm-smccc-trng",
+ },
+};
+
+static bool __init
+smccc_probe_smccc_device(const struct smccc_device_info *smccc_dev)
+{
+ struct arm_smccc_res res;
+ unsigned long ret;
+
+ if (!IS_ENABLED(CONFIG_ARM64))
+ return false;
+
+ if (smccc_conduit == SMCCC_CONDUIT_NONE)
+ return false;
+
+ if (smccc_dev->requires_smc && smccc_conduit != SMCCC_CONDUIT_SMC)
+ return false;
+
+ arm_smccc_1_1_invoke(smccc_dev->func_id, &res);
+ ret = res.a0;
+
+ if ((s32)ret < 0)
+ return false;
+
+ return ret >= smccc_dev->min_return;
+}
+
void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
{
struct arm_smccc_res res;
@@ -31,7 +68,7 @@ void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
smccc_version = version;
smccc_conduit = conduit;
- smccc_trng_available = smccc_probe_trng();
+ smccc_trng_available = smccc_probe_smccc_device(&smccc_devices[0]);
if ((smccc_version >= ARM_SMCCC_VERSION_1_2) &&
(smccc_conduit != SMCCC_CONDUIT_NONE)) {
@@ -241,14 +278,20 @@ subsys_initcall(arm_smccc_bus_init);
static int __init smccc_devices_init(void)
{
- struct platform_device *pdev;
-
- if (smccc_trng_available) {
- pdev = platform_device_register_simple("smccc_trng", -1,
- NULL, 0);
- if (IS_ERR(pdev))
- pr_err("smccc_trng: could not register device: %ld\n",
- PTR_ERR(pdev));
+ const struct smccc_device_info *smccc_dev;
+ struct arm_smccc_device *sdev;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(smccc_devices); i++) {
+ smccc_dev = &smccc_devices[i];
+
+ if (!smccc_probe_smccc_device(smccc_dev))
+ continue;
+
+ sdev = arm_smccc_device_register(smccc_dev->device_name);
+ if (IS_ERR(sdev))
+ pr_err("%s: could not register device: %ld\n",
+ smccc_dev->device_name, PTR_ERR(sdev));
}
return 0;
* [RFC PATCH 0/6] Support virtio-mem memory hotplug in TDX guests
From: Zhenzhong Duan @ 2026-06-04 9:35 UTC
To: marcandre.lureau, david, kas, rick.p.edgecombe, prsampat,
pbonzini, mst, peterx, chenyi.qiang, elena.reshetova, michaeluth,
ackerleytng
Cc: linux-kernel, linux-coco, virtualization, x86, yilun.xu,
xiaoyao.li, chao.p.peng
This RFC series explores the start-private memory approach for virtio-mem
CoCo support using TDG.MEM.PAGE.RELEASE. We are seeking feedback from
Kiryl on the CoCo guest implementation, MM experts on the callback
infrastructure and virtio-mem integration, and broader virtio/CoCo
community input on the overall approach. We are not seeking x86 maintainer
review at this stage.
== Background ==
In Confidential Computing (CoCo) guests like TDX, memory hotplug
operations face unique challenges:
1. Newly added memory must be explicitly "accepted" by the guest using
TDG.MEM.PAGE.ACCEPT TDCALL before it can be safely accessed. Accessing
unaccepted memory triggers VM exits and guest crashes.
2. The hypervisor may perform no-op unplug operations, leaving old memory in
place. Re-accepting this already-accepted memory during re-plug operations
returns errors.
3. State management becomes much more complex: "accepted"/"unaccepted" plus
"plugged"/"unplugged".
4. Initial virtio-mem memory may be start-private or start-shared.
A previous series [1][2] supported start-private memory and used memory
hotplug notifiers to call tdx_accept_memory() before pages are freed to
the buddy allocator. However, this approach has limitations:
1. virtio-mem operates memory at subblock granularity (e.g., 2MB chunks
within 128MB memory blocks), while generic memory notifiers operate on entire
memory blocks, causing acceptance of unplugged subblocks with no backing
memory.
2. Re-accepting already-accepted memory returns errors. Ignoring these errors
can mislead the guest into believing re-accepted memory is zeroed when it
contains stale data.
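To make limitation 1 concrete, here is a minimal userspace sketch, assuming
a 128MB memory block split into 2MB subblocks with a per-subblock plug
state. The names accept_whole_block(), accept_plugged_subblocks() and
count_bytes() are hypothetical stand-ins, not code from either series; the
accept_fn callback only models what tdx_accept_memory() would be asked to
cover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MB            (1024ULL * 1024ULL)
#define BLOCK_SIZE    (128 * MB)                    /* assumed memory block size */
#define SUBBLOCK_SIZE (2 * MB)                      /* assumed subblock size */
#define NR_SUBBLOCKS  (BLOCK_SIZE / SUBBLOCK_SIZE)  /* 64 subblocks per block */

/* Hypothetical stand-in for an accept primitive such as tdx_accept_memory(). */
typedef int (*accept_fn)(uint64_t addr, uint64_t size, void *ctx);

/* Block-granular notifier model: the whole block is accepted, plugged or not. */
static int accept_whole_block(uint64_t base, accept_fn fn, void *ctx)
{
	return fn(base, BLOCK_SIZE, ctx);
}

/* Subblock-aware model: only plugged subblocks are accepted, in merged runs. */
static int accept_plugged_subblocks(uint64_t base,
				    const bool plugged[NR_SUBBLOCKS],
				    accept_fn fn, void *ctx)
{
	size_t i = 0;

	while (i < NR_SUBBLOCKS) {
		size_t start;
		int rc;

		if (!plugged[i]) {
			i++;
			continue;
		}
		start = i;
		while (i < NR_SUBBLOCKS && plugged[i])
			i++;
		rc = fn(base + start * SUBBLOCK_SIZE,
			(i - start) * SUBBLOCK_SIZE, ctx);
		if (rc)
			return rc;
	}
	return 0;
}

/* Test double: adds up how many bytes an accept call was asked to cover. */
static int count_bytes(uint64_t addr, uint64_t size, void *ctx)
{
	assert(addr % SUBBLOCK_SIZE == 0 && size % SUBBLOCK_SIZE == 0);
	*(uint64_t *)ctx += size;
	return 0;
}
```

With only a few subblocks plugged, the block-granular variant asks for
acceptance of the full 128MB, including ranges that have no backing memory,
while the subblock-aware variant covers just the plugged ranges.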
Currently, the virtio-mem spec doesn't define which kind of hotplugged memory
should be supported for CoCo guests: shared, private, or both. A newer
series [3][4] supporting start-shared memory is under discussion. It converts
shared->private before online (via set_memory_encrypted -> MapGPA + ACCEPT),
and back to shared on unplug (via set_memory_decrypted).
== About this series ==
This series takes a different direction, supporting start-private memory
and addressing the limitations of previous series [1] by implementing a
callback-based infrastructure that integrates TDX memory acceptance and
release operations with proper subblock granularity. See Rick and Paolo's
discussion about using TDG.MEM.PAGE.RELEASE in [1].
The goal is not to compete with existing efforts, but rather to kick off
discussion and seek suggestions from mm experts on whether a callback-based
infrastructure combined with the PAGE.RELEASE API is a viable scheme.
We chose the generic post-plug and pre-unplug callback approach because
it provides a simple proof-of-concept that can support kexec/kdump
scenarios, though it does not support lazy acceptance. We rely on
community discussion to identify better, more upstreamable solutions if
the start-private direction is ultimately adopted.
== More details ==
**Post-plug callbacks** are registered by TDX guests during early boot and
triggered by virtio-mem after successfully requesting memory from the
hypervisor. The callback invokes tdx_accept_memory(), which performs
TDG.MEM.PAGE.ACCEPT TDCALL on the exact memory range that was plugged,
providing subblock-aware granularity. Note that tdx_accept_memory() may
not be fully self-consistent in all environments, as some pages may
remain in an "accepted" state while others do not, since page release is
not supported across all TDX module versions.
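The register-once, call-if-present pattern described above can be modelled
in userspace roughly as follows. This is only a sketch: the model_* names
and fake_accept() are hypothetical, and returning -EBUSY on a second
registration is an assumption made for testability (the series panics
instead).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*post_plug_cb_t)(uint64_t addr, uint64_t size);

/* Single callback slot; NULL means no CoCo handler was registered. */
static post_plug_cb_t post_plug_cb;

/* Register once. A second registration is rejected in this model. */
static int model_set_post_plug_callback(post_plug_cb_t cb)
{
	assert(cb != NULL);
	if (post_plug_cb)
		return -EBUSY;
	post_plug_cb = cb;
	return 0;
}

/* Called by the hotplug driver after the hypervisor granted the range. */
static int model_post_plug_call(uint64_t addr, uint64_t size)
{
	if (!post_plug_cb)
		return 0;	/* non-CoCo guest: nothing to accept */
	return post_plug_cb(addr, size);
}

/* Hypothetical handler standing in for a tdx_accept_memory() wrapper. */
static uint64_t accepted_bytes;

static int fake_accept(uint64_t addr, uint64_t size)
{
	(void)addr;
	if (size == 0)
		return -EINVAL;
	accepted_bytes += size;
	return 0;
}
```

Without a registered handler the call is a no-op that reports success,
which is what keeps non-TDX systems unaffected.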
**Pre-unplug callbacks** are registered during early boot and invoked by
virtio-mem before requesting memory removal from the hypervisor. The
callback executes tdx_release_memory(), which performs
TDG.MEM.PAGE.RELEASE TDCALL with an optimization strategy that attempts
1GB/2MB page releases first before falling back to 4KB pages for maximum
efficiency. Unlike acceptance operations, tdx_release_memory() maintains
full self-consistency since page acceptance is universally supported
across TDX implementations.
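The largest-size-first strategy can be sketched as below. This models only
the alignment and length checks that decide which page size is worth
attempting; pick_page_size(), plan_release() and struct release_stats are
hypothetical names, and the real tdx_release_memory() may additionally fall
back to a smaller size when a TDCALL fails, which is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K 0x1000ULL
#define SZ_2M 0x200000ULL
#define SZ_1G 0x40000000ULL

/* Counts how many operations of each page size a range would need. */
struct release_stats {
	unsigned long n_1g, n_2m, n_4k;
};

/* Largest page size that is aligned at addr and still fits within len. */
static uint64_t pick_page_size(uint64_t addr, uint64_t len)
{
	if (!(addr & (SZ_1G - 1)) && len >= SZ_1G)
		return SZ_1G;
	if (!(addr & (SZ_2M - 1)) && len >= SZ_2M)
		return SZ_2M;
	return SZ_4K;
}

/* Walk [addr, addr + size) and record the page size chosen at each step. */
static void plan_release(uint64_t addr, uint64_t size,
			 struct release_stats *st)
{
	uint64_t end = addr + size;

	assert(!(addr & (SZ_4K - 1)) && !(size & (SZ_4K - 1)));

	while (addr < end) {
		uint64_t sz = pick_page_size(addr, end - addr);

		if (sz == SZ_1G)
			st->n_1g++;
		else if (sz == SZ_2M)
			st->n_2m++;
		else
			st->n_4k++;
		addr += sz;
	}
}
```

For a range that starts 4KB below a 2MB boundary and ends 4KB past a 1GB
page, this plan uses one 1GB, one 2MB and two 4KB operations instead of
several hundred thousand 4KB ones.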
**Error handling strategy** prioritizes system stability by marking the
virtio-mem device as broken whenever TDX operations fail:
1. Post-plug failures: If memory acceptance fails after successful
hypervisor allocation, the device is marked as broken to prevent memory
corruption. The hypervisor-side memory is leaked for the device lifetime.
2. Pre-unplug failures: If TDX memory release fails, the device is marked as
broken and no hypervisor unplug is attempted.
3. Hypervisor unplug failures: If the hypervisor unplug fails after
successful TDX release, the system attempts to re-accept the memory for
consistency. If re-acceptance fails, the device is marked as broken.
This approach avoids complex recovery mechanisms that could fail and
cause state corruption, choosing instead to fail safely by disabling the
device when TDX operations cannot maintain consistent state between guest
and hypervisor.
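The three failure cases above can be summarised as a small userspace model
with fault injection. Everything here is hypothetical (struct fake_dev,
model_plug(), model_unplug()); -EBUSY is an arbitrary "retry later" code
chosen for the sketch, and -EINVAL stands for the error that makes the
caller mark the device as broken.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct fake_dev {
	bool broken;
	/* fault injection knobs */
	bool hv_plug_fails, accept_fails;
	bool release_fails, hv_unplug_fails, reaccept_fails;
};

static int model_plug(struct fake_dev *d)
{
	assert(d);
	if (d->hv_plug_fails)
		return -EBUSY;		/* nothing plugged, retry later */
	if (d->accept_fails) {
		d->broken = true;	/* case 1: no rollback, range is leaked */
		return -EINVAL;
	}
	return 0;
}

static int model_unplug(struct fake_dev *d)
{
	assert(d);
	if (d->release_fails) {
		d->broken = true;	/* case 2: hypervisor unplug not attempted */
		return -EINVAL;
	}
	if (!d->hv_unplug_fails)
		return 0;
	if (d->reaccept_fails) {
		d->broken = true;	/* case 3: could not restore accepted state */
		return -EINVAL;
	}
	return -EBUSY;			/* memory re-accepted, retry later */
}
```

The only path that leaves the device usable after a failure is a
hypervisor-side error followed by a successful re-accept; every TDX-side
failure ends in the broken state.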
**PAGE.RELEASE configuration** requires explicit enablement by the
hypervisor during TD creation. The hypervisor must set the
CONFIG_FLAGS.PAGE_RELEASE flag in the TD's configuration to enable
TDG.MEM.PAGE.RELEASE functionality within the guest. Without this
configuration, guests cannot perform memory release operations and must
rely on the hypervisor to handle private memory release. This series
focuses on guest-side changes and does not include hypervisor
modifications, which can be added in future versions if needed.
== Testing ==
Tested with qemu [2] which supports start-private memory:
Basic memory hotplug/unplug test.
Basic kexec/kdump functions test with zero/half/full memory plugged.
Interestingly, the tests also pass with qemu [4], which supports start-shared
memory, because acceptance triggers memory conversion implicitly. It is slow,
however, as implicit conversion works at 4K page granularity.
== Future work ==
Support lazy acceptance.
Thanks
Zhenzhong
[1] kernel: https://lore.kernel.org/kvm/20260324-tdx-hotplug-fixes-v1-0-8f29f2c17278@redhat.com/
[2] qemu: https://lore.kernel.org/qemu-devel/20260226140001.3622334-1-marcandre.lureau@redhat.com/
[3] kernel: https://lore.kernel.org/lkml/20260401-coco-v1-1-b9c3072e2d9c@redhat.com/
[4] qemu: https://lore.kernel.org/qemu-devel/20260504-rdm5-v4-0-bdf61e57c1e1@redhat.com/
Zhenzhong Duan (6):
mm/memory_hotplug: Add memory post-plug callback infrastructure
mm/memory_hotplug: Add memory pre-unplug callback infrastructure
virtio-mem: Integrate memory acceptance and release callbacks
x86/tdx: Register memory post-plug callback for TDX guests
x86/tdx: Register memory pre-unplug callback for TDX guests
x86/tdx: Release private memory before private->shared conversion
arch/x86/include/asm/shared/tdx.h | 2 +
include/linux/memory_hotplug.h | 21 ++++
arch/x86/coco/tdx/tdx.c | 174 ++++++++++++++++++++++++++++++
drivers/virtio/virtio_mem.c | 80 ++++++++++++--
mm/memory_hotplug.c | 40 +++++++
5 files changed, 307 insertions(+), 10 deletions(-)
--
2.52.0
* [RFC PATCH 1/6] mm/memory_hotplug: Add memory post-plug callback infrastructure
From: Zhenzhong Duan @ 2026-06-04 9:35 UTC
To: marcandre.lureau, david, kas, rick.p.edgecombe, prsampat,
pbonzini, mst, peterx, chenyi.qiang, elena.reshetova, michaeluth,
ackerleytng
Cc: linux-kernel, linux-coco, virtualization, x86, yilun.xu,
xiaoyao.li, chao.p.peng
In-Reply-To: <20260604093551.1511079-1-zhenzhong.duan@intel.com>
In confidential computing environments like TDX, newly added memory must be
explicitly "accepted" by the guest before it can be safely accessed. When
virtio-mem or other memory hotplug drivers add memory to a TDX guest, the
memory pages are initially in an "unaccepted" state. Accessing unaccepted
memory triggers VM exits and can cause guest crashes. The guest must issue
the TDG.MEM.PAGE.ACCEPT TDCALL to accept each page before use.
This callback infrastructure allows the TDX guest code to register a
handler that will be invoked after memory is plugged, ensuring all newly
added memory is properly accepted before being made available to the
kernel's memory management subsystem.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/linux/memory_hotplug.h | 11 +++++++++++
mm/memory_hotplug.c | 20 ++++++++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 815e908c4135..39f0a35a5112 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -28,6 +28,8 @@ enum mmop {
MMOP_ONLINE_MOVABLE,
};
+typedef int (*memory_post_plug_callback_t)(u64 addr, u64 size);
+
#ifdef CONFIG_MEMORY_HOTPLUG
struct page *pfn_to_online_page(unsigned long pfn);
@@ -176,6 +178,9 @@ static inline void pgdat_kswapd_lock_init(pg_data_t *pgdat)
mutex_init(&pgdat->kswapd_lock);
}
+void set_memory_post_plug_callback(memory_post_plug_callback_t callback);
+int memory_post_plug_call(u64 addr, u64 size);
+
#else /* ! CONFIG_MEMORY_HOTPLUG */
#define pfn_to_online_page(pfn) \
({ \
@@ -221,6 +226,12 @@ static inline bool mhp_supports_memmap_on_memory(void)
static inline void pgdat_kswapd_lock(pg_data_t *pgdat) {}
static inline void pgdat_kswapd_unlock(pg_data_t *pgdat) {}
static inline void pgdat_kswapd_lock_init(pg_data_t *pgdat) {}
+
+static inline void set_memory_post_plug_callback(memory_post_plug_callback_t callback) {}
+static inline int memory_post_plug_call(u64 addr, u64 size)
+{
+ return 0;
+}
#endif /* ! CONFIG_MEMORY_HOTPLUG */
/*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 40c7915dabe0..73054ed016fd 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1729,6 +1729,26 @@ bool mhp_range_allowed(u64 start, u64 size, bool need_mapping)
return false;
}
+static memory_post_plug_callback_t memory_post_plug_callback __ro_after_init;
+
+void set_memory_post_plug_callback(memory_post_plug_callback_t callback)
+{
+ /* Fatal error to set callback twice in boot stage */
+ if (memory_post_plug_callback)
+ panic("memory_post_plug_callback is already registered\n");
+
+ memory_post_plug_callback = callback;
+}
+
+int memory_post_plug_call(u64 addr, u64 size)
+{
+ if (!memory_post_plug_callback)
+ return 0;
+
+ return (*memory_post_plug_callback)(addr, size);
+}
+EXPORT_SYMBOL_GPL(memory_post_plug_call);
+
#ifdef CONFIG_MEMORY_HOTREMOVE
/*
* Scan pfn range [start,end) to find movable/migratable pages (LRU and
--
2.52.0
* [RFC PATCH 2/6] mm/memory_hotplug: Add memory pre-unplug callback infrastructure
From: Zhenzhong Duan @ 2026-06-04 9:35 UTC
To: marcandre.lureau, david, kas, rick.p.edgecombe, prsampat,
pbonzini, mst, peterx, chenyi.qiang, elena.reshetova, michaeluth,
ackerleytng
Cc: linux-kernel, linux-coco, virtualization, x86, yilun.xu,
xiaoyao.li, chao.p.peng
In-Reply-To: <20260604093551.1511079-1-zhenzhong.duan@intel.com>
In confidential computing environments like TDX, memory that was
previously accepted by the guest should be explicitly "released" back to
the hypervisor before it is unplugged. The hypervisor may treat the unplug
operation as a no-op without the guest being aware of it, in which case a
later re-plug fails with a re-accept error.
This callback infrastructure allows the TDX guest code to register a
handler that is invoked after the kernel removes memory from its memory
management subsystem but before it is unplugged, ensuring all memory
pages are properly released via the TDG.MEM.PAGE.RELEASE TDCALL. A later
re-plug then issues TDG.MEM.PAGE.ACCEPT on pages in the "unaccepted" state
and succeeds.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/linux/memory_hotplug.h | 10 ++++++++++
mm/memory_hotplug.c | 20 ++++++++++++++++++++
2 files changed, 30 insertions(+)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 39f0a35a5112..5bb77670b6cf 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -29,6 +29,7 @@ enum mmop {
};
typedef int (*memory_post_plug_callback_t)(u64 addr, u64 size);
+typedef int (*memory_pre_unplug_callback_t)(u64 addr, u64 size);
#ifdef CONFIG_MEMORY_HOTPLUG
struct page *pfn_to_online_page(unsigned long pfn);
@@ -278,6 +279,9 @@ extern int remove_memory(u64 start, u64 size);
extern void __remove_memory(u64 start, u64 size);
extern int offline_and_remove_memory(u64 start, u64 size);
+void set_memory_pre_unplug_callback(memory_pre_unplug_callback_t callback);
+int memory_pre_unplug_call(u64 addr, u64 size);
+
#else
static inline void try_offline_node(int nid) {}
@@ -293,6 +297,12 @@ static inline int remove_memory(u64 start, u64 size)
}
static inline void __remove_memory(u64 start, u64 size) {}
+
+static inline void set_memory_pre_unplug_callback(memory_pre_unplug_callback_t callback) {}
+static inline int memory_pre_unplug_call(u64 addr, u64 size)
+{
+ return 0;
+}
#endif /* CONFIG_MEMORY_HOTREMOVE */
#ifdef CONFIG_MEMORY_HOTPLUG
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 73054ed016fd..fcb6f85c40d0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -2451,4 +2451,24 @@ int offline_and_remove_memory(u64 start, u64 size)
return rc;
}
EXPORT_SYMBOL_GPL(offline_and_remove_memory);
+
+static memory_pre_unplug_callback_t memory_pre_unplug_callback __ro_after_init;
+
+void set_memory_pre_unplug_callback(memory_pre_unplug_callback_t callback)
+{
+ /* Fatal error to set callback twice in boot stage */
+ if (memory_pre_unplug_callback)
+ panic("memory_pre_unplug_callback is already registered\n");
+
+ memory_pre_unplug_callback = callback;
+}
+
+int memory_pre_unplug_call(u64 addr, u64 size)
+{
+ if (!memory_pre_unplug_callback)
+ return 0;
+
+ return (*memory_pre_unplug_callback)(addr, size);
+}
+EXPORT_SYMBOL_GPL(memory_pre_unplug_call);
#endif /* CONFIG_MEMORY_HOTREMOVE */
--
2.52.0
* [RFC PATCH 3/6] virtio-mem: Integrate memory acceptance and release callbacks
From: Zhenzhong Duan @ 2026-06-04 9:35 UTC
To: marcandre.lureau, david, kas, rick.p.edgecombe, prsampat,
pbonzini, mst, peterx, chenyi.qiang, elena.reshetova, michaeluth,
ackerleytng
Cc: linux-kernel, linux-coco, virtualization, x86, yilun.xu,
xiaoyao.li, chao.p.peng
In-Reply-To: <20260604093551.1511079-1-zhenzhong.duan@intel.com>
Integrate the memory post-plug and pre-unplug callbacks into virtio-mem's
plug and unplug operations to support TDX memory acceptance and release.
For memory plugging, call the post-plug callback after successfully
requesting memory from the hypervisor to ensure newly added memory is
accepted by TDX guests. If acceptance fails, return -EINVAL to mark the
device as broken rather than attempting rollback, since unplug operations
may also fail and partial acceptance creates difficult-to-recover state.
For memory unplugging, call the pre-unplug callback before requesting
memory removal from the hypervisor to allow TDX guests to release memory
pages. If release fails, return -EINVAL to mark the device as broken.
If the hypervisor unplug request fails after successful memory release,
attempt to re-accept the memory to restore consistent state for retry. If
re-acceptance fails, mark the device as broken to prevent corruption.
The config_changed check is moved to the wrapper functions to ensure
callbacks are not invoked unnecessarily when operations will be retried.
This integration ensures proper memory lifecycle management in
confidential computing environments while maintaining backward
compatibility with non-TDX systems where the callbacks are no-ops.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
drivers/virtio/virtio_mem.c | 80 ++++++++++++++++++++++++++++++++-----
1 file changed, 70 insertions(+), 10 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 48051e9e98ab..12b8229dab0d 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -1416,8 +1416,8 @@ static uint64_t virtio_mem_send_request(struct virtio_mem *vm,
return virtio16_to_cpu(vm->vdev, vm->resp.type);
}
-static int virtio_mem_send_plug_request(struct virtio_mem *vm, uint64_t addr,
- uint64_t size)
+static int _virtio_mem_send_plug_request(struct virtio_mem *vm, uint64_t addr,
+ uint64_t size)
{
const uint64_t nb_vm_blocks = size / vm->device_block_size;
const struct virtio_mem_req req = {
@@ -1427,9 +1427,6 @@ static int virtio_mem_send_plug_request(struct virtio_mem *vm, uint64_t addr,
};
int rc = -ENOMEM;
- if (atomic_read(&vm->config_changed))
- return -EAGAIN;
-
dev_dbg(&vm->vdev->dev, "plugging memory: 0x%llx - 0x%llx\n", addr,
addr + size - 1);
@@ -1454,8 +1451,8 @@ static int virtio_mem_send_plug_request(struct virtio_mem *vm, uint64_t addr,
return rc;
}
-static int virtio_mem_send_unplug_request(struct virtio_mem *vm, uint64_t addr,
- uint64_t size)
+static int _virtio_mem_send_unplug_request(struct virtio_mem *vm, uint64_t addr,
+ uint64_t size)
{
const uint64_t nb_vm_blocks = size / vm->device_block_size;
const struct virtio_mem_req req = {
@@ -1465,9 +1462,6 @@ static int virtio_mem_send_unplug_request(struct virtio_mem *vm, uint64_t addr,
};
int rc = -ENOMEM;
- if (atomic_read(&vm->config_changed))
- return -EAGAIN;
-
dev_dbg(&vm->vdev->dev, "unplugging memory: 0x%llx - 0x%llx\n", addr,
addr + size - 1);
@@ -1489,6 +1483,72 @@ static int virtio_mem_send_unplug_request(struct virtio_mem *vm, uint64_t addr,
return rc;
}
+static int virtio_mem_send_plug_request(struct virtio_mem *vm, uint64_t addr,
+ uint64_t size)
+{
+ int ret;
+
+ if (atomic_read(&vm->config_changed))
+ return -EAGAIN;
+
+ ret = _virtio_mem_send_plug_request(vm, addr, size);
+ if (ret)
+ return ret;
+
+ /*
+ * If memory acceptance fails, we cannot safely rollback to the pre-plug
+ * state because the unplug operation may also fail (e.g., hypervisor
+ * out of memory, VM migration in progress). Additionally, acceptance
+ * failures may be partial, leaving some pages accepted and others not,
+ * creating inconsistent memory state that is difficult to track and
+ * recover from.
+ *
+ * Rather than attempting complex state recovery that may fail, we treat
+ * acceptance failure as a critical error and return -EINVAL. This causes
+ * the caller to set the broken flag and stop processing further requests,
+ * preventing potential memory corruption or system instability. As a
+ * consequence, the hypervisor-side memory for the failing range is
+ * leaked for the lifetime of the device.
+ */
+ if (memory_post_plug_call(addr, size))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int virtio_mem_send_unplug_request(struct virtio_mem *vm, uint64_t addr,
+ uint64_t size)
+{
+ int ret;
+
+ if (atomic_read(&vm->config_changed))
+ return -EAGAIN;
+
+ /*
+ * If memory release fails, treat it as a critical error similar to
+ * acceptance failure. See virtio_mem_send_plug_request() for detailed
+ * rationale on why we avoid complex error recovery.
+ */
+ ret = memory_pre_unplug_call(addr, size);
+ if (ret)
+ return -EINVAL;
+
+ ret = _virtio_mem_send_unplug_request(vm, addr, size);
+ /*
+ * If the hypervisor unplug request fails (e.g., out of memory, VM
+ * migration), the operation will be retried later. Since we already
+ * released the memory from TDX perspective, we must re-accept it to
+ * restore consistent state for the next retry. If re-acceptance fails,
+ * treat it as critical error to prevent state corruption. As a
+ * consequence, the hypervisor-side memory for the failing range is
+ * leaked for the lifetime of the device.
+ */
+ if (ret && memory_post_plug_call(addr, size))
+ return -EINVAL;
+
+ return ret;
+}
+
static int virtio_mem_send_unplug_all_request(struct virtio_mem *vm)
{
const struct virtio_mem_req req = {
--
2.52.0
* [RFC PATCH 4/6] x86/tdx: Register memory post-plug callback for TDX guests
From: Zhenzhong Duan @ 2026-06-04 9:35 UTC
To: marcandre.lureau, david, kas, rick.p.edgecombe, prsampat,
pbonzini, mst, peterx, chenyi.qiang, elena.reshetova, michaeluth,
ackerleytng
Cc: linux-kernel, linux-coco, virtualization, x86, yilun.xu,
xiaoyao.li, chao.p.peng
In-Reply-To: <20260604093551.1511079-1-zhenzhong.duan@intel.com>
Register a callback to handle memory acceptance after memory plugging in
TDX guests. When memory is added by virtio-mem or other memory hotplug
drivers, the TDX guest must accept the memory pages using
TDG.MEM.PAGE.ACCEPT TDCALL before they can be safely accessed.
The callback uses the existing tdx_accept_memory() function to accept all
pages in the newly plugged memory range. Without this callback, newly
added memory would remain in "unaccepted" state, and any access to these
pages would trigger VM exits and potentially cause guest crashes. The
callback is registered during TDX setup and remains active for the
lifetime of the guest, ensuring all dynamically added memory is properly
accepted before being made available to the kernel's memory management
subsystem.
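For illustration only (this is not part of the patch): before accepting
anything, the callback added below rejects ranges that are not page aligned
or whose end address would wrap. A minimal userspace sketch of that
validation, assuming 4 KiB pages. The sketch_ names are hypothetical, and
__builtin_add_overflow() is the GCC/Clang builtin standing in for the
kernel's check_add_overflow():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB base pages */

/*
 * Return true and store the exclusive end address in *end when
 * [addr, addr + size) is page aligned and does not wrap around the
 * 64-bit address space. Return false otherwise.
 */
static bool sketch_range_ok(uint64_t addr, uint64_t size, uint64_t *end)
{
	assert(end != NULL);

	/* Both the start and the length must be multiples of the page size. */
	if ((addr | size) & (SKETCH_PAGE_SIZE - 1))
		return false;

	/* The builtin returns true when addr + size overflows. */
	if (__builtin_add_overflow(addr, size, end))
		return false;

	return true;
}
```

Only a range that passes both checks would be handed on to the accept step.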
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
arch/x86/coco/tdx/tdx.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 186915a17c50..d93ba092d311 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -326,6 +326,25 @@ static void reduce_unnecessary_ve(void)
enable_cpu_topology_enumeration();
}
+static int tdx_memory_post_plug(u64 addr, u64 size)
+{
+ u64 end;
+
+ if (!PAGE_ALIGNED(addr) || !PAGE_ALIGNED(size))
+ return -EINVAL;
+
+ if (check_add_overflow(addr, size, &end))
+ return -EINVAL;
+
+ if (tdx_accept_memory(addr, end))
+ return 0;
+
+ pr_err("Failed to accept memory [0x%llx, 0x%llx)\n",
+ (unsigned long long)addr, (unsigned long long)end);
+
+ return -EINVAL;
+}
+
static void tdx_setup(u64 *cc_mask)
{
struct tdx_module_args args = {};
@@ -359,6 +378,8 @@ static void tdx_setup(u64 *cc_mask)
disable_sept_ve(td_attr);
reduce_unnecessary_ve();
+
+ set_memory_post_plug_callback(tdx_memory_post_plug);
}
/*
--
2.52.0
* [RFC PATCH 5/6] x86/tdx: Register memory pre-unplug callback for TDX guests
From: Zhenzhong Duan @ 2026-06-04 9:35 UTC
To: marcandre.lureau, david, kas, rick.p.edgecombe, prsampat,
pbonzini, mst, peterx, chenyi.qiang, elena.reshetova, michaeluth,
ackerleytng
Cc: linux-kernel, linux-coco, virtualization, x86, yilun.xu,
xiaoyao.li, chao.p.peng
In-Reply-To: <20260604093551.1511079-1-zhenzhong.duan@intel.com>
Add support for releasing memory pages before unplugging in TDX guests.
When memory is about to be unplugged by virtio-mem or other memory
hotplug drivers, the TDX guest should release the memory pages back to the
hypervisor using the TDG.MEM.PAGE.RELEASE TDCALL. This makes the guest more
robust against buggy VMM behavior, e.g., a VMM that does nothing in
response to an unplug request.
The implementation detects TDG.MEM.PAGE.RELEASE support and optimizes
release operations by trying larger page sizes (1G, then 2M) before falling
back to 4K pages. If release fails, the function re-accepts any released
pages to maintain consistency. Without proper memory release, re-plugging
memory in a TDX guest can fail: the hypervisor may treat the unplug request
as a no-op, so the pages are still in the "accepted" state when the guest
tries to accept them again.
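For illustration only (not part of the patch): the largest-first strategy
described above can be sketched in userspace C. The sketch_ names are
hypothetical, and sketch_release() is a stub that always succeeds where the
real code issues the TDCALL. A failed attempt at one size falls through to
the next smaller size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Page sizes tried in order: 1G, 2M, 4K. */
static const uint64_t sketch_sizes[] = { 1ULL << 30, 1ULL << 21, 1ULL << 12 };

/* Hypothetical stand-in for the release TDCALL; always succeeds here. */
static bool sketch_release(uint64_t start, uint64_t size)
{
	(void)start;
	(void)size;
	return true;
}

/*
 * Release [start, end), preferring the largest size that is aligned at
 * the current position and fits in the remaining length. Returns the
 * number of successful release calls, or -1 if no size could be used.
 */
static long sketch_release_range(uint64_t start, uint64_t end)
{
	uint64_t cur = start;
	long calls = 0;

	assert(start <= end);

	while (cur < end) {
		uint64_t done = 0;
		size_t i;

		for (i = 0; i < sizeof(sketch_sizes) / sizeof(sketch_sizes[0]); i++) {
			uint64_t size = sketch_sizes[i];

			/* Skip sizes that are misaligned or too large for the rest. */
			if ((cur & (size - 1)) || end - cur < size)
				continue;
			if (sketch_release(cur, size)) {
				done = size;
				break;
			}
		}
		if (!done)
			return -1;
		cur += done;
		calls++;
	}

	return calls;
}
```

For a range that starts 2M below a 1G boundary and ends 4K past the next
1G boundary, the sketch makes one 2M, one 1G, and one 4K call rather than
262,657 separate 4K calls.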
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
arch/x86/include/asm/shared/tdx.h | 2 +
arch/x86/coco/tdx/tdx.c | 135 ++++++++++++++++++++++++++++++
2 files changed, 137 insertions(+)
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 049638e3da74..910ec1e57528 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -19,6 +19,7 @@
#define TDG_MEM_PAGE_ACCEPT 6
#define TDG_VM_RD 7
#define TDG_VM_WR 8
+#define TDG_MEM_PAGE_RELEASE 30
/* TDX TD attributes */
#define TDX_TD_ATTR_DEBUG_BIT 0
@@ -54,6 +55,7 @@
/* TDCS_CONFIG_FLAGS bits */
#define TDCS_CONFIG_FLEXIBLE_PENDING_VE BIT_ULL(1)
+#define TDCS_CONFIG_PAGE_RELEASE BIT_ULL(6)
/* TDCS_TD_CTLS bits */
#define TD_CTLS_PENDING_VE_DISABLE_BIT 0
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index d93ba092d311..0abfb3505093 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -345,6 +345,139 @@ static int tdx_memory_post_plug(u64 addr, u64 size)
return -EINVAL;
}
+static bool tdx_page_release_supported;
+
+static void detect_mem_page_release(void)
+{
+ u64 config = 0;
+
+ tdg_vm_rd(TDCS_CONFIG_FLAGS, &config);
+
+ tdx_page_release_supported = !!(config & TDCS_CONFIG_PAGE_RELEASE);
+}
+
+static unsigned long try_release_one(phys_addr_t start, unsigned long len,
+ enum pg_level pg_level)
+{
+ unsigned long release_size = page_level_size(pg_level);
+ struct tdx_module_args args = {};
+ u8 page_size;
+ u64 ret;
+
+ if (!IS_ALIGNED(start, release_size))
+ return 0;
+
+ if (len < release_size)
+ return 0;
+
+ /*
+ * Pass the page physical address to TDX module to release the
+ * private page and to put it in PENDING state.
+ *
+ * Bits 2:0 of RCX encode page size: 0 - 4K, 1 - 2M, 2 - 1G.
+ */
+ switch (pg_level) {
+ case PG_LEVEL_4K:
+ page_size = TDX_PS_4K;
+ break;
+ case PG_LEVEL_2M:
+ page_size = TDX_PS_2M;
+ break;
+ case PG_LEVEL_1G:
+ page_size = TDX_PS_1G;
+ break;
+ default:
+ return 0;
+ }
+
+ args.rcx = start | page_size;
+ ret = __tdcall(TDG_MEM_PAGE_RELEASE, &args);
+ if (ret)
+ return 0;
+
+ return release_size;
+}
+
+static bool _tdx_release_memory(phys_addr_t start, phys_addr_t end, phys_addr_t *cur)
+{
+ *cur = start;
+
+ while (*cur < end) {
+ unsigned long len = end - *cur;
+ unsigned long release_size;
+
+ /*
+ * Try larger release first. It speeds up process by cutting
+ * number of hypercalls (if successful).
+ */
+
+ release_size = try_release_one(*cur, len, PG_LEVEL_1G);
+ if (!release_size)
+ release_size = try_release_one(*cur, len, PG_LEVEL_2M);
+ if (!release_size)
+ release_size = try_release_one(*cur, len, PG_LEVEL_4K);
+ if (!release_size)
+ return false;
+ *cur += release_size;
+ }
+
+ return true;
+}
+
+/*
+ * Release memory pages back to the hypervisor in TDX guests.
+ *
+ * @start: Physical start address of memory range to release
+ * @end: Physical end address of memory range to release
+ *
+ * Uses TDG.MEM.PAGE.RELEASE TDCALL to transition private pages back to
+ * pending state. If PAGE_RELEASE is not supported by the TDX
+ * configuration, returns true (success) as no action is needed.
+ *
+ * On partial failure, automatically re-accepts any successfully released
+ * pages to restore consistent memory state. Re-acceptance failure is
+ * treated as a fatal error since it indicates severe TDX module issues.
+ *
+ * Returns: true on success, false on failure
+ */
+static bool tdx_release_memory(phys_addr_t start, phys_addr_t end)
+{
+ phys_addr_t released = start;
+ bool ret;
+
+ if (!tdx_page_release_supported)
+ return true;
+
+ ret = _tdx_release_memory(start, end, &released);
+ if (!ret) {
+ pr_err("Failed to release memory [0x%llx, 0x%llx)\n",
+ (unsigned long long)start, (unsigned long long)end);
+
+ /*
+ * Re-accept any pages that were successfully released before
+ * the failure occurred. This should never fail since we're
+ * just restoring the previous accepted state.
+ */
+ if (!tdx_accept_memory(start, released))
+ panic("%s Failed to re-accept memory\n", __func__);
+ }
+
+ return ret;
+}
+
+static int tdx_memory_pre_unplug(u64 addr, u64 size)
+{
+ u64 end;
+
+ if (!PAGE_ALIGNED(addr) || !PAGE_ALIGNED(size))
+ return -EINVAL;
+
+ if (check_add_overflow(addr, size, &end))
+ return -EINVAL;
+
+ return tdx_release_memory(addr, end) ? 0 : -EINVAL;
+}
+
static void tdx_setup(u64 *cc_mask)
{
struct tdx_module_args args = {};
@@ -380,6 +513,8 @@ static void tdx_setup(u64 *cc_mask)
reduce_unnecessary_ve();
set_memory_post_plug_callback(tdx_memory_post_plug);
+ detect_mem_page_release();
+ set_memory_pre_unplug_callback(tdx_memory_pre_unplug);
}
/*
--
2.52.0
* [RFC PATCH 6/6] x86/tdx: Release private memory before private->shared conversion
From: Zhenzhong Duan @ 2026-06-04 9:35 UTC
To: marcandre.lureau, david, kas, rick.p.edgecombe, prsampat,
pbonzini, mst, peterx, chenyi.qiang, elena.reshetova, michaeluth,
ackerleytng
Cc: linux-kernel, linux-coco, virtualization, x86, yilun.xu,
xiaoyao.li, chao.p.peng
In-Reply-To: <20260604093551.1511079-1-zhenzhong.duan@intel.com>
TDX supports a PAGE.RELEASE feature. When it is configured, the host can
remove a private page only after the guest has released it and put it in
the PENDING state through TDG.MEM.PAGE.RELEASE.
When TDX PAGE.RELEASE is supported, release private memory pages before
converting them to shared state. This ensures the pages transition from the
accepted state to the pending state.
The release operation helps handle scenarios where the hypervisor may
retain old private pages during conversion. Without proper release,
subsequent shared->private conversions could encounter re-acceptance
errors when attempting to accept pages that are still in accepted state.
If the release operation fails, abort the conversion to prevent
inconsistent memory state. Note that if tdx_map_gpa() fails after a
successful release, we cannot safely roll back because the GPA mapping may
have partially succeeded, creating a mix of shared and private pages that
cannot be reliably tracked or recovered.
Co-developed-by: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
arch/x86/coco/tdx/tdx.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 0abfb3505093..ecee6df92395 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -1121,7 +1121,25 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
{
phys_addr_t start = __pa(vaddr);
phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
+ bool release_required = !enc && tdx_page_release_supported;
+ /*
+ * For private->shared conversion, release memory pages first.
+ * This moves the pages from accepted to pending state, which is
+ * more robust against a buggy VMM: if the VMM keeps the old pages,
+ * re-accepting them on a later shared->private conversion fails.
+ */
+ if (release_required && !tdx_release_memory(start, end))
+ return false;
+
+ /*
+ * Update the GPA mapping state. If this fails, we cannot rollback
+ * by calling tdx_accept_memory() because tdx_map_gpa() may have
+ * partially succeeded, creating a mix of shared and private pages.
+ * Attempting to accept the entire range would fail on pages that
+ * are still in shared state, and we have no way to determine which
+ * pages are in which state after partial failure.
+ */
if (!tdx_map_gpa(start, end, enc))
return false;
--
2.52.0
* Re: [PATCH v6 3/4] firmware: smccc: arm-cca-guest: Bind the TSM provider to an SMCCC device
From: Suzuki K Poulose @ 2026-06-04 10:24 UTC
To: Sudeep Holla, Aneesh Kumar K.V (Arm)
Cc: linux-coco, linux-arm-kernel, linux-kernel, Catalin Marinas,
Greg KH, Jeremy Linton, Jonathan Cameron, Lorenzo Pieralisi,
Mark Rutland, Will Deacon, Steven Price
In-Reply-To: <20260603-determined-bumblebee-of-promise-e633d6@sudeepholla>
On 04/06/2026 10:21, Sudeep Holla wrote:
> On Wed, May 27, 2026 at 03:32:32PM +0530, Aneesh Kumar K.V (Arm) wrote:
>> The Arm CCA guest TSM provider currently binds through the arm-cca-dev
>> platform device. Like arm-smccc-trng, this device is not an independent
>> platform resource; it is a software representation of the RSI firmware
>> service discovered through SMCCC.
>>
>> Move RSI discovery into the SMCCC firmware driver. When the SMCCC conduit
>> is SMC and the RSI ABI version check succeeds, create an arm-rsi-dev SMCCC
>> device. Convert the Arm CCA guest TSM provider to an SMCCC driver so it
>> binds to that discovered RSI service and keeps module autoloading through
>> the SMCCC device id table.
>>
>> Keep the old arm-cca-dev platform-device registration for now. Userspace
>> has used that device as a Realm-guest indicator, so removing it is left to
>> a follow-up patch that adds a replacement sysfs ABI.
>>
>> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
>> ---
>> arch/arm64/include/asm/rsi.h | 2 +-
>> arch/arm64/kernel/rsi.c | 2 +-
>> drivers/firmware/smccc/Makefile | 4 ++
>> drivers/firmware/smccc/rmm.c | 25 ++++++++
>> drivers/firmware/smccc/rmm.h | 17 ++++++
>> drivers/firmware/smccc/smccc.c | 8 +++
>> drivers/virt/coco/arm-cca-guest/Kconfig | 1 +
>> drivers/virt/coco/arm-cca-guest/Makefile | 2 +
>> .../{arm-cca-guest.c => arm-cca.c} | 60 +++++++++----------
>> 9 files changed, 89 insertions(+), 32 deletions(-)
>> create mode 100644 drivers/firmware/smccc/rmm.c
>> create mode 100644 drivers/firmware/smccc/rmm.h
>> rename drivers/virt/coco/arm-cca-guest/{arm-cca-guest.c => arm-cca.c} (85%)
>>
>> diff --git a/arch/arm64/include/asm/rsi.h b/arch/arm64/include/asm/rsi.h
>> index 88b50d660e85..2d2d363aaaee 100644
>> --- a/arch/arm64/include/asm/rsi.h
>> +++ b/arch/arm64/include/asm/rsi.h
>> @@ -10,7 +10,7 @@
>> #include <linux/jump_label.h>
>> #include <asm/rsi_cmds.h>
>>
>> -#define RSI_PDEV_NAME "arm-cca-dev"
>> +#define RSI_DEV_NAME "arm-rsi-dev"
>>
>> DECLARE_STATIC_KEY_FALSE(rsi_present);
>>
>> diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
>> index 92160f2e57ff..da440f71bb64 100644
>> --- a/arch/arm64/kernel/rsi.c
>> +++ b/arch/arm64/kernel/rsi.c
>> @@ -161,7 +161,7 @@ void __init arm64_rsi_init(void)
>> }
>>
>> static struct platform_device rsi_dev = {
>> - .name = RSI_PDEV_NAME,
>> + .name = "arm-cca-dev",
>> .id = PLATFORM_DEVID_NONE
>> };
>>
>> diff --git a/drivers/firmware/smccc/Makefile b/drivers/firmware/smccc/Makefile
>> index 40d19144a860..33c850aaff4d 100644
>> --- a/drivers/firmware/smccc/Makefile
>> +++ b/drivers/firmware/smccc/Makefile
>> @@ -2,3 +2,7 @@
>> #
>> obj-$(CONFIG_HAVE_ARM_SMCCC_DISCOVERY) += smccc.o kvm_guest.o
>> obj-$(CONFIG_ARM_SMCCC_SOC_ID) += soc_id.o
>> +
>> +ifeq ($(CONFIG_HAVE_ARM_SMCCC_DISCOVERY),y)
>> +obj-$(CONFIG_ARM64) += rmm.o
>> +endif
>> diff --git a/drivers/firmware/smccc/rmm.c b/drivers/firmware/smccc/rmm.c
>> new file mode 100644
>> index 000000000000..d572f47e955c
>> --- /dev/null
>> +++ b/drivers/firmware/smccc/rmm.c
>> @@ -0,0 +1,25 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright (C) 2026 Arm Limited
>> + */
>> +
>> +#include <linux/arm-smccc-bus.h>
>> +#include <linux/err.h>
>> +#include <linux/printk.h>
>> +
>> +#include "rmm.h"
>> +
>> +void __init register_rsi_device(void)
>> +{
>> + unsigned long ret;
>> +
>> + if (arm_smccc_1_1_get_conduit() != SMCCC_CONDUIT_SMC)
>> + return;
>> +
>> + ret = rsi_request_version(RSI_ABI_VERSION, NULL, NULL);
>> + if (ret != RSI_SUCCESS)
>> + return;
>> +
>> + if (IS_ERR(arm_smccc_device_register(RSI_DEV_NAME)))
>> + pr_err("%s: could not register device\n", RSI_DEV_NAME);
>> +}
>
> OK, I had something else in my mind when I started looking at 1/4. I didn't
> expect each device added on this bus comes up with it's own way to enumerate
> it. IMO, it defeats the purpose of building the smccc bus. We may find the
> specs for each feature deviated a bit but we can have a generic probe
> IMO, let's try that before exploring per feature probe function.
I guess this is ideal, but see below.
>
> I have a brief sketch of what I think we should aim for(uncompiled/untested)
> below. Let me know if that makes sense. I just based it on your bus code.
>
> Regards,
> Sudeep
>
> -->8
>
> diff --git c/drivers/firmware/smccc/smccc.c w/drivers/firmware/smccc/smccc.c
> index 695c920a8087..450605ddfab6 100644
> --- c/drivers/firmware/smccc/smccc.c
> +++ w/drivers/firmware/smccc/smccc.c
> @@ -9,21 +9,58 @@
> #include <linux/init.h>
> #include <linux/arm-smccc.h>
> #include <linux/kernel.h>
> -#include <linux/platform_device.h>
> #include <linux/arm-smccc-bus.h>
> #include <linux/idr.h>
> #include <linux/slab.h>
>
> -#include <asm/archrandom.h>
> -
> static u32 smccc_version = ARM_SMCCC_VERSION_1_0;
> static enum arm_smccc_conduit smccc_conduit = SMCCC_CONDUIT_NONE;
> static DEFINE_IDA(arm_smccc_bus_id);
>
> -bool __ro_after_init smccc_trng_available = false;
> +struct smccc_device_info {
> + u32 func_id;
> + bool requires_smc;
> + unsigned long min_return;
Is this viable for all ? There may be additional restrictions around
the return values and further SMC calls ? Which brings us to call backs
and we kind of have a variant of that here.
Suzuki
> + const char *device_name;
> +};
> +
> +bool __ro_after_init smccc_trng_available;
> s32 __ro_after_init smccc_soc_id_version = SMCCC_RET_NOT_SUPPORTED;
> s32 __ro_after_init smccc_soc_id_revision = SMCCC_RET_NOT_SUPPORTED;
>
> +static const struct smccc_device_info smccc_devices[] __initconst = {
> + {
> + .func_id = ARM_SMCCC_TRNG_VERSION,
> + .requires_smc = false,
> + .min_return = ARM_SMCCC_TRNG_MIN_VERSION,
> + .device_name = "arm-smccc-trng",
> + },
> +};
> +
> +static bool __init
> +smccc_probe_smccc_device(const struct smccc_device_info *smccc_dev)
> +{
> + struct arm_smccc_res res;
> + unsigned long ret;
> +
> + if (!IS_ENABLED(CONFIG_ARM64))
> + return false;
> +
> + if (smccc_conduit == SMCCC_CONDUIT_NONE)
> + return false;
> +
> + if (smccc_dev->requires_smc && smccc_conduit != SMCCC_CONDUIT_SMC)
> + return false;
> +
> + arm_smccc_1_1_invoke(smccc_dev->func_id, &res);
> + ret = res.a0;
> +
> + if ((s32)ret < 0)
> + return false;
> +
> + return ret >= smccc_dev->min_return;
> +}
> +
> void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
> {
> struct arm_smccc_res res;
> @@ -31,7 +68,7 @@ void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
> smccc_version = version;
> smccc_conduit = conduit;
>
> - smccc_trng_available = smccc_probe_trng();
> + smccc_trng_available = smccc_probe_smccc_device(&smccc_devices[0]);
>
> if ((smccc_version >= ARM_SMCCC_VERSION_1_2) &&
> (smccc_conduit != SMCCC_CONDUIT_NONE)) {
> @@ -241,14 +278,20 @@ subsys_initcall(arm_smccc_bus_init);
>
> static int __init smccc_devices_init(void)
> {
> - struct platform_device *pdev;
> -
> - if (smccc_trng_available) {
> - pdev = platform_device_register_simple("smccc_trng", -1,
> - NULL, 0);
> - if (IS_ERR(pdev))
> - pr_err("smccc_trng: could not register device: %ld\n",
> - PTR_ERR(pdev));
> + const struct smccc_device_info *smccc_dev;
> + struct arm_smccc_device *sdev;
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(smccc_devices); i++) {
> + smccc_dev = &smccc_devices[i];
> +
> + if (!smccc_probe_smccc_device(smccc_dev))
> + continue;
> +
> + sdev = arm_smccc_device_register(smccc_dev->device_name);
> + if (IS_ERR(sdev))
> + pr_err("%s: could not register device: %ld\n",
> + smccc_dev->device_name, PTR_ERR(sdev));
> }
>
> return 0;
>
* Re: [PATCH v6 3/4] firmware: smccc: arm-cca-guest: Bind the TSM provider to an SMCCC device
From: Sudeep Holla @ 2026-06-04 10:55 UTC
To: Suzuki K Poulose
Cc: Aneesh Kumar K.V (Arm), linux-coco, linux-arm-kernel,
linux-kernel, Sudeep Holla, Catalin Marinas, Greg KH,
Jeremy Linton, Jonathan Cameron, Lorenzo Pieralisi, Mark Rutland,
Will Deacon, Steven Price
In-Reply-To: <bee6166c-f63e-4692-a874-b80b7a6f6dc4@arm.com>
On Thu, Jun 04, 2026 at 11:24:05AM +0100, Suzuki K Poulose wrote:
> On 04/06/2026 10:21, Sudeep Holla wrote:
> > On Wed, May 27, 2026 at 03:32:32PM +0530, Aneesh Kumar K.V (Arm) wrote:
> > > The Arm CCA guest TSM provider currently binds through the arm-cca-dev
> > > platform device. Like arm-smccc-trng, this device is not an independent
> > > platform resource; it is a software representation of the RSI firmware
> > > service discovered through SMCCC.
> > >
> > > Move RSI discovery into the SMCCC firmware driver. When the SMCCC conduit
> > > is SMC and the RSI ABI version check succeeds, create an arm-rsi-dev SMCCC
> > > device. Convert the Arm CCA guest TSM provider to an SMCCC driver so it
> > > binds to that discovered RSI service and keeps module autoloading through
> > > the SMCCC device id table.
> > >
> > > Keep the old arm-cca-dev platform-device registration for now. Userspace
> > > has used that device as a Realm-guest indicator, so removing it is left to
> > > a follow-up patch that adds a replacement sysfs ABI.
> > >
> > > Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> > > ---
> > > arch/arm64/include/asm/rsi.h | 2 +-
> > > arch/arm64/kernel/rsi.c | 2 +-
> > > drivers/firmware/smccc/Makefile | 4 ++
> > > drivers/firmware/smccc/rmm.c | 25 ++++++++
> > > drivers/firmware/smccc/rmm.h | 17 ++++++
> > > drivers/firmware/smccc/smccc.c | 8 +++
> > > drivers/virt/coco/arm-cca-guest/Kconfig | 1 +
> > > drivers/virt/coco/arm-cca-guest/Makefile | 2 +
> > > .../{arm-cca-guest.c => arm-cca.c} | 60 +++++++++----------
> > > 9 files changed, 89 insertions(+), 32 deletions(-)
> > > create mode 100644 drivers/firmware/smccc/rmm.c
> > > create mode 100644 drivers/firmware/smccc/rmm.h
> > > rename drivers/virt/coco/arm-cca-guest/{arm-cca-guest.c => arm-cca.c} (85%)
> > >
> > > diff --git a/arch/arm64/include/asm/rsi.h b/arch/arm64/include/asm/rsi.h
> > > index 88b50d660e85..2d2d363aaaee 100644
> > > --- a/arch/arm64/include/asm/rsi.h
> > > +++ b/arch/arm64/include/asm/rsi.h
> > > @@ -10,7 +10,7 @@
> > > #include <linux/jump_label.h>
> > > #include <asm/rsi_cmds.h>
> > > -#define RSI_PDEV_NAME "arm-cca-dev"
> > > +#define RSI_DEV_NAME "arm-rsi-dev"
> > > DECLARE_STATIC_KEY_FALSE(rsi_present);
> > > diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
> > > index 92160f2e57ff..da440f71bb64 100644
> > > --- a/arch/arm64/kernel/rsi.c
> > > +++ b/arch/arm64/kernel/rsi.c
> > > @@ -161,7 +161,7 @@ void __init arm64_rsi_init(void)
> > > }
> > > static struct platform_device rsi_dev = {
> > > - .name = RSI_PDEV_NAME,
> > > + .name = "arm-cca-dev",
> > > .id = PLATFORM_DEVID_NONE
> > > };
> > > diff --git a/drivers/firmware/smccc/Makefile b/drivers/firmware/smccc/Makefile
> > > index 40d19144a860..33c850aaff4d 100644
> > > --- a/drivers/firmware/smccc/Makefile
> > > +++ b/drivers/firmware/smccc/Makefile
> > > @@ -2,3 +2,7 @@
> > > #
> > > obj-$(CONFIG_HAVE_ARM_SMCCC_DISCOVERY) += smccc.o kvm_guest.o
> > > obj-$(CONFIG_ARM_SMCCC_SOC_ID) += soc_id.o
> > > +
> > > +ifeq ($(CONFIG_HAVE_ARM_SMCCC_DISCOVERY),y)
> > > +obj-$(CONFIG_ARM64) += rmm.o
> > > +endif
> > > diff --git a/drivers/firmware/smccc/rmm.c b/drivers/firmware/smccc/rmm.c
> > > new file mode 100644
> > > index 000000000000..d572f47e955c
> > > --- /dev/null
> > > +++ b/drivers/firmware/smccc/rmm.c
> > > @@ -0,0 +1,25 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * Copyright (C) 2026 Arm Limited
> > > + */
> > > +
> > > +#include <linux/arm-smccc-bus.h>
> > > +#include <linux/err.h>
> > > +#include <linux/printk.h>
> > > +
> > > +#include "rmm.h"
> > > +
> > > +void __init register_rsi_device(void)
> > > +{
> > > + unsigned long ret;
> > > +
> > > + if (arm_smccc_1_1_get_conduit() != SMCCC_CONDUIT_SMC)
> > > + return;
> > > +
> > > + ret = rsi_request_version(RSI_ABI_VERSION, NULL, NULL);
> > > + if (ret != RSI_SUCCESS)
> > > + return;
> > > +
> > > + if (IS_ERR(arm_smccc_device_register(RSI_DEV_NAME)))
> > > + pr_err("%s: could not register device\n", RSI_DEV_NAME);
> > > +}
> >
> > OK, I had something else in my mind when I started looking at 1/4. I didn't
> > expect each device added on this bus comes up with it's own way to enumerate
> > it. IMO, it defeats the purpose of building the smccc bus. We may find the
> > specs for each feature deviated a bit but we can have a generic probe
> > IMO, let's try that before exploring per feature probe function.
>
> I guess this is ideal, but see below.
>
> >
> > I have a brief sketch of what I think we should aim for(uncompiled/untested)
> > below. Let me know if that makes sense. I just based it on your bus code.
> >
> > Regards,
> > Sudeep
> >
> > -->8
> >
> > diff --git c/drivers/firmware/smccc/smccc.c w/drivers/firmware/smccc/smccc.c
> > index 695c920a8087..450605ddfab6 100644
> > --- c/drivers/firmware/smccc/smccc.c
> > +++ w/drivers/firmware/smccc/smccc.c
> > @@ -9,21 +9,58 @@
> > #include <linux/init.h>
> > #include <linux/arm-smccc.h>
> > #include <linux/kernel.h>
> > -#include <linux/platform_device.h>
> > #include <linux/arm-smccc-bus.h>
> > #include <linux/idr.h>
> > #include <linux/slab.h>
> >
> > -#include <asm/archrandom.h>
> > -
> > static u32 smccc_version = ARM_SMCCC_VERSION_1_0;
> > static enum arm_smccc_conduit smccc_conduit = SMCCC_CONDUIT_NONE;
> > static DEFINE_IDA(arm_smccc_bus_id);
> >
> > -bool __ro_after_init smccc_trng_available = false;
> > +struct smccc_device_info {
> > + u32 func_id;
> > + bool requires_smc;
I wanted to ask this but forgot.
RSI uses SMC because the Realm is calling the RMM, not the host hypervisor.
Using HVC would make RSI look like a hypervisor ABI and would blur the trust
boundary. Is that the right assumption?
I assume the conduit derived from the PSCI node for Realms will always be
SMC, so it shouldn't be a problem. In other words, there won't be a case
where HVC is needed as the conduit within the Realm VM kernel?
> > + unsigned long min_return;
>
> Is this viable for all ? There may be additional restrictions around
> the return values and further SMC calls ?
Fair enough, but do you have any examples currently?
> Which brings us to call backs and we kind of have a variant of that here.
>
I am fine with a callback, but its scope should be kept as small as
possible IMO. To me it's simple: if the main FID for a feature is
implemented, we create the SMCCC device, and the driver's probe can deal
with all the extra details. Why won't that work? I'm just trying to
understand why the core SMCCC bus needs to know more or must provide a
callback.
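For illustration only (not part of the quoted sketch): the "main FID is
implemented" test amounts to reading a0 as a signed 32-bit value and
comparing it with a minimum. A minimal userspace version follows. The name
sketch_fid_implemented() is hypothetical, and the bit-31 test is assumed to
be equivalent to the (s32)ret < 0 check in the quoted code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a service is present from the a0 value returned by its
 * main function ID. Bit 31 set means the low 32 bits are negative when
 * read as a signed value, which is how SMCCC reports errors such as
 * NOT_SUPPORTED (-1).
 */
static bool sketch_fid_implemented(uint64_t a0, uint64_t min_return)
{
	if (a0 & (1ULL << 31))
		return false;

	/* Otherwise treat a0 as a version and require a minimum. */
	return a0 >= min_return;
}
```

Anything beyond this yes/no answer would be left to the driver's probe.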
--
Regards,
Sudeep
* Re: [PATCH v6 0/4] Switch Arm SMCCC firmware services to an SMCCC bus
From: Aneesh Kumar K.V @ 2026-06-04 12:58 UTC
To: gregkh, linux-coco, linux-arm-kernel, linux-kernel
Cc: Catalin Marinas, Jeremy Linton, Jonathan Cameron,
Lorenzo Pieralisi, Mark Rutland, Sudeep Holla, Will Deacon,
Steven Price, Suzuki K Poulose
In-Reply-To: <20260527100233.428018-1-aneesh.kumar@kernel.org>
Hi Greg,
"Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> writes:
> As discussed here:
> https://lore.kernel.org/all/20250728135216.48084-12-aneesh.kumar@kernel.org
>
> The earlier CCA guest support used an arm-cca-dev platform device as a pure
> software anchor for the TSM class device. That platform device did not
> correspond to a DT/ACPI described device, MMIO range, interrupt, or other
> platform resource; it existed only to make the CCA guest driver bind and to
> place the resulting TSM device in the driver model. The same pattern also
> exists for smccc_trng. Creating separate platform devices for such
> SMCCC-discovered features is misleading, because those features are not
> independent platform devices.
>
> This series adds an Arm SMCCC bus for services discovered through the SMCCC
> firmware interface. The bus provides SMCCC device and driver registration
> helpers, name-based matching, uevent modalias generation, and a sysfs modalias
> attribute. SMCCC service drivers can use MODULE_DEVICE_TABLE(arm_smccc, ...)
> to emit arm_smccc:<name> aliases, allowing userspace to autoload service
> drivers when the SMCCC core registers matching firmware-service devices.
>
> The series then moves SMCCC TRNG and the Arm CCA guest RSI service off the
> platform bus. When the SMCCC core discovers the corresponding firmware
> service, it registers an arm-smccc device for that service. The hwrng
> arm_smccc_trng driver and the Arm CCA guest TSM provider are converted to
> SMCCC drivers that bind to those discovered devices.
>
> The old arm-cca-dev platform device has also been used by userspace as a Realm
> guest indicator. Removing it without a replacement would leave userspace
> depending on an internal driver-binding device. This series therefore adds
> /sys/firmware/cca/realm_guest as a stable, architecture-provided ABI for
> detecting whether the kernel is running as an Arm CCA Realm guest, and then
> removes the dummy arm-cca-dev platform-device registration.
>
Gentle ping. Based on your feedback in [1], I reworked the series to use
an SMCCC bus, with smccc-trng and arm-cca-dev represented as devices on
that bus. Could you let me know whether this approach addresses your
concerns?
[1] https://lore.kernel.org/all/2026051451-comfort-museum-4d2a@gregkh/
-aneesh
* Re: [PATCH v6 3/4] firmware: smccc: arm-cca-guest: Bind the TSM provider to an SMCCC device
From: Aneesh Kumar K.V @ 2026-06-04 13:26 UTC
To: Sudeep Holla
Cc: linux-coco, linux-arm-kernel, linux-kernel, Catalin Marinas,
Sudeep Holla, Greg KH, Jeremy Linton, Jonathan Cameron,
Lorenzo Pieralisi, Mark Rutland, Will Deacon, Steven Price,
Suzuki K Poulose
In-Reply-To: <20260603-determined-bumblebee-of-promise-e633d6@sudeepholla>
Sudeep Holla <sudeep.holla@kernel.org> writes:
...
> +static const struct smccc_device_info smccc_devices[] __initconst = {
> + {
> + .func_id = ARM_SMCCC_TRNG_VERSION,
> + .requires_smc = false,
> + .min_return = ARM_SMCCC_TRNG_MIN_VERSION,
> + .device_name = "arm-smccc-trng",
> + },
> +};
> +
> +static bool __init
> +smccc_probe_smccc_device(const struct smccc_device_info *smccc_dev)
> +{
> + struct arm_smccc_res res;
> + unsigned long ret;
> +
> + if (!IS_ENABLED(CONFIG_ARM64))
> + return false;
> +
> + if (smccc_conduit == SMCCC_CONDUIT_NONE)
> + return false;
> +
> + if (smccc_dev->requires_smc && smccc_conduit != SMCCC_CONDUIT_SMC)
> + return false;
> +
> + arm_smccc_1_1_invoke(smccc_dev->func_id, &res);
> + ret = res.a0;
> +
> + if ((s32)ret < 0)
> + return false;
> +
> + return ret >= smccc_dev->min_return;
> +}
> +
>
I am not sure we want the check to be as simple as ret < 0. Some
function IDs may return input errors based on the supplied arguments
(for example, RMI_ERROR_INPUT). In those cases, we would likely want
this to be handled via a callback.
We also want to use conditional compilation for some function IDs.
Given the callback approach and the #ifdefs, I wonder whether what we
currently have is actually simpler and more flexible.
> void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
> {
> struct arm_smccc_res res;
> @@ -31,7 +68,7 @@ void __init arm_smccc_version_init(u32 version, enum arm_smccc_conduit conduit)
> smccc_version = version;
> smccc_conduit = conduit;
>
> - smccc_trng_available = smccc_probe_trng();
> + smccc_trng_available = smccc_probe_smccc_device(&smccc_devices[0]);
>
> if ((smccc_version >= ARM_SMCCC_VERSION_1_2) &&
> (smccc_conduit != SMCCC_CONDUIT_NONE)) {
> @@ -241,14 +278,20 @@ subsys_initcall(arm_smccc_bus_init);
>
> static int __init smccc_devices_init(void)
> {
> - struct platform_device *pdev;
> -
> - if (smccc_trng_available) {
> - pdev = platform_device_register_simple("smccc_trng", -1,
> - NULL, 0);
> - if (IS_ERR(pdev))
> - pr_err("smccc_trng: could not register device: %ld\n",
> - PTR_ERR(pdev));
> + const struct smccc_device_info *smccc_dev;
> + struct arm_smccc_device *sdev;
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(smccc_devices); i++) {
> + smccc_dev = &smccc_devices[i];
> +
> + if (!smccc_probe_smccc_device(smccc_dev))
> + continue;
> +
> + sdev = arm_smccc_device_register(smccc_dev->device_name);
> + if (IS_ERR(sdev))
> + pr_err("%s: could not register device: %ld\n",
> + smccc_dev->device_name, PTR_ERR(sdev));
> }
>
> return 0;
>
-aneesh
^ permalink raw reply
* Re: [PATCH v6 3/4] firmware: smccc: arm-cca-guest: Bind the TSM provider to an SMCCC device
From: Sudeep Holla @ 2026-06-04 13:45 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-coco, linux-arm-kernel, linux-kernel, Catalin Marinas,
Sudeep Holla, Greg KH, Jeremy Linton, Jonathan Cameron,
Lorenzo Pieralisi, Mark Rutland, Will Deacon, Steven Price,
Suzuki K Poulose
In-Reply-To: <yq5ase72qvwb.fsf@kernel.org>
On Thu, Jun 04, 2026 at 06:56:28PM +0530, Aneesh Kumar K.V wrote:
> Sudeep Holla <sudeep.holla@kernel.org> writes:
>
> ...
>
> > +static const struct smccc_device_info smccc_devices[] __initconst = {
> > + {
> > + .func_id = ARM_SMCCC_TRNG_VERSION,
> > + .requires_smc = false,
> > + .min_return = ARM_SMCCC_TRNG_MIN_VERSION,
> > + .device_name = "arm-smccc-trng",
> > + },
> > +};
> > +
> > +static bool __init
> > +smccc_probe_smccc_device(const struct smccc_device_info *smccc_dev)
> > +{
> > + struct arm_smccc_res res;
> > + unsigned long ret;
> > +
> > + if (!IS_ENABLED(CONFIG_ARM64))
> > + return false;
> > +
> > + if (smccc_conduit == SMCCC_CONDUIT_NONE)
> > + return false;
> > +
> > + if (smccc_dev->requires_smc && smccc_conduit != SMCCC_CONDUIT_SMC)
> > + return false;
> > +
> > + arm_smccc_1_1_invoke(smccc_dev->func_id, &res);
> > + ret = res.a0;
> > +
> > + if ((s32)ret < 0)
> > + return false;
> > +
> > + return ret >= smccc_dev->min_return;
> > +}
> > +
> >
>
> I am not sure we want the check to be as simple as ret < 0. Some
> function IDs may return input errors based on the supplied arguments
> (for example, RMI_ERROR_INPUT). In those cases, we would likely want
> this to be handled via a callback.
>
As I mentioned in response to Suzuki, we can defer that to the probe of
that device. If *_VERSION succeeds, the SMCCC core can add that device and
leave the rest to the core, keeping the core and bus layer simple IMO.
> We also want to use conditional compilation for some function IDs.
> Given the callback approach and the #ifdefs, I wonder whether what we
> currently have is actually simpler and more flexible.
>
I was trying to avoid conditional compilation altogether and hence the
reason for keeping it as simple as possible. Also IS_ENABLED(CONFIG_ARM64)
in above snippet must come as some condition to this generic probe.
Adding any more logic or callback defeats the bus idea here if we need
to rely/depend on multiple conditional compilation or callbacks IMO.
Let's see if it can work with what we are adding now and may add in the
near future, and then decide.
--
Regards,
Sudeep
^ permalink raw reply
* RE: [PATCH v5 05/20] dma-pool: track decrypted atomic pools and select them via attrs
From: Michael Kelley @ 2026-06-04 14:05 UTC (permalink / raw)
To: Jason Gunthorpe, Michael Kelley
Cc: Aneesh Kumar K.V, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
Mostafa Saleh, Petr Tesarik, Alexey Kardashevskiy, Dan Williams,
Xu Yilun, linuxppc-dev@lists.ozlabs.org,
linux-s390@vger.kernel.org, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86@kernel.org, Jiri Pirko
In-Reply-To: <20260603005454.GM2487554@ziepe.ca>
From: Jason Gunthorpe <jgg@ziepe.ca> Sent: Tuesday, June 2, 2026 5:55 PM
>
> On Tue, Jun 02, 2026 at 02:24:40PM +0000, Michael Kelley wrote:
>
> > Except that in a normal VM, the "unencrypted" pool attribute does *not*
> > describe the state of the memory itself. In a normal VM, the memory is
> > unencrypted, but the "unencrypted" pool attribute is false. That
> > contradiction is the essence of my concern.
>
> I would argue no..
>
> When CC is enabled the default state of memory in a Linux environment
> is "encrypted". You have to take a special action to "decrypt" it.
>
> Thus the default state of memory in a non-CC environment is also
> paradoxically "encrypted" too.
The need to have such an unnatural premise is usually an indication
of a conceptual problem with the overall model, or perhaps just a
terminology problem.
Here's a proposal. The new DMA attribute is DMA_ATTR_CC_SHARED.
Name the pool attribute "cc_shared" instead of "unencrypted". Having
"cc_shared" set to false in a normal VM doesn't lead to the non-sensical
situation of claiming that a normal VM is encrypted. The boolean
"unencrypted" parameter that has been added to various calls also
becomes "cc_shared". If "CC_SHARED" is a suitable name for the DMA
attribute, it ought to be suitable as the pool attribute. And everything
matches as well.
Michael
> "decryption" is impossible.
>
> Therefore the "unencrypted" state is a special state that only memory
> inside a CC VM can have. A normal VM can never have "unencrypted"
> memory at all, so having it be false in the pool is accurate as far as
> the APIs go.
>
> un-encrypted = true means "the memory in this pool was transformed with
> set_memory_decrypted()" - which is impossible on a normal VM.
>
> Jason
^ permalink raw reply
* Re: [PATCH v5 05/20] dma-pool: track decrypted atomic pools and select them via attrs
From: Jason Gunthorpe @ 2026-06-04 14:30 UTC (permalink / raw)
To: Michael Kelley
Cc: Aneesh Kumar K.V, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
Mostafa Saleh, Petr Tesarik, Alexey Kardashevskiy, Dan Williams,
Xu Yilun, linuxppc-dev@lists.ozlabs.org,
linux-s390@vger.kernel.org, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86@kernel.org, Jiri Pirko
In-Reply-To: <SN6PR02MB4157F94C902B78E55E99372DD4102@SN6PR02MB4157.namprd02.prod.outlook.com>
On Thu, Jun 04, 2026 at 02:05:35PM +0000, Michael Kelley wrote:
> From: Jason Gunthorpe <jgg@ziepe.ca> Sent: Tuesday, June 2, 2026 5:55 PM
> >
> > On Tue, Jun 02, 2026 at 02:24:40PM +0000, Michael Kelley wrote:
> >
> > > Except that in a normal VM, the "unencrypted" pool attribute does *not*
> > > describe the state of the memory itself. In a normal VM, the memory is
> > > unencrypted, but the "unencrypted" pool attribute is false. That
> > > contradiction is the essence of my concern.
> >
> > I would argue no..
> >
> > When CC is enabled the default state of memory in a Linux environment
> > is "encrypted". You have to take a special action to "decrypt" it.
> >
> > Thus the default state of memory in a non-CC environment is also
> > paradoxically "encrypted" too.
>
> The need to have such an unnatural premise is usually an indication
> of a conceptual problem with the overall model, or perhaps just a
> terminology problem.
Oh yes, I do think the AMD-derived terminology is awful :(
> Here's a proposal. The new DMA attribute is DMA_ATTR_CC_SHARED.
> Name the pool attribute "cc_shared" instead of "unencrypted".
Yeah maybe. I sometimes imagine replacing the encrypted/decrypted
names with cc_shared too just to make it sane.
> "cc_shared" set to false in a normal VM doesn't lead to the non-sensical
> situation of claiming that a normal VM is encrypted.
It seems like a good idea to me
Jason
^ permalink raw reply
* Re: [PATCH v14 09/44] arm64: RMI: Provide functions to delegate/undelegate ranges of memory
From: Steven Price @ 2026-06-04 14:43 UTC (permalink / raw)
To: Marc Zyngier
Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <867bowx3qx.wl-maz@kernel.org>
On 21/05/2026 14:59, Marc Zyngier wrote:
> On Wed, 13 May 2026 14:17:17 +0100,
> Steven Price <steven.price@arm.com> wrote:
>>
>> The RMM requires memory is 'delegated' to it so that it can be used
>> either for a realm guest or for various tracking purposes within the RMM
>> (e.g. for metadata or page tables). Memory that has been delegated
>> cannot be accessed by the host (it will result in a Granule Protection
>> Fault).
>>
>> Undelegation may fail if the memory is still in use by the RMM. This
>> shouldn't happen (Linux should ensure it has destroyed the RMM objects
>> before attempting to undelegate). In the event that it does happen this
>> points to a programming bug and the only reasonable approach is for the
>> physical pages to be leaked - it is up to the caller of
>> rmi_undelegate_range() to handle this.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>> v14:
>> * Split into separate patch and moved out of KVM
>> ---
>> arch/arm64/include/asm/rmi_cmds.h | 13 +++++++++++
>> arch/arm64/kernel/rmi.c | 36 +++++++++++++++++++++++++++++++
>> 2 files changed, 49 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/rmi_cmds.h b/arch/arm64/include/asm/rmi_cmds.h
>> index 9078a2920a7c..eb213c8e6f26 100644
>> --- a/arch/arm64/include/asm/rmi_cmds.h
>> +++ b/arch/arm64/include/asm/rmi_cmds.h
>> @@ -33,6 +33,19 @@ struct rmi_sro_state {
>> } while (RMI_RETURN_STATUS(res.a0) == RMI_BUSY || \
>> RMI_RETURN_STATUS(res.a0) == RMI_BLOCKED)
>>
>> +int rmi_delegate_range(phys_addr_t phys, unsigned long size);
>> +int rmi_undelegate_range(phys_addr_t phys, unsigned long size);
>> +
>> +static inline int rmi_delegate_page(phys_addr_t phys)
>> +{
>> + return rmi_delegate_range(phys, PAGE_SIZE);
>> +}
>> +
>> +static inline int rmi_undelegate_page(phys_addr_t phys)
>> +{
>> + return rmi_undelegate_range(phys, PAGE_SIZE);
>> +}
>> +
>> bool rmi_is_available(void);
>>
>> unsigned long rmi_sro_execute(struct rmi_sro_state *sro, gfp_t gfp);
>> diff --git a/arch/arm64/kernel/rmi.c b/arch/arm64/kernel/rmi.c
>> index 52a415e99500..08cef54acadb 100644
>> --- a/arch/arm64/kernel/rmi.c
>> +++ b/arch/arm64/kernel/rmi.c
>> @@ -12,6 +12,42 @@ static bool arm64_rmi_is_available;
>> unsigned long rmm_feat_reg0;
>> unsigned long rmm_feat_reg1;
>>
>> +int rmi_delegate_range(phys_addr_t phys, unsigned long size)
>> +{
>> + unsigned long ret = 0;
>> + unsigned long top = phys + size;
>> + unsigned long out_top;
>> +
>> + while (phys < top) {
>> + ret = rmi_granule_range_delegate(phys, top, &out_top);
>> + if (ret == RMI_SUCCESS)
>> + phys = out_top;
>> + else if (ret != RMI_BUSY && ret != RMI_BLOCKED)
>> + return ret;
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +int rmi_undelegate_range(phys_addr_t phys, unsigned long size)
>> +{
>> + unsigned long ret = 0;
>> + unsigned long top = phys + size;
>> + unsigned long out_top;
>> +
>> + WARN_ON(size == 0);
>
> I find it odd to warn on size = 0. After all, free(NULL) is not an
> error. But even then, you continue feeding this to the RMM.
Ok, I'll admit that this is left over debugging - although this is a
condition that shouldn't happen.
Note that the while() condition prevents this from actually getting to
the RMM.
I'll drop the WARN_ON() since it's confusing.
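
To illustrate the point about the while() condition, here is a minimal
userspace sketch, not the kernel code. fake_range_delegate() is a
hypothetical stand-in for the RMI call that simply counts invocations and
advances one granule at a time; the RMI_SUCCESS value of 0, the 4096-byte
granule, and the omission of the RMI_BUSY/RMI_BLOCKED retry are all
assumptions made for brevity.

```c
#include <assert.h>
#include <stdint.h>

#define RMI_SUCCESS 0UL   /* assumed value, for illustration only */
#define GRANULE     4096UL /* assumed granule size */

/* Number of times the stand-in "RMM" has been called. */
static unsigned int calls;

/* Hypothetical stand-in for the RMI call: handles one granule per call. */
static unsigned long fake_range_delegate(uint64_t base, uint64_t top,
					 uint64_t *out_top)
{
	calls++;
	*out_top = base + GRANULE;
	if (*out_top > top)
		*out_top = top;
	return RMI_SUCCESS;
}

/* Same loop shape as the quoted patch, minus the BUSY/BLOCKED retry. */
static unsigned long delegate_range_sketch(uint64_t phys, uint64_t size)
{
	unsigned long ret = 0;
	uint64_t top = phys + size;
	uint64_t out_top;

	/* With size == 0, phys == top, so the body never runs. */
	while (phys < top) {
		ret = fake_range_delegate(phys, top, &out_top);
		if (ret != RMI_SUCCESS)
			return ret;
		phys = out_top;
	}

	return ret;
}
```

In this model a zero-sized range returns 0 without ever invoking the
stand-in, while a non-empty range makes progress only as long as the
callee advances out_top.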
Thanks,
Steve
> You also don't seem to be bothered with that on the delegation side...
>
>> +
>> + while (phys < top) {
>> + ret = rmi_granule_range_undelegate(phys, top, &out_top);
>> + if (ret == RMI_SUCCESS)
>> + phys = out_top;
>
> and size==0 doesn't violate any of the failure conditions listed in
> B4.5.18.2 (beta2). Will you end-up looping around forever?
>
> Same questions for the delegation, obviously.
>
> M.
>
^ permalink raw reply
* [PATCH v4 0/3] x86/tdx: Fix port I/O handling bugs
From: Kiryl Shutsemau (Meta) @ 2026-06-04 14:46 UTC (permalink / raw)
To: tglx, mingo, bp, dave.hansen
Cc: seanjc, pbonzini, sathyanarayanan.kuppuswamy, kai.huang,
xiaoyao.li, binbin.wu, rick.p.edgecombe, david.laight.linux, ak,
djbw, tsyrulnikov.borys, x86, kvm, linux-coco, linux-kernel,
Kiryl Shutsemau (Meta)
Two bugs in the TDX guest port I/O #VE emulation, plus a small helper
extracted from KVM to avoid open-coding partial-register-write logic
in the second fix.
Patch 1 is an off-by-one in the mask used to clip the I/O value:
GENMASK(BITS_PER_BYTE * size, 0) is one bit too wide. Unchanged from
v3 1/2.
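
For readers who want to see the width problem concretely, here is a
minimal userspace sketch. genmask64() is a stand-in for the kernel's
GENMASK() (bits h..l inclusive), and the two io_mask_*() helpers are
hypothetical names; the "exact" variant assumes the intended mask covers
bits 8 * size - 1 down to 0, which may not match the patch line for line.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE 8

/* Userspace stand-in for GENMASK(h, l) on 64-bit values: bits h..l set. */
static uint64_t genmask64(unsigned int h, unsigned int l)
{
	assert(h < 64 && l <= h);
	return (~UINT64_C(0) >> (63 - h)) & (~UINT64_C(0) << l);
}

/* Mask as described above: the high bit index is 8 * size. */
static uint64_t io_mask_too_wide(unsigned int size)
{
	return genmask64(BITS_PER_BYTE * size, 0);
}

/* Mask covering exactly `size` bytes: the high bit index is 8 * size - 1. */
static uint64_t io_mask_exact(unsigned int size)
{
	return genmask64(BITS_PER_BYTE * size - 1, 0);
}
```

For a 1-byte access the wide mask is 0x1ff rather than 0xff, so bit 8 of
the value leaks through the clip.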
Patch 2 lifts KVM's instruction-emulator helper assign_register() out
of arch/x86/kvm/emulate.c into <asm/insn-eval.h>, renamed to
insn_assign_reg(). Dave suggested consolidating rather than adding a
third copy of the same partial-register switch; the body is rewritten
using plain arithmetic (suggested by David Laight) so the helper does
not rely on -fno-strict-aliasing or little-endian byte order. KVM
behaviour is unchanged.
Patch 3 fixes the architectural zero-extension of 32-bit IN: the old
mask-based handle_in() preserves RAX[63:32] after inl, which is wrong.
Now done by calling the helper.
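
As a rough illustration of the partial-register semantics involved, here
is a userspace sketch using plain arithmetic. assign_reg_sketch() is a
hypothetical name and signature, not the insn_assign_reg() helper from the
series; it only models the x86-64 rule that 8- and 16-bit writes preserve
the upper bits while a 32-bit write zero-extends to 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the new 64-bit register value after writing the low `bytes`
 * bytes of `val` into a register that currently holds `old`.
 */
static uint64_t assign_reg_sketch(uint64_t old, uint64_t val,
				  unsigned int bytes)
{
	switch (bytes) {
	case 1:
		/* 8-bit write: bits 63:8 are preserved. */
		return (old & ~UINT64_C(0xff)) | (val & UINT64_C(0xff));
	case 2:
		/* 16-bit write: bits 63:16 are preserved. */
		return (old & ~UINT64_C(0xffff)) | (val & UINT64_C(0xffff));
	case 4:
		/* 32-bit write: bits 63:32 are cleared (zero-extension). */
		return val & UINT64_C(0xffffffff);
	case 8:
		/* 64-bit write: the whole register is replaced. */
		return val;
	default:
		assert(0 && "unsupported operand size");
		return old;
	}
}
```

A mask-and-merge approach that treats the 4-byte case like the 1- and
2-byte cases would keep the old upper half, which is the behaviour
described as wrong above.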
Changes since v3:
- Patch 1/2 carried over unchanged as 1/3.
- Helper extracted from KVM (new patch 2/3) and used from
handle_in() (Dave, David Laight).
- Reviewed-by tags from v3 2/2 dropped on patch 3/3 because the
implementation changed substantially. v3 1/2 -> v4 1/3 Rb tags
preserved (patch unchanged).
v3: https://lore.kernel.org/all/20260527120544.2903923-1-kas@kernel.org/
Kiryl Shutsemau (Meta) (3):
x86/tdx: Fix off-by-one in port I/O handling
x86/insn-eval: Add insn_assign_reg() helper
x86/tdx: Fix zero-extension for 32-bit port I/O
arch/x86/coco/tdx/tdx.c | 10 ++++------
arch/x86/include/asm/insn-eval.h | 25 +++++++++++++++++++++++++
arch/x86/kvm/emulate.c | 26 ++++----------------------
3 files changed, 33 insertions(+), 28 deletions(-)
--
2.54.0
^ permalink raw reply