* [PATCH v5] dma-contiguous: add kconfig option to setup numa cma area if not configured explicitly
@ 2026-05-12 8:55 Feng Tang
From: Feng Tang @ 2026-05-12 8:55 UTC
To: Marek Szyprowski, Robin Murphy, Ying Huang, Andrew Morton,
David Hildenbrand
Cc: Mike Rapoport, linux-mm, Christoph Hellwig, Catalin Marinas,
Will Deacon, iommu, linux-kernel, Feng Tang, Changrong Chen
There was a report that on a multi-NUMA-node arm64 server with the IOMMU
disabled, dma_alloc_coherent() always returns memory from node 0, even
for devices attached to other nodes, while the same API returns
node-local DMA memory when the IOMMU is enabled.
The reason is that, with the IOMMU disabled, dma_alloc_coherent() takes
the dma-direct path and calls dma_alloc_contiguous(). The system has no
explicit CMA setting (such as per-NUMA CMA), only the default 64MB
reserved CMA area (on node 0), which the kernel tries first when
allocating memory.
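The lookup order described above can be modeled in userspace roughly as
follows. This is a simplified sketch, not the kernel's code: the struct,
the constants, and model_pick_cma_node() are hypothetical names, and each
CMA area is reduced to the node it lives on. It assumes per-node areas are
consulted before the default area.

```c
#define MODEL_MAX_NODES 4
#define MODEL_NO_AREA   (-1)

/* Hypothetical model: every CMA area is represented by its home node. */
struct cma_model {
	/* Per-node CMA area for each node, or MODEL_NO_AREA if none exists. */
	int pernuma_area_node[MODEL_MAX_NODES];
	/* Node holding the default CMA area, or MODEL_NO_AREA if none exists. */
	int default_area_node;
};

/*
 * Return the node whose CMA area would serve a request from a device
 * attached to dev_node. A per-node area is preferred when it exists;
 * otherwise the default area is used, wherever it happens to live.
 * MODEL_NO_AREA means no CMA area is available at all.
 */
int model_pick_cma_node(const struct cma_model *m, int dev_node)
{
	if (dev_node >= 0 && dev_node < MODEL_MAX_NODES &&
	    m->pernuma_area_node[dev_node] != MODEL_NO_AREA)
		return m->pernuma_area_node[dev_node];

	return m->default_area_node;
}
```

With no per-node areas and the default area on node 0, a device on node 1
is served from node 0, which matches the reported behavior. Once every
node has its own area, the same device is served from node 1.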
Robin Murphy suggested setting up per-NUMA CMA or disabling CMA, which
did solve the issue. There is still a concern, though, that customers
without much kernel knowledge could keep suffering from this silently,
because some architectures enable a CMA area by default in most Linux
distributions (not an issue for x86, which sets CONFIG_CMA_SIZE_MBYTES
to 0 by default).
One thought is to extend the current CMA reservation policy for
platforms with 'CONFIG_DMA_NUMA_CMA=y': if NUMA CMA is not explicitly
configured (by either the 'numa_cma' or the 'cma_pernuma' method) and
the platform really has multiple NUMA nodes, set it up using the size of
the default 'dma_contiguous_default_area'. This way, the default
behavior of platforms with a single NUMA node is unchanged (embedded or
small devices, say, don't need to reserve extra memory), while general
DMA locality is improved.
Add a new bool Kconfig option, CONFIG_CMA_SIZE_PERNUMA, to control
whether this is enabled. Even when the option is enabled, users can
still turn it off with a kernel command-line setting such as
"numa_cma=0:0" or "cma_pernuma=0".
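The decision described above can be sketched as a small userspace model.
Everything here is an assumption made for illustration: the struct, its
field names, and model_pernuma_size() are hypothetical, and the
config_enabled argument stands in for the compile-time
CONFIG_CMA_SIZE_PERNUMA check. The model also takes the default area size
as an input, which suggests why the real check has to run after the
default area has been reserved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the boot-time state the check looks at. */
struct numa_cma_boot_state {
	/* True once "numa_cma=" or "cma_pernuma=" was seen on the command line. */
	bool numa_cma_configured;
	/* Size of the default CMA area in bytes, 0 if none was reserved. */
	size_t default_area_bytes;
	/* Number of online NUMA nodes. */
	int nr_online_nodes;
	/* Size parsed from "cma_pernuma=", 0 if the option was absent or zero. */
	size_t pernuma_size_bytes;
};

/*
 * Return the per-node CMA size that would be reserved. The default area
 * size is inherited only when the option is enabled, nothing was set
 * explicitly, a default area exists, and there is more than one node.
 * In every other case the explicitly parsed value is kept as-is.
 */
size_t model_pernuma_size(const struct numa_cma_boot_state *s,
			  bool config_enabled)
{
	if (config_enabled && !s->numa_cma_configured &&
	    s->default_area_bytes != 0 && s->nr_online_nodes > 1)
		return s->default_area_bytes;

	return s->pernuma_size_bytes;
}
```

In this model, booting with "cma_pernuma=0" sets numa_cma_configured and
leaves the size at 0, so no per-node area is added even though the option
is enabled.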
Reported-by: Changrong Chen <chenchangrong.ccr@alibaba-inc.com>
Suggested-by: Ying Huang <ying.huang@linux.alibaba.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
Changelog:
since v4:
* remove the cma_get_nid() helper and use 'nr_online_nodes > 1'
check instead (David, Robin)
since v3:
* Add kernel config to control this pernuma cma setup (David, Robin)
* Fix a compile warning (LKP 0day robot)
since v2:
* set up the numa cma areas following the default cma, while
skipping the node that holds the default cma (Robin Murphy)
* add cma_get_node() helper and related code
* add reporter info
since v1:
* don't use the original way of adding alloc_pages_node()
before trying default cma node (Robin Murphy)
* set up default numa cma area if not configured (Ying Huang)
v4: https://lore.kernel.org/lkml/20260509072543.69650-1-feng.tang@linux.alibaba.com/
v3: https://lore.kernel.org/lkml/20260428060550.7167-1-feng.tang@linux.alibaba.com/
v2: https://lore.kernel.org/lkml/20260423095243.14239-1-feng.tang@linux.alibaba.com/
v1: https://lore.kernel.org/lkml/20260414090310.92055-1-feng.tang@linux.alibaba.com/
kernel/dma/Kconfig | 10 ++++++++++
kernel/dma/contiguous.c | 13 +++++++++++--
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index bfef21b4a9ae..c9fa0a922cba 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -181,6 +181,16 @@ config DMA_NUMA_CMA
or set the node id and its size of CMA by specifying "numa_cma=
<node>:size[,<node>:size]" on the kernel's command line.
+config CMA_SIZE_PERNUMA
+ bool "Default CMA area per NUMA node"
+ depends on DMA_NUMA_CMA
+ default y
+ help
+ On systems with more than one NUMA node, a CMA area of the
+ selected size is also reserved on each additional node, so
+ that most devices can benefit from better DMA locality
+ without an explicit command-line opt-in.
+
comment "Default contiguous memory area size:"
config CMA_SIZE_MBYTES
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 03f52bd17120..42227c823893 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -136,6 +136,7 @@ static struct cma *dma_contiguous_numa_area[MAX_NUMNODES];
static phys_addr_t numa_cma_size[MAX_NUMNODES] __initdata;
static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES];
static phys_addr_t pernuma_size_bytes __initdata;
+static bool numa_cma_configured;
static int __init early_numa_cma(char *p)
{
@@ -164,6 +165,7 @@ static int __init early_numa_cma(char *p)
break;
}
+ numa_cma_configured = true;
return 0;
}
early_param("numa_cma", early_numa_cma);
@@ -171,6 +173,7 @@ early_param("numa_cma", early_numa_cma);
static int __init early_cma_pernuma(char *p)
{
pernuma_size_bytes = memparse(p, &p);
+ numa_cma_configured = true;
return 0;
}
early_param("cma_pernuma", early_cma_pernuma);
@@ -199,6 +202,12 @@ static void __init dma_numa_cma_reserve(void)
{
int nid;
+#ifdef CONFIG_CMA_SIZE_PERNUMA
+ if (!numa_cma_configured && dma_contiguous_default_area
+ && nr_online_nodes > 1)
+ pernuma_size_bytes = cma_get_size(dma_contiguous_default_area);
+#endif
+
for_each_node(nid) {
int ret;
char name[CMA_MAX_NAME];
@@ -255,8 +264,6 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
phys_addr_t selected_limit = limit;
bool fixed = false;
- dma_numa_cma_reserve();
-
pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
if (size_cmdline != -1) {
@@ -312,6 +319,8 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
if (ret)
pr_warn("Couldn't queue default CMA region for heap creation.");
}
+
+ dma_numa_cma_reserve();
}
void __weak
--
2.39.5 (Apple Git-154)