* [RFC PATCH] mm/hugetlb_cma: round up per_node before logging it
@ 2026-04-21 23:02 Sang-Heon Jeon
2026-04-22 6:20 ` Muchun Song
0 siblings, 1 reply; 3+ messages in thread
From: Sang-Heon Jeon @ 2026-04-21 23:02 UTC (permalink / raw)
To: muchun.song, osalvador, david, akpm; +Cc: linux-mm, Sang-Heon Jeon
When the user requests a total hugetlb CMA size without a per-node
specification, hugetlb_cma_reserve() computes per_node from
hugetlb_cma_size and the number of nodes that have memory:

per_node = DIV_ROUND_UP(hugetlb_cma_size,
nodes_weight(hugetlb_bootmem_nodes));

The reservation loop later computes:

size = round_up(min(per_node, hugetlb_cma_size - reserved),
PAGE_SIZE << order);

So the per-node size actually reserved is a multiple of (PAGE_SIZE <<
order), but the logged per_node is not rounded up, so it may be smaller
than the size actually reserved.
For example, as the existing comment describes, if a 3 GB area is
requested on a machine with 4 NUMA nodes that have memory, 1 GB is
allocated on each of the first three nodes, but the printed log is:

hugetlb_cma: reserve 3072 MiB, up to 768 MiB per node
Round per_node up to (PAGE_SIZE << order) before logging so that the
printed value always matches the actual reserved size. There is no
functional change to the reservation size, as the following case
analysis shows:
1. remaining (hugetlb_cma_size - reserved) >= rounded per_node
- AS-IS: min() picks unrounded per_node;
round_up() returns rounded per_node
- TO-BE: min() picks rounded per_node;
round_up() returns rounded per_node (no-op)
2. remaining < unrounded per_node
- AS-IS: min() picks remaining;
round_up() returns round_up(remaining)
- TO-BE: min() picks remaining;
round_up() returns round_up(remaining)
3. unrounded per_node <= remaining < rounded per_node
- AS-IS: min() picks unrounded per_node;
round_up() returns rounded per_node
- TO-BE: min() picks remaining;
round_up() returns round_up(remaining) equals rounded per_node
Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
---
Hello,
While looking into boot logs, I found a minor issue.
I am sending this patch as an RFC because of the two additional
questions below (one directly related, the other indirectly related).

1. This patch only fixes the log output, not the reservation result itself.
Do I need to add a Fixes tag in this case? (i.e., does this patch need
backporting?) If so, I'll add the commit below.

Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma") # 5.7
2. When node_specific_cma_alloc is true, the reservation loop can break
out early due to round_up() before all specified nodes are reserved.
Is this intentional or a bug?
For example, with hugetlb_cma=0:1300M,1:1300M,2:1300M and (PAGE_SIZE
<< order) equal to 1GB:

hugetlb_cma_size_in_node[0..2] = 1300MB
hugetlb_cma_size = 3900MB
The actual per-node reserved size is rounded up from 1300MB to 2GB:

iter 1 (node 0): reserved = 2GB, 2GB < 3900MB, continue
iter 2 (node 1): reserved = 4GB, 4GB >= 3900MB, break
As a result, node 2 was specified but no CMA area is reserved. If this is
unintended, I would be happy to send a follow-up patch to fix it.
If I misunderstood anything, please feel free to let me know.
Best Regards,
Sang-Heon Jeon
---
mm/hugetlb_cma.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/hugetlb_cma.c b/mm/hugetlb_cma.c
index f83ae4998990..4a92a9e38500 100644
--- a/mm/hugetlb_cma.c
+++ b/mm/hugetlb_cma.c
@@ -204,6 +204,7 @@ void __init hugetlb_cma_reserve(void)
*/
per_node = DIV_ROUND_UP(hugetlb_cma_size,
nodes_weight(hugetlb_bootmem_nodes));
+ per_node = round_up(per_node, PAGE_SIZE << order)
pr_info("hugetlb_cma: reserve %lu MiB, up to %lu MiB per node\n",
hugetlb_cma_size / SZ_1M, per_node / SZ_1M);
}
--
2.43.0
* Re: [RFC PATCH] mm/hugetlb_cma: round up per_node before logging it
@ 2026-04-22 6:20 ` Muchun Song
From: Muchun Song @ 2026-04-22 6:20 UTC
To: Sang-Heon Jeon; +Cc: osalvador, david, akpm, linux-mm

> On Apr 22, 2026, at 07:02, Sang-Heon Jeon <ekffu200098@gmail.com> wrote:
>
> [...]
>
> 1. This patch only fixes the log output, not the reservation result itself.
> Do I need to add a Fixes tag in this case? (i.e., does this patch need
> backporting?) If so, I'll add the commit below.
>
> Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma") # 5.7

Yes, it is recommended to add a Fixes tag. The rule of thumb in the kernel
community is that if a patch fixes an issue (even a cosmetic one like a log
message), it should have a Fixes tag for proper traceability.

However, please note that having a Fixes: tag **does not automatically mean
it will be backported** to stable trees. Backporting usually requires an
explicit `Cc: <stable@vger.kernel.org>` tag, which is reserved for functional
bugs, crashes, or incorrect logic. Since this patch only corrects a log
message, I would not recommend adding the `Cc: stable` tag unless you believe
this misleading log has severe consequences (e.g., breaking userspace
scripts that parse it).

> 2. When node_specific_cma_alloc is true, the reservation loop can break
> out early due to round_up() before all specified nodes are reserved.
> Is this intentional or a bug?
>
> For example, with hugetlb_cma=0:1300M,1:1300M,2:1300M and (PAGE_SIZE
> << order) equal to 1GB

To me, this is a strange configuration. While the documentation doesn't
explicitly forbid this type of allocation, it doesn't clearly state whether
it's supported either. So I am curious why you have such a configuration.

If we say that upward alignment is allowed (for example, allocating 2GB
when a user requests 1300MB), you'll notice the code explicitly fails to
support configurations smaller than 1GB (where no upward alignment occurs).
These two alignment requirements are contradictory.

Personally, I'm inclined to view this as undefined behavior. If we're going
to fix this, I think it is better to restrict user input to strictly follow
huge-page alignment. This should also help simplify our processing logic.
Let me know if anyone sees a downside to this approach!

> [...]
>
> + per_node = round_up(per_node, PAGE_SIZE << order)

Missing semicolon at the end of this statement. I'm not sure if you
actually tested this patch?

Thanks,
Muchun
* Re: [RFC PATCH] mm/hugetlb_cma: round up per_node before logging it
@ 2026-04-22 9:49 ` Sang-Heon Jeon
From: Sang-Heon Jeon @ 2026-04-22 9:49 UTC
To: Muchun Song; +Cc: osalvador, david, akpm, linux-mm

Hello,

On Wed, Apr 22, 2026 at 3:21 PM Muchun Song <muchun.song@linux.dev> wrote:
>
> [...]
>
> However, please note that having a Fixes: tag **does not automatically mean
> it will be backported** to stable trees. Backporting usually requires an
> explicit `Cc: <stable@vger.kernel.org>` tag, which is reserved for functional
> bugs, crashes, or incorrect logic.

Thank you for the detailed explanation. Until now I had understood the
`Cc: stable` tag as the explicit backport request and `Fixes` as an
implicit one. Your clarification is very helpful.

I also agree that this patch is not appropriate for the `Cc: stable` tag.
I will add only the `Fixes` tag to the next patch.

> To me, this is a strange configuration. While the documentation doesn't
> explicitly forbid this type of allocation, it doesn't clearly state whether
> it's supported either. So I am curious why you have such a configuration.

This is an example scenario to illustrate the early-break situation, not a
real configuration from the real world.

> Personally, I'm inclined to view this as undefined behavior. If we're going
> to fix this, I think it is better to restrict user input to strictly follow
> huge-page alignment. This should also help simplify our processing logic.

Thanks for the detailed description. I think your suggestion is very
reasonable. TBH, I'm not as familiar with hugetlb as you are, but I'd be
happy to work on this with your guidance.

> > + per_node = round_up(per_node, PAGE_SIZE << order)
>
> Missing semicolon at the end of this statement. I'm not sure if you
> actually tested this patch?

Oops, that's my mistake. I tested with QEMU and will attach the test
results to the next patch.

Best Regards,
Sang-Heon Jeon