linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm/hugetlb: two-phase hugepage allocation when reservation is high
@ 2025-08-22 11:28 lirongqing
  2025-08-22 13:50 ` Giorgi Tchankvetadze
  0 siblings, 1 reply; 7+ messages in thread
From: lirongqing @ 2025-08-22 11:28 UTC (permalink / raw)
  To: muchun.song, osalvador, david, akpm, linux-mm, linux-kernel
  Cc: Li RongQing, Wenjie Xu

From: Li RongQing <lirongqing@baidu.com>

When the total reserved hugepages account for 95% or more of system RAM
(common in cloud computing on physical servers), allocating them all in
one go can lead to OOM or to huge page allocation failures during early
boot.

The earlier hugetlb vmemmap batching change (commit 91f386bf0772) can
worsen peak memory pressure under these conditions by deferring page
frees, exacerbating allocation failures. To prevent this, split the
allocation into two batches (90% first, then the remainder) whenever
	huge_reserved_pages >= totalram_pages() * 90 / 100.

This change does not alter the number of padata worker threads per batch;
it merely introduces a second round of padata_do_multithreaded(). The added
overhead of restarting the worker threads is minimal.

Before:
[    8.423187] HugeTLB: allocation took 1584ms with hugepage_allocation_threads=48
[    8.431189] HugeTLB: allocating 385920 of page size 2.00 MiB failed.  Only allocated 385296 hugepages.

After:
[    8.740201] HugeTLB: allocation took 1900ms with hugepage_allocation_threads=48
[    8.748266] HugeTLB: registered 2.00 MiB page size, pre-allocated 385920 pages

Fixes: 91f386bf0772 ("hugetlb: batch freeing of vmemmap pages")

Co-developed-by: Wenjie Xu <xuwenjie04@baidu.com>
Signed-off-by: Wenjie Xu <xuwenjie04@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 mm/hugetlb.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 753f99b..a86d3a0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3587,12 +3587,23 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
 		.numa_aware	= true
 	};
 
+	unsigned long huge_reserved_pages = h->max_huge_pages << h->order;
+	unsigned long huge_pages, remaining, total_pages;
 	unsigned long jiffies_start;
 	unsigned long jiffies_end;
 
+	total_pages = totalram_pages() * 90 / 100;
+	if (huge_reserved_pages > total_pages) {
+		huge_pages = h->max_huge_pages * 90 / 100;
+		remaining = h->max_huge_pages - huge_pages;
+	} else {
+		huge_pages = h->max_huge_pages;
+		remaining = 0;
+	}
+
 	job.thread_fn	= hugetlb_pages_alloc_boot_node;
 	job.start	= 0;
-	job.size	= h->max_huge_pages;
+	job.size	= huge_pages;
 
 	/*
 	 * job.max_threads is 25% of the available cpu threads by default.
@@ -3616,10 +3627,16 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
 	}
 
 	job.max_threads	= hugepage_allocation_threads;
-	job.min_chunk	= h->max_huge_pages / hugepage_allocation_threads;
+	job.min_chunk	= huge_pages / hugepage_allocation_threads;
 
 	jiffies_start = jiffies;
 	padata_do_multithreaded(&job);
+	if (remaining) {
+		job.start	= huge_pages;
+		job.size	= remaining;
+		job.min_chunk	= remaining / hugepage_allocation_threads;
+		padata_do_multithreaded(&job);
+	}
 	jiffies_end = jiffies;
 
 	pr_info("HugeTLB: allocation took %dms with hugepage_allocation_threads=%ld\n",
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/hugetlb: two-phase hugepage allocation when reservation is high
  2025-08-22 11:28 lirongqing
@ 2025-08-22 13:50 ` Giorgi Tchankvetadze
  0 siblings, 0 replies; 7+ messages in thread
From: Giorgi Tchankvetadze @ 2025-08-22 13:50 UTC (permalink / raw)
  To: lirongqing
  Cc: akpm, david, linux-kernel, linux-mm, muchun.song, osalvador,
	xuwenjie04

Hi there. The 90% split is solid. Would it make sense to (a) log a
one-time warning when the second pass is triggered, so operators know
why boot slowed, and (b) make the 90% cap a Kconfig default ratio, so
distros can lower it without patching? Both are low-risk and don't
change the ABI.

Thanks
On 8/22/2025 3:28 PM, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> When the total reserved hugepages account for 95% or more of system RAM
> (common in cloud computing on physical servers), allocating them all in
> one go can lead to OOM or to huge page allocation failures during early
> boot.
> 
> The earlier hugetlb vmemmap batching change (commit 91f386bf0772) can
> worsen peak memory pressure under these conditions by deferring page
> frees, exacerbating allocation failures. To prevent this, split the
> allocation into two batches (90% first, then the remainder) whenever
> 	huge_reserved_pages >= totalram_pages() * 90 / 100.
> 
> This change does not alter the number of padata worker threads per batch;
> it merely introduces a second round of padata_do_multithreaded(). The added
> overhead of restarting the worker threads is minimal.
> 
> Before:
> [    8.423187] HugeTLB: allocation took 1584ms with hugepage_allocation_threads=48
> [    8.431189] HugeTLB: allocating 385920 of page size 2.00 MiB failed.  Only allocated 385296 hugepages.
> 
> After:
> [    8.740201] HugeTLB: allocation took 1900ms with hugepage_allocation_threads=48
> [    8.748266] HugeTLB: registered 2.00 MiB page size, pre-allocated 385920 pages
> 
> Fixes: 91f386bf0772 ("hugetlb: batch freeing of vmemmap pages")
> 
> Co-developed-by: Wenjie Xu <xuwenjie04@baidu.com>
> Signed-off-by: Wenjie Xu <xuwenjie04@baidu.com>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>   mm/hugetlb.c | 21 +++++++++++++++++++--
>   1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 753f99b..a86d3a0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3587,12 +3587,23 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
>  		.numa_aware	= true
>  	};
>  
> +	unsigned long huge_reserved_pages = h->max_huge_pages << h->order;
> +	unsigned long huge_pages, remaining, total_pages;
>  	unsigned long jiffies_start;
>  	unsigned long jiffies_end;
>  
> +	total_pages = totalram_pages() * 90 / 100;
> +	if (huge_reserved_pages > total_pages) {
> +		huge_pages = h->max_huge_pages * 90 / 100;
> +		remaining = h->max_huge_pages - huge_pages;
> +	} else {
> +		huge_pages = h->max_huge_pages;
> +		remaining = 0;
> +	}
> +
>  	job.thread_fn	= hugetlb_pages_alloc_boot_node;
>  	job.start	= 0;
> -	job.size	= h->max_huge_pages;
> +	job.size	= huge_pages;
>  
>  	/*
>  	 * job.max_threads is 25% of the available cpu threads by default.
> @@ -3616,10 +3627,16 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
>  	}
>  
>  	job.max_threads	= hugepage_allocation_threads;
> -	job.min_chunk	= h->max_huge_pages / hugepage_allocation_threads;
> +	job.min_chunk	= huge_pages / hugepage_allocation_threads;
>  
>  	jiffies_start = jiffies;
>  	padata_do_multithreaded(&job);
> +	if (remaining) {
> +		job.start	= huge_pages;
> +		job.size	= remaining;
> +		job.min_chunk	= remaining / hugepage_allocation_threads;
> +		padata_do_multithreaded(&job);
> +	}
>  	jiffies_end = jiffies;
>  
>  	pr_info("HugeTLB: allocation took %dms with hugepage_allocation_threads=%ld\n",
> -- 
> 2.9.4
> 
> 



* [PATCH] mm/hugetlb: two-phase hugepage allocation when reservation is high
@ 2025-08-26 10:18 lirongqing
  2025-08-26 13:25 ` David Hildenbrand
  0 siblings, 1 reply; 7+ messages in thread
From: lirongqing @ 2025-08-26 10:18 UTC (permalink / raw)
  To: muchun.song, osalvador, david, akpm, linux-mm, linux-kernel,
	giorgitchankvetadze1997
  Cc: Li RongQing, Wenjie Xu

From: Li RongQing <lirongqing@baidu.com>

When the total reserved hugepages account for 95% or more of system RAM
(common in cloud computing on physical servers), allocating them all in
one go can lead to OOM or to huge page allocation failures during early
boot.

Commit 91f386bf0772 ("hugetlb: batch freeing of vmemmap pages") can
worsen peak memory pressure under these conditions by deferring page
frees, exacerbating allocation failures. To prevent this, split the
allocation into two batches (90% first, then the remainder) whenever
	huge_reserved_pages >= totalram_pages() * 90 / 100.

This change does not alter the number of padata worker threads per batch;
it merely introduces a second round of padata_do_multithreaded(). The added
overhead of restarting the worker threads is minimal.

The result on a 256G memory machine is as below:
Before:
[    4.350400] HugeTLB: allocation took 706ms with hugepage_allocation_threads=32
[    4.351577] HugeTLB: allocating 128512 of page size 2.00 MiB failed.  Only allocated 128074 hugepages.
[    4.355608] HugeTLB: registered 2.00 MiB page size, pre-allocated 128074 pages
After:
[    3.561088] HugeTLB: two-phase hugepage allocation is used
[    4.280300] HugeTLB: allocation took 712ms with hugepage_allocation_threads=32
[    4.281054] HugeTLB: registered 2.00 MiB page size, pre-allocated 128512 pages

Fixes: 91f386bf0772 ("hugetlb: batch freeing of vmemmap pages")

Co-developed-by: Wenjie Xu <xuwenjie04@baidu.com>
Signed-off-by: Wenjie Xu <xuwenjie04@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
Diff with v1: log when two-phase hugepage allocation is triggered;
              add a knob to control the split ratio

 Documentation/admin-guide/mm/hugetlbpage.rst | 12 +++++++++
 mm/hugetlb.c                                 | 39 ++++++++++++++++++++++++++--
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index 67a9419..5cfb6e3 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -156,6 +156,18 @@ hugepage_alloc_threads
 		hugepage_alloc_threads=8
 
 	Note that this parameter only applies to non-gigantic huge pages.
+
+hugepage_split_ratio
+    Controls the threshold for two-phase hugepage allocation.
+    When the total number of reserved hugepages (huge_reserved_pages) exceeds
+    (totalram_pages * hugepage_split_ratio / 100), the hugepage allocation process
+    during boot is split into two batches.
+
+    Default value is 90, meaning the two-phase allocation is triggered when
+    reserved hugepages exceed 90% of total system RAM.
+    The value can be adjusted via the kernel command line parameter
+    "hugepage_split_ratio=". Valid range is 1 to 99.
+
 default_hugepagesz
 	Specify the default huge page size.  This parameter can
 	only be specified once on the command line.  default_hugepagesz can
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 753f99b..576f402 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -88,6 +88,7 @@ static bool __initdata parsed_valid_hugepagesz = true;
 static bool __initdata parsed_default_hugepagesz;
 static unsigned int default_hugepages_in_node[MAX_NUMNODES] __initdata;
 static unsigned long hugepage_allocation_threads __initdata;
+static int hugepage_split_ratio __initdata = 90;
 
 static char hstate_cmdline_buf[COMMAND_LINE_SIZE] __initdata;
 static int hstate_cmdline_index __initdata;
@@ -3587,12 +3588,24 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
 		.numa_aware	= true
 	};
 
+	unsigned long huge_reserved_pages = h->max_huge_pages << h->order;
+	unsigned long huge_pages, remaining, total_pages;
 	unsigned long jiffies_start;
 	unsigned long jiffies_end;
 
+	total_pages = totalram_pages() * hugepage_split_ratio / 100;
+	if (huge_reserved_pages > total_pages) {
+		huge_pages = h->max_huge_pages * hugepage_split_ratio / 100;
+		remaining = h->max_huge_pages - huge_pages;
+		pr_info("HugeTLB: two-phase hugepage allocation is used\n");
+	} else {
+		huge_pages = h->max_huge_pages;
+		remaining = 0;
+	}
+
 	job.thread_fn	= hugetlb_pages_alloc_boot_node;
 	job.start	= 0;
-	job.size	= h->max_huge_pages;
+	job.size	= huge_pages;
 
 	/*
 	 * job.max_threads is 25% of the available cpu threads by default.
@@ -3616,10 +3629,16 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
 	}
 
 	job.max_threads	= hugepage_allocation_threads;
-	job.min_chunk	= h->max_huge_pages / hugepage_allocation_threads;
+	job.min_chunk	= huge_pages / hugepage_allocation_threads;
 
 	jiffies_start = jiffies;
 	padata_do_multithreaded(&job);
+	if (remaining) {
+		job.start	= huge_pages;
+		job.size	= remaining;
+		job.min_chunk	= remaining / hugepage_allocation_threads;
+		padata_do_multithreaded(&job);
+	}
 	jiffies_end = jiffies;
 
 	pr_info("HugeTLB: allocation took %dms with hugepage_allocation_threads=%ld\n",
@@ -5061,6 +5080,22 @@ static int __init hugepage_alloc_threads_setup(char *s)
 }
 __setup("hugepage_alloc_threads=", hugepage_alloc_threads_setup);
 
+static int __init hugepage_split_ratio_setup(char *s)
+{
+	int ratio;
+
+	if (kstrtoint(s, 0, &ratio) != 0)
+		return 1;
+
+	if (ratio > 99 || ratio < 1)
+		return 1;
+
+	hugepage_split_ratio = ratio;
+
+	return 1;
+}
+__setup("hugepage_split_ratio=", hugepage_split_ratio_setup);
+
 static unsigned int allowed_mems_nr(struct hstate *h)
 {
 	int node;
-- 
2.9.4



* Re: [PATCH] mm/hugetlb: two-phase hugepage allocation when reservation is high
  2025-08-26 10:18 lirongqing
@ 2025-08-26 13:25 ` David Hildenbrand
  0 siblings, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2025-08-26 13:25 UTC (permalink / raw)
  To: lirongqing, muchun.song, osalvador, akpm, linux-mm, linux-kernel,
	giorgitchankvetadze1997
  Cc: Wenjie Xu

On 26.08.25 12:18, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> When the total reserved hugepages account for 95% or more of system RAM
> (common in cloud computing on physical servers), allocating them all in
> one go can lead to OOM or to huge page allocation failures during early
> boot.
> 
> Commit 91f386bf0772 ("hugetlb: batch freeing of vmemmap pages") can
> worsen peak memory pressure under these conditions by deferring page
> frees, exacerbating allocation failures. To prevent this, split the
> allocation into two batches (90% first, then the remainder) whenever
> 	huge_reserved_pages >= totalram_pages() * 90 / 100.
> 
> This change does not alter the number of padata worker threads per batch;
> it merely introduces a second round of padata_do_multithreaded(). The added
> overhead of restarting the worker threads is minimal.
> 
> The result on a 256G memory machine is as below:
> Before:
> [    4.350400] HugeTLB: allocation took 706ms with hugepage_allocation_threads=32
> [    4.351577] HugeTLB: allocating 128512 of page size 2.00 MiB failed.  Only allocated 128074 hugepages.
> [    4.355608] HugeTLB: registered 2.00 MiB page size, pre-allocated 128074 pages
> After:
> [    3.561088] HugeTLB: two-phase hugepage allocation is used
> [    4.280300] HugeTLB: allocation took 712ms with hugepage_allocation_threads=32
> [    4.281054] HugeTLB: registered 2.00 MiB page size, pre-allocated 128512 pages
> 
> Fixes: 91f386bf0772 ("hugetlb: batch freeing of vmemmap pages")
> 
> Co-developed-by: Wenjie Xu <xuwenjie04@baidu.com>
> Signed-off-by: Wenjie Xu <xuwenjie04@baidu.com>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
> Diff with v1: log when two-phase hugepage allocation is triggered;
>               add a knob to control the split ratio
> 
>   Documentation/admin-guide/mm/hugetlbpage.rst | 12 +++++++++
>   mm/hugetlb.c                                 | 39 ++++++++++++++++++++++++++--
>   2 files changed, 49 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
> index 67a9419..5cfb6e3 100644
> --- a/Documentation/admin-guide/mm/hugetlbpage.rst
> +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
> @@ -156,6 +156,18 @@ hugepage_alloc_threads
>   		hugepage_alloc_threads=8
>   
>   	Note that this parameter only applies to non-gigantic huge pages.
> +
> +hugepage_split_ratio
> +    Controls the threshold for two-phase hugepage allocation.
> +    When the total number of reserved hugepages (huge_reserved_pages) exceeds
> +    (totalram_pages * hugepage_split_ratio / 100), the hugepage allocation process
> +    during boot is split into two batches.
> +
> +    Default value is 90, meaning the two-phase allocation is triggered when
> +    reserved hugepages exceed 90% of total system RAM.
> +    The value can be adjusted via the kernel command line parameter
> +    "hugepage_split_ratio=". Valid range is 1 to 99.

Can we just do something reasonable here and not introduce toggles where
nobody knows how to really set a reasonable value?

This really sounds like something we should not be exporting to users.

Also, can't we fail gracefully during the first attempt and dynamically
decide if we should do a second pass?

-- 
Cheers

David / dhildenb



* Re: [PATCH] mm/hugetlb: two-phase hugepage allocation when reservation is high
  2025-08-27  4:12 Li,Rongqing
@ 2025-08-27 11:51 ` David Hildenbrand
  0 siblings, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2025-08-27 11:51 UTC (permalink / raw)
  To: Li,Rongqing, muchun.song@linux.dev, osalvador@suse.de,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, giorgitchankvetadze1997@gmail.com
  Cc: Xu,Wenjie(ACG CCN)

On 27.08.25 06:12, Li,Rongqing wrote:
> 
>>
>> Also, can't we fail gracefully during the first attempt and dynamically
>> decide if we should do a second pass?
>>
> 
> 
> Good idea, like below
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 753f99b..425a759 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3589,6 +3589,7 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
> 
>          unsigned long jiffies_start;
>          unsigned long jiffies_end;
> +       unsigned long remaining;
> 
>          job.thread_fn   = hugetlb_pages_alloc_boot_node;
>          job.start       = 0;
> @@ -3620,6 +3621,18 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
> 
>          jiffies_start = jiffies;
>          padata_do_multithreaded(&job);
> +
> +       if (h->nr_huge_pages != h->max_huge_pages && hugetlb_vmemmap_optimizable(h)) {
> +               remaining = h->max_huge_pages - h->nr_huge_pages;
> +               /* vmemmap optimization can save about 1.6% (4/250) memory */
> +               remaining = min(remaining, (h->nr_huge_pages * 4 / 250));

I don't like hard coding that here.

> +
> +               job.start       = h->nr_huge_pages;
> +               job.size        = remaining;
> +               job.min_chunk   = remaining / hugepage_allocation_threads;
> +               padata_do_multithreaded(&job);
> +       }

Thinking out loud, can't we try in a loop until either

a) We allocated all we need

b) We don't make any more progress


Not sure if something like the following could fly:

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1f42186a85ea4..dfb4d717b8a02 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3595,8 +3595,6 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
         unsigned long jiffies_end;
  
         job.thread_fn   = hugetlb_pages_alloc_boot_node;
-       job.start       = 0;
-       job.size        = h->max_huge_pages;
  
         /*
          * job.max_threads is 25% of the available cpu threads by default.
@@ -3620,10 +3618,24 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
         }
  
         job.max_threads = hugepage_allocation_threads;
-       job.min_chunk   = h->max_huge_pages / hugepage_allocation_threads;
  
         jiffies_start = jiffies;
-       padata_do_multithreaded(&job);
+       /* TODO: comment why we retry and how it interacts with vmemmap op. */
+       while (h->nr_huge_pages != h->max_huge_pages) {
+               unsigned long remaining = h->max_huge_pages - h->nr_huge_pages;
+
+               job.start       = h->nr_huge_pages;
+               job.size        = remaining;
+               job.min_chunk   = remaining / hugepage_allocation_threads;
+               padata_do_multithreaded(&job);
+
+               if (hugetlb_vmemmap_optimizable(h))
+                       break;
+
+               /* Stop if there is no progress. */
+               if (remaining == h->max_huge_pages - h->nr_huge_pages)
+                       break;
+       }
         jiffies_end = jiffies;
  
         pr_info("HugeTLB: allocation took %dms with hugepage_allocation_threads=%ld\n",


-- 
Cheers

David / dhildenb



* RE: Re: [PATCH] mm/hugetlb: two-phase hugepage allocation when reservation is high
@ 2025-08-27 12:33 Li,Rongqing
  2025-08-27 12:37 ` David Hildenbrand
  0 siblings, 1 reply; 7+ messages in thread
From: Li,Rongqing @ 2025-08-27 12:33 UTC (permalink / raw)
  To: David Hildenbrand, muchun.song@linux.dev, osalvador@suse.de,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, giorgitchankvetadze1997@gmail.com
  Cc: Xu,Wenjie(ACG CCN)

> Not sure if something like the following could fly:
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1f42186a85ea4..dfb4d717b8a02 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3595,8 +3595,6 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
>          unsigned long jiffies_end;
> 
>          job.thread_fn   = hugetlb_pages_alloc_boot_node;
> -       job.start       = 0;
> -       job.size        = h->max_huge_pages;
> 
>          /*
>           * job.max_threads is 25% of the available cpu threads by default.
> @@ -3620,10 +3618,24 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
>          }
> 
>          job.max_threads = hugepage_allocation_threads;
> -       job.min_chunk   = h->max_huge_pages / hugepage_allocation_threads;
> 
>          jiffies_start = jiffies;
> -       padata_do_multithreaded(&job);
> +       /* TODO: comment why we retry and how it interacts with vmemmap op. */
> +       while (h->nr_huge_pages != h->max_huge_pages) {
> +               unsigned long remaining = h->max_huge_pages - h->nr_huge_pages;
> +
> +               job.start       = h->nr_huge_pages;
> +               job.size        = remaining;
> +               job.min_chunk   = remaining / hugepage_allocation_threads;
> +               padata_do_multithreaded(&job);
> +
> +               if (hugetlb_vmemmap_optimizable(h))
> +                       break;

It should be:
           if (!hugetlb_vmemmap_optimizable(h))
                     break;

other is fine to me

thanks

-Li


> +
> +               /* Stop if there is no progress. */
> +               if (remaining == h->max_huge_pages - h->nr_huge_pages)
> +                       break;
> +       }
>          jiffies_end = jiffies;
> 
>          pr_info("HugeTLB: allocation took %dms with hugepage_allocation_threads=%ld\n",
> 
> 
> --
> Cheers
> 
> David / dhildenb



* Re: [PATCH] mm/hugetlb: two-phase hugepage allocation when reservation is high
  2025-08-27 12:33 Re: [PATCH] mm/hugetlb: two-phase hugepage allocation when reservation is high Li,Rongqing
@ 2025-08-27 12:37 ` David Hildenbrand
  0 siblings, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2025-08-27 12:37 UTC (permalink / raw)
  To: Li,Rongqing, muchun.song@linux.dev, osalvador@suse.de,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, giorgitchankvetadze1997@gmail.com
  Cc: Xu,Wenjie(ACG CCN)

On 27.08.25 14:33, Li,Rongqing wrote:
>> Not sure if something like the following could fly:
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 1f42186a85ea4..dfb4d717b8a02 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -3595,8 +3595,6 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
>>           unsigned long jiffies_end;
>>
>>           job.thread_fn   = hugetlb_pages_alloc_boot_node;
>> -       job.start       = 0;
>> -       job.size        = h->max_huge_pages;
>>
>>           /*
>>            * job.max_threads is 25% of the available cpu threads by default.
>> @@ -3620,10 +3618,24 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
>>           }
>>
>>           job.max_threads = hugepage_allocation_threads;
>> -       job.min_chunk   = h->max_huge_pages / hugepage_allocation_threads;
>>
>>           jiffies_start = jiffies;
>> -       padata_do_multithreaded(&job);
>> +       /* TODO: comment why we retry and how it interacts with vmemmap op. */
>> +       while (h->nr_huge_pages != h->max_huge_pages) {
>> +               unsigned long remaining = h->max_huge_pages - h->nr_huge_pages;
>> +
>> +               job.start       = h->nr_huge_pages;
>> +               job.size        = remaining;
>> +               job.min_chunk   = remaining / hugepage_allocation_threads;
>> +               padata_do_multithreaded(&job);
>> +
>> +               if (hugetlb_vmemmap_optimizable(h))
>> +                       break;
> 
> It should be:
>             if (!hugetlb_vmemmap_optimizable(h))
>                       break;

Very right.

-- 
Cheers

David / dhildenb



end of thread, other threads:[~2025-08-27 12:42 UTC | newest]

Thread overview: 7+ messages
2025-08-27 12:33 Re: [PATCH] mm/hugetlb: two-phase hugepage allocation when reservation is high Li,Rongqing
2025-08-27 12:37 ` David Hildenbrand
  -- strict thread matches above, loose matches on Subject: below --
2025-08-27  4:12 Li,Rongqing
2025-08-27 11:51 ` David Hildenbrand
2025-08-26 10:18 lirongqing
2025-08-26 13:25 ` David Hildenbrand
2025-08-22 11:28 lirongqing
2025-08-22 13:50 ` Giorgi Tchankvetadze
