linux-mm.kvack.org archive mirror
From: Giorgi Tchankvetadze <giorgitchankvetadze1997@gmail.com>
To: lirongqing@baidu.com
Cc: akpm@linux-foundation.org, david@redhat.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	muchun.song@linux.dev, osalvador@suse.de, xuwenjie04@baidu.com
Subject: Re: [PATCH] mm/hugetlb: two-phase hugepage allocation when reservation is high
Date: Fri, 22 Aug 2025 17:50:47 +0400	[thread overview]
Message-ID: <4391e3f5-e0a5-4920-bd50-05337b7764e7@gmail.com> (raw)
In-Reply-To: <20250822112828.2742-1-lirongqing@baidu.com>

Hi there. The 90% split is solid. Would it make sense to (a) log a 
one-time warning when the second pass is triggered, so operators know 
why boot slowed down, and (b) make the 90% cap a Kconfig default ratio, 
so distros can lower it without patching? Both are low-risk and don’t 
change the ABI.
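
Roughly what I have in mind, as an untested sketch on top of your 
patch (CONFIG_HUGETLB_BOOT_ALLOC_RATIO is a made-up placeholder name 
for the new Kconfig symbol, defaulting to 90):

	/* Cap the first pass at the configured ratio of RAM. */
	total_pages = totalram_pages() * CONFIG_HUGETLB_BOOT_ALLOC_RATIO / 100;
	if (huge_reserved_pages > total_pages) {
		/* Tell operators once why early boot is slower than usual. */
		pr_warn_once("HugeTLB: reservation exceeds %d%% of RAM, splitting boot allocation into two passes\n",
			     CONFIG_HUGETLB_BOOT_ALLOC_RATIO);
		huge_pages = h->max_huge_pages * CONFIG_HUGETLB_BOOT_ALLOC_RATIO / 100;
		remaining = h->max_huge_pages - huge_pages;
	} else {
		huge_pages = h->max_huge_pages;
		remaining = 0;
	}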

Thanks
On 8/22/2025 3:28 PM, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> When the total reserved hugepages account for 95% or more of system RAM
> (common in cloud computing on physical servers), allocating them all in
> one go can trigger OOM or fail to allocate all the huge pages during
> early boot.
> 
> The earlier hugetlb vmemmap batching change (commit 91f386bf0772) can
> worsen peak memory pressure under these conditions by deferring page
> frees, exacerbating allocation failures. To prevent this, split the
> allocation into two batches, the first covering 90% of the pages and
> the second the remainder, whenever
> 	huge_reserved_pages > totalram_pages() * 90 / 100.
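> 
> As a worked example (page count taken from the logs below, RAM size
> assumed for illustration): 385920 hugepages of 2 MiB are ~754 GiB; on
> a machine with ~792 GiB of RAM the 90% threshold is ~713 GiB, so the
> reservation crosses it and the allocation is split into
> 385920 * 90 / 100 = 347328 pages in the first pass and the remaining
> 38592 pages in the second.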
> 
> This change does not alter the number of padata worker threads per batch;
> it merely introduces a second round of padata_do_multithreaded(). The added
> overhead of restarting the worker threads is minimal.
> 
> Before:
> [    8.423187] HugeTLB: allocation took 1584ms with hugepage_allocation_threads=48
> [    8.431189] HugeTLB: allocating 385920 of page size 2.00 MiB failed.  Only allocated 385296 hugepages.
> 
> After:
> [    8.740201] HugeTLB: allocation took 1900ms with hugepage_allocation_threads=48
> [    8.748266] HugeTLB: registered 2.00 MiB page size, pre-allocated 385920 pages
> 
> Fixes: 91f386bf0772 ("hugetlb: batch freeing of vmemmap pages")
> 
> Co-developed-by: Wenjie Xu <xuwenjie04@baidu.com>
> Signed-off-by: Wenjie Xu <xuwenjie04@baidu.com>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>   mm/hugetlb.c | 21 +++++++++++++++++++--
>   1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 753f99b..a86d3a0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3587,12 +3587,23 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
>  		.numa_aware	= true
>  	};
>  
> +	unsigned long huge_reserved_pages = h->max_huge_pages << h->order;
> +	unsigned long huge_pages, remaining, total_pages;
>  	unsigned long jiffies_start;
>  	unsigned long jiffies_end;
>  
> +	total_pages = totalram_pages() * 90 / 100;
> +	if (huge_reserved_pages > total_pages) {
> +		huge_pages = h->max_huge_pages * 90 / 100;
> +		remaining = h->max_huge_pages - huge_pages;
> +	} else {
> +		huge_pages = h->max_huge_pages;
> +		remaining = 0;
> +	}
> +
>  	job.thread_fn	= hugetlb_pages_alloc_boot_node;
>  	job.start	= 0;
> -	job.size	= h->max_huge_pages;
> +	job.size	= huge_pages;
>  
>  	/*
>  	 * job.max_threads is 25% of the available cpu threads by default.
> @@ -3616,10 +3627,16 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
>  	}
>  
>  	job.max_threads	= hugepage_allocation_threads;
> -	job.min_chunk	= h->max_huge_pages / hugepage_allocation_threads;
> +	job.min_chunk	= huge_pages / hugepage_allocation_threads;
>  
>  	jiffies_start = jiffies;
>  	padata_do_multithreaded(&job);
> +	if (remaining) {
> +		job.start	= huge_pages;
> +		job.size	= remaining;
> +		job.min_chunk	= remaining / hugepage_allocation_threads;
> +		padata_do_multithreaded(&job);
> +	}
>  	jiffies_end = jiffies;
>  
>  	pr_info("HugeTLB: allocation took %dms with hugepage_allocation_threads=%ld\n",
>   
>   	pr_info("HugeTLB: allocation took %dms with hugepage_allocation_threads=%ld\n",
> -- 
> 2.9.4
> 
> 




Thread overview: 6+ messages
2025-08-22 11:28 [PATCH] mm/hugetlb: two-phase hugepage allocation when reservation is high lirongqing
2025-08-22 13:50 ` Giorgi Tchankvetadze [this message]
  -- strict thread matches above, loose matches on Subject: below --
2025-08-26 10:18 lirongqing
2025-08-26 13:25 ` David Hildenbrand
2025-08-27  4:12 Li,Rongqing
2025-08-27 11:51 ` David Hildenbrand
2025-08-27 12:33 Li,Rongqing
2025-08-27 12:37 ` David Hildenbrand
