From: Giorgi Tchankvetadze <giorgitchankvetadze1997@gmail.com>
To: lirongqing@baidu.com
Cc: akpm@linux-foundation.org, david@redhat.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
muchun.song@linux.dev, osalvador@suse.de, xuwenjie04@baidu.com
Subject: Re: [PATCH] mm/hugetlb: two-phase hugepage allocation when reservation is high
Date: Fri, 22 Aug 2025 17:50:47 +0400 [thread overview]
Message-ID: <4391e3f5-e0a5-4920-bd50-05337b7764e7@gmail.com> (raw)
In-Reply-To: <20250822112828.2742-1-lirongqing@baidu.com>
Hi there. The 90% split is solid. Would it make sense to (a) log a
one-time warning when the second pass is triggered, so operators know
why boot slowed, and (b) make the 90% cap a Kconfig default ratio, so
distros can lower it without patching? Both are low-risk and don’t
change the ABI.

Thanks
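Suggestion (a) could be sketched roughly as below, in plain userspace C so the split arithmetic is easy to check. The names `first_batch` and `HUGETLB_ALLOC_RATIO` are made up for illustration, and `fprintf` stands in for the kernel's `pr_warn_once()`; in the actual patch the ratio is hard-coded to 90, which is the value a Kconfig knob could make tunable.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical ratio a Kconfig option could supply; 90 mirrors the
 * hard-coded value in the patch. */
#define HUGETLB_ALLOC_RATIO 90

static bool warned;

/* Split max_huge_pages into a first batch and a remainder, mimicking the
 * patch's logic, and warn once when a second pass will be needed. */
static unsigned long first_batch(unsigned long max_huge_pages,
				 unsigned long reserved_pages,
				 unsigned long total_pages,
				 unsigned long *remaining)
{
	unsigned long huge_pages;

	if (reserved_pages > total_pages * HUGETLB_ALLOC_RATIO / 100) {
		huge_pages = max_huge_pages * HUGETLB_ALLOC_RATIO / 100;
		*remaining = max_huge_pages - huge_pages;
		if (!warned) {	/* stand-in for pr_warn_once() */
			fprintf(stderr,
				"HugeTLB: reservation near RAM size, splitting boot allocation\n");
			warned = true;
		}
	} else {
		huge_pages = max_huge_pages;
		*remaining = 0;
	}
	return huge_pages;
}
```

With a reservation of 950 pages against 1000 pages of RAM, the threshold (900) is exceeded, so the first batch is 90% of the request and the warning fires exactly once.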
On 8/22/2025 3:28 PM, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
>
> When the total reserved hugepages account for 95% or more of system RAM
> (common in cloud computing on physical servers), allocating them all in one
> go can lead to OOM or fail to allocate huge pages during early boot.
>
> The earlier hugetlb vmemmap batching change (91f386bf0772) can worsen
> peak memory pressure under these conditions by deferring page frees,
> exacerbating allocation failures. To prevent this, split the allocation
> into two batches whenever
> huge_reserved_pages >= totalram_pages() * 90 / 100.
>
> This change does not alter the number of padata worker threads per batch;
> it merely introduces a second round of padata_do_multithreaded(). The added
> overhead of restarting the worker threads is minimal.
>
> Before:
> [ 8.423187] HugeTLB: allocation took 1584ms with hugepage_allocation_threads=48
> [ 8.431189] HugeTLB: allocating 385920 of page size 2.00 MiB failed. Only allocated 385296 hugepages.
>
> After:
> [ 8.740201] HugeTLB: allocation took 1900ms with hugepage_allocation_threads=48
> [ 8.748266] HugeTLB: registered 2.00 MiB page size, pre-allocated 385920 pages
>
> Fixes: 91f386bf0772 ("hugetlb: batch freeing of vmemmap pages")
>
> Co-developed-by: Wenjie Xu <xuwenjie04@baidu.com>
> Signed-off-by: Wenjie Xu <xuwenjie04@baidu.com>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
> mm/hugetlb.c | 21 +++++++++++++++++++--
> 1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 753f99b..a86d3a0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3587,12 +3587,23 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
> 		.numa_aware = true
> 	};
>
> +	unsigned long huge_reserved_pages = h->max_huge_pages << h->order;
> +	unsigned long huge_pages, remaining, total_pages;
> 	unsigned long jiffies_start;
> 	unsigned long jiffies_end;
>
> +	total_pages = totalram_pages() * 90 / 100;
> +	if (huge_reserved_pages > total_pages) {
> +		huge_pages = h->max_huge_pages * 90 / 100;
> +		remaining = h->max_huge_pages - huge_pages;
> +	} else {
> +		huge_pages = h->max_huge_pages;
> +		remaining = 0;
> +	}
> +
> 	job.thread_fn	= hugetlb_pages_alloc_boot_node;
> 	job.start	= 0;
> -	job.size	= h->max_huge_pages;
> +	job.size	= huge_pages;
>
> 	/*
> 	 * job.max_threads is 25% of the available cpu threads by default.
> @@ -3616,10 +3627,16 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
> 	}
>
> 	job.max_threads	= hugepage_allocation_threads;
> -	job.min_chunk	= h->max_huge_pages / hugepage_allocation_threads;
> +	job.min_chunk	= huge_pages / hugepage_allocation_threads;
>
> 	jiffies_start = jiffies;
> 	padata_do_multithreaded(&job);
> +	if (remaining) {
> +		job.start	= huge_pages;
> +		job.size	= remaining;
> +		job.min_chunk	= remaining / hugepage_allocation_threads;
> +		padata_do_multithreaded(&job);
> +	}
> 	jiffies_end = jiffies;
>
> pr_info("HugeTLB: allocation took %dms with hugepage_allocation_threads=%ld\n",
> --
> 2.9.4
>
>
Thread overview:
2025-08-22 11:28 [PATCH] mm/hugetlb: two-phase hugepage allocation when reservation is high lirongqing
2025-08-22 13:50 ` Giorgi Tchankvetadze [this message]
-- strict thread matches above, loose matches on Subject: below --
2025-08-26 10:18 lirongqing
2025-08-26 13:25 ` David Hildenbrand
2025-08-27 4:12 Li,Rongqing
2025-08-27 11:51 ` David Hildenbrand
2025-08-27 12:33 Li,Rongqing
2025-08-27 12:37 ` David Hildenbrand