* [RFC PATCH] mm: hugetlb: Fix incorrect fallback for subpool
@ 2025-03-25 6:16 Wupeng Ma
2025-03-31 21:23 ` Joshua Hahn
2025-04-09 13:58 ` Joshua Hahn
0 siblings, 2 replies; 6+ messages in thread
From: Wupeng Ma @ 2025-03-25 6:16 UTC (permalink / raw)
To: akpm, mike.kravetz, david; +Cc: muchun.song, mawupeng1, linux-mm, linux-kernel
During our testing with hugetlb subpool enabled, we observe that
hstate->resv_huge_pages may underflow into negative values. Root cause
analysis reveals a race condition in subpool reservation fallback handling
as follow:
hugetlb_reserve_pages()
/* Attempt subpool reservation */
gbl_reserve = hugepage_subpool_get_pages(spool, chg);
/* Global reservation may fail after subpool allocation */
if (hugetlb_acct_memory(h, gbl_reserve) < 0)
goto out_put_pages;
out_put_pages:
/* This incorrectly restores reservation to subpool */
hugepage_subpool_put_pages(spool, chg);
When hugetlb_acct_memory() fails after subpool allocation, the current
implementation over-commits subpool reservations by returning the full
'chg' value instead of the actual allocated 'gbl_reserve' amount. This
discrepancy propagates to global reservations during subsequent releases,
eventually causing resv_huge_pages underflow.
This problem can be trigger easily with the following steps:
1. reverse hugepage for hugeltb allocation
2. mount hugetlbfs with min_size to enable hugetlb subpool
3. alloc hugepages with two task(make sure the second will fail due to
insufficient amount of hugepages)
4. with for a few seconds and repeat step 3 which will make
hstate->resv_huge_pages to go below zero.
To fix this problem, return corrent amount of pages to subpool during the
fallback after hugepage_subpool_get_pages is called.
Fixes: 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools")
Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
---
mm/hugetlb.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 318624c96584..dc4449592d00 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2987,7 +2987,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
struct hugepage_subpool *spool = subpool_vma(vma);
struct hstate *h = hstate_vma(vma);
struct folio *folio;
- long retval, gbl_chg;
+ long retval, gbl_chg, gbl_reserve;
map_chg_state map_chg;
int ret, idx;
struct hugetlb_cgroup *h_cg = NULL;
@@ -3140,8 +3140,16 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
h_cg);
out_subpool_put:
- if (map_chg)
- hugepage_subpool_put_pages(spool, 1);
+ /*
+ * put page to subpool iff the quota of subpool's rsv_hpages is used
+ * during hugepage_subpool_get_pages.
+ */
+ if (map_chg && !gbl_chg) {
+ gbl_reserve = hugepage_subpool_put_pages(spool, 1);
+ hugetlb_acct_memory(h, -gbl_reserve);
+ }
+
+
out_end_reservation:
if (map_chg != MAP_CHG_ENFORCED)
vma_end_reservation(h, vma, addr);
@@ -6949,7 +6957,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
struct vm_area_struct *vma,
vm_flags_t vm_flags)
{
- long chg = -1, add = -1;
+ long chg = -1, add = -1, spool_resv, gbl_resv;
struct hstate *h = hstate_inode(inode);
struct hugepage_subpool *spool = subpool_inode(inode);
struct resv_map *resv_map;
@@ -7084,8 +7092,16 @@ bool hugetlb_reserve_pages(struct inode *inode,
return true;
out_put_pages:
- /* put back original number of pages, chg */
- (void)hugepage_subpool_put_pages(spool, chg);
+ spool_resv = chg - gbl_reserve;
+ if (spool_resv) {
+ /* put sub pool's reservation back, chg - gbl_reserve */
+ gbl_resv = hugepage_subpool_put_pages(spool, spool_resv);
+ /*
+ * subpool's reserved pages can not be put back due to race,
+ * return to hstate.
+ */
+ hugetlb_acct_memory(h, -gbl_resv);
+ }
out_uncharge_cgroup:
hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
chg * pages_per_huge_page(h), h_cg);
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] mm: hugetlb: Fix incorrect fallback for subpool
2025-03-25 6:16 [RFC PATCH] mm: hugetlb: Fix incorrect fallback for subpool Wupeng Ma
@ 2025-03-31 21:23 ` Joshua Hahn
2025-06-11 15:55 ` Ackerley Tng
2025-04-09 13:58 ` Joshua Hahn
1 sibling, 1 reply; 6+ messages in thread
From: Joshua Hahn @ 2025-03-31 21:23 UTC (permalink / raw)
To: Wupeng Ma
Cc: akpm, mike.kravetz, david, muchun.song, linux-mm, linux-kernel,
kernel-team
On Tue, 25 Mar 2025 14:16:34 +0800 Wupeng Ma <mawupeng1@huawei.com> wrote:
> During our testing with hugetlb subpool enabled, we observe that
> hstate->resv_huge_pages may underflow into negative values. Root cause
> analysis reveals a race condition in subpool reservation fallback handling
> as follow:
>
> hugetlb_reserve_pages()
> /* Attempt subpool reservation */
> gbl_reserve = hugepage_subpool_get_pages(spool, chg);
>
> /* Global reservation may fail after subpool allocation */
> if (hugetlb_acct_memory(h, gbl_reserve) < 0)
> goto out_put_pages;
>
> out_put_pages:
> /* This incorrectly restores reservation to subpool */
> hugepage_subpool_put_pages(spool, chg);
>
> When hugetlb_acct_memory() fails after subpool allocation, the current
> implementation over-commits subpool reservations by returning the full
> 'chg' value instead of the actual allocated 'gbl_reserve' amount. This
> discrepancy propagates to global reservations during subsequent releases,
> eventually causing resv_huge_pages underflow.
>
> This problem can be trigger easily with the following steps:
> 1. reverse hugepage for hugeltb allocation
> 2. mount hugetlbfs with min_size to enable hugetlb subpool
> 3. alloc hugepages with two task(make sure the second will fail due to
> insufficient amount of hugepages)
> 4. with for a few seconds and repeat step 3 which will make
> hstate->resv_huge_pages to go below zero.
>
> To fix this problem, return corrent amount of pages to subpool during the
> fallback after hugepage_subpool_get_pages is called.
>
> Fixes: 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools")
> Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
Hi Wupeng,
Thank you for the fix! This is a problem that we've also seen happen in
our fleet at Meta. I was able to recreate the issue that you mentioned -- to
explicitly lay down the steps I used:
1. echo 1 > /proc/sys/vm/nr_hugepages
2. mkdir /mnt/hugetlb-pool
3.mount -t hugetlbfs -o min_size=2M none /mnt/hugetlb-pool
4. (./get_hugepage &) && (./get_hugepage &)
# get_hugepage just opens a file in /mnt/hugetlb-pool and mmaps 2M into it.
5. sleep 3
6. (./get_hugepage &) && (./get_hugepage &)
7. cat /proc/meminfo | grep HugePages_Rsvd
... and (6) shows that HugePages_Rsvd has indeed underflowed to U64_MAX!
I've also verified that applying your fix and then re-running the reproducer
shows no underflow.
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Tested-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Sent using hkml (https://github.com/sjp38/hackermail)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] mm: hugetlb: Fix incorrect fallback for subpool
2025-03-25 6:16 [RFC PATCH] mm: hugetlb: Fix incorrect fallback for subpool Wupeng Ma
2025-03-31 21:23 ` Joshua Hahn
@ 2025-04-09 13:58 ` Joshua Hahn
2025-04-10 0:51 ` mawupeng
1 sibling, 1 reply; 6+ messages in thread
From: Joshua Hahn @ 2025-04-09 13:58 UTC (permalink / raw)
To: Wupeng Ma; +Cc: akpm, mike.kravetz, david, muchun.song, linux-mm, linux-kernel
On Tue, 25 Mar 2025 14:16:34 +0800 Wupeng Ma <mawupeng1@huawei.com> wrote:
> During our testing with hugetlb subpool enabled, we observe that
> hstate->resv_huge_pages may underflow into negative values. Root cause
> analysis reveals a race condition in subpool reservation fallback handling
> as follow:
>
> hugetlb_reserve_pages()
> /* Attempt subpool reservation */
> gbl_reserve = hugepage_subpool_get_pages(spool, chg);
>
> /* Global reservation may fail after subpool allocation */
> if (hugetlb_acct_memory(h, gbl_reserve) < 0)
> goto out_put_pages;
>
> out_put_pages:
> /* This incorrectly restores reservation to subpool */
> hugepage_subpool_put_pages(spool, chg);
>
> When hugetlb_acct_memory() fails after subpool allocation, the current
> implementation over-commits subpool reservations by returning the full
> 'chg' value instead of the actual allocated 'gbl_reserve' amount. This
> discrepancy propagates to global reservations during subsequent releases,
> eventually causing resv_huge_pages underflow.
>
> This problem can be trigger easily with the following steps:
> 1. reverse hugepage for hugeltb allocation
> 2. mount hugetlbfs with min_size to enable hugetlb subpool
> 3. alloc hugepages with two task(make sure the second will fail due to
> insufficient amount of hugepages)
> 4. with for a few seconds and repeat step 3 which will make
> hstate->resv_huge_pages to go below zero.
>
> To fix this problem, return corrent amount of pages to subpool during the
> fallback after hugepage_subpool_get_pages is called.
>
> Fixes: 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools")
> Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
> ---
Hi Wupeng,
I hope you are doing well! It's been a while since this fix was sent in, and
there hasn't been any comments on the test either. Would you consider sending
this patch again without the RFC tag? I think that might help in the way of
moving this patch forward : -)
Thank you again for this fix! Have a great day!
Joshua
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] mm: hugetlb: Fix incorrect fallback for subpool
2025-04-09 13:58 ` Joshua Hahn
@ 2025-04-10 0:51 ` mawupeng
0 siblings, 0 replies; 6+ messages in thread
From: mawupeng @ 2025-04-10 0:51 UTC (permalink / raw)
To: joshua.hahnjy
Cc: mawupeng1, akpm, mike.kravetz, david, muchun.song, linux-mm,
linux-kernel
On 2025/4/9 21:58, Joshua Hahn wrote:
> On Tue, 25 Mar 2025 14:16:34 +0800 Wupeng Ma <mawupeng1@huawei.com> wrote:
>
>> During our testing with hugetlb subpool enabled, we observe that
>> hstate->resv_huge_pages may underflow into negative values. Root cause
>> analysis reveals a race condition in subpool reservation fallback handling
>> as follow:
>>
>> hugetlb_reserve_pages()
>> /* Attempt subpool reservation */
>> gbl_reserve = hugepage_subpool_get_pages(spool, chg);
>>
>> /* Global reservation may fail after subpool allocation */
>> if (hugetlb_acct_memory(h, gbl_reserve) < 0)
>> goto out_put_pages;
>>
>> out_put_pages:
>> /* This incorrectly restores reservation to subpool */
>> hugepage_subpool_put_pages(spool, chg);
>>
>> When hugetlb_acct_memory() fails after subpool allocation, the current
>> implementation over-commits subpool reservations by returning the full
>> 'chg' value instead of the actual allocated 'gbl_reserve' amount. This
>> discrepancy propagates to global reservations during subsequent releases,
>> eventually causing resv_huge_pages underflow.
>>
>> This problem can be trigger easily with the following steps:
>> 1. reverse hugepage for hugeltb allocation
>> 2. mount hugetlbfs with min_size to enable hugetlb subpool
>> 3. alloc hugepages with two task(make sure the second will fail due to
>> insufficient amount of hugepages)
>> 4. with for a few seconds and repeat step 3 which will make
>> hstate->resv_huge_pages to go below zero.
>>
>> To fix this problem, return corrent amount of pages to subpool during the
>> fallback after hugepage_subpool_get_pages is called.
>>
>> Fixes: 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools")
>> Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
>> ---
>
> Hi Wupeng,
>
> I hope you are doing well! It's been a while since this fix was sent in, and
> there hasn't been any comments on the test either. Would you consider sending
> this patch again without the RFC tag? I think that might help in the way of
> moving this patch forward : -)
Ok, I'll resend it without the RFC tag.
>
> Thank you again for this fix! Have a great day!
> Joshua
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] mm: hugetlb: Fix incorrect fallback for subpool
2025-03-31 21:23 ` Joshua Hahn
@ 2025-06-11 15:55 ` Ackerley Tng
2025-06-12 0:54 ` Joshua Hahn
0 siblings, 1 reply; 6+ messages in thread
From: Ackerley Tng @ 2025-06-11 15:55 UTC (permalink / raw)
To: Joshua Hahn
Cc: mawupeng1, akpm, mike.kravetz, david, muchun.song, linux-mm,
linux-kernel, kernel-team
Joshua Hahn <joshua.hahnjy@gmail.com> writes:
> On Tue, 25 Mar 2025 14:16:34 +0800 Wupeng Ma <mawupeng1@huawei.com> wrote:
>
>> During our testing with hugetlb subpool enabled, we observe that
>> hstate->resv_huge_pages may underflow into negative values. Root cause
>> analysis reveals a race condition in subpool reservation fallback handling
>> as follow:
>>
>> hugetlb_reserve_pages()
>> /* Attempt subpool reservation */
>> gbl_reserve = hugepage_subpool_get_pages(spool, chg);
>>
>> /* Global reservation may fail after subpool allocation */
>> if (hugetlb_acct_memory(h, gbl_reserve) < 0)
>> goto out_put_pages;
>>
>> out_put_pages:
>> /* This incorrectly restores reservation to subpool */
>> hugepage_subpool_put_pages(spool, chg);
>>
>> When hugetlb_acct_memory() fails after subpool allocation, the current
>> implementation over-commits subpool reservations by returning the full
>> 'chg' value instead of the actual allocated 'gbl_reserve' amount. This
>> discrepancy propagates to global reservations during subsequent releases,
>> eventually causing resv_huge_pages underflow.
>>
>> This problem can be trigger easily with the following steps:
>> 1. reverse hugepage for hugeltb allocation
>> 2. mount hugetlbfs with min_size to enable hugetlb subpool
>> 3. alloc hugepages with two task(make sure the second will fail due to
>> insufficient amount of hugepages)
>> 4. with for a few seconds and repeat step 3 which will make
>> hstate->resv_huge_pages to go below zero.
>>
>> To fix this problem, return corrent amount of pages to subpool during the
>> fallback after hugepage_subpool_get_pages is called.
>>
>> Fixes: 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools")
>> Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
>
> Hi Wupeng,
> Thank you for the fix! This is a problem that we've also seen happen in
> our fleet at Meta. I was able to recreate the issue that you mentioned -- to
> explicitly lay down the steps I used:
>
> 1. echo 1 > /proc/sys/vm/nr_hugepages
> 2. mkdir /mnt/hugetlb-pool
> 3.mount -t hugetlbfs -o min_size=2M none /mnt/hugetlb-pool
> 4. (./get_hugepage &) && (./get_hugepage &)
> # get_hugepage just opens a file in /mnt/hugetlb-pool and mmaps 2M into it.
Hi Joshua,
Would you be able to share the source for ./get_hugepage? I'm trying to
reproduce this too.
Does ./get_hugepage just mmap and then spin in an infinite loop?
Do you have to somehow limit allocation of surplus HugeTLB pages from
the buddy allocator?
Thanks!
> 5. sleep 3
> 6. (./get_hugepage &) && (./get_hugepage &)
> 7. cat /proc/meminfo | grep HugePages_Rsvd
>
> ... and (6) shows that HugePages_Rsvd has indeed underflowed to U64_MAX!
>
> I've also verified that applying your fix and then re-running the reproducer
> shows no underflow.
>
> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> Tested-by: Joshua Hahn <joshua.hahnjy@gmail.com>
>
> Sent using hkml (https://github.com/sjp38/hackermail)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] mm: hugetlb: Fix incorrect fallback for subpool
2025-06-11 15:55 ` Ackerley Tng
@ 2025-06-12 0:54 ` Joshua Hahn
0 siblings, 0 replies; 6+ messages in thread
From: Joshua Hahn @ 2025-06-12 0:54 UTC (permalink / raw)
To: Ackerley Tng
Cc: mawupeng1, akpm, mike.kravetz, david, muchun.song, linux-mm,
linux-kernel, kernel-team
On Wed, 11 Jun 2025 08:55:30 -0700 Ackerley Tng <ackerleytng@google.com> wrote:
> Joshua Hahn <joshua.hahnjy@gmail.com> writes:
>
> > On Tue, 25 Mar 2025 14:16:34 +0800 Wupeng Ma <mawupeng1@huawei.com> wrote:
> >
> >> During our testing with hugetlb subpool enabled, we observe that
> >> hstate->resv_huge_pages may underflow into negative values. Root cause
> >> analysis reveals a race condition in subpool reservation fallback handling
> >> as follow:
> >>
> >> hugetlb_reserve_pages()
> >> /* Attempt subpool reservation */
> >> gbl_reserve = hugepage_subpool_get_pages(spool, chg);
> >>
> >> /* Global reservation may fail after subpool allocation */
> >> if (hugetlb_acct_memory(h, gbl_reserve) < 0)
> >> goto out_put_pages;
> >>
> >> out_put_pages:
> >> /* This incorrectly restores reservation to subpool */
> >> hugepage_subpool_put_pages(spool, chg);
> >>
> >> When hugetlb_acct_memory() fails after subpool allocation, the current
> >> implementation over-commits subpool reservations by returning the full
> >> 'chg' value instead of the actual allocated 'gbl_reserve' amount. This
> >> discrepancy propagates to global reservations during subsequent releases,
> >> eventually causing resv_huge_pages underflow.
> >>
> >> This problem can be trigger easily with the following steps:
> >> 1. reverse hugepage for hugeltb allocation
> >> 2. mount hugetlbfs with min_size to enable hugetlb subpool
> >> 3. alloc hugepages with two task(make sure the second will fail due to
> >> insufficient amount of hugepages)
> >> 4. with for a few seconds and repeat step 3 which will make
> >> hstate->resv_huge_pages to go below zero.
> >>
> >> To fix this problem, return corrent amount of pages to subpool during the
> >> fallback after hugepage_subpool_get_pages is called.
> >>
> >> Fixes: 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools")
> >> Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
> >
> > Hi Wupeng,
> > Thank you for the fix! This is a problem that we've also seen happen in
> > our fleet at Meta. I was able to recreate the issue that you mentioned -- to
> > explicitly lay down the steps I used:
> >
> > 1. echo 1 > /proc/sys/vm/nr_hugepages
> > 2. mkdir /mnt/hugetlb-pool
> > 3.mount -t hugetlbfs -o min_size=2M none /mnt/hugetlb-pool
> > 4. (./get_hugepage &) && (./get_hugepage &)
> > # get_hugepage just opens a file in /mnt/hugetlb-pool and mmaps 2M into it.
>
> Hi Joshua,
>
> Would you be able to share the source for ./get_hugepage? I'm trying to
> reproduce this too.
>
> Does ./get_hugepage just mmap and then spin in an infinite loop?
>
> Do you have to somehow limit allocation of surplus HugeTLB pages from
> the buddy allocator?
>
> Thanks!
Hi Ackerley,
The script I used for get_hugepage is very simple : -) No need to even spin
infinitely! I just make a file descriptor, ftruncate it to 2M, and mmap
into it. For good measure I set addr[0] = '.', sleep for 1 second, and then
munmap the area afterwards.
Here is a simplified version of the script (no error handling):
int fd = open("/mnt/hugetlb-pool/hugetlb_file", O_RDWR | O_CREAT, 0666);
ftruncate(fd, 2*1024*1024);
char *addr = mmap(NULL, 2*1024*1024, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
addr[0] = '.';
sleep(1);
munmap(addr, 2*1024*1024);
close(fd);
Hope this helps! Please let me know if it doesn't work, I would be happy
to investigate this with you. Have a great day!
Joshua
Sent using hkml (https://github.com/sjp38/hackermail)
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-06-12 0:54 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-25 6:16 [RFC PATCH] mm: hugetlb: Fix incorrect fallback for subpool Wupeng Ma
2025-03-31 21:23 ` Joshua Hahn
2025-06-11 15:55 ` Ackerley Tng
2025-06-12 0:54 ` Joshua Hahn
2025-04-09 13:58 ` Joshua Hahn
2025-04-10 0:51 ` mawupeng
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).