* [PATCH] mm/hugetlb: fix subpool accounting after cgroup charge failure
@ 2026-04-27 14:52 Catherine
2026-04-27 15:12 ` Andrew Morton
2026-04-28 3:07 ` [PATCH v2] " Zhao Li
0 siblings, 2 replies; 10+ messages in thread
From: Catherine @ 2026-04-27 14:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Muchun Song, Oscar Salvador, David Hildenbrand, linux-mm,
linux-kernel, Catherine
alloc_hugetlb_folio() calls hugepage_subpool_get_pages() when map_chg
is set. For subpools with max_hpages, that increments used_hpages even
when the returned gbl_chg is positive.
If a later hugetlb cgroup charge fails, the cleanup currently calls
hugepage_subpool_put_pages() only for !gbl_chg. The gbl_chg > 0 path
therefore leaks one used_hpages charge per failure.
Always undo the subpool charge after a successful subpool get. Keep the
global reservation accounting under !gbl_chg, because only that path
consumed a reservation from the subpool.
Signed-off-by: Catherine <enderaoelyther@gmail.com>
---
mm/hugetlb.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f24bf49be..b3ad024a0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3026,12 +3026,14 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
h_cg);
out_subpool_put:
/*
- * put page to subpool iff the quota of subpool's rsv_hpages is used
- * during hugepage_subpool_get_pages.
+ * map_chg means hugepage_subpool_get_pages() succeeded above.
+ * Always undo the subpool quota charge; only drop global reservation
+ * accounting if the subpool consumed a reservation.
*/
- if (map_chg && !gbl_chg) {
+ if (map_chg) {
gbl_reserve = hugepage_subpool_put_pages(spool, 1);
- hugetlb_acct_memory(h, -gbl_reserve);
+ if (!gbl_chg)
+ hugetlb_acct_memory(h, -gbl_reserve);
}
--
2.50.1 (Apple Git-155)
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/hugetlb: fix subpool accounting after cgroup charge failure
2026-04-27 14:52 [PATCH] mm/hugetlb: fix subpool accounting after cgroup charge failure Catherine
@ 2026-04-27 15:12 ` Andrew Morton
2026-04-27 15:19 ` Catherine
2026-04-28 3:07 ` [PATCH v2] " Zhao Li
1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2026-04-27 15:12 UTC (permalink / raw)
To: Catherine
Cc: Muchun Song, Oscar Salvador, David Hildenbrand, linux-mm,
linux-kernel
On Mon, 27 Apr 2026 22:52:48 +0800 Catherine <enderaoelyther@gmail.com> wrote:
> alloc_hugetlb_folio() calls hugepage_subpool_get_pages() when map_chg
> is set. For subpools with max_hpages, that increments used_hpages even
> when the returned gbl_chg is positive.
>
> If a later hugetlb cgroup charge fails, the cleanup currently calls
> hugepage_subpool_put_pages() only for !gbl_chg. The gbl_chg > 0 path
> therefore leaks one used_hpages charge per failure.
>
> Always undo the subpool charge after a successful subpool get. Keep the
> global reservation accounting under !gbl_chg, because only that path
> consumed a reservation from the subpool.
Thanks.
We do prefer full, real names for kernel alterations. Can you please
provide that?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/hugetlb: fix subpool accounting after cgroup charge failure
2026-04-27 15:12 ` Andrew Morton
@ 2026-04-27 15:19 ` Catherine
2026-04-27 21:12 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: Catherine @ 2026-04-27 15:19 UTC (permalink / raw)
To: Andrew Morton
Cc: Catherine, Muchun Song, Oscar Salvador, David Hildenbrand,
linux-mm, linux-kernel
Hi Andrew,
My real name is Zhao Li. Please use Zhao Li <enderaoelyther@gmail.com>
for authorship and Signed-off-by.
Thanks,
Catherine
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/hugetlb: fix subpool accounting after cgroup charge failure
2026-04-27 15:19 ` Catherine
@ 2026-04-27 21:12 ` Andrew Morton
0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2026-04-27 21:12 UTC (permalink / raw)
To: Catherine
Cc: Muchun Song, Oscar Salvador, David Hildenbrand, linux-mm,
linux-kernel
On Mon, 27 Apr 2026 23:19:35 +0800 Catherine <enderaoelyther@gmail.com> wrote:
> Hi Andrew,
>
> My real name is Zhao Li. Please use Zhao Li <enderaoelyther@gmail.com>
> for authorship and Signed-off-by.
Thanks.
AI review might have found an accounting imbalance:
https://sashiko.dev/#/patchset/20260427145247.84157-2-enderaoelyther@gmail.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2] mm/hugetlb: fix subpool accounting after cgroup charge failure
2026-04-27 14:52 [PATCH] mm/hugetlb: fix subpool accounting after cgroup charge failure Catherine
2026-04-27 15:12 ` Andrew Morton
@ 2026-04-28 3:07 ` Zhao Li
2026-04-28 9:08 ` Oscar Salvador
2026-04-28 11:30 ` [PATCH v3] mm/hugetlb: fix max-only subpool accounting on alloc_hugetlb_folio failure Zhao Li
1 sibling, 2 replies; 10+ messages in thread
From: Zhao Li @ 2026-04-28 3:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Muchun Song, Oscar Salvador, David Hildenbrand, linux-mm,
linux-kernel
alloc_hugetlb_folio() calls hugepage_subpool_get_pages() when map_chg
is set. For subpools with max_hpages, that increments used_hpages.
If the later hugetlb cgroup charge fails, the unwind must undo that
charge even when gbl_chg > 0.
hugepage_subpool_put_pages() can also restore rsv_hpages if concurrent
frees move used_hpages below min_hpages between the get and put. When
that happens on the gbl_chg > 0 path, restore the matching global
reservation as well.
Skip the gbl_chg > 0 put when max_hpages is unset. For a min_size-only
subpool, get_pages() did not change subpool state and put_pages() would
create a false reservation.
Signed-off-by: Zhao Li <enderaoelyther@gmail.com>
---
Changes in v2:
- Handle rsv_hpages restoration when racing frees cross min_hpages.
- Skip gbl_chg > 0 put_pages() when max_hpages is unset.
mm/hugetlb.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f24bf49be047e..4065d66fdcb5c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3026,12 +3026,23 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
h_cg);
out_subpool_put:
/*
- * put page to subpool iff the quota of subpool's rsv_hpages is used
- * during hugepage_subpool_get_pages.
+ * map_chg means hugepage_subpool_get_pages() succeeded above.
+ * If max_hpages accounting was touched, undo it. If racing frees
+ * moved the subpool below min_hpages, the put path may restore a
+ * subpool reservation. Restore the matching global reservation too.
*/
- if (map_chg && !gbl_chg) {
- gbl_reserve = hugepage_subpool_put_pages(spool, 1);
- hugetlb_acct_memory(h, -gbl_reserve);
+ if (map_chg) {
+ if (!gbl_chg) {
+ gbl_reserve = hugepage_subpool_put_pages(spool, 1);
+ hugetlb_acct_memory(h, -gbl_reserve);
+ } else if (spool && spool->max_hpages != -1) {
+ gbl_reserve = hugepage_subpool_put_pages(spool, 1);
+ if (!gbl_reserve) {
+ spin_lock_irq(&hugetlb_lock);
+ h->resv_huge_pages++;
+ spin_unlock_irq(&hugetlb_lock);
+ }
+ }
}
--
2.50.1 (Apple Git-155)
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2] mm/hugetlb: fix subpool accounting after cgroup charge failure
2026-04-28 3:07 ` [PATCH v2] " Zhao Li
@ 2026-04-28 9:08 ` Oscar Salvador
2026-04-28 11:30 ` Lance Yang
2026-04-28 11:41 ` Zhao Li
2026-04-28 11:30 ` [PATCH v3] mm/hugetlb: fix max-only subpool accounting on alloc_hugetlb_folio failure Zhao Li
1 sibling, 2 replies; 10+ messages in thread
From: Oscar Salvador @ 2026-04-28 9:08 UTC (permalink / raw)
To: Zhao Li
Cc: Andrew Morton, Muchun Song, David Hildenbrand, linux-mm,
linux-kernel
On Tue, Apr 28, 2026 at 11:07:13AM +0800, Zhao Li wrote:
> alloc_hugetlb_folio() calls hugepage_subpool_get_pages() when map_chg
> is set. For subpools with max_hpages, that increments used_hpages.
> If the later hugetlb cgroup charge fails, the unwind must undo that
> charge even when gbl_chg > 0.
I found that last sentence misleading, because we do not really care
about hugetlb cgroup charge/uncharge (besides that being of the reasons
we end up on error path) but rather the fact that we fiddle with
subpool->used_hpages and we need to undo that when we rollback.
> hugepage_subpool_put_pages() can also restore rsv_hpages if concurrent
> frees move used_hpages below min_hpages between the get and put. When
> that happens on the gbl_chg > 0 path, restore the matching global
> reservation as well.
Well, that does not quite explain the problem I think, at least not clear enough?
So the problem at hand (IIUC) is that
1) if we took a global reservation
2) and concurrent hugepage_subpool_put_pages operations made
used_hpages be below min_hpages, and so subpool's reservation
was incremented, but we need to increment the global one as well
otherwise the next time we pull a page from the spool,
global resv_huge_pages will be inbalanced
> Skip the gbl_chg > 0 put when max_hpages is unset. For a min_size-only
> subpool, get_pages() did not change subpool state and put_pages() would
> create a false reservation.
>
> Signed-off-by: Zhao Li <enderaoelyther@gmail.com>
> ---
> Changes in v2:
> - Handle rsv_hpages restoration when racing frees cross min_hpages.
> - Skip gbl_chg > 0 put_pages() when max_hpages is unset.
>
> mm/hugetlb.c | 21 ++++++++++++++++-----
> 1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f24bf49be047e..4065d66fdcb5c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3026,12 +3026,23 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> h_cg);
> out_subpool_put:
> /*
> - * put page to subpool iff the quota of subpool's rsv_hpages is used
> - * during hugepage_subpool_get_pages.
> + * map_chg means hugepage_subpool_get_pages() succeeded above.
> + * If max_hpages accounting was touched, undo it. If racing frees
> + * moved the subpool below min_hpages, the put path may restore a
> + * subpool reservation. Restore the matching global reservation too.
I would split the comment in two parts and place them within the block
they belong, otherwise it sounds confusing.
And maybe elaborate a little bit more.
Subpools, reservations and hugetlb make a very head-spinning situation, so let
us make our life easier.
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3] mm/hugetlb: fix max-only subpool accounting on alloc_hugetlb_folio failure
2026-04-28 3:07 ` [PATCH v2] " Zhao Li
2026-04-28 9:08 ` Oscar Salvador
@ 2026-04-28 11:30 ` Zhao Li
1 sibling, 0 replies; 10+ messages in thread
From: Zhao Li @ 2026-04-28 11:30 UTC (permalink / raw)
To: Andrew Morton
Cc: mawupeng1, Zhao Li, Muchun Song, Oscar Salvador,
David Hildenbrand, linux-mm, linux-kernel, stable
alloc_hugetlb_folio() calls hugepage_subpool_get_pages() when map_chg
is set. For a subpool with max_hpages != -1, that bumps used_hpages
regardless of whether it returns gbl_chg = 0 (rsv slot consumed) or
gbl_chg > 0 (used_hpages slot only). If the allocation later fails
before a folio is returned, the unwind must undo the used_hpages
bump. The old cleanup only ran for !gbl_chg, leaking used_hpages on
the gbl_chg > 0 path.
For gbl_chg > 0 on max-only subpools (max_hpages != -1, min_hpages
== -1), hugepage_subpool_get_pages() took only a speculative
used_hpages slot. Drop that slot directly under spool->lock. In
that configuration hugepage_subpool_put_pages() cannot restore
rsv_hpages, so the direct decrement is the exact inverse and is
race-free against concurrent puts. This matches the used_hpages-only
part of hugetlb_reserve_pages()'s out_put_pages cleanup, but
restricts it to the max-only case where no rsv_hpages restoration is
possible.
Mounts with min_hpages != -1 are left unchanged for now. v2's
approach (hugepage_subpool_put_pages() + h->resv_huge_pages++ to
back a restored rsv_hpages slot) double-counts global backing under
concurrent free_huge_folio() and creates phantom reservations under
concurrent hugetlb_unreserve_pages(). Safe cleanup of that quadrant
needs a coordinated fix across multiple call sites.
Reproduced on size=20M hugetlbfs with the faulting task in a hugetlb
cgroup whose limit is exceeded. Vanilla leaks 6/8 hugepages of
subpool quota; this patch leaks 0/8. Verified under QEMU.
Fixes: a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Zhao Li <enderaoelyther@gmail.com>
---
Changes in v3:
- Replace v2's hugepage_subpool_put_pages() + h->resv_huge_pages++ on
the gbl_chg > 0 branch with a direct used_hpages-- under spool->lock.
- Restrict the cleanup to (max_hpages != -1, min_hpages == -1) where
the direct decrement is the exact inverse of the speculative bump.
Changes in v2:
- Skip the gbl_chg > 0 cleanup when max_hpages is unset.
- Add hugepage_subpool_put_pages() + h->resv_huge_pages++ on the
gbl_chg > 0 branch.
mm/hugetlb.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f24bf49be047e..cfdeaf6394c5b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3025,13 +3025,24 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
h_cg);
out_subpool_put:
- /*
- * put page to subpool iff the quota of subpool's rsv_hpages is used
- * during hugepage_subpool_get_pages.
- */
- if (map_chg && !gbl_chg) {
- gbl_reserve = hugepage_subpool_put_pages(spool, 1);
- hugetlb_acct_memory(h, -gbl_reserve);
+ if (map_chg) {
+ if (!gbl_chg) {
+ /* Full inverse when subpool_get_pages() consumed rsv_hpages. */
+ gbl_reserve = hugepage_subpool_put_pages(spool, 1);
+ hugetlb_acct_memory(h, -gbl_reserve);
+ } else if (gbl_chg > 0 && spool && spool->min_hpages == -1 &&
+ spool->max_hpages != -1) {
+ unsigned long flags;
+
+ /*
+ * For max-only subpools, subpool_get_pages() took only a
+ * speculative used_hpages slot. Drop that slot directly.
+ */
+ spin_lock_irqsave(&spool->lock, flags);
+ if (spool->used_hpages > 0)
+ spool->used_hpages--;
+ unlock_or_release_subpool(spool, flags);
+ }
}
--
2.50.1 (Apple Git-155)
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2] mm/hugetlb: fix subpool accounting after cgroup charge failure
2026-04-28 9:08 ` Oscar Salvador
@ 2026-04-28 11:30 ` Lance Yang
2026-04-28 11:41 ` Zhao Li
2026-04-28 11:41 ` Zhao Li
1 sibling, 1 reply; 10+ messages in thread
From: Lance Yang @ 2026-04-28 11:30 UTC (permalink / raw)
To: osalvador, enderaoelyther
Cc: akpm, muchun.song, david, linux-mm, linux-kernel, Lance Yang
On Tue, Apr 28, 2026 at 11:08:04AM +0200, Oscar Salvador wrote:
>On Tue, Apr 28, 2026 at 11:07:13AM +0800, Zhao Li wrote:
>> alloc_hugetlb_folio() calls hugepage_subpool_get_pages() when map_chg
>> is set. For subpools with max_hpages, that increments used_hpages.
>> If the later hugetlb cgroup charge fails, the unwind must undo that
>> charge even when gbl_chg > 0.
>
>I found that last sentence misleading, because we do not really care
>about hugetlb cgroup charge/uncharge (besides that being of the reasons
>we end up on error path) but rather the fact that we fiddle with
>subpool->used_hpages and we need to undo that when we rollback.
>
>> hugepage_subpool_put_pages() can also restore rsv_hpages if concurrent
>> frees move used_hpages below min_hpages between the get and put. When
>> that happens on the gbl_chg > 0 path, restore the matching global
>> reservation as well.
>
>Well, that does not quite explain the problem I think, at least not clear enough?
>So the problem at hand (IIUC) is that
>
>1) if we took a global reservation
>2) and concurrent hugepage_subpool_put_pages operations made
> used_hpages be below min_hpages, and so subpool's reservation
> was incremented, but we need to increment the global one as well
> otherwise the next time we pull a page from the spool,
> global resv_huge_pages will be inbalanced
>
>
>> Skip the gbl_chg > 0 put when max_hpages is unset. For a min_size-only
>> subpool, get_pages() did not change subpool state and put_pages() would
>> create a false reservation.
>>
>> Signed-off-by: Zhao Li <enderaoelyther@gmail.com>
>> ---
>> Changes in v2:
>> - Handle rsv_hpages restoration when racing frees cross min_hpages.
>> - Skip gbl_chg > 0 put_pages() when max_hpages is unset.
>>
>> mm/hugetlb.c | 21 ++++++++++++++++-----
>> 1 file changed, 16 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index f24bf49be047e..4065d66fdcb5c 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -3026,12 +3026,23 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>> h_cg);
>> out_subpool_put:
>> /*
>> - * put page to subpool iff the quota of subpool's rsv_hpages is used
>> - * during hugepage_subpool_get_pages.
>> + * map_chg means hugepage_subpool_get_pages() succeeded above.
>> + * If max_hpages accounting was touched, undo it. If racing frees
>> + * moved the subpool below min_hpages, the put path may restore a
>> + * subpool reservation. Restore the matching global reservation too.
>
>I would split the comment in two parts and place them within the block
>they belong, otherwise it sounds confusing.
>And maybe elaborate a little bit more.
>
>Subpools, reservations and hugetlb make a very head-spinning situation, so let
>us make our life easier.
Yep, the comment is confusing to me as well ...
+ if (map_chg) {
+ if (!gbl_chg) {
+ gbl_reserve = hugepage_subpool_put_pages(spool, 1);
+ hugetlb_acct_memory(h, -gbl_reserve);
+ } else if (spool && spool->max_hpages != -1) {
+ gbl_reserve = hugepage_subpool_put_pages(spool, 1);
+ if (!gbl_reserve) {
+ spin_lock_irq(&hugetlb_lock);
+ h->resv_huge_pages++;
+ spin_unlock_irq(&hugetlb_lock);
+ }
+ }
}
IIUC, there are three cases:
1) !gbl_chg
hugepage_subpool_get_pages() consumed one reservation from
spool->rsv_hpages. So the error path needs to call
hugepage_subpool_put_pages() and then drop the matching global
reservation with hugetlb_acct_memory().
2) gbl_chg > 0 && spool->max_hpages != -1
hugepage_subpool_get_pages() did not consume spool->rsv_hpages, but it
did increment spool->used_hpages after the max_hpages check passed. So
the error path still needs to call hugepage_subpool_put_pages() to undo
that.
If hugepage_subpool_put_pages() returns 0 here, it restored one
reservation in spool->rsv_hpages, so we also need to increment
h->resv_huge_pages.
3) gbl_chg > 0 && spool->max_hpages == -1
hugepage_subpool_get_pages() did not change spool->rsv_hpages or
spool->used_hpages, so the error path should not call
hugepage_subpool_put_pages(), otherwise it could create a false
reservation in spool->rsv_hpages.
Hopefully I didn't miss anything :)
Lance
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] mm/hugetlb: fix subpool accounting after cgroup charge failure
2026-04-28 9:08 ` Oscar Salvador
2026-04-28 11:30 ` Lance Yang
@ 2026-04-28 11:41 ` Zhao Li
1 sibling, 0 replies; 10+ messages in thread
From: Zhao Li @ 2026-04-28 11:41 UTC (permalink / raw)
To: Oscar Salvador
Cc: Zhao Li, Andrew Morton, Muchun Song, David Hildenbrand,
Lance Yang, linux-mm, linux-kernel
On Tue, Apr 28, 2026 at 11:08:04AM +0200, Oscar Salvador wrote:
> I found that last sentence misleading, because we do not really
> care about hugetlb cgroup charge/uncharge (besides that being of
> the reasons we end up on error path) but rather the fact that we
> fiddle with subpool->used_hpages and we need to undo that when we
> rollback.
Agreed - reframed in v3. The commit body now states the bug as
the unwind missing the used_hpages rollback, without pinning it to
the cgroup-charge case, and the subject is narrowed to "fix
max-only subpool accounting on alloc_hugetlb_folio failure".
> Well, that does not quite explain the problem I think, at least
> not clear enough? [...]
Fair - that explanation got tangled because v2's design itself was
trying to compensate for racing min crossings. v3 sidesteps it
entirely: the gbl_chg > 0 cleanup is now restricted to
(max_hpages != -1, min_hpages == -1). In that configuration
hugepage_subpool_put_pages()'s min-restoration branch is dead, so a
direct used_hpages-- under spool->lock is the exact inverse of the
speculative bump - no h->resv_huge_pages++ needed, no rsv_hpages
publication, no racing-put reasoning.
Mounts with min_hpages != -1 are left at v1 behaviour for now.
That quadrant has an inherited race that also exists at
hugetlb_reserve_pages()'s out_put_pages cleanup, so a coordinated
fix belongs in a separate RFC rather than this stable backport.
> I would split the comment in two parts and place them within the
> block they belong, otherwise it sounds confusing.
>
> Subpools, reservations and hugetlb make a very head-spinning
> situation, so let us make our life easier.
Done - one short comment per branch placed inside the relevant
code block in v3. Hopefully easier to follow now.
v3:
https://lore.kernel.org/linux-mm/20260428113037.88766-2-enderaoelyther@gmail.com/
Thanks for the review.
--
Zhao Li
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] mm/hugetlb: fix subpool accounting after cgroup charge failure
2026-04-28 11:30 ` Lance Yang
@ 2026-04-28 11:41 ` Zhao Li
0 siblings, 0 replies; 10+ messages in thread
From: Zhao Li @ 2026-04-28 11:41 UTC (permalink / raw)
To: Lance Yang
Cc: Zhao Li, Oscar Salvador, Andrew Morton, Muchun Song,
David Hildenbrand, linux-mm, linux-kernel
On Tue, Apr 28, 2026 at 07:30:59PM +0800, Lance Yang wrote:
> IIUC, there are three cases:
> [...]
> 2) gbl_chg > 0 && spool->max_hpages != -1
> [...]
> If hugepage_subpool_put_pages() returns 0 here, it restored one
> reservation in spool->rsv_hpages, so we also need to increment
> h->resv_huge_pages.
Thanks for working through the three cases - that's a clean
breakdown and matches what v2 was trying to do. We ran into trouble
on case 2 specifically: the h->resv_huge_pages++ on the
put-returned-0 path looked right in isolation but turns out to be
unsafe once you put it next to concurrent hugetlb_unreserve_pages()
or free_huge_folio(). Two reachable orderings break it:
* free_huge_folio() with HPageRestoreReserve set already does
h->resv_huge_pages++ on its own. v2's bump on the gbl_chg > 0
cleanup double-counts against that.
* hugetlb_unreserve_pages() does hugetlb_acct_memory(h, -X) which
subtracts from h->resv_huge_pages and may also return surplus
backing. v2's bump then leaves rsv_hpages backed by no
h->resv_huge_pages - a phantom reservation that the next
subpool_get_pages() consumes without real backing.
v3 ended up going a different way: the gbl_chg > 0 cleanup is now
restricted to (max_hpages != -1, min_hpages == -1). In that
configuration hugepage_subpool_put_pages()'s min-restoration branch
is dead, so a direct used_hpages-- under spool->lock is the exact
inverse of the speculative bump - no put_pages(), no
h->resv_huge_pages++, no concurrent-races to reason about.
Your case 3 (max_hpages == -1) is unchanged: cleanup is a no-op,
because get_pages() didn't touch any subpool field.
Mounts with min_hpages != -1 are left at v1 behaviour for now. That
quadrant has an inherited race that also exists at
hugetlb_reserve_pages()'s out_put_pages cleanup; a coordinated fix
belongs in a separate RFC rather than this stable backport.
v3:
https://lore.kernel.org/linux-mm/20260428113037.88766-2-enderaoelyther@gmail.com/
Thanks for the review.
--
Zhao Li
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-04-28 11:42 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-27 14:52 [PATCH] mm/hugetlb: fix subpool accounting after cgroup charge failure Catherine
2026-04-27 15:12 ` Andrew Morton
2026-04-27 15:19 ` Catherine
2026-04-27 21:12 ` Andrew Morton
2026-04-28 3:07 ` [PATCH v2] " Zhao Li
2026-04-28 9:08 ` Oscar Salvador
2026-04-28 11:30 ` Lance Yang
2026-04-28 11:41 ` Zhao Li
2026-04-28 11:41 ` Zhao Li
2026-04-28 11:30 ` [PATCH v3] mm/hugetlb: fix max-only subpool accounting on alloc_hugetlb_folio failure Zhao Li
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox