public inbox for linux-mm@kvack.org
From: Zhao Li <enderaoelyther@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: mawupeng1@huawei.com, Zhao Li <enderaoelyther@gmail.com>,
	Muchun Song <muchun.song@linux.dev>,
	Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: [PATCH v3] mm/hugetlb: fix max-only subpool accounting on alloc_hugetlb_folio failure
Date: Tue, 28 Apr 2026 19:30:38 +0800	[thread overview]
Message-ID: <20260428113037.88766-2-enderaoelyther@gmail.com> (raw)
In-Reply-To: <20260428030712.66256-2-enderaoelyther@gmail.com>

alloc_hugetlb_folio() calls hugepage_subpool_get_pages() when map_chg
is set.  For a subpool with max_hpages != -1, that bumps used_hpages
regardless of whether it returns gbl_chg = 0 (rsv slot consumed) or
gbl_chg > 0 (used_hpages slot only).  If the allocation later fails
before a folio is returned, the unwind must undo the used_hpages
bump.  The old cleanup only ran for !gbl_chg, leaking used_hpages on
the gbl_chg > 0 path.

For gbl_chg > 0 on max-only subpools (max_hpages != -1, min_hpages
== -1), hugepage_subpool_get_pages() took only a speculative
used_hpages slot.  Drop that slot directly under spool->lock.  In
that configuration hugepage_subpool_put_pages() cannot restore
rsv_hpages, so the direct decrement is the exact inverse and is
race-free against concurrent puts.  This matches the used_hpages-only
part of hugetlb_reserve_pages()'s out_put_pages cleanup, but
restricts it to the max-only case where no rsv_hpages restoration is
possible.

Mounts with min_hpages != -1 are left unchanged for now.  v2's
approach (hugepage_subpool_put_pages() + h->resv_huge_pages++ to
back a restored rsv_hpages slot) double-counts global backing under
concurrent free_huge_folio() and creates phantom reservations under
concurrent hugetlb_unreserve_pages().  Safe cleanup of that quadrant
needs a coordinated fix across multiple call sites.

Reproduced on a size=20M hugetlbfs mount with the faulting task in a
hugetlb cgroup whose limit is exceeded.  An unpatched kernel leaks 6/8
hugepages of subpool quota; with this patch the leak is 0/8.  Verified
under QEMU.

Fixes: a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Zhao Li <enderaoelyther@gmail.com>
---
Changes in v3:
- Replace v2's hugepage_subpool_put_pages() + h->resv_huge_pages++ on
  the gbl_chg > 0 branch with a direct used_hpages-- under spool->lock.
- Restrict the cleanup to (max_hpages != -1, min_hpages == -1) where
  the direct decrement is the exact inverse of the speculative bump.

Changes in v2:
- Skip the gbl_chg > 0 cleanup when max_hpages is unset.
- Add hugepage_subpool_put_pages() + h->resv_huge_pages++ on the
  gbl_chg > 0 branch.

 mm/hugetlb.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f24bf49be047e..cfdeaf6394c5b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3025,13 +3025,24 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
 						    h_cg);
 out_subpool_put:
-	/*
-	 * put page to subpool iff the quota of subpool's rsv_hpages is used
-	 * during hugepage_subpool_get_pages.
-	 */
-	if (map_chg && !gbl_chg) {
-		gbl_reserve = hugepage_subpool_put_pages(spool, 1);
-		hugetlb_acct_memory(h, -gbl_reserve);
+	if (map_chg) {
+		if (!gbl_chg) {
+			/* Full inverse when subpool_get_pages() consumed rsv_hpages. */
+			gbl_reserve = hugepage_subpool_put_pages(spool, 1);
+			hugetlb_acct_memory(h, -gbl_reserve);
+		} else if (gbl_chg > 0 && spool && spool->min_hpages == -1 &&
+			   spool->max_hpages != -1) {
+			unsigned long flags;
+
+			/*
+			 * For max-only subpools, subpool_get_pages() took only a
+			 * speculative used_hpages slot. Drop that slot directly.
+			 */
+			spin_lock_irqsave(&spool->lock, flags);
+			if (spool->used_hpages > 0)
+				spool->used_hpages--;
+			unlock_or_release_subpool(spool, flags);
+		}
 	}


--
2.50.1 (Apple Git-155)



Thread overview: 10+ messages
2026-04-27 14:52 [PATCH] mm/hugetlb: fix subpool accounting after cgroup charge failure Catherine
2026-04-27 15:12 ` Andrew Morton
2026-04-27 15:19   ` Catherine
2026-04-27 21:12     ` Andrew Morton
2026-04-28  3:07 ` [PATCH v2] " Zhao Li
2026-04-28  9:08   ` Oscar Salvador
2026-04-28 11:30     ` Lance Yang
2026-04-28 11:41       ` Zhao Li
2026-04-28 11:41     ` Zhao Li
2026-04-28 11:30   ` Zhao Li [this message]
