From: Rik van Riel <riel@surriel.com>
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev,
Rik van Riel <riel@meta.com>, Rik van Riel <riel@surriel.com>
Subject: [RFC PATCH 26/45] mm: page_alloc: prevent UNMOVABLE/RECLAIMABLE mixing in pageblocks
Date: Thu, 30 Apr 2026 16:20:55 -0400
Message-ID: <20260430202233.111010-27-riel@surriel.com>
In-Reply-To: <20260430202233.111010-1-riel@surriel.com>
From: Rik van Riel <riel@meta.com>
Summary:
Inside a tainted SPB, free pages of UNMOVABLE and RECLAIMABLE
allocations cannot be told apart by the buddy allocator's
compatibility heuristic (alike_pages == 0 between the two non-movable
types in try_to_claim_block). Once a pageblock holds in-use pages of
both, any sticky UNMOVABLE pinhole prevents the RECLAIMABLE pages
from coalescing into useful higher-order chunks when they drain back
to the buddy. The PB's free capacity is permanently capped at
order-1 dust regardless of how much of it actually returns. Sticky
RECLAIMABLE pages (active dentries, locked btrfs eb folios, NOFS
slab) are unavoidable; the cost is paid in internal fragmentation.
Two paths in the page allocator create UNMOVABLE<->RECLAIMABLE
mixing today:
1. try_to_claim_block() relabels a partial PB whenever the 50%
threshold "free_pages + alike_pages >= pageblock_nr_pages/2"
passes. For UNMOV<->RECL, alike_pages == 0, so the rule
degenerates to free_pages >= 256. A PB with 256 in-use UNMOV
pages plus 256 free pages passes and is relabeled RECL. Both
PB_has_unmovable and PB_has_reclaimable are then set.
2. __rmqueue_steal() takes a single foreign-type page out of a
PB without relabeling it. An UNMOVABLE allocation stealing
from a RECLAIMABLE-labeled PB sets PB_has_unmovable on top of
the existing PB_has_reclaimable.
Tighten both paths:
- Add noncompatible_cross_type() helper that detects the
UNMOV<->RECL pair (MOVABLE may still mix with either since
movable pages can be migrated out).
- In try_to_claim_block(), require a fully-free PB
(free_pages == pageblock_nr_pages) for any cross-type relabel,
regardless of from_tainted_spb. The other-type bit inherited
from the prior label is stale on a fully-free PB (no in-use
pages of either type) so clear it during the relabel rather
than leaving the PB visibly mixed in PB_has_* state.
- In __rmqueue_steal(), pass a new SB_SKIP_CROSS_TYPE flag to
__rmqueue_sb_find_fallback() so the cross-type fallback entry
in fallbacks[] is skipped. Steal then falls through to the
MIGRATE_MOVABLE second fallback instead of single-page-stealing
into a foreign non-movable PB.
The from_tainted_spb=true caller of try_to_claim_block() is
unaffected because it hardcodes block_type=MIGRATE_MOVABLE. The
claim_whole_block() branch (current_order >= pageblock_order) is
also unaffected: it requires PB_all_free, so the PB is fully free
of any prior type.
Test Plan:
Bare-metal devvm with the existing 4 stuck tainted SPBs (sb[2,15,
36,51] in Normal). Build and reboot. Compare per-order free
distribution in newly tainted SPBs against pre-patch baseline:
today o0/o1 dominate, target meaningful (>10%) free at order >= 3
in pure-RECL SPBs created post-patch. Watch for tainted SPB count
growth past ~12 (3x current baseline) — the fully-free constraint
on cross-type claims will taint fresh SPBs more often, and a
runaway count means the cost was misjudged. Watch dmesg for
allocation failures and check that kswapd CPU usage stays under
2 cores. Existing
mixed SPBs from before this change won't unmix; the win is for
SPBs created after.
Signed-off-by: Rik van Riel <riel@surriel.com>
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
mm/page_alloc.c | 111 ++++++++++++++++++++++++++++++++++++------------
1 file changed, 85 insertions(+), 26 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 67cc8165ab1f..ceb1284a63ed 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3057,6 +3057,23 @@ static int fallbacks[MIGRATE_PCPTYPES][MIGRATE_PCPTYPES - 1] = {
[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE },
};
+/*
+ * UNMOVABLE and RECLAIMABLE allocations should not share the same
+ * pageblock. Their free pages are interchangeable on the buddy free
+ * lists (alike_pages == 0 between them), so once a PB holds both
+ * types the buddy can no longer tell them apart and any sticky
+ * UNMOVABLE pinhole prevents the RECLAIMABLE pages from coalescing
+ * into useful higher-order chunks when they drain back. MOVABLE may
+ * mix with either, since MOVABLE pages can be migrated out.
+ */
+static inline bool noncompatible_cross_type(int start_type, int fallback_type)
+{
+ return (start_type == MIGRATE_UNMOVABLE &&
+ fallback_type == MIGRATE_RECLAIMABLE) ||
+ (start_type == MIGRATE_RECLAIMABLE &&
+ fallback_type == MIGRATE_UNMOVABLE);
+}
+
#ifdef CONFIG_CMA
static __always_inline struct page *__rmqueue_cma_fallback(struct zone *zone,
unsigned int order)
@@ -3434,6 +3451,9 @@ try_to_claim_block(struct zone *zone, struct page *page,
bool from_tainted_spb)
{
int free_pages, movable_pages, alike_pages;
+#ifdef CONFIG_COMPACTION
+ struct superpageblock *sb;
+#endif
unsigned long start_pfn;
/*
@@ -3492,35 +3512,48 @@ try_to_claim_block(struct zone *zone, struct page *page,
* allocations. Inside a tainted SPB the protection is unnecessary:
* fragmentation has already been accepted at the SPB level, and
* relabeling is much cheaper than tainting a fresh clean SPB.
- */
- if (from_tainted_spb ||
- free_pages + alike_pages >= (1 << (pageblock_order-1)) ||
- page_group_by_mobility_disabled) {
- __move_freepages_block(zone, start_pfn, block_type, start_type);
- set_pageblock_migratetype(pfn_to_page(start_pfn), start_type);
-#ifdef CONFIG_COMPACTION
- /*
- * Track actual page contents in pageblock flags and
- * update superpageblock counters so the SPB moves to
- * the correct fullness list for steering.
- */
- {
- struct page *start_page = pfn_to_page(start_pfn);
- struct superpageblock *sb;
-
- __spb_set_has_type(start_page, start_type);
- if (block_type != start_type)
- __spb_set_has_type(start_page, block_type);
+ *
+ * UNMOVABLE<->RECLAIMABLE cross-type claims override these rules:
+ * once mixed, sticky pinholes of one type prevent the other from
+ * coalescing into useful higher-order free chunks even after drain.
+ * Only relabel a fully-free PB in that case, regardless of whether
+ * the SPB is tainted.
+ */
+ if (noncompatible_cross_type(start_type, block_type)) {
+ if (free_pages != pageblock_nr_pages)
+ return NULL;
+ } else if (!from_tainted_spb &&
+ free_pages + alike_pages < (1 << (pageblock_order-1)) &&
+ !page_group_by_mobility_disabled) {
+ return NULL;
+ }
- sb = pfn_to_superpageblock(zone, start_pfn);
- if (sb)
- spb_update_list(sb);
- }
-#endif
- return __rmqueue_smallest(zone, order, start_type);
+ __move_freepages_block(zone, start_pfn, block_type, start_type);
+ set_pageblock_migratetype(pfn_to_page(start_pfn), start_type);
+#ifdef CONFIG_COMPACTION
+ /*
+ * Track actual page contents in pageblock flags and update
+ * superpageblock counters so the SPB moves to the correct
+ * fullness list for steering.
+ *
+ * For cross-type UNMOVABLE<->RECLAIMABLE relabel (which by the
+ * predicate above only fires on a fully-free PB), the inherited
+ * PB_has_<block_type> bit is stale — there are no in-use pages
+ * of that type. Clear it so the resulting PB is unmixed.
+ */
+ __spb_set_has_type(pfn_to_page(start_pfn), start_type);
+ if (block_type != start_type) {
+ if (noncompatible_cross_type(start_type, block_type))
+ __spb_clear_has_type(pfn_to_page(start_pfn), block_type);
+ else
+ __spb_set_has_type(pfn_to_page(start_pfn), block_type);
}
- return NULL;
+ sb = pfn_to_superpageblock(zone, start_pfn);
+ if (sb)
+ spb_update_list(sb);
+#endif
+ return __rmqueue_smallest(zone, order, start_type);
}
/*
@@ -3544,6 +3577,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
#define SB_SEARCH_EMPTY (1 << 1)
#define SB_SEARCH_FALLBACK (1 << 2)
#define SB_SEARCH_ALL (SB_SEARCH_PREFERRED | SB_SEARCH_EMPTY | SB_SEARCH_FALLBACK)
+/*
+ * Skip UNMOVABLE<->RECLAIMABLE cross-type fallback. Used by the steal
+ * path to prevent landing single foreign-type pages into a PB labeled
+ * with the other non-movable type — a steal does not relabel the PB
+ * so cross-type stealing creates permanent mixing.
+ */
+#define SB_SKIP_CROSS_TYPE (1 << 3)
static struct page *
__rmqueue_sb_find_fallback(struct zone *zone, unsigned int order,
@@ -3580,6 +3620,10 @@ __rmqueue_sb_find_fallback(struct zone *zone, unsigned int order,
int fmt = fallbacks[start_migratetype][i];
struct page *page;
+ if ((search_cats & SB_SKIP_CROSS_TYPE) &&
+ noncompatible_cross_type(start_migratetype, fmt))
+ continue;
+
page = get_page_from_free_area(area,
fmt);
if (page) {
@@ -3601,6 +3645,10 @@ __rmqueue_sb_find_fallback(struct zone *zone, unsigned int order,
int fmt = fallbacks[start_migratetype][i];
struct page *page;
+ if ((search_cats & SB_SKIP_CROSS_TYPE) &&
+ noncompatible_cross_type(start_migratetype, fmt))
+ continue;
+
page = get_page_from_free_area(area,
fmt);
if (page) {
@@ -3629,6 +3677,10 @@ __rmqueue_sb_find_fallback(struct zone *zone, unsigned int order,
int fmt = fallbacks[start_migratetype][i];
struct page *page;
+ if ((search_cats & SB_SKIP_CROSS_TYPE) &&
+ noncompatible_cross_type(start_migratetype, fmt))
+ continue;
+
page = get_page_from_free_area(area,
fmt);
if (page) {
@@ -3765,11 +3817,18 @@ __rmqueue_steal(struct zone *zone, int order, int start_migratetype,
/*
* When ALLOC_NOFRAG_TAINTED_OK is set, only steal from tainted
* SPBs to avoid tainting clean ones. Otherwise search all categories.
+ *
+ * Always skip UNMOVABLE<->RECLAIMABLE cross-type fallback. The steal
+ * path takes a single page without relabeling its PB, so a cross-type
+ * steal would land an UNMOVABLE page in a RECLAIMABLE-labeled PB
+ * (or vice versa) and create permanent mixing. Falling through to
+ * MIGRATE_MOVABLE (the second fallback) is preferable.
*/
if (alloc_flags & ALLOC_NOFRAG_TAINTED_OK)
search_cats = SB_SEARCH_PREFERRED;
else
search_cats = SB_SEARCH_PREFERRED | SB_SEARCH_FALLBACK;
+ search_cats |= SB_SKIP_CROSS_TYPE;
/*
* Search per-superpageblock free lists for fallback migratetypes.
--
2.52.0