The Linux Kernel Mailing List
From: Rik van Riel <riel@surriel.com>
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
	willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
	ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev,
	fvdl@google.com, Rik van Riel <riel@surriel.com>
Subject: [RFC PATCH 24/40] mm: page_alloc: prevent UNMOVABLE/RECLAIMABLE mixing in pageblocks
Date: Wed, 20 May 2026 10:59:30 -0400
Message-ID: <20260520150018.2491267-25-riel@surriel.com>
In-Reply-To: <20260520150018.2491267-1-riel@surriel.com>

Inside a tainted SPB, free pages of UNMOVABLE and RECLAIMABLE
allocations cannot be told apart by the buddy allocator's
compatibility heuristic (alike_pages == 0 between the two non-movable
types in try_to_claim_block()). Once a pageblock holds in-use pages of
both, any sticky UNMOVABLE pinhole prevents the RECLAIMABLE pages
from coalescing into useful higher-order chunks when they drain back
to the buddy allocator. The PB's free capacity is then permanently
capped at order-1 dust, regardless of how many of its pages are
eventually freed. Sticky RECLAIMABLE pages (active dentries, locked
btrfs eb folios, NOFS slab) are unavoidable; the cost is paid in
internal fragmentation.

Two paths in the page allocator create UNMOVABLE<->RECLAIMABLE
mixing today:

  1. try_to_claim_block() relabels a partial PB whenever the 50%
     threshold "free_pages + alike_pages >= pageblock_nr_pages/2"
     passes. For UNMOV<->RECL, alike_pages == 0, so with 512-page
     pageblocks the rule degenerates to free_pages >= 256. A PB with
     256 in-use UNMOV pages plus 256 free pages passes and is
     relabeled RECL. Both PB_has_unmovable and PB_has_reclaimable
     are then set.

  2. __rmqueue_steal() takes a single foreign-type page out of a
     PB without relabeling the PB. An UNMOVABLE allocation stealing
     from a RECLAIMABLE-labeled PB sets PB_has_unmovable on top of
     the existing PB_has_reclaimable.
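
The arithmetic in case 1 can be checked with a small userspace model.
This is only a sketch: old_claim_rule_passes() and
MODEL_PAGEBLOCK_NR_PAGES are hypothetical names, and the 512-page
pageblock size is an assumption inferred from the 256-page threshold
quoted above. It models the threshold comparison only, not the rest
of try_to_claim_block().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed pageblock size (order 9); hypothetical stand-in for pageblock_nr_pages. */
#define MODEL_PAGEBLOCK_NR_PAGES 512

/*
 * Hypothetical model of the pre-patch 50% threshold only:
 * free_pages + alike_pages >= pageblock_nr_pages / 2.
 */
bool old_claim_rule_passes(int free_pages, int alike_pages)
{
	/* A pageblock cannot hold more pages than its size. */
	assert(free_pages >= 0 && alike_pages >= 0);
	assert(free_pages + alike_pages <= MODEL_PAGEBLOCK_NR_PAGES);

	return free_pages + alike_pages >= MODEL_PAGEBLOCK_NR_PAGES / 2;
}
```

With alike_pages == 0, old_claim_rule_passes(256, 0) is true even
though the other 256 pages of the PB are in use by the other
non-movable type, while old_claim_rule_passes(255, 0) is false.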

Tighten both paths:

  - Add a noncompatible_cross_type() helper that detects the
    UNMOV<->RECL pair (MOVABLE may still mix with either, since
    movable pages can be migrated out).

  - In try_to_claim_block(), require a fully-free PB
    (free_pages == pageblock_nr_pages) for any cross-type relabel,
    regardless of from_tainted_spb. The other-type bit inherited
    from the prior label is stale on a fully-free PB (no in-use
    pages of either type) so clear it during the relabel rather
    than leaving the PB visibly mixed in PB_has_* state.

  - In __rmqueue_steal(), pass a new SB_SKIP_CROSS_TYPE flag to
    __rmqueue_sb_find_fallback() so the cross-type fallback entry
    in fallbacks[] is skipped. Steal then falls through to the
    MIGRATE_MOVABLE second fallback instead of single-page-stealing
    into a foreign non-movable PB.
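
The combined effect of the three changes can be sketched as a
userspace model. Everything prefixed model_ or MT_ is a hypothetical
stand-in, not kernel code: the enum values are illustrative, the
512-page pageblock size is assumed, and the UNMOVABLE and MOVABLE
rows of the fallback table are assumed to match the in-tree
fallbacks[] array (only the RECLAIMABLE row appears in this patch).
The steal model returns the first eligible fallback type and ignores
whether free pages of that type actually exist.

```c
#include <stdbool.h>

enum model_migratetype { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE, MT_PCPTYPES };

/* Assumed pageblock size; hypothetical stand-in for pageblock_nr_pages. */
#define MODEL_PAGEBLOCK_NR_PAGES 512

/* Assumed to mirror fallbacks[] in mm/page_alloc.c. */
const int model_fallbacks[MT_PCPTYPES][MT_PCPTYPES - 1] = {
	[MT_UNMOVABLE]   = { MT_RECLAIMABLE, MT_MOVABLE },
	[MT_MOVABLE]     = { MT_RECLAIMABLE, MT_UNMOVABLE },
	[MT_RECLAIMABLE] = { MT_UNMOVABLE,   MT_MOVABLE },
};

/* Same predicate as noncompatible_cross_type() in the patch. */
bool model_noncompatible(int start_type, int fallback_type)
{
	return (start_type == MT_UNMOVABLE && fallback_type == MT_RECLAIMABLE) ||
	       (start_type == MT_RECLAIMABLE && fallback_type == MT_UNMOVABLE);
}

/* Post-patch relabel decision: cross-type claims need a fully-free PB. */
bool model_may_claim(int start_type, int block_type, int free_pages,
		     int alike_pages, bool from_tainted_spb,
		     bool mobility_disabled)
{
	if (model_noncompatible(start_type, block_type))
		return free_pages == MODEL_PAGEBLOCK_NR_PAGES;

	return from_tainted_spb || mobility_disabled ||
	       free_pages + alike_pages >= MODEL_PAGEBLOCK_NR_PAGES / 2;
}

/* First fallback type the steal path would try, or -1 if none is eligible. */
int model_first_steal_fallback(int start_type, bool skip_cross_type)
{
	for (int i = 0; i < MT_PCPTYPES - 1; i++) {
		int fmt = model_fallbacks[start_type][i];

		if (skip_cross_type && model_noncompatible(start_type, fmt))
			continue;
		return fmt;
	}
	return -1;
}
```

In this model a RECLAIMABLE claim on a half-free UNMOVABLE PB is
refused, the same claim on a fully-free PB is allowed, and an
UNMOVABLE steal with the skip flag set lands on MT_MOVABLE instead of
MT_RECLAIMABLE.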

The from_tainted_spb=true caller of try_to_claim_block() is
unaffected because it hardcodes block_type=MIGRATE_MOVABLE. The
claim_whole_block() branch (current_order >= pageblock_order) is
also unaffected: it requires PB_all_free, so the PB is fully free
of any prior type.

PBs that were already mixed before this change will not unmix; the
win is that no new UNMOVABLE/RECLAIMABLE mixing is created by these
two paths afterwards.

Signed-off-by: Rik van Riel <riel@surriel.com>
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
 mm/page_alloc.c | 108 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 83 insertions(+), 25 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b4794ba7024f..988cf6f27938 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3073,6 +3073,23 @@ static int fallbacks[MIGRATE_PCPTYPES][MIGRATE_PCPTYPES - 1] = {
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE   },
 };
 
+/*
+ * UNMOVABLE and RECLAIMABLE allocations should not share the same
+ * pageblock. Their free pages are interchangeable on the buddy free
+ * lists (alike_pages == 0 between them), so once a PB holds both
+ * types the buddy can no longer tell them apart and any sticky
+ * UNMOVABLE pinhole prevents the RECLAIMABLE pages from coalescing
+ * into useful higher-order chunks when they drain back. MOVABLE may
+ * mix with either, since MOVABLE pages can be migrated out.
+ */
+static inline bool noncompatible_cross_type(int start_type, int fallback_type)
+{
+	return (start_type == MIGRATE_UNMOVABLE &&
+		fallback_type == MIGRATE_RECLAIMABLE) ||
+	       (start_type == MIGRATE_RECLAIMABLE &&
+		fallback_type == MIGRATE_UNMOVABLE);
+}
+
 #ifdef CONFIG_CMA
 static __always_inline struct page *__rmqueue_cma_fallback(struct zone *zone,
 					unsigned int order)
@@ -3450,11 +3467,10 @@ try_to_claim_block(struct zone *zone, struct page *page,
 		   bool from_tainted_spb)
 {
 	int free_pages, movable_pages, alike_pages;
-	unsigned long start_pfn;
 #ifdef CONFIG_COMPACTION
-	struct page *start_page;
 	struct superpageblock *sb;
 #endif
+	unsigned long start_pfn;
 
 	/*
 	 * Don't steal from pageblocks that are isolated for
@@ -3512,32 +3528,48 @@ try_to_claim_block(struct zone *zone, struct page *page,
 	 * allocations. Inside a tainted SPB the protection is unnecessary:
 	 * fragmentation has already been accepted at the SPB level, and
 	 * relabeling is much cheaper than tainting a fresh clean SPB.
-	 */
-	if (from_tainted_spb ||
-	    free_pages + alike_pages >= (1 << (pageblock_order-1)) ||
-			page_group_by_mobility_disabled) {
-		__move_freepages_block(zone, start_pfn, block_type, start_type);
-		set_pageblock_migratetype(pfn_to_page(start_pfn), start_type);
-#ifdef CONFIG_COMPACTION
-		/*
-		 * Track actual page contents in pageblock flags and
-		 * update superpageblock counters so the SPB moves to
-		 * the correct fullness list for steering.
-		 */
-		start_page = pfn_to_page(start_pfn);
-		__spb_set_has_type(start_page, start_type);
-		if (block_type != start_type)
-			__spb_set_has_type(start_page, block_type);
-
-		sb = pfn_to_superpageblock(zone, start_pfn);
-		if (sb)
-			spb_update_list(sb);
+	 *
+	 * UNMOVABLE<->RECLAIMABLE cross-type claims override these rules:
+	 * once mixed, sticky pinholes of one type prevent the other from
+	 * coalescing into useful higher-order free chunks even after drain.
+	 * Only relabel a fully-free PB in that case, regardless of whether
+	 * the SPB is tainted.
+	 */
+	if (noncompatible_cross_type(start_type, block_type)) {
+		if (free_pages != pageblock_nr_pages)
+			return NULL;
+	} else if (!from_tainted_spb &&
+		   free_pages + alike_pages < (1 << (pageblock_order-1)) &&
+		   !page_group_by_mobility_disabled) {
+		return NULL;
+	}
 
-#endif
-		return __rmqueue_smallest(zone, order, start_type);
+	__move_freepages_block(zone, start_pfn, block_type, start_type);
+	set_pageblock_migratetype(pfn_to_page(start_pfn), start_type);
+#ifdef CONFIG_COMPACTION
+	/*
+	 * Track actual page contents in pageblock flags and update
+	 * superpageblock counters so the SPB moves to the correct
+	 * fullness list for steering.
+	 *
+	 * For cross-type UNMOVABLE<->RECLAIMABLE relabel (which by the
+	 * predicate above only fires on a fully-free PB), the inherited
+	 * PB_has_<block_type> bit is stale -- there are no in-use pages
+	 * of that type. Clear it so the resulting PB is unmixed.
+	 */
+	__spb_set_has_type(pfn_to_page(start_pfn), start_type);
+	if (block_type != start_type) {
+		if (noncompatible_cross_type(start_type, block_type))
+			__spb_clear_has_type(pfn_to_page(start_pfn), block_type);
+		else
+			__spb_set_has_type(pfn_to_page(start_pfn), block_type);
 	}
 
-	return NULL;
+	sb = pfn_to_superpageblock(zone, start_pfn);
+	if (sb)
+		spb_update_list(sb);
+#endif
+	return __rmqueue_smallest(zone, order, start_type);
 }
 
 /*
@@ -3561,6 +3593,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
 #define SB_SEARCH_EMPTY		(1 << 1)
 #define SB_SEARCH_FALLBACK	(1 << 2)
 #define SB_SEARCH_ALL		(SB_SEARCH_PREFERRED | SB_SEARCH_EMPTY | SB_SEARCH_FALLBACK)
+/*
+ * Skip UNMOVABLE<->RECLAIMABLE cross-type fallback. Used by the steal
+ * path to prevent landing single foreign-type pages into a PB labeled
+ * with the other non-movable type -- a steal does not relabel the PB
+ * so cross-type stealing creates permanent mixing.
+ */
+#define SB_SKIP_CROSS_TYPE	(1 << 3)
 
 static struct page *
 __rmqueue_sb_find_fallback(struct zone *zone, unsigned int order,
@@ -3597,6 +3636,10 @@ __rmqueue_sb_find_fallback(struct zone *zone, unsigned int order,
 					int fmt = fallbacks[start_migratetype][i];
 					struct page *page;
 
+					if ((search_cats & SB_SKIP_CROSS_TYPE) &&
+					    noncompatible_cross_type(start_migratetype, fmt))
+						continue;
+
 					page = get_page_from_free_area(area,
 								       fmt);
 					if (page) {
@@ -3618,6 +3661,10 @@ __rmqueue_sb_find_fallback(struct zone *zone, unsigned int order,
 				int fmt = fallbacks[start_migratetype][i];
 				struct page *page;
 
+				if ((search_cats & SB_SKIP_CROSS_TYPE) &&
+				    noncompatible_cross_type(start_migratetype, fmt))
+					continue;
+
 				page = get_page_from_free_area(area,
 							       fmt);
 				if (page) {
@@ -3646,6 +3693,10 @@ __rmqueue_sb_find_fallback(struct zone *zone, unsigned int order,
 					int fmt = fallbacks[start_migratetype][i];
 					struct page *page;
 
+					if ((search_cats & SB_SKIP_CROSS_TYPE) &&
+					    noncompatible_cross_type(start_migratetype, fmt))
+						continue;
+
 					page = get_page_from_free_area(area,
 								       fmt);
 					if (page) {
@@ -3782,11 +3833,18 @@ __rmqueue_steal(struct zone *zone, int order, int start_migratetype,
 	/*
 	 * When ALLOC_NOFRAG_TAINTED_OK is set, only steal from tainted
 	 * SPBs to avoid tainting clean ones. Otherwise search all categories.
+	 *
+	 * Always skip UNMOVABLE<->RECLAIMABLE cross-type fallback. The steal
+	 * path takes a single page without relabeling its PB, so a cross-type
+	 * steal would land an UNMOVABLE page in a RECLAIMABLE-labeled PB
+	 * (or vice versa) and create permanent mixing. Falling through to
+	 * MIGRATE_MOVABLE (the second fallback) is preferable.
 	 */
 	if (alloc_flags & ALLOC_NOFRAG_TAINTED_OK)
 		search_cats = SB_SEARCH_PREFERRED;
 	else
 		search_cats = SB_SEARCH_PREFERRED | SB_SEARCH_FALLBACK;
+	search_cats |= SB_SKIP_CROSS_TYPE;
 
 	/*
 	 * Search per-superpageblock free lists for fallback migratetypes.
-- 
2.54.0


