From: Rik van Riel <riel@surriel.com>
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev,
Rik van Riel <riel@meta.com>, Rik van Riel <riel@surriel.com>
Subject: [RFC PATCH 19/45] mm: page_alloc: prevent atomic allocations from tainting clean SPBs
Date: Thu, 30 Apr 2026 16:20:48 -0400
Message-ID: <20260430202233.111010-20-riel@surriel.com>
In-Reply-To: <20260430202233.111010-1-riel@surriel.com>
From: Rik van Riel <riel@meta.com>

Non-DIRECT_RECLAIM (atomic) allocations that fail with ALLOC_NOFRAGMENT
previously dropped the flag entirely and retried, allowing them to taint
clean superpageblocks. This was the primary source of taint spreading
observed on production systems.

Two changes keep atomic allocations within tainted SPBs:
1. Extend Pass 2 in __rmqueue_smallest with a sub-pageblock phase (Pass
   2b). The original Pass 2 only finds whole free pageblocks (>= pageblock
   order) in tainted SPBs. Pass 2b searches for sub-pageblock-order free
   blocks and uses try_to_claim_block to claim the pageblock if it has
   enough compatible pages. This finds pages in tainted SPBs that have
   fragmented free space but no whole free pageblocks.

2. Add an ALLOC_NOFRAG_TAINTED_OK intermediate flag. Instead of going
   directly from ALLOC_NOFRAGMENT to no protection, atomic allocations
   first retry with ALLOC_NOFRAG_TAINTED_OK, which allows __rmqueue_steal
   to search tainted SPBs only. Clean and empty SPBs remain protected.
   Only if stealing from tainted SPBs also fails is ALLOC_NOFRAGMENT
   dropped entirely, as a last resort.
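
In short, atomic allocations now step down a ladder instead of going
straight from full protection to none. A simplified sketch of the new
retry logic in get_page_from_freelist (condensed from the hunk below,
not the verbatim code):

	/* All zones fragmented; atomic, so direct reclaim is not an option. */
	if (no_fallback && !defrag_mode &&
	    !(gfp_mask & __GFP_DIRECT_RECLAIM)) {
		if (!(alloc_flags & ALLOC_NOFRAG_TAINTED_OK)) {
			/* Step 2: steal/claim from tainted SPBs only. */
			alloc_flags |= ALLOC_NOFRAG_TAINTED_OK;
			goto retry;
		}
		/* Step 3: last resort, may taint clean SPBs. */
		alloc_flags &= ~(ALLOC_NOFRAGMENT | ALLOC_NOFRAG_TAINTED_OK);
		goto retry;
	}
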
Signed-off-by: Rik van Riel <riel@surriel.com>
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
mm/internal.h | 1 +
mm/page_alloc.c | 87 +++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 81 insertions(+), 7 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 02f1c7d36b85..f641795688af 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1413,6 +1413,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
#define ALLOC_HIGHATOMIC 0x200 /* Allows access to MIGRATE_HIGHATOMIC */
#define ALLOC_TRYLOCK 0x400 /* Only use spin_trylock in allocation path */
#define ALLOC_KSWAPD 0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
+#define ALLOC_NOFRAG_TAINTED_OK 0x1000 /* NOFRAGMENT, but allow steal from tainted SPBs */
/* Flags that allow allocations below the min watermark. */
#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8ce96db50c2f..13bc57592cd5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2713,6 +2713,9 @@ static struct page *__rmqueue_from_sb(struct zone *zone, unsigned int order,
*/
static struct page *claim_whole_block(struct zone *zone, struct page *page,
int current_order, int order, int new_type, int old_type);
+static struct page *try_to_claim_block(struct zone *zone, struct page *page,
+ int current_order, int order, int start_type,
+ int block_type, unsigned int alloc_flags);
static __always_inline
struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
@@ -2782,6 +2785,11 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
* free list (reset by mark_pageblock_free), so the search above
* misses them. Claim them inline to keep non-movable allocations
* concentrated in already-tainted superpageblocks.
+ *
+ * Try whole pageblock orders first (preferred for PCP buddy optimization),
+ * then fall back to sub-pageblock orders. Sub-pageblock claiming uses
+ * try_to_claim_block which checks whether the pageblock has enough
+ * compatible pages to justify claiming it.
*/
if (!movable && !is_migrate_cma(migratetype)) {
for (full = SB_FULL; full < __NR_SB_FULLNESS; full++) {
@@ -2814,6 +2822,43 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
}
}
}
+ /* Pass 2b: sub-pageblock orders in tainted SPBs */
+ for (full = SB_FULL; full < __NR_SB_FULLNESS; full++) {
+ list_for_each_entry(sb,
+ &zone->spb_lists[SB_TAINTED][full], list) {
+ int co;
+
+ if (!sb->nr_free_pages)
+ continue;
+ for (co = min_t(int, pageblock_order - 1,
+ NR_PAGE_ORDERS - 1);
+ co >= (int)order;
+ --co) {
+ current_order = co;
+ area = &sb->free_area[current_order];
+ page = get_page_from_free_area(
+ area, MIGRATE_MOVABLE);
+ if (!page)
+ continue;
+ if (get_pageblock_isolate(page))
+ continue;
+ if (is_migrate_cma(
+ get_pageblock_migratetype(page)))
+ continue;
+ page = try_to_claim_block(zone, page,
+ current_order, order,
+ migratetype, MIGRATE_MOVABLE,
+ 0);
+ if (!page)
+ continue;
+ trace_mm_page_alloc_zone_locked(
+ page, order, migratetype,
+ pcp_allowed_order(order) &&
+ migratetype < MIGRATE_PCPTYPES);
+ return page;
+ }
+ }
+ }
}
/* Empty superpageblocks: try before falling back to non-preferred category */
@@ -3566,12 +3611,23 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
* the block as its current migratetype, potentially causing fragmentation.
*/
static __always_inline struct page *
-__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
+__rmqueue_steal(struct zone *zone, int order, int start_migratetype,
+ unsigned int alloc_flags)
{
struct superpageblock *sb;
int current_order;
struct page *page;
int fallback_mt;
+ unsigned int search_cats;
+
+ /*
+ * When ALLOC_NOFRAG_TAINTED_OK is set, only steal from tainted
+ * SPBs to avoid tainting clean ones. Otherwise search all categories.
+ */
+ if (alloc_flags & ALLOC_NOFRAG_TAINTED_OK)
+ search_cats = SB_SEARCH_PREFERRED;
+ else
+ search_cats = SB_SEARCH_PREFERRED | SB_SEARCH_FALLBACK;
/*
* Search per-superpageblock free lists for fallback migratetypes.
@@ -3581,7 +3637,7 @@ __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
page = __rmqueue_sb_find_fallback(zone, current_order,
start_migratetype,
&fallback_mt,
- SB_SEARCH_PREFERRED | SB_SEARCH_FALLBACK);
+ search_cats);
if (!page)
continue;
@@ -3681,8 +3737,10 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
}
fallthrough;
case RMQUEUE_STEAL:
- if (!(alloc_flags & ALLOC_NOFRAGMENT)) {
- page = __rmqueue_steal(zone, order, migratetype);
+ if (!(alloc_flags & ALLOC_NOFRAGMENT) ||
+ (alloc_flags & ALLOC_NOFRAG_TAINTED_OK)) {
+ page = __rmqueue_steal(zone, order, migratetype,
+ alloc_flags);
if (page) {
*mode = RMQUEUE_STEAL;
return page;
@@ -5301,9 +5359,24 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
/*
* It's possible on a UMA machine to get through all zones that are
* fragmented. If avoiding fragmentation, reset and try again.
- */
- if (no_fallback && !defrag_mode) {
- alloc_flags &= ~ALLOC_NOFRAGMENT;
+ *
+ * For allocations that can do direct reclaim, keep NOFRAGMENT set
+ * and let the slowpath try reclaim and compaction to free pages in
+ * already-tainted superpageblocks before allowing clean SPBs to be
+ * tainted.
+ *
+ * Atomic allocations cannot reclaim, but try an intermediate step
+ * first: allow steal/claim from tainted SPBs only. This avoids
+ * tainting clean SPBs while still finding pages in tainted ones.
+ * Only drop NOFRAGMENT entirely if that also fails.
+ */
+ if (no_fallback && !defrag_mode &&
+ !(gfp_mask & __GFP_DIRECT_RECLAIM)) {
+ if (!(alloc_flags & ALLOC_NOFRAG_TAINTED_OK)) {
+ alloc_flags |= ALLOC_NOFRAG_TAINTED_OK;
+ goto retry;
+ }
+ alloc_flags &= ~(ALLOC_NOFRAGMENT | ALLOC_NOFRAG_TAINTED_OK);
goto retry;
}
--
2.52.0
Thread overview: 48+ messages
2026-04-30 20:20 [00/45 RFC PATCH] 1GB superpageblock memory allocation Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 01/45] mm: page_alloc: replace pageblock_flags bitmap with struct pageblock_data Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 02/45] mm: page_alloc: per-cpu pageblock buddy allocator Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 03/45] mm: page_alloc: use trylock for PCP lock in free path to avoid lock inversion Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 04/45] mm: mm_init: fix zone assignment for pages in unavailable ranges Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 05/45] mm: vmstat: restore per-migratetype free counts in /proc/pagetypeinfo Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 06/45] mm: page_alloc: remove watermark boost mechanism Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 07/45] mm: page_alloc: async evacuation of stolen movable pageblocks Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 08/45] mm: page_alloc: track actual page contents in pageblock flags Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 09/45] mm: page_alloc: introduce superpageblock metadata for 1GB anti-fragmentation Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 10/45] mm: page_alloc: support superpageblock resize for memory hotplug Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 11/45] mm: page_alloc: add superpageblock fullness lists for allocation steering Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 12/45] mm: page_alloc: steer pageblock stealing to tainted superpageblocks Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 13/45] mm: page_alloc: steer movable allocations to fullest clean superpageblocks Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 14/45] mm: page_alloc: extract claim_whole_block from try_to_claim_block Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 15/45] mm: page_alloc: add per-superpageblock free lists Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 16/45] mm: page_alloc: add background superpageblock defragmentation worker Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 17/45] mm: page_alloc: add within-superpageblock compaction for clean superpageblocks Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 18/45] mm: page_alloc: superpageblock-aware contiguous and higher order allocation Rik van Riel
2026-04-30 20:20 ` Rik van Riel [this message]
2026-04-30 20:20 ` [RFC PATCH 20/45] mm: page_alloc: aggressively pack non-movable allocations in tainted SPBs on large systems Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 21/45] mm: page_alloc: prefer reclaim over tainting clean superpageblocks Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 22/45] mm: page_alloc: adopt partial pageblocks from tainted superpageblocks Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 23/45] mm: page_alloc: add CONFIG_DEBUG_VM sanity checks for SPB counters Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 24/45] mm: page_alloc: targeted evacuation and dynamic reserves for tainted SPBs Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 25/45] mm: page_alloc: skip pageblock compatibility threshold in " Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 26/45] mm: page_alloc: prevent UNMOVABLE/RECLAIMABLE mixing in pageblocks Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 27/45] mm: trigger deferred SPB evacuation when atomic allocs would taint a clean SPB Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 28/45] mm: page_alloc: keep PCP refill in tainted SPBs across owned pageblocks Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 29/45] mm: page_alloc: refuse fragmenting fallback for callers with cheap fallback Rik van Riel
2026-04-30 20:20 ` [RFC PATCH 30/45] mm: page_alloc: drive slab shrink from SPB anti-fragmentation pressure Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 31/45] mm: page_alloc: cross-non-movable buddy borrow within tainted SPBs Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 32/45] mm: page_alloc: proactive high-water trigger for SPB slab shrink Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 33/45] mm: page_alloc: refuse to taint clean SPBs for atomic NORETRY callers Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 34/45] mm: page_reporting: walk per-superpageblock free lists Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 35/45] mm: show_mem: collect migratetype letters from per-superpageblock lists Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 36/45] mm: page_alloc: add alloc_flags parameter to __rmqueue_smallest Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 37/45] mm/slub: kvmalloc — add __GFP_NORETRY to large-kmalloc attempt Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 38/45] mm: page_alloc: per-(zone, order, mt) PASS_1 hint cache Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 39/45] mm: debug: prevent infinite recursion in dump_page() with CMA Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 40/45] PM: hibernate: walk per-superpageblock free lists in mark_free_pages Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 41/45] btrfs: allocate eb-attached btree pages as movable Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 42/45] mm: page_alloc: cross-MOV borrow within tainted SPBs Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 43/45] mm: page_alloc: trigger defrag from allocator hot path on tainted-SPB pressure Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 44/45] mm: page_alloc: SPB tracepoint instrumentation [DROP-FOR-UPSTREAM] Rik van Riel
2026-04-30 20:21 ` [RFC PATCH 45/45] mm: page_alloc: enlarge and unify spb_evacuate_for_order Rik van Riel
2026-05-01 7:14 ` [00/45 RFC PATCH] 1GB superpageblock memory allocation David Hildenbrand (Arm)
2026-05-01 11:58 ` Rik van Riel