From: Rik van Riel <riel@surriel.com>
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev,
fvdl@google.com, Rik van Riel <riel@surriel.com>
Subject: [RFC PATCH 19/40] mm: page_alloc: aggressively pack non-movable allocs in tainted SPBs on large systems
Date: Wed, 20 May 2026 10:59:25 -0400
Message-ID: <20260520150018.2491267-20-riel@surriel.com>
In-Reply-To: <20260520150018.2491267-1-riel@surriel.com>
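
On systems with many superpageblocks (SPBs), __rmqueue_claim() skipped
sub-pageblock MOVABLE fragments within already-tainted SPBs because of
the pageblock_order floor imposed by ALLOC_NOFRAGMENT. The allocator
then fell through to clean SPBs and tainted them unnecessarily.

Introduce SPB_AGGRESSIVE_THRESHOLD: on systems with more than 8
superpageblocks, drop the min_order floor for the preferred category
(tainted SPBs), so that non-movable allocations consume free space there
at any granularity. On small systems, keep the pageblock_order floor
to protect MOVABLE capacity within tainted SPBs.

In addition, when ALLOC_NOFRAGMENT is set, __rmqueue_claim() now skips
clean and empty SPBs entirely, and rmqueue_bulk() forces ALLOC_NOFRAGMENT
(and clears ALLOC_NOFRAG_TAINTED_OK) for the pageblock-sized refills of
non-movable, non-CMA migratetypes, so that a bulk refill cannot steal
whole pageblocks out of clean SPBs.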
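A standalone sketch of the decision the __rmqueue_claim() change makes: which superpageblock categories are searched, and down to which order. It assumes pageblock_order = 9 and MAX_PAGE_ORDER = 10 (x86-64 with 4KiB pages); the category enum, the ALLOC_NOFRAGMENT bit value, the nr_free[] array and both helper functions are hypothetical stand-ins, and the should_try_claim_block() filtering done by the real loop is omitted.

```c
#include <assert.h>

/*
 * Assumed values: pageblock_order = 9 and MAX_PAGE_ORDER = 10 match
 * x86-64 with 4KiB pages. The ALLOC_NOFRAGMENT bit value is made up.
 */
#define PAGEBLOCK_ORDER          9
#define MAX_PAGE_ORDER           10
#define SPB_AGGRESSIVE_THRESHOLD 8
#define ALLOC_NOFRAGMENT         0x100u

/* Hypothetical, simplified version of the cat_search[] categories. */
enum sb_search_sketch {
	SB_SEARCH_PREFERRED,	/* tainted SPBs, for non-movable requests */
	SB_SEARCH_CLEAN,	/* hypothetical: clean/empty SPBs */
};

/*
 * Lowest order to search within @cat, or -1 if @cat must be skipped.
 * Intended to mirror the min_order/nofrag_min_order logic in the patch.
 */
static int claim_min_order(enum sb_search_sketch cat, int order,
			   unsigned int alloc_flags, unsigned long nr_spbs)
{
	int nofrag_min_order = order;

	assert(order >= 0 && order <= MAX_PAGE_ORDER);

	/* ALLOC_NOFRAGMENT normally forbids sub-pageblock stealing. */
	if (order < PAGEBLOCK_ORDER && (alloc_flags & ALLOC_NOFRAGMENT))
		nofrag_min_order = PAGEBLOCK_ORDER;

	/* Under ALLOC_NOFRAGMENT, never search clean/empty SPBs. */
	if ((alloc_flags & ALLOC_NOFRAGMENT) && cat != SB_SEARCH_PREFERRED)
		return -1;

	/* Large systems: search tainted SPBs at any granularity. */
	if (cat == SB_SEARCH_PREFERRED && nr_spbs > SPB_AGGRESSIVE_THRESHOLD)
		return order;

	return nofrag_min_order;
}

/*
 * Walk from MAX_PAGE_ORDER down to @min_order, like the fallback loop,
 * and return the first order that has a free block, or -1 if none.
 * @nr_free[o] is a hypothetical count of free blocks of order o.
 */
static int largest_claimable_order(const unsigned long nr_free[MAX_PAGE_ORDER + 1],
				   int min_order)
{
	if (min_order < 0)
		return -1;
	for (int o = MAX_PAGE_ORDER; o >= min_order; o--)
		if (nr_free[o])
			return o;
	return -1;
}
```

For an order-0 unmovable request with ALLOC_NOFRAGMENT, where the tainted SPBs hold only a free order-3 fragment: with 8 superpageblocks claim_min_order() returns 9, so the walk never reaches order 3 and nothing is found in the tainted SPBs; with 64 superpageblocks it returns 0 and the walk finds the order-3 block.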
Signed-off-by: Rik van Riel <riel@surriel.com>
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
mm/page_alloc.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 68 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6884f638a97c..63151e99bd53 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2659,6 +2659,24 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
*/
#define SPB_TAINTED_RESERVE 4
+/*
+ * On systems with many superpageblocks, we can afford to "write off"
+ * tainted superpageblocks by aggressively packing unmovable/reclaimable
+ * allocations into them -- even sub-pageblock fragments -- to keep clean
+ * superpageblocks clean for future 1GB hugepage and contiguous allocations.
+ *
+ * On small systems (few superpageblocks), each SPB represents a large
+ * fraction of total memory. Aggressively claiming sub-pageblock movable
+ * fragments from tainted SPBs would destroy MOVABLE capacity that the
+ * system can't afford to lose, with little benefit since there are too
+ * few SPBs to meaningfully separate movable from unmovable anyway.
+ *
+ * This threshold controls the crossover: above it, prefer concentrating
+ * non-movable allocations in tainted SPBs at any granularity; below it,
+ * only claim whole free pageblocks from tainted SPBs.
+ */
+#define SPB_AGGRESSIVE_THRESHOLD 8
+
/**
* sb_preferred_for_movable - Find the fullest clean superpageblock for movable
* @zone: zone to search
@@ -3585,6 +3603,7 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
{
int current_order;
int min_order = order;
+ int nofrag_min_order = order;
struct page *page;
int fallback_mt;
static const unsigned int cat_search[] = {
@@ -3598,9 +3617,18 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
* Do not steal pages from freelists belonging to other pageblocks
* i.e. orders < pageblock_order. If there are no local zones free,
* the zonelists will be reiterated without ALLOC_NOFRAGMENT.
+ *
+ * On large systems (see SPB_AGGRESSIVE_THRESHOLD), do not apply
+ * this floor to already-tainted superpageblocks: claiming within
+ * them causes no new fragmentation, and skipping them wastes free
+ * space that could prevent tainting clean superpageblocks.
+ *
+ * When ALLOC_NOFRAGMENT is set, skip empty and clean superpageblocks
+ * entirely to avoid tainting them. The slowpath will try reclaim and
+ * compaction first, and only drop ALLOC_NOFRAGMENT as a last resort.
*/
if (order < pageblock_order && alloc_flags & ALLOC_NOFRAGMENT)
- min_order = pageblock_order;
+ nofrag_min_order = pageblock_order;
/*
* Find the largest available free page in a fallback migratetype.
@@ -3610,6 +3638,31 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
* ones.
*/
for (c = 0; c < ARRAY_SIZE(cat_search); c++) {
+ /*
+ * When avoiding fragmentation, do not search clean/empty
+ * superpageblocks for fallback pages. Tainting a clean SPB
+ * is the worst outcome -- better to fail and let the slowpath
+ * try reclaim and compaction in already-tainted SPBs first.
+ */
+ if ((alloc_flags & ALLOC_NOFRAGMENT) &&
+ cat_search[c] != SB_SEARCH_PREFERRED)
+ continue;
+
+ /*
+ * For the preferred category (tainted SPBs for non-movable),
+ * search all orders down to the allocation order on systems
+ * with enough superpageblocks that we can afford to write off
+ * tainted ones. These SPBs are already tainted, so sub-pageblock
+ * stealing doesn't cause additional fragmentation.
+ *
+ * On small systems, keep the pageblock_order floor to preserve
+ * MOVABLE capacity within tainted SPBs -- see comment at
+ * SPB_AGGRESSIVE_THRESHOLD.
+ */
+ min_order = (cat_search[c] == SB_SEARCH_PREFERRED &&
+ zone->nr_superpageblocks > SPB_AGGRESSIVE_THRESHOLD) ?
+ order : nofrag_min_order;
+
for (current_order = MAX_PAGE_ORDER;
current_order >= min_order; --current_order) {
if (!should_try_claim_block(current_order,
@@ -3881,8 +3934,18 @@ static bool rmqueue_bulk(struct zone *zone, unsigned int order,
* For movable allocations, prefer pageblocks from the
* fullest clean superpageblock to pack allocations and
* preserve empty superpageblocks for 1GB hugepages.
+ *
+ * For non-movable allocations, force ALLOC_NOFRAGMENT so
+ * __rmqueue cannot steal a whole pageblock out of a clean
+ * SPB. Stealing is the worst possible outcome for a bulk
+ * refill: a single network or slab burst can taint dozens
+ * of clean pageblocks. Phase 2 will adopt sub-pageblock
+ * fragments from tainted SPBs before Phase 3 falls back to
+ * the original alloc_flags (which may eventually steal at
+ * the requested order, a much smaller fragmentation event).
*/
while (refilled + pageblock_nr_pages <= pages_needed) {
+ unsigned int p1_alloc_flags = alloc_flags;
struct page *page = NULL;
if (migratetype == MIGRATE_MOVABLE) {
@@ -3892,11 +3955,14 @@ static bool rmqueue_bulk(struct zone *zone, unsigned int order,
if (sb)
page = __rmqueue_from_sb(zone, pageblock_order,
migratetype, sb);
+ } else if (!is_migrate_cma(migratetype)) {
+ p1_alloc_flags = (p1_alloc_flags | ALLOC_NOFRAGMENT) &
+ ~ALLOC_NOFRAG_TAINTED_OK;
}
if (!page)
page = __rmqueue(zone, pageblock_order,
migratetype,
- alloc_flags, &rmqm);
+ p1_alloc_flags, &rmqm);
if (!page)
break;
--
2.54.0
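The rmqueue_bulk() hunk changes only the flags used for the pageblock-sized refills in Phase 1. A minimal sketch of that flag computation follows; the bit values, the simplified migratetype enum and the helper name phase1_alloc_flags() are hypothetical (ALLOC_NOFRAG_TAINTED_OK is defined elsewhere in this series), and only the masking logic is taken from the diff.

```c
#include <assert.h>

/*
 * Hypothetical bit values for illustration only; the real ALLOC_* flags
 * are defined in the kernel, and ALLOC_NOFRAG_TAINTED_OK comes from
 * elsewhere in this series.
 */
#define ALLOC_NOFRAGMENT        0x100u
#define ALLOC_NOFRAG_TAINTED_OK 0x200u

/* Simplified migratetype list; not the kernel's enum migratetype. */
enum mt_sketch { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE, MT_CMA };

/*
 * Flags for one pageblock-sized refill in Phase 1 of rmqueue_bulk().
 * Movable and CMA refills keep the caller's flags; all other
 * migratetypes get ALLOC_NOFRAGMENT forced on and
 * ALLOC_NOFRAG_TAINTED_OK cleared. The caller's alloc_flags value is
 * not modified, so later phases still see the original flags.
 */
static unsigned int phase1_alloc_flags(enum mt_sketch mt,
				       unsigned int alloc_flags)
{
	if (mt == MT_MOVABLE || mt == MT_CMA)
		return alloc_flags;

	return (alloc_flags | ALLOC_NOFRAGMENT) & ~ALLOC_NOFRAG_TAINTED_OK;
}
```

Because the patch only modifies a local copy (p1_alloc_flags), a Phase 1 failure leaves Phase 3 free to fall back to the caller's original alloc_flags, as the comment in the hunk describes.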
Thread overview: 53+ messages
2026-05-20 14:59 [RFC PATCH 00/40] mm: reliable 1GB page allocation Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 01/40] mm: page_alloc: replace pageblock_flags bitmap with struct pageblock_data Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 02/40] mm: page_alloc: per-cpu pageblock buddy allocator Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 03/40] mm: page_alloc: split-path PCP free with local-trylock + remote-llist Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 04/40] mm: mm_init: fix zone assignment for pages in unavailable ranges Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 05/40] mm: page_alloc: remove watermark boost mechanism Rik van Riel
2026-05-26 14:02 ` Usama Arif
2026-05-27 15:41 ` Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 06/40] mm: page_alloc: async evacuation of stolen movable pageblocks Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 07/40] mm: page_alloc: track actual page contents in pageblock flags Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 08/40] mm: page_alloc: superpageblock metadata for 1GB anti-fragmentation Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 09/40] mm: page_alloc: support superpageblock resize for memory hotplug Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 10/40] mm: page_alloc: add superpageblock fullness lists for allocation steering Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 11/40] mm: page_alloc: steer pageblock stealing to tainted superpageblocks Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 12/40] mm: page_alloc: steer movable allocations to fullest clean superpageblocks Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 13/40] mm: page_alloc: extract claim_whole_block from try_to_claim_block Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 14/40] mm: page_alloc: add per-superpageblock free lists Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 15/40] mm: page_alloc: add background superpageblock defragmentation worker Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 16/40] mm: compaction: walk per-superpageblock free lists for migration targets Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 17/40] mm: page_alloc: superpageblock-aware contiguous and higher order allocation Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 18/40] mm: page_alloc: prevent atomic allocations from tainting clean SPBs Rik van Riel
2026-05-20 14:59 ` Rik van Riel [this message]
2026-05-20 14:59 ` [RFC PATCH 20/40] mm: page_alloc: prefer reclaim over tainting clean superpageblocks Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 21/40] mm: page_alloc: adopt partial pageblocks from tainted superpageblocks Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 22/40] mm: page_alloc: add CONFIG_DEBUG_VM sanity checks for SPB counters Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 23/40] mm: page_alloc: targeted evacuation and dynamic reserves for tainted SPBs Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 24/40] mm: page_alloc: prevent UNMOVABLE/RECLAIMABLE mixing in pageblocks Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 25/40] mm: trigger deferred SPB evac when atomic allocs would taint a clean SPB Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 26/40] mm: page_alloc: refuse fragmenting fallback for callers with cheap fallback Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 27/40] mm: page_alloc: cross-migratetype buddy borrow within tainted SPBs Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 28/40] mm: page_alloc: drive slab shrink from SPB anti-fragmentation pressure Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 29/40] mm: page_reporting: walk per-superpageblock free lists Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 30/40] mm: show_mem: collect migratetype letters from per-superpageblock lists Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 31/40] mm: page_alloc: per-(zone, order, mt) PASS_1 hint cache Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 32/40] mm: debug: prevent infinite recursion in dump_page() with CMA Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 33/40] PM: hibernate: walk per-superpageblock free lists in mark_free_pages Rik van Riel
2026-05-20 18:19 ` Rafael J. Wysocki
2026-05-20 14:59 ` [RFC PATCH 34/40] btrfs: allocate eb-attached btree pages as movable Rik van Riel
2026-05-20 17:47 ` Boris Burkov
2026-05-23 15:58 ` David Sterba
2026-05-24 1:43 ` Rik van Riel
2026-05-24 19:59 ` Matthew Wilcox
2026-05-25 6:57 ` Christoph Hellwig
2026-05-20 14:59 ` [RFC PATCH 35/40] mm: page_alloc: refuse best-effort high-order allocs servable at lower orders Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 36/40] mm: page_alloc: set ALLOC_NOFRAGMENT on alloc_frozen_pages_nolock_noprof Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 37/40] mm: page_alloc: move spb_get_category and spb_tainted_reserve to mmzone.h Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 38/40] mm: compaction: skip empty tainted superpageblocks as migration source Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 39/40] mm: compaction: respect tainted SPB reserve in destination selection Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 40/40] mm: page_alloc: SPB tracepoint instrumentation [DO-NOT-MERGE] Rik van Riel
2026-05-21 5:09 ` kernel test robot
2026-05-21 7:39 ` [syzbot ci] Re: mm: reliable 1GB page allocation syzbot ci
2026-05-22 11:02 ` [RFC PATCH 00/40] " Usama Arif
2026-05-22 13:55 ` Rik van Riel