From: Rik van Riel <riel@surriel.com>
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
    willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
    ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev,
    fvdl@google.com, Rik van Riel <riel@surriel.com>
Subject: [RFC PATCH 16/40] mm: compaction: walk per-superpageblock free lists for migration targets
Date: Wed, 20 May 2026 10:59:22 -0400
Message-ID: <20260520150018.2491267-17-riel@surriel.com>
In-Reply-To: <20260520150018.2491267-1-riel@surriel.com>

With superpageblocks (SPBs), free pages live on per-SPB free lists
rather than on the zone-level free lists. Standard compaction's free-page
scanner must therefore walk the per-SPB free lists to find migration
targets; otherwise kcompactd sees "nothing free" even when SPBs hold
plenty of order-9 buddies.

Also wire superpageblock_set_has_movable() and the corresponding clear
calls into the migration-source-isolation and free-page-isolation paths,
so that pageblock movability bookkeeping stays correct as compaction
moves pages around.

Fix the PB_has_movable check for zones whose start_pfn is not
pageblock-aligned (e.g. DMA32 with reserved memory at the bottom).

This is the compaction-side infrastructure for SPB-aware standard
compaction. Subsequent commits add the predicates that let kcompactd
skip useless tainted SPBs.

Signed-off-by: Rik van Riel <riel@surriel.com>
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
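Illustration only, not part of the patch: a minimal user-space sketch of
the walk pattern used in fast_isolate_freepages(),
fast_find_migrateblock() and __compact_finished(). It assumes a toy
model in which each free area is a set of counters rather than list
heads. Every toy_* name is hypothetical and exists only for this sketch;
the kernel types are richer. The point is the shape of the inner loop:
visit every SPB when nr_spb is non-zero, otherwise make exactly one pass
over the zone-level area.

```c
#include <stdbool.h>

#define TOY_NR_ORDERS 11	/* stand-in for NR_PAGE_ORDERS */

enum toy_mt { TOY_MOVABLE, TOY_UNMOVABLE, TOY_NR_MT };

/* Toy free area: counters only, where the kernel keeps list heads. */
struct toy_free_area {
	unsigned long nr_free;			/* all migratetypes */
	unsigned long per_mt[TOY_NR_MT];	/* stand-in for free_list[] */
};

struct toy_spb {
	struct toy_free_area free_area[TOY_NR_ORDERS];
};

struct toy_zone {
	/* With SPBs, only nr_free is kept current here; per_mt stays 0. */
	struct toy_free_area free_area[TOY_NR_ORDERS];
	struct toy_spb *spbs;		/* unused when nr_spb == 0 */
	unsigned long nr_spb;
};

/* True if a free block of order >= min_order and type mt exists. */
static bool toy_has_free_block(const struct toy_zone *zone,
			       unsigned int min_order, enum toy_mt mt)
{
	unsigned int order;
	unsigned long si;

	for (order = min_order; order < TOY_NR_ORDERS; order++) {
		/* Cheap skip: the zone-level count covers every SPB. */
		if (!zone->free_area[order].nr_free)
			continue;

		for (si = 0; ; si++) {
			const struct toy_free_area *area;

			if (zone->nr_spb) {
				if (si >= zone->nr_spb)
					break;
				area = &zone->spbs[si].free_area[order];
			} else {
				/* No SPBs: one pass over the zone area. */
				if (si > 0)
					break;
				area = &zone->free_area[order];
			}

			if (area->per_mt[mt])
				return true;
		}
	}
	return false;
}
```

In this model, a check that only looked at the zone-level per_mt[]
counters would report nothing free once pages have moved to SPB lists,
which is the failure mode the commit message describes.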
include/linux/mmzone.h | 1 +
mm/compaction.c | 337 ++++++++++++++++++++++++++++-------------
mm/page_alloc.c | 135 ++++++++++++-----
3 files changed, 330 insertions(+), 143 deletions(-)
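Illustration only, not part of the patch: a small user-space sketch of
the rule isolate_migratepages_block() applies after its scan loop when
deciding whether to re-set PB_has_movable. The struct and function names
(toy_scan_result, toy_should_reset_has_movable) are hypothetical; the
fields mirror the local variables used in the hunk, with
have_valid_page standing in for a non-NULL valid_page.

```c
#include <stdbool.h>

/* Snapshot of the scanner's state when the scan loop exits. */
struct toy_scan_result {
	bool pb_cleared;		/* bit was set and cleared up-front */
	bool have_valid_page;		/* stand-in for valid_page != NULL */
	unsigned long low_pfn;		/* where the scan stopped */
	unsigned long end_pfn;		/* where a full scan would stop */
	unsigned long nr_isolated;	/* pages taken for migration */
	bool movable_seen;		/* hugeTLB/THP/movable_ops/LRU seen */
};

/*
 * Return true if the bit should be set again. Only a complete scan that
 * isolated nothing and saw no movable content leaves the bit cleared.
 */
static bool toy_should_reset_has_movable(const struct toy_scan_result *r,
					 unsigned long pb_start_pfn,
					 unsigned long zone_start_pfn)
{
	if (!r->pb_cleared || !r->have_valid_page)
		return false;

	/* A full scan proved there is no movable content. */
	if (r->low_pfn == r->end_pfn && !r->nr_isolated && !r->movable_seen)
		return false;

	/* Skip pageblocks that start below an unaligned zone start. */
	return pb_start_pfn >= zone_start_pfn;
}
```

The conservative default is to keep the bit set: a partial scan, any
isolated page, or any observed movable content all count as evidence
that the pageblock may still hold movable pages.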
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6cba69603918..e7d760a689f9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1039,6 +1039,7 @@ struct superpageblock {
struct work_struct defrag_work;
struct irq_work defrag_irq_work;
bool defrag_active;
+ unsigned long defrag_cursor;
/*
* Back-off state after a no-op defrag pass: defer the next attempt
* until either nr_free_pages has grown by at least pageblock_nr_pages
diff --git a/mm/compaction.c b/mm/compaction.c
index 6d2aefdbc0c8..e4ba21072435 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -867,7 +867,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
bool skip_on_failure = false;
unsigned long next_skip_pfn = 0;
bool skip_updated = false;
- bool movable_skipped = false;
+ bool movable_seen = false;
+ bool pb_cleared = false;
int ret = 0;
cc->migrate_pfn = low_pfn;
@@ -964,6 +965,26 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
goto isolate_abort;
}
valid_page = page;
+
+ /*
+ * Clear PB_has_movable up-front. The scan below will
+ * re-set it if any movable page is encountered. This
+ * self-corrects stale bits left behind when movable
+ * content was previously freed without the bit being
+ * cleared (e.g. PB held both movable and unmovable
+ * pages, so mark_pageblock_free was never reached).
+ * A racing allocator that places a movable page in
+ * this PB will set the bit too; both setters are
+ * idempotent, so the bit ends up correctly set.
+ */
+ if (pageblock_start_pfn(start_pfn) >=
+ cc->zone->zone_start_pfn &&
+ get_pfnblock_bit(valid_page, low_pfn,
+ PB_has_movable)) {
+ superpageblock_clear_has_movable(cc->zone,
+ valid_page);
+ pb_cleared = true;
+ }
}
if (PageHuge(page)) {
@@ -979,12 +1000,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
low_pfn += (1UL << order) - 1;
nr_scanned += (1UL << order) - 1;
}
- /*
- * Skipped a movable page; clearing
- * PB_has_movable here would orphan SPB type
- * counters (debugfs invariant 1).
- */
- movable_skipped = true;
+ /* HugeTLB page is movable content. */
+ movable_seen = true;
goto isolate_fail;
}
/* for alloc_contig case */
@@ -1064,12 +1081,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
low_pfn += (1UL << order) - 1;
nr_scanned += (1UL << order) - 1;
}
- /*
- * Skipped a movable compound page; clearing
- * PB_has_movable here would orphan SPB type
- * counters (debugfs invariant 1).
- */
- movable_skipped = true;
+ /* THP/compound page is movable content. */
+ movable_seen = true;
goto isolate_fail;
}
}
@@ -1088,19 +1101,21 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
locked = NULL;
}
+ /* movable_ops page is movable content. */
+ movable_seen = true;
if (isolate_movable_ops_page(page, mode)) {
folio = page_folio(page);
goto isolate_success;
}
- movable_skipped = true;
}
/*
- * Non-LRU non-movable_ops page: still occupies the
- * pageblock, so clearing PB_has_movable here would
- * orphan SPB type counters (debugfs invariant 1).
+ * Non-LRU, non-movable_ops page (slab, pgtable,
+ * reserved, ...): not movable content. Do NOT mark
+ * the PB as having movable pages; if it had no other
+ * movable pages, the up-front clear of PB_has_movable
+ * stays in effect.
*/
- movable_skipped = true;
goto isolate_fail;
}
@@ -1113,6 +1128,14 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
if (unlikely(!folio))
goto isolate_fail;
+ /*
+ * LRU folio reference acquired: this PB definitely
+ * contains movable content. Mark it now so any abort
+ * before isolate_success/isolate_fail_put still
+ * triggers the post-loop PB_has_movable re-set.
+ */
+ movable_seen = true;
+
/*
* Migration will fail if an anonymous page is pinned in memory,
* so avoid taking lru_lock and isolating it unnecessarily in an
@@ -1266,7 +1289,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
lruvec_unlock_irqrestore(locked, flags);
locked = NULL;
}
- movable_skipped = true;
+ /* Page was LRU; treat as movable content even though we couldn't take it. */
+ movable_seen = true;
folio_put(folio);
isolate_fail:
@@ -1330,17 +1354,31 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
if (!cc->no_set_skip_hint && valid_page && !skip_updated)
set_pageblock_skip(valid_page);
update_cached_migrate(cc, low_pfn);
+ }
+
+ /*
+ * PB_has_movable was cleared up-front when this PB was first
+ * entered. Re-set it unless a complete scan of the pageblock
+ * proved no movable content exists. Re-setting is required on:
+ * - any partial scan (low_pfn != end_pfn): we can't conclude
+ * the PB is movable-free without seeing every PFN
+ * - nr_isolated > 0: pages may fail migration and return to
+ * this PB, so the bit must persist
+ * - movable_seen: hugeTLB/THP/movable_ops/LRU content was
+ * observed, even if it could not be isolated
+ * The set is idempotent (a racing allocator may set it too).
+ */
+ if (pb_cleared && valid_page &&
+ (low_pfn != end_pfn || nr_isolated || movable_seen)) {
+ unsigned long pb_pfn = pageblock_start_pfn(start_pfn);
/*
- * Full pageblock scanned with no movable pages isolated.
- * Only clear PB_has_movable if no movable pages were
- * seen at all. If movable pages exist but could not be
- * isolated (pinned, writeback, dirty, etc.), leave the
- * flag set so a future migration attempt can try again.
+ * start_pfn may not be pageblock-aligned when the zone
+ * start is not aligned (e.g. DMA zone at PFN 1). Skip
+ * the update if the pageblock start falls below the zone.
*/
- if (!nr_isolated && !movable_skipped && valid_page)
- superpageblock_clear_has_movable(cc->zone,
- valid_page);
+ if (pb_pfn >= cc->zone->zone_start_pfn)
+ superpageblock_set_has_movable(cc->zone, valid_page);
}
trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
@@ -1557,6 +1595,7 @@ static void fast_isolate_freepages(struct compact_control *cc)
unsigned long low_pfn, min_pfn, highest = 0;
unsigned long nr_isolated = 0;
unsigned long distance;
+ unsigned long si, nr_spb;
struct page *page = NULL;
bool scan_start = false;
int order;
@@ -1594,45 +1633,66 @@ static void fast_isolate_freepages(struct compact_control *cc)
for (order = cc->search_order;
!page && order >= 0;
order = next_search_order(cc, order)) {
- struct free_area *area = &cc->zone->free_area[order];
- struct list_head *freelist;
- struct page *freepage;
+ struct list_head *freelist = NULL;
+ struct page *freepage = NULL;
unsigned long flags;
unsigned int order_scanned = 0;
unsigned long high_pfn = 0;
- if (!area->nr_free)
+ if (!cc->zone->free_area[order].nr_free)
continue;
spin_lock_irqsave(&cc->zone->lock, flags);
- freelist = &area->free_list[MIGRATE_MOVABLE];
- list_for_each_entry_reverse(freepage, freelist, buddy_list) {
- unsigned long pfn;
-
- order_scanned++;
- nr_scanned++;
- pfn = page_to_pfn(freepage);
-
- if (pfn >= highest)
- highest = max(pageblock_start_pfn(pfn),
- cc->zone->zone_start_pfn);
-
- if (pfn >= low_pfn) {
- cc->fast_search_fail = 0;
- cc->search_order = order;
- page = freepage;
- break;
+
+ /*
+ * With superpageblocks, free pages live on per-SPB free
+ * lists rather than zone-level free lists. Iterate all
+ * SPBs to find candidate pages.
+ */
+ nr_spb = cc->zone->nr_superpageblocks;
+ for (si = 0; !page && order_scanned < limit; si++) {
+ struct free_area *area;
+
+ if (nr_spb) {
+ if (si >= nr_spb)
+ break;
+ area = &cc->zone->superpageblocks[si].free_area[order];
+ } else {
+ if (si > 0)
+ break;
+ area = &cc->zone->free_area[order];
}
- if (pfn >= min_pfn && pfn > high_pfn) {
- high_pfn = pfn;
+ freelist = &area->free_list[MIGRATE_MOVABLE];
+ list_for_each_entry_reverse(freepage,
+ freelist,
+ buddy_list) {
+ unsigned long pfn;
+
+ order_scanned++;
+ nr_scanned++;
+ pfn = page_to_pfn(freepage);
+
+ if (pfn >= highest)
+ highest = max(
+ pageblock_start_pfn(pfn),
+ cc->zone->zone_start_pfn);
+
+ if (pfn >= low_pfn) {
+ cc->fast_search_fail = 0;
+ cc->search_order = order;
+ page = freepage;
+ break;
+ }
- /* Shorten the scan if a candidate is found */
- limit >>= 1;
- }
+ if (pfn >= min_pfn && pfn > high_pfn) {
+ high_pfn = pfn;
+ limit >>= 1;
+ }
- if (order_scanned >= limit)
- break;
+ if (order_scanned >= limit)
+ break;
+ }
}
/* Use a maximum candidate pfn if a preferred one was not found */
@@ -1641,10 +1701,24 @@ static void fast_isolate_freepages(struct compact_control *cc)
/* Update freepage for the list reorder below */
freepage = page;
+
+ /*
+ * high_pfn page may be on a different SPB's list
+ * than the last one scanned; fix up freelist.
+ */
+ if (cc->zone->nr_superpageblocks) {
+ struct superpageblock *sb;
+
+ sb = pfn_to_superpageblock(cc->zone,
+ high_pfn);
+ if (sb)
+ freelist = &sb->free_area[order].free_list[MIGRATE_MOVABLE];
+ }
}
/* Reorder to so a future search skips recent pages */
- move_freelist_head(freelist, freepage);
+ if (freelist && freepage)
+ move_freelist_head(freelist, freepage);
/* Isolate the page if available */
if (page) {
@@ -1985,6 +2059,7 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
unsigned long distance;
unsigned long pfn = cc->migrate_pfn;
unsigned long high_pfn;
+ unsigned long si, nr_spb;
int order;
bool found_block = false;
@@ -2038,47 +2113,73 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
for (order = cc->order - 1;
order >= PAGE_ALLOC_COSTLY_ORDER && !found_block && nr_scanned < limit;
order--) {
- struct free_area *area = &cc->zone->free_area[order];
- struct list_head *freelist;
unsigned long flags;
struct page *freepage;
- if (!area->nr_free)
+ if (!cc->zone->free_area[order].nr_free)
continue;
spin_lock_irqsave(&cc->zone->lock, flags);
- freelist = &area->free_list[MIGRATE_MOVABLE];
- list_for_each_entry(freepage, freelist, buddy_list) {
- unsigned long free_pfn;
- if (nr_scanned++ >= limit) {
- move_freelist_tail(freelist, freepage);
- break;
+ /*
+ * With superpageblocks, free pages live on per-SPB free
+ * lists. Iterate all SPBs to find candidates.
+ */
+ nr_spb = cc->zone->nr_superpageblocks;
+ for (si = 0; !found_block && nr_scanned < limit; si++) {
+ struct free_area *area;
+ struct list_head *freelist;
+
+ if (nr_spb) {
+ if (si >= nr_spb)
+ break;
+ area = &cc->zone->superpageblocks[si].free_area[order];
+ } else {
+ if (si > 0)
+ break;
+ area = &cc->zone->free_area[order];
}
- free_pfn = page_to_pfn(freepage);
- if (free_pfn < high_pfn) {
- /*
- * Avoid if skipped recently. Ideally it would
- * move to the tail but even safe iteration of
- * the list assumes an entry is deleted, not
- * reordered.
- */
- if (get_pageblock_skip(freepage))
- continue;
-
- /* Reorder to so a future search skips recent pages */
- move_freelist_tail(freelist, freepage);
-
- update_fast_start_pfn(cc, free_pfn);
- pfn = pageblock_start_pfn(free_pfn);
- if (pfn < cc->zone->zone_start_pfn)
- pfn = cc->zone->zone_start_pfn;
- cc->fast_search_fail = 0;
- found_block = true;
- break;
+ freelist = &area->free_list[MIGRATE_MOVABLE];
+ list_for_each_entry(freepage, freelist,
+ buddy_list) {
+ unsigned long free_pfn;
+
+ if (nr_scanned++ >= limit) {
+ move_freelist_tail(freelist,
+ freepage);
+ break;
+ }
+
+ free_pfn = page_to_pfn(freepage);
+ if (free_pfn < high_pfn) {
+ /*
+ * Avoid if skipped recently.
+ * Ideally it would move to
+ * the tail but even safe
+ * iteration of the list
+ * assumes an entry is deleted,
+ * not reordered.
+ */
+ if (get_pageblock_skip(freepage))
+ continue;
+
+ move_freelist_tail(freelist,
+ freepage);
+
+ update_fast_start_pfn(cc,
+ free_pfn);
+ pfn = pageblock_start_pfn(
+ free_pfn);
+ if (pfn < cc->zone->zone_start_pfn)
+ pfn = cc->zone->zone_start_pfn;
+ cc->fast_search_fail = 0;
+ found_block = true;
+ break;
+ }
}
}
+
spin_unlock_irqrestore(&cc->zone->lock, flags);
}
@@ -2292,6 +2393,7 @@ static bool should_proactive_compact_node(pg_data_t *pgdat)
static enum compact_result __compact_finished(struct compact_control *cc)
{
unsigned int order;
+ unsigned long si, nr_spb;
const int migratetype = cc->migratetype;
int ret;
@@ -2364,33 +2466,56 @@ static enum compact_result __compact_finished(struct compact_control *cc)
/* Direct compactor: Is a suitable page free? */
ret = COMPACT_NO_SUITABLE_PAGE;
+ nr_spb = cc->zone->nr_superpageblocks;
for (order = cc->order; order < NR_PAGE_ORDERS; order++) {
- struct free_area *area = &cc->zone->free_area[order];
+ /* Zone-level nr_free is maintained even with SPBs */
+ if (!cc->zone->free_area[order].nr_free)
+ continue;
- /* Job done if page is free of the right migratetype */
- if (!free_area_empty(area, migratetype))
- return COMPACT_SUCCESS;
+ /*
+ * With superpageblocks, free pages live on per-SPB free
+ * lists. Check all SPBs for a suitable page.
+ */
+ for (si = 0; ; si++) {
+ struct free_area *area;
+
+ if (nr_spb) {
+ if (si >= nr_spb)
+ break;
+ area = &cc->zone->superpageblocks[si].free_area[order];
+ } else {
+ if (si > 0)
+ break;
+ area = &cc->zone->free_area[order];
+ }
+
+ /* Job done if page is free of the right migratetype */
+ if (!free_area_empty(area, migratetype))
+ return COMPACT_SUCCESS;
#ifdef CONFIG_CMA
- /* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
- if (migratetype == MIGRATE_MOVABLE &&
- !free_area_empty(area, MIGRATE_CMA))
- return COMPACT_SUCCESS;
+ /* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
+ if (migratetype == MIGRATE_MOVABLE &&
+ !free_area_empty(area, MIGRATE_CMA))
+ return COMPACT_SUCCESS;
#endif
- /*
- * Job done if allocation would steal freepages from
- * other migratetype buddy lists.
- */
- if (find_suitable_fallback(area, order, migratetype, true) >= 0)
/*
- * Movable pages are OK in any pageblock. If we are
- * stealing for a non-movable allocation, make sure
- * we finish compacting the current pageblock first
- * (which is assured by the above migrate_pfn align
- * check) so it is as free as possible and we won't
- * have to steal another one soon.
+ * Job done if allocation would steal freepages from
+ * other migratetype buddy lists.
*/
- return COMPACT_SUCCESS;
+ if (find_suitable_fallback(area, order, migratetype,
+ true) >= 0)
+ /*
+ * Movable pages are OK in any pageblock. If we
+ * are stealing for a non-movable allocation,
+ * make sure we finish compacting the current
+ * pageblock first (which is assured by the
+ * above migrate_pfn align check) so it is as
+ * free as possible and we won't have to steal
+ * another one soon.
+ */
+ return COMPACT_SUCCESS;
+ }
}
out:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 530ddc73e90a..3c11c8c5ce6a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8288,17 +8288,13 @@ static void evacuate_pageblock(struct zone *zone, unsigned long start_pfn,
* - Skip superpageblocks with no movable pages (nothing to evacuate)
*/
-/* Target free space: 3 pageblocks worth of free pages */
-#define SPB_DEFRAG_FREE_PAGES_TARGET (3UL * pageblock_nr_pages)
-
/**
* spb_needs_defrag - Check if a superpageblock needs defragmentation
* @sb: superpageblock to check (may be NULL)
*
- * Returns false for NULL, non-tainted, or clean superpageblocks.
- * A tainted superpageblock needs defrag if it has movable pages that can
- * be evacuated AND free space is running low (1 or fewer free
- * pageblocks, or less than 2 pageblocks worth of free pages).
+ * Defrag here is the per-SPB tainted-pool evacuation worker. Clean SPBs
+ * are handled by standard compaction (kcompactd) and do not return true
+ * from this predicate.
*/
/*
* Cooldown between defrag attempts that made no progress, in seconds.
@@ -8312,14 +8308,11 @@ static bool spb_needs_defrag(struct superpageblock *sb)
if (!sb)
return false;
- if (spb_get_category(sb) != SB_TAINTED)
- return false;
-
/*
* Back off if the previous pass made no progress: do not retry until
* either the cooldown elapses or free pages have grown by at least a
* pageblock's worth (a hint that there might be new material to
- * consolidate or evacuate).
+ * evacuate).
*/
if (sb->defrag_last_no_progress_jiffies &&
time_before(jiffies, sb->defrag_last_no_progress_jiffies +
@@ -8330,21 +8323,24 @@ static bool spb_needs_defrag(struct superpageblock *sb)
/*
* Tainted superpageblocks: evacuate movable pages to concentrate
- * unmovable/reclaimable allocations. Migration targets are
- * allocated system-wide, so no internal free space is needed.
- * Maintain the tainted reserve so unmovable claims always
- * find room in existing tainted superpageblocks.
+ * unmovable/reclaimable allocations. Maintain the tainted reserve
+ * so unmovable claims always find room in existing tainted
+ * superpageblocks.
*/
- return sb->nr_movable > 0 &&
- sb->nr_free < SPB_TAINTED_RESERVE;
+ if (spb_get_category(sb) == SB_TAINTED)
+ return sb->nr_movable > 0 &&
+ sb->nr_free < SPB_TAINTED_RESERVE;
+
+ /* Clean SPBs: kcompactd handles consolidation; nothing to do here. */
+ return false;
}
/**
- * spb_defrag_done - Check if defrag target has been reached
+ * spb_defrag_done - Check if defrag should stop
* @sb: superpageblock being defragmented
*
- * Stop defragmenting when the superpageblock has enough free space
- * or there are no more movable pages to evacuate.
+ * Only meaningful for tainted SPBs. Clean SPBs never reach this from
+ * the SPB defrag worker (spb_needs_defrag returns false for them).
*/
static bool spb_defrag_done(struct superpageblock *sb)
{
@@ -8353,49 +8349,112 @@ static bool spb_defrag_done(struct superpageblock *sb)
* the reserve of free pageblocks is restored, or until there
* are no more movable pages to evacuate.
*/
- return !sb->nr_movable ||
- sb->nr_free >= SPB_TAINTED_RESERVE;
+ if (spb_get_category(sb) == SB_TAINTED)
+ return !sb->nr_movable ||
+ sb->nr_free >= SPB_TAINTED_RESERVE;
+
+ /* Clean SPBs should not be handled here. */
+ return true;
+}
+
+static void spb_clear_skip_bits(struct superpageblock *sb)
+{
+ unsigned long pfn, end_pfn;
+ struct zone *zone = sb->zone;
+
+ end_pfn = sb->start_pfn + SUPERPAGEBLOCK_NR_PAGES;
+
+ for (pfn = sb->start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+ struct page *page;
+
+ if (!pfn_valid(pfn))
+ continue;
+ if (!zone_spans_pfn(zone, pfn))
+ continue;
+
+ page = pfn_to_page(pfn);
+ clear_pageblock_skip(page);
+ }
}
/**
- * spb_defrag_superpageblock - evacuate movable pages from a tainted superpageblock
+ * spb_defrag_tainted - evacuate movable pages from a tainted superpageblock
* @sb: the tainted superpageblock to defragment
*
* Find any pageblock with movable pages (PB_has_movable) and evacuate
* them, leaving only unmovable, reclaimable, and free pages behind.
* Stop when the free space target is reached.
*/
-static void spb_defrag_superpageblock(struct superpageblock *sb)
+static void spb_defrag_tainted(struct superpageblock *sb)
{
- unsigned long pfn, end_pfn;
+ unsigned long pfn, end_pfn, start_pfn, cursor;
struct zone *zone = sb->zone;
+ bool wrapped = false;
if (!sb->nr_movable)
return;
- end_pfn = sb->start_pfn + SUPERPAGEBLOCK_NR_PAGES;
+ start_pfn = sb->start_pfn;
+ end_pfn = start_pfn + SUPERPAGEBLOCK_NR_PAGES;
- for (pfn = sb->start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+ cursor = sb->defrag_cursor;
+ if (cursor < start_pfn || cursor >= end_pfn) {
+ cursor = start_pfn;
+ spb_clear_skip_bits(sb);
+ }
+
+ pfn = cursor;
+
+ while (pfn < end_pfn) {
struct page *page;
if (spb_defrag_done(sb))
- return;
+ goto out;
if (!pfn_valid(pfn))
- continue;
+ goto next;
+
+ if (!zone_spans_pfn(zone, pfn))
+ goto next;
page = pfn_to_page(pfn);
- /* Skip pageblocks without movable pages */
if (!get_pfnblock_bit(page, pfn, PB_has_movable))
- continue;
+ goto next;
- /* Skip if fully free -- nothing to evacuate */
if (get_pfnblock_bit(page, pfn, PB_all_free))
- continue;
+ goto next;
+
+ if (get_pageblock_skip(page))
+ goto next;
evacuate_pageblock(zone, pfn, true);
+next:
+ pfn += pageblock_nr_pages;
+ if (pfn >= end_pfn && !wrapped) {
+ spb_clear_skip_bits(sb);
+ pfn = start_pfn;
+ wrapped = true;
+ }
+ if (wrapped && pfn > cursor)
+ break;
}
+out:
+ sb->defrag_cursor = pfn;
+}
+
+/**
+ * spb_defrag_superpageblock - defragment a tainted superpageblock
+ * @sb: the superpageblock to defragment
+ *
+ * Tainted SPBs are evacuated by spb_defrag_tainted. Clean SPBs are
+ * handled by standard compaction (kcompactd) and never reach this
+ * dispatcher (spb_needs_defrag returns false for them).
+ */
+static void spb_defrag_superpageblock(struct superpageblock *sb)
+{
+ if (spb_get_category(sb) == SB_TAINTED)
+ spb_defrag_tainted(sb);
}
static void spb_defrag_work_fn(struct work_struct *work)
@@ -8455,10 +8514,12 @@ static void spb_defrag_irq_work_fn(struct irq_work *work)
* @sb: superpageblock whose counters just changed
*
* Called from counter update paths (under zone->lock). If the
- * superpageblock is tainted and running low on free space, schedule
- * irq_work to queue defrag work outside the allocator's lock context.
- * The irq_work handler is set up by pageblock_evacuate_init();
- * before that runs, defrag_irq_work.func is NULL and we skip.
+ * superpageblock needs defragmentation -- currently only evacuation of
+ * movable pages from a tainted superpageblock; clean superpageblocks
+ * are left to kcompactd -- schedule irq_work to queue defrag work outside
+ * the allocator's lock context. The irq_work handler is set up by
+ * pageblock_evacuate_init(); before that runs, defrag_irq_work.func
+ * is NULL and we skip.
*/
static void spb_maybe_start_defrag(struct superpageblock *sb)
{
--
2.54.0