From: Rik van Riel <riel@surriel.com>
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
	willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
	ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev,
	Rik van Riel <riel@meta.com>, Rik van Riel <riel@surriel.com>
Subject: [RFC PATCH 17/45] mm: page_alloc: add within-superpageblock compaction for clean superpageblocks
Date: Thu, 30 Apr 2026 16:20:46 -0400	[thread overview]
Message-ID: <20260430202233.111010-18-riel@surriel.com> (raw)
In-Reply-To: <20260430202233.111010-1-riel@surriel.com>

From: Rik van Riel <riel@meta.com>

Extend the superpageblock defragmentation framework to handle clean
superpageblocks in addition to tainted ones. While tainted superpageblock
defrag evacuates movable pages out to free up pageblocks, clean
superpageblock compaction migrates pages *within* the same superpageblock
to consolidate scattered free pages into whole free pageblocks.
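
The clean-SPB pass described above can be sketched roughly as follows. This is an illustrative stand-in, not the kernel code: the struct, sizes, and helper names are all simplified assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy per-pageblock state standing in for the pageblock flag bits. */
struct pb {
	bool all_free;   /* PB_all_free: whole pageblock already free */
	bool skip;       /* pageblock_skip: compaction failed recently */
	bool compacted;  /* set when we "migrate" its pages away */
};

/* Stop once two whole free pageblocks exist (the patch's nr_free >= 2). */
static bool spb_done(struct pb *pbs, size_t n)
{
	size_t i, nr_free = 0;

	for (i = 0; i < n; i++)
		nr_free += pbs[i].all_free;
	return nr_free >= 2;
}

/* Walk the pageblocks of one superpageblock, compacting candidates. */
static void spb_compact_clean(struct pb *pbs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (spb_done(pbs, n))
			return;
		if (pbs[i].all_free || pbs[i].skip)
			continue;  /* nothing to gain, or failed before */
		/* stand-in for compact_pageblock_in_spb() */
		pbs[i].compacted = true;
	}
}
```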

The key components:

- spb_needs_defrag() and spb_defrag_done() now handle both categories:
  tainted SPBs keep the existing nr_movable and SPB_TAINTED_RESERVE
  checks, while clean SPBs trigger on fewer than 2 whole free pageblocks
  combined with at least SPB_DEFRAG_FREE_PAGES_TARGET free pages.

- spb_defrag_superpageblock() becomes a dispatcher that calls either
  spb_defrag_tainted() (existing evacuation logic) or
  spb_defrag_clean() (new compaction logic).

- spb_defrag_clean() scans pageblocks in the superpageblock, skipping
  fully-free (PB_all_free) and recently-skipped (pageblock_skip)
  pageblocks, and calls compact_pageblock_in_spb() on candidates;
  PCP-cached pages are skipped per-page during isolation.

- compact_pageblock_in_spb() uses the same isolate/migrate loop pattern as
  evacuate_pageblock(), but with a custom migration target allocator
  (alloc_spb_compaction_target) that allocates pages exclusively from the
  superpageblock's own free lists.
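
The two-category threshold logic can be sketched as below. The reduced struct and the value of SPB_TAINTED_RESERVE are assumptions for illustration; the predicates mirror the patch.

```c
#include <stdbool.h>

#define SPB_TAINTED_RESERVE   2UL          /* assumed value */
#define SPB_NR_PAGES          (1UL << 18)  /* 1GB of 4KB pages, assumed */
#define SPB_FREE_PAGES_TARGET (SPB_NR_PAGES / 4)

enum spb_category { SB_CLEAN, SB_TAINTED };

/* Reduced stand-in for struct superpageblock. */
struct spb_state {
	enum spb_category category;
	unsigned long nr_movable;     /* movable pages in the SPB */
	unsigned long nr_free;        /* whole free pageblocks */
	unsigned long nr_free_pages;  /* total free pages */
};

static bool spb_needs_defrag(const struct spb_state *sb)
{
	if (sb->category == SB_TAINTED)
		return sb->nr_movable > 0 &&
		       sb->nr_free < SPB_TAINTED_RESERVE;
	/* Clean: plenty of free pages, but few whole free pageblocks. */
	return sb->nr_free < 2 &&
	       sb->nr_free_pages >= SPB_FREE_PAGES_TARGET;
}

static bool spb_defrag_done(const struct spb_state *sb)
{
	if (sb->category == SB_TAINTED)
		return !sb->nr_movable ||
		       sb->nr_free >= SPB_TAINTED_RESERVE;
	return sb->nr_free >= 2 ||
	       sb->nr_free_pages < SPB_FREE_PAGES_TARGET;
}
```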

Also make the compaction code superpageblock-aware:

- Search per-superpageblock free lists instead of zone free lists for
  migration targets, since with SPBs enabled all pages live on per-
  superpageblock free lists.

- Fix PB_has_movable check for zones with non-aligned start PFNs by using
  zone_start_pfn for pageblock boundary checks.
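
A minimal illustration of the alignment issue fixed here; the helper names and the pageblock order are assumptions, not the kernel definitions.

```c
#define PAGEBLOCK_ORDER    9   /* typical for x86-64 */
#define PAGEBLOCK_NR_PAGES (1UL << PAGEBLOCK_ORDER)

static unsigned long pageblock_start_pfn(unsigned long pfn)
{
	return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/*
 * With a zone starting at PFN 1 (e.g. ZONE_DMA), the pageblock
 * containing PFN 1 nominally starts at PFN 0, below the zone, so any
 * per-pageblock state update must be skipped for it.
 */
static int pb_update_allowed(unsigned long pfn, unsigned long zone_start_pfn)
{
	return pageblock_start_pfn(pfn) >= zone_start_pfn;
}
```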

Signed-off-by: Rik van Riel <riel@surriel.com>
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
 include/linux/mmzone.h |   1 +
 mm/compaction.c        | 272 ++++++++++++++++++++++----------
 mm/page_alloc.c        | 343 +++++++++++++++++++++++++++++++++++++----
 3 files changed, 501 insertions(+), 115 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 61fe939e7c0f..ba6f08295ff9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -942,6 +942,7 @@ struct superpageblock {
 	struct work_struct	defrag_work;
 	struct irq_work		defrag_irq_work;
 	bool			defrag_active;
+	unsigned long		defrag_cursor;
 	/*
 	 * Back-off state after a no-op defrag pass: defer the next attempt
 	 * until either nr_free_pages has grown by at least pageblock_nr_pages
diff --git a/mm/compaction.c b/mm/compaction.c
index 88ba88340f3b..0e9b4b3ca59b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1321,9 +1321,19 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		 * isolated (pinned, writeback, dirty, etc.), leave the
 		 * flag set so a future migration attempt can try again.
 		 */
-		if (!nr_isolated && !movable_skipped && valid_page)
-			superpageblock_clear_has_movable(cc->zone,
-							valid_page);
+		if (!nr_isolated && !movable_skipped && valid_page) {
+			unsigned long pb_pfn = pageblock_start_pfn(start_pfn);
+
+			/*
+			 * start_pfn may not be pageblock-aligned when the
+			 * zone start is not aligned (e.g. DMA zone at PFN 1).
+			 * Skip the PB_has_movable update if the pageblock
+			 * start falls below the zone.
+			 */
+			if (pb_pfn >= cc->zone->zone_start_pfn)
+				superpageblock_clear_has_movable(cc->zone,
+								valid_page);
+		}
 	}
 
 	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
@@ -1577,45 +1587,70 @@ static void fast_isolate_freepages(struct compact_control *cc)
 	for (order = cc->search_order;
 	     !page && order >= 0;
 	     order = next_search_order(cc, order)) {
-		struct free_area *area = &cc->zone->free_area[order];
-		struct list_head *freelist;
-		struct page *freepage;
+		struct list_head *freelist = NULL;
+		struct page *freepage = NULL;
 		unsigned long flags;
 		unsigned int order_scanned = 0;
 		unsigned long high_pfn = 0;
 
-		if (!area->nr_free)
+		if (!cc->zone->free_area[order].nr_free)
 			continue;
 
 		spin_lock_irqsave(&cc->zone->lock, flags);
-		freelist = &area->free_list[MIGRATE_MOVABLE];
-		list_for_each_entry_reverse(freepage, freelist, buddy_list) {
-			unsigned long pfn;
-
-			order_scanned++;
-			nr_scanned++;
-			pfn = page_to_pfn(freepage);
-
-			if (pfn >= highest)
-				highest = max(pageblock_start_pfn(pfn),
-					      cc->zone->zone_start_pfn);
-
-			if (pfn >= low_pfn) {
-				cc->fast_search_fail = 0;
-				cc->search_order = order;
-				page = freepage;
-				break;
-			}
 
-			if (pfn >= min_pfn && pfn > high_pfn) {
-				high_pfn = pfn;
+		/*
+		 * With superpageblocks, free pages live on per-SPB free
+		 * lists rather than zone-level free lists.  Iterate all
+		 * SPBs to find candidate pages.
+		 */
+		{
+			struct zone *zone = cc->zone;
+			unsigned long si, nr_spb = zone->nr_superpageblocks;
+
+			for (si = 0; !page && order_scanned < limit; si++) {
+				struct free_area *area;
+
+				if (nr_spb) {
+					if (si >= nr_spb)
+						break;
+					area = &zone->superpageblocks[si].free_area[order];
+				} else {
+					if (si > 0)
+						break;
+					area = &zone->free_area[order];
+				}
 
-				/* Shorten the scan if a candidate is found */
-				limit >>= 1;
+				freelist = &area->free_list[MIGRATE_MOVABLE];
+				list_for_each_entry_reverse(freepage,
+							    freelist,
+							    buddy_list) {
+					unsigned long pfn;
+
+					order_scanned++;
+					nr_scanned++;
+					pfn = page_to_pfn(freepage);
+
+					if (pfn >= highest)
+						highest = max(
+						    pageblock_start_pfn(pfn),
+						    zone->zone_start_pfn);
+
+					if (pfn >= low_pfn) {
+						cc->fast_search_fail = 0;
+						cc->search_order = order;
+						page = freepage;
+						break;
+					}
+
+					if (pfn >= min_pfn && pfn > high_pfn) {
+						high_pfn = pfn;
+						limit >>= 1;
+					}
+
+					if (order_scanned >= limit)
+						break;
+				}
 			}
-
-			if (order_scanned >= limit)
-				break;
 		}
 
 		/* Use a maximum candidate pfn if a preferred one was not found */
@@ -1624,10 +1659,24 @@ static void fast_isolate_freepages(struct compact_control *cc)
 
 			/* Update freepage for the list reorder below */
 			freepage = page;
+
+			/*
+			 * high_pfn page may be on a different SPB's list
+			 * than the last one scanned; fix up freelist.
+			 */
+			if (cc->zone->nr_superpageblocks) {
+				struct superpageblock *sb;
+
+				sb = pfn_to_superpageblock(cc->zone,
+							   high_pfn);
+				if (sb)
+					freelist = &sb->free_area[order].free_list[MIGRATE_MOVABLE];
+			}
 		}
 
 		/* Reorder to so a future search skips recent pages */
-		move_freelist_head(freelist, freepage);
+		if (freelist && freepage)
+			move_freelist_head(freelist, freepage);
 
 		/* Isolate the page if available */
 		if (page) {
@@ -2021,47 +2070,77 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 	for (order = cc->order - 1;
 	     order >= PAGE_ALLOC_COSTLY_ORDER && !found_block && nr_scanned < limit;
 	     order--) {
-		struct free_area *area = &cc->zone->free_area[order];
-		struct list_head *freelist;
 		unsigned long flags;
 		struct page *freepage;
 
-		if (!area->nr_free)
+		if (!cc->zone->free_area[order].nr_free)
 			continue;
 
 		spin_lock_irqsave(&cc->zone->lock, flags);
-		freelist = &area->free_list[MIGRATE_MOVABLE];
-		list_for_each_entry(freepage, freelist, buddy_list) {
-			unsigned long free_pfn;
 
-			if (nr_scanned++ >= limit) {
-				move_freelist_tail(freelist, freepage);
-				break;
-			}
+		/*
+		 * With superpageblocks, free pages live on per-SPB free
+		 * lists.  Iterate all SPBs to find candidates.
+		 */
+		{
+			struct zone *zone = cc->zone;
+			unsigned long si, nr_spb = zone->nr_superpageblocks;
+
+			for (si = 0; !found_block && nr_scanned < limit; si++) {
+				struct free_area *area;
+				struct list_head *freelist;
+
+				if (nr_spb) {
+					if (si >= nr_spb)
+						break;
+					area = &zone->superpageblocks[si].free_area[order];
+				} else {
+					if (si > 0)
+						break;
+					area = &zone->free_area[order];
+				}
 
-			free_pfn = page_to_pfn(freepage);
-			if (free_pfn < high_pfn) {
-				/*
-				 * Avoid if skipped recently. Ideally it would
-				 * move to the tail but even safe iteration of
-				 * the list assumes an entry is deleted, not
-				 * reordered.
-				 */
-				if (get_pageblock_skip(freepage))
-					continue;
-
-				/* Reorder to so a future search skips recent pages */
-				move_freelist_tail(freelist, freepage);
-
-				update_fast_start_pfn(cc, free_pfn);
-				pfn = pageblock_start_pfn(free_pfn);
-				if (pfn < cc->zone->zone_start_pfn)
-					pfn = cc->zone->zone_start_pfn;
-				cc->fast_search_fail = 0;
-				found_block = true;
-				break;
+				freelist = &area->free_list[MIGRATE_MOVABLE];
+				list_for_each_entry(freepage, freelist,
+						    buddy_list) {
+					unsigned long free_pfn;
+
+					if (nr_scanned++ >= limit) {
+						move_freelist_tail(freelist,
+								   freepage);
+						break;
+					}
+
+					free_pfn = page_to_pfn(freepage);
+					if (free_pfn < high_pfn) {
+						/*
+						 * Avoid if skipped recently.
+						 * Ideally it would move to
+						 * the tail but even safe
+						 * iteration of the list
+						 * assumes an entry is deleted,
+						 * not reordered.
+						 */
+						if (get_pageblock_skip(freepage))
+							continue;
+
+						move_freelist_tail(freelist,
+								   freepage);
+
+						update_fast_start_pfn(cc,
+								      free_pfn);
+						pfn = pageblock_start_pfn(
+								free_pfn);
+						if (pfn < zone->zone_start_pfn)
+							pfn = zone->zone_start_pfn;
+						cc->fast_search_fail = 0;
+						found_block = true;
+						break;
+					}
+				}
 			}
 		}
+
 		spin_unlock_irqrestore(&cc->zone->lock, flags);
 	}
 
@@ -2348,32 +2427,57 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 	/* Direct compactor: Is a suitable page free? */
 	ret = COMPACT_NO_SUITABLE_PAGE;
 	for (order = cc->order; order < NR_PAGE_ORDERS; order++) {
-		struct free_area *area = &cc->zone->free_area[order];
+		struct zone *zone = cc->zone;
+		unsigned long si, nr_spb = zone->nr_superpageblocks;
 
-		/* Job done if page is free of the right migratetype */
-		if (!free_area_empty(area, migratetype))
-			return COMPACT_SUCCESS;
+		/* Zone-level nr_free is maintained even with SPBs */
+		if (!zone->free_area[order].nr_free)
+			continue;
 
-#ifdef CONFIG_CMA
-		/* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
-		if (migratetype == MIGRATE_MOVABLE &&
-			!free_area_empty(area, MIGRATE_CMA))
-			return COMPACT_SUCCESS;
-#endif
 		/*
-		 * Job done if allocation would steal freepages from
-		 * other migratetype buddy lists.
+		 * With superpageblocks, free pages live on per-SPB free
+		 * lists.  Check all SPBs for a suitable page.
 		 */
-		if (find_suitable_fallback(area, order, migratetype, true) >= 0)
+		for (si = 0; ; si++) {
+			struct free_area *area;
+
+			if (nr_spb) {
+				if (si >= nr_spb)
+					break;
+				area = &zone->superpageblocks[si].free_area[order];
+			} else {
+				if (si > 0)
+					break;
+				area = &zone->free_area[order];
+			}
+
+			/* Job done if page is free of the right migratetype */
+			if (!free_area_empty(area, migratetype))
+				return COMPACT_SUCCESS;
+
+#ifdef CONFIG_CMA
+			/* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
+			if (migratetype == MIGRATE_MOVABLE &&
+				!free_area_empty(area, MIGRATE_CMA))
+				return COMPACT_SUCCESS;
+#endif
 			/*
-			 * Movable pages are OK in any pageblock. If we are
-			 * stealing for a non-movable allocation, make sure
-			 * we finish compacting the current pageblock first
-			 * (which is assured by the above migrate_pfn align
-			 * check) so it is as free as possible and we won't
-			 * have to steal another one soon.
+			 * Job done if allocation would steal freepages from
+			 * other migratetype buddy lists.
 			 */
-			return COMPACT_SUCCESS;
+			if (find_suitable_fallback(area, order, migratetype,
+						   true) >= 0)
+				/*
+				 * Movable pages are OK in any pageblock. If we
+				 * are stealing for a non-movable allocation,
+				 * make sure we finish compacting the current
+				 * pageblock first (which is assured by the
+				 * above migrate_pfn align check) so it is as
+				 * free as possible and we won't have to steal
+				 * another one soon.
+				 */
+				return COMPACT_SUCCESS;
+		}
 	}
 
 out:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 07d2926ffb3d..54b9a69bda10 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8199,17 +8199,23 @@ static void evacuate_pageblock(struct zone *zone, unsigned long start_pfn,
  * - Skip superpageblocks with no movable pages (nothing to evacuate)
  */
 
-/* Target free space: 3 pageblocks worth of free pages */
-#define SPB_DEFRAG_FREE_PAGES_TARGET	(3UL * pageblock_nr_pages)
+/*
+ * Target free space for clean SPB internal compaction: at least a quarter
+ * of the superpageblock must be free before we attempt to consolidate
+ * scattered free pages into whole free pageblocks. Below this threshold
+ * the work-to-payoff ratio is poor — we walk the whole SPB and migrate
+ * a handful of pages without producing a usable free pageblock.
+ */
+#define SPB_DEFRAG_FREE_PAGES_TARGET	(SUPERPAGEBLOCK_NR_PAGES / 4)
 
 /**
  * spb_needs_defrag - Check if a superpageblock needs defragmentation
  * @sb: superpageblock to check (may be NULL)
  *
- * Returns false for NULL, non-tainted, or clean superpageblocks.
- * A tainted superpageblock needs defrag if it has movable pages that can
- * be evacuated AND free space is running low (1 or fewer free
- * pageblocks, or less than 2 pageblocks worth of free pages).
+ * For tainted superpageblocks: defrag is needed when there are movable
+ * pageblocks that can be evacuated AND free space is running low.
+ * For clean superpageblocks: compaction is needed when free pages are
+ * scattered (plenty of free pages but few whole free pageblocks).
  */
 /*
  * Cooldown between defrag attempts that made no progress, in seconds.
@@ -8223,9 +8229,6 @@ static bool spb_needs_defrag(struct superpageblock *sb)
 	if (!sb)
 		return false;
 
-	if (spb_get_category(sb) != SB_TAINTED)
-		return false;
-
 	/*
 	 * Back off if the previous pass made no progress: do not retry until
 	 * either the cooldown elapses or free pages have grown by at least a
@@ -8246,16 +8249,30 @@ static bool spb_needs_defrag(struct superpageblock *sb)
 	 * Maintain the tainted reserve so unmovable claims always
 	 * find room in existing tainted superpageblocks.
 	 */
-	return sb->nr_movable > 0 &&
-	       sb->nr_free < SPB_TAINTED_RESERVE;
+	if (spb_get_category(sb) == SB_TAINTED)
+		return sb->nr_movable > 0 &&
+		       sb->nr_free < SPB_TAINTED_RESERVE;
+
+	/*
+	 * Clean superpageblocks: compact scattered free pages into whole
+	 * free pageblocks.  Needs internal free space as destination.
+	 */
+	if (sb->nr_free >= 2)
+		return false;
+
+	if (sb->nr_free_pages < SPB_DEFRAG_FREE_PAGES_TARGET)
+		return false;
+
+	return true;
 }
 
 /**
- * spb_defrag_done - Check if defrag target has been reached
+ * spb_defrag_done - Check if defrag/compaction should stop
  * @sb: superpageblock being defragmented
  *
- * Stop defragmenting when the superpageblock has enough free space
- * or there are no more movable pages to evacuate.
+ * Stop when the superpageblock has enough free pageblocks, when free
+ * pages drop too low to be worth continuing, or (for tainted
+ * superpageblocks) when there are no more movable pages to evacuate.
  */
 static bool spb_defrag_done(struct superpageblock *sb)
 {
@@ -8264,49 +8281,311 @@ static bool spb_defrag_done(struct superpageblock *sb)
 	 * the reserve of free pageblocks is restored, or until there
 	 * are no more movable pages to evacuate.
 	 */
-	return !sb->nr_movable ||
-	       sb->nr_free >= SPB_TAINTED_RESERVE;
+	if (spb_get_category(sb) == SB_TAINTED)
+		return !sb->nr_movable ||
+		       sb->nr_free >= SPB_TAINTED_RESERVE;
+
+	/* Clean superpageblocks: stop when enough free pageblocks exist */
+	if (sb->nr_free >= 2)
+		return true;
+
+	if (sb->nr_free_pages < SPB_DEFRAG_FREE_PAGES_TARGET)
+		return true;
+
+	return false;
+}
+
+static void spb_clear_skip_bits(struct superpageblock *sb)
+{
+	unsigned long pfn, end_pfn;
+	struct zone *zone = sb->zone;
+
+	end_pfn = sb->start_pfn + SUPERPAGEBLOCK_NR_PAGES;
+
+	for (pfn = sb->start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+		struct page *page;
+
+		if (!pfn_valid(pfn))
+			continue;
+		if (!zone_spans_pfn(zone, pfn))
+			continue;
+
+		page = pfn_to_page(pfn);
+		clear_pageblock_skip(page);
+	}
 }
 
 /**
- * spb_defrag_superpageblock - evacuate movable pages from a tainted superpageblock
+ * spb_defrag_tainted - evacuate movable pages from a tainted superpageblock
  * @sb: the tainted superpageblock to defragment
  *
  * Find any pageblock with movable pages (PB_has_movable) and evacuate
  * them, leaving only unmovable, reclaimable, and free pages behind.
  * Stop when the free space target is reached.
  */
-static void spb_defrag_superpageblock(struct superpageblock *sb)
+static void spb_defrag_tainted(struct superpageblock *sb)
 {
-	unsigned long pfn, end_pfn;
+	unsigned long pfn, end_pfn, start_pfn, cursor;
 	struct zone *zone = sb->zone;
+	bool wrapped = false;
 
 	if (!sb->nr_movable)
 		return;
 
-	end_pfn = sb->start_pfn + SUPERPAGEBLOCK_NR_PAGES;
+	start_pfn = sb->start_pfn;
+	end_pfn = start_pfn + SUPERPAGEBLOCK_NR_PAGES;
 
-	for (pfn = sb->start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+	cursor = sb->defrag_cursor;
+	if (cursor < start_pfn || cursor >= end_pfn) {
+		cursor = start_pfn;
+		spb_clear_skip_bits(sb);
+	}
+
+	pfn = cursor;
+
+	while (pfn < end_pfn) {
 		struct page *page;
 
 		if (spb_defrag_done(sb))
-			return;
+			goto out;
 
 		if (!pfn_valid(pfn))
-			continue;
+			goto next;
+
+		if (!zone_spans_pfn(zone, pfn))
+			goto next;
 
 		page = pfn_to_page(pfn);
 
-		/* Skip pageblocks without movable pages */
 		if (!get_pfnblock_bit(page, pfn, PB_has_movable))
-			continue;
+			goto next;
 
-		/* Skip if fully free — nothing to evacuate */
 		if (get_pfnblock_bit(page, pfn, PB_all_free))
-			continue;
+			goto next;
+
+		if (get_pageblock_skip(page))
+			goto next;
 
 		evacuate_pageblock(zone, pfn, true);
+next:
+		pfn += pageblock_nr_pages;
+		if (pfn >= end_pfn && !wrapped) {
+			spb_clear_skip_bits(sb);
+			pfn = start_pfn;
+			wrapped = true;
+		}
+		if (wrapped && pfn > cursor)
+			break;
+	}
+out:
+	sb->defrag_cursor = pfn;
+}
+
+/*
+ * Within-superpageblock compaction: migrate pages from partially-used
+ * pageblocks into free space within the same superpageblock, consolidating
+ * scattered free pages into whole free pageblocks.
+ */
+
+struct spb_compaction_control {
+	struct superpageblock	*sb;
+	struct zone		*zone;
+};
+
+/*
+ * alloc_spb_compaction_target - allocate a migration target page from
+ * within the same superpageblock's free lists.
+ *
+ * This is a custom migration target allocator that restricts allocations
+ * to the superpageblock being compacted, ensuring pages stay within the SB.
+ */
+static struct folio *alloc_spb_compaction_target(struct folio *src,
+		unsigned long private)
+{
+	struct spb_compaction_control *scc =
+		(struct spb_compaction_control *)private;
+	struct superpageblock *sb = scc->sb;
+	struct zone *zone = scc->zone;
+	int src_order = folio_order(src);
+	int order = src_order;
+	int migratetype = MIGRATE_MOVABLE;
+	struct free_area *area;
+	struct page *target;
+
+	spin_lock_irq(&zone->lock);
+
+	area = &sb->free_area[order];
+	target = get_page_from_free_area(area, migratetype);
+	if (!target) {
+		/* Try to split a higher-order block within this SB */
+		for (order = src_order + 1; order < NR_PAGE_ORDERS; order++) {
+			area = &sb->free_area[order];
+			target = get_page_from_free_area(area, migratetype);
+			if (target)
+				break;
+		}
+	}
+
+	if (target)
+		page_del_and_expand(zone, target, src_order, order, migratetype);
+
+	spin_unlock_irq(&zone->lock);
+
+	if (!target)
+		return NULL;
+
+	prep_new_page(target, src_order, __GFP_MOVABLE | __GFP_COMP, 0);
+	set_page_refcounted(target);
+	return page_rmappable_folio(target);
+}
+
+static void free_spb_compaction_target(struct folio *folio,
+		unsigned long private)
+{
+	folio_put(folio);
+}
+
+/*
+ * compact_pageblock_in_spb - migrate pages from a partially-used pageblock
+ * into free space within the same superpageblock.
+ *
+ * Similar to evacuate_pageblock() but uses the within-SB allocator
+ * so pages stay inside the superpageblock being compacted.
+ */
+static void compact_pageblock_in_spb(struct superpageblock *sb,
+				    struct zone *zone,
+				    unsigned long start_pfn)
+{
+	unsigned long end_pfn = start_pfn + pageblock_nr_pages;
+	unsigned long pfn = start_pfn;
+	int nr_reclaimed;
+	int ret = 0;
+	struct compact_control cc = {
+		.nr_migratepages = 0,
+		.order = -1,
+		.zone = zone,
+		.mode = MIGRATE_SYNC_LIGHT,
+		.gfp_mask = GFP_HIGHUSER_MOVABLE,
+	};
+	struct spb_compaction_control scc = {
+		.sb = sb,
+		.zone = zone,
+	};
+
+	INIT_LIST_HEAD(&cc.migratepages);
+
+	while (pfn < end_pfn || !list_empty(&cc.migratepages)) {
+		if (list_empty(&cc.migratepages)) {
+			cc.nr_migratepages = 0;
+			cc.migrate_pfn = pfn;
+			ret = isolate_migratepages_range(&cc, pfn, end_pfn);
+			if (ret && ret != -EAGAIN)
+				break;
+			pfn = cc.migrate_pfn;
+			if (list_empty(&cc.migratepages))
+				break;
+		}
+
+		nr_reclaimed = reclaim_clean_pages_from_list(zone,
+							&cc.migratepages);
+		cc.nr_migratepages -= nr_reclaimed;
+
+		if (!list_empty(&cc.migratepages)) {
+			ret = migrate_pages(&cc.migratepages,
+					    alloc_spb_compaction_target,
+					    free_spb_compaction_target,
+					    (unsigned long)&scc, cc.mode,
+					    MR_COMPACTION, NULL);
+			if (ret) {
+				putback_movable_pages(&cc.migratepages);
+				break;
+			}
+		}
+
+		cond_resched();
+	}
+
+	if (!list_empty(&cc.migratepages))
+		putback_movable_pages(&cc.migratepages);
+}
+
+/**
+ * spb_defrag_clean - compact a clean superpageblock internally
+ * @sb: the clean superpageblock to compact
+ *
+ * Scan pageblocks in the superpageblock looking for partially-used ones.
+ * Skip fully free pageblocks and pageblocks recently marked unsuitable
+ * by the pageblock_skip bit; PCPBuddy-cached pages within an otherwise
+ * compactable pageblock are skipped per-page by isolate_migratepages_block().
+ * Migrate pages from the best candidate into free space within the same
+ * superpageblock.
+ */
+static void spb_defrag_clean(struct superpageblock *sb)
+{
+	unsigned long pfn, end_pfn, start_pfn, cursor;
+	struct zone *zone = sb->zone;
+	bool wrapped = false;
+
+	start_pfn = sb->start_pfn;
+	end_pfn = start_pfn + SUPERPAGEBLOCK_NR_PAGES;
+
+	cursor = sb->defrag_cursor;
+	if (cursor < start_pfn || cursor >= end_pfn) {
+		cursor = start_pfn;
+		spb_clear_skip_bits(sb);
+	}
+
+	pfn = cursor;
+
+	while (pfn < end_pfn) {
+		struct page *page;
+
+		if (spb_defrag_done(sb))
+			goto out;
+
+		if (!pfn_valid(pfn))
+			goto next;
+
+		if (!zone_spans_pfn(zone, pfn))
+			goto next;
+
+		page = pfn_to_page(pfn);
+
+		if (get_pfnblock_bit(page, pfn, PB_all_free))
+			goto next;
+
+		if (get_pageblock_skip(page))
+			goto next;
+
+		compact_pageblock_in_spb(sb, zone, pfn);
+next:
+		pfn += pageblock_nr_pages;
+		if (pfn >= end_pfn && !wrapped) {
+			spb_clear_skip_bits(sb);
+			pfn = start_pfn;
+			wrapped = true;
+		}
+		if (wrapped && pfn > cursor)
+			break;
 	}
+out:
+	sb->defrag_cursor = pfn;
+}
+
+/**
+ * spb_defrag_superpageblock - defragment a superpageblock
+ * @sb: the superpageblock to defragment
+ *
+ * Dispatch to the appropriate defrag strategy based on superpageblock
+ * category: evacuate movable pages from tainted superpageblocks, or
+ * compact scattered free pages within clean superpageblocks.
+ */
+static void spb_defrag_superpageblock(struct superpageblock *sb)
+{
+	if (spb_get_category(sb) == SB_TAINTED)
+		spb_defrag_tainted(sb);
+	else
+		spb_defrag_clean(sb);
 }
 
 static void spb_defrag_work_fn(struct work_struct *work)
@@ -8357,10 +8636,12 @@ static void spb_defrag_irq_work_fn(struct irq_work *work)
  * @sb: superpageblock whose counters just changed
  *
  * Called from counter update paths (under zone->lock). If the
- * superpageblock is tainted and running low on free space, schedule
- * irq_work to queue defrag work outside the allocator's lock context.
- * The irq_work handler is set up by pageblock_evacuate_init();
- * before that runs, defrag_irq_work.func is NULL and we skip.
+ * superpageblock needs defragmentation — either evacuation of movable
+ * pages from a tainted superpageblock, or internal compaction of a
+ * clean superpageblock — schedule irq_work to queue defrag work outside
+ * the allocator's lock context. The irq_work handler is set up by
+ * pageblock_evacuate_init(); before that runs, defrag_irq_work.func
+ * is NULL and we skip.
  */
 static void spb_maybe_start_defrag(struct superpageblock *sb)
 {
-- 
2.52.0

