From: Rik van Riel <riel@surriel.com>
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev,
fvdl@google.com, Rik van Riel <riel@surriel.com>
Subject: [RFC PATCH 09/40] mm: page_alloc: support superpageblock resize for memory hotplug
Date: Wed, 20 May 2026 10:59:15 -0400
Message-ID: <20260520150018.2491267-10-riel@surriel.com>
In-Reply-To: <20260520150018.2491267-1-riel@surriel.com>
setup_superpageblocks() is __init-only and uses memblock_alloc_node(), so
hotplugged memory that extends a zone's span has no superpageblock
coverage. Pages in those regions would bypass superpageblock steering
entirely.
Add resize_zone_superpageblocks() which is called from
move_pfn_range_to_zone() after the zone span has been updated. It allocates
a new superpageblock array with kvmalloc_node() covering the full zone
span, copies existing superpageblocks (fixing up list head pointers), and
initializes new superpageblocks for the added range.
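To illustrate why the copy needs a list head fixup, here is a minimal userspace sketch, not the kernel code. It assumes a simplified stand-in for struct list_head and a hypothetical two-field `struct sb`; the helper names (`relocate_sb`, `relocate_demo`) are made up for this example. After memcpy() the copied node still holds pointers computed for the old address, so the sketch tests the old node for emptiness and otherwise redirects the neighbours to the new address.

```c
#include <string.h>

/* Userspace stand-in for the kernel's struct list_head (assumption). */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail_node(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical element type; the real struct superpageblock has more fields. */
struct sb { unsigned long start_pfn; struct list_head list; };

/*
 * Move one element to a new address. The emptiness test looks at the OLD
 * node: the copy's pointers still refer to old addresses after memcpy(),
 * so the copy itself would not look empty.
 */
static void relocate_sb(struct sb *dst, struct sb *src)
{
	memcpy(dst, src, sizeof(*dst));
	if (list_is_empty(&src->list)) {
		init_list_head(&dst->list);
	} else {
		/* Neighbours (old or already relocated) now point at dst. */
		dst->list.next->prev = &dst->list;
		dst->list.prev->next = &dst->list;
	}
}

/* Returns 1 if a three-element array is relocated with all links intact. */
static int relocate_demo(void)
{
	struct list_head head;
	struct sb old_arr[3], new_arr[3];
	int i;

	memset(old_arr, 0, sizeof(old_arr));
	init_list_head(&head);
	init_list_head(&old_arr[0].list);             /* on no list */
	list_add_tail_node(&old_arr[1].list, &head);  /* head -> [1] */
	list_add_tail_node(&old_arr[2].list, &head);  /* head -> [1] -> [2] */

	for (i = 0; i < 3; i++)
		relocate_sb(&new_arr[i], &old_arr[i]);

	return list_is_empty(&new_arr[0].list) &&
	       head.next == &new_arr[1].list &&
	       new_arr[1].list.prev == &head &&
	       new_arr[1].list.next == &new_arr[2].list &&
	       new_arr[2].list.prev == &new_arr[1].list &&
	       new_arr[2].list.next == &head &&
	       head.prev == &new_arr[2].list;
}
```

Relocating elements in array order also handles two neighbours that point at each other inside the old array: the first move redirects the second element's old node, and the second move then copies that already updated pointer.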
Use round-up division for partial pageblock counting to match
init_one_superpageblock().
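The array geometry and the round-up count can be sketched in userspace as follows. This is only an illustration: the constants assume 4 KiB pages, 2 MiB pageblocks (order 9) and 1 GiB superpageblocks (order 18), none of which are stated in this patch, and the function names are hypothetical.

```c
/* Assumed values for illustration only. */
#define PAGEBLOCK_ORDER     9UL
#define PAGEBLOCK_NR_PAGES  (1UL << PAGEBLOCK_ORDER)
#define SPB_ORDER           18UL
#define SPB_NR_PAGES        (1UL << SPB_ORDER)

static unsigned long align_down(unsigned long x, unsigned long a)
{
	return x & ~(a - 1);
}

static unsigned long align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Number of superpageblocks needed to cover [zone_start, zone_end). */
static unsigned long spb_count(unsigned long zone_start, unsigned long zone_end)
{
	unsigned long base = align_down(zone_start, SPB_NR_PAGES);

	return (align_up(zone_end, SPB_NR_PAGES) - base) >> SPB_ORDER;
}

/*
 * Pageblocks of the zone that fall inside the superpageblock starting at
 * sb_start. A partially covered pageblock counts as a whole one (round up).
 */
static unsigned long spb_pageblocks(unsigned long sb_start,
				    unsigned long zone_start,
				    unsigned long zone_end)
{
	unsigned long sb_end = sb_start + SPB_NR_PAGES;
	unsigned long lo = sb_start > zone_start ? sb_start : zone_start;
	unsigned long hi = sb_end < zone_end ? sb_end : zone_end;

	if (hi <= lo)
		return 0;
	return (hi - lo + PAGEBLOCK_NR_PAGES - 1) >> PAGEBLOCK_ORDER;
}
```

Under these assumptions, a zone spanning PFNs [256, 262657) needs two superpageblocks. The second one holds 513 pages of the zone, which rounds up to 2 pageblocks; truncating division would report 1.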
ZONE_DEVICE is excluded since device pages should not participate in
anti-fragmentation steering.
Signed-off-by: Rik van Riel <riel@surriel.com>
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
include/linux/mmzone.h | 1 +
mm/internal.h | 4 ++
mm/memory_hotplug.c | 4 ++
mm/mm_init.c | 138 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 147 insertions(+)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e3eac971a76a..19190328e0c7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1057,6 +1057,7 @@ struct zone {
struct superpageblock *superpageblocks;
unsigned long nr_superpageblocks;
unsigned long superpageblock_base_pfn; /* 1GB-aligned base */
+ bool spb_kvmalloced; /* true if from kvmalloc (hotplug) */
/* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
unsigned long zone_start_pfn;
diff --git a/mm/internal.h b/mm/internal.h
index c8404cb00b08..6a089bc4aa09 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1101,6 +1101,10 @@ void init_cma_reserved_pageblock(struct page *page);
#endif /* CONFIG_COMPACTION || CONFIG_CMA */
+#ifdef CONFIG_MEMORY_HOTPLUG
+void resize_zone_superpageblocks(struct zone *zone);
+#endif
+
struct cma;
#ifdef CONFIG_CMA
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2a943ec57c85..b7c30dfdce8e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -752,6 +752,10 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
resize_zone_range(zone, start_pfn, nr_pages);
resize_pgdat_range(pgdat, start_pfn, nr_pages);
+ /* Grow superpageblock array to cover the new zone span */
+ if (!zone_is_zone_device(zone))
+ resize_zone_superpageblocks(zone);
+
/*
* Subsection population requires care in pfn_to_online_page().
* Set the taint to enable the slow path detection of
diff --git a/mm/mm_init.c b/mm/mm_init.c
index de02a6087c21..ad1cbc2b4498 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1592,6 +1592,144 @@ static void __init setup_superpageblocks(struct zone *zone)
zone_start, zone_end);
}
+#ifdef CONFIG_MEMORY_HOTPLUG
+/**
+ * resize_zone_superpageblocks - grow superpageblock array for memory hotplug
+ * @zone: zone whose span has been extended by hotplug
+ *
+ * Called from move_pfn_range_to_zone() after resize_zone_range() has
+ * updated the zone's span. Allocates a new superpageblock array covering
+ * the full zone span, copies existing superpageblocks (fixing up list heads),
+ * and initializes new superpageblocks for the added range.
+ *
+ * Must be called under mem_hotplug_lock (write). No concurrent
+ * allocations can occur since the hotplugged pages are not yet online.
+ */
+void __meminit resize_zone_superpageblocks(struct zone *zone)
+{
+ unsigned long zone_start = zone->zone_start_pfn;
+ unsigned long zone_end = zone_start + zone->spanned_pages;
+ unsigned long new_sb_base, new_nr_sbs;
+ unsigned long old_offset;
+ struct superpageblock *old_sbs;
+ struct superpageblock *new_sbs;
+ bool old_kvmalloced;
+ size_t alloc_size;
+ unsigned long i;
+ int nid = zone_to_nid(zone);
+
+ if (!zone->spanned_pages)
+ return;
+
+ new_sb_base = ALIGN_DOWN(zone_start, SUPERPAGEBLOCK_NR_PAGES);
+ new_nr_sbs = (ALIGN(zone_end, SUPERPAGEBLOCK_NR_PAGES) - new_sb_base) >>
+ SUPERPAGEBLOCK_ORDER;
+
+ /* Already covered? */
+ if (zone->superpageblocks &&
+ new_sb_base == zone->superpageblock_base_pfn &&
+ new_nr_sbs == zone->nr_superpageblocks)
+ return;
+
+ alloc_size = new_nr_sbs * sizeof(struct superpageblock);
+ new_sbs = kvmalloc_node(alloc_size, GFP_KERNEL | __GFP_ZERO, nid);
+ if (!new_sbs) {
+ pr_warn("Failed to allocate %zu bytes for zone %s superpageblocks\n",
+ alloc_size, zone->name);
+ return;
+ }
+
+ /*
+ * Copy existing superpageblocks to their new position.
+ * The old array covers [old_base, old_base + old_nr * SB_SIZE).
+ * The new array covers [new_base, new_base + new_nr * SB_SIZE).
+ * old_base >= new_base always (zone can only grow).
+ */
+ if (zone->superpageblocks) {
+ old_offset = (zone->superpageblock_base_pfn - new_sb_base) >>
+ SUPERPAGEBLOCK_ORDER;
+ memcpy(&new_sbs[old_offset], zone->superpageblocks,
+ zone->nr_superpageblocks * sizeof(struct superpageblock));
+
+ /*
+ * Fix up list_head pointers that were self-referencing
+ * (empty lists) or pointing into the old array.
+ */
+ for (i = old_offset; i < old_offset + zone->nr_superpageblocks; i++) {
+ struct superpageblock *sb = &new_sbs[i];
+
+ if (list_empty(&sb->list))
+ INIT_LIST_HEAD(&sb->list);
+ else
+ list_replace(&zone->superpageblocks[i - old_offset].list,
+ &sb->list);
+ }
+ }
+
+ /* Initialize new superpageblocks (slots not covered by old array) */
+ for (i = 0; i < new_nr_sbs; i++) {
+ struct superpageblock *sb = &new_sbs[i];
+ bool is_old = false;
+
+ if (zone->superpageblocks) {
+ old_offset = (zone->superpageblock_base_pfn - new_sb_base) >>
+ SUPERPAGEBLOCK_ORDER;
+ if (i >= old_offset &&
+ i < old_offset + zone->nr_superpageblocks)
+ is_old = true;
+ }
+
+ if (is_old)
+ continue;
+
+ init_one_superpageblock(sb, zone,
+ new_sb_base + (i << SUPERPAGEBLOCK_ORDER),
+ zone_start, zone_end);
+ }
+
+ /*
+ * Update existing superpageblocks whose nr_reserved may have
+ * increased due to the zone span growing into them.
+ */
+ if (zone->superpageblocks) {
+ old_offset = (zone->superpageblock_base_pfn - new_sb_base) >>
+ SUPERPAGEBLOCK_ORDER;
+ for (i = old_offset; i < old_offset + zone->nr_superpageblocks; i++) {
+ struct superpageblock *sb = &new_sbs[i];
+ unsigned long sb_start = sb->start_pfn;
+ unsigned long sb_end = sb_start + SUPERPAGEBLOCK_NR_PAGES;
+ unsigned long pb_start = max(sb_start, zone_start);
+ unsigned long pb_end = min(sb_end, zone_end);
+ u16 new_pbs = (pb_end > pb_start) ?
+ ((pb_end - pb_start + pageblock_nr_pages - 1) >>
+ pageblock_order) : 0;
+ u16 old_pbs = sb->nr_free + sb->nr_unmovable +
+ sb->nr_reclaimable + sb->nr_movable +
+ sb->nr_reserved;
+
+ if (new_pbs > old_pbs)
+ sb->nr_reserved += new_pbs - old_pbs;
+ }
+ }
+
+ /* Swap in the new array */
+ old_sbs = zone->superpageblocks;
+ old_kvmalloced = zone->spb_kvmalloced;
+ zone->superpageblocks = new_sbs;
+ zone->nr_superpageblocks = new_nr_sbs;
+ zone->superpageblock_base_pfn = new_sb_base;
+ zone->spb_kvmalloced = true;
+
+ /*
+ * The boot-time array was allocated with memblock_alloc, which
+ * is not individually freeable after boot. Only kvfree arrays
+ * from previous hotplug resizes.
+ */
+ if (old_sbs && old_kvmalloced)
+ kvfree(old_sbs);
+}
+#endif /* CONFIG_MEMORY_HOTPLUG */
+
#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
/* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */
--
2.54.0