public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
* [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups
@ 2026-03-20 22:13 David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 01/15] mm/memory_hotplug: fix possible race in scan_movable_pages() David Hildenbrand (Arm)
                   ` (14 more replies)
  0 siblings, 15 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

Some cleanups around memory hot(un)plug and SPARSEMEM. In essence,
we can limit CONFIG_MEMORY_HOTPLUG to CONFIG_SPARSEMEM_VMEMMAP,
remove some dead code, and move all the hotplug bits over to
mm/sparse-vmemmap.c.

Some further/related cleanups around other unnecessary code
(memory hole handling and complicated usemap allocation).

I have some further sparse.c cleanups lying around, and I'm planning
on getting rid of bootmem_info.c entirely.

Cross-compiled on a bunch of machines. Hot(un)plug tested with virtio-mem.

v1 -> v2:
* Added "mm/memory_hotplug: fix possible race in scan_movable_pages()"
* Update the comment above section_deactivate()
* Reordered the flags in sparse_init_one_section()
* Patch description improvements

---
David Hildenbrand (Arm) (15):
      mm/memory_hotplug: fix possible race in scan_movable_pages()
      mm/memory_hotplug: remove for_each_valid_pfn() usage
      mm/sparse: remove WARN_ONs from (online|offline)_mem_sections()
      mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP
      mm/memory_hotplug: simplify check_pfn_span()
      mm/sparse: remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for CONFIG_MEMORY_HOTPLUG
      mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP
      mm/bootmem_info: avoid using sparse_decode_mem_map()
      mm/sparse: remove sparse_decode_mem_map()
      mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling
      mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c
      mm/sparse: drop set_section_nid() from sparse_add_section()
      mm/sparse: move sparse_init_one_section() to internal.h
      mm/sparse: move __section_mark_present() to internal.h
      mm/sparse: move memory hotplug bits to sparse-vmemmap.c

 include/linux/memory_hotplug.h |   2 -
 include/linux/mmzone.h         |   6 +-
 mm/Kconfig                     |   2 +-
 mm/bootmem_info.c              |  46 +---
 mm/internal.h                  |  47 ++++
 mm/memory_hotplug.c            |  35 ++-
 mm/mm_init.c                   |   2 +-
 mm/sparse-vmemmap.c            | 304 +++++++++++++++++++++++
 mm/sparse.c                    | 539 +----------------------------------------
 9 files changed, 377 insertions(+), 606 deletions(-)
---
base-commit: 3f4f1faa33544d0bd724e32980b6f211c3a9bc7b
change-id: 20260320-sparsemem_cleanups-ce4ddb2c47de

Best regards,
-- 
David Hildenbrand (Arm) <david@kernel.org>



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 01/15] mm/memory_hotplug: fix possible race in scan_movable_pages()
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  2026-03-23 13:26   ` Lorenzo Stoakes (Oracle)
  2026-03-20 22:13 ` [PATCH v2 02/15] mm/memory_hotplug: remove for_each_valid_pfn() usage David Hildenbrand (Arm)
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

If a hugetlb folio gets freed while we are in scan_movable_pages(),
folio_nr_pages() could return 0, so we would end up or'ing
"0 - 1 = -1" into the PFN, leaving us with PFN = -1. We're not holding
any locks or references that would prevent that.

for_each_valid_pfn() would then search for the next valid PFN, and could
return a PFN that is outside of the originally requested range.
do_migrate_range() would then try to migrate quite a big range, which is
certainly undesirable.

To fix it, simply test for valid folio_nr_pages() values. While at it,
as PageHuge() really just does a page_folio() internally, we can just
use folio_test_hugetlb() on the folio directly.

scan_movable_pages() is expected to be fast, and we try to avoid taking
locks or grabbing references. We cannot use folio_try_get() as that does
not work for free hugetlb folios. We could grab the hugetlb_lock, but
that just adds complexity.

The race is unlikely to trigger in practice, so we won't be CCing
stable.
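
The wrap-around and the new sanity check can be sketched stand-alone
(illustrative C, not kernel code; the MAX_FOLIO_NR_PAGES bound and the
scan_advance() helper are made up here for demonstration):

```c
#include <stdbool.h>

/* Illustrative bound, not the kernel's actual value. */
#define MAX_FOLIO_NR_PAGES	(1UL << 18)

static bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Advance the scan position past a folio of nr_pages pages, mirroring
 * "pfn |= nr_pages - 1". With a racing free, nr_pages can read as 0,
 * and "0 - 1" wraps to ULONG_MAX, so bogus sizes are simply skipped.
 */
unsigned long scan_advance(unsigned long pfn, unsigned long nr_pages)
{
	if (nr_pages < 1 || nr_pages > MAX_FOLIO_NR_PAGES ||
	    !is_power_of_2(nr_pages))
		return pfn;	/* don't trust the racy value */
	return pfn | (nr_pages - 1);
}
```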

Fixes: 16540dae959d ("mm/hugetlb: mm/memory_hotplug: use a folio in scan_movable_pages()")
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/memory_hotplug.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 86d3faf50453..969cd7ddf68f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1747,6 +1747,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
 	unsigned long pfn;
 
 	for_each_valid_pfn(pfn, start, end) {
+		unsigned long nr_pages;
 		struct page *page;
 		struct folio *folio;
 
@@ -1763,9 +1764,9 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
 		if (PageOffline(page) && page_count(page))
 			return -EBUSY;
 
-		if (!PageHuge(page))
-			continue;
 		folio = page_folio(page);
+		if (!folio_test_hugetlb(folio))
+			continue;
 		/*
 		 * This test is racy as we hold no reference or lock.  The
 		 * hugetlb page could have been free'ed and head is no longer
@@ -1775,7 +1776,11 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
 		 */
 		if (folio_test_hugetlb_migratable(folio))
 			goto found;
-		pfn |= folio_nr_pages(folio) - 1;
+		nr_pages = folio_nr_pages(folio);
+		if (unlikely(nr_pages < 1 || nr_pages > MAX_FOLIO_NR_PAGES ||
+			     !is_power_of_2(nr_pages)))
+			continue;
+		pfn |= nr_pages - 1;
 	}
 	return -ENOENT;
 found:

-- 
2.43.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 02/15] mm/memory_hotplug: remove for_each_valid_pfn() usage
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 01/15] mm/memory_hotplug: fix possible race in scan_movable_pages() David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 03/15] mm/sparse: remove WARN_ONs from (online|offline)_mem_sections() David Hildenbrand (Arm)
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

When offlining memory, we know that the memory range has no holes.
Checking for valid pfns is not required.

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/memory_hotplug.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 969cd7ddf68f..0c26b1f2be6e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1746,7 +1746,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
 {
 	unsigned long pfn;
 
-	for_each_valid_pfn(pfn, start, end) {
+	for (pfn = start; pfn < end; pfn++) {
 		unsigned long nr_pages;
 		struct page *page;
 		struct folio *folio;
@@ -1796,7 +1796,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
 
-	for_each_valid_pfn(pfn, start_pfn, end_pfn) {
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		struct page *page;
 
 		page = pfn_to_page(pfn);

-- 
2.43.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 03/15] mm/sparse: remove WARN_ONs from (online|offline)_mem_sections()
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 01/15] mm/memory_hotplug: fix possible race in scan_movable_pages() David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 02/15] mm/memory_hotplug: remove for_each_valid_pfn() usage David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 04/15] mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

We do not allow offlining of memory with memory holes, and always
hotplug memory without holes.

Consequently, we cannot end up onlining or offlining memory sections that
have holes (including invalid sections). That's also why these
WARN_ONs never fired.

Let's remove the WARN_ONs along with the TODO regarding double-checking.

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/sparse.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index dfabe554adf8..93252112860e 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -638,13 +638,8 @@ void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
 		unsigned long section_nr = pfn_to_section_nr(pfn);
-		struct mem_section *ms;
-
-		/* onlining code should never touch invalid ranges */
-		if (WARN_ON(!valid_section_nr(section_nr)))
-			continue;
+		struct mem_section *ms = __nr_to_section(section_nr);
 
-		ms = __nr_to_section(section_nr);
 		ms->section_mem_map |= SECTION_IS_ONLINE;
 	}
 }
@@ -656,16 +651,8 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
 		unsigned long section_nr = pfn_to_section_nr(pfn);
-		struct mem_section *ms;
+		struct mem_section *ms = __nr_to_section(section_nr);
 
-		/*
-		 * TODO this needs some double checking. Offlining code makes
-		 * sure to check pfn_valid but those checks might be just bogus
-		 */
-		if (WARN_ON(!valid_section_nr(section_nr)))
-			continue;
-
-		ms = __nr_to_section(section_nr);
 		ms->section_mem_map &= ~SECTION_IS_ONLINE;
 	}
 }

-- 
2.43.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 04/15] mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (2 preceding siblings ...)
  2026-03-20 22:13 ` [PATCH v2 03/15] mm/sparse: remove WARN_ONs from (online|offline)_mem_sections() David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 05/15] mm/memory_hotplug: simplify check_pfn_span() David Hildenbrand (Arm)
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

Ever since commit f8f03eb5f0f9 ("mm: stop making SPARSEMEM_VMEMMAP
user-selectable"), an architecture that supports CONFIG_SPARSEMEM_VMEMMAP
(by selecting SPARSEMEM_VMEMMAP_ENABLE) can no longer enable
CONFIG_SPARSEMEM without CONFIG_SPARSEMEM_VMEMMAP.

Right now, CONFIG_MEMORY_HOTPLUG is guarded by CONFIG_SPARSEMEM.

However, CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG is only enabled by
* arm64: which selects SPARSEMEM_VMEMMAP_ENABLE
* loongarch: which selects SPARSEMEM_VMEMMAP_ENABLE
* powerpc (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE
* riscv (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE
* s390 with SPARSEMEM: which selects SPARSEMEM_VMEMMAP_ENABLE
* x86 (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE

So, we can make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP
without affecting any setups.

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index ebd8ea353687..c012944938a7 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -472,7 +472,7 @@ config ARCH_ENABLE_MEMORY_HOTREMOVE
 menuconfig MEMORY_HOTPLUG
 	bool "Memory hotplug"
 	select MEMORY_ISOLATION
-	depends on SPARSEMEM
+	depends on SPARSEMEM_VMEMMAP
 	depends on ARCH_ENABLE_MEMORY_HOTPLUG
 	depends on 64BIT
 	select NUMA_KEEP_MEMINFO if NUMA

-- 
2.43.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 05/15] mm/memory_hotplug: simplify check_pfn_span()
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (3 preceding siblings ...)
  2026-03-20 22:13 ` [PATCH v2 04/15] mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 06/15] mm/sparse: remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for CONFIG_MEMORY_HOTPLUG David Hildenbrand (Arm)
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

We now always have CONFIG_SPARSEMEM_VMEMMAP, so remove the dead code.
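
The remaining check uses the IS_ALIGNED(pfn | nr_pages, ...) trick:
or'ing start and length together means the test fails if either of them
has a low bit set. A stand-alone sketch (the subsection size below is
illustrative, not the kernel's definition):

```c
#include <stdbool.h>

/* Illustrative: 512 pages per subsection (2 MiB with 4 KiB pages). */
#define PAGES_PER_SUBSECTION	(1UL << 9)
#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

/*
 * A span is acceptable only if both its start PFN and its length are
 * subsection-aligned; or'ing them lets one IS_ALIGNED() cover both.
 */
bool span_ok(unsigned long pfn, unsigned long nr_pages)
{
	return IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION);
}
```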

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/memory_hotplug.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0c26b1f2be6e..ef2b03eb1873 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -320,21 +320,13 @@ static void release_memory_resource(struct resource *res)
 static int check_pfn_span(unsigned long pfn, unsigned long nr_pages)
 {
 	/*
-	 * Disallow all operations smaller than a sub-section and only
-	 * allow operations smaller than a section for
-	 * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
-	 * enforces a larger memory_block_size_bytes() granularity for
-	 * memory that will be marked online, so this check should only
-	 * fire for direct arch_{add,remove}_memory() users outside of
-	 * add_memory_resource().
+	 * Disallow all operations smaller than a sub-section.
+	 * Note that check_hotplug_memory_range() enforces a larger
+	 * memory_block_size_bytes() granularity for memory that will be marked
+	 * online, so this check should only fire for direct
+	 * arch_{add,remove}_memory() users outside of add_memory_resource().
 	 */
-	unsigned long min_align;
-
-	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
-		min_align = PAGES_PER_SUBSECTION;
-	else
-		min_align = PAGES_PER_SECTION;
-	if (!IS_ALIGNED(pfn | nr_pages, min_align))
+	if (!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION))
 		return -EINVAL;
 	return 0;
 }

-- 
2.43.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 06/15] mm/sparse: remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for CONFIG_MEMORY_HOTPLUG
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (4 preceding siblings ...)
  2026-03-20 22:13 ` [PATCH v2 05/15] mm/memory_hotplug: simplify check_pfn_span() David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 07/15] mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

CONFIG_MEMORY_HOTPLUG now depends on CONFIG_SPARSEMEM_VMEMMAP. So
let's remove the !CONFIG_SPARSEMEM_VMEMMAP leftovers that are dead code.

Adjust the comment above fill_subsection_map() accordingly.

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/sparse.c | 69 ++-----------------------------------------------------------
 1 file changed, 2 insertions(+), 67 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 93252112860e..875f718a4c79 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -657,7 +657,6 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 	}
 }
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
 static struct page * __meminit populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap)
@@ -729,73 +728,11 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
 
 	return rc;
 }
-#else
-static struct page * __meminit populate_section_memmap(unsigned long pfn,
-		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
-		struct dev_pagemap *pgmap)
-{
-	return kvmalloc_node(array_size(sizeof(struct page),
-					PAGES_PER_SECTION), GFP_KERNEL, nid);
-}
-
-static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
-{
-	kvfree(pfn_to_page(pfn));
-}
-
-static void free_map_bootmem(struct page *memmap)
-{
-	unsigned long maps_section_nr, removing_section_nr, i;
-	unsigned long type, nr_pages;
-	struct page *page = virt_to_page(memmap);
-
-	nr_pages = PAGE_ALIGN(PAGES_PER_SECTION * sizeof(struct page))
-		>> PAGE_SHIFT;
-
-	for (i = 0; i < nr_pages; i++, page++) {
-		type = bootmem_type(page);
-
-		BUG_ON(type == NODE_INFO);
-
-		maps_section_nr = pfn_to_section_nr(page_to_pfn(page));
-		removing_section_nr = bootmem_info(page);
-
-		/*
-		 * When this function is called, the removing section is
-		 * logical offlined state. This means all pages are isolated
-		 * from page allocator. If removing section's memmap is placed
-		 * on the same section, it must not be freed.
-		 * If it is freed, page allocator may allocate it which will
-		 * be removed physically soon.
-		 */
-		if (maps_section_nr != removing_section_nr)
-			put_page_bootmem(page);
-	}
-}
-
-static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
-	return 0;
-}
-
-static bool is_subsection_map_empty(struct mem_section *ms)
-{
-	return true;
-}
-
-static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
-	return 0;
-}
-#endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
 /*
- * To deactivate a memory region, there are 3 cases to handle across
- * two configurations (SPARSEMEM_VMEMMAP={y,n}):
+ * To deactivate a memory region, there are 3 cases to handle:
  *
- * 1. deactivation of a partial hot-added section (only possible in
- *    the SPARSEMEM_VMEMMAP=y case).
+ * 1. deactivation of a partial hot-added section:
  *      a) section was present at memory init.
  *      b) section was hot-added post memory init.
  * 2. deactivation of a complete hot-added section.
@@ -803,8 +740,6 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
  *
  * For 1, when subsection_map does not empty we will not be freeing the
  * usage map, but still need to free the vmemmap range.
- *
- * For 2 and 3, the SPARSEMEM_VMEMMAP={y,n} cases are unified
  */
 static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 		struct vmem_altmap *altmap)

-- 
2.43.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 07/15] mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (5 preceding siblings ...)
  2026-03-20 22:13 ` [PATCH v2 06/15] mm/sparse: remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for CONFIG_MEMORY_HOTPLUG David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 08/15] mm/bootmem_info: avoid using sparse_decode_mem_map() David Hildenbrand (Arm)
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

It is not immediately obvious that CONFIG_HAVE_BOOTMEM_INFO_NODE is
only selected from CONFIG_MEMORY_HOTREMOVE, which itself depends on
CONFIG_MEMORY_HOTPLUG that ... depends on CONFIG_SPARSEMEM_VMEMMAP.

Let's remove the !CONFIG_SPARSEMEM_VMEMMAP leftovers that are dead code.

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/bootmem_info.c | 37 -------------------------------------
 1 file changed, 37 deletions(-)

diff --git a/mm/bootmem_info.c b/mm/bootmem_info.c
index b0e2a9fa641f..e61e08e24924 100644
--- a/mm/bootmem_info.c
+++ b/mm/bootmem_info.c
@@ -40,42 +40,6 @@ void put_page_bootmem(struct page *page)
 	}
 }
 
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
-static void __init register_page_bootmem_info_section(unsigned long start_pfn)
-{
-	unsigned long mapsize, section_nr, i;
-	struct mem_section *ms;
-	struct page *page, *memmap;
-	struct mem_section_usage *usage;
-
-	section_nr = pfn_to_section_nr(start_pfn);
-	ms = __nr_to_section(section_nr);
-
-	/* Get section's memmap address */
-	memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
-
-	/*
-	 * Get page for the memmap's phys address
-	 * XXX: need more consideration for sparse_vmemmap...
-	 */
-	page = virt_to_page(memmap);
-	mapsize = sizeof(struct page) * PAGES_PER_SECTION;
-	mapsize = PAGE_ALIGN(mapsize) >> PAGE_SHIFT;
-
-	/* remember memmap's page */
-	for (i = 0; i < mapsize; i++, page++)
-		get_page_bootmem(section_nr, page, SECTION_INFO);
-
-	usage = ms->usage;
-	page = virt_to_page(usage);
-
-	mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT;
-
-	for (i = 0; i < mapsize; i++, page++)
-		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
-
-}
-#else /* CONFIG_SPARSEMEM_VMEMMAP */
 static void __init register_page_bootmem_info_section(unsigned long start_pfn)
 {
 	unsigned long mapsize, section_nr, i;
@@ -100,7 +64,6 @@ static void __init register_page_bootmem_info_section(unsigned long start_pfn)
 	for (i = 0; i < mapsize; i++, page++)
 		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
 }
-#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
 void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 {

-- 
2.43.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 08/15] mm/bootmem_info: avoid using sparse_decode_mem_map()
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (6 preceding siblings ...)
  2026-03-20 22:13 ` [PATCH v2 07/15] mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 09/15] mm/sparse: remove sparse_decode_mem_map() David Hildenbrand (Arm)
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

With SPARSEMEM_VMEMMAP, we can just do a pfn_to_page(). It is not super
clear whether the start_pfn is properly aligned ... so let's just align
it down to the start of the section.

We might soon try to remove the bootmem info completely; for now, just
keep it working as is.

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/bootmem_info.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/bootmem_info.c b/mm/bootmem_info.c
index e61e08e24924..3d7675a3ae04 100644
--- a/mm/bootmem_info.c
+++ b/mm/bootmem_info.c
@@ -44,17 +44,16 @@ static void __init register_page_bootmem_info_section(unsigned long start_pfn)
 {
 	unsigned long mapsize, section_nr, i;
 	struct mem_section *ms;
-	struct page *page, *memmap;
 	struct mem_section_usage *usage;
+	struct page *page;
 
+	start_pfn = SECTION_ALIGN_DOWN(start_pfn);
 	section_nr = pfn_to_section_nr(start_pfn);
 	ms = __nr_to_section(section_nr);
 
-	memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
-
 	if (!preinited_vmemmap_section(ms))
-		register_page_bootmem_memmap(section_nr, memmap,
-				PAGES_PER_SECTION);
+		register_page_bootmem_memmap(section_nr, pfn_to_page(start_pfn),
+					     PAGES_PER_SECTION);
 
 	usage = ms->usage;
 	page = virt_to_page(usage);

-- 
2.43.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 09/15] mm/sparse: remove sparse_decode_mem_map()
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (7 preceding siblings ...)
  2026-03-20 22:13 ` [PATCH v2 08/15] mm/bootmem_info: avoid using sparse_decode_mem_map() David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 10/15] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling David Hildenbrand (Arm)
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

section_deactivate() applies to CONFIG_SPARSEMEM_VMEMMAP only. So we can
just use pfn_to_page() (after making sure we have the start PFN of the
section), and remove sparse_decode_mem_map().
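
SECTION_ALIGN_DOWN() is plain mask arithmetic; a stand-alone sketch with
typical x86-64 constants (illustrative, simplified from the kernel's
per-arch definitions):

```c
/* Illustrative x86-64 values; the kernel derives these per-arch. */
#define SECTION_SIZE_BITS	27	/* 128 MiB sections */
#define PAGE_SHIFT		12	/* 4 KiB pages */
#define PAGES_PER_SECTION	(1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))
#define SECTION_ALIGN_DOWN(pfn)	((pfn) & ~(PAGES_PER_SECTION - 1))

/* First PFN of the section containing pfn. */
unsigned long section_start_pfn(unsigned long pfn)
{
	return SECTION_ALIGN_DOWN(pfn);
}
```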

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 include/linux/memory_hotplug.h |  2 --
 mm/sparse.c                    | 16 +---------------
 2 files changed, 1 insertion(+), 17 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index e77ef3d7ff73..815e908c4135 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -308,8 +308,6 @@ extern int sparse_add_section(int nid, unsigned long pfn,
 		struct dev_pagemap *pgmap);
 extern void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
 				  struct vmem_altmap *altmap);
-extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
-					  unsigned long pnum);
 extern struct zone *zone_for_pfn_range(enum mmop online_type,
 		int nid, struct memory_group *group, unsigned long start_pfn,
 		unsigned long nr_pages);
diff --git a/mm/sparse.c b/mm/sparse.c
index 875f718a4c79..b5825c9ee2f2 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -274,18 +274,6 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
 	return coded_mem_map;
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG
-/*
- * Decode mem_map from the coded memmap
- */
-struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum)
-{
-	/* mask off the extra low bits of information */
-	coded_mem_map &= SECTION_MAP_MASK;
-	return ((struct page *)coded_mem_map) + section_nr_to_pfn(pnum);
-}
-#endif /* CONFIG_MEMORY_HOTPLUG */
-
 static void __meminit sparse_init_one_section(struct mem_section *ms,
 		unsigned long pnum, struct page *mem_map,
 		struct mem_section_usage *usage, unsigned long flags)
@@ -754,8 +742,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 
 	empty = is_subsection_map_empty(ms);
 	if (empty) {
-		unsigned long section_nr = pfn_to_section_nr(pfn);
-
 		/*
 		 * Mark the section invalid so that valid_section()
 		 * return false. This prevents code from dereferencing
@@ -774,7 +760,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 			kfree_rcu(ms->usage, rcu);
 			WRITE_ONCE(ms->usage, NULL);
 		}
-		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+		memmap = pfn_to_page(SECTION_ALIGN_DOWN(pfn));
 	}
 
 	/*

-- 
2.43.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 10/15] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (8 preceding siblings ...)
  2026-03-20 22:13 ` [PATCH v2 09/15] mm/sparse: remove sparse_decode_mem_map() David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 11/15] mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c David Hildenbrand (Arm)
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

In 2008, commit 48c906823f39 ("memory hotplug: allocate usemap on the
section with pgdat") added quite some complexity to try allocating the
memory for the "usemap" (storing pageblock information per memory
section) of a memory section close to the memory of the node's "pgdat".

The goal was to make memory hotunplug of boot memory more likely to
succeed. That commit also added checks for circular dependencies
between two memory sections, whereby two memory sections would contain
each other's usemap, rendering both boot memory sections unremovable.

However, in 2010, commit a4322e1bad91 ("sparsemem: Put usemap for one node
together") started allocating the usemap for multiple memory
sections on the same node in one chunk, effectively grouping all usemap
allocations of the same node in a single memblock allocation.

We don't really give guarantees about memory hotunplug of boot memory, and
with the change in 2010, it is impossible in practice to get any circular
dependencies.

So let's simply remove this complexity.
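
For reference, the removed "goal"/"limit" computation was plain section
alignment of the pgdat's physical address. A minimal sketch, with assumed
x86-64 defaults (4 KiB pages, 128 MiB sections) that are not taken from
the patch itself:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 default: 128 MiB sections. */
#define PA_SECTION_SHIFT 27

/*
 * The removed sparse_early_usemaps_alloc_pgdat_section() derived its
 * memblock allocation "goal" by aligning the pgdat's physical address
 * down to a section boundary, and its "limit" as the end of that
 * section, so the usemap would land in the same section as the pgdat.
 */
static uint64_t usemap_goal(uint64_t pgdat_phys)
{
	return pgdat_phys & ~((1ULL << PA_SECTION_SHIFT) - 1);
}

static uint64_t usemap_limit(uint64_t goal)
{
	return goal + (1ULL << PA_SECTION_SHIFT);
}
```

With the cleanup, all of this collapses into a plain
memblock_alloc_node() on the section's node.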

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/sparse.c | 100 +-----------------------------------------------------------
 1 file changed, 1 insertion(+), 99 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index b5825c9ee2f2..e2048b1fbf5f 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -294,102 +294,6 @@ size_t mem_section_usage_size(void)
 	return sizeof(struct mem_section_usage) + usemap_size();
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
-static inline phys_addr_t pgdat_to_phys(struct pglist_data *pgdat)
-{
-#ifndef CONFIG_NUMA
-	VM_BUG_ON(pgdat != &contig_page_data);
-	return __pa_symbol(&contig_page_data);
-#else
-	return __pa(pgdat);
-#endif
-}
-
-static struct mem_section_usage * __init
-sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
-					 unsigned long size)
-{
-	struct mem_section_usage *usage;
-	unsigned long goal, limit;
-	int nid;
-	/*
-	 * A page may contain usemaps for other sections preventing the
-	 * page being freed and making a section unremovable while
-	 * other sections referencing the usemap remain active. Similarly,
-	 * a pgdat can prevent a section being removed. If section A
-	 * contains a pgdat and section B contains the usemap, both
-	 * sections become inter-dependent. This allocates usemaps
-	 * from the same section as the pgdat where possible to avoid
-	 * this problem.
-	 */
-	goal = pgdat_to_phys(pgdat) & (PAGE_SECTION_MASK << PAGE_SHIFT);
-	limit = goal + (1UL << PA_SECTION_SHIFT);
-	nid = early_pfn_to_nid(goal >> PAGE_SHIFT);
-again:
-	usage = memblock_alloc_try_nid(size, SMP_CACHE_BYTES, goal, limit, nid);
-	if (!usage && limit) {
-		limit = MEMBLOCK_ALLOC_ACCESSIBLE;
-		goto again;
-	}
-	return usage;
-}
-
-static void __init check_usemap_section_nr(int nid,
-		struct mem_section_usage *usage)
-{
-	unsigned long usemap_snr, pgdat_snr;
-	static unsigned long old_usemap_snr;
-	static unsigned long old_pgdat_snr;
-	struct pglist_data *pgdat = NODE_DATA(nid);
-	int usemap_nid;
-
-	/* First call */
-	if (!old_usemap_snr) {
-		old_usemap_snr = NR_MEM_SECTIONS;
-		old_pgdat_snr = NR_MEM_SECTIONS;
-	}
-
-	usemap_snr = pfn_to_section_nr(__pa(usage) >> PAGE_SHIFT);
-	pgdat_snr = pfn_to_section_nr(pgdat_to_phys(pgdat) >> PAGE_SHIFT);
-	if (usemap_snr == pgdat_snr)
-		return;
-
-	if (old_usemap_snr == usemap_snr && old_pgdat_snr == pgdat_snr)
-		/* skip redundant message */
-		return;
-
-	old_usemap_snr = usemap_snr;
-	old_pgdat_snr = pgdat_snr;
-
-	usemap_nid = sparse_early_nid(__nr_to_section(usemap_snr));
-	if (usemap_nid != nid) {
-		pr_info("node %d must be removed before remove section %ld\n",
-			nid, usemap_snr);
-		return;
-	}
-	/*
-	 * There is a circular dependency.
-	 * Some platforms allow un-removable section because they will just
-	 * gather other removable sections for dynamic partitioning.
-	 * Just notify un-removable section's number here.
-	 */
-	pr_info("Section %ld and %ld (node %d) have a circular dependency on usemap and pgdat allocations\n",
-		usemap_snr, pgdat_snr, nid);
-}
-#else
-static struct mem_section_usage * __init
-sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
-					 unsigned long size)
-{
-	return memblock_alloc_node(size, SMP_CACHE_BYTES, pgdat->node_id);
-}
-
-static void __init check_usemap_section_nr(int nid,
-		struct mem_section_usage *usage)
-{
-}
-#endif /* CONFIG_MEMORY_HOTREMOVE */
-
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 unsigned long __init section_map_size(void)
 {
@@ -486,7 +390,6 @@ void __init sparse_init_early_section(int nid, struct page *map,
 				      unsigned long pnum, unsigned long flags)
 {
 	BUG_ON(!sparse_usagebuf || sparse_usagebuf >= sparse_usagebuf_end);
-	check_usemap_section_nr(nid, sparse_usagebuf);
 	sparse_init_one_section(__nr_to_section(pnum), pnum, map,
 			sparse_usagebuf, SECTION_IS_EARLY | flags);
 	sparse_usagebuf = (void *)sparse_usagebuf + mem_section_usage_size();
@@ -497,8 +400,7 @@ static int __init sparse_usage_init(int nid, unsigned long map_count)
 	unsigned long size;
 
 	size = mem_section_usage_size() * map_count;
-	sparse_usagebuf = sparse_early_usemaps_alloc_pgdat_section(
-				NODE_DATA(nid), size);
+	sparse_usagebuf = memblock_alloc_node(size, SMP_CACHE_BYTES, nid);
 	if (!sparse_usagebuf) {
 		sparse_usagebuf_end = NULL;
 		return -ENOMEM;

-- 
2.43.0




* [PATCH v2 11/15] mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (9 preceding siblings ...)
  2026-03-20 22:13 ` [PATCH v2 10/15] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 12/15] mm/sparse: drop set_section_nid() from sparse_add_section() David Hildenbrand (Arm)
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

We want to move subsection_map_init() to mm/sparse-vmemmap.c.

To prepare for getting rid of subsection_map_init() in mm/sparse.c
completely, replace the !CONFIG_SPARSEMEM_VMEMMAP macro stub with a
static inline function.

While at it, move the declaration to internal.h and rename it to
"sparse_init_subsection_map()".
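
As a side note on the idiom: a static inline stub, unlike an empty
`do { } while (0)` macro, type-checks and evaluates its arguments even in
the disabled configuration. A stand-alone illustration (all names below
are made up for the example):

```c
#include <assert.h>

/* Hypothetical stubs, mirroring the two styles. */
#define macro_stub(_pfn, _nr_pages) do { } while (0)

static inline void inline_stub(unsigned long pfn, unsigned long nr_pages)
{
	/* Arguments are type-checked and evaluated, then ignored. */
	(void)pfn;
	(void)nr_pages;
}

/* Counts how often a side-effecting argument expression runs. */
static int evaluations;

static unsigned long next_pfn(void)
{
	return (unsigned long)evaluations++;
}
```

The macro silently discards its argument expressions, while the inline
function evaluates them exactly once, so behavior stays consistent
between configs.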

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 include/linux/mmzone.h |  3 ---
 mm/internal.h          | 12 ++++++++++++
 mm/mm_init.c           |  2 +-
 mm/sparse.c            |  6 +-----
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7bd0134c241c..b694c69dee04 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2002,8 +2002,6 @@ struct mem_section_usage {
 	unsigned long pageblock_flags[0];
 };
 
-void subsection_map_init(unsigned long pfn, unsigned long nr_pages);
-
 struct page;
 struct page_ext;
 struct mem_section {
@@ -2396,7 +2394,6 @@ static inline unsigned long next_present_section_nr(unsigned long section_nr)
 #define sparse_vmemmap_init_nid_early(_nid) do {} while (0)
 #define sparse_vmemmap_init_nid_late(_nid) do {} while (0)
 #define pfn_in_present_section pfn_valid
-#define subsection_map_init(_pfn, _nr_pages) do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
 /*
diff --git a/mm/internal.h b/mm/internal.h
index f98f4746ac41..5f5c45d80aca 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -960,12 +960,24 @@ void memmap_init_range(unsigned long, int, unsigned long, unsigned long,
 		unsigned long, enum meminit_context, struct vmem_altmap *, int,
 		bool);
 
+/*
+ * mm/sparse.c
+ */
 #ifdef CONFIG_SPARSEMEM
 void sparse_init(void);
 #else
 static inline void sparse_init(void) {}
 #endif /* CONFIG_SPARSEMEM */
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+void sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages);
+#else
+static inline void sparse_init_subsection_map(unsigned long pfn,
+		unsigned long nr_pages)
+{
+}
+#endif /* CONFIG_SPARSEMEM_VMEMMAP */
+
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 
 /*
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 969048f9b320..3c5f18537cd1 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1898,7 +1898,7 @@ static void __init free_area_init(void)
 		pr_info("  node %3d: [mem %#018Lx-%#018Lx]\n", nid,
 			(u64)start_pfn << PAGE_SHIFT,
 			((u64)end_pfn << PAGE_SHIFT) - 1);
-		subsection_map_init(start_pfn, end_pfn - start_pfn);
+		sparse_init_subsection_map(start_pfn, end_pfn - start_pfn);
 	}
 
 	/* Initialise every node */
diff --git a/mm/sparse.c b/mm/sparse.c
index e2048b1fbf5f..c96ac5e70c22 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -185,7 +185,7 @@ static void subsection_mask_set(unsigned long *map, unsigned long pfn,
 	bitmap_set(map, idx, end - idx + 1);
 }
 
-void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
+void __init sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages)
 {
 	int end_sec_nr = pfn_to_section_nr(pfn + nr_pages - 1);
 	unsigned long nr, start_sec_nr = pfn_to_section_nr(pfn);
@@ -207,10 +207,6 @@ void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
 		nr_pages -= pfns;
 	}
 }
-#else
-void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
-{
-}
 #endif
 
 /* Record a memory area against a node. */

-- 
2.43.0




* [PATCH v2 12/15] mm/sparse: drop set_section_nid() from sparse_add_section()
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (10 preceding siblings ...)
  2026-03-20 22:13 ` [PATCH v2 11/15] mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 13/15] mm/sparse: move sparse_init_one_section() to internal.h David Hildenbrand (Arm)
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

CONFIG_MEMORY_HOTPLUG is CONFIG_SPARSEMEM_VMEMMAP-only, and
CONFIG_SPARSEMEM_VMEMMAP implies that NODE_NOT_IN_PAGE_FLAGS cannot be
set; see include/linux/page-flags-layout.h:

	...
	#elif defined(CONFIG_SPARSEMEM_VMEMMAP)
	#error "Vmemmap: No space for nodes field in page flags"
	...

Consequently, the node id is always stored in page flags, and
set_section_nid() is a NOP under CONFIG_SPARSEMEM_VMEMMAP.

So let's remove the set_section_nid() call to prepare for moving the
CONFIG_MEMORY_HOTPLUG bits to mm/sparse-vmemmap.c.
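
The "node id lives in page flags" scheme can be illustrated with a toy
flags word; the bit positions below are invented for the example and do
not match the kernel's real page-flags layout:

```c
#include <assert.h>
#include <stdint.h>

/* Invented layout: node id in the top 6 bits of a 64-bit flags word. */
#define TOY_NODES_WIDTH   6
#define TOY_NODES_PGSHIFT (64 - TOY_NODES_WIDTH)
#define TOY_NODES_MASK    ((1ULL << TOY_NODES_WIDTH) - 1)

/* Store a node id in the upper bits, leaving the other flags intact. */
static uint64_t set_page_node(uint64_t flags, int nid)
{
	flags &= ~(TOY_NODES_MASK << TOY_NODES_PGSHIFT);
	return flags | ((uint64_t)nid << TOY_NODES_PGSHIFT);
}

/* Recover the node id from the flags word. */
static int toy_page_to_nid(uint64_t flags)
{
	return (int)((flags >> TOY_NODES_PGSHIFT) & TOY_NODES_MASK);
}
```

When the node fits in page flags like this, a separate per-section node
lookup (what set_section_nid() feeds) is never consulted.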

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/sparse.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index c96ac5e70c22..5c9cad390282 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -765,7 +765,6 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	page_init_poison(memmap, sizeof(struct page) * nr_pages);
 
 	ms = __nr_to_section(section_nr);
-	set_section_nid(section_nr, nid);
 	__section_mark_present(ms, section_nr);
 
 	/* Align memmap to section boundary in the subsection case */

-- 
2.43.0




* [PATCH v2 13/15] mm/sparse: move sparse_init_one_section() to internal.h
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (11 preceding siblings ...)
  2026-03-20 22:13 ` [PATCH v2 12/15] mm/sparse: drop set_section_nid() from sparse_add_section() David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  2026-03-23  8:49   ` David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 14/15] mm/sparse: move __section_mark_present() " David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 15/15] mm/sparse: move memory hotplug bits to sparse-vmemmap.c David Hildenbrand (Arm)
  14 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

Let's move sparse_init_one_section() to internal.h. While at it,
convert the BUG_ON to a VM_WARN_ON, avoid long lines, and merge
sparse_encode_mem_map() into its only caller, sparse_init_one_section().

Clarify the comment a bit, pointing at page_to_pfn().
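
The encoding itself is plain pointer arithmetic. A toy model, modeling
pointers as uintptr_t and assuming 2^15 pages per section (an assumption
for the example, not the kernel's value on every config), of what
sparse_init_one_section() stores and how page_to_pfn() decodes it:

```c
#include <assert.h>
#include <stdint.h>

struct page { int dummy; };

/* Assumed for the example: 32768 pages per section. */
#define PFN_SECTION_SHIFT 15

static uintptr_t toy_section_nr_to_pfn(unsigned long pnum)
{
	return (uintptr_t)pnum << PFN_SECTION_SHIFT;
}

/*
 * Store mem_map minus the section's first pfn (scaled to bytes, since
 * we model pointers as integers): decoding a pfn from a page address
 * then becomes a plain subtraction.
 */
static uintptr_t encode_mem_map(uintptr_t mem_map, unsigned long pnum)
{
	return mem_map - toy_section_nr_to_pfn(pnum) * sizeof(struct page);
}

static unsigned long toy_page_to_pfn(uintptr_t page, uintptr_t coded)
{
	return (unsigned long)((page - coded) / sizeof(struct page));
}
```

This is why the low SECTION_MAP_LAST_BIT bits of the coded value must be
free for flags: the coded base is section-aligned in pfn terms.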

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 include/linux/mmzone.h |  2 +-
 mm/internal.h          | 22 ++++++++++++++++++++++
 mm/sparse.c            | 24 ------------------------
 3 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b694c69dee04..dcbbf36ed88c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2008,7 +2008,7 @@ struct mem_section {
 	/*
 	 * This is, logically, a pointer to an array of struct
 	 * pages.  However, it is stored with some other magic.
-	 * (see sparse.c::sparse_init_one_section())
+	 * (see sparse_init_one_section())
 	 *
 	 * Additionally during early boot we encode node id of
 	 * the location of the section here to guide allocation.
diff --git a/mm/internal.h b/mm/internal.h
index 5f5c45d80aca..2f188f7702f7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -965,6 +965,28 @@ void memmap_init_range(unsigned long, int, unsigned long, unsigned long,
  */
 #ifdef CONFIG_SPARSEMEM
 void sparse_init(void);
+
+static inline void sparse_init_one_section(struct mem_section *ms,
+		unsigned long pnum, struct page *mem_map,
+		struct mem_section_usage *usage, unsigned long flags)
+{
+	unsigned long coded_mem_map;
+
+	BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
+
+	/*
+	 * We encode the start PFN of the section into the mem_map such that
+	 * page_to_pfn() on !CONFIG_SPARSEMEM_VMEMMAP can simply subtract it
+	 * from the page pointer to obtain the PFN.
+	 */
+	coded_mem_map = (unsigned long)(mem_map - section_nr_to_pfn(pnum));
+	VM_WARN_ON(coded_mem_map & ~SECTION_MAP_MASK);
+
+	ms->section_mem_map &= ~SECTION_MAP_MASK;
+	ms->section_mem_map |= coded_mem_map;
+	ms->section_mem_map |= flags | SECTION_HAS_MEM_MAP;
+	ms->usage = usage;
+}
 #else
 static inline void sparse_init(void) {}
 #endif /* CONFIG_SPARSEMEM */
diff --git a/mm/sparse.c b/mm/sparse.c
index 5c9cad390282..ed5de1a25f04 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -256,30 +256,6 @@ static void __init memblocks_present(void)
 		memory_present(nid, start, end);
 }
 
-/*
- * Subtle, we encode the real pfn into the mem_map such that
- * the identity pfn - section_mem_map will return the actual
- * physical page frame number.
- */
-static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long pnum)
-{
-	unsigned long coded_mem_map =
-		(unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
-	BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
-	BUG_ON(coded_mem_map & ~SECTION_MAP_MASK);
-	return coded_mem_map;
-}
-
-static void __meminit sparse_init_one_section(struct mem_section *ms,
-		unsigned long pnum, struct page *mem_map,
-		struct mem_section_usage *usage, unsigned long flags)
-{
-	ms->section_mem_map &= ~SECTION_MAP_MASK;
-	ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum)
-		| SECTION_HAS_MEM_MAP | flags;
-	ms->usage = usage;
-}
-
 static unsigned long usemap_size(void)
 {
 	return BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS) * sizeof(unsigned long);

-- 
2.43.0




* [PATCH v2 14/15] mm/sparse: move __section_mark_present() to internal.h
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (12 preceding siblings ...)
  2026-03-20 22:13 ` [PATCH v2 13/15] mm/sparse: move sparse_init_one_section() to internal.h David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  2026-03-20 22:13 ` [PATCH v2 15/15] mm/sparse: move memory hotplug bits to sparse-vmemmap.c David Hildenbrand (Arm)
  14 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

Let's prepare for moving memory hotplug handling from sparse.c to
sparse-vmemmap.c by moving __section_mark_present() to internal.h.

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/internal.h | 9 +++++++++
 mm/sparse.c   | 8 --------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 2f188f7702f7..b002c91e40a5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -987,6 +987,15 @@ static inline void sparse_init_one_section(struct mem_section *ms,
 	ms->section_mem_map |= flags | SECTION_HAS_MEM_MAP;
 	ms->usage = usage;
 }
+
+static inline void __section_mark_present(struct mem_section *ms,
+		unsigned long section_nr)
+{
+	if (section_nr > __highest_present_section_nr)
+		__highest_present_section_nr = section_nr;
+
+	ms->section_mem_map |= SECTION_MARKED_PRESENT;
+}
 #else
 static inline void sparse_init(void) {}
 #endif /* CONFIG_SPARSEMEM */
diff --git a/mm/sparse.c b/mm/sparse.c
index ed5de1a25f04..ecd4c41c0ff0 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -161,14 +161,6 @@ static void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
  * those loops early.
  */
 unsigned long __highest_present_section_nr;
-static void __section_mark_present(struct mem_section *ms,
-		unsigned long section_nr)
-{
-	if (section_nr > __highest_present_section_nr)
-		__highest_present_section_nr = section_nr;
-
-	ms->section_mem_map |= SECTION_MARKED_PRESENT;
-}
 
 static inline unsigned long first_present_section_nr(void)
 {

-- 
2.43.0




* [PATCH v2 15/15] mm/sparse: move memory hotplug bits to sparse-vmemmap.c
  2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (13 preceding siblings ...)
  2026-03-20 22:13 ` [PATCH v2 14/15] mm/sparse: move __section_mark_present() " David Hildenbrand (Arm)
@ 2026-03-20 22:13 ` David Hildenbrand (Arm)
  14 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv, David Hildenbrand (Arm)

Let's move all memory hotplug related code to sparse-vmemmap.c.

We only have to expose sparse_index_init(). While at it, drop the
definition of sparse_index_init() for !CONFIG_SPARSEMEM, which is unused,
and place the declaration in internal.h.

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 include/linux/mmzone.h |   1 -
 mm/internal.h          |   4 +
 mm/sparse-vmemmap.c    | 304 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/sparse.c            | 310 +------------------------------------------------
 4 files changed, 310 insertions(+), 309 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index dcbbf36ed88c..e11513f581eb 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2390,7 +2390,6 @@ static inline unsigned long next_present_section_nr(unsigned long section_nr)
 #endif
 
 #else
-#define sparse_index_init(_sec, _nid)  do {} while (0)
 #define sparse_vmemmap_init_nid_early(_nid) do {} while (0)
 #define sparse_vmemmap_init_nid_late(_nid) do {} while (0)
 #define pfn_in_present_section pfn_valid
diff --git a/mm/internal.h b/mm/internal.h
index b002c91e40a5..83e781147a28 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -965,6 +965,7 @@ void memmap_init_range(unsigned long, int, unsigned long, unsigned long,
  */
 #ifdef CONFIG_SPARSEMEM
 void sparse_init(void);
+int sparse_index_init(unsigned long section_nr, int nid);
 
 static inline void sparse_init_one_section(struct mem_section *ms,
 		unsigned long pnum, struct page *mem_map,
@@ -1000,6 +1001,9 @@ static inline void __section_mark_present(struct mem_section *ms,
 static inline void sparse_init(void) {}
 #endif /* CONFIG_SPARSEMEM */
 
+/*
+ * mm/sparse-vmemmap.c
+ */
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 void sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages);
 #else
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index f0690797667f..08fef7b5c8b0 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -591,3 +591,307 @@ void __init sparse_vmemmap_init_nid_late(int nid)
 	hugetlb_vmemmap_init_late(nid);
 }
 #endif
+
+static void subsection_mask_set(unsigned long *map, unsigned long pfn,
+		unsigned long nr_pages)
+{
+	int idx = subsection_map_index(pfn);
+	int end = subsection_map_index(pfn + nr_pages - 1);
+
+	bitmap_set(map, idx, end - idx + 1);
+}
+
+void __init sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+	int end_sec_nr = pfn_to_section_nr(pfn + nr_pages - 1);
+	unsigned long nr, start_sec_nr = pfn_to_section_nr(pfn);
+
+	for (nr = start_sec_nr; nr <= end_sec_nr; nr++) {
+		struct mem_section *ms;
+		unsigned long pfns;
+
+		pfns = min(nr_pages, PAGES_PER_SECTION
+				- (pfn & ~PAGE_SECTION_MASK));
+		ms = __nr_to_section(nr);
+		subsection_mask_set(ms->usage->subsection_map, pfn, pfns);
+
+		pr_debug("%s: sec: %lu pfns: %lu set(%d, %d)\n", __func__, nr,
+				pfns, subsection_map_index(pfn),
+				subsection_map_index(pfn + pfns - 1));
+
+		pfn += pfns;
+		nr_pages -= pfns;
+	}
+}
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+
+/* Mark all memory sections within the pfn range as online */
+void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		unsigned long section_nr = pfn_to_section_nr(pfn);
+		struct mem_section *ms = __nr_to_section(section_nr);
+
+		ms->section_mem_map |= SECTION_IS_ONLINE;
+	}
+}
+
+/* Mark all memory sections within the pfn range as offline */
+void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		unsigned long section_nr = pfn_to_section_nr(pfn);
+		struct mem_section *ms = __nr_to_section(section_nr);
+
+		ms->section_mem_map &= ~SECTION_IS_ONLINE;
+	}
+}
+
+static struct page * __meminit populate_section_memmap(unsigned long pfn,
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap)
+{
+	return __populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
+}
+
+static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
+		struct vmem_altmap *altmap)
+{
+	unsigned long start = (unsigned long) pfn_to_page(pfn);
+	unsigned long end = start + nr_pages * sizeof(struct page);
+
+	vmemmap_free(start, end, altmap);
+}
+static void free_map_bootmem(struct page *memmap)
+{
+	unsigned long start = (unsigned long)memmap;
+	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
+
+	vmemmap_free(start, end, NULL);
+}
+
+static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
+	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
+	struct mem_section *ms = __pfn_to_section(pfn);
+	unsigned long *subsection_map = ms->usage
+		? &ms->usage->subsection_map[0] : NULL;
+
+	subsection_mask_set(map, pfn, nr_pages);
+	if (subsection_map)
+		bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
+
+	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
+				"section already deactivated (%#lx + %ld)\n",
+				pfn, nr_pages))
+		return -EINVAL;
+
+	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
+	return 0;
+}
+
+static bool is_subsection_map_empty(struct mem_section *ms)
+{
+	return bitmap_empty(&ms->usage->subsection_map[0],
+			    SUBSECTIONS_PER_SECTION);
+}
+
+static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+	struct mem_section *ms = __pfn_to_section(pfn);
+	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
+	unsigned long *subsection_map;
+	int rc = 0;
+
+	subsection_mask_set(map, pfn, nr_pages);
+
+	subsection_map = &ms->usage->subsection_map[0];
+
+	if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
+		rc = -EINVAL;
+	else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
+		rc = -EEXIST;
+	else
+		bitmap_or(subsection_map, map, subsection_map,
+				SUBSECTIONS_PER_SECTION);
+
+	return rc;
+}
+
+/*
+ * To deactivate a memory region, there are 3 cases to handle:
+ *
+ * 1. deactivation of a partial hot-added section:
+ *      a) section was present at memory init.
+ *      b) section was hot-added post memory init.
+ * 2. deactivation of a complete hot-added section.
+ * 3. deactivation of a complete section from memory init.
+ *
+ * For case 1, when the subsection_map is not empty, we will not free the
+ * usage map, but still need to free the vmemmap range.
+ */
+static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
+		struct vmem_altmap *altmap)
+{
+	struct mem_section *ms = __pfn_to_section(pfn);
+	bool section_is_early = early_section(ms);
+	struct page *memmap = NULL;
+	bool empty;
+
+	if (clear_subsection_map(pfn, nr_pages))
+		return;
+
+	empty = is_subsection_map_empty(ms);
+	if (empty) {
+		/*
+		 * Mark the section invalid so that valid_section()
+		 * return false. This prevents code from dereferencing
+		 * ms->usage array.
+		 */
+		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
+
+		/*
+		 * When removing an early section, the usage map is kept (as the
+		 * usage maps of other sections fall into the same page). It
+		 * will be re-used when re-adding the section - which is then no
+		 * longer an early section. If the usage map is PageReserved, it
+		 * was allocated during boot.
+		 */
+		if (!PageReserved(virt_to_page(ms->usage))) {
+			kfree_rcu(ms->usage, rcu);
+			WRITE_ONCE(ms->usage, NULL);
+		}
+		memmap = pfn_to_page(SECTION_ALIGN_DOWN(pfn));
+	}
+
+	/*
+	 * The memmap of early sections is always fully populated. See
+	 * section_activate() and pfn_valid() .
+	 */
+	if (!section_is_early) {
+		memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
+		depopulate_section_memmap(pfn, nr_pages, altmap);
+	} else if (memmap) {
+		memmap_boot_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page),
+							  PAGE_SIZE)));
+		free_map_bootmem(memmap);
+	}
+
+	if (empty)
+		ms->section_mem_map = (unsigned long)NULL;
+}
+
+static struct page * __meminit section_activate(int nid, unsigned long pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap)
+{
+	struct mem_section *ms = __pfn_to_section(pfn);
+	struct mem_section_usage *usage = NULL;
+	struct page *memmap;
+	int rc;
+
+	if (!ms->usage) {
+		usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
+		if (!usage)
+			return ERR_PTR(-ENOMEM);
+		ms->usage = usage;
+	}
+
+	rc = fill_subsection_map(pfn, nr_pages);
+	if (rc) {
+		if (usage)
+			ms->usage = NULL;
+		kfree(usage);
+		return ERR_PTR(rc);
+	}
+
+	/*
+	 * The early init code does not consider partially populated
+	 * initial sections, it simply assumes that memory will never be
+	 * referenced.  If we hot-add memory into such a section then we
+	 * do not need to populate the memmap and can simply reuse what
+	 * is already there.
+	 */
+	if (nr_pages < PAGES_PER_SECTION && early_section(ms))
+		return pfn_to_page(pfn);
+
+	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
+	if (!memmap) {
+		section_deactivate(pfn, nr_pages, altmap);
+		return ERR_PTR(-ENOMEM);
+	}
+	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
+
+	return memmap;
+}
+
+/**
+ * sparse_add_section - add a memory section, or populate an existing one
+ * @nid: The node to add section on
+ * @start_pfn: start pfn of the memory range
+ * @nr_pages: number of pfns to add in the section
+ * @altmap: alternate pfns to allocate the memmap backing store
+ * @pgmap: alternate compound page geometry for devmap mappings
+ *
+ * This is only intended for hotplug.
+ *
+ * Note that only VMEMMAP supports sub-section aligned hotplug,
+ * the proper alignment and size are gated by check_pfn_span().
+ *
+ *
+ * Return:
+ * * 0		- On success.
+ * * -EEXIST	- Section has been present.
+ * * -ENOMEM	- Out of memory.
+ */
+int __meminit sparse_add_section(int nid, unsigned long start_pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap)
+{
+	unsigned long section_nr = pfn_to_section_nr(start_pfn);
+	struct mem_section *ms;
+	struct page *memmap;
+	int ret;
+
+	ret = sparse_index_init(section_nr, nid);
+	if (ret < 0)
+		return ret;
+
+	memmap = section_activate(nid, start_pfn, nr_pages, altmap, pgmap);
+	if (IS_ERR(memmap))
+		return PTR_ERR(memmap);
+
+	/*
+	 * Poison uninitialized struct pages in order to catch invalid flags
+	 * combinations.
+	 */
+	page_init_poison(memmap, sizeof(struct page) * nr_pages);
+
+	ms = __nr_to_section(section_nr);
+	__section_mark_present(ms, section_nr);
+
+	/* Align memmap to section boundary in the subsection case */
+	if (section_nr_to_pfn(section_nr) != start_pfn)
+		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
+	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
+
+	return 0;
+}
+
+void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
+			   struct vmem_altmap *altmap)
+{
+	struct mem_section *ms = __pfn_to_section(pfn);
+
+	if (WARN_ON_ONCE(!valid_section(ms)))
+		return;
+
+	section_deactivate(pfn, nr_pages, altmap);
+}
+#endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/mm/sparse.c b/mm/sparse.c
index ecd4c41c0ff0..007fd52c621e 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -79,7 +79,7 @@ static noinline struct mem_section __ref *sparse_index_alloc(int nid)
 	return section;
 }
 
-static int __meminit sparse_index_init(unsigned long section_nr, int nid)
+int __meminit sparse_index_init(unsigned long section_nr, int nid)
 {
 	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
 	struct mem_section *section;
@@ -103,7 +103,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
 	return 0;
 }
 #else /* !SPARSEMEM_EXTREME */
-static inline int sparse_index_init(unsigned long section_nr, int nid)
+int sparse_index_init(unsigned long section_nr, int nid)
 {
 	return 0;
 }
@@ -167,40 +167,6 @@ static inline unsigned long first_present_section_nr(void)
 	return next_present_section_nr(-1);
 }
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static void subsection_mask_set(unsigned long *map, unsigned long pfn,
-		unsigned long nr_pages)
-{
-	int idx = subsection_map_index(pfn);
-	int end = subsection_map_index(pfn + nr_pages - 1);
-
-	bitmap_set(map, idx, end - idx + 1);
-}
-
-void __init sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
-	int end_sec_nr = pfn_to_section_nr(pfn + nr_pages - 1);
-	unsigned long nr, start_sec_nr = pfn_to_section_nr(pfn);
-
-	for (nr = start_sec_nr; nr <= end_sec_nr; nr++) {
-		struct mem_section *ms;
-		unsigned long pfns;
-
-		pfns = min(nr_pages, PAGES_PER_SECTION
-				- (pfn & ~PAGE_SECTION_MASK));
-		ms = __nr_to_section(nr);
-		subsection_mask_set(ms->usage->subsection_map, pfn, pfns);
-
-		pr_debug("%s: sec: %lu pfns: %lu set(%d, %d)\n", __func__, nr,
-				pfns, subsection_map_index(pfn),
-				subsection_map_index(pfn + pfns - 1));
-
-		pfn += pfns;
-		nr_pages -= pfns;
-	}
-}
-#endif
-
 /* Record a memory area against a node. */
 static void __init memory_present(int nid, unsigned long start, unsigned long end)
 {
@@ -482,275 +448,3 @@ void __init sparse_init(void)
 	sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
 	vmemmap_populate_print_last();
 }
-
-#ifdef CONFIG_MEMORY_HOTPLUG
-
-/* Mark all memory sections within the pfn range as online */
-void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-
-	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
-		unsigned long section_nr = pfn_to_section_nr(pfn);
-		struct mem_section *ms = __nr_to_section(section_nr);
-
-		ms->section_mem_map |= SECTION_IS_ONLINE;
-	}
-}
-
-/* Mark all memory sections within the pfn range as offline */
-void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-
-	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
-		unsigned long section_nr = pfn_to_section_nr(pfn);
-		struct mem_section *ms = __nr_to_section(section_nr);
-
-		ms->section_mem_map &= ~SECTION_IS_ONLINE;
-	}
-}
-
-static struct page * __meminit populate_section_memmap(unsigned long pfn,
-		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
-		struct dev_pagemap *pgmap)
-{
-	return __populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
-}
-
-static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
-{
-	unsigned long start = (unsigned long) pfn_to_page(pfn);
-	unsigned long end = start + nr_pages * sizeof(struct page);
-
-	vmemmap_free(start, end, altmap);
-}
-static void free_map_bootmem(struct page *memmap)
-{
-	unsigned long start = (unsigned long)memmap;
-	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
-
-	vmemmap_free(start, end, NULL);
-}
-
-static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
-	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
-	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
-	struct mem_section *ms = __pfn_to_section(pfn);
-	unsigned long *subsection_map = ms->usage
-		? &ms->usage->subsection_map[0] : NULL;
-
-	subsection_mask_set(map, pfn, nr_pages);
-	if (subsection_map)
-		bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
-
-	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
-				"section already deactivated (%#lx + %ld)\n",
-				pfn, nr_pages))
-		return -EINVAL;
-
-	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
-	return 0;
-}
-
-static bool is_subsection_map_empty(struct mem_section *ms)
-{
-	return bitmap_empty(&ms->usage->subsection_map[0],
-			    SUBSECTIONS_PER_SECTION);
-}
-
-static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
-	struct mem_section *ms = __pfn_to_section(pfn);
-	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
-	unsigned long *subsection_map;
-	int rc = 0;
-
-	subsection_mask_set(map, pfn, nr_pages);
-
-	subsection_map = &ms->usage->subsection_map[0];
-
-	if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
-		rc = -EINVAL;
-	else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
-		rc = -EEXIST;
-	else
-		bitmap_or(subsection_map, map, subsection_map,
-				SUBSECTIONS_PER_SECTION);
-
-	return rc;
-}
-
-/*
- * To deactivate a memory region, there are 3 cases to handle:
- *
- * 1. deactivation of a partial hot-added section:
- *      a) section was present at memory init.
- *      b) section was hot-added post memory init.
- * 2. deactivation of a complete hot-added section.
- * 3. deactivation of a complete section from memory init.
- *
- * For 1, when subsection_map does not empty we will not be freeing the
- * usage map, but still need to free the vmemmap range.
- */
-static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
-{
-	struct mem_section *ms = __pfn_to_section(pfn);
-	bool section_is_early = early_section(ms);
-	struct page *memmap = NULL;
-	bool empty;
-
-	if (clear_subsection_map(pfn, nr_pages))
-		return;
-
-	empty = is_subsection_map_empty(ms);
-	if (empty) {
-		/*
-		 * Mark the section invalid so that valid_section()
-		 * return false. This prevents code from dereferencing
-		 * ms->usage array.
-		 */
-		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
-
-		/*
-		 * When removing an early section, the usage map is kept (as the
-		 * usage maps of other sections fall into the same page). It
-		 * will be re-used when re-adding the section - which is then no
-		 * longer an early section. If the usage map is PageReserved, it
-		 * was allocated during boot.
-		 */
-		if (!PageReserved(virt_to_page(ms->usage))) {
-			kfree_rcu(ms->usage, rcu);
-			WRITE_ONCE(ms->usage, NULL);
-		}
-		memmap = pfn_to_page(SECTION_ALIGN_DOWN(pfn));
-	}
-
-	/*
-	 * The memmap of early sections is always fully populated. See
-	 * section_activate() and pfn_valid() .
-	 */
-	if (!section_is_early) {
-		memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
-		depopulate_section_memmap(pfn, nr_pages, altmap);
-	} else if (memmap) {
-		memmap_boot_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page),
-							  PAGE_SIZE)));
-		free_map_bootmem(memmap);
-	}
-
-	if (empty)
-		ms->section_mem_map = (unsigned long)NULL;
-}
-
-static struct page * __meminit section_activate(int nid, unsigned long pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap,
-		struct dev_pagemap *pgmap)
-{
-	struct mem_section *ms = __pfn_to_section(pfn);
-	struct mem_section_usage *usage = NULL;
-	struct page *memmap;
-	int rc;
-
-	if (!ms->usage) {
-		usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
-		if (!usage)
-			return ERR_PTR(-ENOMEM);
-		ms->usage = usage;
-	}
-
-	rc = fill_subsection_map(pfn, nr_pages);
-	if (rc) {
-		if (usage)
-			ms->usage = NULL;
-		kfree(usage);
-		return ERR_PTR(rc);
-	}
-
-	/*
-	 * The early init code does not consider partially populated
-	 * initial sections, it simply assumes that memory will never be
-	 * referenced.  If we hot-add memory into such a section then we
-	 * do not need to populate the memmap and can simply reuse what
-	 * is already there.
-	 */
-	if (nr_pages < PAGES_PER_SECTION && early_section(ms))
-		return pfn_to_page(pfn);
-
-	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
-	if (!memmap) {
-		section_deactivate(pfn, nr_pages, altmap);
-		return ERR_PTR(-ENOMEM);
-	}
-	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
-
-	return memmap;
-}
-
-/**
- * sparse_add_section - add a memory section, or populate an existing one
- * @nid: The node to add section on
- * @start_pfn: start pfn of the memory range
- * @nr_pages: number of pfns to add in the section
- * @altmap: alternate pfns to allocate the memmap backing store
- * @pgmap: alternate compound page geometry for devmap mappings
- *
- * This is only intended for hotplug.
- *
- * Note that only VMEMMAP supports sub-section aligned hotplug,
- * the proper alignment and size are gated by check_pfn_span().
- *
- *
- * Return:
- * * 0		- On success.
- * * -EEXIST	- Section has been present.
- * * -ENOMEM	- Out of memory.
- */
-int __meminit sparse_add_section(int nid, unsigned long start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap,
-		struct dev_pagemap *pgmap)
-{
-	unsigned long section_nr = pfn_to_section_nr(start_pfn);
-	struct mem_section *ms;
-	struct page *memmap;
-	int ret;
-
-	ret = sparse_index_init(section_nr, nid);
-	if (ret < 0)
-		return ret;
-
-	memmap = section_activate(nid, start_pfn, nr_pages, altmap, pgmap);
-	if (IS_ERR(memmap))
-		return PTR_ERR(memmap);
-
-	/*
-	 * Poison uninitialized struct pages in order to catch invalid flags
-	 * combinations.
-	 */
-	page_init_poison(memmap, sizeof(struct page) * nr_pages);
-
-	ms = __nr_to_section(section_nr);
-	__section_mark_present(ms, section_nr);
-
-	/* Align memmap to section boundary in the subsection case */
-	if (section_nr_to_pfn(section_nr) != start_pfn)
-		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
-	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
-
-	return 0;
-}
-
-void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
-			   struct vmem_altmap *altmap)
-{
-	struct mem_section *ms = __pfn_to_section(pfn);
-
-	if (WARN_ON_ONCE(!valid_section(ms)))
-		return;
-
-	section_deactivate(pfn, nr_pages, altmap);
-}
-#endif /* CONFIG_MEMORY_HOTPLUG */

-- 
2.43.0




* Re: [PATCH v2 13/15] mm/sparse: move sparse_init_one_section() to internal.h
  2026-03-20 22:13 ` [PATCH v2 13/15] mm/sparse: move sparse_init_one_section() to internal.h David Hildenbrand (Arm)
@ 2026-03-23  8:49   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-23  8:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Oscar Salvador, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv

On 3/20/26 23:13, David Hildenbrand (Arm) wrote:
> While at it, convert the BUG_ON to a VM_WARN_ON_ONCE, avoid long lines, and
> merge sparse_encode_mem_map() into its only caller
> sparse_init_one_section().
> 
> Clarify the comment a bit, pointing at page_to_pfn().
> 
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> ---
>  include/linux/mmzone.h |  2 +-
>  mm/internal.h          | 22 ++++++++++++++++++++++
>  mm/sparse.c            | 24 ------------------------
>  3 files changed, 23 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index b694c69dee04..dcbbf36ed88c 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -2008,7 +2008,7 @@ struct mem_section {
>  	/*
>  	 * This is, logically, a pointer to an array of struct
>  	 * pages.  However, it is stored with some other magic.
> -	 * (see sparse.c::sparse_init_one_section())
> +	 * (see sparse_init_one_section())
>  	 *
>  	 * Additionally during early boot we encode node id of
>  	 * the location of the section here to guide allocation.
> diff --git a/mm/internal.h b/mm/internal.h
> index 5f5c45d80aca..2f188f7702f7 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -965,6 +965,28 @@ void memmap_init_range(unsigned long, int, unsigned long, unsigned long,
>   */
>  #ifdef CONFIG_SPARSEMEM
>  void sparse_init(void);
> +
> +static inline void sparse_init_one_section(struct mem_section *ms,
> +		unsigned long pnum, struct page *mem_map,
> +		struct mem_section_usage *usage, unsigned long flags)
> +{
> +	unsigned long coded_mem_map;
> +
> +	BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
> +
> +	/*
> +	 * We encode the start PFN of the section into the mem_map such that
> +	 * page_to_pfn() on !CONFIG_SPARSEMEM_VMEMMAP can simply subtract it
> +	 * from the page pointer to obtain the PFN.
> +	 */
> +	coded_mem_map = (unsigned long)(mem_map - section_nr_to_pfn(pnum));
> +	VM_WARN_ON(coded_mem_map & ~SECTION_MAP_MASK);
> +
> +	ms->section_mem_map &= ~SECTION_MAP_MASK;
> +	ms->section_mem_map |= coded_mem_map;
> +	ms->section_mem_map |= flags | SECTION_HAS_MEM_MAP;
> +	ms->usage = usage;

The following fixup on top:

From 66ec42c610feb3b84c405222f3f39ba0776549c6 Mon Sep 17 00:00:00 2001
From: "David Hildenbrand (Arm)" <david@kernel.org>
Date: Mon, 23 Mar 2026 09:48:21 +0100
Subject: [PATCH] fixup: mm/sparse: move sparse_init_one_section() to
 internal.h

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/internal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/internal.h b/mm/internal.h
index 2f188f7702f7..969e58e5b3db 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -980,7 +980,7 @@ static inline void sparse_init_one_section(struct mem_section *ms,
 	 * from the page pointer to obtain the PFN.
 	 */
 	coded_mem_map = (unsigned long)(mem_map - section_nr_to_pfn(pnum));
-	VM_WARN_ON(coded_mem_map & ~SECTION_MAP_MASK);
+	VM_WARN_ON_ONCE(coded_mem_map & ~SECTION_MAP_MASK);
 
 	ms->section_mem_map &= ~SECTION_MAP_MASK;
 	ms->section_mem_map |= coded_mem_map;
-- 
2.43.0


-- 
Cheers,

David



* Re: [PATCH v2 01/15] mm/memory_hotplug: fix possible race in scan_movable_pages()
  2026-03-20 22:13 ` [PATCH v2 01/15] mm/memory_hotplug: fix possible race in scan_movable_pages() David Hildenbrand (Arm)
@ 2026-03-23 13:26   ` Lorenzo Stoakes (Oracle)
  2026-03-23 13:40     ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 20+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-23 13:26 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, Andrew Morton, Oscar Salvador, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv

On Fri, Mar 20, 2026 at 11:13:33PM +0100, David Hildenbrand (Arm) wrote:
> If a hugetlb folio gets freed while we are in scan_movable_pages(),
> folio_nr_pages() could return 0, resulting in or'ing "0 - 1 = -1"
> to the PFN, resulting in PFN = -1. We're not holding any locks or
> references that would prevent that.
>
> for_each_valid_pfn() would then search for the next valid PFN, and could
> return a PFN that is outside of the range of the original requested
> range. do_migrate_range() would then try to migrate quite a big range,
> which is certainly undesirable.
>
> To fix it, simply test for valid folio_nr_pages() values. While at it,
> as PageHuge() really just does a page_folio() internally, we can just
> use folio_test_hugetlb() on the folio directly.
>
> scan_movable_pages() is expected to be fast, and we try to avoid taking
> locks or grabbing references. We cannot use folio_try_get() as that does
> not work for free hugetlb folios. We could grab the hugetlb_lock, but
> that just adds complexity.
>
> The race is unlikely to trigger in practice, so we won't be CCing
> stable.
>
> Fixes: 16540dae959d ("mm/hugetlb: mm/memory_hotplug: use a folio in scan_movable_pages()")
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Logic looks right to me, though some nits below. With those accounted for:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/memory_hotplug.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 86d3faf50453..969cd7ddf68f 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1747,6 +1747,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>  	unsigned long pfn;
>
>  	for_each_valid_pfn(pfn, start, end) {
> +		unsigned long nr_pages;
>  		struct page *page;
>  		struct folio *folio;
>
> @@ -1763,9 +1764,9 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>  		if (PageOffline(page) && page_count(page))
>  			return -EBUSY;
>
> -		if (!PageHuge(page))

Yeah interesting to see this is folio_test_hugetlb(page_folio(page)) :))

So this is a nice change for sure.

> -			continue;
>  		folio = page_folio(page);
> +		if (!folio_test_hugetlb(folio))
> +			continue;
>  		/*
>  		 * This test is racy as we hold no reference or lock.  The
>  		 * hugetlb page could have been free'ed and head is no longer
> @@ -1775,7 +1776,11 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>  		 */
>  		if (folio_test_hugetlb_migratable(folio))
>  			goto found;
> -		pfn |= folio_nr_pages(folio) - 1;
> +		nr_pages = folio_nr_pages(folio);
> +		if (unlikely(nr_pages < 1 || nr_pages > MAX_FOLIO_NR_PAGES ||

NIT: since nr_pages is an unsigned long, would this be better as !nr_pages || ...?

> +			     !is_power_of_2(nr_pages)))

Could the latter two conditions ever really happen? I guess some weird tearing
or something maybe?

It would also be nice to maybe separate this out as is_valid_nr_pages() or
something, but then again, I suppose given this is a rare case of us
checking this under circumstances where the value might not be valid, maybe
not worth it.

> +			continue;
> +		pfn |= nr_pages - 1;
>  	}
>  	return -ENOENT;
>  found:
>
> --
> 2.43.0
>

Cheers, Lorenzo



* Re: [PATCH v2 01/15] mm/memory_hotplug: fix possible race in scan_movable_pages()
  2026-03-23 13:26   ` Lorenzo Stoakes (Oracle)
@ 2026-03-23 13:40     ` David Hildenbrand (Arm)
  2026-03-23 14:00       ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-23 13:40 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: linux-kernel, Andrew Morton, Oscar Salvador, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv

On 3/23/26 14:26, Lorenzo Stoakes (Oracle) wrote:
> On Fri, Mar 20, 2026 at 11:13:33PM +0100, David Hildenbrand (Arm) wrote:
>> If a hugetlb folio gets freed while we are in scan_movable_pages(),
>> folio_nr_pages() could return 0, resulting in or'ing "0 - 1 = -1"
>> to the PFN, resulting in PFN = -1. We're not holding any locks or
>> references that would prevent that.
>>
>> for_each_valid_pfn() would then search for the next valid PFN, and could
>> return a PFN that is outside of the range of the original requested
>> range. do_migrate_range() would then try to migrate quite a big range,
>> which is certainly undesirable.
>>
>> To fix it, simply test for valid folio_nr_pages() values. While at it,
>> as PageHuge() really just does a page_folio() internally, we can just
>> use folio_test_hugetlb() on the folio directly.
>>
>> scan_movable_pages() is expected to be fast, and we try to avoid taking
>> locks or grabbing references. We cannot use folio_try_get() as that does
>> not work for free hugetlb folios. We could grab the hugetlb_lock, but
>> that just adds complexity.
>>
>> The race is unlikely to trigger in practice, so we won't be CCing
>> stable.
>>
>> Fixes: 16540dae959d ("mm/hugetlb: mm/memory_hotplug: use a folio in scan_movable_pages()")
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> 
> Logic looks right to me, though some nits below. With those accounted for:
> 
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> 
>> ---
>>  mm/memory_hotplug.c | 11 ++++++++---
>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 86d3faf50453..969cd7ddf68f 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1747,6 +1747,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>>  	unsigned long pfn;
>>
>>  	for_each_valid_pfn(pfn, start, end) {
>> +		unsigned long nr_pages;
>>  		struct page *page;
>>  		struct folio *folio;
>>
>> @@ -1763,9 +1764,9 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>>  		if (PageOffline(page) && page_count(page))
>>  			return -EBUSY;
>>
>> -		if (!PageHuge(page))
> 
> Yeah interesting to see this is folio_test_hugetlb(page_folio(page)) :))
> 
> So this is a nice change for sure.
> 
>> -			continue;
>>  		folio = page_folio(page);
>> +		if (!folio_test_hugetlb(folio))
>> +			continue;
>>  		/*
>>  		 * This test is racy as we hold no reference or lock.  The
>>  		 * hugetlb page could have been free'ed and head is no longer
>> @@ -1775,7 +1776,11 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>>  		 */
>>  		if (folio_test_hugetlb_migratable(folio))
>>  			goto found;
>> -		pfn |= folio_nr_pages(folio) - 1;
>> +		nr_pages = folio_nr_pages(folio);
>> +		if (unlikely(nr_pages < 1 || nr_pages > MAX_FOLIO_NR_PAGES ||
> 
> NIT: since nr_pages is an unsigned long, would this be better as !nr_pages || ...?

It's easier on the brain when spotting that only a given range is
allowed, without having to remember the exact type of the variable :)

So I guess it doesn't really make a difference in the end.

> 
>> +			     !is_power_of_2(nr_pages)))
> 
> Could the latter two conditions ever really happen? I guess some weird tearing
> or something maybe?

Yes, or when the field gets reused for something else.

> 
> It would also be nice to maybe separate this out as is_valid_nr_pages() or
> something, but then again, I suppose given this is a rare case of us
> checking this under circumstances where the value might not be valid, maybe
> not worth it.

I had the same thought. But this code is so special regarding
raciness that I hope nobody else will really require this ... and if
they do, they might be doing something wrong :)

-- 
Cheers,

David



* Re: [PATCH v2 01/15] mm/memory_hotplug: fix possible race in scan_movable_pages()
  2026-03-23 13:40     ` David Hildenbrand (Arm)
@ 2026-03-23 14:00       ` Lorenzo Stoakes (Oracle)
  0 siblings, 0 replies; 20+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-23 14:00 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, Andrew Morton, Oscar Salvador, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Sidhartha Kumar,
	linux-mm, linux-cxl, linux-riscv

On Mon, Mar 23, 2026 at 02:40:16PM +0100, David Hildenbrand (Arm) wrote:
> On 3/23/26 14:26, Lorenzo Stoakes (Oracle) wrote:
> > On Fri, Mar 20, 2026 at 11:13:33PM +0100, David Hildenbrand (Arm) wrote:
> >> If a hugetlb folio gets freed while we are in scan_movable_pages(),
> >> folio_nr_pages() could return 0, resulting in or'ing "0 - 1 = -1"
> >> to the PFN, resulting in PFN = -1. We're not holding any locks or
> >> references that would prevent that.
> >>
> >> for_each_valid_pfn() would then search for the next valid PFN, and could
> >> return a PFN that is outside of the range of the original requested
> >> range. do_migrate_range() would then try to migrate quite a big range,
> >> which is certainly undesirable.
> >>
> >> To fix it, simply test for valid folio_nr_pages() values. While at it,
> >> as PageHuge() really just does a page_folio() internally, we can just
> >> use folio_test_hugetlb() on the folio directly.
> >>
> >> scan_movable_pages() is expected to be fast, and we try to avoid taking
> >> locks or grabbing references. We cannot use folio_try_get() as that does
> >> not work for free hugetlb folios. We could grab the hugetlb_lock, but
> >> that just adds complexity.
> >>
> >> The race is unlikely to trigger in practice, so we won't be CCing
> >> stable.
> >>
> >> Fixes: 16540dae959d ("mm/hugetlb: mm/memory_hotplug: use a folio in scan_movable_pages()")
> >> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> >
> > Logic looks right to me, though some nits below. With those accounted for:
> >
> > Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> >
> >> ---
> >>  mm/memory_hotplug.c | 11 ++++++++---
> >>  1 file changed, 8 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> >> index 86d3faf50453..969cd7ddf68f 100644
> >> --- a/mm/memory_hotplug.c
> >> +++ b/mm/memory_hotplug.c
> >> @@ -1747,6 +1747,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
> >>  	unsigned long pfn;
> >>
> >>  	for_each_valid_pfn(pfn, start, end) {
> >> +		unsigned long nr_pages;
> >>  		struct page *page;
> >>  		struct folio *folio;
> >>
> >> @@ -1763,9 +1764,9 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
> >>  		if (PageOffline(page) && page_count(page))
> >>  			return -EBUSY;
> >>
> >> -		if (!PageHuge(page))
> >
> > Yeah interesting to see this is folio_test_hugetlb(page_folio(page)) :))
> >
> > So this is a nice change for sure.
> >
> >> -			continue;
> >>  		folio = page_folio(page);
> >> +		if (!folio_test_hugetlb(folio))
> >> +			continue;
> >>  		/*
> >>  		 * This test is racy as we hold no reference or lock.  The
> >>  		 * hugetlb page could have been free'ed and head is no longer
> >> @@ -1775,7 +1776,11 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
> >>  		 */
> >>  		if (folio_test_hugetlb_migratable(folio))
> >>  			goto found;
> >> -		pfn |= folio_nr_pages(folio) - 1;
> >> +		nr_pages = folio_nr_pages(folio);
> >> +		if (unlikely(nr_pages < 1 || nr_pages > MAX_FOLIO_NR_PAGES ||
> >
> > NIT: since nr_pages is an unsigned long, would this be better as !nr_pages || ...?
>
> It's easier on the brain when spotting that only a given range is
> allowed, without having to remember the exact type of the variable :)

Yeah it's not a big deal!

>
> So I guess it doesn't really make a difference in the end.
>
> >
> >> +			     !is_power_of_2(nr_pages)))
> >
> > Could the latter two conditions ever really happen? I guess some weird tearing
> > or something maybe?
>
> > Yes, or when the field gets reused for something else.
>
> >
> > It would also be nice to maybe separate this out as is_valid_nr_pages() or
> > something, but then again, I suppose given this is a rare case of us
> > checking this under circumstances where the value might not be valid, maybe
> > not worth it.
>
> I had the same thought. But this code is so special regarding
> raciness that I hope nobody else will really require this ... and if
> they do, they might be doing something wrong :)

Yeah for sure, it does seem unique to this situation, so probably not worth
it!

>
> --
> Cheers,
>
> David

Cheers, Lorenzo



Thread overview: 20+ messages
2026-03-20 22:13 [PATCH v2 00/15] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
2026-03-20 22:13 ` [PATCH v2 01/15] mm/memory_hotplug: fix possible race in scan_movable_pages() David Hildenbrand (Arm)
2026-03-23 13:26   ` Lorenzo Stoakes (Oracle)
2026-03-23 13:40     ` David Hildenbrand (Arm)
2026-03-23 14:00       ` Lorenzo Stoakes (Oracle)
2026-03-20 22:13 ` [PATCH v2 02/15] mm/memory_hotplug: remove for_each_valid_pfn() usage David Hildenbrand (Arm)
2026-03-20 22:13 ` [PATCH v2 03/15] mm/sparse: remove WARN_ONs from (online|offline)_mem_sections() David Hildenbrand (Arm)
2026-03-20 22:13 ` [PATCH v2 04/15] mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
2026-03-20 22:13 ` [PATCH v2 05/15] mm/memory_hotplug: simplify check_pfn_span() David Hildenbrand (Arm)
2026-03-20 22:13 ` [PATCH v2 06/15] mm/sparse: remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for CONFIG_MEMORY_HOTPLUG David Hildenbrand (Arm)
2026-03-20 22:13 ` [PATCH v2 07/15] mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
2026-03-20 22:13 ` [PATCH v2 08/15] mm/bootmem_info: avoid using sparse_decode_mem_map() David Hildenbrand (Arm)
2026-03-20 22:13 ` [PATCH v2 09/15] mm/sparse: remove sparse_decode_mem_map() David Hildenbrand (Arm)
2026-03-20 22:13 ` [PATCH v2 10/15] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling David Hildenbrand (Arm)
2026-03-20 22:13 ` [PATCH v2 11/15] mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c David Hildenbrand (Arm)
2026-03-20 22:13 ` [PATCH v2 12/15] mm/sparse: drop set_section_nid() from sparse_add_section() David Hildenbrand (Arm)
2026-03-20 22:13 ` [PATCH v2 13/15] mm/sparse: move sparse_init_one_section() to internal.h David Hildenbrand (Arm)
2026-03-23  8:49   ` David Hildenbrand (Arm)
2026-03-20 22:13 ` [PATCH v2 14/15] mm/sparse: move __section_mark_present() " David Hildenbrand (Arm)
2026-03-20 22:13 ` [PATCH v2 15/15] mm/sparse: move memory hotplug bits to sparse-vmemmap.c David Hildenbrand (Arm)
