public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
* [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups
@ 2026-03-17 16:56 David Hildenbrand (Arm)
  2026-03-17 16:56 ` [PATCH 01/14] mm/memory_hotplug: remove for_each_valid_pfn() usage David Hildenbrand (Arm)
                   ` (14 more replies)
  0 siblings, 15 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

Some cleanups around memory hot(un)plug and SPARSEMEM. In essence,
we can limit CONFIG_MEMORY_HOTPLUG to CONFIG_SPARSEMEM_VMEMMAP,
remove some dead code, and move all the hotplug bits over to
mm/sparse-vmemmap.c.

This series also contains some further/related cleanups of other
unnecessary code (memory hole handling and complicated usemap allocation).

I have some further sparse.c cleanups lying around, and I'm planning
on getting rid of bootmem_info.c entirely.

Cross-compiled on a bunch of machines. Hot(un)plug tested with virtio-mem.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>

David Hildenbrand (Arm) (14):
  mm/memory_hotplug: remove for_each_valid_pfn() usage
  mm/sparse: remove WARN_ONs from (online|offline)_mem_sections()
  mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on
    CONFIG_SPARSEMEM_VMEMMAP
  mm/memory_hotplug: simplify check_pfn_span()
  mm/sparse: remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for
    CONFIG_MEMORY_HOTPLUG
  mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP
  mm/bootmem_info: avoid using sparse_decode_mem_map()
  mm/sparse: remove sparse_decode_mem_map()
  mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation
    handling
  mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c
  mm/sparse: drop set_section_nid() from sparse_add_section()
  mm/sparse: move sparse_init_one_section() to internal.h
  mm/sparse: move __section_mark_present() to internal.h
  mm/sparse: move memory hotplug bits to sparse-vmemmap.c

 include/linux/memory_hotplug.h |   2 -
 include/linux/mmzone.h         |   6 +-
 mm/Kconfig                     |   2 +-
 mm/bootmem_info.c              |  46 +--
 mm/internal.h                  |  47 +++
 mm/memory_hotplug.c            |  24 +-
 mm/mm_init.c                   |   2 +-
 mm/sparse-vmemmap.c            | 308 +++++++++++++++++++
 mm/sparse.c                    | 539 +--------------------------------
 9 files changed, 373 insertions(+), 603 deletions(-)


base-commit: 3f4f1faa33544d0bd724e32980b6f211c3a9bc7b
-- 
2.43.0



^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH 01/14] mm/memory_hotplug: remove for_each_valid_pfn() usage
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
@ 2026-03-17 16:56 ` David Hildenbrand (Arm)
  2026-03-17 17:19   ` Lorenzo Stoakes (Oracle)
                     ` (2 more replies)
  2026-03-17 16:56 ` [PATCH 02/14] mm/sparse: remove WARN_ONs from (online|offline)_mem_sections() David Hildenbrand (Arm)
                   ` (13 subsequent siblings)
  14 siblings, 3 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

When offlining memory, we know that the memory range has no holes.
Checking for valid pfns is not required.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/memory_hotplug.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 86d3faf50453..3495d94587e7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1746,7 +1746,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
 {
 	unsigned long pfn;
 
-	for_each_valid_pfn(pfn, start, end) {
+	for (pfn = start; pfn < end; pfn++) {
 		struct page *page;
 		struct folio *folio;
 
@@ -1791,7 +1791,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
 
-	for_each_valid_pfn(pfn, start_pfn, end_pfn) {
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		struct page *page;
 
 		page = pfn_to_page(pfn);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 02/14] mm/sparse: remove WARN_ONs from (online|offline)_mem_sections()
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
  2026-03-17 16:56 ` [PATCH 01/14] mm/memory_hotplug: remove for_each_valid_pfn() usage David Hildenbrand (Arm)
@ 2026-03-17 16:56 ` David Hildenbrand (Arm)
  2026-03-17 17:21   ` Lorenzo Stoakes (Oracle)
  2026-03-18  7:53   ` Mike Rapoport
  2026-03-17 16:56 ` [PATCH 03/14] mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

We do not allow offlining of memory with memory holes, and always
hotplug memory without holes.

Consequently, we cannot end up onlining or offlining memory sections that
have holes (including invalid sections). That's also why these
WARN_ONs never fired.

Let's remove the WARN_ONs along with the TODO regarding double-checking.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/sparse.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index dfabe554adf8..93252112860e 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -638,13 +638,8 @@ void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
 		unsigned long section_nr = pfn_to_section_nr(pfn);
-		struct mem_section *ms;
-
-		/* onlining code should never touch invalid ranges */
-		if (WARN_ON(!valid_section_nr(section_nr)))
-			continue;
+		struct mem_section *ms = __nr_to_section(section_nr);
 
-		ms = __nr_to_section(section_nr);
 		ms->section_mem_map |= SECTION_IS_ONLINE;
 	}
 }
@@ -656,16 +651,8 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
 		unsigned long section_nr = pfn_to_section_nr(pfn);
-		struct mem_section *ms;
+		struct mem_section *ms = __nr_to_section(section_nr);
 
-		/*
-		 * TODO this needs some double checking. Offlining code makes
-		 * sure to check pfn_valid but those checks might be just bogus
-		 */
-		if (WARN_ON(!valid_section_nr(section_nr)))
-			continue;
-
-		ms = __nr_to_section(section_nr);
 		ms->section_mem_map &= ~SECTION_IS_ONLINE;
 	}
 }
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 03/14] mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
  2026-03-17 16:56 ` [PATCH 01/14] mm/memory_hotplug: remove for_each_valid_pfn() usage David Hildenbrand (Arm)
  2026-03-17 16:56 ` [PATCH 02/14] mm/sparse: remove WARN_ONs from (online|offline)_mem_sections() David Hildenbrand (Arm)
@ 2026-03-17 16:56 ` David Hildenbrand (Arm)
  2026-03-17 17:22   ` Lorenzo Stoakes (Oracle)
  2026-03-18  7:55   ` Mike Rapoport
  2026-03-17 16:56 ` [PATCH 04/14] mm/memory_hotplug: simplify check_pfn_span() David Hildenbrand (Arm)
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

Ever since commit f8f03eb5f0f9 ("mm: stop making SPARSEMEM_VMEMMAP
user-selectable"), an architecture that supports CONFIG_SPARSEMEM_VMEMMAP
(by selecting SPARSEMEM_VMEMMAP_ENABLE) can no longer enable
CONFIG_SPARSEMEM without CONFIG_SPARSEMEM_VMEMMAP.

Right now, CONFIG_MEMORY_HOTPLUG is guarded by CONFIG_SPARSEMEM.

However, CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG is only enabled by
* arm64: which selects SPARSEMEM_VMEMMAP_ENABLE
* loongarch: which selects SPARSEMEM_VMEMMAP_ENABLE
* powerpc (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE
* riscv (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE
* s390 with SPARSEMEM: which selects SPARSEMEM_VMEMMAP_ENABLE
* x86 (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE

So, we can make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP
without affecting any setups.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index ebd8ea353687..c012944938a7 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -472,7 +472,7 @@ config ARCH_ENABLE_MEMORY_HOTREMOVE
 menuconfig MEMORY_HOTPLUG
 	bool "Memory hotplug"
 	select MEMORY_ISOLATION
-	depends on SPARSEMEM
+	depends on SPARSEMEM_VMEMMAP
 	depends on ARCH_ENABLE_MEMORY_HOTPLUG
 	depends on 64BIT
 	select NUMA_KEEP_MEMINFO if NUMA
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 04/14] mm/memory_hotplug: simplify check_pfn_span()
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (2 preceding siblings ...)
  2026-03-17 16:56 ` [PATCH 03/14] mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
@ 2026-03-17 16:56 ` David Hildenbrand (Arm)
  2026-03-17 17:24   ` Lorenzo Stoakes (Oracle)
  2026-03-18  7:56   ` Mike Rapoport
  2026-03-17 16:56 ` [PATCH 05/14] mm/sparse: remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for CONFIG_MEMORY_HOTPLUG David Hildenbrand (Arm)
                   ` (10 subsequent siblings)
  14 siblings, 2 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

We now always have CONFIG_SPARSEMEM_VMEMMAP, so remove the dead code.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/memory_hotplug.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3495d94587e7..70e620496cec 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -320,21 +320,13 @@ static void release_memory_resource(struct resource *res)
 static int check_pfn_span(unsigned long pfn, unsigned long nr_pages)
 {
 	/*
-	 * Disallow all operations smaller than a sub-section and only
-	 * allow operations smaller than a section for
-	 * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
-	 * enforces a larger memory_block_size_bytes() granularity for
-	 * memory that will be marked online, so this check should only
-	 * fire for direct arch_{add,remove}_memory() users outside of
-	 * add_memory_resource().
+	 * Disallow all operations smaller than a sub-section.
+	 * Note that check_hotplug_memory_range() enforces a larger
+	 * memory_block_size_bytes() granularity for memory that will be marked
+	 * online, so this check should only fire for direct
+	 * arch_{add,remove}_memory() users outside of add_memory_resource().
 	 */
-	unsigned long min_align;
-
-	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
-		min_align = PAGES_PER_SUBSECTION;
-	else
-		min_align = PAGES_PER_SECTION;
-	if (!IS_ALIGNED(pfn | nr_pages, min_align))
+	if (!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION))
 		return -EINVAL;
 	return 0;
 }
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 05/14] mm/sparse: remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for CONFIG_MEMORY_HOTPLUG
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (3 preceding siblings ...)
  2026-03-17 16:56 ` [PATCH 04/14] mm/memory_hotplug: simplify check_pfn_span() David Hildenbrand (Arm)
@ 2026-03-17 16:56 ` David Hildenbrand (Arm)
  2026-03-17 17:54   ` Lorenzo Stoakes (Oracle)
  2026-03-18  7:58   ` Mike Rapoport
  2026-03-17 16:56 ` [PATCH 06/14] mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
                   ` (9 subsequent siblings)
  14 siblings, 2 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

CONFIG_MEMORY_HOTPLUG now depends on CONFIG_SPARSEMEM_VMEMMAP. So
let's remove the !CONFIG_SPARSEMEM_VMEMMAP leftovers.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/sparse.c | 61 -----------------------------------------------------
 1 file changed, 61 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 93252112860e..636a4a0f1199 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -657,7 +657,6 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 	}
 }
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
 static struct page * __meminit populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap)
@@ -729,66 +728,6 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
 
 	return rc;
 }
-#else
-static struct page * __meminit populate_section_memmap(unsigned long pfn,
-		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
-		struct dev_pagemap *pgmap)
-{
-	return kvmalloc_node(array_size(sizeof(struct page),
-					PAGES_PER_SECTION), GFP_KERNEL, nid);
-}
-
-static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
-{
-	kvfree(pfn_to_page(pfn));
-}
-
-static void free_map_bootmem(struct page *memmap)
-{
-	unsigned long maps_section_nr, removing_section_nr, i;
-	unsigned long type, nr_pages;
-	struct page *page = virt_to_page(memmap);
-
-	nr_pages = PAGE_ALIGN(PAGES_PER_SECTION * sizeof(struct page))
-		>> PAGE_SHIFT;
-
-	for (i = 0; i < nr_pages; i++, page++) {
-		type = bootmem_type(page);
-
-		BUG_ON(type == NODE_INFO);
-
-		maps_section_nr = pfn_to_section_nr(page_to_pfn(page));
-		removing_section_nr = bootmem_info(page);
-
-		/*
-		 * When this function is called, the removing section is
-		 * logical offlined state. This means all pages are isolated
-		 * from page allocator. If removing section's memmap is placed
-		 * on the same section, it must not be freed.
-		 * If it is freed, page allocator may allocate it which will
-		 * be removed physically soon.
-		 */
-		if (maps_section_nr != removing_section_nr)
-			put_page_bootmem(page);
-	}
-}
-
-static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
-	return 0;
-}
-
-static bool is_subsection_map_empty(struct mem_section *ms)
-{
-	return true;
-}
-
-static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
-	return 0;
-}
-#endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
 /*
  * To deactivate a memory region, there are 3 cases to handle across
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 06/14] mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (4 preceding siblings ...)
  2026-03-17 16:56 ` [PATCH 05/14] mm/sparse: remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for CONFIG_MEMORY_HOTPLUG David Hildenbrand (Arm)
@ 2026-03-17 16:56 ` David Hildenbrand (Arm)
  2026-03-17 17:49   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:15   ` Mike Rapoport
  2026-03-17 16:56 ` [PATCH 07/14] mm/bootmem_info: avoid using sparse_decode_mem_map() David Hildenbrand (Arm)
                   ` (8 subsequent siblings)
  14 siblings, 2 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

It is not immediately obvious that CONFIG_HAVE_BOOTMEM_INFO_NODE can
only be selected with CONFIG_MEMORY_HOTREMOVE, which itself depends on
CONFIG_MEMORY_HOTPLUG, which ... depends on CONFIG_SPARSEMEM_VMEMMAP.

Let's remove the !CONFIG_SPARSEMEM_VMEMMAP leftovers.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/bootmem_info.c | 37 -------------------------------------
 1 file changed, 37 deletions(-)

diff --git a/mm/bootmem_info.c b/mm/bootmem_info.c
index b0e2a9fa641f..e61e08e24924 100644
--- a/mm/bootmem_info.c
+++ b/mm/bootmem_info.c
@@ -40,42 +40,6 @@ void put_page_bootmem(struct page *page)
 	}
 }
 
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
-static void __init register_page_bootmem_info_section(unsigned long start_pfn)
-{
-	unsigned long mapsize, section_nr, i;
-	struct mem_section *ms;
-	struct page *page, *memmap;
-	struct mem_section_usage *usage;
-
-	section_nr = pfn_to_section_nr(start_pfn);
-	ms = __nr_to_section(section_nr);
-
-	/* Get section's memmap address */
-	memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
-
-	/*
-	 * Get page for the memmap's phys address
-	 * XXX: need more consideration for sparse_vmemmap...
-	 */
-	page = virt_to_page(memmap);
-	mapsize = sizeof(struct page) * PAGES_PER_SECTION;
-	mapsize = PAGE_ALIGN(mapsize) >> PAGE_SHIFT;
-
-	/* remember memmap's page */
-	for (i = 0; i < mapsize; i++, page++)
-		get_page_bootmem(section_nr, page, SECTION_INFO);
-
-	usage = ms->usage;
-	page = virt_to_page(usage);
-
-	mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT;
-
-	for (i = 0; i < mapsize; i++, page++)
-		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
-
-}
-#else /* CONFIG_SPARSEMEM_VMEMMAP */
 static void __init register_page_bootmem_info_section(unsigned long start_pfn)
 {
 	unsigned long mapsize, section_nr, i;
@@ -100,7 +64,6 @@ static void __init register_page_bootmem_info_section(unsigned long start_pfn)
 	for (i = 0; i < mapsize; i++, page++)
 		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
 }
-#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
 void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 07/14] mm/bootmem_info: avoid using sparse_decode_mem_map()
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (5 preceding siblings ...)
  2026-03-17 16:56 ` [PATCH 06/14] mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
@ 2026-03-17 16:56 ` David Hildenbrand (Arm)
  2026-03-17 18:02   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:20   ` Mike Rapoport
  2026-03-17 16:56 ` [PATCH 08/14] mm/sparse: remove sparse_decode_mem_map() David Hildenbrand (Arm)
                   ` (7 subsequent siblings)
  14 siblings, 2 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

With SPARSEMEM_VMEMMAP, we can just do a pfn_to_page(). It is not super
clear whether the start_pfn is properly aligned ... so let's just make
sure it is.

We might soon try to remove the bootmem info completely; for now,
just keep it working as is.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/bootmem_info.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/bootmem_info.c b/mm/bootmem_info.c
index e61e08e24924..3d7675a3ae04 100644
--- a/mm/bootmem_info.c
+++ b/mm/bootmem_info.c
@@ -44,17 +44,16 @@ static void __init register_page_bootmem_info_section(unsigned long start_pfn)
 {
 	unsigned long mapsize, section_nr, i;
 	struct mem_section *ms;
-	struct page *page, *memmap;
 	struct mem_section_usage *usage;
+	struct page *page;
 
+	start_pfn = SECTION_ALIGN_DOWN(start_pfn);
 	section_nr = pfn_to_section_nr(start_pfn);
 	ms = __nr_to_section(section_nr);
 
-	memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
-
 	if (!preinited_vmemmap_section(ms))
-		register_page_bootmem_memmap(section_nr, memmap,
-				PAGES_PER_SECTION);
+		register_page_bootmem_memmap(section_nr, pfn_to_page(start_pfn),
+					     PAGES_PER_SECTION);
 
 	usage = ms->usage;
 	page = virt_to_page(usage);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 08/14] mm/sparse: remove sparse_decode_mem_map()
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (6 preceding siblings ...)
  2026-03-17 16:56 ` [PATCH 07/14] mm/bootmem_info: avoid using sparse_decode_mem_map() David Hildenbrand (Arm)
@ 2026-03-17 16:56 ` David Hildenbrand (Arm)
  2026-03-17 19:25   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:20   ` Mike Rapoport
  2026-03-17 16:56 ` [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling David Hildenbrand (Arm)
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

section_deactivate() applies to CONFIG_SPARSEMEM_VMEMMAP only. So we can
just use pfn_to_page() (after making sure we have the start PFN of the
section), and remove sparse_decode_mem_map().

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 include/linux/memory_hotplug.h |  2 --
 mm/sparse.c                    | 16 +---------------
 2 files changed, 1 insertion(+), 17 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index e77ef3d7ff73..815e908c4135 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -308,8 +308,6 @@ extern int sparse_add_section(int nid, unsigned long pfn,
 		struct dev_pagemap *pgmap);
 extern void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
 				  struct vmem_altmap *altmap);
-extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
-					  unsigned long pnum);
 extern struct zone *zone_for_pfn_range(enum mmop online_type,
 		int nid, struct memory_group *group, unsigned long start_pfn,
 		unsigned long nr_pages);
diff --git a/mm/sparse.c b/mm/sparse.c
index 636a4a0f1199..2a1f662245bc 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -274,18 +274,6 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
 	return coded_mem_map;
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG
-/*
- * Decode mem_map from the coded memmap
- */
-struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum)
-{
-	/* mask off the extra low bits of information */
-	coded_mem_map &= SECTION_MAP_MASK;
-	return ((struct page *)coded_mem_map) + section_nr_to_pfn(pnum);
-}
-#endif /* CONFIG_MEMORY_HOTPLUG */
-
 static void __meminit sparse_init_one_section(struct mem_section *ms,
 		unsigned long pnum, struct page *mem_map,
 		struct mem_section_usage *usage, unsigned long flags)
@@ -758,8 +746,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 
 	empty = is_subsection_map_empty(ms);
 	if (empty) {
-		unsigned long section_nr = pfn_to_section_nr(pfn);
-
 		/*
 		 * Mark the section invalid so that valid_section()
 		 * return false. This prevents code from dereferencing
@@ -778,7 +764,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 			kfree_rcu(ms->usage, rcu);
 			WRITE_ONCE(ms->usage, NULL);
 		}
-		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+		memmap = pfn_to_page(SECTION_ALIGN_DOWN(pfn));
 	}
 
 	/*
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (7 preceding siblings ...)
  2026-03-17 16:56 ` [PATCH 08/14] mm/sparse: remove sparse_decode_mem_map() David Hildenbrand (Arm)
@ 2026-03-17 16:56 ` David Hildenbrand (Arm)
  2026-03-17 19:48   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:34   ` Mike Rapoport
  2026-03-17 16:56 ` [PATCH 10/14] mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c David Hildenbrand (Arm)
                   ` (5 subsequent siblings)
  14 siblings, 2 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

In 2008, commit 48c906823f39 ("memory hotplug: allocate usemap on the
section with pgdat") added quite some complexity to try allocating the
memory for the "usemap" (storing pageblock information per memory
section) of a memory section close to the memory of the node's
"pgdat".

The goal was to make memory hotunplug of boot memory more likely to
succeed. That commit also added some checks for circular dependencies
between two memory sections, whereby two memory sections would contain
each other's usemaps, turning both memory sections unremovable.

However, in 2010, commit a4322e1bad91 ("sparsemem: Put usemap for one node
together") started allocating the usemap for multiple memory
sections on the same node in one chunk, effectively grouping all usemap
allocations of the same node in a single memblock allocation.

We don't really give guarantees about memory hotunplug of boot memory, and
with the change in 2010, it is pretty much impossible in practice to get
any circular dependencies.

commit 48c906823f39 ("memory hotplug: allocate usemap on the section with
pgdat") also added the comment:

	"Similarly, a pgdat can prevent a section being removed. If
	 section A contains a pgdat and section B
	 contains the usemap, both sections become inter-dependent."

Given that we don't free the pgdat anymore, that comment (and handling)
does not apply.

So let's simply remove this complexity.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/sparse.c | 100 +---------------------------------------------------
 1 file changed, 1 insertion(+), 99 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 2a1f662245bc..b57c81e99340 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -294,102 +294,6 @@ size_t mem_section_usage_size(void)
 	return sizeof(struct mem_section_usage) + usemap_size();
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
-static inline phys_addr_t pgdat_to_phys(struct pglist_data *pgdat)
-{
-#ifndef CONFIG_NUMA
-	VM_BUG_ON(pgdat != &contig_page_data);
-	return __pa_symbol(&contig_page_data);
-#else
-	return __pa(pgdat);
-#endif
-}
-
-static struct mem_section_usage * __init
-sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
-					 unsigned long size)
-{
-	struct mem_section_usage *usage;
-	unsigned long goal, limit;
-	int nid;
-	/*
-	 * A page may contain usemaps for other sections preventing the
-	 * page being freed and making a section unremovable while
-	 * other sections referencing the usemap remain active. Similarly,
-	 * a pgdat can prevent a section being removed. If section A
-	 * contains a pgdat and section B contains the usemap, both
-	 * sections become inter-dependent. This allocates usemaps
-	 * from the same section as the pgdat where possible to avoid
-	 * this problem.
-	 */
-	goal = pgdat_to_phys(pgdat) & (PAGE_SECTION_MASK << PAGE_SHIFT);
-	limit = goal + (1UL << PA_SECTION_SHIFT);
-	nid = early_pfn_to_nid(goal >> PAGE_SHIFT);
-again:
-	usage = memblock_alloc_try_nid(size, SMP_CACHE_BYTES, goal, limit, nid);
-	if (!usage && limit) {
-		limit = MEMBLOCK_ALLOC_ACCESSIBLE;
-		goto again;
-	}
-	return usage;
-}
-
-static void __init check_usemap_section_nr(int nid,
-		struct mem_section_usage *usage)
-{
-	unsigned long usemap_snr, pgdat_snr;
-	static unsigned long old_usemap_snr;
-	static unsigned long old_pgdat_snr;
-	struct pglist_data *pgdat = NODE_DATA(nid);
-	int usemap_nid;
-
-	/* First call */
-	if (!old_usemap_snr) {
-		old_usemap_snr = NR_MEM_SECTIONS;
-		old_pgdat_snr = NR_MEM_SECTIONS;
-	}
-
-	usemap_snr = pfn_to_section_nr(__pa(usage) >> PAGE_SHIFT);
-	pgdat_snr = pfn_to_section_nr(pgdat_to_phys(pgdat) >> PAGE_SHIFT);
-	if (usemap_snr == pgdat_snr)
-		return;
-
-	if (old_usemap_snr == usemap_snr && old_pgdat_snr == pgdat_snr)
-		/* skip redundant message */
-		return;
-
-	old_usemap_snr = usemap_snr;
-	old_pgdat_snr = pgdat_snr;
-
-	usemap_nid = sparse_early_nid(__nr_to_section(usemap_snr));
-	if (usemap_nid != nid) {
-		pr_info("node %d must be removed before remove section %ld\n",
-			nid, usemap_snr);
-		return;
-	}
-	/*
-	 * There is a circular dependency.
-	 * Some platforms allow un-removable section because they will just
-	 * gather other removable sections for dynamic partitioning.
-	 * Just notify un-removable section's number here.
-	 */
-	pr_info("Section %ld and %ld (node %d) have a circular dependency on usemap and pgdat allocations\n",
-		usemap_snr, pgdat_snr, nid);
-}
-#else
-static struct mem_section_usage * __init
-sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
-					 unsigned long size)
-{
-	return memblock_alloc_node(size, SMP_CACHE_BYTES, pgdat->node_id);
-}
-
-static void __init check_usemap_section_nr(int nid,
-		struct mem_section_usage *usage)
-{
-}
-#endif /* CONFIG_MEMORY_HOTREMOVE */
-
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 unsigned long __init section_map_size(void)
 {
@@ -486,7 +390,6 @@ void __init sparse_init_early_section(int nid, struct page *map,
 				      unsigned long pnum, unsigned long flags)
 {
 	BUG_ON(!sparse_usagebuf || sparse_usagebuf >= sparse_usagebuf_end);
-	check_usemap_section_nr(nid, sparse_usagebuf);
 	sparse_init_one_section(__nr_to_section(pnum), pnum, map,
 			sparse_usagebuf, SECTION_IS_EARLY | flags);
 	sparse_usagebuf = (void *)sparse_usagebuf + mem_section_usage_size();
@@ -497,8 +400,7 @@ static int __init sparse_usage_init(int nid, unsigned long map_count)
 	unsigned long size;
 
 	size = mem_section_usage_size() * map_count;
-	sparse_usagebuf = sparse_early_usemaps_alloc_pgdat_section(
-				NODE_DATA(nid), size);
+	sparse_usagebuf = memblock_alloc_node(size, SMP_CACHE_BYTES, nid);
 	if (!sparse_usagebuf) {
 		sparse_usagebuf_end = NULL;
 		return -ENOMEM;
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 10/14] mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (8 preceding siblings ...)
  2026-03-17 16:56 ` [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling David Hildenbrand (Arm)
@ 2026-03-17 16:56 ` David Hildenbrand (Arm)
  2026-03-17 19:51   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:46   ` Mike Rapoport
  2026-03-17 16:56 ` [PATCH 11/14] mm/sparse: drop set_section_nid() from sparse_add_section() David Hildenbrand (Arm)
                   ` (4 subsequent siblings)
  14 siblings, 2 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

We want to move subsection_map_init() to mm/sparse-vmemmap.c.

To prepare for getting rid of subsection_map_init() in mm/sparse.c
completely, use a static inline function for !CONFIG_SPARSEMEM_VMEMMAP.

While at it, move the declaration to internal.h and rename it to
"sparse_init_subsection_map()".

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 include/linux/mmzone.h |  3 ---
 mm/internal.h          | 12 ++++++++++++
 mm/mm_init.c           |  2 +-
 mm/sparse.c            |  6 +-----
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7bd0134c241c..b694c69dee04 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2002,8 +2002,6 @@ struct mem_section_usage {
 	unsigned long pageblock_flags[0];
 };
 
-void subsection_map_init(unsigned long pfn, unsigned long nr_pages);
-
 struct page;
 struct page_ext;
 struct mem_section {
@@ -2396,7 +2394,6 @@ static inline unsigned long next_present_section_nr(unsigned long section_nr)
 #define sparse_vmemmap_init_nid_early(_nid) do {} while (0)
 #define sparse_vmemmap_init_nid_late(_nid) do {} while (0)
 #define pfn_in_present_section pfn_valid
-#define subsection_map_init(_pfn, _nr_pages) do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
 /*
diff --git a/mm/internal.h b/mm/internal.h
index f98f4746ac41..5f5c45d80aca 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -960,12 +960,24 @@ void memmap_init_range(unsigned long, int, unsigned long, unsigned long,
 		unsigned long, enum meminit_context, struct vmem_altmap *, int,
 		bool);
 
+/*
+ * mm/sparse.c
+ */
 #ifdef CONFIG_SPARSEMEM
 void sparse_init(void);
 #else
 static inline void sparse_init(void) {}
 #endif /* CONFIG_SPARSEMEM */
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+void sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages);
+#else
+static inline void sparse_init_subsection_map(unsigned long pfn,
+		unsigned long nr_pages)
+{
+}
+#endif /* CONFIG_SPARSEMEM_VMEMMAP */
+
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 
 /*
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 969048f9b320..3c5f18537cd1 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1898,7 +1898,7 @@ static void __init free_area_init(void)
 		pr_info("  node %3d: [mem %#018Lx-%#018Lx]\n", nid,
 			(u64)start_pfn << PAGE_SHIFT,
 			((u64)end_pfn << PAGE_SHIFT) - 1);
-		subsection_map_init(start_pfn, end_pfn - start_pfn);
+		sparse_init_subsection_map(start_pfn, end_pfn - start_pfn);
 	}
 
 	/* Initialise every node */
diff --git a/mm/sparse.c b/mm/sparse.c
index b57c81e99340..7b0bfea73a9b 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -185,7 +185,7 @@ static void subsection_mask_set(unsigned long *map, unsigned long pfn,
 	bitmap_set(map, idx, end - idx + 1);
 }
 
-void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
+void __init sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages)
 {
 	int end_sec_nr = pfn_to_section_nr(pfn + nr_pages - 1);
 	unsigned long nr, start_sec_nr = pfn_to_section_nr(pfn);
@@ -207,10 +207,6 @@ void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
 		nr_pages -= pfns;
 	}
 }
-#else
-void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
-{
-}
 #endif
 
 /* Record a memory area against a node. */
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 11/14] mm/sparse: drop set_section_nid() from sparse_add_section()
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (9 preceding siblings ...)
  2026-03-17 16:56 ` [PATCH 10/14] mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c David Hildenbrand (Arm)
@ 2026-03-17 16:56 ` David Hildenbrand (Arm)
  2026-03-17 19:55   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:50   ` Mike Rapoport
  2026-03-17 16:56 ` [PATCH 12/14] mm/sparse: move sparse_init_one_section() to internal.h David Hildenbrand (Arm)
                   ` (3 subsequent siblings)
  14 siblings, 2 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

CONFIG_MEMORY_HOTPLUG is CONFIG_SPARSEMEM_VMEMMAP-only, and
CONFIG_SPARSEMEM_VMEMMAP implies that NODE_NOT_IN_PAGE_FLAGS cannot be set;
see include/linux/page-flags-layout.h:

	...
	#elif defined(CONFIG_SPARSEMEM_VMEMMAP)
	#error "Vmemmap: No space for nodes field in page flags"
	...

So let's remove the set_section_nid() call to prepare for moving the
CONFIG_MEMORY_HOTPLUG code to mm/sparse-vmemmap.c.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/sparse.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 7b0bfea73a9b..b5a2de43ac40 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -769,7 +769,6 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	page_init_poison(memmap, sizeof(struct page) * nr_pages);
 
 	ms = __nr_to_section(section_nr);
-	set_section_nid(section_nr, nid);
 	__section_mark_present(ms, section_nr);
 
 	/* Align memmap to section boundary in the subsection case */
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 12/14] mm/sparse: move sparse_init_one_section() to internal.h
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (10 preceding siblings ...)
  2026-03-17 16:56 ` [PATCH 11/14] mm/sparse: drop set_section_nid() from sparse_add_section() David Hildenbrand (Arm)
@ 2026-03-17 16:56 ` David Hildenbrand (Arm)
  2026-03-17 20:00   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:54   ` Mike Rapoport
  2026-03-17 16:56 ` [PATCH 13/14] mm/sparse: move __section_mark_present() " David Hildenbrand (Arm)
                   ` (2 subsequent siblings)
  14 siblings, 2 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

Move sparse_init_one_section() to internal.h. While at it, convert the
BUG_ON to a VM_WARN_ON, avoid long lines, and merge
sparse_encode_mem_map() into sparse_init_one_section().

Clarify the comment a bit, pointing at page_to_pfn().

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 include/linux/mmzone.h |  2 +-
 mm/internal.h          | 22 ++++++++++++++++++++++
 mm/sparse.c            | 24 ------------------------
 3 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b694c69dee04..dcbbf36ed88c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2008,7 +2008,7 @@ struct mem_section {
 	/*
 	 * This is, logically, a pointer to an array of struct
 	 * pages.  However, it is stored with some other magic.
-	 * (see sparse.c::sparse_init_one_section())
+	 * (see sparse_init_one_section())
 	 *
 	 * Additionally during early boot we encode node id of
 	 * the location of the section here to guide allocation.
diff --git a/mm/internal.h b/mm/internal.h
index 5f5c45d80aca..bcf4df97b185 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -965,6 +965,28 @@ void memmap_init_range(unsigned long, int, unsigned long, unsigned long,
  */
 #ifdef CONFIG_SPARSEMEM
 void sparse_init(void);
+
+static inline void sparse_init_one_section(struct mem_section *ms,
+		unsigned long pnum, struct page *mem_map,
+		struct mem_section_usage *usage, unsigned long flags)
+{
+	unsigned long coded_mem_map;
+
+	BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
+
+	/*
+	 * We encode the start PFN of the section into the mem_map such that
+	 * page_to_pfn() on !CONFIG_SPARSEMEM_VMEMMAP can simply subtract it
+	 * from the page pointer to obtain the PFN.
+	 */
+	coded_mem_map = (unsigned long)(mem_map - section_nr_to_pfn(pnum));
+	VM_WARN_ON(coded_mem_map & ~SECTION_MAP_MASK);
+
+	ms->section_mem_map &= ~SECTION_MAP_MASK;
+	ms->section_mem_map |= coded_mem_map;
+	ms->section_mem_map |= SECTION_HAS_MEM_MAP | flags;
+	ms->usage = usage;
+}
 #else
 static inline void sparse_init(void) {}
 #endif /* CONFIG_SPARSEMEM */
diff --git a/mm/sparse.c b/mm/sparse.c
index b5a2de43ac40..6f5f340301a3 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -256,30 +256,6 @@ static void __init memblocks_present(void)
 		memory_present(nid, start, end);
 }
 
-/*
- * Subtle, we encode the real pfn into the mem_map such that
- * the identity pfn - section_mem_map will return the actual
- * physical page frame number.
- */
-static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long pnum)
-{
-	unsigned long coded_mem_map =
-		(unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
-	BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
-	BUG_ON(coded_mem_map & ~SECTION_MAP_MASK);
-	return coded_mem_map;
-}
-
-static void __meminit sparse_init_one_section(struct mem_section *ms,
-		unsigned long pnum, struct page *mem_map,
-		struct mem_section_usage *usage, unsigned long flags)
-{
-	ms->section_mem_map &= ~SECTION_MAP_MASK;
-	ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum)
-		| SECTION_HAS_MEM_MAP | flags;
-	ms->usage = usage;
-}
-
 static unsigned long usemap_size(void)
 {
 	return BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS) * sizeof(unsigned long);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 13/14] mm/sparse: move __section_mark_present() to internal.h
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (11 preceding siblings ...)
  2026-03-17 16:56 ` [PATCH 12/14] mm/sparse: move sparse_init_one_section() to internal.h David Hildenbrand (Arm)
@ 2026-03-17 16:56 ` David Hildenbrand (Arm)
  2026-03-17 20:01   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:56   ` Mike Rapoport
  2026-03-17 16:56 ` [PATCH 14/14] mm/sparse: move memory hotplug bits to sparse-vmemmap.c David Hildenbrand (Arm)
  2026-03-18 19:51 ` [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups Andrew Morton
  14 siblings, 2 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

Let's prepare for moving memory hotplug handling from sparse.c to
sparse-vmemmap.c by moving __section_mark_present() to internal.h.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/internal.h | 9 +++++++++
 mm/sparse.c   | 8 --------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index bcf4df97b185..835a6f00134e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -987,6 +987,15 @@ static inline void sparse_init_one_section(struct mem_section *ms,
 	ms->section_mem_map |= SECTION_HAS_MEM_MAP | flags;
 	ms->usage = usage;
 }
+
+static inline void __section_mark_present(struct mem_section *ms,
+		unsigned long section_nr)
+{
+	if (section_nr > __highest_present_section_nr)
+		__highest_present_section_nr = section_nr;
+
+	ms->section_mem_map |= SECTION_MARKED_PRESENT;
+}
 #else
 static inline void sparse_init(void) {}
 #endif /* CONFIG_SPARSEMEM */
diff --git a/mm/sparse.c b/mm/sparse.c
index 6f5f340301a3..bf620f3fe05d 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -161,14 +161,6 @@ static void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
  * those loops early.
  */
 unsigned long __highest_present_section_nr;
-static void __section_mark_present(struct mem_section *ms,
-		unsigned long section_nr)
-{
-	if (section_nr > __highest_present_section_nr)
-		__highest_present_section_nr = section_nr;
-
-	ms->section_mem_map |= SECTION_MARKED_PRESENT;
-}
 
 static inline unsigned long first_present_section_nr(void)
 {
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 14/14] mm/sparse: move memory hotplug bits to sparse-vmemmap.c
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (12 preceding siblings ...)
  2026-03-17 16:56 ` [PATCH 13/14] mm/sparse: move __section_mark_present() " David Hildenbrand (Arm)
@ 2026-03-17 16:56 ` David Hildenbrand (Arm)
  2026-03-17 20:09   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:57   ` Mike Rapoport
  2026-03-18 19:51 ` [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups Andrew Morton
  14 siblings, 2 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, David Hildenbrand (Arm), Andrew Morton,
	Oscar Salvador, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

Let's move all memory hotplug related code to mm/sparse-vmemmap.c.

We only have to expose sparse_index_init(). While at it, drop the
definition of sparse_index_init() for !CONFIG_SPARSEMEM, which is unused,
and place the declaration in internal.h.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 include/linux/mmzone.h |   1 -
 mm/internal.h          |   4 +
 mm/sparse-vmemmap.c    | 308 ++++++++++++++++++++++++++++++++++++++++
 mm/sparse.c            | 314 +----------------------------------------
 4 files changed, 314 insertions(+), 313 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index dcbbf36ed88c..e11513f581eb 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2390,7 +2390,6 @@ static inline unsigned long next_present_section_nr(unsigned long section_nr)
 #endif
 
 #else
-#define sparse_index_init(_sec, _nid)  do {} while (0)
 #define sparse_vmemmap_init_nid_early(_nid) do {} while (0)
 #define sparse_vmemmap_init_nid_late(_nid) do {} while (0)
 #define pfn_in_present_section pfn_valid
diff --git a/mm/internal.h b/mm/internal.h
index 835a6f00134e..b1a9e9312ffe 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -965,6 +965,7 @@ void memmap_init_range(unsigned long, int, unsigned long, unsigned long,
  */
 #ifdef CONFIG_SPARSEMEM
 void sparse_init(void);
+int sparse_index_init(unsigned long section_nr, int nid);
 
 static inline void sparse_init_one_section(struct mem_section *ms,
 		unsigned long pnum, struct page *mem_map,
@@ -1000,6 +1001,9 @@ static inline void __section_mark_present(struct mem_section *ms,
 static inline void sparse_init(void) {}
 #endif /* CONFIG_SPARSEMEM */
 
+/*
+ * mm/sparse-vmemmap.c
+ */
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 void sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages);
 #else
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index f0690797667f..330579365a0f 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -591,3 +591,311 @@ void __init sparse_vmemmap_init_nid_late(int nid)
 	hugetlb_vmemmap_init_late(nid);
 }
 #endif
+
+static void subsection_mask_set(unsigned long *map, unsigned long pfn,
+		unsigned long nr_pages)
+{
+	int idx = subsection_map_index(pfn);
+	int end = subsection_map_index(pfn + nr_pages - 1);
+
+	bitmap_set(map, idx, end - idx + 1);
+}
+
+void __init sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+	int end_sec_nr = pfn_to_section_nr(pfn + nr_pages - 1);
+	unsigned long nr, start_sec_nr = pfn_to_section_nr(pfn);
+
+	for (nr = start_sec_nr; nr <= end_sec_nr; nr++) {
+		struct mem_section *ms;
+		unsigned long pfns;
+
+		pfns = min(nr_pages, PAGES_PER_SECTION
+				- (pfn & ~PAGE_SECTION_MASK));
+		ms = __nr_to_section(nr);
+		subsection_mask_set(ms->usage->subsection_map, pfn, pfns);
+
+		pr_debug("%s: sec: %lu pfns: %lu set(%d, %d)\n", __func__, nr,
+				pfns, subsection_map_index(pfn),
+				subsection_map_index(pfn + pfns - 1));
+
+		pfn += pfns;
+		nr_pages -= pfns;
+	}
+}
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+
+/* Mark all memory sections within the pfn range as online */
+void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		unsigned long section_nr = pfn_to_section_nr(pfn);
+		struct mem_section *ms = __nr_to_section(section_nr);
+
+		ms->section_mem_map |= SECTION_IS_ONLINE;
+	}
+}
+
+/* Mark all memory sections within the pfn range as offline */
+void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		unsigned long section_nr = pfn_to_section_nr(pfn);
+		struct mem_section *ms = __nr_to_section(section_nr);
+
+		ms->section_mem_map &= ~SECTION_IS_ONLINE;
+	}
+}
+
+static struct page * __meminit populate_section_memmap(unsigned long pfn,
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap)
+{
+	return __populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
+}
+
+static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
+		struct vmem_altmap *altmap)
+{
+	unsigned long start = (unsigned long) pfn_to_page(pfn);
+	unsigned long end = start + nr_pages * sizeof(struct page);
+
+	vmemmap_free(start, end, altmap);
+}
+static void free_map_bootmem(struct page *memmap)
+{
+	unsigned long start = (unsigned long)memmap;
+	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
+
+	vmemmap_free(start, end, NULL);
+}
+
+static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
+	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
+	struct mem_section *ms = __pfn_to_section(pfn);
+	unsigned long *subsection_map = ms->usage
+		? &ms->usage->subsection_map[0] : NULL;
+
+	subsection_mask_set(map, pfn, nr_pages);
+	if (subsection_map)
+		bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
+
+	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
+				"section already deactivated (%#lx + %ld)\n",
+				pfn, nr_pages))
+		return -EINVAL;
+
+	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
+	return 0;
+}
+
+static bool is_subsection_map_empty(struct mem_section *ms)
+{
+	return bitmap_empty(&ms->usage->subsection_map[0],
+			    SUBSECTIONS_PER_SECTION);
+}
+
+static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+	struct mem_section *ms = __pfn_to_section(pfn);
+	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
+	unsigned long *subsection_map;
+	int rc = 0;
+
+	subsection_mask_set(map, pfn, nr_pages);
+
+	subsection_map = &ms->usage->subsection_map[0];
+
+	if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
+		rc = -EINVAL;
+	else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
+		rc = -EEXIST;
+	else
+		bitmap_or(subsection_map, map, subsection_map,
+				SUBSECTIONS_PER_SECTION);
+
+	return rc;
+}
+
+/*
+ * To deactivate a memory region, there are 3 cases to handle across
+ * two configurations (SPARSEMEM_VMEMMAP={y,n}):
+ *
+ * 1. deactivation of a partial hot-added section (only possible in
+ *    the SPARSEMEM_VMEMMAP=y case).
+ *      a) section was present at memory init.
+ *      b) section was hot-added post memory init.
+ * 2. deactivation of a complete hot-added section.
+ * 3. deactivation of a complete section from memory init.
+ *
+ * For 1, when the subsection_map is not empty we will not be freeing the
+ * usage map, but still need to free the vmemmap range.
+ *
+ * For 2 and 3, the SPARSEMEM_VMEMMAP={y,n} cases are unified.
+ */
+static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
+		struct vmem_altmap *altmap)
+{
+	struct mem_section *ms = __pfn_to_section(pfn);
+	bool section_is_early = early_section(ms);
+	struct page *memmap = NULL;
+	bool empty;
+
+	if (clear_subsection_map(pfn, nr_pages))
+		return;
+
+	empty = is_subsection_map_empty(ms);
+	if (empty) {
+		/*
+		 * Mark the section invalid so that valid_section()
+		 * returns false. This prevents code from dereferencing
+		 * the ms->usage array.
+		 */
+		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
+
+		/*
+		 * When removing an early section, the usage map is kept (as the
+		 * usage maps of other sections fall into the same page). It
+		 * will be re-used when re-adding the section - which is then no
+		 * longer an early section. If the usage map is PageReserved, it
+		 * was allocated during boot.
+		 */
+		if (!PageReserved(virt_to_page(ms->usage))) {
+			kfree_rcu(ms->usage, rcu);
+			WRITE_ONCE(ms->usage, NULL);
+		}
+		memmap = pfn_to_page(SECTION_ALIGN_DOWN(pfn));
+	}
+
+	/*
+	 * The memmap of early sections is always fully populated. See
+	 * section_activate() and pfn_valid().
+	 */
+	if (!section_is_early) {
+		memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
+		depopulate_section_memmap(pfn, nr_pages, altmap);
+	} else if (memmap) {
+		memmap_boot_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page),
+							  PAGE_SIZE)));
+		free_map_bootmem(memmap);
+	}
+
+	if (empty)
+		ms->section_mem_map = (unsigned long)NULL;
+}
+
+static struct page * __meminit section_activate(int nid, unsigned long pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap)
+{
+	struct mem_section *ms = __pfn_to_section(pfn);
+	struct mem_section_usage *usage = NULL;
+	struct page *memmap;
+	int rc;
+
+	if (!ms->usage) {
+		usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
+		if (!usage)
+			return ERR_PTR(-ENOMEM);
+		ms->usage = usage;
+	}
+
+	rc = fill_subsection_map(pfn, nr_pages);
+	if (rc) {
+		if (usage)
+			ms->usage = NULL;
+		kfree(usage);
+		return ERR_PTR(rc);
+	}
+
+	/*
+	 * The early init code does not consider partially populated
+	 * initial sections, it simply assumes that memory will never be
+	 * referenced.  If we hot-add memory into such a section then we
+	 * do not need to populate the memmap and can simply reuse what
+	 * is already there.
+	 */
+	if (nr_pages < PAGES_PER_SECTION && early_section(ms))
+		return pfn_to_page(pfn);
+
+	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
+	if (!memmap) {
+		section_deactivate(pfn, nr_pages, altmap);
+		return ERR_PTR(-ENOMEM);
+	}
+	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
+
+	return memmap;
+}
+
+/**
+ * sparse_add_section - add a memory section, or populate an existing one
+ * @nid: The node to add section on
+ * @start_pfn: start pfn of the memory range
+ * @nr_pages: number of pfns to add in the section
+ * @altmap: alternate pfns to allocate the memmap backing store
+ * @pgmap: alternate compound page geometry for devmap mappings
+ *
+ * This is only intended for hotplug.
+ *
+ * Note that only VMEMMAP supports sub-section aligned hotplug,
+ * the proper alignment and size are gated by check_pfn_span().
+ *
+ *
+ * Return:
+ * * 0		- On success.
+ * * -EEXIST	- Section is already present.
+ * * -ENOMEM	- Out of memory.
+ */
+int __meminit sparse_add_section(int nid, unsigned long start_pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap)
+{
+	unsigned long section_nr = pfn_to_section_nr(start_pfn);
+	struct mem_section *ms;
+	struct page *memmap;
+	int ret;
+
+	ret = sparse_index_init(section_nr, nid);
+	if (ret < 0)
+		return ret;
+
+	memmap = section_activate(nid, start_pfn, nr_pages, altmap, pgmap);
+	if (IS_ERR(memmap))
+		return PTR_ERR(memmap);
+
+	/*
+	 * Poison uninitialized struct pages in order to catch invalid flags
+	 * combinations.
+	 */
+	page_init_poison(memmap, sizeof(struct page) * nr_pages);
+
+	ms = __nr_to_section(section_nr);
+	__section_mark_present(ms, section_nr);
+
+	/* Align memmap to section boundary in the subsection case */
+	if (section_nr_to_pfn(section_nr) != start_pfn)
+		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
+	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
+
+	return 0;
+}
+
+void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
+			   struct vmem_altmap *altmap)
+{
+	struct mem_section *ms = __pfn_to_section(pfn);
+
+	if (WARN_ON_ONCE(!valid_section(ms)))
+		return;
+
+	section_deactivate(pfn, nr_pages, altmap);
+}
+#endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/mm/sparse.c b/mm/sparse.c
index bf620f3fe05d..007fd52c621e 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -79,7 +79,7 @@ static noinline struct mem_section __ref *sparse_index_alloc(int nid)
 	return section;
 }
 
-static int __meminit sparse_index_init(unsigned long section_nr, int nid)
+int __meminit sparse_index_init(unsigned long section_nr, int nid)
 {
 	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
 	struct mem_section *section;
@@ -103,7 +103,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
 	return 0;
 }
 #else /* !SPARSEMEM_EXTREME */
-static inline int sparse_index_init(unsigned long section_nr, int nid)
+int sparse_index_init(unsigned long section_nr, int nid)
 {
 	return 0;
 }
@@ -167,40 +167,6 @@ static inline unsigned long first_present_section_nr(void)
 	return next_present_section_nr(-1);
 }
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static void subsection_mask_set(unsigned long *map, unsigned long pfn,
-		unsigned long nr_pages)
-{
-	int idx = subsection_map_index(pfn);
-	int end = subsection_map_index(pfn + nr_pages - 1);
-
-	bitmap_set(map, idx, end - idx + 1);
-}
-
-void __init sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
-	int end_sec_nr = pfn_to_section_nr(pfn + nr_pages - 1);
-	unsigned long nr, start_sec_nr = pfn_to_section_nr(pfn);
-
-	for (nr = start_sec_nr; nr <= end_sec_nr; nr++) {
-		struct mem_section *ms;
-		unsigned long pfns;
-
-		pfns = min(nr_pages, PAGES_PER_SECTION
-				- (pfn & ~PAGE_SECTION_MASK));
-		ms = __nr_to_section(nr);
-		subsection_mask_set(ms->usage->subsection_map, pfn, pfns);
-
-		pr_debug("%s: sec: %lu pfns: %lu set(%d, %d)\n", __func__, nr,
-				pfns, subsection_map_index(pfn),
-				subsection_map_index(pfn + pfns - 1));
-
-		pfn += pfns;
-		nr_pages -= pfns;
-	}
-}
-#endif
-
 /* Record a memory area against a node. */
 static void __init memory_present(int nid, unsigned long start, unsigned long end)
 {
@@ -482,279 +448,3 @@ void __init sparse_init(void)
 	sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
 	vmemmap_populate_print_last();
 }
-
-#ifdef CONFIG_MEMORY_HOTPLUG
-
-/* Mark all memory sections within the pfn range as online */
-void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-
-	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
-		unsigned long section_nr = pfn_to_section_nr(pfn);
-		struct mem_section *ms = __nr_to_section(section_nr);
-
-		ms->section_mem_map |= SECTION_IS_ONLINE;
-	}
-}
-
-/* Mark all memory sections within the pfn range as offline */
-void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-
-	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
-		unsigned long section_nr = pfn_to_section_nr(pfn);
-		struct mem_section *ms = __nr_to_section(section_nr);
-
-		ms->section_mem_map &= ~SECTION_IS_ONLINE;
-	}
-}
-
-static struct page * __meminit populate_section_memmap(unsigned long pfn,
-		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
-		struct dev_pagemap *pgmap)
-{
-	return __populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
-}
-
-static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
-{
-	unsigned long start = (unsigned long) pfn_to_page(pfn);
-	unsigned long end = start + nr_pages * sizeof(struct page);
-
-	vmemmap_free(start, end, altmap);
-}
-static void free_map_bootmem(struct page *memmap)
-{
-	unsigned long start = (unsigned long)memmap;
-	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
-
-	vmemmap_free(start, end, NULL);
-}
-
-static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
-	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
-	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
-	struct mem_section *ms = __pfn_to_section(pfn);
-	unsigned long *subsection_map = ms->usage
-		? &ms->usage->subsection_map[0] : NULL;
-
-	subsection_mask_set(map, pfn, nr_pages);
-	if (subsection_map)
-		bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
-
-	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
-				"section already deactivated (%#lx + %ld)\n",
-				pfn, nr_pages))
-		return -EINVAL;
-
-	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
-	return 0;
-}
-
-static bool is_subsection_map_empty(struct mem_section *ms)
-{
-	return bitmap_empty(&ms->usage->subsection_map[0],
-			    SUBSECTIONS_PER_SECTION);
-}
-
-static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
-	struct mem_section *ms = __pfn_to_section(pfn);
-	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
-	unsigned long *subsection_map;
-	int rc = 0;
-
-	subsection_mask_set(map, pfn, nr_pages);
-
-	subsection_map = &ms->usage->subsection_map[0];
-
-	if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
-		rc = -EINVAL;
-	else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
-		rc = -EEXIST;
-	else
-		bitmap_or(subsection_map, map, subsection_map,
-				SUBSECTIONS_PER_SECTION);
-
-	return rc;
-}
-
-/*
- * To deactivate a memory region, there are 3 cases to handle across
- * two configurations (SPARSEMEM_VMEMMAP={y,n}):
- *
- * 1. deactivation of a partial hot-added section (only possible in
- *    the SPARSEMEM_VMEMMAP=y case).
- *      a) section was present at memory init.
- *      b) section was hot-added post memory init.
- * 2. deactivation of a complete hot-added section.
- * 3. deactivation of a complete section from memory init.
- *
- * For 1, when subsection_map does not empty we will not be freeing the
- * usage map, but still need to free the vmemmap range.
- *
- * For 2 and 3, the SPARSEMEM_VMEMMAP={y,n} cases are unified
- */
-static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
-{
-	struct mem_section *ms = __pfn_to_section(pfn);
-	bool section_is_early = early_section(ms);
-	struct page *memmap = NULL;
-	bool empty;
-
-	if (clear_subsection_map(pfn, nr_pages))
-		return;
-
-	empty = is_subsection_map_empty(ms);
-	if (empty) {
-		/*
-		 * Mark the section invalid so that valid_section()
-		 * return false. This prevents code from dereferencing
-		 * ms->usage array.
-		 */
-		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
-
-		/*
-		 * When removing an early section, the usage map is kept (as the
-		 * usage maps of other sections fall into the same page). It
-		 * will be re-used when re-adding the section - which is then no
-		 * longer an early section. If the usage map is PageReserved, it
-		 * was allocated during boot.
-		 */
-		if (!PageReserved(virt_to_page(ms->usage))) {
-			kfree_rcu(ms->usage, rcu);
-			WRITE_ONCE(ms->usage, NULL);
-		}
-		memmap = pfn_to_page(SECTION_ALIGN_DOWN(pfn));
-	}
-
-	/*
-	 * The memmap of early sections is always fully populated. See
-	 * section_activate() and pfn_valid() .
-	 */
-	if (!section_is_early) {
-		memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
-		depopulate_section_memmap(pfn, nr_pages, altmap);
-	} else if (memmap) {
-		memmap_boot_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page),
-							  PAGE_SIZE)));
-		free_map_bootmem(memmap);
-	}
-
-	if (empty)
-		ms->section_mem_map = (unsigned long)NULL;
-}
-
-static struct page * __meminit section_activate(int nid, unsigned long pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap,
-		struct dev_pagemap *pgmap)
-{
-	struct mem_section *ms = __pfn_to_section(pfn);
-	struct mem_section_usage *usage = NULL;
-	struct page *memmap;
-	int rc;
-
-	if (!ms->usage) {
-		usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
-		if (!usage)
-			return ERR_PTR(-ENOMEM);
-		ms->usage = usage;
-	}
-
-	rc = fill_subsection_map(pfn, nr_pages);
-	if (rc) {
-		if (usage)
-			ms->usage = NULL;
-		kfree(usage);
-		return ERR_PTR(rc);
-	}
-
-	/*
-	 * The early init code does not consider partially populated
-	 * initial sections, it simply assumes that memory will never be
-	 * referenced.  If we hot-add memory into such a section then we
-	 * do not need to populate the memmap and can simply reuse what
-	 * is already there.
-	 */
-	if (nr_pages < PAGES_PER_SECTION && early_section(ms))
-		return pfn_to_page(pfn);
-
-	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
-	if (!memmap) {
-		section_deactivate(pfn, nr_pages, altmap);
-		return ERR_PTR(-ENOMEM);
-	}
-	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
-
-	return memmap;
-}
-
-/**
- * sparse_add_section - add a memory section, or populate an existing one
- * @nid: The node to add section on
- * @start_pfn: start pfn of the memory range
- * @nr_pages: number of pfns to add in the section
- * @altmap: alternate pfns to allocate the memmap backing store
- * @pgmap: alternate compound page geometry for devmap mappings
- *
- * This is only intended for hotplug.
- *
- * Note that only VMEMMAP supports sub-section aligned hotplug,
- * the proper alignment and size are gated by check_pfn_span().
- *
- *
- * Return:
- * * 0		- On success.
- * * -EEXIST	- Section has been present.
- * * -ENOMEM	- Out of memory.
- */
-int __meminit sparse_add_section(int nid, unsigned long start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap,
-		struct dev_pagemap *pgmap)
-{
-	unsigned long section_nr = pfn_to_section_nr(start_pfn);
-	struct mem_section *ms;
-	struct page *memmap;
-	int ret;
-
-	ret = sparse_index_init(section_nr, nid);
-	if (ret < 0)
-		return ret;
-
-	memmap = section_activate(nid, start_pfn, nr_pages, altmap, pgmap);
-	if (IS_ERR(memmap))
-		return PTR_ERR(memmap);
-
-	/*
-	 * Poison uninitialized struct pages in order to catch invalid flags
-	 * combinations.
-	 */
-	page_init_poison(memmap, sizeof(struct page) * nr_pages);
-
-	ms = __nr_to_section(section_nr);
-	__section_mark_present(ms, section_nr);
-
-	/* Align memmap to section boundary in the subsection case */
-	if (section_nr_to_pfn(section_nr) != start_pfn)
-		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
-	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
-
-	return 0;
-}
-
-void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
-			   struct vmem_altmap *altmap)
-{
-	struct mem_section *ms = __pfn_to_section(pfn);
-
-	if (WARN_ON_ONCE(!valid_section(ms)))
-		return;
-
-	section_deactivate(pfn, nr_pages, altmap);
-}
-#endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 01/14] mm/memory_hotplug: remove for_each_valid_pfn() usage
  2026-03-17 16:56 ` [PATCH 01/14] mm/memory_hotplug: remove for_each_valid_pfn() usage David Hildenbrand (Arm)
@ 2026-03-17 17:19   ` Lorenzo Stoakes (Oracle)
  2026-03-17 20:30   ` David Hildenbrand (Arm)
  2026-03-18  7:51   ` Mike Rapoport
  2 siblings, 0 replies; 53+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 17:19 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, Mar 17, 2026 at 05:56:39PM +0100, David Hildenbrand (Arm) wrote:
> When offlining memory, we know that the memory range has no holes.
> Checking for valid pfns is not required.
>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Holey Cow! LGTM, so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/memory_hotplug.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 86d3faf50453..3495d94587e7 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1746,7 +1746,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>  {
>  	unsigned long pfn;
>
> -	for_each_valid_pfn(pfn, start, end) {
> +	for (pfn = start; pfn < end; pfn++) {
>  		struct page *page;
>  		struct folio *folio;
>
> @@ -1791,7 +1791,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
>  				      DEFAULT_RATELIMIT_BURST);
>
> -	for_each_valid_pfn(pfn, start_pfn, end_pfn) {
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>  		struct page *page;
>
>  		page = pfn_to_page(pfn);
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 02/14] mm/sparse: remove WARN_ONs from (online|offline)_mem_sections()
  2026-03-17 16:56 ` [PATCH 02/14] mm/sparse: remove WARN_ONs from (online|offline)_mem_sections() David Hildenbrand (Arm)
@ 2026-03-17 17:21   ` Lorenzo Stoakes (Oracle)
  2026-03-18  7:53   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 17:21 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, Mar 17, 2026 at 05:56:40PM +0100, David Hildenbrand (Arm) wrote:
> We do not allow offlining of memory with memory holes, and always
> hotplug memory without holes.
>
> Consequently, we cannot end up onlining or offlining memory sections that
> have holes (including invalid sections). That's also why these
> WARN_ONs never fired.
>
> Let's remove the WARN_ONs along with the TODO regarding double-checking.
>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

I'm warning up to your series! (The bad puns may/may not continue) so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/sparse.c | 17 ++---------------
>  1 file changed, 2 insertions(+), 15 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index dfabe554adf8..93252112860e 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -638,13 +638,8 @@ void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>  		unsigned long section_nr = pfn_to_section_nr(pfn);
> -		struct mem_section *ms;
> -
> -		/* onlining code should never touch invalid ranges */
> -		if (WARN_ON(!valid_section_nr(section_nr)))
> -			continue;
> +		struct mem_section *ms = __nr_to_section(section_nr);
>
> -		ms = __nr_to_section(section_nr);
>  		ms->section_mem_map |= SECTION_IS_ONLINE;
>  	}
>  }
> @@ -656,16 +651,8 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>  		unsigned long section_nr = pfn_to_section_nr(pfn);
> -		struct mem_section *ms;
> +		struct mem_section *ms = __nr_to_section(section_nr);
>
> -		/*
> -		 * TODO this needs some double checking. Offlining code makes
> -		 * sure to check pfn_valid but those checks might be just bogus
> -		 */
> -		if (WARN_ON(!valid_section_nr(section_nr)))
> -			continue;
> -
> -		ms = __nr_to_section(section_nr);
>  		ms->section_mem_map &= ~SECTION_IS_ONLINE;
>  	}
>  }
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 03/14] mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP
  2026-03-17 16:56 ` [PATCH 03/14] mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
@ 2026-03-17 17:22   ` Lorenzo Stoakes (Oracle)
  2026-03-18  7:55   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 17:22 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, Mar 17, 2026 at 05:56:41PM +0100, David Hildenbrand (Arm) wrote:
> Ever since commit f8f03eb5f0f9 ("mm: stop making SPARSEMEM_VMEMMAP
> user-selectable"), an architecture that supports CONFIG_SPARSEMEM_VMEMMAP
> (by selecting SPARSEMEM_VMEMMAP_ENABLE) can no longer enable
> CONFIG_SPARSEMEM without CONFIG_SPARSEMEM_VMEMMAP.
>
> Right now, CONFIG_MEMORY_HOTPLUG is guarded by CONFIG_SPARSEMEM.
>
> However, CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG is only enabled by
> * arm64: which selects SPARSEMEM_VMEMMAP_ENABLE
> * loongarch: which selects SPARSEMEM_VMEMMAP_ENABLE
> * powerpc (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE
> * riscv (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE
> * s390 with SPARSEMEM: which selects SPARSEMEM_VMEMMAP_ENABLE
> * x86 (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE
>
> So, we can make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP
> without affecting any setups.
>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Some risc-y business Dave but I believe in you! So:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index ebd8ea353687..c012944938a7 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -472,7 +472,7 @@ config ARCH_ENABLE_MEMORY_HOTREMOVE
>  menuconfig MEMORY_HOTPLUG
>  	bool "Memory hotplug"
>  	select MEMORY_ISOLATION
> -	depends on SPARSEMEM
> +	depends on SPARSEMEM_VMEMMAP
>  	depends on ARCH_ENABLE_MEMORY_HOTPLUG
>  	depends on 64BIT
>  	select NUMA_KEEP_MEMINFO if NUMA
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 04/14] mm/memory_hotplug: simplify check_pfn_span()
  2026-03-17 16:56 ` [PATCH 04/14] mm/memory_hotplug: simplify check_pfn_span() David Hildenbrand (Arm)
@ 2026-03-17 17:24   ` Lorenzo Stoakes (Oracle)
  2026-03-18  7:56   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 17:24 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, Mar 17, 2026 at 05:56:42PM +0100, David Hildenbrand (Arm) wrote:
> We now always have CONFIG_SPARSEMEM_VMEMMAP, so remove the dead code.
>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

It's a sparse patch but that's ok, so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/memory_hotplug.c | 20 ++++++--------------
>  1 file changed, 6 insertions(+), 14 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 3495d94587e7..70e620496cec 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -320,21 +320,13 @@ static void release_memory_resource(struct resource *res)
>  static int check_pfn_span(unsigned long pfn, unsigned long nr_pages)
>  {
>  	/*
> -	 * Disallow all operations smaller than a sub-section and only
> -	 * allow operations smaller than a section for
> -	 * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
> -	 * enforces a larger memory_block_size_bytes() granularity for
> -	 * memory that will be marked online, so this check should only
> -	 * fire for direct arch_{add,remove}_memory() users outside of
> -	 * add_memory_resource().
> +	 * Disallow all operations smaller than a sub-section.
> +	 * Note that check_hotplug_memory_range() enforces a larger
> +	 * memory_block_size_bytes() granularity for memory that will be marked
> +	 * online, so this check should only fire for direct
> +	 * arch_{add,remove}_memory() users outside of add_memory_resource().
>  	 */
> -	unsigned long min_align;
> -
> -	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
> -		min_align = PAGES_PER_SUBSECTION;
> -	else
> -		min_align = PAGES_PER_SECTION;
> -	if (!IS_ALIGNED(pfn | nr_pages, min_align))
> +	if (!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION))
>  		return -EINVAL;
>  	return 0;
>  }
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 06/14] mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP
  2026-03-17 16:56 ` [PATCH 06/14] mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
@ 2026-03-17 17:49   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:15   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 17:49 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, Mar 17, 2026 at 05:56:44PM +0100, David Hildenbrand (Arm) wrote:
> It is not immediately obvious that CONFIG_HAVE_BOOTMEM_INFO_NODE is
> only selected from CONFIG_MEMORY_HOTREMOVE, which itself depends on
> CONFIG_MEMORY_HOTPLUG that ... depends on CONFIG_SPARSEMEM_VMEMMAP.

Ugh god.

>
> Let's remove the !CONFIG_SPARSEMEM_VMEMMAP leftovers.

Maybe worth explicitly saying 'dead code' here just to underline how stupid this
was...

>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

This might be dead code, but this is a dead on patch, so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/bootmem_info.c | 37 -------------------------------------
>  1 file changed, 37 deletions(-)
>
> diff --git a/mm/bootmem_info.c b/mm/bootmem_info.c
> index b0e2a9fa641f..e61e08e24924 100644
> --- a/mm/bootmem_info.c
> +++ b/mm/bootmem_info.c
> @@ -40,42 +40,6 @@ void put_page_bootmem(struct page *page)
>  	}
>  }
>
> -#ifndef CONFIG_SPARSEMEM_VMEMMAP
> -static void __init register_page_bootmem_info_section(unsigned long start_pfn)
> -{
> -	unsigned long mapsize, section_nr, i;
> -	struct mem_section *ms;
> -	struct page *page, *memmap;
> -	struct mem_section_usage *usage;
> -
> -	section_nr = pfn_to_section_nr(start_pfn);
> -	ms = __nr_to_section(section_nr);
> -
> -	/* Get section's memmap address */
> -	memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> -
> -	/*
> -	 * Get page for the memmap's phys address
> -	 * XXX: need more consideration for sparse_vmemmap...
> -	 */
> -	page = virt_to_page(memmap);
> -	mapsize = sizeof(struct page) * PAGES_PER_SECTION;
> -	mapsize = PAGE_ALIGN(mapsize) >> PAGE_SHIFT;
> -
> -	/* remember memmap's page */
> -	for (i = 0; i < mapsize; i++, page++)
> -		get_page_bootmem(section_nr, page, SECTION_INFO);
> -
> -	usage = ms->usage;
> -	page = virt_to_page(usage);
> -
> -	mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT;
> -
> -	for (i = 0; i < mapsize; i++, page++)
> -		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
> -
> -}

So this was just dead code before? That's gross.

> -#else /* CONFIG_SPARSEMEM_VMEMMAP */
>  static void __init register_page_bootmem_info_section(unsigned long start_pfn)
>  {
>  	unsigned long mapsize, section_nr, i;
> @@ -100,7 +64,6 @@ static void __init register_page_bootmem_info_section(unsigned long start_pfn)
>  	for (i = 0; i < mapsize; i++, page++)
>  		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
>  }
> -#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
>
>  void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
>  {
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 05/14] mm/sparse: remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for CONFIG_MEMORY_HOTPLUG
  2026-03-17 16:56 ` [PATCH 05/14] mm/sparse: remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for CONFIG_MEMORY_HOTPLUG David Hildenbrand (Arm)
@ 2026-03-17 17:54   ` Lorenzo Stoakes (Oracle)
  2026-03-18  7:58   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 17:54 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, Mar 17, 2026 at 05:56:43PM +0100, David Hildenbrand (Arm) wrote:
> CONFIG_MEMORY_HOTPLUG now depends on CONFIG_SPARSEMEM_VMEMMAP. So
> let's remove the !CONFIG_SPARSEMEM_VMEMMAP leftovers.

(As said on 6/14 that I inexplicably reviewed before this one) - might be worth
explicitly saying 'dead code' here to underline it.

>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Sparsemem? More like sparsecode now! Right? RIGHT? Anyway,

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/sparse.c | 61 -----------------------------------------------------
>  1 file changed, 61 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 93252112860e..636a4a0f1199 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -657,7 +657,6 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>  	}
>  }
>
> -#ifdef CONFIG_SPARSEMEM_VMEMMAP
>  static struct page * __meminit populate_section_memmap(unsigned long pfn,
>  		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
>  		struct dev_pagemap *pgmap)
> @@ -729,66 +728,6 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
>
>  	return rc;
>  }
> -#else
> -static struct page * __meminit populate_section_memmap(unsigned long pfn,
> -		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
> -		struct dev_pagemap *pgmap)
> -{
> -	return kvmalloc_node(array_size(sizeof(struct page),
> -					PAGES_PER_SECTION), GFP_KERNEL, nid);
> -}
> -
> -static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
> -		struct vmem_altmap *altmap)
> -{
> -	kvfree(pfn_to_page(pfn));
> -}
> -
> -static void free_map_bootmem(struct page *memmap)
> -{
> -	unsigned long maps_section_nr, removing_section_nr, i;
> -	unsigned long type, nr_pages;
> -	struct page *page = virt_to_page(memmap);
> -
> -	nr_pages = PAGE_ALIGN(PAGES_PER_SECTION * sizeof(struct page))
> -		>> PAGE_SHIFT;
> -
> -	for (i = 0; i < nr_pages; i++, page++) {
> -		type = bootmem_type(page);
> -
> -		BUG_ON(type == NODE_INFO);
> -
> -		maps_section_nr = pfn_to_section_nr(page_to_pfn(page));
> -		removing_section_nr = bootmem_info(page);
> -
> -		/*
> -		 * When this function is called, the removing section is
> -		 * logical offlined state. This means all pages are isolated
> -		 * from page allocator. If removing section's memmap is placed
> -		 * on the same section, it must not be freed.
> -		 * If it is freed, page allocator may allocate it which will
> -		 * be removed physically soon.
> -		 */
> -		if (maps_section_nr != removing_section_nr)
> -			put_page_bootmem(page);
> -	}
> -}
> -
> -static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> -{
> -	return 0;
> -}
> -
> -static bool is_subsection_map_empty(struct mem_section *ms)
> -{
> -	return true;
> -}
> -
> -static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
> -{
> -	return 0;
> -}
> -#endif /* CONFIG_SPARSEMEM_VMEMMAP */

So this was all dead code again? Ugh.

>
>  /*
>   * To deactivate a memory region, there are 3 cases to handle across
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 07/14] mm/bootmem_info: avoid using sparse_decode_mem_map()
  2026-03-17 16:56 ` [PATCH 07/14] mm/bootmem_info: avoid using sparse_decode_mem_map() David Hildenbrand (Arm)
@ 2026-03-17 18:02   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:20   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 18:02 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, Mar 17, 2026 at 05:56:45PM +0100, David Hildenbrand (Arm) wrote:
> With SPARSEMEM_VMEMMAP, we can just do a pfn_to_page(). It is not super
> clear whether the start_pfn is properly aligned ... so let's just make

Maybe worth saying aligned to the start of the section?

> sure it is.
>
> We might soon try to remove the bootmem info completely; for now,
> just keep it working as is.

*gasp*

>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

If it boots ship it, so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/bootmem_info.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/mm/bootmem_info.c b/mm/bootmem_info.c
> index e61e08e24924..3d7675a3ae04 100644
> --- a/mm/bootmem_info.c
> +++ b/mm/bootmem_info.c
> @@ -44,17 +44,16 @@ static void __init register_page_bootmem_info_section(unsigned long start_pfn)
>  {
>  	unsigned long mapsize, section_nr, i;
>  	struct mem_section *ms;
> -	struct page *page, *memmap;
>  	struct mem_section_usage *usage;
> +	struct page *page;
>
> +	start_pfn = SECTION_ALIGN_DOWN(start_pfn);

Yeah SPARSE_VMEMMAP should make it this trivial.

>  	section_nr = pfn_to_section_nr(start_pfn);
>  	ms = __nr_to_section(section_nr);
>
> -	memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> -

So this is nice...

>  	if (!preinited_vmemmap_section(ms))
> -		register_page_bootmem_memmap(section_nr, memmap,
> -				PAGES_PER_SECTION);
> +		register_page_bootmem_memmap(section_nr, pfn_to_page(start_pfn),
> +					     PAGES_PER_SECTION);
>
>  	usage = ms->usage;
>  	page = virt_to_page(usage);
> --
> 2.43.0
>

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 08/14] mm/sparse: remove sparse_decode_mem_map()
  2026-03-17 16:56 ` [PATCH 08/14] mm/sparse: remove sparse_decode_mem_map() David Hildenbrand (Arm)
@ 2026-03-17 19:25   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:20   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 19:25 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, Mar 17, 2026 at 05:56:46PM +0100, David Hildenbrand (Arm) wrote:
> section_deactivate() applies to CONFIG_SPARSEMEM_VMEMMAP only. So we can
> just use pfn_to_page() (after making sure we have the start PFN of the
> section), and remove sparse_decode_mem_map().
>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

it's more like pfn_to_patch(), so that's a:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

From me!

> ---
>  include/linux/memory_hotplug.h |  2 --
>  mm/sparse.c                    | 16 +---------------
>  2 files changed, 1 insertion(+), 17 deletions(-)
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index e77ef3d7ff73..815e908c4135 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -308,8 +308,6 @@ extern int sparse_add_section(int nid, unsigned long pfn,
>  		struct dev_pagemap *pgmap);
>  extern void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
>  				  struct vmem_altmap *altmap);
> -extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
> -					  unsigned long pnum);
>  extern struct zone *zone_for_pfn_range(enum mmop online_type,
>  		int nid, struct memory_group *group, unsigned long start_pfn,
>  		unsigned long nr_pages);
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 636a4a0f1199..2a1f662245bc 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -274,18 +274,6 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
>  	return coded_mem_map;
>  }
>
> -#ifdef CONFIG_MEMORY_HOTPLUG
> -/*
> - * Decode mem_map from the coded memmap
> - */
> -struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum)
> -{
> -	/* mask off the extra low bits of information */
> -	coded_mem_map &= SECTION_MAP_MASK;
> -	return ((struct page *)coded_mem_map) + section_nr_to_pfn(pnum);
> -}
> -#endif /* CONFIG_MEMORY_HOTPLUG */
> -
>  static void __meminit sparse_init_one_section(struct mem_section *ms,
>  		unsigned long pnum, struct page *mem_map,
>  		struct mem_section_usage *usage, unsigned long flags)
> @@ -758,8 +746,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>
>  	empty = is_subsection_map_empty(ms);
>  	if (empty) {
> -		unsigned long section_nr = pfn_to_section_nr(pfn);
> -
>  		/*
>  		 * Mark the section invalid so that valid_section()
>  		 * return false. This prevents code from dereferencing
> @@ -778,7 +764,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  			kfree_rcu(ms->usage, rcu);
>  			WRITE_ONCE(ms->usage, NULL);
>  		}
> -		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> +		memmap = pfn_to_page(SECTION_ALIGN_DOWN(pfn));
>  	}
>
>  	/*
> --
> 2.43.0
>

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling
  2026-03-17 16:56 ` [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling David Hildenbrand (Arm)
@ 2026-03-17 19:48   ` Lorenzo Stoakes (Oracle)
  2026-03-20 18:49     ` David Hildenbrand (Arm)
  2026-03-18  8:34   ` Mike Rapoport
  1 sibling, 1 reply; 53+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 19:48 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, Mar 17, 2026 at 05:56:47PM +0100, David Hildenbrand (Arm) wrote:
> In 2008, we added through commit 48c906823f39 ("memory hotplug: allocate
> usemap on the section with pgdat") quite some complexity to try
> allocating memory for the "usemap" (storing pageblock information
> per memory section) for a memory section close to the memory of the
> "pgdat" of the node.
>
> The goal was to make memory hotunplug of boot memory more likely to
> succeed. That commit also added some checks for circular dependencies
> between two memory sections, whereby two memory sections would contain
> each others usemap, turning bot memory sections un-removable.

Typo: bot -> both. Presumably you are not talking about memory a bot of some
kind allocated :P

>
> However, in 2010, commit a4322e1bad91 ("sparsemem: Put usemap for one node
> together") started allocating the usemap for multiple memory
> sections on the same node in one chunk, effectively grouping all usemap
> allocations of the same node in a single memblock allocation.
>
> We don't really give guarantees about memory hotunplug of boot memory, and
> with the change in 2010, it is pretty much impossible in practice to get
> any circular dependencies.

Pretty much impossible? :) We can probably go so far as to say impossible, no?

>
> commit 48c906823f39 ("memory hotplug: allocate usemap on the section with
> pgdat") also added the comment:
>
> 	"Similarly, a pgdat can prevent a section being removed. If
> 	 section A contains a pgdat and section B
> 	 contains the usemap, both sections become inter-dependent."
>
> Given that we don't free the pgdat anymore, that comment (and handling)
> does not apply.

Isn't pgdat synonymous with a node and that's the data structure that describes
a node right? Confusingly typedef'd from pglist_data to pg_data_t but then
referred to as pgdat because all that makes so much sense :)

But I'm confused, does a section containing a pgdat mean a section having the
pgdat data structure literally allocated in it?

A usemap is... something that tracks pageblock metadata I think right?

Anyway I'm also confused by 'given we don't free the pgdat any more', but the
comment says a 'pgdat can prevent a section being removed' rather than anything
about it being removed?

I guess it means the OTHER section could be prevented from being removed even
after it's gone.. somehow?

Anyway! I think maybe this could be clearer, somehow :)

>
> So let's simply remove this complexity.
>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

I think what you've done in the patch is right though, we're not doing any of
these dances after a4322e1bad91 and pgdats sitting around mean we don't really
care about where the usemap goes anyway I don't think so...

I usemap and I find myself in a place where I give you a:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

!

> ---
>  mm/sparse.c | 100 +---------------------------------------------------
>  1 file changed, 1 insertion(+), 99 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 2a1f662245bc..b57c81e99340 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -294,102 +294,6 @@ size_t mem_section_usage_size(void)
>  	return sizeof(struct mem_section_usage) + usemap_size();
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
> -static inline phys_addr_t pgdat_to_phys(struct pglist_data *pgdat)
> -{
> -#ifndef CONFIG_NUMA
> -	VM_BUG_ON(pgdat != &contig_page_data);
> -	return __pa_symbol(&contig_page_data);
> -#else
> -	return __pa(pgdat);
> -#endif
> -}
> -
> -static struct mem_section_usage * __init
> -sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
> -					 unsigned long size)
> -{
> -	struct mem_section_usage *usage;
> -	unsigned long goal, limit;
> -	int nid;
> -	/*
> -	 * A page may contain usemaps for other sections preventing the
> -	 * page being freed and making a section unremovable while
> -	 * other sections referencing the usemap remain active. Similarly,
> -	 * a pgdat can prevent a section being removed. If section A
> -	 * contains a pgdat and section B contains the usemap, both
> -	 * sections become inter-dependent. This allocates usemaps
> -	 * from the same section as the pgdat where possible to avoid
> -	 * this problem.
> -	 */
> -	goal = pgdat_to_phys(pgdat) & (PAGE_SECTION_MASK << PAGE_SHIFT);
> -	limit = goal + (1UL << PA_SECTION_SHIFT);
> -	nid = early_pfn_to_nid(goal >> PAGE_SHIFT);
> -again:
> -	usage = memblock_alloc_try_nid(size, SMP_CACHE_BYTES, goal, limit, nid);
> -	if (!usage && limit) {
> -		limit = MEMBLOCK_ALLOC_ACCESSIBLE;
> -		goto again;
> -	}
> -	return usage;
> -}
> -
> -static void __init check_usemap_section_nr(int nid,
> -		struct mem_section_usage *usage)
> -{
> -	unsigned long usemap_snr, pgdat_snr;
> -	static unsigned long old_usemap_snr;
> -	static unsigned long old_pgdat_snr;
> -	struct pglist_data *pgdat = NODE_DATA(nid);
> -	int usemap_nid;
> -
> -	/* First call */
> -	if (!old_usemap_snr) {
> -		old_usemap_snr = NR_MEM_SECTIONS;
> -		old_pgdat_snr = NR_MEM_SECTIONS;
> -	}
> -
> -	usemap_snr = pfn_to_section_nr(__pa(usage) >> PAGE_SHIFT);
> -	pgdat_snr = pfn_to_section_nr(pgdat_to_phys(pgdat) >> PAGE_SHIFT);
> -	if (usemap_snr == pgdat_snr)
> -		return;
> -
> -	if (old_usemap_snr == usemap_snr && old_pgdat_snr == pgdat_snr)
> -		/* skip redundant message */
> -		return;
> -
> -	old_usemap_snr = usemap_snr;
> -	old_pgdat_snr = pgdat_snr;
> -
> -	usemap_nid = sparse_early_nid(__nr_to_section(usemap_snr));
> -	if (usemap_nid != nid) {
> -		pr_info("node %d must be removed before remove section %ld\n",
> -			nid, usemap_snr);
> -		return;
> -	}
> -	/*
> -	 * There is a circular dependency.
> -	 * Some platforms allow un-removable section because they will just
> -	 * gather other removable sections for dynamic partitioning.
> -	 * Just notify un-removable section's number here.
> -	 */
> -	pr_info("Section %ld and %ld (node %d) have a circular dependency on usemap and pgdat allocations\n",
> -		usemap_snr, pgdat_snr, nid);
> -}
> -#else
> -static struct mem_section_usage * __init
> -sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
> -					 unsigned long size)
> -{
> -	return memblock_alloc_node(size, SMP_CACHE_BYTES, pgdat->node_id);
> -}
> -
> -static void __init check_usemap_section_nr(int nid,
> -		struct mem_section_usage *usage)
> -{
> -}
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
> -
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  unsigned long __init section_map_size(void)
>  {
> @@ -486,7 +390,6 @@ void __init sparse_init_early_section(int nid, struct page *map,
>  				      unsigned long pnum, unsigned long flags)
>  {
>  	BUG_ON(!sparse_usagebuf || sparse_usagebuf >= sparse_usagebuf_end);
> -	check_usemap_section_nr(nid, sparse_usagebuf);
>  	sparse_init_one_section(__nr_to_section(pnum), pnum, map,
>  			sparse_usagebuf, SECTION_IS_EARLY | flags);
>  	sparse_usagebuf = (void *)sparse_usagebuf + mem_section_usage_size();
> @@ -497,8 +400,7 @@ static int __init sparse_usage_init(int nid, unsigned long map_count)
>  	unsigned long size;
>
>  	size = mem_section_usage_size() * map_count;
> -	sparse_usagebuf = sparse_early_usemaps_alloc_pgdat_section(
> -				NODE_DATA(nid), size);
> +	sparse_usagebuf = memblock_alloc_node(size, SMP_CACHE_BYTES, nid);

I guess nid here is the same node as the pgdat?

>  	if (!sparse_usagebuf) {
>  		sparse_usagebuf_end = NULL;
>  		return -ENOMEM;
> --
> 2.43.0
>

This is quite the simplification :)

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 10/14] mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c
  2026-03-17 16:56 ` [PATCH 10/14] mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c David Hildenbrand (Arm)
@ 2026-03-17 19:51   ` Lorenzo Stoakes (Oracle)
  2026-03-20 18:59     ` David Hildenbrand (Arm)
  2026-03-18  8:46   ` Mike Rapoport
  1 sibling, 1 reply; 53+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 19:51 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, Mar 17, 2026 at 05:56:48PM +0100, David Hildenbrand (Arm) wrote:
> We want to move subsection_map_init() to mm/sparse-vmemmap.c.
>
> To prepare for getting rid of subsection_map_init() in mm/sparse.c
> completely, use a static inline function for !CONFIG_SPARSEMEM_VMEMMAP.
>
> While at it, move the declaration to internal.h and rename it to
> "sparse_init_subsection_map()".

Why not init_sparse_subsection_map()??
Or sparse_init_map_subsection()????
Or <all other permutations>

Joking that's fine ;)

>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

You've initialised the sparse subsection of my heart, so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  include/linux/mmzone.h |  3 ---
>  mm/internal.h          | 12 ++++++++++++
>  mm/mm_init.c           |  2 +-
>  mm/sparse.c            |  6 +-----
>  4 files changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 7bd0134c241c..b694c69dee04 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -2002,8 +2002,6 @@ struct mem_section_usage {
>  	unsigned long pageblock_flags[0];
>  };
>
> -void subsection_map_init(unsigned long pfn, unsigned long nr_pages);
> -
>  struct page;
>  struct page_ext;
>  struct mem_section {
> @@ -2396,7 +2394,6 @@ static inline unsigned long next_present_section_nr(unsigned long section_nr)
>  #define sparse_vmemmap_init_nid_early(_nid) do {} while (0)
>  #define sparse_vmemmap_init_nid_late(_nid) do {} while (0)
>  #define pfn_in_present_section pfn_valid
> -#define subsection_map_init(_pfn, _nr_pages) do {} while (0)
>  #endif /* CONFIG_SPARSEMEM */
>
>  /*
> diff --git a/mm/internal.h b/mm/internal.h
> index f98f4746ac41..5f5c45d80aca 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -960,12 +960,24 @@ void memmap_init_range(unsigned long, int, unsigned long, unsigned long,
>  		unsigned long, enum meminit_context, struct vmem_altmap *, int,
>  		bool);
>
> +/*
> + * mm/sparse.c
> + */
>  #ifdef CONFIG_SPARSEMEM
>  void sparse_init(void);
>  #else
>  static inline void sparse_init(void) {}
>  #endif /* CONFIG_SPARSEMEM */
>
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> +void sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages);
> +#else
> +static inline void sparse_init_subsection_map(unsigned long pfn,
> +		unsigned long nr_pages)
> +{
> +}
> +#endif /* CONFIG_SPARSEMEM_VMEMMAP */
> +
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>
>  /*
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 969048f9b320..3c5f18537cd1 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1898,7 +1898,7 @@ static void __init free_area_init(void)
>  		pr_info("  node %3d: [mem %#018Lx-%#018Lx]\n", nid,
>  			(u64)start_pfn << PAGE_SHIFT,
>  			((u64)end_pfn << PAGE_SHIFT) - 1);
> -		subsection_map_init(start_pfn, end_pfn - start_pfn);
> +		sparse_init_subsection_map(start_pfn, end_pfn - start_pfn);
>  	}
>
>  	/* Initialise every node */
> diff --git a/mm/sparse.c b/mm/sparse.c
> index b57c81e99340..7b0bfea73a9b 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -185,7 +185,7 @@ static void subsection_mask_set(unsigned long *map, unsigned long pfn,
>  	bitmap_set(map, idx, end - idx + 1);
>  }
>
> -void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
> +void __init sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages)
>  {
>  	int end_sec_nr = pfn_to_section_nr(pfn + nr_pages - 1);
>  	unsigned long nr, start_sec_nr = pfn_to_section_nr(pfn);
> @@ -207,10 +207,6 @@ void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
>  		nr_pages -= pfns;
>  	}
>  }
> -#else
> -void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
> -{
> -}
>  #endif
>
>  /* Record a memory area against a node. */
> --
> 2.43.0
>



* Re: [PATCH 11/14] mm/sparse: drop set_section_nid() from sparse_add_section()
  2026-03-17 16:56 ` [PATCH 11/14] mm/sparse: drop set_section_nid() from sparse_add_section() David Hildenbrand (Arm)
@ 2026-03-17 19:55   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:50   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 19:55 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, Mar 17, 2026 at 05:56:49PM +0100, David Hildenbrand (Arm) wrote:
> CONFIG_MEMORY_HOTPLUG is CONFIG_SPARSEMEM_VMEMMAP-only. And
> CONFIG_SPARSEMEM_VMEMMAP implies that NODE_NOT_IN_PAGE_FLAGS cannot be set:
> see include/linux/page-flags-layout.h
>
> 	...
> 	#elif defined(CONFIG_SPARSEMEM_VMEMMAP)
> 	#error "Vmemmap: No space for nodes field in page flags"
> 	...
>
> So let's remove the set_section_nid() call to prepare for moving
> CONFIG_MEMORY_HOTPLUG to mm/sparse-vmemmap.c
>

Maybe worth mentioning:

#ifdef NODE_NOT_IN_PAGE_FLAGS
...
static void set_section_nid(unsigned long section_nr, int nid)
{
	... actually does something ...
}
#else /* !NODE_NOT_IN_PAGE_FLAGS */
static inline void set_section_nid(unsigned long section_nr, int nid)
{
}
#endif

Or more succinctly 'set_section_nid() is a nop' or something :P

> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

You may have dropped the call, but you've not dropped the ball, so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/sparse.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 7b0bfea73a9b..b5a2de43ac40 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -769,7 +769,6 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>  	page_init_poison(memmap, sizeof(struct page) * nr_pages);
>
>  	ms = __nr_to_section(section_nr);
> -	set_section_nid(section_nr, nid);
>  	__section_mark_present(ms, section_nr);
>
>  	/* Align memmap to section boundary in the subsection case */
> --
> 2.43.0
>



* Re: [PATCH 12/14] mm/sparse: move sparse_init_one_section() to internal.h
  2026-03-17 16:56 ` [PATCH 12/14] mm/sparse: move sparse_init_one_section() to internal.h David Hildenbrand (Arm)
@ 2026-03-17 20:00   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:54   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 20:00 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, Mar 17, 2026 at 05:56:50PM +0100, David Hildenbrand (Arm) wrote:
> While at it, convert the BUG_ON to a WARN_ON, avoid long lines, and merge
> sparse_encode_mem_map() into sparse_init_one_section().
>
> Clarify the comment a bit, pointing at page_to_pfn().
>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Don't need a long line to merge my:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

Here!

> ---
>  include/linux/mmzone.h |  2 +-
>  mm/internal.h          | 22 ++++++++++++++++++++++
>  mm/sparse.c            | 24 ------------------------
>  3 files changed, 23 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index b694c69dee04..dcbbf36ed88c 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -2008,7 +2008,7 @@ struct mem_section {
>  	/*
>  	 * This is, logically, a pointer to an array of struct
>  	 * pages.  However, it is stored with some other magic.
> -	 * (see sparse.c::sparse_init_one_section())

What::the::hell::was::this::before, we're not C++ developers!!

> +	 * (see sparse_init_one_section())
>  	 *
>  	 * Additionally during early boot we encode node id of
>  	 * the location of the section here to guide allocation.
> diff --git a/mm/internal.h b/mm/internal.h
> index 5f5c45d80aca..bcf4df97b185 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -965,6 +965,28 @@ void memmap_init_range(unsigned long, int, unsigned long, unsigned long,
>   */
>  #ifdef CONFIG_SPARSEMEM
>  void sparse_init(void);
> +
> +static inline void sparse_init_one_section(struct mem_section *ms,
> +		unsigned long pnum, struct page *mem_map,
> +		struct mem_section_usage *usage, unsigned long flags)
> +{
> +	unsigned long coded_mem_map;
> +
> +	BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
> +
> +	/*
> +	 * We encode the start PFN of the section into the mem_map such that
> +	 * page_to_pfn() on !CONFIG_SPARSEMEM_VMEMMAP can simply subtract it
> +	 * from the page pointer to obtain the PFN.
> +	 */
> +	coded_mem_map = (unsigned long)(mem_map - section_nr_to_pfn(pnum));
> +	VM_WARN_ON(coded_mem_map & ~SECTION_MAP_MASK);

Maybe VM_WARN_ON_ONCE()?

> +
> +	ms->section_mem_map &= ~SECTION_MAP_MASK;
> +	ms->section_mem_map |= coded_mem_map;
> +	ms->section_mem_map |= SECTION_HAS_MEM_MAP | flags;

I mean this is pretty nitty but prefer 'flags | SECTION_HAS_MEM_MAP' to
show you appended to the flags :P

> +	ms->usage = usage;
> +}
>  #else
>  static inline void sparse_init(void) {}
>  #endif /* CONFIG_SPARSEMEM */
> diff --git a/mm/sparse.c b/mm/sparse.c
> index b5a2de43ac40..6f5f340301a3 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -256,30 +256,6 @@ static void __init memblocks_present(void)
>  		memory_present(nid, start, end);
>  }
>
> -/*
> - * Subtle, we encode the real pfn into the mem_map such that
> - * the identity pfn - section_mem_map will return the actual
> - * physical page frame number.
> - */
> -static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long pnum)
> -{
> -	unsigned long coded_mem_map =
> -		(unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
> -	BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
> -	BUG_ON(coded_mem_map & ~SECTION_MAP_MASK);
> -	return coded_mem_map;
> -}
> -
> -static void __meminit sparse_init_one_section(struct mem_section *ms,
> -		unsigned long pnum, struct page *mem_map,
> -		struct mem_section_usage *usage, unsigned long flags)
> -{
> -	ms->section_mem_map &= ~SECTION_MAP_MASK;
> -	ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum)
> -		| SECTION_HAS_MEM_MAP | flags;
> -	ms->usage = usage;
> -}
> -
>  static unsigned long usemap_size(void)
>  {
>  	return BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS) * sizeof(unsigned long);
> --
> 2.43.0
>



* Re: [PATCH 13/14] mm/sparse: move __section_mark_present() to internal.h
  2026-03-17 16:56 ` [PATCH 13/14] mm/sparse: move __section_mark_present() " David Hildenbrand (Arm)
@ 2026-03-17 20:01   ` Lorenzo Stoakes (Oracle)
  2026-03-18  8:56   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 20:01 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, Mar 17, 2026 at 05:56:51PM +0100, David Hildenbrand (Arm) wrote:
> Let's prepare for moving memory hotplug handling from sparse.c to
> sparse-vmemmap.c by moving __section_mark_present() to internal.h.
>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Let's prepare for me finishing reviewing the series with:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/internal.h | 9 +++++++++
>  mm/sparse.c   | 8 --------
>  2 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index bcf4df97b185..835a6f00134e 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -987,6 +987,15 @@ static inline void sparse_init_one_section(struct mem_section *ms,
>  	ms->section_mem_map |= SECTION_HAS_MEM_MAP | flags;
>  	ms->usage = usage;
>  }
> +
> +static inline void __section_mark_present(struct mem_section *ms,
> +		unsigned long section_nr)
> +{
> +	if (section_nr > __highest_present_section_nr)
> +		__highest_present_section_nr = section_nr;
> +
> +	ms->section_mem_map |= SECTION_MARKED_PRESENT;
> +}
>  #else
>  static inline void sparse_init(void) {}
>  #endif /* CONFIG_SPARSEMEM */
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 6f5f340301a3..bf620f3fe05d 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -161,14 +161,6 @@ static void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
>   * those loops early.
>   */
>  unsigned long __highest_present_section_nr;
> -static void __section_mark_present(struct mem_section *ms,
> -		unsigned long section_nr)
> -{
> -	if (section_nr > __highest_present_section_nr)
> -		__highest_present_section_nr = section_nr;
> -
> -	ms->section_mem_map |= SECTION_MARKED_PRESENT;
> -}
>
>  static inline unsigned long first_present_section_nr(void)
>  {
> --
> 2.43.0
>



* Re: [PATCH 14/14] mm/sparse: move memory hotplug bits to sparse-vmemmap.c
  2026-03-17 16:56 ` [PATCH 14/14] mm/sparse: move memory hotplug bits to sparse-vmemmap.c David Hildenbrand (Arm)
@ 2026-03-17 20:09   ` Lorenzo Stoakes (Oracle)
  2026-03-20 19:07     ` David Hildenbrand (Arm)
  2026-03-18  8:57   ` Mike Rapoport
  1 sibling, 1 reply; 53+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 20:09 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, Mar 17, 2026 at 05:56:52PM +0100, David Hildenbrand (Arm) wrote:
> Let's move all memory hotplug related code to sparse-vmemmap.c.
>
> We only have to expose sparse_index_init(). While at it, drop the
> definition of sparse_index_init() for !CONFIG_SPARSEMEM, which is unused,
> and place the declaration in internal.h.
>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Looking through this it looks like it is just a code move modulo the other bits
you mention, overall very nice cleanup, so let me hotplug my:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

Into this review!

OK I've actually cringed terribly at the puns here and maybe I'm cured for life
from doing that again ;)

Cheers, Lorenzo

> ---
>  include/linux/mmzone.h |   1 -
>  mm/internal.h          |   4 +
>  mm/sparse-vmemmap.c    | 308 ++++++++++++++++++++++++++++++++++++++++
>  mm/sparse.c            | 314 +----------------------------------------
>  4 files changed, 314 insertions(+), 313 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index dcbbf36ed88c..e11513f581eb 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -2390,7 +2390,6 @@ static inline unsigned long next_present_section_nr(unsigned long section_nr)
>  #endif
>
>  #else
> -#define sparse_index_init(_sec, _nid)  do {} while (0)
>  #define sparse_vmemmap_init_nid_early(_nid) do {} while (0)
>  #define sparse_vmemmap_init_nid_late(_nid) do {} while (0)
>  #define pfn_in_present_section pfn_valid
> diff --git a/mm/internal.h b/mm/internal.h
> index 835a6f00134e..b1a9e9312ffe 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -965,6 +965,7 @@ void memmap_init_range(unsigned long, int, unsigned long, unsigned long,
>   */
>  #ifdef CONFIG_SPARSEMEM
>  void sparse_init(void);
> +int sparse_index_init(unsigned long section_nr, int nid);
>
>  static inline void sparse_init_one_section(struct mem_section *ms,
>  		unsigned long pnum, struct page *mem_map,
> @@ -1000,6 +1001,9 @@ static inline void __section_mark_present(struct mem_section *ms,
>  static inline void sparse_init(void) {}
>  #endif /* CONFIG_SPARSEMEM */
>
> +/*
> + * mm/sparse-vmemmap.c
> + */
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  void sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages);
>  #else
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index f0690797667f..330579365a0f 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -591,3 +591,311 @@ void __init sparse_vmemmap_init_nid_late(int nid)
>  	hugetlb_vmemmap_init_late(nid);
>  }
>  #endif
> +
> +static void subsection_mask_set(unsigned long *map, unsigned long pfn,
> +		unsigned long nr_pages)
> +{
> +	int idx = subsection_map_index(pfn);
> +	int end = subsection_map_index(pfn + nr_pages - 1);
> +
> +	bitmap_set(map, idx, end - idx + 1);
> +}
> +
> +void __init sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> +	int end_sec_nr = pfn_to_section_nr(pfn + nr_pages - 1);
> +	unsigned long nr, start_sec_nr = pfn_to_section_nr(pfn);
> +
> +	for (nr = start_sec_nr; nr <= end_sec_nr; nr++) {
> +		struct mem_section *ms;
> +		unsigned long pfns;
> +
> +		pfns = min(nr_pages, PAGES_PER_SECTION
> +				- (pfn & ~PAGE_SECTION_MASK));
> +		ms = __nr_to_section(nr);
> +		subsection_mask_set(ms->usage->subsection_map, pfn, pfns);
> +
> +		pr_debug("%s: sec: %lu pfns: %lu set(%d, %d)\n", __func__, nr,
> +				pfns, subsection_map_index(pfn),
> +				subsection_map_index(pfn + pfns - 1));
> +
> +		pfn += pfns;
> +		nr_pages -= pfns;
> +	}
> +}
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +
> +/* Mark all memory sections within the pfn range as online */
> +void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long pfn;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> +		unsigned long section_nr = pfn_to_section_nr(pfn);
> +		struct mem_section *ms = __nr_to_section(section_nr);
> +
> +		ms->section_mem_map |= SECTION_IS_ONLINE;
> +	}
> +}
> +
> +/* Mark all memory sections within the pfn range as offline */
> +void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long pfn;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> +		unsigned long section_nr = pfn_to_section_nr(pfn);
> +		struct mem_section *ms = __nr_to_section(section_nr);
> +
> +		ms->section_mem_map &= ~SECTION_IS_ONLINE;
> +	}
> +}
> +
> +static struct page * __meminit populate_section_memmap(unsigned long pfn,
> +		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
> +		struct dev_pagemap *pgmap)
> +{
> +	return __populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
> +}
> +
> +static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
> +		struct vmem_altmap *altmap)
> +{
> +	unsigned long start = (unsigned long) pfn_to_page(pfn);
> +	unsigned long end = start + nr_pages * sizeof(struct page);
> +
> +	vmemmap_free(start, end, altmap);
> +}
> +static void free_map_bootmem(struct page *memmap)
> +{
> +	unsigned long start = (unsigned long)memmap;
> +	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
> +
> +	vmemmap_free(start, end, NULL);
> +}
> +
> +static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> +	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> +	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
> +	struct mem_section *ms = __pfn_to_section(pfn);
> +	unsigned long *subsection_map = ms->usage
> +		? &ms->usage->subsection_map[0] : NULL;
> +
> +	subsection_mask_set(map, pfn, nr_pages);
> +	if (subsection_map)
> +		bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
> +
> +	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
> +				"section already deactivated (%#lx + %ld)\n",
> +				pfn, nr_pages))
> +		return -EINVAL;
> +
> +	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> +	return 0;
> +}
> +
> +static bool is_subsection_map_empty(struct mem_section *ms)
> +{
> +	return bitmap_empty(&ms->usage->subsection_map[0],
> +			    SUBSECTIONS_PER_SECTION);
> +}
> +
> +static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> +	struct mem_section *ms = __pfn_to_section(pfn);
> +	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> +	unsigned long *subsection_map;
> +	int rc = 0;
> +
> +	subsection_mask_set(map, pfn, nr_pages);
> +
> +	subsection_map = &ms->usage->subsection_map[0];
> +
> +	if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
> +		rc = -EINVAL;
> +	else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
> +		rc = -EEXIST;
> +	else
> +		bitmap_or(subsection_map, map, subsection_map,
> +				SUBSECTIONS_PER_SECTION);
> +
> +	return rc;
> +}
> +
> +/*
> + * To deactivate a memory region, there are 3 cases to handle across
> + * two configurations (SPARSEMEM_VMEMMAP={y,n}):
> + *
> + * 1. deactivation of a partial hot-added section (only possible in
> + *    the SPARSEMEM_VMEMMAP=y case).
> + *      a) section was present at memory init.
> + *      b) section was hot-added post memory init.
> + * 2. deactivation of a complete hot-added section.
> + * 3. deactivation of a complete section from memory init.
> + *
> + * For 1, when subsection_map does not empty we will not be freeing the
> + * usage map, but still need to free the vmemmap range.
> + *
> + * For 2 and 3, the SPARSEMEM_VMEMMAP={y,n} cases are unified
> + */
> +static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> +		struct vmem_altmap *altmap)
> +{
> +	struct mem_section *ms = __pfn_to_section(pfn);
> +	bool section_is_early = early_section(ms);
> +	struct page *memmap = NULL;
> +	bool empty;
> +
> +	if (clear_subsection_map(pfn, nr_pages))
> +		return;
> +
> +	empty = is_subsection_map_empty(ms);
> +	if (empty) {
> +		/*
> +		 * Mark the section invalid so that valid_section()
> +		 * return false. This prevents code from dereferencing
> +		 * ms->usage array.
> +		 */
> +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> +
> +		/*
> +		 * When removing an early section, the usage map is kept (as the
> +		 * usage maps of other sections fall into the same page). It
> +		 * will be re-used when re-adding the section - which is then no
> +		 * longer an early section. If the usage map is PageReserved, it
> +		 * was allocated during boot.
> +		 */
> +		if (!PageReserved(virt_to_page(ms->usage))) {
> +			kfree_rcu(ms->usage, rcu);
> +			WRITE_ONCE(ms->usage, NULL);
> +		}
> +		memmap = pfn_to_page(SECTION_ALIGN_DOWN(pfn));
> +	}
> +
> +	/*
> +	 * The memmap of early sections is always fully populated. See
> +	 * section_activate() and pfn_valid() .
> +	 */
> +	if (!section_is_early) {
> +		memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
> +		depopulate_section_memmap(pfn, nr_pages, altmap);
> +	} else if (memmap) {
> +		memmap_boot_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page),
> +							  PAGE_SIZE)));
> +		free_map_bootmem(memmap);
> +	}
> +
> +	if (empty)
> +		ms->section_mem_map = (unsigned long)NULL;
> +}
> +
> +static struct page * __meminit section_activate(int nid, unsigned long pfn,
> +		unsigned long nr_pages, struct vmem_altmap *altmap,
> +		struct dev_pagemap *pgmap)
> +{
> +	struct mem_section *ms = __pfn_to_section(pfn);
> +	struct mem_section_usage *usage = NULL;
> +	struct page *memmap;
> +	int rc;
> +
> +	if (!ms->usage) {
> +		usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
> +		if (!usage)
> +			return ERR_PTR(-ENOMEM);
> +		ms->usage = usage;
> +	}
> +
> +	rc = fill_subsection_map(pfn, nr_pages);
> +	if (rc) {
> +		if (usage)
> +			ms->usage = NULL;
> +		kfree(usage);
> +		return ERR_PTR(rc);
> +	}
> +
> +	/*
> +	 * The early init code does not consider partially populated
> +	 * initial sections, it simply assumes that memory will never be
> +	 * referenced.  If we hot-add memory into such a section then we
> +	 * do not need to populate the memmap and can simply reuse what
> +	 * is already there.
> +	 */
> +	if (nr_pages < PAGES_PER_SECTION && early_section(ms))
> +		return pfn_to_page(pfn);
> +
> +	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
> +	if (!memmap) {
> +		section_deactivate(pfn, nr_pages, altmap);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
> +
> +	return memmap;
> +}
> +
> +/**
> + * sparse_add_section - add a memory section, or populate an existing one
> + * @nid: The node to add section on
> + * @start_pfn: start pfn of the memory range
> + * @nr_pages: number of pfns to add in the section
> + * @altmap: alternate pfns to allocate the memmap backing store
> + * @pgmap: alternate compound page geometry for devmap mappings
> + *
> + * This is only intended for hotplug.
> + *
> + * Note that only VMEMMAP supports sub-section aligned hotplug,
> + * the proper alignment and size are gated by check_pfn_span().
> + *
> + *
> + * Return:
> + * * 0		- On success.
> + * * -EEXIST	- Section has been present.
> + * * -ENOMEM	- Out of memory.
> + */
> +int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> +		unsigned long nr_pages, struct vmem_altmap *altmap,
> +		struct dev_pagemap *pgmap)
> +{
> +	unsigned long section_nr = pfn_to_section_nr(start_pfn);
> +	struct mem_section *ms;
> +	struct page *memmap;
> +	int ret;
> +
> +	ret = sparse_index_init(section_nr, nid);
> +	if (ret < 0)
> +		return ret;
> +
> +	memmap = section_activate(nid, start_pfn, nr_pages, altmap, pgmap);
> +	if (IS_ERR(memmap))
> +		return PTR_ERR(memmap);
> +
> +	/*
> +	 * Poison uninitialized struct pages in order to catch invalid flags
> +	 * combinations.
> +	 */
> +	page_init_poison(memmap, sizeof(struct page) * nr_pages);
> +
> +	ms = __nr_to_section(section_nr);
> +	__section_mark_present(ms, section_nr);
> +
> +	/* Align memmap to section boundary in the subsection case */
> +	if (section_nr_to_pfn(section_nr) != start_pfn)
> +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> +	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
> +
> +	return 0;
> +}
> +
> +void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
> +			   struct vmem_altmap *altmap)
> +{
> +	struct mem_section *ms = __pfn_to_section(pfn);
> +
> +	if (WARN_ON_ONCE(!valid_section(ms)))
> +		return;
> +
> +	section_deactivate(pfn, nr_pages, altmap);
> +}
> +#endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/mm/sparse.c b/mm/sparse.c
> index bf620f3fe05d..007fd52c621e 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -79,7 +79,7 @@ static noinline struct mem_section __ref *sparse_index_alloc(int nid)
>  	return section;
>  }
>
> -static int __meminit sparse_index_init(unsigned long section_nr, int nid)
> +int __meminit sparse_index_init(unsigned long section_nr, int nid)
>  {
>  	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
>  	struct mem_section *section;
> @@ -103,7 +103,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
>  	return 0;
>  }
>  #else /* !SPARSEMEM_EXTREME */
> -static inline int sparse_index_init(unsigned long section_nr, int nid)
> +int sparse_index_init(unsigned long section_nr, int nid)
>  {
>  	return 0;
>  }
> @@ -167,40 +167,6 @@ static inline unsigned long first_present_section_nr(void)
>  	return next_present_section_nr(-1);
>  }
>
> -#ifdef CONFIG_SPARSEMEM_VMEMMAP
> -static void subsection_mask_set(unsigned long *map, unsigned long pfn,
> -		unsigned long nr_pages)
> -{
> -	int idx = subsection_map_index(pfn);
> -	int end = subsection_map_index(pfn + nr_pages - 1);
> -
> -	bitmap_set(map, idx, end - idx + 1);
> -}
> -
> -void __init sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages)
> -{
> -	int end_sec_nr = pfn_to_section_nr(pfn + nr_pages - 1);
> -	unsigned long nr, start_sec_nr = pfn_to_section_nr(pfn);
> -
> -	for (nr = start_sec_nr; nr <= end_sec_nr; nr++) {
> -		struct mem_section *ms;
> -		unsigned long pfns;
> -
> -		pfns = min(nr_pages, PAGES_PER_SECTION
> -				- (pfn & ~PAGE_SECTION_MASK));
> -		ms = __nr_to_section(nr);
> -		subsection_mask_set(ms->usage->subsection_map, pfn, pfns);
> -
> -		pr_debug("%s: sec: %lu pfns: %lu set(%d, %d)\n", __func__, nr,
> -				pfns, subsection_map_index(pfn),
> -				subsection_map_index(pfn + pfns - 1));
> -
> -		pfn += pfns;
> -		nr_pages -= pfns;
> -	}
> -}
> -#endif
> -
>  /* Record a memory area against a node. */
>  static void __init memory_present(int nid, unsigned long start, unsigned long end)
>  {
> @@ -482,279 +448,3 @@ void __init sparse_init(void)
>  	sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
>  	vmemmap_populate_print_last();
>  }
> -
> -#ifdef CONFIG_MEMORY_HOTPLUG
> -
> -/* Mark all memory sections within the pfn range as online */
> -void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
> -{
> -	unsigned long pfn;
> -
> -	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> -		unsigned long section_nr = pfn_to_section_nr(pfn);
> -		struct mem_section *ms = __nr_to_section(section_nr);
> -
> -		ms->section_mem_map |= SECTION_IS_ONLINE;
> -	}
> -}
> -
> -/* Mark all memory sections within the pfn range as offline */
> -void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
> -{
> -	unsigned long pfn;
> -
> -	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> -		unsigned long section_nr = pfn_to_section_nr(pfn);
> -		struct mem_section *ms = __nr_to_section(section_nr);
> -
> -		ms->section_mem_map &= ~SECTION_IS_ONLINE;
> -	}
> -}
> -
> -static struct page * __meminit populate_section_memmap(unsigned long pfn,
> -		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
> -		struct dev_pagemap *pgmap)
> -{
> -	return __populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
> -}
> -
> -static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
> -		struct vmem_altmap *altmap)
> -{
> -	unsigned long start = (unsigned long) pfn_to_page(pfn);
> -	unsigned long end = start + nr_pages * sizeof(struct page);
> -
> -	vmemmap_free(start, end, altmap);
> -}
> -static void free_map_bootmem(struct page *memmap)
> -{
> -	unsigned long start = (unsigned long)memmap;
> -	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
> -
> -	vmemmap_free(start, end, NULL);
> -}
> -
> -static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> -{
> -	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> -	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
> -	struct mem_section *ms = __pfn_to_section(pfn);
> -	unsigned long *subsection_map = ms->usage
> -		? &ms->usage->subsection_map[0] : NULL;
> -
> -	subsection_mask_set(map, pfn, nr_pages);
> -	if (subsection_map)
> -		bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
> -
> -	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
> -				"section already deactivated (%#lx + %ld)\n",
> -				pfn, nr_pages))
> -		return -EINVAL;
> -
> -	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> -	return 0;
> -}
> -
> -static bool is_subsection_map_empty(struct mem_section *ms)
> -{
> -	return bitmap_empty(&ms->usage->subsection_map[0],
> -			    SUBSECTIONS_PER_SECTION);
> -}
> -
> -static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
> -{
> -	struct mem_section *ms = __pfn_to_section(pfn);
> -	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> -	unsigned long *subsection_map;
> -	int rc = 0;
> -
> -	subsection_mask_set(map, pfn, nr_pages);
> -
> -	subsection_map = &ms->usage->subsection_map[0];
> -
> -	if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
> -		rc = -EINVAL;
> -	else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
> -		rc = -EEXIST;
> -	else
> -		bitmap_or(subsection_map, map, subsection_map,
> -				SUBSECTIONS_PER_SECTION);
> -
> -	return rc;
> -}
> -
> -/*
> - * To deactivate a memory region, there are 3 cases to handle across
> - * two configurations (SPARSEMEM_VMEMMAP={y,n}):
> - *
> - * 1. deactivation of a partial hot-added section (only possible in
> - *    the SPARSEMEM_VMEMMAP=y case).
> - *      a) section was present at memory init.
> - *      b) section was hot-added post memory init.
> - * 2. deactivation of a complete hot-added section.
> - * 3. deactivation of a complete section from memory init.
> - *
> - * For 1, when subsection_map does not empty we will not be freeing the
> - * usage map, but still need to free the vmemmap range.
> - *
> - * For 2 and 3, the SPARSEMEM_VMEMMAP={y,n} cases are unified
> - */
> -static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> -		struct vmem_altmap *altmap)
> -{
> -	struct mem_section *ms = __pfn_to_section(pfn);
> -	bool section_is_early = early_section(ms);
> -	struct page *memmap = NULL;
> -	bool empty;
> -
> -	if (clear_subsection_map(pfn, nr_pages))
> -		return;
> -
> -	empty = is_subsection_map_empty(ms);
> -	if (empty) {
> -		/*
> -		 * Mark the section invalid so that valid_section()
> -		 * return false. This prevents code from dereferencing
> -		 * ms->usage array.
> -		 */
> -		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> -
> -		/*
> -		 * When removing an early section, the usage map is kept (as the
> -		 * usage maps of other sections fall into the same page). It
> -		 * will be re-used when re-adding the section - which is then no
> -		 * longer an early section. If the usage map is PageReserved, it
> -		 * was allocated during boot.
> -		 */
> -		if (!PageReserved(virt_to_page(ms->usage))) {
> -			kfree_rcu(ms->usage, rcu);
> -			WRITE_ONCE(ms->usage, NULL);
> -		}
> -		memmap = pfn_to_page(SECTION_ALIGN_DOWN(pfn));
> -	}
> -
> -	/*
> -	 * The memmap of early sections is always fully populated. See
> -	 * section_activate() and pfn_valid() .
> -	 */
> -	if (!section_is_early) {
> -		memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
> -		depopulate_section_memmap(pfn, nr_pages, altmap);
> -	} else if (memmap) {
> -		memmap_boot_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page),
> -							  PAGE_SIZE)));
> -		free_map_bootmem(memmap);
> -	}
> -
> -	if (empty)
> -		ms->section_mem_map = (unsigned long)NULL;
> -}
> -
> -static struct page * __meminit section_activate(int nid, unsigned long pfn,
> -		unsigned long nr_pages, struct vmem_altmap *altmap,
> -		struct dev_pagemap *pgmap)
> -{
> -	struct mem_section *ms = __pfn_to_section(pfn);
> -	struct mem_section_usage *usage = NULL;
> -	struct page *memmap;
> -	int rc;
> -
> -	if (!ms->usage) {
> -		usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
> -		if (!usage)
> -			return ERR_PTR(-ENOMEM);
> -		ms->usage = usage;
> -	}
> -
> -	rc = fill_subsection_map(pfn, nr_pages);
> -	if (rc) {
> -		if (usage)
> -			ms->usage = NULL;
> -		kfree(usage);
> -		return ERR_PTR(rc);
> -	}
> -
> -	/*
> -	 * The early init code does not consider partially populated
> -	 * initial sections, it simply assumes that memory will never be
> -	 * referenced.  If we hot-add memory into such a section then we
> -	 * do not need to populate the memmap and can simply reuse what
> -	 * is already there.
> -	 */
> -	if (nr_pages < PAGES_PER_SECTION && early_section(ms))
> -		return pfn_to_page(pfn);
> -
> -	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
> -	if (!memmap) {
> -		section_deactivate(pfn, nr_pages, altmap);
> -		return ERR_PTR(-ENOMEM);
> -	}
> -	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
> -
> -	return memmap;
> -}
> -
> -/**
> - * sparse_add_section - add a memory section, or populate an existing one
> - * @nid: The node to add section on
> - * @start_pfn: start pfn of the memory range
> - * @nr_pages: number of pfns to add in the section
> - * @altmap: alternate pfns to allocate the memmap backing store
> - * @pgmap: alternate compound page geometry for devmap mappings
> - *
> - * This is only intended for hotplug.
> - *
> - * Note that only VMEMMAP supports sub-section aligned hotplug,
> - * the proper alignment and size are gated by check_pfn_span().
> - *
> - *
> - * Return:
> - * * 0		- On success.
> - * * -EEXIST	- Section has been present.
> - * * -ENOMEM	- Out of memory.
> - */
> -int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> -		unsigned long nr_pages, struct vmem_altmap *altmap,
> -		struct dev_pagemap *pgmap)
> -{
> -	unsigned long section_nr = pfn_to_section_nr(start_pfn);
> -	struct mem_section *ms;
> -	struct page *memmap;
> -	int ret;
> -
> -	ret = sparse_index_init(section_nr, nid);
> -	if (ret < 0)
> -		return ret;
> -
> -	memmap = section_activate(nid, start_pfn, nr_pages, altmap, pgmap);
> -	if (IS_ERR(memmap))
> -		return PTR_ERR(memmap);
> -
> -	/*
> -	 * Poison uninitialized struct pages in order to catch invalid flags
> -	 * combinations.
> -	 */
> -	page_init_poison(memmap, sizeof(struct page) * nr_pages);
> -
> -	ms = __nr_to_section(section_nr);
> -	__section_mark_present(ms, section_nr);
> -
> -	/* Align memmap to section boundary in the subsection case */
> -	if (section_nr_to_pfn(section_nr) != start_pfn)
> -		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> -	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
> -
> -	return 0;
> -}
> -
> -void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
> -			   struct vmem_altmap *altmap)
> -{
> -	struct mem_section *ms = __pfn_to_section(pfn);
> -
> -	if (WARN_ON_ONCE(!valid_section(ms)))
> -		return;
> -
> -	section_deactivate(pfn, nr_pages, altmap);
> -}
> -#endif /* CONFIG_MEMORY_HOTPLUG */
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 01/14] mm/memory_hotplug: remove for_each_valid_pfn() usage
  2026-03-17 16:56 ` [PATCH 01/14] mm/memory_hotplug: remove for_each_valid_pfn() usage David Hildenbrand (Arm)
  2026-03-17 17:19   ` Lorenzo Stoakes (Oracle)
@ 2026-03-17 20:30   ` David Hildenbrand (Arm)
  2026-03-18  7:51   ` Mike Rapoport
  2 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko

On 3/17/26 17:56, David Hildenbrand (Arm) wrote:
> When offlining memory, we know that the memory range has no holes.
> Checking for valid pfns is not required.
> 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> ---
>  mm/memory_hotplug.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 86d3faf50453..3495d94587e7 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1746,7 +1746,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>  {
>  	unsigned long pfn;
>  
> -	for_each_valid_pfn(pfn, start, end) {
> +	for (pfn = start; pfn < end; pfn++) {
>  		struct page *page;
>  		struct folio *folio;
>  
> @@ -1791,7 +1791,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
>  				      DEFAULT_RATELIMIT_BURST);
>  
> -	for_each_valid_pfn(pfn, start_pfn, end_pfn) {
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>  		struct page *page;
>  
>  		page = pfn_to_page(pfn);

AI review reports something rather unrelated to this patch: if the stars
align, folio_nr_pages(folio) might return questionable values.

We certainly don't want to tryget all folios here, so we might just want
to make sure that the value we get from folio_nr_pages() is something
reasonable (e.g., >= 1, power of 2). Alternatively we might snapshot the
page.

Will look into it.

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 01/14] mm/memory_hotplug: remove for_each_valid_pfn() usage
  2026-03-17 16:56 ` [PATCH 01/14] mm/memory_hotplug: remove for_each_valid_pfn() usage David Hildenbrand (Arm)
  2026-03-17 17:19   ` Lorenzo Stoakes (Oracle)
  2026-03-17 20:30   ` David Hildenbrand (Arm)
@ 2026-03-18  7:51   ` Mike Rapoport
  2 siblings, 0 replies; 53+ messages in thread
From: Mike Rapoport @ 2026-03-18  7:51 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On Tue, Mar 17, 2026 at 05:56:39PM +0100, David Hildenbrand (Arm) wrote:
> When offlining memory, we know that the memory range has no holes.
> Checking for valid pfns is not required.
> 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/memory_hotplug.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 86d3faf50453..3495d94587e7 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1746,7 +1746,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>  {
>  	unsigned long pfn;
>  
> -	for_each_valid_pfn(pfn, start, end) {
> +	for (pfn = start; pfn < end; pfn++) {
>  		struct page *page;
>  		struct folio *folio;
>  
> @@ -1791,7 +1791,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
>  				      DEFAULT_RATELIMIT_BURST);
>  
> -	for_each_valid_pfn(pfn, start_pfn, end_pfn) {
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>  		struct page *page;
>  
>  		page = pfn_to_page(pfn);
> -- 
> 2.43.0
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 02/14] mm/sparse: remove WARN_ONs from (online|offline)_mem_sections()
  2026-03-17 16:56 ` [PATCH 02/14] mm/sparse: remove WARN_ONs from (online|offline)_mem_sections() David Hildenbrand (Arm)
  2026-03-17 17:21   ` Lorenzo Stoakes (Oracle)
@ 2026-03-18  7:53   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Mike Rapoport @ 2026-03-18  7:53 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On Tue, Mar 17, 2026 at 05:56:40PM +0100, David Hildenbrand (Arm) wrote:
> We do not allow offlining of memory with memory holes, and always
> hotplug memory without holes.
> 
> Consequently, we cannot end up onlining or offlining memory sections that
> have holes (including invalid sections). That's also why these
> WARN_ONs never fired.
> 
> Let's remove the WARN_ONs along with the TODO regarding double-checking.
> 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/sparse.c | 17 ++---------------
>  1 file changed, 2 insertions(+), 15 deletions(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index dfabe554adf8..93252112860e 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -638,13 +638,8 @@ void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>  		unsigned long section_nr = pfn_to_section_nr(pfn);
> -		struct mem_section *ms;
> -
> -		/* onlining code should never touch invalid ranges */
> -		if (WARN_ON(!valid_section_nr(section_nr)))
> -			continue;
> +		struct mem_section *ms = __nr_to_section(section_nr);
>  
> -		ms = __nr_to_section(section_nr);
>  		ms->section_mem_map |= SECTION_IS_ONLINE;
>  	}
>  }
> @@ -656,16 +651,8 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>  		unsigned long section_nr = pfn_to_section_nr(pfn);
> -		struct mem_section *ms;
> +		struct mem_section *ms = __nr_to_section(section_nr);
>  
> -		/*
> -		 * TODO this needs some double checking. Offlining code makes
> -		 * sure to check pfn_valid but those checks might be just bogus
> -		 */
> -		if (WARN_ON(!valid_section_nr(section_nr)))
> -			continue;
> -
> -		ms = __nr_to_section(section_nr);
>  		ms->section_mem_map &= ~SECTION_IS_ONLINE;
>  	}
>  }
> -- 
> 2.43.0
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 03/14] mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP
  2026-03-17 16:56 ` [PATCH 03/14] mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
  2026-03-17 17:22   ` Lorenzo Stoakes (Oracle)
@ 2026-03-18  7:55   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Mike Rapoport @ 2026-03-18  7:55 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On Tue, Mar 17, 2026 at 05:56:41PM +0100, David Hildenbrand (Arm) wrote:
> Ever since commit f8f03eb5f0f9 ("mm: stop making SPARSEMEM_VMEMMAP
> user-selectable"), an architecture that supports CONFIG_SPARSEMEM_VMEMMAP
> (by selecting SPARSEMEM_VMEMMAP_ENABLE) can no longer enable
> CONFIG_SPARSEMEM without CONFIG_SPARSEMEM_VMEMMAP.
> 
> Right now, CONFIG_MEMORY_HOTPLUG is guarded by CONFIG_SPARSEMEM.
> 
> However, CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG is only enabled by
> * arm64: which selects SPARSEMEM_VMEMMAP_ENABLE
> * loongarch: which selects SPARSEMEM_VMEMMAP_ENABLE
> * powerpc (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE
> * riscv (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE
> * s390 with SPARSEMEM: which selects SPARSEMEM_VMEMMAP_ENABLE
> * x86 (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE
> 
> So, we can make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP
> without affecting any setups.
> 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index ebd8ea353687..c012944938a7 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -472,7 +472,7 @@ config ARCH_ENABLE_MEMORY_HOTREMOVE
>  menuconfig MEMORY_HOTPLUG
>  	bool "Memory hotplug"
>  	select MEMORY_ISOLATION
> -	depends on SPARSEMEM
> +	depends on SPARSEMEM_VMEMMAP
>  	depends on ARCH_ENABLE_MEMORY_HOTPLUG
>  	depends on 64BIT
>  	select NUMA_KEEP_MEMINFO if NUMA
> -- 
> 2.43.0
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 04/14] mm/memory_hotplug: simplify check_pfn_span()
  2026-03-17 16:56 ` [PATCH 04/14] mm/memory_hotplug: simplify check_pfn_span() David Hildenbrand (Arm)
  2026-03-17 17:24   ` Lorenzo Stoakes (Oracle)
@ 2026-03-18  7:56   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Mike Rapoport @ 2026-03-18  7:56 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On Tue, Mar 17, 2026 at 05:56:42PM +0100, David Hildenbrand (Arm) wrote:
> We now always have CONFIG_SPARSEMEM_VMEMMAP, so remove the dead code.
> 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/memory_hotplug.c | 20 ++++++--------------
>  1 file changed, 6 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 3495d94587e7..70e620496cec 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -320,21 +320,13 @@ static void release_memory_resource(struct resource *res)
>  static int check_pfn_span(unsigned long pfn, unsigned long nr_pages)
>  {
>  	/*
> -	 * Disallow all operations smaller than a sub-section and only
> -	 * allow operations smaller than a section for
> -	 * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
> -	 * enforces a larger memory_block_size_bytes() granularity for
> -	 * memory that will be marked online, so this check should only
> -	 * fire for direct arch_{add,remove}_memory() users outside of
> -	 * add_memory_resource().
> +	 * Disallow all operations smaller than a sub-section.
> +	 * Note that check_hotplug_memory_range() enforces a larger
> +	 * memory_block_size_bytes() granularity for memory that will be marked
> +	 * online, so this check should only fire for direct
> +	 * arch_{add,remove}_memory() users outside of add_memory_resource().
>  	 */
> -	unsigned long min_align;
> -
> -	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
> -		min_align = PAGES_PER_SUBSECTION;
> -	else
> -		min_align = PAGES_PER_SECTION;
> -	if (!IS_ALIGNED(pfn | nr_pages, min_align))
> +	if (!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION))
>  		return -EINVAL;
>  	return 0;
>  }
> -- 
> 2.43.0
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 05/14] mm/sparse: remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for CONFIG_MEMORY_HOTPLUG
  2026-03-17 16:56 ` [PATCH 05/14] mm/sparse: remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for CONFIG_MEMORY_HOTPLUG David Hildenbrand (Arm)
  2026-03-17 17:54   ` Lorenzo Stoakes (Oracle)
@ 2026-03-18  7:58   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Mike Rapoport @ 2026-03-18  7:58 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On Tue, Mar 17, 2026 at 05:56:43PM +0100, David Hildenbrand (Arm) wrote:
> CONFIG_MEMORY_HOTPLUG now depends on CONFIG_SPARSEMEM_SPARSEMEM. So

                                      ^ typo: CONFIG_SPARSEMEM_VMEMMAP

> let's remove the !CONFIG_SPARSEMEM_VMEMMAP leftovers.
> 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/sparse.c | 61 -----------------------------------------------------
>  1 file changed, 61 deletions(-)

Nice :)

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 06/14] mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP
  2026-03-17 16:56 ` [PATCH 06/14] mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
  2026-03-17 17:49   ` Lorenzo Stoakes (Oracle)
@ 2026-03-18  8:15   ` Mike Rapoport
  2026-03-20 18:37     ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 53+ messages in thread
From: Mike Rapoport @ 2026-03-18  8:15 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On Tue, Mar 17, 2026 at 05:56:44PM +0100, David Hildenbrand (Arm) wrote:
> It is not immediately obvious that CONFIG_HAVE_BOOTMEM_INFO_NODE is

Would be nice to make it more obvious, e.g. something like the patch below.

> only selected from CONFIG_MEMORY_HOTREMOVE, which itself depends on
> CONFIG_MEMORY_HOTPLUG that ... depends on CONFIG_SPARSEMEM_VMEMMAP.
> 
> Let's remove the !CONFIG_SPARSEMEM_VMEMMAP leftovers.
> 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/bootmem_info.c | 37 -------------------------------------
>  1 file changed, 37 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 102ddbd4298e..261abc3e1957 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1841,4 +1841,18 @@ config ARCH_WANTS_PRE_LINK_VMLINUX
 config ARCH_HAS_CPU_ATTACK_VECTORS
 	bool
 
+#
+# Only be set on architectures that have completely implemented memory hotplug
+# feature. If you are not sure, don't touch it.
+#
+config HAVE_BOOTMEM_INFO_NODE
+	def_bool n
+
+config ARCH_ENABLE_MEMORY_HOTPLUG
+	bool
+
+config ARCH_ENABLE_MEMORY_HOTREMOVE
+	bool
+
+
 endmenu
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ad7a2fe63a2a..2d6f348d11f7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -127,6 +127,7 @@ config PPC
 	select ARCH_DMA_DEFAULT_COHERENT	if !NOT_COHERENT_CACHE
 	select ARCH_ENABLE_MEMORY_HOTPLUG
 	select ARCH_ENABLE_MEMORY_HOTREMOVE
+	select HAVE_BOOTMEM_INFO_NODE if MEMORY_HOTREMOVE
 	select ARCH_HAS_COPY_MC			if PPC64
 	select ARCH_HAS_CURRENT_STACK_POINTER
 	select ARCH_HAS_DEBUG_VIRTUAL
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e2df1b147184..ef2d2044f1a9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -70,6 +70,7 @@ config X86
 	select ARCH_ENABLE_HUGEPAGE_MIGRATION if X86_64 && HUGETLB_PAGE && MIGRATION
 	select ARCH_ENABLE_MEMORY_HOTPLUG if X86_64
 	select ARCH_ENABLE_MEMORY_HOTREMOVE if MEMORY_HOTPLUG
+	select HAVE_BOOTMEM_INFO_NODE if MEMORY_HOTREMOVE
 	select ARCH_ENABLE_SPLIT_PMD_PTLOCK if (PGTABLE_LEVELS > 2) && (X86_64 || X86_PAE)
 	select ARCH_ENABLE_THP_MIGRATION if X86_64 && TRANSPARENT_HUGEPAGE
 	select ARCH_HAS_ACPI_TABLE_UPGRADE	if ACPI
diff --git a/mm/Kconfig b/mm/Kconfig
index ebd8ea353687..a371df4f8da4 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -455,19 +455,6 @@ config EXCLUSIVE_SYSTEM_RAM
 	def_bool y
 	depends on !DEVMEM || STRICT_DEVMEM
 
-#
-# Only be set on architectures that have completely implemented memory hotplug
-# feature. If you are not sure, don't touch it.
-#
-config HAVE_BOOTMEM_INFO_NODE
-	def_bool n
-
-config ARCH_ENABLE_MEMORY_HOTPLUG
-	bool
-
-config ARCH_ENABLE_MEMORY_HOTREMOVE
-	bool
-
 # eventually, we can have this option just 'select SPARSEMEM'
 menuconfig MEMORY_HOTPLUG
 	bool "Memory hotplug"
@@ -539,7 +526,6 @@ endchoice
 
 config MEMORY_HOTREMOVE
 	bool "Allow for memory hot remove"
-	select HAVE_BOOTMEM_INFO_NODE if (X86_64 || PPC64)
 	depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
 	depends on MIGRATION
 
 
-- 
Sincerely yours,
Mike.


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 07/14] mm/bootmem_info: avoid using sparse_decode_mem_map()
  2026-03-17 16:56 ` [PATCH 07/14] mm/bootmem_info: avoid using sparse_decode_mem_map() David Hildenbrand (Arm)
  2026-03-17 18:02   ` Lorenzo Stoakes (Oracle)
@ 2026-03-18  8:20   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Mike Rapoport @ 2026-03-18  8:20 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On Tue, Mar 17, 2026 at 05:56:45PM +0100, David Hildenbrand (Arm) wrote:
> With SPARSEMEM_VMEMMAP, we can just do a pfn_to_page(). It is not super
> clear whether the start_pfn is properly aligned ... so let's just make
> sure it is.
> 
> We might soon try to remove the bootmem info completely; for now,
> just keep it working as is.
> 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/bootmem_info.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/bootmem_info.c b/mm/bootmem_info.c
> index e61e08e24924..3d7675a3ae04 100644
> --- a/mm/bootmem_info.c
> +++ b/mm/bootmem_info.c
> @@ -44,17 +44,16 @@ static void __init register_page_bootmem_info_section(unsigned long start_pfn)
>  {
>  	unsigned long mapsize, section_nr, i;
>  	struct mem_section *ms;
> -	struct page *page, *memmap;
>  	struct mem_section_usage *usage;
> +	struct page *page;
>  
> +	start_pfn = SECTION_ALIGN_DOWN(start_pfn);
>  	section_nr = pfn_to_section_nr(start_pfn);
>  	ms = __nr_to_section(section_nr);
>  
> -	memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> -
>  	if (!preinited_vmemmap_section(ms))
> -		register_page_bootmem_memmap(section_nr, memmap,
> -				PAGES_PER_SECTION);
> +		register_page_bootmem_memmap(section_nr, pfn_to_page(start_pfn),
> +					     PAGES_PER_SECTION);
>  
>  	usage = ms->usage;
>  	page = virt_to_page(usage);
> -- 
> 2.43.0
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 08/14] mm/sparse: remove sparse_decode_mem_map()
  2026-03-17 16:56 ` [PATCH 08/14] mm/sparse: remove sparse_decode_mem_map() David Hildenbrand (Arm)
  2026-03-17 19:25   ` Lorenzo Stoakes (Oracle)
@ 2026-03-18  8:20   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Mike Rapoport @ 2026-03-18  8:20 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On Tue, Mar 17, 2026 at 05:56:46PM +0100, David Hildenbrand (Arm) wrote:
> section_deactivate() applies to CONFIG_SPARSEMEM_VMEMMAP only. So we can
> just use pfn_to_page() (after making sure we have the start PFN of the
> section), and remove sparse_decode_mem_map().
> 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  include/linux/memory_hotplug.h |  2 --
>  mm/sparse.c                    | 16 +---------------
>  2 files changed, 1 insertion(+), 17 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index e77ef3d7ff73..815e908c4135 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -308,8 +308,6 @@ extern int sparse_add_section(int nid, unsigned long pfn,
>  		struct dev_pagemap *pgmap);
>  extern void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
>  				  struct vmem_altmap *altmap);
> -extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
> -					  unsigned long pnum);
>  extern struct zone *zone_for_pfn_range(enum mmop online_type,
>  		int nid, struct memory_group *group, unsigned long start_pfn,
>  		unsigned long nr_pages);
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 636a4a0f1199..2a1f662245bc 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -274,18 +274,6 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
>  	return coded_mem_map;
>  }
>  
> -#ifdef CONFIG_MEMORY_HOTPLUG
> -/*
> - * Decode mem_map from the coded memmap
> - */
> -struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum)
> -{
> -	/* mask off the extra low bits of information */
> -	coded_mem_map &= SECTION_MAP_MASK;
> -	return ((struct page *)coded_mem_map) + section_nr_to_pfn(pnum);
> -}
> -#endif /* CONFIG_MEMORY_HOTPLUG */
> -
>  static void __meminit sparse_init_one_section(struct mem_section *ms,
>  		unsigned long pnum, struct page *mem_map,
>  		struct mem_section_usage *usage, unsigned long flags)
> @@ -758,8 +746,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  
>  	empty = is_subsection_map_empty(ms);
>  	if (empty) {
> -		unsigned long section_nr = pfn_to_section_nr(pfn);
> -
>  		/*
>  		 * Mark the section invalid so that valid_section()
>  		 * return false. This prevents code from dereferencing
> @@ -778,7 +764,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  			kfree_rcu(ms->usage, rcu);
>  			WRITE_ONCE(ms->usage, NULL);
>  		}
> -		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> +		memmap = pfn_to_page(SECTION_ALIGN_DOWN(pfn));
>  	}
>  
>  	/*
> -- 
> 2.43.0
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling
  2026-03-17 16:56 ` [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling David Hildenbrand (Arm)
  2026-03-17 19:48   ` Lorenzo Stoakes (Oracle)
@ 2026-03-18  8:34   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Mike Rapoport @ 2026-03-18  8:34 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On Tue, Mar 17, 2026 at 05:56:47PM +0100, David Hildenbrand (Arm) wrote:
> In 2008, we added through commit 48c906823f39 ("memory hotplug: allocate
> usemap on the section with pgdat") quite some complexity to try
> allocating memory for the "usemap" (storing pageblock information
> per memory section) for a memory section close to the memory of the
> "pgdat" of the node.
> 
> The goal was to make memory hotunplug of boot memory more likely to
> succeed. That commit also added some checks for circular dependencies
> between two memory sections, whereby two memory sections would contain
> each others usemap, turning bot memory sections un-removable.

                            ^ typo: boot
> 
> However, in 2010, commit a4322e1bad91 ("sparsemem: Put usemap for one node
> together") started allocating the usemap for multiple memory
> sections on the same node in one chunk, effectively grouping all usemap
> allocations of the same node in a single memblock allocation.
> 
> We don't really give guarantees about memory hotunplug of boot memory, and
> with the change in 2010, it is pretty much impossible in practice to get
> any circular dependencies.
> 
> commit 48c906823f39 ("memory hotplug: allocate usemap on the section with
> pgdat") also added the comment:
> 
> 	"Similarly, a pgdat can prevent a section being removed. If
> 	 section A contains a pgdat and section B
> 	 contains the usemap, both sections become inter-dependent."
> 
> Given that we don't free the pgdat anymore, that comment (and handling)
> does not apply.
> 
> So let's simply remove this complexity.
> 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/sparse.c | 100 +---------------------------------------------------
>  1 file changed, 1 insertion(+), 99 deletions(-)

-- 
Sincerely yours,
Mike.



* Re: [PATCH 10/14] mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c
  2026-03-17 16:56 ` [PATCH 10/14] mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c David Hildenbrand (Arm)
  2026-03-17 19:51   ` Lorenzo Stoakes (Oracle)
@ 2026-03-18  8:46   ` Mike Rapoport
  2026-03-20 19:01     ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 53+ messages in thread
From: Mike Rapoport @ 2026-03-18  8:46 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On Tue, Mar 17, 2026 at 05:56:48PM +0100, David Hildenbrand (Arm) wrote:
> We want to move subsection_map_init() to mm/sparse-vmemmap.c.
> 
> To prepare for getting rid of subsection_map_init() in mm/sparse.c
> completely, use a static inline function for !CONFIG_SPARSEMEM_VMEMMAP.
> 
> While at it, move the declaration to internal.h and rename it to
> "sparse_init_subsection_map()".

Do we really need to rename it?

Maybe add a "global renaming" patch on top, like

s/clear_subsection_map/subsection_map_clear/
s/fill_subsection_map/subsection_map_fill/

etc
 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  include/linux/mmzone.h |  3 ---
>  mm/internal.h          | 12 ++++++++++++
>  mm/mm_init.c           |  2 +-
>  mm/sparse.c            |  6 +-----
>  4 files changed, 14 insertions(+), 9 deletions(-)

-- 
Sincerely yours,
Mike.



* Re: [PATCH 11/14] mm/sparse: drop set_section_nid() from sparse_add_section()
  2026-03-17 16:56 ` [PATCH 11/14] mm/sparse: drop set_section_nid() from sparse_add_section() David Hildenbrand (Arm)
  2026-03-17 19:55   ` Lorenzo Stoakes (Oracle)
@ 2026-03-18  8:50   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Mike Rapoport @ 2026-03-18  8:50 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On Tue, Mar 17, 2026 at 05:56:49PM +0100, David Hildenbrand (Arm) wrote:
> CONFIG_MEMORY_HOTPLUG is CONFIG_SPARSEMEM_VMEMMAP-only. And
> CONFIG_SPARSEMEM_VMEMMAP implies that NODE_NOT_IN_PAGE_FLAGS cannot be set:

Maybe 

... implies that node is always in page flags and NODE_NOT_IN_PAGE_FLAGS
cannot be set

> see include/linux/page-flags-layout.h
> 
> 	...
> 	#elif defined(CONFIG_SPARSEMEM_VMEMMAP)
> 	#error "Vmemmap: No space for nodes field in page flags"
> 	...
> 
> So let's remove the set_section_nid() call to prepare for moving
> CONFIG_MEMORY_HOTPLUG to mm/sparse-vmemmap.c
> 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/sparse.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 7b0bfea73a9b..b5a2de43ac40 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -769,7 +769,6 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>  	page_init_poison(memmap, sizeof(struct page) * nr_pages);
>  
>  	ms = __nr_to_section(section_nr);
> -	set_section_nid(section_nr, nid);
>  	__section_mark_present(ms, section_nr);
>  
>  	/* Align memmap to section boundary in the subsection case */
> -- 
> 2.43.0
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH 12/14] mm/sparse: move sparse_init_one_section() to internal.h
  2026-03-17 16:56 ` [PATCH 12/14] mm/sparse: move sparse_init_one_section() to internal.h David Hildenbrand (Arm)
  2026-03-17 20:00   ` Lorenzo Stoakes (Oracle)
@ 2026-03-18  8:54   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Mike Rapoport @ 2026-03-18  8:54 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On Tue, Mar 17, 2026 at 05:56:50PM +0100, David Hildenbrand (Arm) wrote:
> While at it, convert the BUG_ON to a WARN_ON, avoid long lines, and merge
> sparse_encode_mem_map() into sparse_init_one_section().

... merge sparse_encode_mem_map() into its only caller
sparse_init_one_section().

> 
> Clarify the comment a bit, pointing at page_to_pfn().
> 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  include/linux/mmzone.h |  2 +-
>  mm/internal.h          | 22 ++++++++++++++++++++++
>  mm/sparse.c            | 24 ------------------------
>  3 files changed, 23 insertions(+), 25 deletions(-)

-- 
Sincerely yours,
Mike.



* Re: [PATCH 13/14] mm/sparse: move __section_mark_present() to internal.h
  2026-03-17 16:56 ` [PATCH 13/14] mm/sparse: move __section_mark_present() " David Hildenbrand (Arm)
  2026-03-17 20:01   ` Lorenzo Stoakes (Oracle)
@ 2026-03-18  8:56   ` Mike Rapoport
  2026-03-20 19:06     ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 53+ messages in thread
From: Mike Rapoport @ 2026-03-18  8:56 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On Tue, Mar 17, 2026 at 05:56:51PM +0100, David Hildenbrand (Arm) wrote:
> Let's prepare for moving memory hotplug handling from sparse.c to
> sparse-vmemmap.c by moving __section_mark_present() to internal.h.

Not strictly related to this patchset, we might want to start splitting
internal.h to sub-headers.
 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/internal.h | 9 +++++++++
>  mm/sparse.c   | 8 --------
>  2 files changed, 9 insertions(+), 8 deletions(-)

-- 
Sincerely yours,
Mike.



* Re: [PATCH 14/14] mm/sparse: move memory hotplug bits to sparse-vmemmap.c
  2026-03-17 16:56 ` [PATCH 14/14] mm/sparse: move memory hotplug bits to sparse-vmemmap.c David Hildenbrand (Arm)
  2026-03-17 20:09   ` Lorenzo Stoakes (Oracle)
@ 2026-03-18  8:57   ` Mike Rapoport
  1 sibling, 0 replies; 53+ messages in thread
From: Mike Rapoport @ 2026-03-18  8:57 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On Tue, Mar 17, 2026 at 05:56:52PM +0100, David Hildenbrand (Arm) wrote:
> Let's move all memory hoptplug related code to sparse-vmemmap.c.
> 
> We only have to expose sparse_index_init(). While at it, drop the
> definition of sparse_index_init() for !CONFIG_SPARSEMEM, which is unused,
> and place the declaration in internal.h.
> 
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  include/linux/mmzone.h |   1 -
>  mm/internal.h          |   4 +
>  mm/sparse-vmemmap.c    | 308 ++++++++++++++++++++++++++++++++++++++++
>  mm/sparse.c            | 314 +----------------------------------------
>  4 files changed, 314 insertions(+), 313 deletions(-)

-- 
Sincerely yours,
Mike.



* Re: [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups
  2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
                   ` (13 preceding siblings ...)
  2026-03-17 16:56 ` [PATCH 14/14] mm/sparse: move memory hotplug bits to sparse-vmemmap.c David Hildenbrand (Arm)
@ 2026-03-18 19:51 ` Andrew Morton
  2026-03-18 19:54   ` David Hildenbrand (Arm)
  14 siblings, 1 reply; 53+ messages in thread
From: Andrew Morton @ 2026-03-18 19:51 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, linux-mm, linux-cxl, Oscar Salvador, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On Tue, 17 Mar 2026 17:56:38 +0100 "David Hildenbrand (Arm)" <david@kernel.org> wrote:

> Some cleanups around memory hot(un)plug and SPARSEMEM. In essence,
> we can limit CONFIG_MEMORY_HOTPLUG to CONFIG_SPARSEMEM_VMEMMAP,
> remove some dead code, and move all the hotplug bits over to
> mm/sparse-vmemmap.c.
> 
> Some further/related cleanups around other unnecessary code
> (memory hole handling and complicated usemap allocation).
> 
> I have some further sparse.c cleanups lying around, and I'm planning
> on getting rid of bootmem_info.c entirely.

Added to mm-new, thanks.

Sashiko said a few things, as I expect you've seen:
https://sashiko.dev/#/patchset/20260317165652.99114-1-david%40kernel.org



* Re: [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups
  2026-03-18 19:51 ` [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups Andrew Morton
@ 2026-03-18 19:54   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-18 19:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, linux-cxl, Oscar Salvador, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On 3/18/26 20:51, Andrew Morton wrote:
> On Tue, 17 Mar 2026 17:56:38 +0100 "David Hildenbrand (Arm)" <david@kernel.org> wrote:
> 
>> Some cleanups around memory hot(un)plug and SPARSEMEM. In essence,
>> we can limit CONFIG_MEMORY_HOTPLUG to CONFIG_SPARSEMEM_VMEMMAP,
>> remove some dead code, and move all the hotplug bits over to
>> mm/sparse-vmemmap.c.
>>
>> Some further/related cleanups around other unnecessary code
>> (memory hole handling and complicated usemap allocation).
>>
>> I have some further sparse.c cleanups lying around, and I'm planning
>> on getting rid of bootmem_info.c entirely.
> 
> Added to mm-new, thanks.
> 
> Sashiko said a few things, as I expect you've seen:
> https://sashiko.dev/#/patchset/20260317165652.99114-1-david%40kernel.org

Yes. I'll respin later this week (only small nits and taking care of the
existing weirdness in scan_movable_pages() in a less ugly way).

-- 
Cheers,

David



* Re: [PATCH 06/14] mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP
  2026-03-18  8:15   ` Mike Rapoport
@ 2026-03-20 18:37     ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 18:37 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On 3/18/26 09:15, Mike Rapoport wrote:
> On Tue, Mar 17, 2026 at 05:56:44PM +0100, David Hildenbrand (Arm) wrote:
>> It is not immediately obvious that CONFIG_HAVE_BOOTMEM_INFO_NODE is
> 
> Would be nice to make it more obvious, e.g. something like the patch below.

Given that I'll remove CONFIG_HAVE_BOOTMEM_INFO_NODE completely soon,
I'll not perform that for now.

If, however, I don't get it done within the next month I'll send such a
patch to make it clearer. :)

-- 
Cheers,

David



* Re: [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling
  2026-03-17 19:48   ` Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:49     ` David Hildenbrand (Arm)
  2026-03-20 18:58       ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 18:49 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On 3/17/26 20:48, Lorenzo Stoakes (Oracle) wrote:
> On Tue, Mar 17, 2026 at 05:56:47PM +0100, David Hildenbrand (Arm) wrote:
>> In 2008, we added through commit 48c906823f39 ("memory hotplug: allocate
>> usemap on the section with pgdat") quite some complexity to try
>> allocating memory for the "usemap" (storing pageblock information
>> per memory section) for a memory section close to the memory of the
>> "pgdat" of the node.
>>
>> The goal was to make memory hotunplug of boot memory more likely to
>> succeed. That commit also added some checks for circular dependencies
>> between two memory sections, whereby two memory sections would contain
>> each others usemap, turning bot memory sections un-removable.
> 
> Typo: bot -> both. Presumably you are not talking about memory a bot of some
> kind allocated :P
> 
>>
>> However, in 2010, commit a4322e1bad91 ("sparsemem: Put usemap for one node
>> together") started allocating the usemap for multiple memory
>> sections on the same node in one chunk, effectively grouping all usemap
>> allocations of the same node in a single memblock allocation.
>>
>> We don't really give guarantees about memory hotunplug of boot memory, and
>> with the change in 2010, it is pretty much impossible in practice to get
>> any circular dependencies.
> 
> Pretty much impossible? :) We can probably go so far as to so impossible no?

Yes.

> 
>>
>> commit 48c906823f39 ("memory hotplug: allocate usemap on the section with
>> pgdat") also added the comment:
>>
>> 	"Similarly, a pgdat can prevent a section being removed. If
>> 	 section A contains a pgdat and section B
>> 	 contains the usemap, both sections become inter-dependent."
>>
>> Given that we don't free the pgdat anymore, that comment (and handling)
>> does not apply.
> 
> Isn't pgdat synonymous with a node and that's the data structure that describes
> a node right? Confusingly typedef'd from pglist_data to pg_data_t but then
> referred to as pgdat because all that makes so much sense :)

Yeah, in general we refer to the NODE_DATA as pgdat (grep for it and
you'll be surprised).

> 
> But I'm confused, does a section containing a pgdat mean a section having the
> pgdat data structure literally allocated in it?

Yes. "struct pgdat" placed in some memory section.

> 
> A usemap is... something that tracks pageblock metadata I think right?

Yes. Essentially a large array of bytes, whereby each byte describes a
pageblock's data (migratetype etc.)

> 
> Anyway I'm also confused by 'given we don't free the pgdat any more', but the
> comment says a 'pgdat can prevent a section being removed' rather than anything
> about it being removed?

Well, if a pgdat resides in some memory section, the fact that it is
unmovable turns the whole memory section unremovable -> hotunplug fails.

Assuming you could free the pgdat when the node goes offline, you
would turn that memory section removable.

And I think that commit somehow assumed that the last memory section
could be removed if all it contains is the corresponding pgdat (which
was never the case).

> 
> I guess it means the OTHER section could be prevented from being removed even
> after it's gone.. somehow?
> 
> Anyway! I think maybe this could be clearer, somehow :)

I'm afraid the whole purpose of the original patch was sketchy, which is
also why I fail to even explain the original motivation clearly.

Now it's fortunately no longer required. :)

> 
>>
>> So let's simply remove this complexity.
>>
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> 
> I think what you've done in the patch is right though, we're not doing any of
> these dances after a4322e1bad91 and pgdats sitting around mean we don't really
> care about where the usemap goes anyway I don't think so...
> 
> I usemap and I find myself in a place where I give you a:
> 
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> 

Thanks ;)

[...]

>> -
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>  unsigned long __init section_map_size(void)
>>  {
>> @@ -486,7 +390,6 @@ void __init sparse_init_early_section(int nid, struct page *map,
>>  				      unsigned long pnum, unsigned long flags)
>>  {
>>  	BUG_ON(!sparse_usagebuf || sparse_usagebuf >= sparse_usagebuf_end);
>> -	check_usemap_section_nr(nid, sparse_usagebuf);
>>  	sparse_init_one_section(__nr_to_section(pnum), pnum, map,
>>  			sparse_usagebuf, SECTION_IS_EARLY | flags);
>>  	sparse_usagebuf = (void *)sparse_usagebuf + mem_section_usage_size();
>> @@ -497,8 +400,7 @@ static int __init sparse_usage_init(int nid, unsigned long map_count)
>>  	unsigned long size;
>>
>>  	size = mem_section_usage_size() * map_count;
>> -	sparse_usagebuf = sparse_early_usemaps_alloc_pgdat_section(
>> -				NODE_DATA(nid), size);
>> +	sparse_usagebuf = memblock_alloc_node(size, SMP_CACHE_BYTES, nid);
> 
> I guess nid here is the same node as the pgdat?

Yes! before we used NODE_DATA(nid)->node_id, which is really just ... nid :)

-- 
Cheers,

David



* Re: [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling
  2026-03-20 18:49     ` David Hildenbrand (Arm)
@ 2026-03-20 18:58       ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 18:58 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

>>
>> Anyway I'm also confused by 'given we don't free the pgdat any more', but the
>> comment says a 'pgdat can prevent a section being removed' rather than anything
>> about it being removed?
> 
> Well, if a pgdat resides in some memory section, given that it is
> unmovable turns the whole memory section unremovable -> hotunplug fails.
> 
> Assuming you could free the pgdat when the node goes offlining, you
> would turn that memory section removable.
> 
> And I think that commit somehow assumed that the last memory section
> could be removed if all it contains is the corresponding pgdat (which
> was never the case).

I decided to just drop that whole comment block completely.

-- 
Cheers,

David



* Re: [PATCH 10/14] mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c
  2026-03-17 19:51   ` Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:59     ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 18:59 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On 3/17/26 20:51, Lorenzo Stoakes (Oracle) wrote:
> On Tue, Mar 17, 2026 at 05:56:48PM +0100, David Hildenbrand (Arm) wrote:
>> We want to move subsection_map_init() to mm/sparse-vmemmap.c.
>>
>> To prepare for getting rid of subsection_map_init() in mm/sparse.c
>> completely, use a static inline function for !CONFIG_SPARSEMEM_VMEMMAP.
>>
>> While at it, move the declaration to internal.h and rename it to
>> "sparse_init_subsection_map()".
> 
> Why not init_sparse_subsection_map()??
> Or sparse_init_map_subsection()????
> Or <all other permutations>
> 
> Joking that's fine ;)

To match sparse_init() of course :)

-- 
Cheers,

David



* Re: [PATCH 10/14] mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c
  2026-03-18  8:46   ` Mike Rapoport
@ 2026-03-20 19:01     ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 19:01 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On 3/18/26 09:46, Mike Rapoport wrote:
> On Tue, Mar 17, 2026 at 05:56:48PM +0100, David Hildenbrand (Arm) wrote:
>> We want to move subsection_map_init() to mm/sparse-vmemmap.c.
>>
>> To prepare for getting rid of subsection_map_init() in mm/sparse.c
>> completely, use a static inline function for !CONFIG_SPARSEMEM_VMEMMAP.
>>
>> While at it, move the declaration to internal.h and rename it to
>> "sparse_init_subsection_map()".
> 
> Do we really need to rename it?

Given that I am placing it next to sparse_init(), yes I want to rename it.

> 
> Maybe add a "global renaming" patch on top, like
> 
> s/clear_subsection_map/subsection_map_clear/
> s/fill_subsection_map/subsection_map_fill/

Well, these are all internal helpers in mm/sparse.c, not exposed to
other MM bits. :)

> 
> etc
>  
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> 
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Thanks!

-- 
Cheers,

David



* Re: [PATCH 13/14] mm/sparse: move __section_mark_present() to internal.h
  2026-03-18  8:56   ` Mike Rapoport
@ 2026-03-20 19:06     ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 19:06 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko

On 3/18/26 09:56, Mike Rapoport wrote:
> On Tue, Mar 17, 2026 at 05:56:51PM +0100, David Hildenbrand (Arm) wrote:
>> Let's prepare for moving memory hotplug handling from sparse.c to
>> sparse-vmemmap.c by moving __section_mark_present() to internal.h.
> 
> Not strictly related to this patchset, we might want to start splitting
> internal.h to sub-headers.

Yes, that makes sense. And we should also split the non-internal headers
in a smarter way (and do some serious cleanups, it's a mess).

-- 
Cheers,

David



* Re: [PATCH 14/14] mm/sparse: move memory hotplug bits to sparse-vmemmap.c
  2026-03-17 20:09   ` Lorenzo Stoakes (Oracle)
@ 2026-03-20 19:07     ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-20 19:07 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: linux-kernel, linux-mm, linux-cxl, Andrew Morton, Oscar Salvador,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko

On 3/17/26 21:09, Lorenzo Stoakes (Oracle) wrote:
> On Tue, Mar 17, 2026 at 05:56:52PM +0100, David Hildenbrand (Arm) wrote:
>> Let's move all memory hotplug related code to sparse-vmemmap.c.
>>
>> We only have to expose sparse_index_init(). While at it, drop the
>> definition of sparse_index_init() for !CONFIG_SPARSEMEM, which is unused,
>> and place the declaration in internal.h.
>>
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> 
> Looking through this it looks like it is just a code move modulo the other bits
> you mention, overall very nice cleanup, so let me hotplug my:

:)

> 
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> 
> Into this review!
> 
> OK I've actually cringed terribly at the puns here and maybe I'm cured for life
> from doing that again ;)

Never say never ;)

Thanks!

-- 
Cheers,

David



end of thread, other threads:[~2026-03-20 19:07 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-17 16:56 [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups David Hildenbrand (Arm)
2026-03-17 16:56 ` [PATCH 01/14] mm/memory_hotplug: remove for_each_valid_pfn() usage David Hildenbrand (Arm)
2026-03-17 17:19   ` Lorenzo Stoakes (Oracle)
2026-03-17 20:30   ` David Hildenbrand (Arm)
2026-03-18  7:51   ` Mike Rapoport
2026-03-17 16:56 ` [PATCH 02/14] mm/sparse: remove WARN_ONs from (online|offline)_mem_sections() David Hildenbrand (Arm)
2026-03-17 17:21   ` Lorenzo Stoakes (Oracle)
2026-03-18  7:53   ` Mike Rapoport
2026-03-17 16:56 ` [PATCH 03/14] mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
2026-03-17 17:22   ` Lorenzo Stoakes (Oracle)
2026-03-18  7:55   ` Mike Rapoport
2026-03-17 16:56 ` [PATCH 04/14] mm/memory_hotplug: simplify check_pfn_span() David Hildenbrand (Arm)
2026-03-17 17:24   ` Lorenzo Stoakes (Oracle)
2026-03-18  7:56   ` Mike Rapoport
2026-03-17 16:56 ` [PATCH 05/14] mm/sparse: remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for CONFIG_MEMORY_HOTPLUG David Hildenbrand (Arm)
2026-03-17 17:54   ` Lorenzo Stoakes (Oracle)
2026-03-18  7:58   ` Mike Rapoport
2026-03-17 16:56 ` [PATCH 06/14] mm/bootmem_info: remove handling for !CONFIG_SPARSEMEM_VMEMMAP David Hildenbrand (Arm)
2026-03-17 17:49   ` Lorenzo Stoakes (Oracle)
2026-03-18  8:15   ` Mike Rapoport
2026-03-20 18:37     ` David Hildenbrand (Arm)
2026-03-17 16:56 ` [PATCH 07/14] mm/bootmem_info: avoid using sparse_decode_mem_map() David Hildenbrand (Arm)
2026-03-17 18:02   ` Lorenzo Stoakes (Oracle)
2026-03-18  8:20   ` Mike Rapoport
2026-03-17 16:56 ` [PATCH 08/14] mm/sparse: remove sparse_decode_mem_map() David Hildenbrand (Arm)
2026-03-17 19:25   ` Lorenzo Stoakes (Oracle)
2026-03-18  8:20   ` Mike Rapoport
2026-03-17 16:56 ` [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling David Hildenbrand (Arm)
2026-03-17 19:48   ` Lorenzo Stoakes (Oracle)
2026-03-20 18:49     ` David Hildenbrand (Arm)
2026-03-20 18:58       ` David Hildenbrand (Arm)
2026-03-18  8:34   ` Mike Rapoport
2026-03-17 16:56 ` [PATCH 10/14] mm: prepare to move subsection_map_init() to mm/sparse-vmemmap.c David Hildenbrand (Arm)
2026-03-17 19:51   ` Lorenzo Stoakes (Oracle)
2026-03-20 18:59     ` David Hildenbrand (Arm)
2026-03-18  8:46   ` Mike Rapoport
2026-03-20 19:01     ` David Hildenbrand (Arm)
2026-03-17 16:56 ` [PATCH 11/14] mm/sparse: drop set_section_nid() from sparse_add_section() David Hildenbrand (Arm)
2026-03-17 19:55   ` Lorenzo Stoakes (Oracle)
2026-03-18  8:50   ` Mike Rapoport
2026-03-17 16:56 ` [PATCH 12/14] mm/sparse: move sparse_init_one_section() to internal.h David Hildenbrand (Arm)
2026-03-17 20:00   ` Lorenzo Stoakes (Oracle)
2026-03-18  8:54   ` Mike Rapoport
2026-03-17 16:56 ` [PATCH 13/14] mm/sparse: move __section_mark_present() " David Hildenbrand (Arm)
2026-03-17 20:01   ` Lorenzo Stoakes (Oracle)
2026-03-18  8:56   ` Mike Rapoport
2026-03-20 19:06     ` David Hildenbrand (Arm)
2026-03-17 16:56 ` [PATCH 14/14] mm/sparse: move memory hotplug bits to sparse-vmemmap.c David Hildenbrand (Arm)
2026-03-17 20:09   ` Lorenzo Stoakes (Oracle)
2026-03-20 19:07     ` David Hildenbrand (Arm)
2026-03-18  8:57   ` Mike Rapoport
2026-03-18 19:51 ` [PATCH 00/14] mm: memory hot(un)plug and SPARSEMEM cleanups Andrew Morton
2026-03-18 19:54   ` David Hildenbrand (Arm)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox