public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
* [PATCH v6 0/7] mm: fix vmemmap optimization accounting and initialization
@ 2026-04-24  2:55 Muchun Song
  2026-04-24  2:55 ` [PATCH v6 1/7] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Muchun Song @ 2026-04-24  2:55 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song

The series fixes several bugs in vmemmap optimization, mainly around
incorrect page accounting and memmap initialization in DAX and memory
hotplug paths. It also fixes pageblock migratetype initialization and
struct page initialization for ZONE_DEVICE compound pages.

The first four patches fix vmemmap accounting issues. The first patch
fixes an accounting underflow in the section activation failure path.
The second patch fixes incorrect altmap passing in the error path.
The third patch passes pgmap through memory deactivation paths so the
teardown side can determine whether vmemmap optimization was in effect.
The fourth patch uses that information to account the optimized DAX
vmemmap size correctly.

The last three patches handle follow-up initialization and cleanup
issues. Patches 5 and 6 fix initialization issues in mm/mm_init. One
makes sure all pageblocks in ZONE_DEVICE compound pages get their
migratetype initialized. The other fixes a case where DAX memory
hotplug reuses an unoptimized early-section memmap while
compound_nr_pages() still assumes vmemmap optimization, leaving tail
struct pages uninitialized. The last patch factors out the altmap free
and verification logic into a helper.

Changelog:
v5 -> v6:
- Add Cc: stable@vger.kernel.org to bugfix patches.
- mm/sparse-vmemmap: Relax the alignment warning in
  section_nr_vmemmap_pages() for sub-section callers.
- mm/memory_hotplug: Factor out altmap free/check handling into a final
  standalone patch, as suggested by David Hildenbrand.
- Collect Acked-by tags from David.

Muchun Song (7):
  mm/sparse-vmemmap: Fix vmemmap accounting underflow
  mm/memory_hotplug: Fix incorrect altmap passing in error path
  mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths
  mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
  mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages
  mm/mm_init: Fix uninitialized struct pages for ZONE_DEVICE
  mm/memory_hotplug: Factor out altmap freeing checks

 arch/arm64/mm/mmu.c            |  5 ++--
 arch/loongarch/mm/init.c       |  5 ++--
 arch/powerpc/mm/mem.c          |  5 ++--
 arch/riscv/mm/init.c           |  5 ++--
 arch/s390/mm/init.c            |  5 ++--
 arch/x86/mm/init_64.c          |  5 ++--
 include/linux/memory_hotplug.h |  8 +++--
 mm/memory_hotplug.c            | 29 ++++++++++--------
 mm/memremap.c                  |  4 +--
 mm/mm_init.c                   | 47 ++++++++++++++++++-----------
 mm/sparse-vmemmap.c            | 55 +++++++++++++++++++++++++---------
 11 files changed, 111 insertions(+), 62 deletions(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v6 1/7] mm/sparse-vmemmap: Fix vmemmap accounting underflow
  2026-04-24  2:55 [PATCH v6 0/7] mm: fix vmemmap optimization accounting and initialization Muchun Song
@ 2026-04-24  2:55 ` Muchun Song
  2026-04-24  2:55 ` [PATCH v6 2/7] mm/memory_hotplug: Fix incorrect altmap passing in error path Muchun Song
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Muchun Song @ 2026-04-24  2:55 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song, stable

In section_activate(), if populate_section_memmap() fails, the error
handling path calls section_deactivate() to roll back the state. This
causes a vmemmap accounting imbalance.

Since commit c3576889d87b ("mm: fix accounting of memmap pages"),
memmap pages are accounted for only after populate_section_memmap()
succeeds. However, the failure path unconditionally calls
section_deactivate(), which decreases the vmemmap count. Consequently,
a failure in populate_section_memmap() leads to an accounting underflow,
incorrectly reducing the system's tracked vmemmap usage.

Fix this more thoroughly by moving all accounting calls into the
lower-level functions that actually perform the vmemmap allocation and
freeing:

  - populate_section_memmap() accounts for newly allocated vmemmap pages
  - depopulate_section_memmap() unaccounts when vmemmap is freed

This ensures proper accounting in all code paths, including error
handling and early section cases.

Fixes: c3576889d87b ("mm: fix accounting of memmap pages")
Cc: stable@vger.kernel.org
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/sparse-vmemmap.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 6eadb9d116e4..a7b11248b989 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -656,7 +656,12 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap)
 {
-	return __populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
+	struct page *page = __populate_section_memmap(pfn, nr_pages, nid, altmap,
+						      pgmap);
+
+	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
+
+	return page;
 }
 
 static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
@@ -665,13 +670,17 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
 
+	memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
 	vmemmap_free(start, end, altmap);
 }
+
 static void free_map_bootmem(struct page *memmap)
 {
 	unsigned long start = (unsigned long)memmap;
 	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
 
+	memmap_boot_pages_add(-1L * (DIV_ROUND_UP(PAGES_PER_SECTION * sizeof(struct page),
+						  PAGE_SIZE)));
 	vmemmap_free(start, end, NULL);
 }
 
@@ -774,14 +783,10 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 	 * The memmap of early sections is always fully populated. See
 	 * section_activate() and pfn_valid() .
 	 */
-	if (!section_is_early) {
-		memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
+	if (!section_is_early)
 		depopulate_section_memmap(pfn, nr_pages, altmap);
-	} else if (memmap) {
-		memmap_boot_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page),
-							  PAGE_SIZE)));
+	else if (memmap)
 		free_map_bootmem(memmap);
-	}
 
 	if (empty)
 		ms->section_mem_map = (unsigned long)NULL;
@@ -826,7 +831,6 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
 		section_deactivate(pfn, nr_pages, altmap);
 		return ERR_PTR(-ENOMEM);
 	}
-	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
 
 	return memmap;
 }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v6 2/7] mm/memory_hotplug: Fix incorrect altmap passing in error path
  2026-04-24  2:55 [PATCH v6 0/7] mm: fix vmemmap optimization accounting and initialization Muchun Song
  2026-04-24  2:55 ` [PATCH v6 1/7] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
@ 2026-04-24  2:55 ` Muchun Song
  2026-04-24  2:55 ` [PATCH v6 3/7] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths Muchun Song
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Muchun Song @ 2026-04-24  2:55 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song, stable

In create_altmaps_and_memory_blocks(), when arch_add_memory() succeeds
with memmap_on_memory enabled, the vmemmap pages are allocated from
params.altmap. If create_memory_block_devices() subsequently fails, the
error path calls arch_remove_memory() with a NULL altmap instead of
params.altmap.

This is a bug that could lead to memory corruption. Since altmap is
NULL, vmemmap_free() falls back to freeing the vmemmap pages into the
system buddy allocator via free_pages() instead of the altmap.
arch_remove_memory() then immediately destroys the physical linear
mapping for this memory. This injects unowned pages into the buddy
allocator, causing machine checks or memory corruption if the system
later attempts to allocate and use those freed pages.

Fix this by passing params.altmap to arch_remove_memory() in the error
path.

Fixes: 6b8f0798b85a ("mm/memory_hotplug: split memmap_on_memory requests across memblocks")
Cc: stable@vger.kernel.org
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/memory_hotplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2a943ec57c85..0bad2aed2bde 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1468,7 +1468,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
 		ret = create_memory_block_devices(cur_start, memblock_size, nid,
 						  params.altmap, group);
 		if (ret) {
-			arch_remove_memory(cur_start, memblock_size, NULL);
+			arch_remove_memory(cur_start, memblock_size, params.altmap);
 			kfree(params.altmap);
 			goto out;
 		}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v6 3/7] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths
  2026-04-24  2:55 [PATCH v6 0/7] mm: fix vmemmap optimization accounting and initialization Muchun Song
  2026-04-24  2:55 ` [PATCH v6 1/7] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
  2026-04-24  2:55 ` [PATCH v6 2/7] mm/memory_hotplug: Fix incorrect altmap passing in error path Muchun Song
@ 2026-04-24  2:55 ` Muchun Song
  2026-04-24  2:55 ` [PATCH v6 4/7] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization Muchun Song
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Muchun Song @ 2026-04-24  2:55 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song

Currently, the memory hot-remove call chain -- arch_remove_memory(),
__remove_pages(), sparse_remove_section() and section_deactivate() --
does not carry the struct dev_pagemap pointer. This prevents the lower
levels from knowing whether the section was originally populated with
vmemmap optimizations (e.g., DAX with vmemmap optimization enabled).

Without this information, we cannot call vmemmap_can_optimize() to
determine if the vmemmap pages were optimized. As a result, the vmemmap
page accounting during teardown will mistakenly assume a non-optimized
allocation, leading to incorrect memmap statistics.

To lay the groundwork for fixing the vmemmap page accounting, we need
to pass the @pgmap pointer down to where sections are deactivated. Plumb the
@pgmap argument through the APIs of arch_remove_memory(), __remove_pages()
and sparse_remove_section(), mirroring the corresponding *_activate()
paths.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
---
 arch/arm64/mm/mmu.c            |  5 +++--
 arch/loongarch/mm/init.c       |  5 +++--
 arch/powerpc/mm/mem.c          |  5 +++--
 arch/riscv/mm/init.c           |  5 +++--
 arch/s390/mm/init.c            |  5 +++--
 arch/x86/mm/init_64.c          |  5 +++--
 include/linux/memory_hotplug.h |  8 +++++---
 mm/memory_hotplug.c            | 13 +++++++------
 mm/memremap.c                  |  4 ++--
 mm/sparse-vmemmap.c            | 12 ++++++------
 10 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index dd85e093ffdb..e5a42b7a0160 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -2024,12 +2024,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return ret;
 }
 
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 	__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
 }
 
diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
index 00f3822b6e47..c9c57f08fa2c 100644
--- a/arch/loongarch/mm/init.c
+++ b/arch/loongarch/mm/init.c
@@ -86,7 +86,8 @@ int arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params)
 	return ret;
 }
 
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -95,7 +96,7 @@ void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 	/* With altmap the first mapped page is offset from @start */
 	if (altmap)
 		page += vmem_altmap_offset(altmap);
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 }
 #endif
 
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 648d0c5602ec..4c1afab91996 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -158,12 +158,13 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
 	return rc;
 }
 
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			      struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 	arch_remove_linear_mapping(start, size);
 }
 #endif
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index decd7df40fa4..b0092fb842a3 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1717,9 +1717,10 @@ int __ref arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *param
 	return ret;
 }
 
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			      struct dev_pagemap *pgmap)
 {
-	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap);
+	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap, pgmap);
 	remove_linear_mapping(start, size);
 	flush_tlb_all();
 }
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 1f72efc2a579..11a689423440 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -276,12 +276,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return rc;
 }
 
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 	vmem_remove_mapping(start, size);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index df2261fa4f98..77b889b71cf3 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1288,12 +1288,13 @@ kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 	remove_pagetable(start, end, true, NULL);
 }
 
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			      struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 	kernel_physical_mapping_remove(start, start + size);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 815e908c4135..7c9d66729c60 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -135,9 +135,10 @@ static inline bool movable_node_is_enabled(void)
 	return movable_node_enabled;
 }
 
-extern void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap);
+extern void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			       struct dev_pagemap *pgmap);
 extern void __remove_pages(unsigned long start_pfn, unsigned long nr_pages,
-			   struct vmem_altmap *altmap);
+			   struct vmem_altmap *altmap, struct dev_pagemap *pgmap);
 
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
@@ -307,7 +308,8 @@ extern int sparse_add_section(int nid, unsigned long pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap);
 extern void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
-				  struct vmem_altmap *altmap);
+				  struct vmem_altmap *altmap,
+				  struct dev_pagemap *pgmap);
 extern struct zone *zone_for_pfn_range(enum mmop online_type,
 		int nid, struct memory_group *group, unsigned long start_pfn,
 		unsigned long nr_pages);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0bad2aed2bde..7bfdc3a99688 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -576,6 +576,7 @@ void remove_pfn_range_from_zone(struct zone *zone,
  * @pfn: starting pageframe (must be aligned to start of a section)
  * @nr_pages: number of pages to remove (must be multiple of section size)
  * @altmap: alternative device page map or %NULL if default memmap is used
+ * @pgmap: device page map or %NULL if not ZONE_DEVICE
  *
  * Generic helper function to remove section mappings and sysfs entries
  * for the section of the memory we are removing. Caller needs to make
@@ -583,7 +584,7 @@ void remove_pfn_range_from_zone(struct zone *zone,
  * calling offline_pages().
  */
 void __remove_pages(unsigned long pfn, unsigned long nr_pages,
-		    struct vmem_altmap *altmap)
+		    struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	const unsigned long end_pfn = pfn + nr_pages;
 	unsigned long cur_nr_pages;
@@ -598,7 +599,7 @@ void __remove_pages(unsigned long pfn, unsigned long nr_pages,
 		/* Select all remaining pages up to the next section boundary */
 		cur_nr_pages = min(end_pfn - pfn,
 				   SECTION_ALIGN_UP(pfn + 1) - pfn);
-		sparse_remove_section(pfn, cur_nr_pages, altmap);
+		sparse_remove_section(pfn, cur_nr_pages, altmap, pgmap);
 	}
 }
 
@@ -1425,7 +1426,7 @@ static void remove_memory_blocks_and_altmaps(u64 start, u64 size)
 
 		remove_memory_block_devices(cur_start, memblock_size);
 
-		arch_remove_memory(cur_start, memblock_size, altmap);
+		arch_remove_memory(cur_start, memblock_size, altmap, NULL);
 
 		/* Verify that all vmemmap pages have actually been freed. */
 		WARN(altmap->alloc, "Altmap not fully unmapped");
@@ -1468,7 +1469,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
 		ret = create_memory_block_devices(cur_start, memblock_size, nid,
 						  params.altmap, group);
 		if (ret) {
-			arch_remove_memory(cur_start, memblock_size, params.altmap);
+			arch_remove_memory(cur_start, memblock_size, params.altmap, NULL);
 			kfree(params.altmap);
 			goto out;
 		}
@@ -1554,7 +1555,7 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		/* create memory block devices after memory was added */
 		ret = create_memory_block_devices(start, size, nid, NULL, group);
 		if (ret) {
-			arch_remove_memory(start, size, params.altmap);
+			arch_remove_memory(start, size, params.altmap, NULL);
 			goto error;
 		}
 	}
@@ -2266,7 +2267,7 @@ static int try_remove_memory(u64 start, u64 size)
 		 * No altmaps present, do the removal directly
 		 */
 		remove_memory_block_devices(start, size);
-		arch_remove_memory(start, size, NULL);
+		arch_remove_memory(start, size, NULL, NULL);
 	} else {
 		/* all memblocks in the range have altmaps */
 		remove_memory_blocks_and_altmaps(start, size);
diff --git a/mm/memremap.c b/mm/memremap.c
index 053842d45cb1..81766d822400 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -97,10 +97,10 @@ static void pageunmap_range(struct dev_pagemap *pgmap, int range_id)
 				   PHYS_PFN(range_len(range)));
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
 		__remove_pages(PHYS_PFN(range->start),
-			       PHYS_PFN(range_len(range)), NULL);
+			       PHYS_PFN(range_len(range)), NULL, pgmap);
 	} else {
 		arch_remove_memory(range->start, range_len(range),
-				pgmap_altmap(pgmap));
+				pgmap_altmap(pgmap), pgmap);
 		kasan_remove_zero_shadow(__va(range->start), range_len(range));
 	}
 	mem_hotplug_done();
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a7b11248b989..3340f6d30b01 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -665,7 +665,7 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
 }
 
 static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
@@ -746,7 +746,7 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
  * usage map, but still need to free the vmemmap range.
  */
 static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
 	bool section_is_early = early_section(ms);
@@ -784,7 +784,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 	 * section_activate() and pfn_valid() .
 	 */
 	if (!section_is_early)
-		depopulate_section_memmap(pfn, nr_pages, altmap);
+		depopulate_section_memmap(pfn, nr_pages, altmap, pgmap);
 	else if (memmap)
 		free_map_bootmem(memmap);
 
@@ -828,7 +828,7 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
 
 	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
 	if (!memmap) {
-		section_deactivate(pfn, nr_pages, altmap);
+		section_deactivate(pfn, nr_pages, altmap, pgmap);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -889,13 +889,13 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 }
 
 void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
-			   struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
 
 	if (WARN_ON_ONCE(!valid_section(ms)))
 		return;
 
-	section_deactivate(pfn, nr_pages, altmap);
+	section_deactivate(pfn, nr_pages, altmap, pgmap);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v6 4/7] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
  2026-04-24  2:55 [PATCH v6 0/7] mm: fix vmemmap optimization accounting and initialization Muchun Song
                   ` (2 preceding siblings ...)
  2026-04-24  2:55 ` [PATCH v6 3/7] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths Muchun Song
@ 2026-04-24  2:55 ` Muchun Song
  2026-04-24  7:33   ` David Hildenbrand (Arm)
  2026-04-24  2:55 ` [PATCH v6 5/7] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages Muchun Song
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Muchun Song @ 2026-04-24  2:55 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song, stable

When vmemmap optimization is enabled for DAX, the nr_memmap_pages
counter in /proc/vmstat is incorrect. The current code always accounts
for the full, non-optimized vmemmap size, but vmemmap optimization
reduces the actual number of vmemmap pages by reusing tail pages. This
causes the system to overcount vmemmap usage, leading to inaccurate
page statistics in /proc/vmstat.

Fix this by introducing section_vmemmap_pages(), which returns the exact
vmemmap page count for a given pfn range based on whether optimization
is in effect.

Fixes: 15995a352474 ("mm: report per-page metadata information")
Cc: stable@vger.kernel.org
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Oscar Salvador <osalvador@suse.de>
---
 mm/sparse-vmemmap.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 3340f6d30b01..2e642c5ff3f2 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -652,6 +652,28 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 	}
 }
 
+static int __meminit section_nr_vmemmap_pages(unsigned long pfn, unsigned long nr_pages,
+		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
+{
+	const unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
+	const unsigned long pages_per_compound = 1UL << order;
+
+	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages,
+				    min(pages_per_compound, PAGES_PER_SECTION)));
+	VM_WARN_ON_ONCE(pfn_to_section_nr(pfn) != pfn_to_section_nr(pfn + nr_pages - 1));
+
+	if (!vmemmap_can_optimize(altmap, pgmap))
+		return DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE);
+
+	if (order < PFN_SECTION_SHIFT)
+		return VMEMMAP_RESERVE_NR * nr_pages / pages_per_compound;
+
+	if (IS_ALIGNED(pfn, pages_per_compound))
+		return VMEMMAP_RESERVE_NR;
+
+	return 0;
+}
+
 static struct page * __meminit populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap)
@@ -659,7 +681,7 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
 	struct page *page = __populate_section_memmap(pfn, nr_pages, nid, altmap,
 						      pgmap);
 
-	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
+	memmap_pages_add(section_nr_vmemmap_pages(pfn, nr_pages, altmap, pgmap));
 
 	return page;
 }
@@ -670,7 +692,7 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
 
-	memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
+	memmap_pages_add(-section_nr_vmemmap_pages(pfn, nr_pages, altmap, pgmap));
 	vmemmap_free(start, end, altmap);
 }
 
@@ -678,9 +700,10 @@ static void free_map_bootmem(struct page *memmap)
 {
 	unsigned long start = (unsigned long)memmap;
 	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
+	unsigned long pfn = page_to_pfn(memmap);
 
-	memmap_boot_pages_add(-1L * (DIV_ROUND_UP(PAGES_PER_SECTION * sizeof(struct page),
-						  PAGE_SIZE)));
+	memmap_boot_pages_add(-section_nr_vmemmap_pages(pfn, PAGES_PER_SECTION,
+							NULL, NULL));
 	vmemmap_free(start, end, NULL);
 }
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v6 5/7] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages
  2026-04-24  2:55 [PATCH v6 0/7] mm: fix vmemmap optimization accounting and initialization Muchun Song
                   ` (3 preceding siblings ...)
  2026-04-24  2:55 ` [PATCH v6 4/7] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization Muchun Song
@ 2026-04-24  2:55 ` Muchun Song
  2026-04-24  2:55 ` [PATCH v6 6/7] mm/mm_init: Fix uninitialized struct pages for ZONE_DEVICE Muchun Song
  2026-04-24  2:55 ` [PATCH v6 7/7] mm/memory_hotplug: Factor out altmap freeing checks Muchun Song
  6 siblings, 0 replies; 19+ messages in thread
From: Muchun Song @ 2026-04-24  2:55 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song, stable

The memmap_init_zone_device() function only initializes the migratetype
of the first pageblock of a compound page. If the compound page size
exceeds pageblock_nr_pages (e.g., 1GB hugepages with 2MB pageblocks),
subsequent pageblocks in the compound page remain uninitialized.

Move the migratetype initialization out of __init_zone_device_page()
and into a new pageblock_migratetype_init_range() helper that iterates
over the entire PFN range of the memory being initialized, ensuring
that all pageblocks are correctly initialized.

Also remove the stale confusing comment about MEMINIT_HOTPLUG above
the migratetype setting since it is an obsolete relic from commit
966cf44f637e ("mm: defer ZONE_DEVICE page initialization to the point
where we init pgmap") and no longer makes sense here.

Fixes: c4386bd8ee3a ("mm/memremap: add ZONE_DEVICE support for compound pages")
Cc: stable@vger.kernel.org
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/mm_init.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index f9f8e1af921c..cfc76953e249 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -674,6 +674,20 @@ static inline void fixup_hashdist(void)
 static inline void fixup_hashdist(void) {}
 #endif /* CONFIG_NUMA */
 
+#ifdef CONFIG_ZONE_DEVICE
+static __meminit void pageblock_migratetype_init_range(unsigned long pfn,
+		unsigned long nr_pages, int migratetype)
+{
+	const unsigned long end = pfn + nr_pages;
+
+	for (pfn = pageblock_align(pfn); pfn < end; pfn += pageblock_nr_pages) {
+		init_pageblock_migratetype(pfn_to_page(pfn), migratetype, false);
+		if (IS_ALIGNED(pfn, PAGES_PER_SECTION))
+			cond_resched();
+	}
+}
+#endif
+
 /*
  * Initialize a reserved page unconditionally, finding its zone first.
  */
@@ -1011,21 +1025,6 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	page_folio(page)->pgmap = pgmap;
 	page->zone_device_data = NULL;
 
-	/*
-	 * Mark the block movable so that blocks are reserved for
-	 * movable at startup. This will force kernel allocations
-	 * to reserve their blocks rather than leaking throughout
-	 * the address space during boot when many long-lived
-	 * kernel allocations are made.
-	 *
-	 * Please note that MEMINIT_HOTPLUG path doesn't clear memmap
-	 * because this is done early in section_activate()
-	 */
-	if (pageblock_aligned(pfn)) {
-		init_pageblock_migratetype(page, MIGRATE_MOVABLE, false);
-		cond_resched();
-	}
-
 	/*
 	 * ZONE_DEVICE pages other than MEMORY_TYPE_GENERIC are released
 	 * directly to the driver page allocator which will set the page count
@@ -1122,6 +1121,9 @@ void __ref memmap_init_zone_device(struct zone *zone,
 
 		__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
 
+		if (IS_ALIGNED(pfn, PAGES_PER_SECTION))
+			cond_resched();
+
 		if (pfns_per_compound == 1)
 			continue;
 
@@ -1129,6 +1131,8 @@ void __ref memmap_init_zone_device(struct zone *zone,
 				     compound_nr_pages(altmap, pgmap));
 	}
 
+	pageblock_migratetype_init_range(start_pfn, nr_pages, MIGRATE_MOVABLE);
+
 	pr_debug("%s initialised %lu pages in %ums\n", __func__,
 		nr_pages, jiffies_to_msecs(jiffies - start));
 }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v6 6/7] mm/mm_init: Fix uninitialized struct pages for ZONE_DEVICE
  2026-04-24  2:55 [PATCH v6 0/7] mm: fix vmemmap optimization accounting and initialization Muchun Song
                   ` (4 preceding siblings ...)
  2026-04-24  2:55 ` [PATCH v6 5/7] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages Muchun Song
@ 2026-04-24  2:55 ` Muchun Song
  2026-04-24  8:20   ` Mike Rapoport
  2026-04-24  2:55 ` [PATCH v6 7/7] mm/memory_hotplug: Factor out altmap freeing checks Muchun Song
  6 siblings, 1 reply; 19+ messages in thread
From: Muchun Song @ 2026-04-24  2:55 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song, stable

If DAX memory is hotplugged into an unoccupied subsection of an early
section, section_activate() reuses the unoptimized boot memmap.
However, compound_nr_pages() still assumes that vmemmap optimization is
in effect and initializes only the reduced number of struct pages. As a
result, the remaining tail struct pages are left uninitialized, which
can later lead to unexpected behavior or crashes.

Fix this by treating early sections as unoptimized when calculating how
many struct pages to initialize.
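
As a standalone illustration of the undercount (the sizes here, 4 KiB
pages, a 64-byte struct page, VMEMMAP_RESERVE_NR == 2, 2 MiB compound
pages, are common values assumed for the sketch, not taken from the
patch):

```python
# When the unoptimized boot memmap is reused but compound_nr_pages()
# still assumes vmemmap optimization, memmap_init_compound() only
# initializes the struct pages backed by the reserved vmemmap pages.
PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 64          # assumed sizeof(struct page)
VMEMMAP_RESERVE_NR = 2         # vmemmap pages kept per compound page

pfns_per_compound = 512        # 2 MiB DAX compound page

# Optimized assumption: only these struct pages are distinct.
init_done = VMEMMAP_RESERVE_NR * (PAGE_SIZE // STRUCT_PAGE_SIZE)
# Unoptimized reality (early-section reuse): all of them are distinct.
init_needed = pfns_per_compound

left_uninitialized = init_needed - init_done
print(init_done, init_needed, left_uninitialized)
```

So in this model 384 of the 512 struct pages per compound page would
stay uninitialized, which is the failure mode described above.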

Fixes: 6fd3620b3428 ("mm/page_alloc: reuse tail struct pages for compound devmaps")
Cc: stable@vger.kernel.org
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/mm_init.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index cfc76953e249..bd466a3c10c8 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1055,10 +1055,17 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
  * of how the sparse_vmemmap internals handle compound pages in the lack
  * of an altmap. See vmemmap_populate_compound_pages().
  */
-static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
+static inline unsigned long compound_nr_pages(unsigned long pfn,
+					      struct vmem_altmap *altmap,
 					      struct dev_pagemap *pgmap)
 {
-	if (!vmemmap_can_optimize(altmap, pgmap))
+	/*
+	 * If DAX memory is hot-plugged into an unoccupied subsection
+	 * of an early section, the unoptimized boot memmap is reused.
+	 * See section_activate().
+	 */
+	if (early_section(__pfn_to_section(pfn)) ||
+	    !vmemmap_can_optimize(altmap, pgmap))
 		return pgmap_vmemmap_nr(pgmap);
 
 	return VMEMMAP_RESERVE_NR * (PAGE_SIZE / sizeof(struct page));
@@ -1128,7 +1135,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 			continue;
 
 		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
-				     compound_nr_pages(altmap, pgmap));
+				     compound_nr_pages(pfn, altmap, pgmap));
 	}
 
 	pageblock_migratetype_init_range(start_pfn, nr_pages, MIGRATE_MOVABLE);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v6 7/7] mm/memory_hotplug: Factor out altmap freeing checks
  2026-04-24  2:55 [PATCH v6 0/7] mm: fix vmemmap optimization accounting and initialization Muchun Song
                   ` (5 preceding siblings ...)
  2026-04-24  2:55 ` [PATCH v6 6/7] mm/mm_init: Fix uninitialized struct pages for ZONE_DEVICE Muchun Song
@ 2026-04-24  2:55 ` Muchun Song
  2026-04-24  7:34   ` David Hildenbrand (Arm)
  6 siblings, 1 reply; 19+ messages in thread
From: Muchun Song @ 2026-04-24  2:55 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song

Use a small helper to centralize altmap freeing after verifying that all
vmemmap pages were released. This keeps the check consistent between the
normal teardown path and the memory hotplug error paths.
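
The pattern being factored out can be sketched in a standalone way
(this is a Python stand-in for the kernel helper; the names mirror the
patch, but nothing here is kernel code):

```python
import warnings

def altmap_free(altmap):
    # Verify that all vmemmap pages were actually freed before
    # releasing the altmap itself (WARN_ONCE + kfree in the kernel).
    if altmap["alloc"] != 0:
        warnings.warn("Altmap not fully unmapped")
    altmap.clear()   # stands in for kfree()

leaked = {"alloc": 3}   # pretend 3 vmemmap pages were never freed
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    altmap_free(leaked)
print(len(caught), leaked)
```

Centralizing this means the hotplug error paths get the same leak
check as the normal teardown path, instead of a bare kfree().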

Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/memory_hotplug.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7bfdc3a99688..ee150d312bd9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1403,6 +1403,12 @@ bool mhp_supports_memmap_on_memory(void)
 }
 EXPORT_SYMBOL_GPL(mhp_supports_memmap_on_memory);
 
+static void altmap_free(struct vmem_altmap *altmap)
+{
+	WARN_ONCE(altmap->alloc, "Altmap not fully unmapped");
+	kfree(altmap);
+}
+
 static void remove_memory_blocks_and_altmaps(u64 start, u64 size)
 {
 	unsigned long memblock_size = memory_block_size_bytes();
@@ -1425,12 +1431,8 @@ static void remove_memory_blocks_and_altmaps(u64 start, u64 size)
 		mem->altmap = NULL;
 
 		remove_memory_block_devices(cur_start, memblock_size);
-
 		arch_remove_memory(cur_start, memblock_size, altmap, NULL);
-
-		/* Verify that all vmemmap pages have actually been freed. */
-		WARN(altmap->alloc, "Altmap not fully unmapped");
-		kfree(altmap);
+		altmap_free(altmap);
 	}
 }
 
@@ -1461,7 +1463,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
 		/* call arch's memory hotadd */
 		ret = arch_add_memory(nid, cur_start, memblock_size, &params);
 		if (ret < 0) {
-			kfree(params.altmap);
+			altmap_free(params.altmap);
 			goto out;
 		}
 
@@ -1470,7 +1472,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
 						  params.altmap, group);
 		if (ret) {
 			arch_remove_memory(cur_start, memblock_size, params.altmap, NULL);
-			kfree(params.altmap);
+			altmap_free(params.altmap);
 			goto out;
 		}
 	}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 4/7] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
  2026-04-24  2:55 ` [PATCH v6 4/7] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization Muchun Song
@ 2026-04-24  7:33   ` David Hildenbrand (Arm)
  2026-04-24  7:48     ` Muchun Song
  2026-04-25  3:05     ` Muchun Song
  0 siblings, 2 replies; 19+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-24  7:33 UTC (permalink / raw)
  To: Muchun Song, Andrew Morton, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, stable

On 4/24/26 04:55, Muchun Song wrote:
> When vmemmap optimization is enabled for DAX, the nr_memmap_pages
> counter in /proc/vmstat is incorrect. The current code always accounts
> for the full, non-optimized vmemmap size, but vmemmap optimization
> reduces the actual number of vmemmap pages by reusing tail pages. This
> causes the system to overcount vmemmap usage, leading to inaccurate
> page statistics in /proc/vmstat.
> 
> Fix this by introducing section_vmemmap_pages(), which returns the exact
> vmemmap page count for a given pfn range based on whether optimization
> is in effect.
> 
> Fixes: 15995a352474 ("mm: report per-page metadata information")
> Cc: stable@vger.kernel.org
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Acked-by: Oscar Salvador <osalvador@suse.de>
> ---
>  mm/sparse-vmemmap.c | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 3340f6d30b01..2e642c5ff3f2 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -652,6 +652,28 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>  	}
>  }
>  
> +static int __meminit section_nr_vmemmap_pages(unsigned long pfn, unsigned long nr_pages,
> +		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
> +{
> +	const unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
> +	const unsigned long pages_per_compound = 1UL << order;
> +
> +	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages,
> +				    min(pages_per_compound, PAGES_PER_SECTION)));

FWIW, I thought the right thing to do here would be:

VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, pages_per_compound));
VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION));

I don't really see how PAGES_PER_SECTION makes sense given that
PAGES_PER_SUBSECTION is the smallest granularity we allow adding/removing.

Also, the "min()" implies that there is a connection between both properties,
but there isn't to that degree.

If order == 0, then you'd only ever check alignment for ... 1, not
PAGES_PER_SUBSECTION, which already looks weird.

So you really want to check "max(pages_per_compound, PAGES_PER_SUBSECTION)", but
just having two statements is clearer.

Or am I getting something very wrong here? :)
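
The order == 0 corner case can be demonstrated with a standalone
sketch (the section/subsection sizes are x86-64 values, assumed purely
for illustration):

```python
PAGES_PER_SECTION = 1 << 15     # 128 MiB section, assumed
PAGES_PER_SUBSECTION = 1 << 9   # 2 MiB subsection, assumed

def aligned(x, a):
    return x % a == 0

order = 0                       # non-compound pgmap
pages_per_compound = 1 << order
pfn, nr_pages = 100, 300        # deliberately bogus, unaligned range

# min()-based check: min(1, PAGES_PER_SECTION) == 1, so it passes.
min_check_ok = aligned(pfn | nr_pages,
                       min(pages_per_compound, PAGES_PER_SECTION))
# Two-statement form: the subsection check catches the bogus range.
two_checks_ok = (aligned(pfn | nr_pages, pages_per_compound) and
                 aligned(pfn | nr_pages, PAGES_PER_SUBSECTION))
print(min_check_ok, two_checks_ok)
```

The min() form accepts the unaligned range; the two separate checks
reject it.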


-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 7/7] mm/memory_hotplug: Factor out altmap freeing checks
  2026-04-24  2:55 ` [PATCH v6 7/7] mm/memory_hotplug: Factor out altmap freeing checks Muchun Song
@ 2026-04-24  7:34   ` David Hildenbrand (Arm)
  2026-04-24 10:20     ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-24  7:34 UTC (permalink / raw)
  To: Muchun Song, Andrew Morton, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel

On 4/24/26 04:55, Muchun Song wrote:
> Use a small helper to centralize altmap freeing after verifying that all
> vmemmap pages were released. This keeps the check consistent between the
> normal teardown path and the memory hotplug error paths.
> 
> Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---

Thanks!

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

Andrew usually prefers sending non-fixes separately, but he can tell us how he
prefers it in this case here.

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 4/7] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
  2026-04-24  7:33   ` David Hildenbrand (Arm)
@ 2026-04-24  7:48     ` Muchun Song
  2026-04-25  3:05     ` Muchun Song
  1 sibling, 0 replies; 19+ messages in thread
From: Muchun Song @ 2026-04-24  7:48 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Muchun Song, Andrew Morton, Oscar Salvador, Michael Ellerman,
	Madhavan Srinivasan, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Nicholas Piggin, Christophe Leroy, aneesh.kumar, joao.m.martins,
	linux-mm, linuxppc-dev, linux-kernel, stable



> On Apr 24, 2026, at 15:33, David Hildenbrand (Arm) <david@kernel.org> wrote:
> 
> On 4/24/26 04:55, Muchun Song wrote:
>> When vmemmap optimization is enabled for DAX, the nr_memmap_pages
>> counter in /proc/vmstat is incorrect. The current code always accounts
>> for the full, non-optimized vmemmap size, but vmemmap optimization
>> reduces the actual number of vmemmap pages by reusing tail pages. This
>> causes the system to overcount vmemmap usage, leading to inaccurate
>> page statistics in /proc/vmstat.
>> 
>> Fix this by introducing section_vmemmap_pages(), which returns the exact
>> vmemmap page count for a given pfn range based on whether optimization
>> is in effect.
>> 
>> Fixes: 15995a352474 ("mm: report per-page metadata information")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>> Acked-by: Oscar Salvador <osalvador@suse.de>
>> ---
>> mm/sparse-vmemmap.c | 31 +++++++++++++++++++++++++++----
>> 1 file changed, 27 insertions(+), 4 deletions(-)
>> 
>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>> index 3340f6d30b01..2e642c5ff3f2 100644
>> --- a/mm/sparse-vmemmap.c
>> +++ b/mm/sparse-vmemmap.c
>> @@ -652,6 +652,28 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>> }
>> }
>> 
>> +static int __meminit section_nr_vmemmap_pages(unsigned long pfn, unsigned long nr_pages,
>> + 		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>> +{
>> + 	const unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
>> + 	const unsigned long pages_per_compound = 1UL << order;
>> +
>> + 	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages,
>> +     				    min(pages_per_compound, PAGES_PER_SECTION)));
> 
> FWIW, I thought the right thing to do here would be:
> 
> VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, pages_per_compound));
> VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION));
> 
> I don't really see how PAGES_PER_SECTION makes sense given that
> PAGES_PER_SUBSECTION is the smallest granularity we allow adding/removing.
> 
> Also, the "min()" implies that there is a connection between both properties,
> but there isn't to that degree.
> 
> If order == 0, then you'd only ever check alignment for ... 1, not
> PAGES_PER_SUBSECTION, which already looks weird.
> 
> So you really want to check "max(pages_per_compound, PAGES_PER_SUBSECTION)", but
> just having two statements is clearer.
> 
> Or am I getting something very wrong here? :)
> 

You are absolutely right. I misread it earlier. I mistakenly read
PAGES_PER_SUBSECTION as PAGES_PER_SECTION, which is why I still used
PAGES_PER_SECTION in v5. That was my mistake and obviously not what
you originally meant.

I completely agree with your suggestion to use two statements here,
as it makes the alignment requirements much clearer. I'll fix this in
the next version. Thanks for pointing this out!

Muchun,
Thanks.

> 
> -- 
> Cheers,
> 
> David




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 6/7] mm/mm_init: Fix uninitialized struct pages for ZONE_DEVICE
  2026-04-24  2:55 ` [PATCH v6 6/7] mm/mm_init: Fix uninitialized struct pages for ZONE_DEVICE Muchun Song
@ 2026-04-24  8:20   ` Mike Rapoport
  0 siblings, 0 replies; 19+ messages in thread
From: Mike Rapoport @ 2026-04-24  8:20 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
	joao.m.martins, linux-mm, linuxppc-dev, linux-kernel, stable

On Fri, Apr 24, 2026 at 10:55:46AM +0800, Muchun Song wrote:
> If DAX memory is hotplugged into an unoccupied subsection of an early
> section, section_activate() reuses the unoptimized boot memmap.
> However, compound_nr_pages() still assumes that vmemmap optimization is
> in effect and initializes only the reduced number of struct pages. As a
> result, the remaining tail struct pages are left uninitialized, which
> can later lead to unexpected behavior or crashes.
> 
> Fix this by treating early sections as unoptimized when calculating how
> many struct pages to initialize.
> 
> Fixes: 6fd3620b3428 ("mm/page_alloc: reuse tail struct pages for compound devmaps")
> Cc: stable@vger.kernel.org
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>

Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/mm_init.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index cfc76953e249..bd466a3c10c8 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1055,10 +1055,17 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
>   * of how the sparse_vmemmap internals handle compound pages in the lack
>   * of an altmap. See vmemmap_populate_compound_pages().
>   */
> -static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
> +static inline unsigned long compound_nr_pages(unsigned long pfn,
> +					      struct vmem_altmap *altmap,
>  					      struct dev_pagemap *pgmap)
>  {
> -	if (!vmemmap_can_optimize(altmap, pgmap))
> +	/*
> +	 * If DAX memory is hot-plugged into an unoccupied subsection
> +	 * of an early section, the unoptimized boot memmap is reused.
> +	 * See section_activate().
> +	 */
> +	if (early_section(__pfn_to_section(pfn)) ||
> +	    !vmemmap_can_optimize(altmap, pgmap))
>  		return pgmap_vmemmap_nr(pgmap);
>  
>  	return VMEMMAP_RESERVE_NR * (PAGE_SIZE / sizeof(struct page));
> @@ -1128,7 +1135,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
>  			continue;
>  
>  		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
> -				     compound_nr_pages(altmap, pgmap));
> +				     compound_nr_pages(pfn, altmap, pgmap));
>  	}
>  
>  	pageblock_migratetype_init_range(start_pfn, nr_pages, MIGRATE_MOVABLE);
> -- 
> 2.20.1
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 7/7] mm/memory_hotplug: Factor out altmap freeing checks
  2026-04-24  7:34   ` David Hildenbrand (Arm)
@ 2026-04-24 10:20     ` Andrew Morton
  2026-04-24 11:58       ` Muchun Song
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2026-04-24 10:20 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Muchun Song, Muchun Song, Oscar Salvador, Michael Ellerman,
	Madhavan Srinivasan, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Nicholas Piggin, Christophe Leroy, aneesh.kumar, joao.m.martins,
	linux-mm, linuxppc-dev, linux-kernel

On Fri, 24 Apr 2026 09:34:43 +0200 "David Hildenbrand (Arm)" <david@kernel.org> wrote:

> On 4/24/26 04:55, Muchun Song wrote:
> > Use a small helper to centralize altmap freeing after verifying that all
> > vmemmap pages were released. This keeps the check consistent between the
> > normal teardown path and the memory hotplug error paths.
> > 
> > Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> 
> Thanks!
> 
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> 
> Andrew usually prefers sending non-fixes separately,

Patches which are destined for the current -rc cycle (and possibly
-stable) (aka "hotfixes") take a different route into mainline from
regular next-merge-window material.  They go into different branches
and they have different timing.

If a patchset has a mixture of hotfixes (upstream next week) and
regular patches (upstream mid June) then I have to pull the series
apart, stage some things into one branch and other things in another
branch, rework the cover letter etc etc.  Problems with this are:

- what goes upstream doesn't map well onto what was presented on the
  mailing list.

- the hotfixes (upstream next week) may have dependencies on the
  regular patches (upstream mid June).  This is backwards.

Much depends on the urgency of the hotfixes.

In this case, iirc, the determination is "not very urgent at all".  So
the series is OK as-is - it's all "upstream mid June".

This is still a bit suboptimal because when the -stable maintainers get
onto backporting the cc:stable patches (after mid June), they may
encounter merge/build/runtime issues due to the absence of the
non-hotfix patches from this series.

So generally, it is best for authors to have a think about these
timing/priority issues and to present the patches in a suitable fashion
- hotfixes/-stable patches in one series then non-hotfixes in a second,
later series.  This way their presentation matches what goes upstream
and we reduce the possibility of problems when the -stable maintainers
get onto backporting.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 7/7] mm/memory_hotplug: Factor out altmap freeing checks
  2026-04-24 10:20     ` Andrew Morton
@ 2026-04-24 11:58       ` Muchun Song
  0 siblings, 0 replies; 19+ messages in thread
From: Muchun Song @ 2026-04-24 11:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand (Arm), Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel



> On Apr 24, 2026, at 18:20, Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> On Fri, 24 Apr 2026 09:34:43 +0200 "David Hildenbrand (Arm)" <david@kernel.org> wrote:
> 
>> On 4/24/26 04:55, Muchun Song wrote:
>>> Use a small helper to centralize altmap freeing after verifying that all
>>> vmemmap pages were released. This keeps the check consistent between the
>>> normal teardown path and the memory hotplug error paths.
>>> 
>>> Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>> ---
>> 
>> Thanks!
>> 
>> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>> 
>> Andrew usually prefers sending non-fixes separately,
> 
> Patches which are destined for the current -rc cycle (and possibly
> -stable) (aka "hotfixes") take a different route into mainline from
> regular next-merge-window material.  They go into different branches
> and they have different timing.
> 
> If a patchset has a mixture of hotfixes (upstream next week) and
> regular patches (upstream mid June) then I have to pull the series
> apart, stage some things into one branch and other things in another
> branch, rework the cover letter etc etc.  Problems with this are:
> 
> - what goes upstream doesn't map well onto what was presented on the
>  mailing list.
> 
> - the hotfixes (upstream next week) may have dependencies on the
>  regular patches (upstream mid June).  This is backwards.
> 
> Much depends on the urgency of the hotfixes.
> 
> In this case, iirc, the determination is "not very urgent at all".  So
> the series is OK as-is - it's all "upstream mid June".
> 
> This is still a bit suboptimal because when the -stable maintainers get
> onto backporting the cc:stable patches (after mid June), they may
> encounter merge/build/runtime issues due to the absence of the
> non-hotfix patches from this series.
> 
> So generally, it is best for authors to have a think about these
> timing/priority issues and to present the patches in a suitable fashion
> - hotfixes/-stable patches in one series then non-hotfixes in a second,
> later series.  This way their presentation matches what goes upstream
> and we reduce the possibility of problems when the -stable maintainers
> get onto backporting.

Thanks for the clarification! Since I'm heading into the next revision
anyway, I’ll go ahead and split the series.

I'll drop the non-fix patches for now and focus this series on the
bugfixes to ensure a smooth merge. The regular patches will follow
in a separate submission later.

Thanks,
Muchun.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 4/7] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
  2026-04-24  7:33   ` David Hildenbrand (Arm)
  2026-04-24  7:48     ` Muchun Song
@ 2026-04-25  3:05     ` Muchun Song
  2026-04-25  5:48       ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 19+ messages in thread
From: Muchun Song @ 2026-04-25  3:05 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Muchun Song, Andrew Morton, Oscar Salvador, Michael Ellerman,
	Madhavan Srinivasan, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Nicholas Piggin, Christophe Leroy, aneesh.kumar, joao.m.martins,
	linux-mm, linuxppc-dev, linux-kernel, stable



> On Apr 24, 2026, at 15:33, David Hildenbrand (Arm) <david@kernel.org> wrote:
> 
> On 4/24/26 04:55, Muchun Song wrote:
>> When vmemmap optimization is enabled for DAX, the nr_memmap_pages
>> counter in /proc/vmstat is incorrect. The current code always accounts
>> for the full, non-optimized vmemmap size, but vmemmap optimization
>> reduces the actual number of vmemmap pages by reusing tail pages. This
>> causes the system to overcount vmemmap usage, leading to inaccurate
>> page statistics in /proc/vmstat.
>> 
>> Fix this by introducing section_vmemmap_pages(), which returns the exact
>> vmemmap page count for a given pfn range based on whether optimization
>> is in effect.
>> 
>> Fixes: 15995a352474 ("mm: report per-page metadata information")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>> Acked-by: Oscar Salvador <osalvador@suse.de>
>> ---
>> mm/sparse-vmemmap.c | 31 +++++++++++++++++++++++++++----
>> 1 file changed, 27 insertions(+), 4 deletions(-)
>> 
>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>> index 3340f6d30b01..2e642c5ff3f2 100644
>> --- a/mm/sparse-vmemmap.c
>> +++ b/mm/sparse-vmemmap.c
>> @@ -652,6 +652,28 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>> }
>> }
>> 
>> +static int __meminit section_nr_vmemmap_pages(unsigned long pfn, unsigned long nr_pages,
>> + 		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>> +{
>> + 	const unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
>> + 	const unsigned long pages_per_compound = 1UL << order;
>> +
>> + 	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages,
>> +    			min(pages_per_compound, PAGES_PER_SECTION)));
> 
> FWIW, I thought the right thing to do here would be:
> 
> VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, pages_per_compound));
> VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION));
> 
> I don't really see how PAGES_PER_SECTION makes sense given that
> PAGES_PER_SUBSECTION is the smallest granularity we allow adding/removing.
> 
> Also, the "min()" implies that there is a connection between both properties,
> but there isn't to that degree.
> 
> If order == 0, then you'd only ever check alignment for ... 1, not
> PAGES_PER_SUBSECTION, which already looks weird.
> 
> So you really want to check "max(pages_per_compound, PAGES_PER_SUBSECTION)", but
> just having two statements is clearer.
> 
> Or am I getting something very wrong here? :)

Hi David,

Sorry, I missed the 1GB hugepage scenario earlier. Given that sparse_add_section()
operates on a scale between PAGES_PER_SUBSECTION and PAGES_PER_SECTION, the pfn and
nr_pages parameters wouldn't be aligned with the hugepage size (pages_per_compound),
but rather with the PAGES_PER_SECTION boundary. Do you think this explanation makes
it clearer? In the interest of code clarity, do you think the modification below
makes it easier to follow?

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 2e642c5ff3f2..ce675c5fb94d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -658,15 +658,18 @@ static int __meminit section_nr_vmemmap_pages(unsigned long pfn, unsigned long n
        const unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
        const unsigned long pages_per_compound = 1UL << order;

-       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages,
-                                   min(pages_per_compound, PAGES_PER_SECTION)));
+       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION));
        VM_WARN_ON_ONCE(pfn_to_section_nr(pfn) != pfn_to_section_nr(pfn + nr_pages - 1));

        if (!vmemmap_can_optimize(altmap, pgmap))
                return DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE);

-       if (order < PFN_SECTION_SHIFT)
+       if (order < PFN_SECTION_SHIFT) {
+               VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, pages_per_compound));
                return VMEMMAP_RESERVE_NR * nr_pages / pages_per_compound;
+       }
+
+       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SECTION));

        if (IS_ALIGNED(pfn, pages_per_compound))
                return VMEMMAP_RESERVE_NR;

Thanks.

> 
> 
> -- 
> Cheers,
> 
> David



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 4/7] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
  2026-04-25  3:05     ` Muchun Song
@ 2026-04-25  5:48       ` David Hildenbrand (Arm)
  2026-04-25  6:20         ` Muchun Song
  0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-25  5:48 UTC (permalink / raw)
  To: Muchun Song
  Cc: Muchun Song, Andrew Morton, Oscar Salvador, Michael Ellerman,
	Madhavan Srinivasan, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Nicholas Piggin, Christophe Leroy, aneesh.kumar, joao.m.martins,
	linux-mm, linuxppc-dev, linux-kernel, stable

> 
> Hi David,
> 
> Sorry, I missed the 1GB hugepage scenario earlier. Given that sparse_add_section()
> operates on a scale between PAGES_PER_SUBSECTION and PAGES_PER_SECTION, the pfn and
> nr_pages parameters wouldn't be aligned with the hugepage size (pages_per_compound),
> but rather with the PAGES_PER_SECTION boundary. Do you think this explanation makes
> it clearer? In the interest of code clarity, do you think the modification below
> makes it easier to follow?
> 
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 2e642c5ff3f2..ce675c5fb94d 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -658,15 +658,18 @@ static int __meminit section_nr_vmemmap_pages(unsigned long pfn, unsigned long n
>         const unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
>         const unsigned long pages_per_compound = 1UL << order;
> 
> -       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages,
> -                                   min(pages_per_compound, PAGES_PER_SECTION)));
> +       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION));

That here makes sense. We can only add/remove in multiples of PAGES_PER_SUBSECTION.
I think what we are saying is that we want that check in addition to the
existing min() check.

>         VM_WARN_ON_ONCE(pfn_to_section_nr(pfn) != pfn_to_section_nr(pfn + nr_pages - 1));
> 
>         if (!vmemmap_can_optimize(altmap, pgmap))
>                 return DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE);
> 
> -       if (order < PFN_SECTION_SHIFT)
> +       if (order < PFN_SECTION_SHIFT) {
> +               VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, pages_per_compound));
>                 return VMEMMAP_RESERVE_NR * nr_pages / pages_per_compound;

That makes sense as well, within a section, we expect that we always add/remove
entire "compound"-managed chunks.

> +       }
> +
> +       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SECTION));

And this is then for the case where a 1G page spans multiple sections, where we
expect to add/remove an entire section.

So here, indeed the "min" makes sense. I guess we also assume:

	VM_WARN_ON_ONCE(nr_pages > PAGES_PER_SECTION);

Looks better to me!

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 19+ messages in thread
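[Editorial note: the alignment invariants agreed on in the exchange above can be sketched with a short model. This is a hypothetical Python illustration, not the kernel code; the constants assume a common x86-64 configuration (4 KiB base pages, 128 MiB sections, 2 MiB subsections) and are stated here only for illustration.]

```python
# Hypothetical model of the VM_WARN_ON_ONCE() alignment checks discussed
# above. Not the kernel implementation; constants assume x86-64 defaults.
PFN_SECTION_SHIFT = 15                      # 128 MiB section / 4 KiB pages
PAGES_PER_SECTION = 1 << PFN_SECTION_SHIFT  # 32768 base pages
PAGES_PER_SUBSECTION = 1 << 9               # 2 MiB subsection = 512 pages

def is_aligned(x, a):
    return x % a == 0

def check_range(pfn, nr_pages, order):
    """Mirror the three checks: subsection alignment always; then
    compound alignment when compound pages fit within a section; or
    section alignment when a compound page (e.g. 1 GiB) spans sections."""
    pages_per_compound = 1 << order
    assert is_aligned(pfn | nr_pages, PAGES_PER_SUBSECTION)
    # Together with the alignment checks, this keeps the range in one section.
    assert nr_pages <= PAGES_PER_SECTION
    if order < PFN_SECTION_SHIFT:
        assert is_aligned(pfn | nr_pages, pages_per_compound)
    else:
        assert is_aligned(pfn | nr_pages, PAGES_PER_SECTION)
```

Under this model, a 2 MiB compound range (order 9) passes through the first branch, while a 1 GiB compound page must be added or removed a whole section at a time, matching the case David describes.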

* Re: [PATCH v6 4/7] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
  2026-04-25  5:48       ` David Hildenbrand (Arm)
@ 2026-04-25  6:20         ` Muchun Song
  2026-04-25  6:47           ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 19+ messages in thread
From: Muchun Song @ 2026-04-25  6:20 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Muchun Song, Andrew Morton, Oscar Salvador, Michael Ellerman,
	Madhavan Srinivasan, Lorenzo Stoakes, Liam R Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Nicholas Piggin, Christophe Leroy, aneesh.kumar, joao.m.martins,
	linux-mm, linuxppc-dev, linux-kernel, stable



> On Apr 25, 2026, at 13:48, David Hildenbrand (Arm) <david@kernel.org> wrote:
> 
> 
>> 
>> 
>> Hi David,
>> 
>> Sorry, I missed the 1GB hugepage scenario earlier. Given that sparse_add_section()
>> operates on a scale between PAGES_PER_SUBSECTION and PAGES_PER_SECTION, the pfn and
>> nr_pages parameters wouldn't be aligned with the hugepage size (pages_per_compound),
>> but rather with the PAGES_PER_SECTION boundary. Do you think this explanation makes
>> it clearer? In the interest of code clarity, do you think the modification below
>> makes it easier to follow?
>> 
>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>> index 2e642c5ff3f2..ce675c5fb94d 100644
>> --- a/mm/sparse-vmemmap.c
>> +++ b/mm/sparse-vmemmap.c
>> @@ -658,15 +658,18 @@ static int __meminit section_nr_vmemmap_pages(unsigned long pfn, unsigned long n
>>        const unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
>>        const unsigned long pages_per_compound = 1UL << order;
>> 
>> -       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages,
>> -                                   min(pages_per_compound, PAGES_PER_SECTION)));
>> +       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION));
> 
> That here makes sense. We can only add/remove in multiples of PAGES_PER_SUBSECTION.
> I think what we are saying is that we want that check in addition to the
> existing min() check.

Right.

> 
>>        VM_WARN_ON_ONCE(pfn_to_section_nr(pfn) != pfn_to_section_nr(pfn + nr_pages - 1));
>> 
>>        if (!vmemmap_can_optimize(altmap, pgmap))
>>                return DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE);
>> 
>> -       if (order < PFN_SECTION_SHIFT)
>> +       if (order < PFN_SECTION_SHIFT) {
>> +               VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, pages_per_compound));
>>                return VMEMMAP_RESERVE_NR * nr_pages / pages_per_compound;
> 
> That makes sense as well, within a section, we expect that we always add/remove
> entire "compound"-managed chunks.
> 
>> +       }
>> +
>> +       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SECTION));
> 
> And this is then for the case where a 1G page spans multiple sections, where we
> expect to add/remove an entire section.
> 
> So here, indeed the "min" makes sense. I guess we also assume:
> 
>    VM_WARN_ON_ONCE(nr_pages > PAGES_PER_SECTION);

Yes. But we do not need to assert this one explicitly,
since at the front of this function we have

VM_WARN_ON_ONCE(pfn_to_section_nr(pfn) != pfn_to_section_nr(pfn + nr_pages - 1));

to make sure the passed range belongs to one section.

Thanks.

> 
> Looks better to me!
> 
> --
> Cheers,
> 
> David


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 4/7] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
  2026-04-25  6:20         ` Muchun Song
@ 2026-04-25  6:47           ` David Hildenbrand (Arm)
  2026-04-25  6:56             ` Muchun Song
  0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-25  6:47 UTC (permalink / raw)
  To: Muchun Song
  Cc: Muchun Song, Andrew Morton, Oscar Salvador, Michael Ellerman,
	Madhavan Srinivasan, Lorenzo Stoakes, Liam R Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Nicholas Piggin, Christophe Leroy, aneesh.kumar, joao.m.martins,
	linux-mm, linuxppc-dev, linux-kernel, stable

On 4/25/26 08:20, Muchun Song wrote:
> 
> 
>> On Apr 25, 2026, at 13:48, David Hildenbrand (Arm) <david@kernel.org> wrote:
>>
>> 
>>>
>>>
>>> Hi David,
>>>
>>> Sorry, I missed the 1GB hugepage scenario earlier. Given that sparse_add_section()
>>> operates on a scale between PAGES_PER_SUBSECTION and PAGES_PER_SECTION, the pfn and
>>> nr_pages parameters wouldn't be aligned with the hugepage size (pages_per_compound),
>>> but rather with the PAGES_PER_SECTION boundary. Do you think this explanation makes
>>> it clearer? In the interest of code clarity, do you think the modification below
>>> makes it easier to follow?
>>>
>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>>> index 2e642c5ff3f2..ce675c5fb94d 100644
>>> --- a/mm/sparse-vmemmap.c
>>> +++ b/mm/sparse-vmemmap.c
>>> @@ -658,15 +658,18 @@ static int __meminit section_nr_vmemmap_pages(unsigned long pfn, unsigned long n
>>>        const unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
>>>        const unsigned long pages_per_compound = 1UL << order;
>>>
>>> -       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages,
>>> -                                   min(pages_per_compound, PAGES_PER_SECTION)));
>>> +       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION));
>>
>> That here makes sense. We can only add/remove in multiples of PAGES_PER_SUBSECTION.
>> I think what we are saying is that we want that check in addition to the
>> existing min() check.
> 
> Right.
> 
>>
>>>        VM_WARN_ON_ONCE(pfn_to_section_nr(pfn) != pfn_to_section_nr(pfn + nr_pages - 1));
>>>
>>>        if (!vmemmap_can_optimize(altmap, pgmap))
>>>                return DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE);
>>>
>>> -       if (order < PFN_SECTION_SHIFT)
>>> +       if (order < PFN_SECTION_SHIFT) {
>>> +               VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, pages_per_compound));
>>>                return VMEMMAP_RESERVE_NR * nr_pages / pages_per_compound;
>>
>> That makes sense as well, within a section, we expect that we always add/remove
>> entire "compound"-managed chunks.
>>
>>> +       }
>>> +
>>> +       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SECTION));
>>
>> And this is then for the case where a 1G page spans multiple sections, where we
>> expect to add/remove an entire section.
>>
>> So here, indeed the "min" makes sense. I guess we also assume:
>>
>>    VM_WARN_ON_ONCE(nr_pages > PAGES_PER_SECTION);
> 
> Yes. But this one we do not need to explicit it to
> assert it since at the front of this function we have
> 
> VM_WARN_ON_ONCE(pfn_to_section_nr(pfn) != pfn_to_section_nr(pfn + nr_pages - 1));

Ah, yes. The alignment checks plus VM_WARN_ON_ONCE(nr_pages > PAGES_PER_SECTION)
would, however, imply that.

So you could simplify by using that check instead of the pfn_to_section_nr() check.

But it's still early here ... so whatever you prefer :)

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 4/7] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
  2026-04-25  6:47           ` David Hildenbrand (Arm)
@ 2026-04-25  6:56             ` Muchun Song
  0 siblings, 0 replies; 19+ messages in thread
From: Muchun Song @ 2026-04-25  6:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Muchun Song, Andrew Morton, Oscar Salvador, Michael Ellerman,
	Madhavan Srinivasan, Lorenzo Stoakes, Liam R Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Nicholas Piggin, Christophe Leroy, aneesh.kumar, joao.m.martins,
	linux-mm, linuxppc-dev, linux-kernel, stable



> On Apr 25, 2026, at 14:47, David Hildenbrand (Arm) <david@kernel.org> wrote:
> 
> On 4/25/26 08:20, Muchun Song wrote:
>> 
>> 
>>>> On Apr 25, 2026, at 13:48, David Hildenbrand (Arm) <david@kernel.org> wrote:
>>> 
>>> 
>>>> 
>>>> 
>>>> Hi David,
>>>> 
>>>> Sorry, I missed the 1GB hugepage scenario earlier. Given that sparse_add_section()
>>>> operates on a scale between PAGES_PER_SUBSECTION and PAGES_PER_SECTION, the pfn and
>>>> nr_pages parameters wouldn't be aligned with the hugepage size (pages_per_compound),
>>>> but rather with the PAGES_PER_SECTION boundary. Do you think this explanation makes
>>>> it clearer? In the interest of code clarity, do you think the modification below
>>>> makes it easier to follow?
>>>> 
>>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>>>> index 2e642c5ff3f2..ce675c5fb94d 100644
>>>> --- a/mm/sparse-vmemmap.c
>>>> +++ b/mm/sparse-vmemmap.c
>>>> @@ -658,15 +658,18 @@ static int __meminit section_nr_vmemmap_pages(unsigned long pfn, unsigned long n
>>>>       const unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
>>>>       const unsigned long pages_per_compound = 1UL << order;
>>>> 
>>>> -       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages,
>>>> -                                   min(pages_per_compound, PAGES_PER_SECTION)));
>>>> +       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION));
>>> 
>>> That here makes sense. We can only add/remove in multiples of PAGES_PER_SUBSECTION.
>>> I think what we are saying is that we want that check in addition to the
>>> existing min() check.
>> 
>> Right.
>> 
>>> 
>>>>       VM_WARN_ON_ONCE(pfn_to_section_nr(pfn) != pfn_to_section_nr(pfn + nr_pages - 1));
>>>> 
>>>>       if (!vmemmap_can_optimize(altmap, pgmap))
>>>>               return DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE);
>>>> 
>>>> -       if (order < PFN_SECTION_SHIFT)
>>>> +       if (order < PFN_SECTION_SHIFT) {
>>>> +               VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, pages_per_compound));
>>>>               return VMEMMAP_RESERVE_NR * nr_pages / pages_per_compound;
>>> 
>>> That makes sense as well, within a section, we expect that we always add/remove
>>> entire "compound"-managed chunks.
>>> 
>>>> +       }
>>>> +
>>>> +       VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SECTION));
>>> 
>>> And this is then for the case where a 1G page spans multiple sections, where we
>>> expect to add/remove an entire section.
>>> 
>>> So here, indeed the "min" makes sense. I guess we also assume:
>>> 
>>>   VM_WARN_ON_ONCE(nr_pages > PAGES_PER_SECTION);
>> 
>> Yes. But we do not need to assert this one explicitly,
>> since at the front of this function we have
>> 
>> VM_WARN_ON_ONCE(pfn_to_section_nr(pfn) != pfn_to_section_nr(pfn + nr_pages - 1));
> 
> Ah, yes. The alignment checks plus VM_WARN_ON_ONCE(nr_pages > PAGES_PER_SECTION)
> would, however, imply that.
> 
> So you could simplify by using that check instead of the pfn_to_section_nr() check.
> 
> But it's still early here ... so whatever you prefer :)

Thanks for the suggestion. I think your approach is also
good — at least it looks shorter and cleaner. I'll switch to
using VM_WARN_ON_ONCE(nr_pages > PAGES_PER_SECTION) instead.

Thanks.

> 
> --
> Cheers,
> 
> David


^ permalink raw reply	[flat|nested] 19+ messages in thread
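[Editorial note: as a rough illustration of the accounting that patch 4 fixes, the two return paths visible in the diff quoted in this thread can be sketched as follows. This is a simplified Python model under assumed constants (64-byte struct page, 4 KiB pages, VMEMMAP_RESERVE_NR = 2), not the kernel code.]

```python
# Simplified model of the two return paths in section_nr_vmemmap_pages()
# from the diff quoted above. Constants are illustrative assumptions.
PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 64    # assumed sizeof(struct page)
VMEMMAP_RESERVE_NR = 2   # vmemmap pages kept per optimized compound page

def vmemmap_pages(nr_pages, order, optimized):
    if not optimized:
        # Models DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)
        return (nr_pages * STRUCT_PAGE_SIZE + PAGE_SIZE - 1) // PAGE_SIZE
    pages_per_compound = 1 << order
    return VMEMMAP_RESERVE_NR * nr_pages // pages_per_compound

# For one 128 MiB section (32768 base pages) of 2 MiB compound pages
# (order 9), the optimization shrinks the vmemmap cost from 512 pages
# to 128 pages, which is the difference the accounting must track.
```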

end of thread, other threads:[~2026-04-25  6:57 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-24  2:55 [PATCH v6 0/7] mm: fix vmemmap optimization accounting and initialization Muchun Song
2026-04-24  2:55 ` [PATCH v6 1/7] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
2026-04-24  2:55 ` [PATCH v6 2/7] mm/memory_hotplug: Fix incorrect altmap passing in error path Muchun Song
2026-04-24  2:55 ` [PATCH v6 3/7] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths Muchun Song
2026-04-24  2:55 ` [PATCH v6 4/7] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization Muchun Song
2026-04-24  7:33   ` David Hildenbrand (Arm)
2026-04-24  7:48     ` Muchun Song
2026-04-25  3:05     ` Muchun Song
2026-04-25  5:48       ` David Hildenbrand (Arm)
2026-04-25  6:20         ` Muchun Song
2026-04-25  6:47           ` David Hildenbrand (Arm)
2026-04-25  6:56             ` Muchun Song
2026-04-24  2:55 ` [PATCH v6 5/7] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages Muchun Song
2026-04-24  2:55 ` [PATCH v6 6/7] mm/mm_init: Fix uninitialized struct pages for ZONE_DEVICE Muchun Song
2026-04-24  8:20   ` Mike Rapoport
2026-04-24  2:55 ` [PATCH v6 7/7] mm/memory_hotplug: Factor out altmap freeing checks Muchun Song
2026-04-24  7:34   ` David Hildenbrand (Arm)
2026-04-24 10:20     ` Andrew Morton
2026-04-24 11:58       ` Muchun Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox