LinuxPPC-Dev Archive on lore.kernel.org

* Re: [PATCH 09/49] mm: panic on memory allocation failure in sparse_init_nid()
From: Mike Rapoport @ 2026-04-28  6:56 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
	joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
In-Reply-To: <20260405125240.2558577-10-songmuchun@bytedance.com>

On Sun, Apr 05, 2026 at 08:52:00PM +0800, Muchun Song wrote:
> When vmemmap pages allocation or usemap allocation fails, sparse_init_nid()
> currently only marks the corresponding section as non-present. However,
> subsequent code like memmap_init() iterating over PFNs does not check for
> non-present sections, leading to invalid memory access (additional,
> subsection_map_init() accessing the unallocated usemap as well).
> 
> It is complex to audit and fix all boot-time PFN iterators to handle these
> partially initialized sections correctly. Since vmemmap and usemap allocation
> failures are extremely rare during early boot, the more appropriate approach
> is to expose the problem as early as possible.
> 
> Therefore, use BUG_ON() to panic immediately if allocation fails, instead of
> attempting a partial recovery that leads to obscure crashes later.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/sparse.c | 37 ++++++++-----------------------------
>  1 file changed, 8 insertions(+), 29 deletions(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index effdac6b0ab1..5c12b979a618 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -354,19 +354,15 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
>  				   unsigned long map_count)
>  {
>  	unsigned long pnum;
> -	struct page *map;
> -	struct mem_section *ms;
> -
> -	if (sparse_usage_init(nid, map_count)) {
> -		pr_err("%s: node[%d] usemap allocation failed", __func__, nid);
> -		goto failed;
> -	}
>  
> +	if (sparse_usage_init(nid, map_count))
> +		panic("The node[%d] usemap allocation failed\n", nid);

Please consider using memblock_alloc_or_panic() in sparse_usage_init(), it
would simplify the code even more.
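
For readers unfamiliar with the allocate-or-panic pattern suggested here, the following is a minimal userspace sketch, not the kernel code. Every name in it (model_alloc_or_panic(), model_usage_init(), the allocator hook) is hypothetical, and the stand-in panic routine only records the call, whereas the real panic() never returns. The point it illustrates is that the init helper no longer needs an error return for its caller to check.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocator hook so the failure path can be exercised. */
typedef void *(*model_alloc_fn)(size_t size);

/* Counts simulated panics; the real panic() would never return. */
static int model_panic_calls;

/* Stand-in for panic(): report the failing function and record the call. */
static void model_panic(const char *func, size_t size)
{
	fprintf(stderr, "%s: failed to allocate %zu bytes\n", func, size);
	model_panic_calls++;
}

/* Allocate or "panic", so callers do not have to check the result. */
static void *model_alloc_or_panic(model_alloc_fn alloc, size_t size,
				  const char *func)
{
	void *p = alloc(size);

	if (!p)
		model_panic(func, size);
	return p;
}

/*
 * Hypothetical counterpart of sparse_usage_init(): with the failure handled
 * inside the allocation helper, no error code needs to be returned.
 */
static void *model_usage_init(model_alloc_fn alloc, size_t usage_size,
			      unsigned long map_count)
{
	return model_alloc_or_panic(alloc, usage_size * map_count, __func__);
}

/* Allocator that always fails, used only to drive the panic path. */
static void *model_failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```

Calling model_usage_init(malloc, 32, 4) returns a usable buffer, while passing model_failing_alloc exercises the panic path. In the kernel the equivalent failure would stop the boot at the allocation site instead of surfacing later as an obscure crash.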

>  	sparse_buffer_init(map_count * section_map_size(), nid);
>  
>  	sparse_vmemmap_init_nid_early(nid);
>  
>  	for_each_present_section_nr(pnum_begin, pnum) {
> +		struct mem_section *ms;
>  		unsigned long pfn = section_nr_to_pfn(pnum);
>  
>  		if (pnum >= pnum_end)
> @@ -374,16 +370,12 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
>  
>  		ms = __nr_to_section(pnum);
>  		if (!preinited_vmemmap_section(ms)) {
> +			struct page *map;
> +
>  			map = __populate_section_memmap(pfn, PAGES_PER_SECTION,
> -					nid, NULL, NULL);
> -			if (!map) {
> -				pr_err("%s: node[%d] memory map backing failed. Some memory will not be available.",
> -				       __func__, nid);
> -				pnum_begin = pnum;
> -				sparse_usage_fini();
> -				sparse_buffer_fini();
> -				goto failed;
> -			}
> +							nid, NULL, NULL);
> +			if (!map)
> +				panic("Populate section (%ld) on node[%d] failed\n", pnum, nid);
>  			memmap_boot_pages_add(DIV_ROUND_UP(PAGES_PER_SECTION * sizeof(struct page),
>  							   PAGE_SIZE));
>  			sparse_init_early_section(nid, map, pnum, 0);
> @@ -391,19 +383,6 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
>  	}
>  	sparse_usage_fini();
>  	sparse_buffer_fini();
> -	return;
> -failed:
> -	/*
> -	 * We failed to allocate, mark all the following pnums as not present,
> -	 * except the ones already initialized earlier.
> -	 */
> -	for_each_present_section_nr(pnum_begin, pnum) {
> -		if (pnum >= pnum_end)
> -			break;
> -		ms = __nr_to_section(pnum);
> -		if (!preinited_vmemmap_section(ms))
> -			ms->section_mem_map = 0;
> -	}
>  }
>  
>  /*
> -- 
> 2.20.1
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH v7 4/6] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
From: David Hildenbrand (Arm) @ 2026-04-28  7:00 UTC (permalink / raw)
  To: Muchun Song
  Cc: Muchun Song, Andrew Morton, Oscar Salvador, Michael Ellerman,
	Madhavan Srinivasan, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Nicholas Piggin, Christophe Leroy, aneesh.kumar, joao.m.martins,
	linux-mm, linuxppc-dev, linux-kernel, stable
In-Reply-To: <C2E99AB4-CC47-4D9F-BF56-F971FD5A3C26@linux.dev>

On 4/28/26 04:21, Muchun Song wrote:
> 
> 
>> On Apr 27, 2026, at 18:17, David Hildenbrand (Arm) <david@kernel.org> wrote:
>>
>> On 4/26/26 11:26, Muchun Song wrote:
>>> When vmemmap optimization is enabled for DAX, the nr_memmap_pages
>>> counter in /proc/vmstat is incorrect. The current code always accounts
>>> for the full, non-optimized vmemmap size, but vmemmap optimization
>>> reduces the actual number of vmemmap pages by reusing tail pages. This
>>> causes the system to overcount vmemmap usage, leading to inaccurate
>>> page statistics in /proc/vmstat.
>>>
>>> Fix this by introducing section_nr_vmemmap_pages(), which returns the exact
>>> vmemmap page count for a given pfn range based on whether optimization
>>> is in effect.
>>>
>>> Fixes: 15995a352474 ("mm: report per-page metadata information")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>>> Acked-by: Oscar Salvador <osalvador@suse.de>
>>> ---
>>> v6 -> v7:
>>> - Refine the alignment assertions in section_nr_vmemmap_pages().
>>> ---
>>> mm/sparse-vmemmap.c | 34 ++++++++++++++++++++++++++++++----
>>> 1 file changed, 30 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>>> index 3340f6d30b01..01f448607bad 100644
>>> --- a/mm/sparse-vmemmap.c
>>> +++ b/mm/sparse-vmemmap.c
>>> @@ -652,6 +652,31 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>>> }
>>> }
>>>
>>> +static int __meminit section_nr_vmemmap_pages(unsigned long pfn, unsigned long nr_pages,
>>> + 		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>>> +{
>>> + 	const unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
>>> + 	const unsigned long pages_per_compound = 1UL << order;
>>> +
>>> + 	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION));
>>> +
>>> + 	if (!vmemmap_can_optimize(altmap, pgmap))
>>> + 		return DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE);
>>> +
>>> + 	if (order < PFN_SECTION_SHIFT) {
>>> + 		VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, pages_per_compound));
>>> + 		return VMEMMAP_RESERVE_NR * nr_pages / pages_per_compound;
>>> + 	}
>>> +
>>> + 	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SECTION));
>>> + 	VM_WARN_ON_ONCE(nr_pages > PAGES_PER_SECTION);
>>
>> I would just have done that at the very top, as this check applies to all cases.
> 
> My initial reasoning was that the current formula holds for compound pages smaller
> than the section size, and we only need to impose limits when the page size exceeds
> it. While the current callers of section_nr_vmemmap_pages() don't pass sizes larger
> than a section, this will change in the future (see [1]).

A function that is called *section_* will get a range that exceeds a section?

That sounds conceptually wrong, no?


-- 
Cheers,

David



* Re: [PATCH 09/49] mm: panic on memory allocation failure in sparse_init_nid()
From: Muchun Song @ 2026-04-28  7:02 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Muchun Song, Andrew Morton, David Hildenbrand, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
	joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
In-Reply-To: <afBaMEBoR_35Fblc@kernel.org>



> On Apr 28, 2026, at 14:56, Mike Rapoport <rppt@kernel.org> wrote:
> 
> On Sun, Apr 05, 2026 at 08:52:00PM +0800, Muchun Song wrote:
>> When vmemmap pages allocation or usemap allocation fails, sparse_init_nid()
>> currently only marks the corresponding section as non-present. However,
>> subsequent code like memmap_init() iterating over PFNs does not check for
>> non-present sections, leading to invalid memory access (additional,
>> subsection_map_init() accessing the unallocated usemap as well).
>> 
>> It is complex to audit and fix all boot-time PFN iterators to handle these
>> partially initialized sections correctly. Since vmemmap and usemap allocation
>> failures are extremely rare during early boot, the more appropriate approach
>> is to expose the problem as early as possible.
>> 
>> Therefore, use BUG_ON() to panic immediately if allocation fails, instead of
>> attempting a partial recovery that leads to obscure crashes later.
>> 
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> 
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Thanks.

> 
>> ---
>> mm/sparse.c | 37 ++++++++-----------------------------
>> 1 file changed, 8 insertions(+), 29 deletions(-)
>> 
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index effdac6b0ab1..5c12b979a618 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -354,19 +354,15 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
>>    unsigned long map_count)
>> {
>> 	unsigned long pnum;
>> - 	struct page *map;
>> - 	struct mem_section *ms;
>> -
>> - 	if (sparse_usage_init(nid, map_count)) {
>> - 		pr_err("%s: node[%d] usemap allocation failed", __func__, nid);
>> - 		goto failed;
>> - 	}
>> 
>> + 	if (sparse_usage_init(nid, map_count))
>> + 		panic("The node[%d] usemap allocation failed\n", nid);
> 
> Please consider using memblock_alloc_or_panic() in sparse_usage_init(), it
> would simplify the code even more.

Hi Mike,

Yes. I have several more updates for v2. Please hold off on reviewing
the current version to avoid wasting your time; I’ll send the new one
over shortly.

Thanks.


* Re: [PATCH 10/49] mm: move subsection_map_init() into sparse_init()
From: Mike Rapoport @ 2026-04-28  7:06 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
	joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
In-Reply-To: <20260405125240.2558577-11-songmuchun@bytedance.com>

On Sun, Apr 05, 2026 at 08:52:01PM +0800, Muchun Song wrote:
> Move the initialization of the subsection map from free_area_init()
> into sparse_init(). This encapsulates the logic within the sparse
> memory initialization code.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/internal.h       |  5 ++---
>  mm/mm_init.c        | 10 ++--------
>  mm/sparse-vmemmap.c | 11 ++++++++++-
>  mm/sparse.c         |  1 +
>  4 files changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index edb1c04d0617..d70075d0e788 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1004,10 +1004,9 @@ static inline void sparse_init(void) {}
>   * mm/sparse-vmemmap.c
>   */
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
> -void sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages);
> +void sparse_init_subsection_map(void);
>  #else
> -static inline void sparse_init_subsection_map(unsigned long pfn,
> -		unsigned long nr_pages)
> +static inline void sparse_init_subsection_map(void)
>  {
>  }
>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */

A side note: we might want to split out mm/sparse.h and also move some
declarations from include/linux/mmzone.h there.

> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index a92c5053f63d..5ca4503e7622 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1857,18 +1857,12 @@ static void __init free_area_init(void)
>  			       (u64)zone_movable_pfn[i] << PAGE_SHIFT);
>  	}
>  
> -	/*
> -	 * Print out the early node map, and initialize the
> -	 * subsection-map relative to active online memory ranges to
> -	 * enable future "sub-section" extensions of the memory map.
> -	 */
> +	/* Print out the early node map. */
>  	pr_info("Early memory node ranges\n");
> -	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
> +	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
>  		pr_info("  node %3d: [mem %#018Lx-%#018Lx]\n", nid,
>  			(u64)start_pfn << PAGE_SHIFT,
>  			((u64)end_pfn << PAGE_SHIFT) - 1);
> -		sparse_init_subsection_map(start_pfn, end_pfn - start_pfn);
> -	}
>  
>  	/* Initialise every node */
>  	mminit_verify_pageflags_layout();
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 0ee03db0b22f..b7201c235419 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -603,7 +603,7 @@ static void subsection_mask_set(unsigned long *map, unsigned long pfn,
>  	bitmap_set(map, idx, end - idx + 1);
>  }
>  
> -void __init sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +static void __init sparse_init_subsection_map_range(unsigned long pfn, unsigned long nr_pages)
>  {
>  	int end_sec_nr = pfn_to_section_nr(pfn + nr_pages - 1);
>  	unsigned long nr, start_sec_nr = pfn_to_section_nr(pfn);
> @@ -626,6 +626,15 @@ void __init sparse_init_subsection_map(unsigned long pfn, unsigned long nr_pages
>  	}
>  }
>  
> +void __init sparse_init_subsection_map(void)
> +{
> +	int i, nid;
> +	unsigned long start, end;
> +
> +	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid)
> +		sparse_init_subsection_map_range(start, end - start);
> +}
> +
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  
>  /* Mark all memory sections within the pfn range as online */
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 5c12b979a618..c7f91dc2e5b5 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -424,5 +424,6 @@ void __init sparse_init(void)
>  	}
>  	/* cover the last node */
>  	sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
> +	sparse_init_subsection_map();
>  	vmemmap_populate_print_last();
>  }
> -- 
> 2.20.1
> 

-- 
Sincerely yours,
Mike.
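
As an illustration of what the relocated subsection-map initialization does, here is a simplified userspace model, not the kernel implementation. The constants assume 4 KiB pages, 128 MiB sections and 2 MiB subsections (so 64 subsection bits per section); real values depend on architecture and configuration. All model_* names and the fixed-size section array are hypothetical. The per-section splitting loop is reconstructed from the fragments quoted in the patch and should be read as an assumption.

```c
#include <stdint.h>

/* Assumed geometry: 4 KiB pages, 128 MiB sections, 2 MiB subsections. */
#define MODEL_PAGES_PER_SECTION		32768UL
#define MODEL_PAGES_PER_SUBSECTION	512UL
#define MODEL_NR_SECTIONS		4

/* One 64-bit subsection bitmap per section (32768 / 512 = 64 bits). */
static uint64_t model_subsection_map[MODEL_NR_SECTIONS];

/* Hypothetical stand-in for one entry of the early memory range list. */
struct model_range {
	unsigned long start;	/* first PFN */
	unsigned long end;	/* one past the last PFN */
};

/* Set the bits covering [pfn, pfn + nr_pages) inside a single section. */
static void model_subsection_mask_set(uint64_t *map, unsigned long pfn,
				      unsigned long nr_pages)
{
	unsigned long idx = (pfn % MODEL_PAGES_PER_SECTION) /
			    MODEL_PAGES_PER_SUBSECTION;
	unsigned long end = ((pfn + nr_pages - 1) % MODEL_PAGES_PER_SECTION) /
			    MODEL_PAGES_PER_SUBSECTION;
	unsigned long i;

	for (i = idx; i <= end; i++)
		*map |= UINT64_C(1) << i;
}

/* Split one PFN range at section boundaries and mark each piece. */
static void model_init_subsection_map_range(unsigned long pfn,
					    unsigned long nr_pages)
{
	while (nr_pages) {
		unsigned long nr = pfn / MODEL_PAGES_PER_SECTION;
		unsigned long room = MODEL_PAGES_PER_SECTION -
				     (pfn % MODEL_PAGES_PER_SECTION);
		unsigned long pfns = nr_pages < room ? nr_pages : room;

		if (nr >= MODEL_NR_SECTIONS)	/* outside the model's array */
			break;
		model_subsection_mask_set(&model_subsection_map[nr], pfn, pfns);
		pfn += pfns;
		nr_pages -= pfns;
	}
}

/* Counterpart of the new argument-less wrapper: walk every memory range. */
static void model_init_subsection_map(const struct model_range *ranges, int n)
{
	for (int i = 0; i < n; i++)
		model_init_subsection_map_range(ranges[i].start,
						ranges[i].end - ranges[i].start);
}
```

A range that straddles a section boundary ends up setting the last bit of one section's map and the first bit of the next, which is why the per-range helper has to split at section boundaries before touching a bitmap.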



* Re: [PATCH 11/49] mm: defer sparse_init() until after zone initialization
From: Mike Rapoport @ 2026-04-28  7:15 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
	joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
In-Reply-To: <20260405125240.2558577-12-songmuchun@bytedance.com>

On Sun, Apr 05, 2026 at 08:52:02PM +0800, Muchun Song wrote:
> According to the comment of free_area_init(), its main goal is to
> initialise all pg_data_t and zone data. However, sparse_init() and
> memmap_init() are aimed at allocating vmemmap pages and initializing
> struct page respectively, which differs from the goal of free_area_init().
> Therefore, it is reasonable to move them out of free_area_init().
> 
> Call sparse_init() after free_area_init() to guarantee that zone data
> structures are available when sparse_init() executes. This change is a
> prerequisite for integrating vmemmap initialization steps and allows
> sparse_init() to safely access zone information if needed (e.g. HVO case).
> 
> Also, move hugetlb reservation functions (hugetlb_cma_reserve() and
> hugetlb_bootmem_alloc()) to be after free_area_init(). This allows
> hugetlb reservation to access zone information to ensure that contiguous
> pages are not allocated across zone boundaries, which simplifies the
> hugetlb code. So this is a preparation for subsequent changes.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/mm_init.c | 15 ++++++++-------
>  mm/sparse.c  |  3 ---
>  2 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index c7f91dc2e5b5..5fe0a7e66775 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -406,9 +406,6 @@ void __init sparse_init(void)
>  	pnum_begin = first_present_section_nr();
>  	nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
>  
> -	/* Setup pageblock_order for HUGETLB_PAGE_SIZE_VARIABLE */
> -	set_pageblock_order();
> -

This does not seem related to this patch. Otherwise

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

>  	for_each_present_section_nr(pnum_begin + 1, pnum_end) {
>  		int nid = sparse_early_nid(__nr_to_section(pnum_end));
>  
> -- 
> 2.20.1
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH 12/49] mm: make set_pageblock_order() static
From: Mike Rapoport @ 2026-04-28  7:17 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
	joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
In-Reply-To: <20260405125240.2558577-13-songmuchun@bytedance.com>

On Sun, Apr 05, 2026 at 08:52:03PM +0800, Muchun Song wrote:
> Since set_pageblock_order() is only used in mm/mm_init.c now, make it
> static and remove its declaration from mm/internal.h.
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

The removal of set_pageblock_order() from sparse.c should be moved here as
well :)
With that

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/internal.h | 1 -
>  mm/mm_init.c  | 4 ++--
>  2 files changed, 2 insertions(+), 3 deletions(-)

-- 
Sincerely yours,
Mike.



* Re: [PATCH 13/49] mm: integrate sparse_vmemmap_init_nid_late() into sparse_init_nid()
From: Mike Rapoport @ 2026-04-28  7:18 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
	joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
In-Reply-To: <20260405125240.2558577-14-songmuchun@bytedance.com>

On Sun, Apr 05, 2026 at 08:52:04PM +0800, Muchun Song wrote:
> Move the call to sparse_vmemmap_init_nid_late() from mm_core_init_early()
> into sparse_init_nid().
> 
> Since sparse_init() has been deferred until after zone initialization,
> the zone data structures are now available during sparse_init(). This
> satisfies the requirements of sparse_vmemmap_init_nid_late(), allowing
> it to be moved safely.
> 
> This change unifies the vmemmap initialization steps by placing both
> sparse_vmemmap_init_nid_early() and sparse_vmemmap_init_nid_late()
> within the sparse memory initialization logic, making the code structure
> clearer.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/mm_init.c | 4 ----
>  mm/sparse.c  | 2 ++
>  2 files changed, 2 insertions(+), 4 deletions(-)

-- 
Sincerely yours,
Mike.



* Re: [PATCH v7 4/6] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
From: Muchun Song @ 2026-04-28  7:24 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Muchun Song, Andrew Morton, Oscar Salvador, Michael Ellerman,
	Madhavan Srinivasan, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Nicholas Piggin, Christophe Leroy, aneesh.kumar, joao.m.martins,
	linux-mm, linuxppc-dev, linux-kernel, stable
In-Reply-To: <5dd84f9c-4ce2-4bc8-b644-e865f0623ba3@kernel.org>



> On Apr 28, 2026, at 15:00, David Hildenbrand (Arm) <david@kernel.org> wrote:
> 
> On 4/28/26 04:21, Muchun Song wrote:
>> 
>> 
>>> On Apr 27, 2026, at 18:17, David Hildenbrand (Arm) <david@kernel.org> wrote:
>>> 
>>> On 4/26/26 11:26, Muchun Song wrote:
>>>> When vmemmap optimization is enabled for DAX, the nr_memmap_pages
>>>> counter in /proc/vmstat is incorrect. The current code always accounts
>>>> for the full, non-optimized vmemmap size, but vmemmap optimization
>>>> reduces the actual number of vmemmap pages by reusing tail pages. This
>>>> causes the system to overcount vmemmap usage, leading to inaccurate
>>>> page statistics in /proc/vmstat.
>>>> 
>>>> Fix this by introducing section_nr_vmemmap_pages(), which returns the exact
>>>> vmemmap page count for a given pfn range based on whether optimization
>>>> is in effect.
>>>> 
>>>> Fixes: 15995a352474 ("mm: report per-page metadata information")
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>>> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>>>> Acked-by: Oscar Salvador <osalvador@suse.de>
>>>> ---
>>>> v6 -> v7:
>>>> - Refine the alignment assertions in section_nr_vmemmap_pages().
>>>> ---
>>>> mm/sparse-vmemmap.c | 34 ++++++++++++++++++++++++++++++----
>>>> 1 file changed, 30 insertions(+), 4 deletions(-)
>>>> 
>>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>>>> index 3340f6d30b01..01f448607bad 100644
>>>> --- a/mm/sparse-vmemmap.c
>>>> +++ b/mm/sparse-vmemmap.c
>>>> @@ -652,6 +652,31 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>>>> }
>>>> }
>>>> 
>>>> +static int __meminit section_nr_vmemmap_pages(unsigned long pfn, unsigned long nr_pages,
>>>> +  struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>>>> +{
>>>> +  const unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
>>>> +  const unsigned long pages_per_compound = 1UL << order;
>>>> +
>>>> +  VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION));
>>>> +
>>>> +  if (!vmemmap_can_optimize(altmap, pgmap))
>>>> +  return DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE);
>>>> +
>>>> +  if (order < PFN_SECTION_SHIFT) {
>>>> +  VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, pages_per_compound));
>>>> +  return VMEMMAP_RESERVE_NR * nr_pages / pages_per_compound;
>>>> +  }
>>>> +
>>>> +  VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SECTION));
>>>> +  VM_WARN_ON_ONCE(nr_pages > PAGES_PER_SECTION);
>>> 
>>> I would just have done that at the very top, as this check applies to all cases.
>> 
>> My initial reasoning was that the current formula holds for compound pages smaller
>> than the section size, and we only need to impose limits when the page size exceeds
>> it. While the current callers of section_nr_vmemmap_pages() don't pass sizes larger
>> than a section, this will change in the future (see [1]).
> 
> A function that is called *section_* will get a range that exceeds a section?
> 
> That sounds conceptually wrong, no?

It does seem a bit ambiguous. I will rename it to something more appropriate if I
expand its functionality in the future. For this series, I will send a v8 that moves
VM_WARN_ON_ONCE(nr_pages > PAGES_PER_SECTION); to the top of this function.

Thanks.
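
To make the accounting arithmetic under discussion concrete, here is a small userspace model of the quoted function with the size check hoisted to the top. It is a sketch under stated assumptions, not the v8 code: the constants assume 4 KiB pages, a 64-byte struct page, PFN_SECTION_SHIFT of 15 and VMEMMAP_RESERVE_NR of 2, the vmemmap_can_optimize() result is passed in as a plain flag, and the alignment assertions are left out. The quoted hunk ends before the return for compound pages of section size or larger, so the model returns -1 for that case instead of guessing.

```c
#include <stdbool.h>

/* Assumed values; the real ones depend on architecture and configuration. */
#define MODEL_PAGE_SIZE			4096UL
#define MODEL_STRUCT_PAGE_SIZE		64UL
#define MODEL_PFN_SECTION_SHIFT		15U
#define MODEL_PAGES_PER_SECTION		(1UL << MODEL_PFN_SECTION_SHIFT)
#define MODEL_VMEMMAP_RESERVE_NR	2UL

/* Counts how often a modelled VM_WARN_ON_ONCE() condition was true. */
static unsigned int model_warnings;

static void model_warn_on(bool cond)
{
	if (cond)
		model_warnings++;
}

/*
 * Model of section_nr_vmemmap_pages(): number of vmemmap pages backing
 * nr_pages of memory. 'order' stands in for pgmap->vmemmap_shift and
 * 'can_optimize' for the result of vmemmap_can_optimize().
 */
static long model_section_nr_vmemmap_pages(unsigned long nr_pages,
					   unsigned int order,
					   bool can_optimize)
{
	const unsigned long pages_per_compound = 1UL << order;

	/* Hoisted check: a per-section helper should never see more. */
	model_warn_on(nr_pages > MODEL_PAGES_PER_SECTION);

	/* No optimization: one struct page per page, rounded up to pages. */
	if (!can_optimize)
		return (nr_pages * MODEL_STRUCT_PAGE_SIZE +
			MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

	/* Optimized: only the reserved vmemmap pages per compound page. */
	if (order < MODEL_PFN_SECTION_SHIFT)
		return MODEL_VMEMMAP_RESERVE_NR * nr_pages / pages_per_compound;

	/* Not modelled: the quoted hunk ends before this return value. */
	return -1;
}
```

Under these assumptions a full section of 32768 pages needs 512 vmemmap pages without optimization and 128 with 2 MiB compound pages (order 9), which is the overcount the patch is meant to remove from nr_memmap_pages.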

> 
> 
> -- 
> Cheers,
> 
> David





* Re: [PATCH 14/49] mm/cma: validate hugetlb CMA range by zone at reserve time
From: Mike Rapoport @ 2026-04-28  7:30 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
	joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
In-Reply-To: <20260405125240.2558577-15-songmuchun@bytedance.com>

On Sun, Apr 05, 2026 at 08:52:05PM +0800, Muchun Song wrote:
> During hugetlb_cma_reserve() we already have access to zone information, so we
> can validate that the reserved CMA range does not span multiple zones.
> 
> Doing this check up front allows future hugetlb allocations from CMA to assume
> zone-valid CMA areas, avoiding additional validity checks and potential
> fallback/rollback paths, greatly simplifying the code.
> 
> The pfn_valid() check is removed from cma_validate_zones() because mem_section is
> not initialized at that stage and it can trigger false warnings; keep the
> sanity check in cma_activate_area() instead. This is preparatory work for the
> follow-up simplification.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/cma.c         | 3 ++-
>  mm/hugetlb_cma.c | 3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/cma.c b/mm/cma.c
> index 15cc0ae76c8e..dd046a23f467 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -125,7 +125,6 @@ bool cma_validate_zones(struct cma *cma)
>  		 * to be in the same zone. Simplify by forcing the entire
>  		 * CMA resv range to be in the same zone.
>  		 */
> -		WARN_ON_ONCE(!pfn_valid(base_pfn));
>  		if (pfn_range_intersects_zones(cma->nid, base_pfn, cmr->count)) {
>  			set_bit(CMA_ZONES_INVALID, &cma->flags);
>  			return false;
> @@ -164,6 +163,8 @@ static void __init cma_activate_area(struct cma *cma)
>  			bitmap_set(cmr->bitmap, 0, bitmap_count);
>  		}
>  
> +		WARN_ON_ONCE(!pfn_valid(cmr->base_pfn));
> +
>  		for (pfn = early_pfn[r]; pfn < cmr->base_pfn + cmr->count;
>  		     pfn += pageblock_nr_pages)
>  			init_cma_reserved_pageblock(pfn_to_page(pfn));
> diff --git a/mm/hugetlb_cma.c b/mm/hugetlb_cma.c
> index f83ae4998990..b068e9bf6537 100644
> --- a/mm/hugetlb_cma.c
> +++ b/mm/hugetlb_cma.c
> @@ -233,9 +233,10 @@ void __init hugetlb_cma_reserve(void)
>  		res = cma_declare_contiguous_multi(size, PAGE_SIZE << order,
>  					HUGETLB_PAGE_ORDER, name,
>  					&hugetlb_cma[nid], nid);
> -		if (res) {
> +		if (res || !cma_validate_zones(hugetlb_cma[nid])) {
>  			pr_warn("hugetlb_cma: reservation failed: err %d, node %d",
>  				res, nid);

The warning here should be updated as well. Other than that

Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
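
One way to read the remark about the warning: with the combined condition, a range that was reserved successfully (res == 0) but fails zone validation would be reported as "reservation failed: err 0". The userspace sketch below shows one possible way to tell the two cases apart. The helper name and the wording of the second message are hypothetical and are not taken from the patch or from any later revision.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Format the warning for a failed hugetlb CMA reservation on node 'nid'.
 * 'res' models the return value of the reservation call and 'zones_valid'
 * the result of the zone validation. Returns the snprintf() result, or 0
 * with an empty buffer when there is nothing to warn about.
 */
static int model_format_cma_warning(char *buf, size_t len, int res,
				    bool zones_valid, int nid)
{
	/* The reservation itself failed: keep the existing message. */
	if (res)
		return snprintf(buf, len,
				"hugetlb_cma: reservation failed: err %d, node %d\n",
				res, nid);

	/* Reserved, but the range crosses a zone boundary (hypothetical text). */
	if (!zones_valid)
		return snprintf(buf, len,
				"hugetlb_cma: reserved range spans multiple zones, node %d\n",
				nid);

	if (len)
		buf[0] = '\0';
	return 0;
}
```

Keeping the two messages separate avoids printing a meaningless "err 0" when only the zone check fails.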

> +			hugetlb_cma[nid] = NULL;
>  			continue;
>  		}
>  
> -- 
> 2.20.1
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH 08/49] mm: Convert vmemmap_p?d_populate() to static functions
From: Mike Rapoport @ 2026-04-28  7:31 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
	joao.m.martins, linux-mm, linuxppc-dev, linux-kernel, Chengkaitao
In-Reply-To: <20260405125240.2558577-9-songmuchun@bytedance.com>

On Sun, Apr 05, 2026 at 08:51:59PM +0800, Muchun Song wrote:
> From: Chengkaitao <chengkaitao@kylinos.cn>
> 
> Since the vmemmap_p?d_populate functions are unused outside the mm
> subsystem, we can remove their external declarations and convert
> them to static functions.

I think this is already merged
 
> Signed-off-by: Chengkaitao <chengkaitao@kylinos.cn>
> ---
>  include/linux/mm.h  |  7 -------
>  mm/sparse-vmemmap.c | 10 +++++-----
>  2 files changed, 5 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bebc5f892f81..aa8c05de7585 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4860,13 +4860,6 @@ unsigned long section_map_size(void);
>  struct page * __populate_section_memmap(unsigned long pfn,
>  		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
>  		struct dev_pagemap *pgmap);
> -pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
> -p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
> -pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
> -pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
> -pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
> -			    struct vmem_altmap *altmap, unsigned long ptpfn,
> -			    unsigned long flags);
>  void *vmemmap_alloc_block(unsigned long size, int node);
>  struct vmem_altmap;
>  void *vmemmap_alloc_block_buf(unsigned long size, int node,
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index d3096de04cc6..0ee03db0b22f 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -151,7 +151,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>  			start, end - 1);
>  }
>  
> -pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
> +static pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
>  				       struct vmem_altmap *altmap,
>  				       unsigned long ptpfn, unsigned long flags)
>  {
> @@ -195,7 +195,7 @@ static void * __meminit vmemmap_alloc_block_zero(unsigned long size, int node)
>  	return p;
>  }
>  
> -pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
> +static pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
>  {
>  	pmd_t *pmd = pmd_offset(pud, addr);
>  	if (pmd_none(*pmd)) {
> @@ -208,7 +208,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
>  	return pmd;
>  }
>  
> -pud_t * __meminit vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node)
> +static pud_t * __meminit vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node)
>  {
>  	pud_t *pud = pud_offset(p4d, addr);
>  	if (pud_none(*pud)) {
> @@ -221,7 +221,7 @@ pud_t * __meminit vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node)
>  	return pud;
>  }
>  
> -p4d_t * __meminit vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node)
> +static p4d_t * __meminit vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node)
>  {
>  	p4d_t *p4d = p4d_offset(pgd, addr);
>  	if (p4d_none(*p4d)) {
> @@ -234,7 +234,7 @@ p4d_t * __meminit vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node)
>  	return p4d;
>  }
>  
> -pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
> +static pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
>  {
>  	pgd_t *pgd = pgd_offset_k(addr);
>  	if (pgd_none(*pgd)) {
> -- 
> 2.20.1
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH 09/49] mm: panic on memory allocation failure in sparse_init_nid()
From: Mike Rapoport @ 2026-04-28  7:32 UTC (permalink / raw)
  To: Muchun Song
  Cc: Muchun Song, Andrew Morton, David Hildenbrand, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
	joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
In-Reply-To: <96DB49C7-A7C2-4F53-8321-FF4A4ECDFF95@linux.dev>

Hi Muchun,

On Tue, Apr 28, 2026 at 03:02:14PM +0800, Muchun Song wrote:
> 
> >> diff --git a/mm/sparse.c b/mm/sparse.c
> >> index effdac6b0ab1..5c12b979a618 100644
> >> --- a/mm/sparse.c
> >> +++ b/mm/sparse.c
> >> @@ -354,19 +354,15 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
> >>    unsigned long map_count)
> >> {
> >> 	unsigned long pnum;
> >> - 	struct page *map;
> >> - 	struct mem_section *ms;
> >> -
> >> - 	if (sparse_usage_init(nid, map_count)) {
> >> - 		pr_err("%s: node[%d] usemap allocation failed", __func__, nid);
> >> - 		goto failed;
> >> - 	}
> >> 
> >> + 	if (sparse_usage_init(nid, map_count))
> >> + 		panic("The node[%d] usemap allocation failed\n", nid);
> > 
> > Please consider using memblock_alloc_or_panic() in sparse_usage_init(), it
> > would simplify the code even more.
> 
> Hi Mike,
> 
> Yes. I have several more updates for v2. Please hold off on reviewing
> the current version to avoid wasting your time; I’ll send the new one
> over shortly.

Thanks for the heads up!
I'll stop for now :)
 
> Thanks.

-- 
Sincerely yours,
Mike.



* Re: [PATCH 09/49] mm: panic on memory allocation failure in sparse_init_nid()
From: Muchun Song @ 2026-04-28  7:57 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Muchun Song, Andrew Morton, David Hildenbrand, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
	joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
In-Reply-To: <afBilACQByFLa-hi@kernel.org>



> On Apr 28, 2026, at 15:32, Mike Rapoport <rppt@kernel.org> wrote:
> 
> Hi Muchun,
> 
> On Tue, Apr 28, 2026 at 03:02:14PM +0800, Muchun Song wrote:
>> 
>>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>>> index effdac6b0ab1..5c12b979a618 100644
>>>> --- a/mm/sparse.c
>>>> +++ b/mm/sparse.c
>>>> @@ -354,19 +354,15 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
>>>>   unsigned long map_count)
>>>> {
>>>> unsigned long pnum;
>>>> -  struct page *map;
>>>> -  struct mem_section *ms;
>>>> -
>>>> -  if (sparse_usage_init(nid, map_count)) {
>>>> -  pr_err("%s: node[%d] usemap allocation failed", __func__, nid);
>>>> -  goto failed;
>>>> -  }
>>>> 
>>>> +  if (sparse_usage_init(nid, map_count))
>>>> +  panic("The node[%d] usemap allocation failed\n", nid);
>>> 
>>> Please consider using memblock_alloc_or_panic() in sparse_usage_init(), it
>>> would simplify the code even more.
>> 
>> Hi Mike,
>> 
>> Yes. I have several more updates for v2. Please hold off on reviewing
>> the current version to avoid wasting your time; I’ll send the new one
>> over shortly.
> 
> Thanks for the heads up!
> I'll stop for now :)

Thanks for the quick response!

To clarify, the first few patches didn't change much, so your feedback
on those is still very relevant and much appreciated. The major updates
are in the later parts of the series, so I'm glad I caught you before
you spent time on those.

I'll send the new version shortly.

Thanks again!

> 
>> Thanks.
> 
> -- 
> Sincerely yours,
> Mike.




^ permalink raw reply

* Re: [PATCH] KVM: PPC: Book3S HV: Add H_FAC_UNAVAIL mapping for tracing exits
From: Gautam Menghani @ 2026-04-28  8:02 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Gautam Menghani, maddy, npiggin, mpe, chleroy, linuxppc-dev, kvm,
	linux-kernel
In-Reply-To: <lde9s0qx.ritesh.list@gmail.com>

On Mon, Apr 27, 2026 at 06:00:46AM +0530, Ritesh Harjani wrote:
> Gautam Menghani <Gautam.Menghani@ibm.com> writes:
> 
> > From: Gautam Menghani <gautam@linux.ibm.com>
> >
> > The macro kvm_trace_symbol_exit is used for providing the mappings
> > for the trap vectors and their names. Add mapping for H_FAC_UNAVAIL so that
> > trap reason is displayed as string instead of a vector number when using
> > the kvm_guest_exit tracepoint.
> >
> > Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
> > ---
> >  arch/powerpc/kvm/trace_book3s.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kvm/trace_book3s.h b/arch/powerpc/kvm/trace_book3s.h
> > index 9260ddbd557f..eafbaeb5a9db 100644
> > --- a/arch/powerpc/kvm/trace_book3s.h
> > +++ b/arch/powerpc/kvm/trace_book3s.h
> > @@ -28,6 +28,7 @@
> >  	{0xea0, "H_VIRT"}, \
> >  	{0xf00, "PERFMON"}, \
> >  	{0xf20, "ALTIVEC"}, \
> > -	{0xf40, "VSX"}
> > +	{0xf40, "VSX"}, \
> > +	{0xf80, "H_FAC_UNAVAIL"},
> 
> So I ended up looking into other trace_symbols too and I don't think we
> should have a trailing comma here. It may cause some issues silently in
> trace-cmd or other places. I remember going through it when I was doing
> some other tracing work earlier but I don't recollect it now.
> 
> Can we just remove the trailing comma and send a v2 please?

Good catch, thanks. Will send a v2


> 
> btw, 0XF80 is correct as per "arch/powerpc/include/asm/kvm_asm.h"
> 
> #define BOOK3S_INTERRUPT_H_FAC_UNAVAIL	0xf80
> 
> -ritesh
> 


^ permalink raw reply

* [PATCH v8 0/6] mm: Fix vmemmap optimization accounting and initialization
From: Muchun Song @ 2026-04-28  8:18 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song

This series fixes several bugs in vmemmap optimization, mainly around
incorrect page accounting and memmap initialization in DAX and memory
hotplug paths. It also fixes pageblock migratetype initialization and
struct page initialization for ZONE_DEVICE compound pages.

Patches 1-4 fix vmemmap accounting issues. Patch 1 fixes an accounting
underflow in the section activation failure path by moving vmemmap page
accounting into the lower-level allocation and freeing helpers. Patch 2
fixes incorrect altmap passing in the memory hotplug error path. Patch 3
passes pgmap through memory deactivation paths so the teardown side can
determine whether vmemmap optimization was in effect. Patch 4 uses that
information to account the optimized DAX vmemmap size correctly.

Patches 5-6 fix initialization issues in mm/mm_init. Patch 5 makes sure
all pageblocks in ZONE_DEVICE compound pages get their migratetype
initialized. Patch 6 fixes a case where DAX memory hotplug reuses an
unoptimized early-section memmap while compound_nr_pages() still assumes
vmemmap optimization, leaving tail struct pages uninitialized.

Changes in v8:
- In patch 4, move VM_WARN_ON_ONCE(nr_pages > PAGES_PER_SECTION) to the
  top of section_nr_vmemmap_pages().
- In patch 4, add Acked-by from David Hildenbrand.

Muchun Song (6):
  mm/sparse-vmemmap: Fix vmemmap accounting underflow
  mm/memory_hotplug: Fix incorrect altmap passing in error path
  mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths
  mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
  mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages
  mm/mm_init: Fix uninitialized struct pages for ZONE_DEVICE

 arch/arm64/mm/mmu.c            |  5 +--
 arch/loongarch/mm/init.c       |  5 +--
 arch/powerpc/mm/mem.c          |  5 +--
 arch/riscv/mm/init.c           |  5 +--
 arch/s390/mm/init.c            |  5 +--
 arch/x86/mm/init_64.c          |  5 +--
 include/linux/memory_hotplug.h |  8 +++--
 mm/memory_hotplug.c            | 13 ++++----
 mm/memremap.c                  |  4 +--
 mm/mm_init.c                   | 47 ++++++++++++++++-----------
 mm/sparse-vmemmap.c            | 58 ++++++++++++++++++++++++++--------
 11 files changed, 105 insertions(+), 55 deletions(-)

base-commit: 39704f00f747aba3144289870b5fd8ac230a9aaf
-- 
2.20.1

^ permalink raw reply

* [PATCH v8 1/6] mm/sparse-vmemmap: Fix vmemmap accounting underflow
From: Muchun Song @ 2026-04-28  8:18 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song, stable
In-Reply-To: <20260428081855.1249045-1-songmuchun@bytedance.com>

In section_activate(), if populate_section_memmap() fails, the error
handling path calls section_deactivate() to roll back the state. This
causes a vmemmap accounting imbalance.

Since commit c3576889d87b ("mm: fix accounting of memmap pages"),
memmap pages are accounted for only after populate_section_memmap()
succeeds. However, the failure path unconditionally calls
section_deactivate(), which decreases the vmemmap count. Consequently,
a failure in populate_section_memmap() leads to an accounting underflow,
incorrectly reducing the system's tracked vmemmap usage.

Fix this more thoroughly by moving all accounting calls into the lower
level functions that actually perform the vmemmap allocation and freeing:

  - populate_section_memmap() accounts for newly allocated vmemmap pages
  - depopulate_section_memmap() unaccounts when vmemmap is freed

This ensures proper accounting in all code paths, including error
handling and early section cases.
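
The imbalance described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not kernel code: the
counter memmap_pages, the helper toy_alloc() and the old_*/new_* functions
are hypothetical stand-ins for the accounting counter and for the
activate/populate paths named in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's memmap page counter (hypothetical). */
static long memmap_pages;

/* Pretend allocation: returns false to simulate an allocation failure. */
static bool toy_alloc(bool fail)
{
	return !fail;
}

/* Old scheme: accounting lives in the activate/deactivate callers. */
static void old_deactivate(long nr)
{
	memmap_pages -= nr;		/* unconditional decrement on teardown */
}

static bool old_activate(long nr, bool fail)
{
	if (!toy_alloc(fail)) {
		old_deactivate(nr);	/* rollback decrements ... */
		return false;		/* ... although nothing was added */
	}
	memmap_pages += nr;		/* accounted only after success */
	return true;
}

/* New scheme: accounting lives in the populate/depopulate helpers. */
static bool new_populate(long nr, bool fail)
{
	bool ok = toy_alloc(fail);

	memmap_pages += nr;		/* accounted inside populate */
	return ok;
}

static void new_depopulate(long nr)
{
	memmap_pages -= nr;		/* unaccounted inside depopulate */
}

static bool new_activate(long nr, bool fail)
{
	if (!new_populate(nr, fail)) {
		new_depopulate(nr);	/* rollback undoes exactly one add */
		return false;
	}
	return true;
}

/* Counter value after a single failed activation of nr vmemmap pages. */
static long counter_after_failure(bool use_new, long nr)
{
	memmap_pages = 0;
	if (use_new)
		new_activate(nr, true);
	else
		old_activate(nr, true);
	return memmap_pages;
}
```

In this model a failed activation leaves the old scheme's counter below
zero, while the new scheme's add and subtract always pair up because both
live next to the (simulated) allocation and free.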

Fixes: c3576889d87b ("mm: fix accounting of memmap pages")
Cc: stable@vger.kernel.org
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/sparse-vmemmap.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 6eadb9d116e4..a7b11248b989 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -656,7 +656,12 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap)
 {
-	return __populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
+	struct page *page = __populate_section_memmap(pfn, nr_pages, nid, altmap,
+						      pgmap);
+
+	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
+
+	return page;
 }
 
 static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
@@ -665,13 +670,17 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
 
+	memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
 	vmemmap_free(start, end, altmap);
 }
+
 static void free_map_bootmem(struct page *memmap)
 {
 	unsigned long start = (unsigned long)memmap;
 	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
 
+	memmap_boot_pages_add(-1L * (DIV_ROUND_UP(PAGES_PER_SECTION * sizeof(struct page),
+						  PAGE_SIZE)));
 	vmemmap_free(start, end, NULL);
 }
 
@@ -774,14 +783,10 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 	 * The memmap of early sections is always fully populated. See
 	 * section_activate() and pfn_valid() .
 	 */
-	if (!section_is_early) {
-		memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
+	if (!section_is_early)
 		depopulate_section_memmap(pfn, nr_pages, altmap);
-	} else if (memmap) {
-		memmap_boot_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page),
-							  PAGE_SIZE)));
+	else if (memmap)
 		free_map_bootmem(memmap);
-	}
 
 	if (empty)
 		ms->section_mem_map = (unsigned long)NULL;
@@ -826,7 +831,6 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
 		section_deactivate(pfn, nr_pages, altmap);
 		return ERR_PTR(-ENOMEM);
 	}
-	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
 
 	return memmap;
 }
-- 
2.20.1



^ permalink raw reply related

* [PATCH v8 2/6] mm/memory_hotplug: Fix incorrect altmap passing in error path
From: Muchun Song @ 2026-04-28  8:18 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song, stable
In-Reply-To: <20260428081855.1249045-1-songmuchun@bytedance.com>

In create_altmaps_and_memory_blocks(), when arch_add_memory() succeeds
with memmap_on_memory enabled, the vmemmap pages are allocated from
params.altmap. If create_memory_block_devices() subsequently fails, the
error path calls arch_remove_memory() with a NULL altmap instead of
params.altmap.

This is a bug that could lead to memory corruption. Since altmap is
NULL, vmemmap_free() falls back to freeing the vmemmap pages into the
system buddy allocator via free_pages() instead of the altmap.
arch_remove_memory() then immediately destroys the physical linear
mapping for this memory. This injects unowned pages into the buddy
allocator, causing machine checks or memory corruption if the system
later attempts to allocate and use those freed pages.

Fix this by passing params.altmap to arch_remove_memory() in the error
path.

Fixes: 6b8f0798b85a ("mm/memory_hotplug: split memmap_on_memory requests across memblocks")
Cc: stable@vger.kernel.org
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/memory_hotplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4426abb05655..e3352284f635 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1469,7 +1469,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
 		ret = create_memory_block_devices(cur_start, memblock_size, nid,
 						  params.altmap, group);
 		if (ret) {
-			arch_remove_memory(cur_start, memblock_size, NULL);
+			arch_remove_memory(cur_start, memblock_size, params.altmap);
 			kfree(params.altmap);
 			goto out;
 		}
-- 
2.20.1

^ permalink raw reply related

* [PATCH v8 3/6] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths
From: Muchun Song @ 2026-04-28  8:18 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song
In-Reply-To: <20260428081855.1249045-1-songmuchun@bytedance.com>

Currently, the memory hot-remove call chain -- arch_remove_memory(),
__remove_pages(), sparse_remove_section() and section_deactivate() --
does not carry the struct dev_pagemap pointer. This prevents the lower
levels from knowing whether the section was originally populated with
vmemmap optimizations (e.g., DAX with vmemmap optimization enabled).

Without this information, we cannot call vmemmap_can_optimize() to
determine if the vmemmap pages were optimized. As a result, the vmemmap
page accounting during teardown will mistakenly assume a non-optimized
allocation, leading to incorrect memmap statistics.

To lay the groundwork for fixing the vmemmap page accounting, we need
to pass the @pgmap pointer down to the deactivation location. Plumb the
@pgmap argument through the APIs of arch_remove_memory(), __remove_pages()
and sparse_remove_section(), mirroring the corresponding *_activate()
paths.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
---
 arch/arm64/mm/mmu.c            |  5 +++--
 arch/loongarch/mm/init.c       |  5 +++--
 arch/powerpc/mm/mem.c          |  5 +++--
 arch/riscv/mm/init.c           |  5 +++--
 arch/s390/mm/init.c            |  5 +++--
 arch/x86/mm/init_64.c          |  5 +++--
 include/linux/memory_hotplug.h |  8 +++++---
 mm/memory_hotplug.c            | 13 +++++++------
 mm/memremap.c                  |  4 ++--
 mm/sparse-vmemmap.c            | 12 ++++++------
 10 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index dd85e093ffdb..e5a42b7a0160 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -2024,12 +2024,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return ret;
 }
 
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 	__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
 }
 
diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
index 3f9ab54114c5..055ecd2c8fd9 100644
--- a/arch/loongarch/mm/init.c
+++ b/arch/loongarch/mm/init.c
@@ -119,7 +119,8 @@ int arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params)
 	return ret;
 }
 
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -128,7 +129,7 @@ void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 	/* With altmap the first mapped page is offset from @start */
 	if (altmap)
 		page += vmem_altmap_offset(altmap);
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 }
 #endif
 
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 648d0c5602ec..4c1afab91996 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -158,12 +158,13 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
 	return rc;
 }
 
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			      struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 	arch_remove_linear_mapping(start, size);
 }
 #endif
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index decd7df40fa4..b0092fb842a3 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1717,9 +1717,10 @@ int __ref arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *param
 	return ret;
 }
 
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			      struct dev_pagemap *pgmap)
 {
-	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap);
+	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap, pgmap);
 	remove_linear_mapping(start, size);
 	flush_tlb_all();
 }
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 1f72efc2a579..11a689423440 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -276,12 +276,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return rc;
 }
 
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 	vmem_remove_mapping(start, size);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index df2261fa4f98..77b889b71cf3 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1288,12 +1288,13 @@ kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 	remove_pagetable(start, end, true, NULL);
 }
 
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			      struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 	kernel_physical_mapping_remove(start, start + size);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 815e908c4135..7c9d66729c60 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -135,9 +135,10 @@ static inline bool movable_node_is_enabled(void)
 	return movable_node_enabled;
 }
 
-extern void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap);
+extern void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			       struct dev_pagemap *pgmap);
 extern void __remove_pages(unsigned long start_pfn, unsigned long nr_pages,
-			   struct vmem_altmap *altmap);
+			   struct vmem_altmap *altmap, struct dev_pagemap *pgmap);
 
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
@@ -307,7 +308,8 @@ extern int sparse_add_section(int nid, unsigned long pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap);
 extern void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
-				  struct vmem_altmap *altmap);
+				  struct vmem_altmap *altmap,
+				  struct dev_pagemap *pgmap);
 extern struct zone *zone_for_pfn_range(enum mmop online_type,
 		int nid, struct memory_group *group, unsigned long start_pfn,
 		unsigned long nr_pages);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e3352284f635..fbb9cae10a8a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -576,6 +576,7 @@ void remove_pfn_range_from_zone(struct zone *zone,
  * @pfn: starting pageframe (must be aligned to start of a section)
  * @nr_pages: number of pages to remove (must be multiple of section size)
  * @altmap: alternative device page map or %NULL if default memmap is used
+ * @pgmap: device page map or %NULL if not ZONE_DEVICE
  *
  * Generic helper function to remove section mappings and sysfs entries
  * for the section of the memory we are removing. Caller needs to make
@@ -583,7 +584,7 @@ void remove_pfn_range_from_zone(struct zone *zone,
  * calling offline_pages().
  */
 void __remove_pages(unsigned long pfn, unsigned long nr_pages,
-		    struct vmem_altmap *altmap)
+		    struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	const unsigned long end_pfn = pfn + nr_pages;
 	unsigned long cur_nr_pages;
@@ -598,7 +599,7 @@ void __remove_pages(unsigned long pfn, unsigned long nr_pages,
 		/* Select all remaining pages up to the next section boundary */
 		cur_nr_pages = min(end_pfn - pfn,
 				   SECTION_ALIGN_UP(pfn + 1) - pfn);
-		sparse_remove_section(pfn, cur_nr_pages, altmap);
+		sparse_remove_section(pfn, cur_nr_pages, altmap, pgmap);
 	}
 }
 
@@ -1426,7 +1427,7 @@ static void remove_memory_blocks_and_altmaps(u64 start, u64 size)
 
 		remove_memory_block_devices(cur_start, memblock_size);
 
-		arch_remove_memory(cur_start, memblock_size, altmap);
+		arch_remove_memory(cur_start, memblock_size, altmap, NULL);
 
 		/* Verify that all vmemmap pages have actually been freed. */
 		WARN(altmap->alloc, "Altmap not fully unmapped");
@@ -1469,7 +1470,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
 		ret = create_memory_block_devices(cur_start, memblock_size, nid,
 						  params.altmap, group);
 		if (ret) {
-			arch_remove_memory(cur_start, memblock_size, params.altmap);
+			arch_remove_memory(cur_start, memblock_size, params.altmap, NULL);
 			kfree(params.altmap);
 			goto out;
 		}
@@ -1555,7 +1556,7 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		/* create memory block devices after memory was added */
 		ret = create_memory_block_devices(start, size, nid, NULL, group);
 		if (ret) {
-			arch_remove_memory(start, size, params.altmap);
+			arch_remove_memory(start, size, params.altmap, NULL);
 			goto error;
 		}
 	}
@@ -2267,7 +2268,7 @@ static int try_remove_memory(u64 start, u64 size)
 		 * No altmaps present, do the removal directly
 		 */
 		remove_memory_block_devices(start, size);
-		arch_remove_memory(start, size, NULL);
+		arch_remove_memory(start, size, NULL, NULL);
 	} else {
 		/* all memblocks in the range have altmaps */
 		remove_memory_blocks_and_altmaps(start, size);
diff --git a/mm/memremap.c b/mm/memremap.c
index 053842d45cb1..81766d822400 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -97,10 +97,10 @@ static void pageunmap_range(struct dev_pagemap *pgmap, int range_id)
 				   PHYS_PFN(range_len(range)));
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
 		__remove_pages(PHYS_PFN(range->start),
-			       PHYS_PFN(range_len(range)), NULL);
+			       PHYS_PFN(range_len(range)), NULL, pgmap);
 	} else {
 		arch_remove_memory(range->start, range_len(range),
-				pgmap_altmap(pgmap));
+				pgmap_altmap(pgmap), pgmap);
 		kasan_remove_zero_shadow(__va(range->start), range_len(range));
 	}
 	mem_hotplug_done();
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a7b11248b989..3340f6d30b01 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -665,7 +665,7 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
 }
 
 static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
@@ -746,7 +746,7 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
  * usage map, but still need to free the vmemmap range.
  */
 static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
 	bool section_is_early = early_section(ms);
@@ -784,7 +784,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 	 * section_activate() and pfn_valid() .
 	 */
 	if (!section_is_early)
-		depopulate_section_memmap(pfn, nr_pages, altmap);
+		depopulate_section_memmap(pfn, nr_pages, altmap, pgmap);
 	else if (memmap)
 		free_map_bootmem(memmap);
 
@@ -828,7 +828,7 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
 
 	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
 	if (!memmap) {
-		section_deactivate(pfn, nr_pages, altmap);
+		section_deactivate(pfn, nr_pages, altmap, pgmap);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -889,13 +889,13 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 }
 
 void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
-			   struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
 
 	if (WARN_ON_ONCE(!valid_section(ms)))
 		return;
 
-	section_deactivate(pfn, nr_pages, altmap);
+	section_deactivate(pfn, nr_pages, altmap, pgmap);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.20.1



^ permalink raw reply related

* [PATCH v8 4/6] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
From: Muchun Song @ 2026-04-28  8:18 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song, stable
In-Reply-To: <20260428081855.1249045-1-songmuchun@bytedance.com>

When vmemmap optimization is enabled for DAX, the nr_memmap_pages
counter in /proc/vmstat is incorrect. The current code always accounts
for the full, non-optimized vmemmap size, but vmemmap optimization
reduces the actual number of vmemmap pages by reusing tail pages, so
the counter overstates vmemmap usage.

Fix this by introducing section_nr_vmemmap_pages(), which returns the exact
vmemmap page count for a given pfn range based on whether optimization
is in effect.
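
As a rough worked example of the difference, the arithmetic can be sketched
in user space. Everything here is an assumption for illustration: 4 KiB
pages, a 64-byte struct page, 128 MiB sections, and a caller-supplied
reserve_nr standing in for VMEMMAP_RESERVE_NR, whose value is not stated in
this message. Passing order == 0 stands in for vmemmap_can_optimize()
returning false, and only compound pages that fit inside one section are
modelled.

```c
#include <assert.h>

/* All constants below are illustrative assumptions, not taken from the patch. */
#define TOY_PAGE_SIZE		4096UL	/* assumed 4 KiB base pages */
#define TOY_STRUCT_PAGE_SIZE	64UL	/* assumed sizeof(struct page) */
#define TOY_PAGES_PER_SECTION	32768UL	/* assumed 128 MiB sections */

/* Round-up division, in the spirit of the kernel's DIV_ROUND_UP(). */
static unsigned long div_round_up(unsigned long n, unsigned long d)
{
	return (n + d - 1) / d;
}

/*
 * Number of vmemmap pages backing nr_pages struct pages.
 * order == 0 models the non-optimized case; reserve_nr is the hypothetical
 * number of vmemmap pages kept per compound page when optimized.
 */
static unsigned long toy_vmemmap_pages(unsigned long nr_pages,
				       unsigned int order,
				       unsigned long reserve_nr)
{
	unsigned long pages_per_compound = 1UL << order;

	assert(nr_pages <= TOY_PAGES_PER_SECTION);

	if (order == 0)
		return div_round_up(nr_pages * TOY_STRUCT_PAGE_SIZE,
				    TOY_PAGE_SIZE);

	/* Only compound pages smaller than a section are modelled here. */
	assert(pages_per_compound <= TOY_PAGES_PER_SECTION);
	assert(nr_pages % pages_per_compound == 0);

	return reserve_nr * nr_pages / pages_per_compound;
}
```

Under these assumptions a full section needs 512 vmemmap pages without
optimization, but only reserve_nr * 64 pages with 2 MiB compound pages
(order 9), which is the gap the old accounting ignored.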

Fixes: 15995a352474 ("mm: report per-page metadata information")
Cc: stable@vger.kernel.org
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
---
v7-v8:
- Move VM_WARN_ON_ONCE(nr_pages > PAGES_PER_SECTION); to the top of
  section_nr_vmemmap_pages().
- Add Acked-by from David.
---
 mm/sparse-vmemmap.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 3340f6d30b01..932082296e8d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -652,6 +652,31 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 	}
 }
 
+static int __meminit section_nr_vmemmap_pages(unsigned long pfn, unsigned long nr_pages,
+		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
+{
+	const unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
+	const unsigned long pages_per_compound = 1UL << order;
+
+	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SUBSECTION));
+	VM_WARN_ON_ONCE(nr_pages > PAGES_PER_SECTION);
+
+	if (!vmemmap_can_optimize(altmap, pgmap))
+		return DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE);
+
+	if (order < PFN_SECTION_SHIFT) {
+		VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, pages_per_compound));
+		return VMEMMAP_RESERVE_NR * nr_pages / pages_per_compound;
+	}
+
+	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, PAGES_PER_SECTION));
+
+	if (IS_ALIGNED(pfn, pages_per_compound))
+		return VMEMMAP_RESERVE_NR;
+
+	return 0;
+}
+
 static struct page * __meminit populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap)
@@ -659,7 +684,7 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
 	struct page *page = __populate_section_memmap(pfn, nr_pages, nid, altmap,
 						      pgmap);
 
-	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
+	memmap_pages_add(section_nr_vmemmap_pages(pfn, nr_pages, altmap, pgmap));
 
 	return page;
 }
@@ -670,7 +695,7 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
 
-	memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
+	memmap_pages_add(-section_nr_vmemmap_pages(pfn, nr_pages, altmap, pgmap));
 	vmemmap_free(start, end, altmap);
 }
 
@@ -678,9 +703,10 @@ static void free_map_bootmem(struct page *memmap)
 {
 	unsigned long start = (unsigned long)memmap;
 	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
+	unsigned long pfn = page_to_pfn(memmap);
 
-	memmap_boot_pages_add(-1L * (DIV_ROUND_UP(PAGES_PER_SECTION * sizeof(struct page),
-						  PAGE_SIZE)));
+	memmap_boot_pages_add(-section_nr_vmemmap_pages(pfn, PAGES_PER_SECTION,
+							NULL, NULL));
 	vmemmap_free(start, end, NULL);
 }
 
-- 
2.20.1



^ permalink raw reply related

* [PATCH v8 5/6] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages
From: Muchun Song @ 2026-04-28  8:18 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song, stable
In-Reply-To: <20260428081855.1249045-1-songmuchun@bytedance.com>

The memmap_init_zone_device() function only initializes the migratetype
of the first pageblock of a compound page. If the compound page size
exceeds pageblock_nr_pages (e.g., 1GB hugepages with 2MB pageblocks),
subsequent pageblocks in the compound page remain uninitialized.

Move the migratetype initialization out of __init_zone_device_page()
and into a separate pageblock_migratetype_init_range() function. This
iterates over the entire PFN range of the memory, ensuring that all
pageblocks are correctly initialized.
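
The coverage gap can be sketched with a user-space count of how many
pageblocks each approach touches. This is a simplified model that follows
the description above, not the kernel code: the toy_* names are
hypothetical, and the 512-page pageblock size assumes 4 KiB pages with
2 MiB pageblocks.

```c
#include <assert.h>

#define TOY_PAGEBLOCK_NR_PAGES	512UL	/* assumed: 2 MiB pageblocks, 4 KiB pages */

/* Round pfn up to the next pageblock boundary. */
static unsigned long toy_pageblock_align(unsigned long pfn)
{
	return (pfn + TOY_PAGEBLOCK_NR_PAGES - 1) & ~(TOY_PAGEBLOCK_NR_PAGES - 1);
}

/* Old behaviour as described: one init per compound page, at its head pfn. */
static unsigned long toy_blocks_init_per_head(unsigned long pfn,
					      unsigned long nr_pages,
					      unsigned long pages_per_compound)
{
	unsigned long end = pfn + nr_pages;
	unsigned long count = 0;

	for (; pfn < end; pfn += pages_per_compound)
		if (pfn % TOY_PAGEBLOCK_NR_PAGES == 0)
			count++;

	return count;
}

/* New behaviour: walk the whole pfn range one pageblock at a time. */
static unsigned long toy_blocks_init_range(unsigned long pfn,
					   unsigned long nr_pages)
{
	unsigned long end = pfn + nr_pages;
	unsigned long count = 0;

	for (pfn = toy_pageblock_align(pfn); pfn < end;
	     pfn += TOY_PAGEBLOCK_NR_PAGES)
		count++;

	return count;
}
```

For a single 1 GiB compound page (262144 base pages under these
assumptions), the per-head walk touches one pageblock while the range walk
touches all 512.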

Also remove the stale, confusing comment about MEMINIT_HOTPLUG above
the migratetype setting. It is a relic of commit 966cf44f637e
("mm: defer ZONE_DEVICE page initialization to the point where we init
pgmap") and no longer applies here.

Fixes: c4386bd8ee3a ("mm/memremap: add ZONE_DEVICE support for compound pages")
Cc: stable@vger.kernel.org
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/mm_init.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index f9f8e1af921c..cfc76953e249 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -674,6 +674,20 @@ static inline void fixup_hashdist(void)
 static inline void fixup_hashdist(void) {}
 #endif /* CONFIG_NUMA */
 
+#ifdef CONFIG_ZONE_DEVICE
+static __meminit void pageblock_migratetype_init_range(unsigned long pfn,
+		unsigned long nr_pages, int migratetype)
+{
+	const unsigned long end = pfn + nr_pages;
+
+	for (pfn = pageblock_align(pfn); pfn < end; pfn += pageblock_nr_pages) {
+		init_pageblock_migratetype(pfn_to_page(pfn), migratetype, false);
+		if (IS_ALIGNED(pfn, PAGES_PER_SECTION))
+			cond_resched();
+	}
+}
+#endif
+
 /*
  * Initialize a reserved page unconditionally, finding its zone first.
  */
@@ -1011,21 +1025,6 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	page_folio(page)->pgmap = pgmap;
 	page->zone_device_data = NULL;
 
-	/*
-	 * Mark the block movable so that blocks are reserved for
-	 * movable at startup. This will force kernel allocations
-	 * to reserve their blocks rather than leaking throughout
-	 * the address space during boot when many long-lived
-	 * kernel allocations are made.
-	 *
-	 * Please note that MEMINIT_HOTPLUG path doesn't clear memmap
-	 * because this is done early in section_activate()
-	 */
-	if (pageblock_aligned(pfn)) {
-		init_pageblock_migratetype(page, MIGRATE_MOVABLE, false);
-		cond_resched();
-	}
-
 	/*
 	 * ZONE_DEVICE pages other than MEMORY_TYPE_GENERIC are released
 	 * directly to the driver page allocator which will set the page count
@@ -1122,6 +1121,9 @@ void __ref memmap_init_zone_device(struct zone *zone,
 
 		__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
 
+		if (IS_ALIGNED(pfn, PAGES_PER_SECTION))
+			cond_resched();
+
 		if (pfns_per_compound == 1)
 			continue;
 
@@ -1129,6 +1131,8 @@ void __ref memmap_init_zone_device(struct zone *zone,
 				     compound_nr_pages(altmap, pgmap));
 	}
 
+	pageblock_migratetype_init_range(start_pfn, nr_pages, MIGRATE_MOVABLE);
+
 	pr_debug("%s initialised %lu pages in %ums\n", __func__,
 		nr_pages, jiffies_to_msecs(jiffies - start));
 }
-- 
2.20.1
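
The effect described in the commit message can be modelled in userspace C. This is only a sketch under stated assumptions: 4 KiB pages, 2 MiB pageblocks and 1 GiB compound pages, as in the example from the commit message. The names pageblock_align_up(), count_first_block_only() and count_range_walk() are hypothetical and are not kernel functions; they count how many pageblocks each approach would touch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed geometry: 4 KiB pages, 2 MiB pageblocks, 1 GiB compound pages. */
#define PAGEBLOCK_NR_PAGES 512UL
#define COMPOUND_NR_PAGES  (512UL * 512UL)

/* Round pfn up to the next pageblock boundary (hypothetical helper). */
static unsigned long pageblock_align_up(unsigned long pfn)
{
	return (pfn + PAGEBLOCK_NR_PAGES - 1) & ~(PAGEBLOCK_NR_PAGES - 1);
}

/*
 * Model of the behaviour the commit message describes before the fix:
 * only the first pageblock of each compound page gets a migratetype.
 */
static size_t count_first_block_only(unsigned long start_pfn,
				     unsigned long nr_pages)
{
	const unsigned long end = start_pfn + nr_pages;
	size_t blocks = 0;

	/* The model assumes the range starts on a compound page boundary. */
	assert(start_pfn % COMPOUND_NR_PAGES == 0);

	for (unsigned long pfn = start_pfn; pfn < end; pfn += COMPOUND_NR_PAGES)
		blocks++;
	return blocks;
}

/* Model of the range walk: one visit per pageblock in [start, end). */
static size_t count_range_walk(unsigned long start_pfn,
			       unsigned long nr_pages)
{
	const unsigned long end = start_pfn + nr_pages;
	size_t blocks = 0;

	for (unsigned long pfn = pageblock_align_up(start_pfn); pfn < end;
	     pfn += PAGEBLOCK_NR_PAGES)
		blocks++;
	return blocks;
}
```

Under these assumptions a single 1 GiB compound page spans 512 pageblocks, so the first model touches one pageblock while the range walk touches all 512.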




* [PATCH v8 6/6] mm/mm_init: Fix uninitialized struct pages for ZONE_DEVICE
From: Muchun Song @ 2026-04-28  8:18 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song, stable
In-Reply-To: <20260428081855.1249045-1-songmuchun@bytedance.com>

If DAX memory is hotplugged into an unoccupied subsection of an early
section, section_activate() reuses the unoptimized boot memmap.
However, compound_nr_pages() still assumes that vmemmap optimization is
in effect and initializes only the reduced number of struct pages. As a
result, the remaining tail struct pages are left uninitialized, which
can later lead to unexpected behavior or crashes.

Fix this by treating early sections as unoptimized when calculating how
many struct pages to initialize.

Fixes: 6fd3620b3428 ("mm/page_alloc: reuse tail struct pages for compound devmaps")
Cc: stable@vger.kernel.org
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 mm/mm_init.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index cfc76953e249..bd466a3c10c8 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1055,10 +1055,17 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
  * of how the sparse_vmemmap internals handle compound pages in the lack
  * of an altmap. See vmemmap_populate_compound_pages().
  */
-static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
+static inline unsigned long compound_nr_pages(unsigned long pfn,
+					      struct vmem_altmap *altmap,
 					      struct dev_pagemap *pgmap)
 {
-	if (!vmemmap_can_optimize(altmap, pgmap))
+	/*
+	 * If DAX memory is hot-plugged into an unoccupied subsection
+	 * of an early section, the unoptimized boot memmap is reused.
+	 * See section_activate().
+	 */
+	if (early_section(__pfn_to_section(pfn)) ||
+	    !vmemmap_can_optimize(altmap, pgmap))
 		return pgmap_vmemmap_nr(pgmap);
 
 	return VMEMMAP_RESERVE_NR * (PAGE_SIZE / sizeof(struct page));
@@ -1128,7 +1135,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 			continue;
 
 		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
-				     compound_nr_pages(altmap, pgmap));
+				     compound_nr_pages(pfn, altmap, pgmap));
 	}
 
 	pageblock_migratetype_init_range(start_pfn, nr_pages, MIGRATE_MOVABLE);
-- 
2.20.1
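
The arithmetic behind compound_nr_pages() can be illustrated with a small userspace sketch. The constants are assumptions, not values taken from this patch: a 4 KiB page, a 64-byte struct page, and a reserve of 2 vmemmap pages standing in for VMEMMAP_RESERVE_NR. The function nr_struct_pages_to_init() is hypothetical; its pages_per_compound argument plays the role of pgmap_vmemmap_nr(pgmap).

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE_BYTES       4096UL	/* assumed page size */
#define STRUCT_PAGE_BYTES     64UL	/* assumed sizeof(struct page) */
#define VMEMMAP_RESERVE_PAGES 2UL	/* assumed VMEMMAP_RESERVE_NR */

/*
 * Hypothetical model of the patched decision: an early section always
 * counts as unoptimized, so every struct page of the compound page is
 * initialized. Otherwise only the struct pages that fit in the reserved
 * vmemmap pages are initialized.
 */
static unsigned long nr_struct_pages_to_init(bool early_section,
					     bool can_optimize,
					     unsigned long pages_per_compound)
{
	assert(pages_per_compound > 0);

	if (early_section || !can_optimize)
		return pages_per_compound;

	return VMEMMAP_RESERVE_PAGES * (PAGE_SIZE_BYTES / STRUCT_PAGE_BYTES);
}
```

With these assumed sizes, a 2 MiB compound page (512 base pages) needs 128 struct pages initialized in the optimized case and all 512 when it lands in an early section.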




* [PATCH v2] KVM: PPC: Book3S HV: Add H_FAC_UNAVAIL mapping for tracing exits
From: Gautam Menghani @ 2026-04-28  8:45 UTC (permalink / raw)
  To: maddy, npiggin, mpe, chleroy
  Cc: Gautam Menghani, linuxppc-dev, kvm, linux-kernel

From: Gautam Menghani <gautam@linux.ibm.com>

The kvm_trace_symbol_exit macro provides the mapping from trap vectors
to their names. Add a mapping for H_FAC_UNAVAIL so that the trap reason
is displayed as a string instead of a vector number when using the
kvm_guest_exit tracepoint.

Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
---
v2:
1. Remove the trailing comma after the last element

 arch/powerpc/kvm/trace_book3s.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/trace_book3s.h b/arch/powerpc/kvm/trace_book3s.h
index 9260ddbd557f..5d272c115331 100644
--- a/arch/powerpc/kvm/trace_book3s.h
+++ b/arch/powerpc/kvm/trace_book3s.h
@@ -28,6 +28,7 @@
 	{0xea0, "H_VIRT"}, \
 	{0xf00, "PERFMON"}, \
 	{0xf20, "ALTIVEC"}, \
-	{0xf40, "VSX"}
+	{0xf40, "VSX"}, \
+	{0xf80, "H_FAC_UNAVAIL"}
 
 #endif
-- 
2.52.0
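
The lookup that such a symbol table enables can be sketched in userspace C. The vector/name pairs below are the ones visible in the diff context plus the new entry; the struct exit_symbol type and the functions exit_name() and exit_name_or_hex() are hypothetical. The hex fallback for unknown vectors is an assumption about how the tracing core prints unmatched values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct exit_symbol {
	unsigned long vec;
	const char *name;
};

/* Entries taken from the hunk above, including the added 0xf80. */
static const struct exit_symbol exit_symbols[] = {
	{0xea0, "H_VIRT"},
	{0xf00, "PERFMON"},
	{0xf20, "ALTIVEC"},
	{0xf40, "VSX"},
	{0xf80, "H_FAC_UNAVAIL"},
};

/* Return the name for a trap vector, or NULL if the table has no entry. */
static const char *exit_name(unsigned long vec)
{
	const size_t n = sizeof(exit_symbols) / sizeof(exit_symbols[0]);

	for (size_t i = 0; i < n; i++)
		if (exit_symbols[i].vec == vec)
			return exit_symbols[i].name;
	return NULL;
}

/* Fall back to the raw hex vector when no name is known (assumed). */
static const char *exit_name_or_hex(unsigned long vec, char *buf, size_t len)
{
	const char *name = exit_name(vec);

	assert(buf != NULL && len > 0);
	if (name)
		return name;
	snprintf(buf, len, "0x%lx", vec);
	return buf;
}
```

Without the 0xf80 entry, a lookup for that vector would take the fallback path and show only the number.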




* Re: [PATCH v2 3/3] ASoC: fsl: imx-rpmsg: Switch to core ignore-suspend-widgets support
From: Mark Brown @ 2026-04-28  8:48 UTC (permalink / raw)
  To: Chancel Liu
  Cc: lgirdwood, perex, tiwai, shengjiu.wang, Xiubo.Lee, festevam,
	nicoleotsuka, Frank.Li, s.hauer, kernel, shumingf, rander.wang,
	pierre-louis.bossart, linux-sound, linux-kernel, linuxppc-dev,
	imx, linux-arm-kernel
In-Reply-To: <20260415081942.4183108-4-chancel.liu@nxp.com>

On Wed, Apr 15, 2026 at 05:19:42PM +0900, Chancel Liu wrote:

> @@ -274,6 +257,15 @@ static int imx_rpmsg_probe(struct platform_device *pdev)
>  		}
>  	}
>  
> +	if (data->lpa && of_property_present(np, "ignore-suspend-widgets")) {
> +		ret = snd_soc_of_parse_ignore_suspend_widgets(&data->card,
> +							      "ignore-suspend-widgets");
> +		if (ret) {
> +			dev_err(&pdev->dev, "failed to parse ignore-suspend-widgets: %d\n", ret);
> +			return ret;
> +		}
> +	}
> +

The other error handling paths here have a goto fail to do cleanup of
the of_node in the platform device.
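
The general pattern being asked for can be sketched in userspace C. This does not reproduce the driver's actual cleanup: struct fake_node, node_get(), node_put() and fake_probe() are hypothetical stand-ins, and -22 stands in for an error code such as -EINVAL. The only point shown is that a failure after a resource has been taken should jump to the shared cleanup label instead of returning directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a resource that must be released on failure. */
struct fake_node {
	int refcount;
};

static void node_get(struct fake_node *n)
{
	n->refcount++;
}

static void node_put(struct fake_node *n)
{
	assert(n->refcount > 0);
	n->refcount--;
}

/* Hypothetical probe: every failure after node_get() goes through 'fail'. */
static int fake_probe(struct fake_node *n, bool parse_ok)
{
	int ret;

	node_get(n);

	ret = parse_ok ? 0 : -22;	/* stand-in for a parse error */
	if (ret)
		goto fail;		/* a bare 'return ret' would skip cleanup */

	return 0;			/* success: the reference is kept */

fail:
	node_put(n);
	return ret;
}
```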


* Re: [RFC PATCH v1 4/9] uaccess: Introduce copy_{to/from}_user_partial()
From: Geert Uytterhoeven @ 2026-04-28  9:25 UTC (permalink / raw)
  To: Christophe Leroy (CS GROUP)
  Cc: Yury Norov, Andrew Morton, Linus Torvalds, David Laight,
	Thomas Gleixner, linux-alpha, linux-kernel, linux-snps-arc,
	linux-arm-kernel, linux-mips, linuxppc-dev, kvm, linux-riscv,
	linux-s390, sparclinux, linux-um, dmaengine, linux-efi, linux-fsi,
	amd-gfx, dri-devel, intel-gfx, linux-wpan, netdev, linux-wireless,
	linux-spi, linux-media, linux-staging, linux-serial, linux-usb,
	xen-devel, linux-fsdevel, ocfs2-devel, bpf, kasan-dev, linux-mm,
	linux-x25, rust-for-linux, linux-sound, sound-open-firmware,
	linux-csky, linux-hexagon, loongarch, linux-m68k, linux-openrisc,
	linux-parisc, linux-sh, linux-arch
In-Reply-To: <c73b90236f2810edd47c84edd2a8d8e8e0c816da.1777306795.git.chleroy@kernel.org>

Hi Christophe,

Thanks for your patch!

On Mon, 27 Apr 2026 at 19:18, Christophe Leroy (CS GROUP)
<chleroy@kernel.org> wrote:
> Today there are approximately 3000 calls for copy_to_user() and
> 3000 calls to copy_from_user().
>
> The majority of callers of copy_{to/from}_user() don't care about the
> return value, they only check whether it is 0 or not, and when it is
> not 0 they handle it as a -EACCES.

I think the "a" can be dropped.

> In order to allow better optimisation of copy_{to/from}_user() when
> the size of the copy is known at build time, create new fonctions

functions

> named copy_{to/from}_user_partial() to be used by the few callers
> that are interested in partial copies and need to now how many

know

> bytes remain at the end of the copy.
>
> For the time being it is just the same as copy_{to/from}_user().
>
> Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds



* Re: [RFC PATCH v1 2/9] uaccess: Convert INLINE_COPY_{TO/FROM}_USER to kconfig and reduce ifdefery
From: Andrew Cooper @ 2026-04-28  9:36 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrew Cooper, Christophe Leroy (CS GROUP), Andrew Morton,
	Linus Torvalds, David Laight, Thomas Gleixner, linux-alpha,
	Yury Norov, linux-kernel, linux-snps-arc, linux-arm-kernel,
	linux-mips, linuxppc-dev, kvm, linux-riscv, linux-s390,
	sparclinux, linux-um, dmaengine, linux-efi, linux-fsi, amd-gfx,
	dri-devel, intel-gfx, linux-wpan, netdev, linux-wireless,
	linux-spi, linux-media, linux-staging, linux-serial, linux-usb,
	xen-devel, linux-fsdevel, ocfs2-devel, bpf, kasan-dev, linux-mm,
	linux-x25, rust-for-linux, linux-sound, sound-open-firmware,
	linux-csky, linux-hexagon, loongarch, linux-m68k, linux-openrisc,
	linux-parisc, linux-sh, linux-arch
In-Reply-To: <ae_LeSk7XDEseaZb@yury>

On 27/04/2026 9:47 pm, Yury Norov wrote:
> On Mon, Apr 27, 2026 at 09:39:33PM +0100, Andrew Cooper wrote:
>> On 27/04/2026 7:39 pm, Yury Norov wrote:
>>> On Mon, Apr 27, 2026 at 07:13:43PM +0200, Christophe Leroy (CS GROUP) wrote:
>>>> Among the 21 architectures supported by the kernel, 16 define both
>>>> INLINE_COPY_TO_USER and INLINE_COPY_FROM_USER while the 5 other ones
>>>> don't define any of the two.
>>>>
>>>> To simplify and reduce risk of mistakes, convert them to a single
>>>> kconfig item named CONFIG_ARCH_WANTS_NOINLINE_COPY which will be
>>> We've got a special word for it: outline. Can you name it
>>> CONFIG_OUTLINE_USERCOPY, or similar?
>> You can't swap the "in" for "out" like this.  "out of line" is the
>> opposite of "inline" in this context, while "outline" means something
>> different and unrelated.
> Check KASAN_OUTLINE vs KASAN_INLINE for example

Then I suggest it gets corrected before more examples try to copy this
non-english.

~Andrew



* RE: Re: [PATCH v2 3/3] ASoC: fsl: imx-rpmsg: Switch to core ignore-suspend-widgets support
From: Chancel Liu @ 2026-04-28  9:53 UTC (permalink / raw)
  To: Mark Brown
  Cc: lgirdwood@gmail.com, perex@perex.cz, tiwai@suse.com,
	shengjiu.wang@gmail.com, Xiubo.Lee@gmail.com, festevam@gmail.com,
	nicoleotsuka@gmail.com, Frank Li, s.hauer@pengutronix.de,
	kernel@pengutronix.de, shumingf@realtek.com,
	rander.wang@linux.intel.com, pierre-louis.bossart@linux.dev,
	linux-sound@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <afB0WHueZ5AGF3Vi@sirena.co.uk>

> > @@ -274,6 +257,15 @@ static int imx_rpmsg_probe(struct platform_device *pdev)
> >  		}
> >  	}
> >
> > +	if (data->lpa && of_property_present(np, "ignore-suspend-widgets")) {
> > +		ret = snd_soc_of_parse_ignore_suspend_widgets(&data->card,
> > +							      "ignore-suspend-widgets");
> > +		if (ret) {
> > +			dev_err(&pdev->dev, "failed to parse ignore-suspend-widgets: %d\n", ret);
> > +			return ret;
> > +		}
> > +	}
> > +
> 
> The other error handling paths here have a goto fail to do cleanup of the
> of_node in the platform device.

Will fix it in the next version.

Regards, 
Chancel Liu


