* [PATCH v2] x86/mm: fix freeing of PMD-sized vmemmap pages
From: David Hildenbrand (Arm) @ 2026-04-29 10:49 UTC
To: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
Mike Rapoport (Microsoft), Jason Gunthorpe, Lu Baolu,
Andrew Morton, Lance Yang
Cc: linux-kernel, linux-mm, stable, David Hildenbrand (Arm)
In commit bf9e4e30f353 ("x86/mm: use pagetable_free()"), we switched
from freeing non-boot page tables through __free_pages() to
pagetable_free().
However, the function is also called to free vmemmap pages.
Given that vmemmap pages are not page tables, the page_ptdesc(page)
conversion is already wrong. Worse, pagetable_free() calls
__free_pages(page, compound_order(page));
As vmemmap pages are not compound pages (see vmemmap_alloc_block()) --
except for HVO, which doesn't apply here -- we will only free the first
page when freeing a PMD-sized vmemmap page, leaking the other ones.
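To illustrate the arithmetic, here is a minimal userspace sketch (not
kernel code). struct model_page, model_compound_order() and the other
names are hypothetical stand-ins, and MODEL_PMD_ORDER assumes x86-64 with
4 KiB base pages and 2 MiB PMDs:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption: 2 MiB PMD / 4 KiB base page, i.e. order 9 (512 pages). */
#define MODEL_PMD_ORDER 9u

/* Hypothetical stand-in for struct page; tracks only what matters here. */
struct model_page {
	bool compound_head;	/* set only for compound allocations */
	unsigned int order;	/* meaningful only on a compound head */
};

/* Models the idea behind compound_order(): non-compound pages report 0. */
static unsigned int model_compound_order(const struct model_page *page)
{
	return page->compound_head ? page->order : 0;
}

/* Number of base pages covered by an allocation of the given order. */
static unsigned long pages_in_order(unsigned int order)
{
	assert(order < sizeof(unsigned long) * CHAR_BIT);
	return 1UL << order;
}

/*
 * Base pages left behind when an allocation of alloc_order is freed with
 * the order derived from the page itself (the pagetable_free() style).
 */
static unsigned long leaked_by_compound_free(const struct model_page *page,
					     unsigned int alloc_order)
{
	return pages_in_order(alloc_order) -
	       pages_in_order(model_compound_order(page));
}

/* Base pages left behind when the caller passes the order explicitly. */
static unsigned long leaked_by_explicit_free(unsigned int alloc_order,
					     unsigned int free_order)
{
	return pages_in_order(alloc_order) - pages_in_order(free_order);
}
```

For a non-compound page, leaked_by_compound_free(&page, MODEL_PMD_ORDER)
yields 511 of the 512 base pages, whereas passing the order explicitly, as
the fix does, yields 0.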
Fix it by properly decoupling pagetable and vmemmap freeing.
free_pagetable() no longer has to mess with SECTION_INFO, as only the
vmemmap is marked like that in register_page_bootmem_memmap().
The indentation in remove_pmd_table() is messed up; let's fix that
while touching it.
Note that we'll try to get rid of that bootmem info handling soon. For
now, we'll handle it similarly to free_pagetable(), just avoiding the
ifdef.
Tested-by: Lance Yang <lance.yang@linux.dev>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Fixes: bf9e4e30f353 ("x86/mm: use pagetable_free()")
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
Reproduced and tested with a simple VM with a virtio-mem device,
repeatedly adding and removing memory.
Found by code inspection while working on bootmem_info removal.
---
Changes in v2:
- Don't mess with the altmap for PTEs, and add a comment explaining why.
- Simplify "unsigned long nr_pages" handling.
- Link to v1: https://lore.kernel.org/r/20260428-vmemmap-v1-1-b2aa1e6db2c0@kernel.org
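
For reviewers, the order in which free_vmemmap_pages() picks a freeing
path in the diff below can be modelled as a small userspace sketch. This
is only an illustration: enum free_path, struct model_state and
pick_free_path() are hypothetical names, and the boolean fields stand in
for the altmap pointer, PageReserved(), CONFIG_HAVE_BOOTMEM_INFO_NODE and
the SECTION_INFO bootmem type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the four freeing paths in the patch. */
enum free_path {
	FREE_TO_ALTMAP,		/* vmem_altmap_free() */
	FREE_PUT_BOOTMEM,	/* put_page_bootmem() per base page */
	FREE_RESERVED,		/* free_reserved_pages() */
	FREE_TO_BUDDY,		/* __free_pages(page, order) */
};

/* Hypothetical snapshot of the inputs the decision depends on. */
struct model_state {
	bool has_altmap;		/* altmap != NULL */
	bool page_reserved;		/* PageReserved(page) */
	bool bootmem_info_enabled;	/* CONFIG_HAVE_BOOTMEM_INFO_NODE */
	bool section_info;		/* bootmem_type(page) == SECTION_INFO */
};

/* Mirrors the if/else ordering of free_vmemmap_pages() in the diff. */
static enum free_path pick_free_path(const struct model_state *s)
{
	assert(s != NULL);

	if (s->has_altmap)
		return FREE_TO_ALTMAP;
	if (s->page_reserved) {
		if (s->bootmem_info_enabled && s->section_info)
			return FREE_PUT_BOOTMEM;
		return FREE_RESERVED;
	}
	return FREE_TO_BUDDY;
}
```

Note that the altmap check comes first, so for the PTE case (where NULL
is passed as the altmap) only the reserved and buddy paths are reachable
in this model.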
---
arch/x86/mm/init_64.c | 40 ++++++++++++++++++++++++++--------------
1 file changed, 26 insertions(+), 14 deletions(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index df2261fa4f98..7e20b22d658b 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1014,7 +1014,7 @@ static void __meminit free_pagetable(struct page *page, int order)
#ifdef CONFIG_HAVE_BOOTMEM_INFO_NODE
enum bootmem_type type = bootmem_type(page);
- if (type == SECTION_INFO || type == MIX_SECTION_INFO) {
+ if (type == MIX_SECTION_INFO) {
while (nr_pages--)
put_page_bootmem(page++);
} else {
@@ -1028,13 +1028,24 @@ static void __meminit free_pagetable(struct page *page, int order)
}
}
-static void __meminit free_hugepage_table(struct page *page,
+static void __meminit free_vmemmap_pages(struct page *page, unsigned int order,
struct vmem_altmap *altmap)
{
- if (altmap)
- vmem_altmap_free(altmap, PMD_SIZE / PAGE_SIZE);
- else
- free_pagetable(page, get_order(PMD_SIZE));
+ unsigned long nr_pages = 1u << order;
+
+ if (altmap) {
+ vmem_altmap_free(altmap, nr_pages);
+ } else if (PageReserved(page)) {
+ if (IS_ENABLED(CONFIG_HAVE_BOOTMEM_INFO_NODE) &&
+ bootmem_type(page) == SECTION_INFO) {
+ while (nr_pages--)
+ put_page_bootmem(page++);
+ } else {
+ free_reserved_pages(page, nr_pages);
+ }
+ } else {
+ __free_pages(page, order);
+ }
}
static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
@@ -1118,7 +1129,8 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
return;
if (!direct)
- free_pagetable(pte_page(*pte), 0);
+ /* We never populate base pages from the altmap. */
+ free_vmemmap_pages(pte_page(*pte), 0, NULL);
spin_lock(&init_mm.page_table_lock);
pte_clear(&init_mm, addr, pte);
@@ -1153,19 +1165,19 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
if (IS_ALIGNED(addr, PMD_SIZE) &&
IS_ALIGNED(next, PMD_SIZE)) {
if (!direct)
- free_hugepage_table(pmd_page(*pmd),
- altmap);
+ free_vmemmap_pages(pmd_page(*pmd),
+ PMD_ORDER, altmap);
spin_lock(&init_mm.page_table_lock);
pmd_clear(pmd);
spin_unlock(&init_mm.page_table_lock);
pages++;
} else if (vmemmap_pmd_is_unused(addr, next)) {
- free_hugepage_table(pmd_page(*pmd),
- altmap);
- spin_lock(&init_mm.page_table_lock);
- pmd_clear(pmd);
- spin_unlock(&init_mm.page_table_lock);
+ free_vmemmap_pages(pmd_page(*pmd), PMD_ORDER,
+ altmap);
+ spin_lock(&init_mm.page_table_lock);
+ pmd_clear(pmd);
+ spin_unlock(&init_mm.page_table_lock);
}
continue;
}
---
base-commit: a2ddbfd1af0f54ea84bf17f0400088815d012e8d
change-id: 20260428-vmemmap-ab4b949aa727
--
Cheers,
David
* Re: [PATCH v2] x86/mm: fix freeing of PMD-sized vmemmap pages
From: Lance Yang @ 2026-04-29 15:29 UTC
To: David Hildenbrand (Arm)
Cc: Mike Rapoport (Microsoft), Dave Hansen, Borislav Petkov,
Jason Gunthorpe, Andy Lutomirski, linux-kernel, H. Peter Anvin,
Andrew Morton, Peter Zijlstra, Lu Baolu, linux-mm, stable, x86,
Thomas Gleixner, Ingo Molnar
On 2026/4/29 18:49, David Hildenbrand (Arm) wrote:
> In commit bf9e4e30f353 ("x86/mm: use pagetable_free()"), we switched
> from freeing non-boot page tables through __free_pages() to
> pagetable_free().
>
> However, the function is also called to free vmemmap pages.
>
> Given that vmemmap pages are not page tables, already the page_ptdesc(page)
> is wrong. But worse, pagetable_free() calls
>
> __free_pages(page, compound_order(page));
>
> As vmemmap pages are not compound pages (see vmemmap_alloc_block()) --
> except for HVO, which doesn't apply here -- we will only free the first
> page when freeing a PMD-sized vmemmap page, leaking the other ones.
>
> Fix it by properly decoupling pagetable and vmemmap freeing.
> free_pagetable() no longer has to mess with SECTION_INFO, as only the
> vmemmap is marked like that in register_page_bootmem_memmap().
>
> The indentation in remove_pmd_table() is messed up, let's fix that
> while touching it.
>
> Note that we'll try to get rid of that bootmem info handling soon. For
> now, we'll handle it similar to free_pagetable(), just avoiding the
> ifdef.
>
> Tested-by: Lance Yang <lance.yang@linux.dev>
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Fixes: bf9e4e30f353 ("x86/mm: use pagetable_free()")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> ---
> Reproduced and tested with a simple VM with a virtio-mem device,
> repeatedly adding and removing memory.
>
> Found by code inspection while working on bootmem_info removal.
> ---
Retested. Works as expected :)
Cheers, Lance
* Re: [PATCH v2] x86/mm: fix freeing of PMD-sized vmemmap pages
From: David Hildenbrand (Arm) @ 2026-05-08 9:19 UTC
To: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
Mike Rapoport (Microsoft), Jason Gunthorpe, Lu Baolu,
Andrew Morton, Lance Yang
Cc: linux-kernel, linux-mm, stable
On 4/29/26 12:49, David Hildenbrand (Arm) wrote:
> In commit bf9e4e30f353 ("x86/mm: use pagetable_free()"), we switched
> from freeing non-boot page tables through __free_pages() to
> pagetable_free().
>
> However, the function is also called to free vmemmap pages.
>
> Given that vmemmap pages are not page tables, already the page_ptdesc(page)
> is wrong. But worse, pagetable_free() calls
>
> __free_pages(page, compound_order(page));
>
> As vmemmap pages are not compound pages (see vmemmap_alloc_block()) --
> except for HVO, which doesn't apply here -- we will only free the first
> page when freeing a PMD-sized vmemmap page, leaking the other ones.
>
> Fix it by properly decoupling pagetable and vmemmap freeing.
> free_pagetable() no longer has to mess with SECTION_INFO, as only the
> vmemmap is marked like that in register_page_bootmem_memmap().
>
> The indentation in remove_pmd_table() is messed up, let's fix that
> while touching it.
>
> Note that we'll try to get rid of that bootmem info handling soon. For
> now, we'll handle it similar to free_pagetable(), just avoiding the
> ifdef.
>
> Tested-by: Lance Yang <lance.yang@linux.dev>
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Fixes: bf9e4e30f353 ("x86/mm: use pagetable_free()")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> ---
> Reproduced and tested with a simple VM with a virtio-mem device,
> repeatedly adding and removing memory.
>
> Found by code inspection while working on bootmem_info removal.
> ---
@x86 maintainers, do you want to take this through your tree or should we merge
this through the MM tree?
I have another MM series coming up that will touch this code (no fixes, though).
--
Cheers,
David
* Re: [PATCH v2] x86/mm: fix freeing of PMD-sized vmemmap pages
From: Peter Zijlstra @ 2026-05-08 9:23 UTC
To: David Hildenbrand (Arm)
Cc: Dave Hansen, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, x86, H. Peter Anvin, Mike Rapoport (Microsoft),
Jason Gunthorpe, Lu Baolu, Andrew Morton, Lance Yang,
linux-kernel, linux-mm, stable
On Fri, May 08, 2026 at 11:19:26AM +0200, David Hildenbrand (Arm) wrote:
> On 4/29/26 12:49, David Hildenbrand (Arm) wrote:
> > In commit bf9e4e30f353 ("x86/mm: use pagetable_free()"), we switched
> > from freeing non-boot page tables through __free_pages() to
> > pagetable_free().
> >
> > However, the function is also called to free vmemmap pages.
> >
> > Given that vmemmap pages are not page tables, already the page_ptdesc(page)
> > is wrong. But worse, pagetable_free() calls
> >
> > __free_pages(page, compound_order(page));
> >
> > As vmemmap pages are not compound pages (see vmemmap_alloc_block()) --
> > except for HVO, which doesn't apply here -- we will only free the first
> > page when freeing a PMD-sized vmemmap page, leaking the other ones.
> >
> > Fix it by properly decoupling pagetable and vmemmap freeing.
> > free_pagetable() no longer has to mess with SECTION_INFO, as only the
> > vmemmap is marked like that in register_page_bootmem_memmap().
> >
> > The indentation in remove_pmd_table() is messed up, let's fix that
> > while touching it.
> >
> > Note that we'll try to get rid of that bootmem info handling soon. For
> > now, we'll handle it similar to free_pagetable(), just avoiding the
> > ifdef.
> >
> > Tested-by: Lance Yang <lance.yang@linux.dev>
> > Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> > Fixes: bf9e4e30f353 ("x86/mm: use pagetable_free()")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> > ---
> > Reproduced and tested with a simple VM with a virtio-mem device,
> > repeatedly adding and removing memory.
> >
> > Found by code inspection while working on bootmem_info removal.
> > ---
>
> @x86 maintainers, do you want to take this through your tree or should we merge
> this through the MM tree?
>
> I have another MM series coming up that will touch this code (no fixes, though).
I'm thinking this should go in rather more urgent, yes?
It looks good to me, Dave you want to stick this in x86/urgent?
* Re: [PATCH v2] x86/mm: fix freeing of PMD-sized vmemmap pages
From: David Hildenbrand (Arm) @ 2026-05-08 10:51 UTC
To: Peter Zijlstra
Cc: Dave Hansen, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, x86, H. Peter Anvin, Mike Rapoport (Microsoft),
Jason Gunthorpe, Lu Baolu, Andrew Morton, Lance Yang,
linux-kernel, linux-mm, stable
On 5/8/26 11:23, Peter Zijlstra wrote:
> On Fri, May 08, 2026 at 11:19:26AM +0200, David Hildenbrand (Arm) wrote:
>> On 4/29/26 12:49, David Hildenbrand (Arm) wrote:
>>> In commit bf9e4e30f353 ("x86/mm: use pagetable_free()"), we switched
>>> from freeing non-boot page tables through __free_pages() to
>>> pagetable_free().
>>>
>>> However, the function is also called to free vmemmap pages.
>>>
>>> Given that vmemmap pages are not page tables, already the page_ptdesc(page)
>>> is wrong. But worse, pagetable_free() calls
>>>
>>> __free_pages(page, compound_order(page));
>>>
>>> As vmemmap pages are not compound pages (see vmemmap_alloc_block()) --
>>> except for HVO, which doesn't apply here -- we will only free the first
>>> page when freeing a PMD-sized vmemmap page, leaking the other ones.
>>>
>>> Fix it by properly decoupling pagetable and vmemmap freeing.
>>> free_pagetable() no longer has to mess with SECTION_INFO, as only the
>>> vmemmap is marked like that in register_page_bootmem_memmap().
>>>
>>> The indentation in remove_pmd_table() is messed up, let's fix that
>>> while touching it.
>>>
>>> Note that we'll try to get rid of that bootmem info handling soon. For
>>> now, we'll handle it similar to free_pagetable(), just avoiding the
>>> ifdef.
>>>
>>> Tested-by: Lance Yang <lance.yang@linux.dev>
>>> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>>> Fixes: bf9e4e30f353 ("x86/mm: use pagetable_free()")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
>>> ---
>>> Reproduced and tested with a simple VM with a virtio-mem device,
>>> repeatedly adding and removing memory.
>>>
>>> Found by code inspection while working on bootmem_info removal.
>>> ---
>>
>> @x86 maintainers, do you want to take this through your tree or should we merge
>> this through the MM tree?
>>
>> I have another MM series coming up that will touch this code (no fixes, though).
>
> I'm thinking this should go in rather more urgent, yes?
Yes, please :)
--
Cheers,
David