From: Lance Yang <lance.yang@linux.dev>
To: david@kernel.org
Cc: dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, tglx@kernel.org, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, hpa@zytor.com, rppt@kernel.org,
	jgg@ziepe.ca, baolu.lu@linux.intel.com,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, stable@vger.kernel.org,
	Lance Yang <lance.yang@linux.dev>
Subject: Re: [PATCH] x86/mm: fix freeing of PMD-sized vmemmap pages
Date: Wed, 29 Apr 2026 10:30:08 +0800
Message-ID: <20260429023008.61378-1-lance.yang@linux.dev>
In-Reply-To: <20260429021224.39916-1-lance.yang@linux.dev>


On Wed, Apr 29, 2026 at 10:12:24AM +0800, Lance Yang wrote:
>
>On Tue, Apr 28, 2026 at 12:29:36PM +0200, David Hildenbrand (Arm) wrote:
>>In commit bf9e4e30f353 ("x86/mm: use pagetable_free()"), we switched
>>from freeing non-boot page tables through __free_pages() to
>>pagetable_free().
>>
>>However, the function is also called to free vmemmap pages.
>>
>>Given that vmemmap pages are not page tables, the page_ptdesc(page)
>>conversion is already wrong. Worse, pagetable_free() calls
>>
>>	__free_pages(page, compound_order(page));
>>
>>As vmemmap pages are not compound pages (see vmemmap_alloc_block()) --
>>except for HVO, which doesn't apply here -- we will only free the first
>>page when freeing a PMD-sized vmemmap page, leaking the other ones.
>>
>>Fix it by properly decoupling pagetable and vmemmap freeing.
>>free_pagetable() no longer has to mess with SECTION_INFO, as only the
>>vmemmap is marked like that in register_page_bootmem_memmap().
>>
>>While at it, wire up the altmap parameter for remove_pte_table().
>>Also, the indentation in remove_pmd_table() is messed up; fix that
>>while touching it.
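
To put a number on the leak described above, here is a minimal userspace sketch. It is not kernel code: struct model_page and the model_* helpers are hypothetical, and the constants assume x86-64 with 4 KiB base pages and 2 MiB PMD mappings.

```c
#include <assert.h>

/* Assumed x86-64 geometry: 4 KiB base pages, 2 MiB PMD mappings. */
#define MODEL_PAGE_SHIFT 12u
#define MODEL_PMD_SHIFT  21u
#define MODEL_PMD_ORDER  (MODEL_PMD_SHIFT - MODEL_PAGE_SHIFT)

/* Hypothetical stand-in for struct page: only tracks the compound order. */
struct model_page {
	unsigned int compound_order; /* 0 for a non-compound allocation */
};

/* Number of base pages released by a free of the given order. */
static unsigned long model_pages_freed(unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long));
	return 1UL << order;
}

/*
 * Base pages leaked when a PMD-sized block is freed using the compound
 * order recorded in its first page instead of the real allocation order.
 */
static unsigned long model_pages_leaked(const struct model_page *page)
{
	unsigned long backing = model_pages_freed(MODEL_PMD_ORDER);
	unsigned long freed = model_pages_freed(page->compound_order);

	return backing - freed;
}

/* Usage example: a non-compound PMD-sized vmemmap block. */
static void model_example(void)
{
	struct model_page vmemmap_block = { .compound_order = 0 };

	assert(model_pages_freed(MODEL_PMD_ORDER) == 512);
	assert(model_pages_leaked(&vmemmap_block) == 511);
}
```

Under these assumptions, a free by compound order releases one base page out of 512 and leaks the other 511 on every PMD-sized vmemmap removal.
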
>
>One thing I'm not sure about is passing altmap down into
>remove_pte_table().
>
>Do we actually know that a non-NULL altmap means that the vmemmap
>backing page came from that altmap?
>
>On x86 we still have in vmemmap_populate():
>
>	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
>		err = vmemmap_populate_basepages(start, end, node, NULL);
>
>So for smaller-than-section vmemmap ranges, even if the caller has an
>altmap, the backing pages are allocated from normal memory. But with
>this fix the PTE removal path would now call vmem_altmap_free() just
>because altmap is non-NULL, and would not free the actual backing page,
>IIUC :)
>
>Maybe free_vmemmap_pages() should first check that the backing page is
>really inside the altmap range before using vmem_altmap_free()?
>
>Hopefully I didn't miss anything :)
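
The concern quoted above can be modelled in userspace as follows. This is only a sketch: the model_* names are hypothetical, and the constants assume 128 MiB sections, 4 KiB pages and a 64-byte struct page, which may not match every configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values: 128 MiB sections with 4 KiB pages, 64-byte struct page. */
#define MODEL_PAGES_PER_SECTION 32768UL
#define MODEL_STRUCT_PAGE_SIZE  64UL

/*
 * Mirrors the quoted vmemmap_populate() check: a vmemmap range smaller than
 * one section's worth of struct pages is populated with a NULL altmap.
 */
static bool model_populate_ignores_altmap(unsigned long start,
					  unsigned long end)
{
	return end - start < MODEL_PAGES_PER_SECTION * MODEL_STRUCT_PAGE_SIZE;
}

/*
 * True when the removal path would trust a non-NULL altmap even though the
 * backing pages were allocated from normal memory at populate time.
 */
static bool model_free_path_mismatch(unsigned long start, unsigned long end,
				     bool caller_has_altmap)
{
	return caller_has_altmap && model_populate_ignores_altmap(start, end);
}

/* Usage example with a 1 MiB and a 2 MiB vmemmap range. */
static void model_example(void)
{
	assert(model_free_path_mismatch(0, 1UL << 20, true));
	assert(!model_free_path_mismatch(0, 2UL << 20, true));
	assert(!model_free_path_mismatch(0, 1UL << 20, false));
}
```

With these assumed values the threshold is 2 MiB of vmemmap, so only ranges below that size hit the mismatch.
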

I played a bit with the following on top of this fix (untested):

---8<---
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 8d03e44a7fb9..9a52f9424a07 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1028,14 +1028,33 @@ static void __meminit free_pagetable(struct page *page, int order)
 	}
 }

+static bool __meminit vmemmap_page_is_altmap(struct page *page,
+		unsigned long nr_pages, struct vmem_altmap *altmap)
+{
+	unsigned long pfn = page_to_pfn(page);
+	unsigned long start_pfn;
+	unsigned long end_pfn;
+
+	if (!altmap)
+		return false;
+
+	start_pfn = altmap->base_pfn + altmap->reserve;
+	end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
+
+	if (pfn < start_pfn || pfn >= end_pfn)
+		return false;
+
+	return nr_pages <= end_pfn - pfn;
+}
+
 static void __meminit free_vmemmap_pages(struct page *page, unsigned int order,
 		struct vmem_altmap *altmap)
 {
-	if (altmap) {
-		vmem_altmap_free(altmap, 1u << order);
-	} else if (PageReserved(page)) {
-		unsigned long nr_pages = 1 << order;
+	unsigned long nr_pages = 1 << order;

+	if (vmemmap_page_is_altmap(page, nr_pages, altmap)) {
+		vmem_altmap_free(altmap, nr_pages);
+	} else if (PageReserved(page)) {
 		if (IS_ENABLED(CONFIG_HAVE_BOOTMEM_INFO_NODE) &&
 		    bootmem_type(page) == SECTION_INFO) {
 			while (nr_pages--)
--
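
For anyone who wants to poke at the boundary conditions without booting a kernel, the range check can be mirrored in userspace. This is a sketch under assumptions: struct model_altmap is a hypothetical reduction of struct vmem_altmap, pfns are passed directly instead of going through page_to_pfn(), and vmem_altmap_offset() is assumed to return reserve + free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of struct vmem_altmap to the fields the check reads. */
struct model_altmap {
	unsigned long base_pfn; /* first pfn of the device range */
	unsigned long reserve;  /* pfns at the start that are never handed out */
	unsigned long free;     /* pfns available for vmemmap allocations */
};

/* Returns true only if [pfn, pfn + nr_pages) lies inside the altmap area. */
static bool model_page_is_altmap(unsigned long pfn, unsigned long nr_pages,
				 const struct model_altmap *altmap)
{
	unsigned long start_pfn, end_pfn;

	if (!altmap)
		return false;

	start_pfn = altmap->base_pfn + altmap->reserve;
	/* Assumption: vmem_altmap_offset() == reserve + free. */
	end_pfn = altmap->base_pfn + altmap->reserve + altmap->free;

	if (pfn < start_pfn || pfn >= end_pfn)
		return false;

	/* The whole block, not just its first page, must fit. */
	return nr_pages <= end_pfn - pfn;
}

/* Usage example: altmap area covers pfns [1008, 2032). */
static void model_example(void)
{
	struct model_altmap a = { .base_pfn = 1000, .reserve = 8, .free = 1024 };

	assert(model_page_is_altmap(1008, 1, &a));    /* first usable pfn */
	assert(!model_page_is_altmap(1007, 1, &a));   /* inside the reserve */
	assert(!model_page_is_altmap(2032, 1, &a));   /* one past the end */
	assert(model_page_is_altmap(1520, 512, &a));  /* block ends exactly at end */
	assert(!model_page_is_altmap(1521, 512, &a)); /* block spills past end */
	assert(!model_page_is_altmap(1008, 1, NULL)); /* no altmap at all */
}
```

The final comparison is written as nr_pages <= end_pfn - pfn so it cannot overflow, which is the same form the diff above uses.
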

Thanks,
Lance

>Thanks,
>Lance
>
>>Note that we'll try to get rid of that bootmem info handling soon. For
>>now, we'll handle it similarly to free_pagetable(), just avoiding the
>>ifdef.
>>
>>Fixes: bf9e4e30f353 ("x86/mm: use pagetable_free()")
>>Cc: stable@vger.kernel.org
>>Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
>>---
>>Reproduced and tested with a simple VM with a virtio-mem device,
>>repeatedly adding and removing memory.
>>
>>Found by code inspection while working on bootmem_info removal.
>>---
>> arch/x86/mm/init_64.c | 43 +++++++++++++++++++++++++++----------------
>> 1 file changed, 27 insertions(+), 16 deletions(-)
>>
>>diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>>index df2261fa4f98..8d03e44a7fb9 100644
>>--- a/arch/x86/mm/init_64.c
>>+++ b/arch/x86/mm/init_64.c
>>@@ -1014,7 +1014,7 @@ static void __meminit free_pagetable(struct page *page, int order)
>> #ifdef CONFIG_HAVE_BOOTMEM_INFO_NODE
>> 		enum bootmem_type type = bootmem_type(page);
>> 
>>-		if (type == SECTION_INFO || type == MIX_SECTION_INFO) {
>>+		if (type == MIX_SECTION_INFO) {
>> 			while (nr_pages--)
>> 				put_page_bootmem(page++);
>> 		} else {
>>@@ -1028,13 +1028,24 @@ static void __meminit free_pagetable(struct page *page, int order)
>> 	}
>> }
>> 
>>-static void __meminit free_hugepage_table(struct page *page,
>>+static void __meminit free_vmemmap_pages(struct page *page, unsigned int order,
>> 		struct vmem_altmap *altmap)
>> {
>>-	if (altmap)
>>-		vmem_altmap_free(altmap, PMD_SIZE / PAGE_SIZE);
>>-	else
>>-		free_pagetable(page, get_order(PMD_SIZE));
>>+	if (altmap) {
>>+		vmem_altmap_free(altmap, 1u << order);
>>+	} else if (PageReserved(page)) {
>>+		unsigned long nr_pages = 1 << order;
>>+
>>+		if (IS_ENABLED(CONFIG_HAVE_BOOTMEM_INFO_NODE) &&
>>+		    bootmem_type(page) == SECTION_INFO) {
>>+			while (nr_pages--)
>>+				put_page_bootmem(page++);
>>+		} else {
>>+			free_reserved_pages(page, nr_pages);
>>+		}
>>+	} else {
>>+		__free_pages(page, order);
>>+	}
>> }
>> 
>> static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
>>@@ -1093,7 +1104,7 @@ static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
>> 
>> static void __meminit
>> remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
>>-		 bool direct)
>>+		 bool direct, struct vmem_altmap *altmap)
>> {
>> 	unsigned long next, pages = 0;
>> 	pte_t *pte;
>>@@ -1118,7 +1129,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
>> 			return;
>> 
>> 		if (!direct)
>>-			free_pagetable(pte_page(*pte), 0);
>>+			free_vmemmap_pages(pte_page(*pte), 0, altmap);
>> 
>> 		spin_lock(&init_mm.page_table_lock);
>> 		pte_clear(&init_mm, addr, pte);
>>@@ -1153,25 +1164,25 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
>> 			if (IS_ALIGNED(addr, PMD_SIZE) &&
>> 			    IS_ALIGNED(next, PMD_SIZE)) {
>> 				if (!direct)
>>-					free_hugepage_table(pmd_page(*pmd),
>>-							    altmap);
>>+					free_vmemmap_pages(pmd_page(*pmd),
>>+							   PMD_ORDER, altmap);
>> 
>> 				spin_lock(&init_mm.page_table_lock);
>> 				pmd_clear(pmd);
>> 				spin_unlock(&init_mm.page_table_lock);
>> 				pages++;
>> 			} else if (vmemmap_pmd_is_unused(addr, next)) {
>>-					free_hugepage_table(pmd_page(*pmd),
>>-							    altmap);
>>-					spin_lock(&init_mm.page_table_lock);
>>-					pmd_clear(pmd);
>>-					spin_unlock(&init_mm.page_table_lock);
>>+				free_vmemmap_pages(pmd_page(*pmd), PMD_ORDER,
>>+						   altmap);
>>+				spin_lock(&init_mm.page_table_lock);
>>+				pmd_clear(pmd);
>>+				spin_unlock(&init_mm.page_table_lock);
>> 			}
>> 			continue;
>> 		}
>> 
>> 		pte_base = (pte_t *)pmd_page_vaddr(*pmd);
>>-		remove_pte_table(pte_base, addr, next, direct);
>>+		remove_pte_table(pte_base, addr, next, direct, altmap);
>> 		free_pte_table(pte_base, pmd);
>> 	}
>> 
>>
>>---
>>
>>base-commit: a2ddbfd1af0f54ea84bf17f0400088815d012e8d
>>
>>change-id: 20260428-vmemmap-ab4b949aa727
>>
>>--
>>
>>Cheers,
>>
>>David
>>
>>
>


