The Linux Kernel Mailing List
* [PATCH] mm/page_table_check: do not track special (PFN-mapped) PTEs
@ 2026-06-08 15:57 Andrey Smirnov
  2026-06-08 21:22 ` Andrew Morton
From: Andrey Smirnov @ 2026-06-08 15:57 UTC
  To: pasha.tatashin, akpm
  Cc: linux-mm, linux-kernel, linux-riscv, pjw, palmer, aou, alex,
	syzbot+2b5fe617654be3d8848b, Andrey Smirnov, Thomas Gleixner,
	Thomas Weißschuh, Andrei Vagin, Andy Lutomirski,
	Vincenzo Frascino, stable

The vDSO data store ("[vvar]") special mapping is created as a VM_PFNMAP
mapping and its pages are installed into userspace with vmf_insert_pfn(),
which produces special PTEs (pte_special()). On x86, arm64 and riscv,
pte_user_accessible_page() only tests the PRESENT/USER bits and does not
exclude special PTEs, so page_table_check accounts these PFN mappings in
the per-page anon/file map counters even though they are not rmap-managed
pages (vm_normal_page() returns NULL for them).

Most of these data pages live in the kernel image and are never freed, so
the stray accounting is invisible. The time-namespace VVAR page is the
exception: it is a real alloc_page() page that is released with
__free_page() in free_time_ns() when the last task of a time namespace
exits. Across the map / unmap / vdso_join_timens() zap transitions the
special-PTE accounting is not balanced for this page, so a non-zero
file_map_count survives to the free path and trips:

  kernel BUG at mm/page_table_check.c:143!
  __page_table_check_zero+0xfb/0x130
  __free_frozen_pages+0x52f/0x650
  free_time_ns+0x85/0xc0
  free_nsproxy+0x7f/0x130
  do_exit+0x313/0xa60
  do_group_exit+0x77/0x90

This is reliably reproducible on x86_64 and arm64 under heavy container/CI
churn that rapidly creates and destroys time namespaces (CLONE_NEWTIME via
runc / docker-init / tini), and was independently reported by syzbot on
riscv. It only manifests when CONFIG_PAGE_TABLE_CHECK is active.

Special PTEs have no struct-page rmap semantics and should never have been
tracked by page_table_check. Skip them in both the set and clear paths so
the counters stay balanced (always zero) for PFN-mapped pages, regardless
of how the architecture defines pte_user_accessible_page(). pte_special()
is available generically (it is a no-op returning false on architectures
without ARCH_HAS_PTE_SPECIAL), so this is a single, arch-independent fix.
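
For illustration only, the effect of the extra !pte_special() test can be
sketched with a small userspace model. This is not kernel code: every
model_* name is hypothetical, the two booleans merely stand in for
pte_user_accessible_page() and pte_special(), and the "missed clear" is an
assumed simplification of the unbalanced transitions described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names below are hypothetical, not kernel APIs. */
struct model_pte {
	bool user_accessible;	/* stands in for pte_user_accessible_page() */
	bool special;		/* stands in for pte_special() */
};

struct model_page {
	int file_map_count;	/* stands in for the per-page file map counter */
};

/* With skip_special set, special PTEs are never accounted. */
static bool model_tracked(struct model_pte pte, bool skip_special)
{
	return pte.user_accessible && !(skip_special && pte.special);
}

static void model_pte_set(struct model_page *page, struct model_pte pte,
			  bool skip_special)
{
	if (model_tracked(pte, skip_special))
		page->file_map_count++;
}

static void model_pte_clear(struct model_page *page, struct model_pte pte,
			    bool skip_special)
{
	if (model_tracked(pte, skip_special)) {
		/* The counter must not underflow. */
		assert(page->file_map_count > 0);
		page->file_map_count--;
	}
}

/* Models the free-time check: the page may only be freed at count zero. */
static bool model_page_can_free(const struct model_page *page)
{
	return page->file_map_count == 0;
}

/*
 * Assumed scenario: a special PTE is set once, cleared once and set again,
 * and the final clear is never observed before the page is freed.
 * Returns true if the free-time check would pass.
 */
static bool model_free_is_clean(bool skip_special)
{
	struct model_page page = { 0 };
	struct model_pte vvar = { .user_accessible = true, .special = true };

	model_pte_set(&page, vvar, skip_special);
	model_pte_clear(&page, vvar, skip_special);
	model_pte_set(&page, vvar, skip_special);
	/* The matching clear is assumed to be missed here. */

	return model_page_can_free(&page);
}
```

In this model, model_free_is_clean(false) leaves a stray count behind,
while model_free_is_clean(true) never touches the counter, so the
free-time check passes no matter how the transitions are ordered.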

Note that the v7.0 generic vDSO datastore rework in commit 05988dba1179
("vdso/datastore: Allocate data pages dynamically") incidentally avoids
the problem by switching the mapping to VM_MIXEDMAP + vmf_insert_page()
with balanced struct-page accounting. This patch fixes the still-affected
VM_PFNMAP path used by 6.18.y and earlier, and additionally makes
page_table_check robust against any future PFN-mapped user pages.

Fixes: df4e817b7108 ("mm: page table check")
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reported-by: syzbot+2b5fe617654be3d8848b@syzkaller.appspotmail.com
Closes: https://github.com/siderolabs/talos/issues/13496
Cc: stable@vger.kernel.org
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
---
 mm/page_table_check.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 4eeca782b888..ee492d5389b9 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -150,9 +150,16 @@ void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
 	if (&init_mm == mm)
 		return;
 
-	if (pte_user_accessible_page(pte)) {
+	/*
+	 * PFN-mapped (special) PTEs - e.g. the vDSO/time-namespace "[vvar]"
+	 * mapping installed via vmf_insert_pfn() - are not rmap-managed and
+	 * must not be tracked here. Tracking them can leave a non-zero map
+	 * count on a struct page that is later freed (the time namespace VVAR
+	 * page in free_time_ns()), tripping the BUG_ON() in
+	 * __page_table_check_zero().
+	 */
+	if (pte_user_accessible_page(pte) && !pte_special(pte))
 		page_table_check_clear(pte_pfn(pte), PAGE_SIZE >> PAGE_SHIFT);
-	}
 }
 EXPORT_SYMBOL(__page_table_check_pte_clear);
 
@@ -205,7 +212,7 @@ void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
 
 	for (i = 0; i < nr; i++)
 		__page_table_check_pte_clear(mm, ptep_get(ptep + i));
-	if (pte_user_accessible_page(pte))
+	if (pte_user_accessible_page(pte) && !pte_special(pte))
 		page_table_check_set(pte_pfn(pte), nr, pte_write(pte));
 }
 EXPORT_SYMBOL(__page_table_check_ptes_set);
-- 
2.53.0


Thread overview: 3+ messages
2026-06-08 15:57 [PATCH] mm/page_table_check: do not track special (PFN-mapped) PTEs Andrey Smirnov
2026-06-08 21:22 ` Andrew Morton
2026-06-09  2:23   ` Pasha Tatashin
