From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Andrey Smirnov" <andrey.smirnov@siderolabs.com>,
pasha.tatashin@soleen.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu,
alex@ghiti.fr,
syzbot+2b5fe617654be3d8848b@syzkaller.appspotmail.com,
"Thomas Gleixner" <tglx@linutronix.de>,
"Thomas Weißschuh" <thomas.weissschuh@linutronix.de>,
"Andrei Vagin" <avagin@gmail.com>,
"Andy Lutomirski" <luto@kernel.org>,
"Vincenzo Frascino" <vincenzo.frascino@arm.com>,
stable@vger.kernel.org
Subject: Re: [PATCH] mm/page_table_check: do not track special (PFN-mapped) PTEs
Date: Tue, 9 Jun 2026 02:23:28 +0000
Message-ID: <aid4yw9WRvZEm2BV@plex>
In-Reply-To: <20260608142258.5028187b1d245b46554eb2dc@linux-foundation.org>
On 06-08 14:22, Andrew Morton wrote:
> On Mon, 8 Jun 2026 19:57:58 +0400 Andrey Smirnov <andrey.smirnov@siderolabs.com> wrote:
>
> > The vDSO data store ("[vvar]") special mapping is created as a VM_PFNMAP
> > mapping and its pages are installed into userspace with vmf_insert_pfn(),
> > which produces special PTEs (pte_special()). On x86 and arm64 (and riscv)
> > pte_user_accessible_page() only tests the PRESENT/USER bits and does not
> > exclude special PTEs, so page_table_check accounts these PFN mappings in
> > the per-page anon/file map counters even though they are not rmap-managed
> > pages (vm_normal_page() returns NULL for them).
> >
> > Most of these data pages live in the kernel image and are never freed, so
> > the stray accounting is invisible. The time-namespace VVAR page is the
> > exception: it is a real alloc_page() page that is released with
> > __free_page() in free_time_ns() when the last task of a time namespace
> > exits. Across the map / unmap / vdso_join_timens() zap transitions the
> > special-PTE accounting is not balanced for this page, so a non-zero
> > file_map_count survives to the free path and trips:
> >
> > kernel BUG at mm/page_table_check.c:143!
> > __page_table_check_zero+0xfb/0x130
> > __free_frozen_pages+0x52f/0x650
> > free_time_ns+0x85/0xc0
> > free_nsproxy+0x7f/0x130
> > do_exit+0x313/0xa60
> > do_group_exit+0x77/0x90
> >
> > This is reliably reproducible on x86_64 and arm64 under heavy container/CI
> > churn that rapidly creates and destroys time namespaces (CLONE_NEWTIME via
> > runc / docker-init / tini), and was independently reported by syzbot on
> > riscv. It only manifests when CONFIG_PAGE_TABLE_CHECK is active.
> >
> > Special PTEs have no struct-page rmap semantics and must never have been
> > tracked by page table check. Skip them in both the set and clear paths so
> > the counters stay balanced (always zero) for PFN-mapped pages, regardless
> > of how the architecture defines pte_user_accessible_page(). pte_special()
> > is available generically (it is a no-op returning false on architectures
> > without ARCH_HAS_PTE_SPECIAL), so this is a single, arch-independent fix.
> >
> > Note that the v7.0 generic vDSO datastore rework in commit 05988dba1179
> > ("vdso/datastore: Allocate data pages dynamically") incidentally avoids
> > the problem by switching the mapping to VM_MIXEDMAP + vmf_insert_page()
> > with balanced struct-page accounting. This patch fixes the still-affected
> > VM_PFNMAP path used by 6.18.y and earlier, and additionally makes
> > page_table_check robust against any future PFN-mapped user pages.
Thank you for the detailed explanation of the bug; it makes sense to me.
> Thanks.
>
> The patch isn't applicable to current -linus mainline. I reworked it
> as below, then deleted it. It would be better if this rework came from
> yourself (tested), please. And a patch which applies will get checked
> by Sashiko AI review.
+1.
Pasha
> --- a/mm/page_table_check.c~mm-page_table_check-do-not-track-special-pfn-mapped-ptes
> +++ a/mm/page_table_check.c
> @@ -151,7 +151,15 @@ void __page_table_check_pte_clear(struct
> if (&init_mm == mm)
> return;
>
> - if (pte_user_accessible_page(mm, addr, pte))
> + /*
> + * PFN-mapped (special) PTEs - e.g. the vDSO/time-namespace "[vvar]"
> + * mapping installed via vmf_insert_pfn() - are not rmap-managed and
> + * must not be tracked here. Tracking them can leave a non-zero map
> + * count on a struct page that is later freed (the time namespace VVAR
> + * page in free_time_ns()), tripping the BUG_ON() in
> + * __page_table_check_zero().
> + */
> + if (pte_user_accessible_page(mm, addr, pte) && !pte_special(pte))
> page_table_check_clear(pte_pfn(pte), PAGE_SIZE >> PAGE_SHIFT);
> }
> EXPORT_SYMBOL(__page_table_check_pte_clear);
> @@ -208,7 +216,7 @@ void __page_table_check_ptes_set(struct
>
> for (i = 0; i < nr; i++)
> __page_table_check_pte_clear(mm, addr + PAGE_SIZE * i, ptep_get(ptep + i));
> - if (pte_user_accessible_page(mm, addr, pte))
> + if (pte_user_accessible_page(mm, addr, pte) && !pte_special(pte))
> page_table_check_set(pte_pfn(pte), nr, pte_write(pte));
> }
> EXPORT_SYMBOL(__page_table_check_ptes_set);
> _
>