Linux-RISC-V Archive on lore.kernel.org
From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Andrey Smirnov" <andrey.smirnov@siderolabs.com>,
	pasha.tatashin@soleen.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
	pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu,
	alex@ghiti.fr,
	syzbot+2b5fe617654be3d8848b@syzkaller.appspotmail.com,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Thomas Weißschuh" <thomas.weissschuh@linutronix.de>,
	"Andrei Vagin" <avagin@gmail.com>,
	"Andy Lutomirski" <luto@kernel.org>,
	"Vincenzo Frascino" <vincenzo.frascino@arm.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH] mm/page_table_check: do not track special (PFN-mapped) PTEs
Date: Tue, 9 Jun 2026 02:23:28 +0000
Message-ID: <aid4yw9WRvZEm2BV@plex>
In-Reply-To: <20260608142258.5028187b1d245b46554eb2dc@linux-foundation.org>

On 06-08 14:22, Andrew Morton wrote:
> On Mon,  8 Jun 2026 19:57:58 +0400 Andrey Smirnov <andrey.smirnov@siderolabs.com> wrote:
> 
> > The vDSO data store ("[vvar]") special mapping is created as a VM_PFNMAP
> > mapping and its pages are installed into userspace with vmf_insert_pfn(),
> > which produces special PTEs (pte_special()). On x86 and arm64 (and riscv)
> > pte_user_accessible_page() only tests the PRESENT/USER bits and does not
> > exclude special PTEs, so page_table_check accounts these PFN mappings in
> > the per-page anon/file map counters even though they are not rmap-managed
> > pages (vm_normal_page() returns NULL for them).
> > 
> > Most of these data pages live in the kernel image and are never freed, so
> > the stray accounting is invisible. The time-namespace VVAR page is the
> > exception: it is a real alloc_page() page that is released with
> > __free_page() in free_time_ns() when the last task of a time namespace
> > exits. Across the map / unmap / vdso_join_timens() zap transitions the
> > special-PTE accounting is not balanced for this page, so a non-zero
> > file_map_count survives to the free path and trips:
> > 
> >   kernel BUG at mm/page_table_check.c:143!
> >   __page_table_check_zero+0xfb/0x130
> >   __free_frozen_pages+0x52f/0x650
> >   free_time_ns+0x85/0xc0
> >   free_nsproxy+0x7f/0x130
> >   do_exit+0x313/0xa60
> >   do_group_exit+0x77/0x90
> > 
> > This is reliably reproducible on x86_64 and arm64 under heavy container/CI
> > churn that rapidly creates and destroys time namespaces (CLONE_NEWTIME via
> > runc / docker-init / tini), and was independently reported by syzbot on
> > riscv. It only manifests when CONFIG_PAGE_TABLE_CHECK is active.
> > 
> > Special PTEs have no struct-page rmap semantics and should never have been
> > tracked by page table check. Skip them in both the set and clear paths so
> > the counters stay balanced (always zero) for PFN-mapped pages, regardless
> > of how the architecture defines pte_user_accessible_page(). pte_special()
> > is available generically (it is a no-op returning false on architectures
> > without ARCH_HAS_PTE_SPECIAL), so this is a single, arch-independent fix.
> > 
> > Note that the v7.0 generic vDSO datastore rework in commit 05988dba1179
> > ("vdso/datastore: Allocate data pages dynamically") incidentally avoids
> > the problem by switching the mapping to VM_MIXEDMAP + vmf_insert_page()
> > with balanced struct-page accounting. This patch fixes the still-affected
> > VM_PFNMAP path used by 6.18.y and earlier, and additionally makes
> > page_table_check robust against any future PFN-mapped user pages.

Thank you for the detailed explanation of the bug; it makes sense to me.
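The accounting argument quoted above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct model_pte`, the `model_*` functions and the fixed-size counter array are hypothetical stand-ins for a real PTE, for `pte_user_accessible_page()` / `pte_special()`, and for the per-page counters that `__page_table_check_zero()` inspects at free time. The report does not say exactly which zap transition loses a decrement, so the model simply assumes an unbalanced sequence (two sets, one clear) on a special PTE. It then shows that filtering special PTEs out of the accounting keeps the counter at zero however unbalanced the sequence is.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical userspace model of the page_table_check accounting.
 * Only the two PTE properties this fix depends on are modelled, and a
 * single per-page counter stands in for the separate anon/file counters.
 */
struct model_pte {
	unsigned long pfn;
	bool user_accessible;	/* stand-in for pte_user_accessible_page() */
	bool special;		/* stand-in for pte_special() */
};

#define MODEL_NR_PFNS 8

static int model_map_count[MODEL_NR_PFNS];

static void model_reset(void)
{
	memset(model_map_count, 0, sizeof(model_map_count));
}

/* skip_special == false models the old check, true the patched one. */
static bool model_tracked(struct model_pte pte, bool skip_special)
{
	if (!pte.user_accessible)
		return false;
	return !(skip_special && pte.special);
}

/* Set hook: account one more user mapping of the page. */
static void model_set(struct model_pte pte, bool skip_special)
{
	if (model_tracked(pte, skip_special))
		model_map_count[pte.pfn]++;
}

/* Clear hook: drop one mapping; going negative would be a BUG_ON(). */
static void model_clear(struct model_pte pte, bool skip_special)
{
	if (model_tracked(pte, skip_special)) {
		model_map_count[pte.pfn]--;
		assert(model_map_count[pte.pfn] >= 0);
	}
}

/*
 * Free-time check, like __page_table_check_zero(): the kernel BUG()s on a
 * non-zero count, the model just reports whether the free would be clean.
 */
static bool model_free_is_clean(unsigned long pfn)
{
	return model_map_count[pfn] == 0;
}

/*
 * Assumed unbalanced sequence on one special ("[vvar]"-like) PTE: installed
 * twice, torn down once. Returns whether freeing the page would be clean.
 */
static bool model_unbalanced_vvar(bool skip_special)
{
	struct model_pte vvar = { .pfn = 3, .user_accessible = true,
				  .special = true };

	model_reset();
	model_set(vvar, skip_special);
	model_set(vvar, skip_special);
	model_clear(vvar, skip_special);
	return model_free_is_clean(vvar.pfn);
}
```

With the old check, `model_unbalanced_vvar(false)` returns false: one mapping is left on the counter, the model's analogue of the BUG at mm/page_table_check.c:143. With the filter, `model_unbalanced_vvar(true)` returns true. A normal (non-special) PTE is still counted in both modes, so the filter does not weaken checking of rmap-managed pages.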

> Thanks.
> 
> The patch isn't applicable to current -linus mainline.  I reworked it
> as below, then deleted it.  It would be better if this rework came from
> yourself (tested), please.  And a patch which applies will get checked
> by Sashiko AI review.

+1.

Pasha

> --- a/mm/page_table_check.c~mm-page_table_check-do-not-track-special-pfn-mapped-ptes
> +++ a/mm/page_table_check.c
> @@ -151,7 +151,15 @@ void __page_table_check_pte_clear(struct
>  	if (&init_mm == mm)
>  		return;
>  
> -	if (pte_user_accessible_page(mm, addr, pte))
> +	/*
> +	 * PFN-mapped (special) PTEs - e.g. the vDSO/time-namespace "[vvar]"
> +	 * mapping installed via vmf_insert_pfn() - are not rmap-managed and
> +	 * must not be tracked here. Tracking them can leave a non-zero map
> +	 * count on a struct page that is later freed (the time namespace VVAR
> +	 * page in free_time_ns()), tripping the BUG_ON() in
> +	 * __page_table_check_zero().
> +	 */
> +	if (pte_user_accessible_page(mm, addr, pte) && !pte_special(pte))
>  		page_table_check_clear(pte_pfn(pte), PAGE_SIZE >> PAGE_SHIFT);
>  }
>  EXPORT_SYMBOL(__page_table_check_pte_clear);
> @@ -208,7 +216,7 @@ void __page_table_check_ptes_set(struct
>  
>  	for (i = 0; i < nr; i++)
>  		__page_table_check_pte_clear(mm, addr + PAGE_SIZE * i, ptep_get(ptep + i));
> -	if (pte_user_accessible_page(mm, addr, pte))
> +	if (pte_user_accessible_page(mm, addr, pte) && !pte_special(pte))
>  		page_table_check_set(pte_pfn(pte), nr, pte_write(pte));
>  }
>  EXPORT_SYMBOL(__page_table_check_ptes_set);
> _
> 
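With the patch applied, the set path first runs the clear hook on each of the `nr` PTEs being overwritten and then accounts the `nr` new pages, and both steps apply the same `!pte_special()` filter. The sketch below models that order in userspace. It is hypothetical: `struct model_pte`, `model_ptes_set()` and the counter array are illustrative stand-ins, a single counter replaces the separate anon/file counters, and a batch of `nr` PTEs is assumed to map `nr` consecutive PFNs starting at the first one, as the call `page_table_check_set(pte_pfn(pte), nr, ...)` in the diff implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PTE; only the fields the patched checks read. */
struct model_pte {
	unsigned long pfn;
	bool user_accessible;	/* pte_user_accessible_page() */
	bool special;		/* pte_special() */
};

#define MODEL_NR_PFNS 16

static int model_map_count[MODEL_NR_PFNS];

/* Patched predicate, shared by the clear and set hooks. */
static bool model_tracked(struct model_pte pte)
{
	return pte.user_accessible && !pte.special;
}

/* Like __page_table_check_pte_clear(): one page per PTE. */
static void model_pte_clear(struct model_pte old)
{
	if (model_tracked(old)) {
		model_map_count[old.pfn]--;
		assert(model_map_count[old.pfn] >= 0);	/* underflow check */
	}
}

/*
 * Like __page_table_check_ptes_set(): clear the nr entries being
 * overwritten, then account nr pages starting at new_pte.pfn.
 * ptep[] holds the hypothetical current contents of the page table slots.
 */
static void model_ptes_set(struct model_pte *ptep, struct model_pte new_pte,
			   unsigned int nr)
{
	unsigned int i;

	for (i = 0; i < nr; i++)
		model_pte_clear(ptep[i]);
	if (model_tracked(new_pte))
		for (i = 0; i < nr; i++)
			model_map_count[new_pte.pfn + i]++;

	/* The write itself (set_ptes() in the kernel), one PFN per entry. */
	for (i = 0; i < nr; i++) {
		ptep[i] = new_pte;
		ptep[i].pfn = new_pte.pfn + i;
	}
}
```

Installing two special PTEs at PFNs 5 and 6 leaves their counters at zero. Overwriting them with a normal mapping then skips them in the clear step and counts the new pages. If only the set path filtered special PTEs, that clear step would push the counters for PFNs 5 and 6 below zero, which is why the filter has to sit on both sides.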

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

Thread overview: 3+ messages
2026-06-08 15:57 [PATCH] mm/page_table_check: do not track special (PFN-mapped) PTEs Andrey Smirnov
2026-06-08 21:22 ` Andrew Morton
2026-06-09  2:23   ` Pasha Tatashin [this message]
