public inbox for stable@vger.kernel.org
From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	 Andrew Morton <akpm@linux-foundation.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	 Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Peter Xu <peterx@redhat.com>,
	 linux-mm@kvack.org, Alex Williamson <alex@shazbot.org>,
	 Max Boone <mboone@akamai.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
Date: Tue, 24 Mar 2026 13:06:04 +0000
Message-ID: <132009fb-70d4-4e3e-98a9-fcc230dd282e@lucifer.local>
In-Reply-To: <43cc2290-10b6-4db3-bfc0-169adb8201b7@kernel.org>

On Tue, Mar 24, 2026 at 01:46:20PM +0100, David Hildenbrand (Arm) wrote:
> On 3/24/26 12:04, Lorenzo Stoakes (Oracle) wrote:
> > On Mon, Mar 23, 2026 at 09:20:18PM +0100, David Hildenbrand (Arm) wrote:
> >> follow_pfnmap_start() suffers from two problems:
> >>
> >> (1) We are not re-fetching the pmd/pud after taking the PTL
> >>
> >> Therefore, we are not properly stabilizing what the lock actually
> >> protects. If there is concurrent zapping, we would indicate to the
> >> caller that we found an entry, however, that entry might already have
> >> been invalidated, or contain a different PFN after taking the lock.
> >>
> >> Properly use pmdp_get() / pudp_get() after taking the lock.
> >>
> >> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
> >>
> >> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
> >>
> >> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
> >> reasonable on non-present entries. Most architectures indeed either
> >> perform a present check or make it work by smart use of flags.
> >
> > It seems huge page split is the main user via pmd_invalidate() ->
> > pmd_mkinvalid().
> >
> > And I guess this is the kind of thing you mean by smart use of flags, for
> > x86-64:
>
> Exactly.
>
> [...]
>
> >
> >>
> >> However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
> >> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereby
> >> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
> >> do that.
> >
> > But pmd_present() checks for _PAGE_HUGE, and if set checks
> > whether one of _PAGE_PRESENT, _PAGE_PROTNONE, _PAGE_PRESENT_INVALID is set,
> > and pmd_mkinvalid() sets _PAGE_PRESENT_INVALID (clearing _PAGE_PRESENT,
> > _VALID, _DIRTY, _PROTNONE) so it'd return true.
>
> pmd_present() will correctly indicate "not present" for, say, a softleaf
> migration entry.
>
> However, pmd_leaf() will indicate "leaf" for a softleaf migration entry.

Right yeah that's true. By definition softleaves are non-present. But as they
are leaves, you'd expect pXX_leaf() to return true.

>
> So not checking pmd_present() will actually treat non-present migration
> entries as present leafs in this function, which is wrong in the context
> of this function.
>
> We're walking present entries where things like pmd_pfn(pmd) etc make sense.

Ack, makes sense, thanks!

>
> >
> > pmd_leaf() simply checks to see if _PAGE_HUGE is set which should be
> > retained on split so should all still have worked?
> >
> > But anyway this is still worthwhile I think.
> >
> >>
> >> Let's check pmd_present()/pud_present() before assuming "this is a
> >> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
> >> table handling code that traverses user page tables does.
> >>
> >> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
> >> (1) is likely more relevant than (2). It is questionable how often (1)
> >> would actually trigger, but let's CC stable to be sure.
> >>
> >> This was found by code inspection.
> >>
> >> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> >
> > This looks correct to me, so:
> >
> > Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
>
> Thanks!
>
> >
> >> ---
> >> Gave it a quick test in a VM with MM selftests etc, but I am not sure if
> >> I actually trigger the follow_pfnmap machinery.
> >> ---
> >>  mm/memory.c | 18 +++++++++++++++---
> >>  1 file changed, 15 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index 219b9bf6cae0..2921d35c50ae 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -6868,11 +6868,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
> >>
> >>  	pudp = pud_offset(p4dp, address);
> >>  	pud = pudp_get(pudp);
> >> -	if (pud_none(pud))
> >> +	if (!pud_present(pud))
> >>  		goto out;
> >>  	if (pud_leaf(pud)) {
> >>  		lock = pud_lock(mm, pudp);
> >> -		if (!unlikely(pud_leaf(pud))) {
> >> +		pud = pudp_get(pudp);
> >> +
> >> +		if (unlikely(!pud_present(pud))) {
> >> +			spin_unlock(lock);
> >> +			goto out;
> >> +		} else if (unlikely(!pud_leaf(pud))) {
> >
> > Tiny nit, but no need for else here. Sometimes compilers complain about
> > this but not sure if such pedantry is enabled in default kernel compiler
> > flags :)
>
> You mean
>
> if (unlikely(!pud_present(pud))) {
> 	spin_unlock(lock);
> 	goto out;
> }
> if (...) {
>
> ?
>
> That just creates an additional LOC without any benefit IMHO. And we use
> it all over the place :)

Yeah I think the argument is you don't want to imply that it could somehow _not_
be else. But I think it's the compiler being a wee bit pedantic... :)

>
> In fact, I will beat any C compiler with the C standard that complains
> about that ;)

Haha, I'd like to see that!

>
> --
> Cheers,
>
> David

Cheers, Lorenzo
