From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Peter Xu <peterx@redhat.com>,
linux-mm@kvack.org, Alex Williamson <alex@shazbot.org>,
Max Boone <mboone@akamai.com>,
stable@vger.kernel.org
Subject: Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
Date: Tue, 24 Mar 2026 13:06:04 +0000
Message-ID: <132009fb-70d4-4e3e-98a9-fcc230dd282e@lucifer.local>
In-Reply-To: <43cc2290-10b6-4db3-bfc0-169adb8201b7@kernel.org>
On Tue, Mar 24, 2026 at 01:46:20PM +0100, David Hildenbrand (Arm) wrote:
> On 3/24/26 12:04, Lorenzo Stoakes (Oracle) wrote:
> > On Mon, Mar 23, 2026 at 09:20:18PM +0100, David Hildenbrand (Arm) wrote:
> >> follow_pfnmap_start() suffers from two problems:
> >>
> >> (1) We are not re-fetching the pmd/pud after taking the PTL
> >>
> >> Therefore, we are not properly stabilizing what the lock actually
> >> protects. If there is concurrent zapping, we would indicate to the
> >> caller that we found an entry, however, that entry might already have
> >> been invalidated, or contain a different PFN after taking the lock.
> >>
> >> Properly use pmdp_get() / pudp_get() after taking the lock.
> >>
> >> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
> >>
> >> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
> >>
> >> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
> >> reasonable on non-present entries. Most architectures indeed either
> >> perform a present check or make it work by smart use of flags.
> >
> > It seems huge page split is the main user via pmd_invalidate() ->
> > pmd_mkinvalid().
> >
> > And I guess this is the kind of thing you mean by smart use of flags, for
> > x86-64:
>
> Exactly.
>
> [...]
>
> >
> >>
> >> However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
> >> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereas
> >> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
> >> do that.
> >
> > But pmd_present() checks for _PAGE_HUGE, and if it is set checks
> > whether one of _PAGE_PRESENT, _PAGE_PROTNONE, _PAGE_PRESENT_INVALID is set,
> > and pmd_mkinvalid() sets _PAGE_PRESENT_INVALID (clearing _PAGE_PRESENT,
> > _VALID, _DIRTY, _PROTNONE) so it'd return true.
>
> pmd_present() will correctly indicate "not present" for, say, a softleaf
> migration entry.
>
> However, pmd_leaf() will indicate "leaf" for a softleaf migration entry.
Right yeah that's true. By definition softleaves are non-present. But as they
are leaves, you'd expect pXX_leaf() to return true.
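
To make that concrete, a toy userspace model (not kernel code: the bit
names P_PRESENT/P_HUGE, their values and all toy_* helpers are made up,
only loosely following the loongarch description above) of why a leaf
check that only looks at a "huge" bit needs a present check in front of it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout; NOT loongarch's real encoding. */
#define P_PRESENT 0x1ull
#define P_HUGE    0x2ull

typedef uint64_t pmdval_t;

static bool toy_pmd_present(pmdval_t v) { return v & P_PRESENT; }

/* Leaf check that, like the one discussed, only looks at the huge bit. */
static bool toy_pmd_leaf(pmdval_t v) { return v & P_HUGE; }

/* Non-present "softleaf" (e.g. migration) entry that still carries P_HUGE. */
static pmdval_t toy_softleaf_pmd(uint64_t swp)
{
	return (swp << 8) | P_HUGE;
}

static pmdval_t toy_present_leaf_pmd(uint64_t pfn)
{
	return (pfn << 12) | P_HUGE | P_PRESENT;
}

/* Trusts the leaf check alone: misreads softleaves as present leaves. */
static bool is_present_leaf_buggy(pmdval_t v)
{
	return toy_pmd_leaf(v);
}

/* Present check first, as the patch does, then the leaf check. */
static bool is_present_leaf(pmdval_t v)
{
	return toy_pmd_present(v) && toy_pmd_leaf(v);
}
```

A migration-style entry passes is_present_leaf_buggy() but is rejected by
is_present_leaf(), while a genuine present leaf passes both.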
>
> So not checking pmd_present() will actually treat non-present migration
> entries as present leaves in this function, which is wrong in the context
> of this function.
>
> We're walking present entries where things like pmd_pfn(pmd) etc make sense.
Ack, makes sense, thanks!
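
And for (1), a minimal single-threaded sketch of the "peek locklessly, take
the lock, re-fetch, re-check" pattern. Everything here is hypothetical:
struct slot, lookup_pfn() and the race_* hooks merely stand in for the
PMD/PUD entry, its PTL, follow_pfnmap_start() and a concurrent zap/remap.
It only shows why the value read before taking the lock must not be
trusted; where it simply gives up, the real code may do something else:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, NOT kernel code. Hypothetical entry encoding. */
#define ENT_PRESENT   0x1ull
#define ENT_LEAF      0x2ull
#define ENT_PFN_SHIFT 12

typedef uint64_t ent_t;

struct slot {
	ent_t val;   /* stands in for *pmdp / *pudp */
	bool locked; /* stands in for the page table lock */
};

static ent_t make_leaf(uint64_t pfn)
{
	return (pfn << ENT_PFN_SHIFT) | ENT_LEAF | ENT_PRESENT;
}

static void slot_lock(struct slot *s)   { assert(!s->locked); s->locked = true; }
static void slot_unlock(struct slot *s) { assert(s->locked);  s->locked = false; }

static bool present_leaf(ent_t e)
{
	return (e & ENT_PRESENT) && (e & ENT_LEAF);
}

/*
 * @race, if non-NULL, runs between the lockless peek and taking the lock,
 * simulating another thread modifying the entry. On success the lock is
 * left held for the caller, mirroring the follow_pfnmap_start()/_end() split.
 */
static bool lookup_pfn(struct slot *s, uint64_t *pfn,
		       void (*race)(struct slot *))
{
	ent_t e = s->val;	/* lockless snapshot: may already be stale */

	if (!present_leaf(e))
		return false;
	if (race)
		race(s);

	slot_lock(s);
	e = s->val;		/* re-fetch: only this value is stable */
	if (!present_leaf(e)) {
		slot_unlock(s);
		return false;
	}
	*pfn = e >> ENT_PFN_SHIFT;
	return true;
}

/* Simulated concurrent modifications. */
static void race_zap(struct slot *s)   { s->val = 0; }
static void race_remap(struct slot *s) { s->val = make_leaf(7); }
```

With race_zap the lookup reports nothing (and drops the lock); with
race_remap it reports PFN 7 rather than the stale 42 held by the unlocked
snapshot.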
>
> >
> > pmd_leaf() simply checks to see if _PAGE_HUGE is set which should be
> > retained on split so should all still have worked?
> >
> > But anyway this is still worthwhile I think.
> >
> >>
> >> Let's check pmd_present()/pud_present() before assuming "this is a
> >> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
> >> table handling code that traverses user page tables does.
> >>
> >> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
> >> (1) is likely more relevant than (2). It is questionable how often (1)
> >> would actually trigger, but let's CC stable to be sure.
> >>
> >> This was found by code inspection.
> >>
> >> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> >
> > This looks correct to me, so:
> >
> > Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
>
> Thanks!
>
> >
> >> ---
> >> Gave it a quick test in a VM with MM selftests etc, but I am not sure if
> >> I actually trigger the follow_pfnmap machinery.
> >> ---
> >> mm/memory.c | 18 +++++++++++++++---
> >> 1 file changed, 15 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index 219b9bf6cae0..2921d35c50ae 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -6868,11 +6868,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
> >>
> >>  	pudp = pud_offset(p4dp, address);
> >>  	pud = pudp_get(pudp);
> >> -	if (pud_none(pud))
> >> +	if (!pud_present(pud))
> >>  		goto out;
> >>  	if (pud_leaf(pud)) {
> >>  		lock = pud_lock(mm, pudp);
> >> -		if (!unlikely(pud_leaf(pud))) {
> >> +		pud = pudp_get(pudp);
> >> +
> >> +		if (unlikely(!pud_present(pud))) {
> >> +			spin_unlock(lock);
> >> +			goto out;
> >> +		} else if (unlikely(!pud_leaf(pud))) {
> >
> > Tiny nit, but no need for else here. Sometimes compilers complain about
> > this, but I'm not sure such pedantry is enabled in the default kernel
> > compiler flags :)
>
> You mean
>
> 	if (unlikely(!pud_present(pud))) {
> 		spin_unlock(lock);
> 		goto out;
> 	}
> 	if (...) {
>
> ?
>
> That just creates an additional LOC without any benefit IMHO. And we use
> it all over the place :)
Yeah I think the argument is you don't want to imply that it could somehow _not_
be else. But I think it's the compiler being a wee bit pedantic... :)
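
FWIW the two forms are equivalent because the first branch never falls
through. A trivial check (the enum and function names are hypothetical and
have nothing to do with the real code):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes, for illustration only. */
enum outcome { OUT, NOT_LEAF, FOUND };

/* Style used in the patch: "else if" after a branch ending in goto. */
static enum outcome with_else(bool present, bool leaf)
{
	if (!present) {
		goto out;
	} else if (!leaf) {
		return NOT_LEAF;
	}
	return FOUND;
out:
	return OUT;
}

/* Suggested style: separate if, relying on goto never falling through. */
static enum outcome without_else(bool present, bool leaf)
{
	if (!present)
		goto out;
	if (!leaf)
		return NOT_LEAF;
	return FOUND;
out:
	return OUT;
}
```

Both functions agree for all four combinations of inputs; the difference is
purely stylistic.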
>
> In fact, I will beat any C compiler with the C standard that complains
> about that ;)
Haha, I'd like to see that!
>
> --
> Cheers,
>
> David
Cheers, Lorenzo
Thread overview: 8+ messages
2026-03-23 20:20 [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start() David Hildenbrand (Arm)
2026-03-24 7:33 ` Vlastimil Babka (SUSE)
2026-03-24 8:05 ` David Hildenbrand (Arm)
2026-03-24 8:39 ` Mike Rapoport
2026-03-24 9:26 ` David Hildenbrand (Arm)
2026-03-24 11:04 ` Lorenzo Stoakes (Oracle)
2026-03-24 12:46 ` David Hildenbrand (Arm)
2026-03-24 13:06 ` Lorenzo Stoakes (Oracle) [this message]