From: "David Hildenbrand (Arm)" <david@kernel.org>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Peter Xu <peterx@redhat.com>,
linux-mm@kvack.org, Alex Williamson <alex@shazbot.org>,
Max Boone <mboone@akamai.com>,
stable@vger.kernel.org
Subject: Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
Date: Tue, 24 Mar 2026 13:46:20 +0100
Message-ID: <43cc2290-10b6-4db3-bfc0-169adb8201b7@kernel.org>
In-Reply-To: <b3b78722-c265-484b-acde-3aa4bee0aac7@lucifer.local>
On 3/24/26 12:04, Lorenzo Stoakes (Oracle) wrote:
> On Mon, Mar 23, 2026 at 09:20:18PM +0100, David Hildenbrand (Arm) wrote:
>> follow_pfnmap_start() suffers from two problems:
>>
>> (1) We are not re-fetching the pmd/pud after taking the PTL
>>
>> Therefore, we are not properly stabilizing what the lock actually
>> protects. If there is concurrent zapping, we would indicate to the
>> caller that we found an entry, however, that entry might already have
>> been invalidated, or contain a different PFN after taking the lock.
>>
>> Properly use pmdp_get() / pudp_get() after taking the lock.
>>
>> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
>>
>> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
>>
>> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
>> reasonable on non-present entries. Most architectures indeed either
>> perform a present check or make it work by smart use of flags.
>
> It seems huge page split is the main user via pmd_invalidate() ->
> pmd_mkinvalid().
>
> And I guess this is the kind of thing you mean by smart use of flags, for
> x86-64:
Exactly.
[...]
>
>>
>> However, for example, loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
>> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereas
>> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
>> do that.
>
> But pmd_present() checks for _PAGE_HUGE and, if it is set, checks
> whether one of _PAGE_PRESENT, _PAGE_PROTNONE, _PAGE_PRESENT_INVALID is set,
> and pmd_mkinvalid() sets _PAGE_PRESENT_INVALID (clearing _PAGE_PRESENT,
> _VALID, _DIRTY, _PROTNONE) so it'd return true.

pmd_present() will correctly indicate "not present" for, say, a softleaf
migration entry.

However, pmd_leaf() will indicate "leaf" for a softleaf migration entry.
So not checking pmd_present() will actually treat non-present migration
entries as present leaves, which is wrong in the context of this function:
we're walking present entries, where things like pmd_pfn(pmd) make sense.

>
> pmd_leaf() simply checks whether _PAGE_HUGE is set, which should be
> retained on split, so shouldn't it all still have worked?
>
> But anyway this is still worthwhile I think.
>
>>
>> Let's check pmd_present()/pud_present() before assuming "this is a
>> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
>> table handling code that traverses user page tables does.
>>
>> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
>> (1) is likely more relevant than (2). It is questionable how often (1)
>> would actually trigger, but let's CC stable to be sure.
>>
>> This was found by code inspection.
>>
>> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
>
> This looks correct to me, so:
>
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Thanks!
>
>> ---
>> Gave it a quick test in a VM with MM selftests etc, but I am not sure if
>> I actually trigger the follow_pfnmap machinery.
>> ---
>> mm/memory.c | 18 +++++++++++++++---
>> 1 file changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 219b9bf6cae0..2921d35c50ae 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -6868,11 +6868,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
>>
>>  	pudp = pud_offset(p4dp, address);
>>  	pud = pudp_get(pudp);
>> -	if (pud_none(pud))
>> +	if (!pud_present(pud))
>>  		goto out;
>>  	if (pud_leaf(pud)) {
>>  		lock = pud_lock(mm, pudp);
>> -		if (!unlikely(pud_leaf(pud))) {
>> +		pud = pudp_get(pudp);
>> +
>> +		if (unlikely(!pud_present(pud))) {
>> +			spin_unlock(lock);
>> +			goto out;
>> +		} else if (unlikely(!pud_leaf(pud))) {
>
> Tiny nit, but no need for else here. Sometimes compilers complain about
> this, but I'm not sure if such pedantry is enabled in the default kernel
> compiler flags :)

You mean

	if (unlikely(!pud_present(pud))) {
		spin_unlock(lock);
		goto out;
	}
	if (...) {

?

That just creates an additional LOC without any benefit IMHO. And we use
it all over the place :)
In fact, I will beat any C compiler with the C standard that complains
about that ;)
--
Cheers,
David
Thread overview: 8+ messages
2026-03-23 20:20 [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start() David Hildenbrand (Arm)
2026-03-24 7:33 ` Vlastimil Babka (SUSE)
2026-03-24 8:05 ` David Hildenbrand (Arm)
2026-03-24 8:39 ` Mike Rapoport
2026-03-24 9:26 ` David Hildenbrand (Arm)
2026-03-24 11:04 ` Lorenzo Stoakes (Oracle)
2026-03-24 12:46 ` David Hildenbrand (Arm) [this message]
2026-03-24 13:06 ` Lorenzo Stoakes (Oracle)