From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Peter Xu <peterx@redhat.com>,
linux-mm@kvack.org, Alex Williamson <alex@shazbot.org>,
Max Boone <mboone@akamai.com>,
stable@vger.kernel.org
Subject: Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
Date: Tue, 24 Mar 2026 11:04:33 +0000
Message-ID: <b3b78722-c265-484b-acde-3aa4bee0aac7@lucifer.local>
In-Reply-To: <20260323-follow_pfnmap_fix-v1-1-5b0ec10872b3@kernel.org>
On Mon, Mar 23, 2026 at 09:20:18PM +0100, David Hildenbrand (Arm) wrote:
> follow_pfnmap_start() suffers from two problems:
>
> (1) We are not re-fetching the pmd/pud after taking the PTL
>
> Therefore, we are not properly stabilizing what the lock actually
> protects. If there is concurrent zapping, we would indicate to the
> caller that we found an entry, however, that entry might already have
> been invalidated, or contain a different PFN after taking the lock.
>
> Properly use pmdp_get() / pudp_get() after taking the lock.
>
> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
>
> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
>
> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
> reasonable on non-present entries. Most architectures indeed either
> perform a present check or make it work by smart use of flags.
It seems huge page split is the main user, via pmdp_invalidate() ->
pmd_mkinvalid().
And I guess this is the kind of thing you mean by smart use of flags, e.g.
for x86-64:
	static inline int pmd_present(pmd_t pmd)
	{
		/*
		 * Checking for _PAGE_PSE is needed too because
		 * split_huge_page will temporarily clear the present bit (but
		 * the _PAGE_PSE flag will remain set at all times while the
		 * _PAGE_PRESENT bit is clear).
		 */
		return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
	}
So you might have missing _PAGE_PRESENT but still pmd_present() returns
true, as does pmd_leaf().
Seems the same for RISC-V.
And other arches play other games with the same result :)
So we probably shouldn't actually hit any problem with this from any other
source, but it's still good to do it.
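A rough userspace model of the x86-64 behaviour described above, purely for illustration. The x86_model_* names and the flag bit positions are made up for this sketch and are not taken from the kernel headers; the invalidation helper assumes that only the present/protnone bits get cleared while the PSE bit is retained, as the comment in pmd_present() describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits only: stand-ins, not the real x86 definitions. */
#define X86_MODEL_PAGE_PRESENT   (UINT64_C(1) << 0)
#define X86_MODEL_PAGE_PSE       (UINT64_C(1) << 7)
#define X86_MODEL_PAGE_PROTNONE  (UINT64_C(1) << 8)

typedef uint64_t x86_model_pmd_t;

/* Mirrors the quoted pmd_present(): any of the three bits counts as present. */
static bool x86_model_pmd_present(x86_model_pmd_t pmd)
{
	return (pmd & (X86_MODEL_PAGE_PRESENT | X86_MODEL_PAGE_PROTNONE |
		       X86_MODEL_PAGE_PSE)) != 0;
}

/* Leaf check modelled as "PSE bit set". */
static bool x86_model_pmd_leaf(x86_model_pmd_t pmd)
{
	return (pmd & X86_MODEL_PAGE_PSE) != 0;
}

/* Assumed invalidation: drop present/protnone, keep PSE. */
static x86_model_pmd_t x86_model_pmd_mkinvalid(x86_model_pmd_t pmd)
{
	return pmd & ~(X86_MODEL_PAGE_PRESENT | X86_MODEL_PAGE_PROTNONE);
}
```

Under these assumptions, invalidating a huge entry clears the hardware present bit, yet both the present check and the leaf check in the model still return true, which is the "smart use of flags" case.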
>
> However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereby
> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
> do that.
But on loongarch pmd_present() itself checks for _PAGE_HUGE and, if it is
set, checks whether one of _PAGE_PRESENT, _PAGE_PROTNONE,
_PAGE_PRESENT_INVALID is set. pmd_mkinvalid() sets _PAGE_PRESENT_INVALID
(clearing _PAGE_PRESENT, _PAGE_VALID, _PAGE_DIRTY, _PAGE_PROTNONE), so it'd
return true.
pmd_leaf() simply checks whether _PAGE_HUGE is set, which should be
retained on split, so this should all still have worked?
But anyway this is still worthwhile I think.
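To make the two loongarch cases concrete, here is a small userspace sketch. Everything in it is an assumption for illustration: the la_model_* names, the bit positions, and the use of 0 as a stand-in for the "empty" non-huge value. It only encodes the logic described in this thread, not the actual arch headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary distinct bits; hypothetical stand-ins for the loongarch flags. */
#define LA_MODEL_PAGE_VALID            (UINT64_C(1) << 0)
#define LA_MODEL_PAGE_DIRTY            (UINT64_C(1) << 1)
#define LA_MODEL_PAGE_PRESENT          (UINT64_C(1) << 2)
#define LA_MODEL_PAGE_PROTNONE         (UINT64_C(1) << 3)
#define LA_MODEL_PAGE_PRESENT_INVALID  (UINT64_C(1) << 4)
#define LA_MODEL_PAGE_HUGE             (UINT64_C(1) << 5)

typedef uint64_t la_model_pmd_t;

/* Huge entries are present only if one of the three "present-ish" bits is set. */
static bool la_model_pmd_present(la_model_pmd_t pmd)
{
	if (pmd & LA_MODEL_PAGE_HUGE)
		return (pmd & (LA_MODEL_PAGE_PRESENT | LA_MODEL_PAGE_PROTNONE |
			       LA_MODEL_PAGE_PRESENT_INVALID)) != 0;
	return pmd != 0; /* simplified stand-in for the non-huge case */
}

/* Leaf check looks at the huge flag alone, with no present check. */
static bool la_model_pmd_leaf(la_model_pmd_t pmd)
{
	return (pmd & LA_MODEL_PAGE_HUGE) != 0;
}

/* Invalidation as described above: set PRESENT_INVALID, clear the rest. */
static la_model_pmd_t la_model_pmd_mkinvalid(la_model_pmd_t pmd)
{
	pmd |= LA_MODEL_PAGE_PRESENT_INVALID;
	pmd &= ~(LA_MODEL_PAGE_PRESENT | LA_MODEL_PAGE_VALID |
		 LA_MODEL_PAGE_DIRTY | LA_MODEL_PAGE_PROTNONE);
	return pmd;
}
```

In this model an invalidated huge entry stays both present and leaf (the split case), whereas an entry carrying only the huge flag, standing in for what the patch describes for __swp_entry_to_pmd(), is leaf but not present. The latter is the case the added pmd_present() check guards against.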
>
> Let's check pmd_present()/pud_present() before assuming "the is a
> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
> table handling code that traverses user page tables does.
>
> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
> (1) is likely more relevant than (2). It is questionable how often (1)
> would actually trigger, but let's CC stable to be sure.
>
> This was found by code inspection.
>
> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
This looks correct to me, so:
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
> Gave it a quick test in a VM with MM selftests etc, but I am not sure if
> I actually trigger the follow_pfnmap machinery.
> ---
> mm/memory.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 219b9bf6cae0..2921d35c50ae 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -6868,11 +6868,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
>
> pudp = pud_offset(p4dp, address);
> pud = pudp_get(pudp);
> - if (pud_none(pud))
> + if (!pud_present(pud))
> goto out;
> if (pud_leaf(pud)) {
> lock = pud_lock(mm, pudp);
> - if (!unlikely(pud_leaf(pud))) {
> + pud = pudp_get(pudp);
> +
> + if (unlikely(!pud_present(pud))) {
> + spin_unlock(lock);
> + goto out;
> + } else if (unlikely(!pud_leaf(pud))) {
Tiny nit, but there's no need for the else here, since the preceding branch
ends in a goto. Sometimes compilers complain about this, though I'm not sure
if such pedantry is enabled in the default kernel compiler flags :)
Obv. the same applies below.
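Roughly what the else-less form looks like, as a standalone userspace sketch rather than a replacement hunk. The model_* names are hypothetical, the lock is reduced to a counter of unlock calls, and the "goto out" / "goto retry" exits are modelled as return values.

```c
#include <stdbool.h>

/* Hypothetical outcomes standing in for "goto out", "goto retry", fall through. */
enum model_walk { MODEL_WALK_OUT, MODEL_WALK_RETRY, MODEL_WALK_FOUND };

/* Hypothetical stand-in for a page table entry. */
struct model_entry {
	bool present;
	bool leaf;
};

/*
 * Called with the (modelled) lock held. The entry is re-read from the slot
 * first, so the checks act on the value the lock actually stabilizes.
 */
static enum model_walk model_check_locked(const struct model_entry *slot,
					  int *unlocks)
{
	struct model_entry e = *slot; /* re-fetch under the lock */

	if (!e.present) {
		(*unlocks)++; /* models spin_unlock(lock) */
		return MODEL_WALK_OUT;
	}
	/* No else needed: the branch above never falls through. */
	if (!e.leaf) {
		(*unlocks)++; /* models spin_unlock(lock) */
		return MODEL_WALK_RETRY;
	}
	return MODEL_WALK_FOUND; /* lock stays held for the caller */
}
```

Behaviour is identical to the if/else-if version; it just reads a little flatter.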
> spin_unlock(lock);
> goto retry;
> }
> @@ -6884,9 +6889,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
>
> pmdp = pmd_offset(pudp, address);
> pmd = pmdp_get_lockless(pmdp);
> + if (!pmd_present(pmd))
> + goto out;
> if (pmd_leaf(pmd)) {
> lock = pmd_lock(mm, pmdp);
> - if (!unlikely(pmd_leaf(pmd))) {
> + pmd = pmdp_get(pmdp);
> +
> + if (unlikely(!pmd_present(pmd))) {
> + spin_unlock(lock);
> + goto out;
> + } else if (unlikely(!pmd_leaf(pmd))) {
> spin_unlock(lock);
> goto retry;
> }
>
> ---
> base-commit: 3f4f1faa33544d0bd724e32980b6f211c3a9bc7b
> change-id: 20260323-follow_pfnmap_fix-bab73335468a
>
> Best regards,
> --
> David Hildenbrand (Arm) <david@kernel.org>
>
Cheers, Lorenzo