From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	 Andrew Morton <akpm@linux-foundation.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	 Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Peter Xu <peterx@redhat.com>,
	 linux-mm@kvack.org, Alex Williamson <alex@shazbot.org>,
	 Max Boone <mboone@akamai.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
Date: Tue, 24 Mar 2026 11:04:33 +0000
Message-ID: <b3b78722-c265-484b-acde-3aa4bee0aac7@lucifer.local>
In-Reply-To: <20260323-follow_pfnmap_fix-v1-1-5b0ec10872b3@kernel.org>

On Mon, Mar 23, 2026 at 09:20:18PM +0100, David Hildenbrand (Arm) wrote:
> follow_pfnmap_start() suffers from two problems:
>
> (1) We are not re-fetching the pmd/pud after taking the PTL
>
> Therefore, we are not properly stabilizing what the lock actually
> protects. If there is concurrent zapping, we would indicate to the
> caller that we found an entry, however, that entry might already have
> been invalidated, or contain a different PFN after taking the lock.
>
> Properly use pmdp_get() / pudp_get() after taking the lock.
>
> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
>
> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
>
> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
> reasonable on non-present entries. Most architectures indeed either
> perform a present check or make it work by smart use of flags.

It seems huge page split is the main user via pmdp_invalidate() ->
pmd_mkinvalid().

And I guess this is the kind of thing you mean by smart use of flags, for
x86-64:

static inline int pmd_present(pmd_t pmd)
{
	/*
	 * Checking for _PAGE_PSE is needed too because
	 * split_huge_page will temporarily clear the present bit (but
	 * the _PAGE_PSE flag will remain set at all times while the
	 * _PAGE_PRESENT bit is clear).
	 */
	return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
}

So _PAGE_PRESENT might be clear, yet pmd_present() still returns true, as
does pmd_leaf().

Seems the same for RISC-V.

And other arches play other games with the same result :)

So we probably shouldn't actually hit any problem with this from any other
source, but still good to do it.
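
To make the x86-64 point concrete, here is a rough userspace model of the
split-in-progress case. It is only a sketch: the x86m_* names are made up,
the bit values are illustrative rather than taken from the kernel headers,
and the leaf test and the invalidation helper are my assumptions about how
the real pmd_leaf() and pmd_mkinvalid() behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values for a userspace model, not the kernel headers. */
#define X86M_PAGE_PRESENT  (UINT64_C(1) << 0)
#define X86M_PAGE_PSE      (UINT64_C(1) << 7)
#define X86M_PAGE_PROTNONE (UINT64_C(1) << 8)

typedef struct { uint64_t val; } x86m_pmd_t;

/* Mirrors the quoted pmd_present(): any of the three bits counts. */
static inline bool x86m_pmd_present(x86m_pmd_t pmd)
{
	return (pmd.val & (X86M_PAGE_PRESENT | X86M_PAGE_PROTNONE |
			   X86M_PAGE_PSE)) != 0;
}

/* Assumed leaf test: the PSE bit alone decides. */
static inline bool x86m_pmd_leaf(x86m_pmd_t pmd)
{
	return (pmd.val & X86M_PAGE_PSE) != 0;
}

/* Assumed invalidation: drop PRESENT and PROTNONE, keep everything else. */
static inline x86m_pmd_t x86m_pmd_mkinvalid(x86m_pmd_t pmd)
{
	pmd.val &= ~(X86M_PAGE_PRESENT | X86M_PAGE_PROTNONE);
	return pmd;
}

/* Walk through the case where a huge PMD is temporarily invalidated. */
static inline void x86m_check_split_case(void)
{
	x86m_pmd_t huge = { X86M_PAGE_PRESENT | X86M_PAGE_PSE };
	x86m_pmd_t inv = x86m_pmd_mkinvalid(huge);

	assert((inv.val & X86M_PAGE_PRESENT) == 0); /* present bit is gone */
	assert(x86m_pmd_present(inv));              /* yet still "present" */
	assert(x86m_pmd_leaf(inv));                 /* and still a leaf */
}
```

Calling x86m_check_split_case() from a test main() exercises exactly the
combination described above: the present bit is clear, but both helpers
still return true because PSE is retained.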

>
> However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereby
> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
> do that.

But pmd_present() checks for _PAGE_HUGE, and if set checks whether one of
_PAGE_PRESENT, _PAGE_PROTNONE, _PAGE_PRESENT_INVALID is set, and
pmd_mkinvalid() sets _PAGE_PRESENT_INVALID (clearing _PAGE_PRESENT,
_VALID, _DIRTY, _PROTNONE) so it'd return true.

pmd_leaf() simply checks whether _PAGE_HUGE is set, which should be
retained on split, so this should all still have worked?

But anyway this is still worthwhile I think.
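
A small userspace model of the loongarch behaviour as described in this
exchange may help separate the two cases. This is a sketch under stated
assumptions: the lam_* names and all bit positions are hypothetical, the
non-huge branch of the present check is simplified to "non-zero", and the
swap-style entry is just "_PAGE_HUGE plus some payload", following the
quoted description of __swp_entry_to_pmd().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; only the relationships between flags matter. */
#define LAM_PAGE_VALID           (UINT64_C(1) << 0)
#define LAM_PAGE_DIRTY           (UINT64_C(1) << 1)
#define LAM_PAGE_HUGE            (UINT64_C(1) << 2)
#define LAM_PAGE_PRESENT         (UINT64_C(1) << 3)
#define LAM_PAGE_PROTNONE        (UINT64_C(1) << 4)
#define LAM_PAGE_PRESENT_INVALID (UINT64_C(1) << 5)

typedef struct { uint64_t val; } lam_pmd_t;

/* Leaf test as described: only the HUGE flag is inspected. */
static inline bool lam_pmd_leaf(lam_pmd_t pmd)
{
	return (pmd.val & LAM_PAGE_HUGE) != 0;
}

/* Present test as described: huge entries need one of three flags. */
static inline bool lam_pmd_present(lam_pmd_t pmd)
{
	if (pmd.val & LAM_PAGE_HUGE)
		return (pmd.val & (LAM_PAGE_PRESENT | LAM_PAGE_PROTNONE |
				   LAM_PAGE_PRESENT_INVALID)) != 0;
	/* Simplification: the non-huge case is modelled as "non-zero". */
	return pmd.val != 0;
}

/* Invalidation as described: set PRESENT_INVALID, clear the other four. */
static inline lam_pmd_t lam_pmd_mkinvalid(lam_pmd_t pmd)
{
	pmd.val |= LAM_PAGE_PRESENT_INVALID;
	pmd.val &= ~(LAM_PAGE_PRESENT | LAM_PAGE_VALID |
		     LAM_PAGE_DIRTY | LAM_PAGE_PROTNONE);
	return pmd;
}

/* Stand-in for a swap-style PMD: HUGE set, payload kept above the flags. */
static inline lam_pmd_t lam_swp_entry_to_pmd(uint64_t payload)
{
	lam_pmd_t pmd = { (payload << 16) | LAM_PAGE_HUGE };
	return pmd;
}

static inline void lam_check_cases(void)
{
	lam_pmd_t huge = { LAM_PAGE_HUGE | LAM_PAGE_PRESENT | LAM_PAGE_VALID };
	lam_pmd_t inv = lam_pmd_mkinvalid(huge);
	lam_pmd_t swp = lam_swp_entry_to_pmd(42);

	/* Split in progress: leaf and present agree. */
	assert(lam_pmd_leaf(inv) && lam_pmd_present(inv));
	/* Swap-style entry: leaf says yes, present says no. */
	assert(lam_pmd_leaf(swp) && !lam_pmd_present(swp));
}
```

In this model the invalidated entry is harmless, while the swap-style entry
is the one where a bare leaf check would misfire, which is the case the
added present check guards against.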

>
> Let's check pmd_present()/pud_present() before assuming "there is a
> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
> table handling code that traverses user page tables does.
>
> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
> (1) is likely more relevant than (2). It is questionable how often (1)
> would actually trigger, but let's CC stable to be sure.
>
> This was found by code inspection.
>
> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

This looks correct to me, so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
> Gave it a quick test in a VM with MM selftests etc, but I am not sure if
> I actually trigger the follow_pfnmap machinery.
> ---
>  mm/memory.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 219b9bf6cae0..2921d35c50ae 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -6868,11 +6868,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
>
>  	pudp = pud_offset(p4dp, address);
>  	pud = pudp_get(pudp);
> -	if (pud_none(pud))
> +	if (!pud_present(pud))
>  		goto out;
>  	if (pud_leaf(pud)) {
>  		lock = pud_lock(mm, pudp);
> -		if (!unlikely(pud_leaf(pud))) {
> +		pud = pudp_get(pudp);
> +
> +		if (unlikely(!pud_present(pud))) {
> +			spin_unlock(lock);
> +			goto out;
> +		} else if (unlikely(!pud_leaf(pud))) {

Tiny nit, but no need for else here. Sometimes compilers complain about
this but not sure if such pedantry is enabled in default kernel compiler
flags :)

Obv. same for below.
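
To illustrate both the nit and the re-read under the lock, here is a
userspace sketch of one level of the walk. Everything in it is hypothetical
(the fm_* names, the two-flag entry, the race hook that stands in for
another CPU changing the entry before the lock is taken); it models the
control flow only and takes no real lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fm_result { FM_OUT, FM_RETRY, FM_LEAF, FM_DESCEND };

/* Hypothetical entry: just the two properties the checks care about. */
struct fm_entry { bool present; bool leaf; };

/* Hook that mutates the slot in the window before the lock is taken. */
typedef void (*fm_race_fn)(struct fm_entry *slot);

static inline void fm_zap(struct fm_entry *slot)
{
	slot->present = false;
	slot->leaf = false;
}

static inline void fm_split(struct fm_entry *slot)
{
	slot->present = true;
	slot->leaf = false;
}

static inline enum fm_result fm_follow_level(struct fm_entry *slot,
					     fm_race_fn race)
{
	struct fm_entry entry = *slot;	/* lockless snapshot */

	if (!entry.present)
		return FM_OUT;
	if (!entry.leaf)
		return FM_DESCEND;	/* next level would be walked */

	if (race)
		race(slot);		/* concurrent change before the lock */

	/* The real code takes the page table lock here, then re-reads. */
	entry = *slot;

	if (!entry.present)
		return FM_OUT;		/* unlock, goto out */
	if (!entry.leaf)		/* no else: the branch above returned */
		return FM_RETRY;	/* unlock, goto retry */
	return FM_LEAF;
}

static inline void fm_check(void)
{
	struct fm_entry e = { true, true };

	assert(fm_follow_level(&e, NULL) == FM_LEAF);
	assert(fm_follow_level(&e, fm_split) == FM_RETRY);
	e = (struct fm_entry){ true, true };
	assert(fm_follow_level(&e, fm_zap) == FM_OUT);
}
```

Without the second read of *slot, the zap and split cases would both be
reported as FM_LEAF from the stale snapshot, which is the first problem the
patch describes.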

>  			spin_unlock(lock);
>  			goto retry;
>  		}
> @@ -6884,9 +6889,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
>
>  	pmdp = pmd_offset(pudp, address);
>  	pmd = pmdp_get_lockless(pmdp);
> +	if (!pmd_present(pmd))
> +		goto out;
>  	if (pmd_leaf(pmd)) {
>  		lock = pmd_lock(mm, pmdp);
> -		if (!unlikely(pmd_leaf(pmd))) {
> +		pmd = pmdp_get(pmdp);
> +
> +		if (unlikely(!pmd_present(pmd))) {
> +			spin_unlock(lock);
> +			goto out;
> +		} else if (unlikely(!pmd_leaf(pmd))) {
>  			spin_unlock(lock);
>  			goto retry;
>  		}
>
> ---
> base-commit: 3f4f1faa33544d0bd724e32980b6f211c3a9bc7b
> change-id: 20260323-follow_pfnmap_fix-bab73335468a
>
> Best regards,
> --
> David Hildenbrand (Arm) <david@kernel.org>
>

Cheers, Lorenzo

Thread overview: 8+ messages
2026-03-23 20:20 [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start() David Hildenbrand (Arm)
2026-03-24  7:33 ` Vlastimil Babka (SUSE)
2026-03-24  8:05   ` David Hildenbrand (Arm)
2026-03-24  8:39 ` Mike Rapoport
2026-03-24  9:26   ` David Hildenbrand (Arm)
2026-03-24 11:04 ` Lorenzo Stoakes (Oracle) [this message]
2026-03-24 12:46   ` David Hildenbrand (Arm)
2026-03-24 13:06     ` Lorenzo Stoakes (Oracle)
