public inbox for linux-mm@kvack.org
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Peter Xu <peterx@redhat.com>,
	linux-mm@kvack.org, Alex Williamson <alex@shazbot.org>,
	Max Boone <mboone@akamai.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
Date: Tue, 24 Mar 2026 13:46:20 +0100
Message-ID: <43cc2290-10b6-4db3-bfc0-169adb8201b7@kernel.org>
In-Reply-To: <b3b78722-c265-484b-acde-3aa4bee0aac7@lucifer.local>

On 3/24/26 12:04, Lorenzo Stoakes (Oracle) wrote:
> On Mon, Mar 23, 2026 at 09:20:18PM +0100, David Hildenbrand (Arm) wrote:
>> follow_pfnmap_start() suffers from two problems:
>>
>> (1) We are not re-fetching the pmd/pud after taking the PTL
>>
>> Therefore, we are not properly stabilizing what the lock actually
>> protects. If there is concurrent zapping, we would indicate to the
>> caller that we found an entry, however, that entry might already have
>> been invalidated, or contain a different PFN after taking the lock.
>>
>> Properly use pmdp_get() / pudp_get() after taking the lock.
>>
>> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
>>
>> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
>>
>> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
>> reasonable on non-present entries. Most architectures indeed either
>> perform a present check or make it work by smart use of flags.
> 
> It seems huge page split is the main user via pmd_invalidate() ->
> pmd_mkinvalid().
> 
> And I guess this is the kind of thing you mean by smart use of flags, for
> x86-64:

Exactly.

[...]

> 
>>
>> However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
>> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereby
>> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
>> do that.
> 
> But pmd_present() checks for _PAGE_HUGE, and if set checks
> whether one of _PAGE_PRESENT, _PAGE_PROTNONE, _PAGE_PRESENT_INVALID is set,
> and pmd_mkinvalid() sets _PAGE_PRESENT_INVALID (clearing _PAGE_PRESENT,
> _VALID, _DIRTY, _PROTNONE) so it'd return true.

pmd_present() will correctly indicate "not present" for, say, a softleaf
migration entry.

However, pmd_leaf() will indicate "leaf" for a softleaf migration entry.

So without a pmd_present() check, this function would treat non-present
migration entries as present leaves, which is wrong here.

We're walking present entries where things like pmd_pfn(pmd) etc make sense.
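
To illustrate the point, here is a minimal userspace sketch of a leaf
check that only looks at a "huge" bit. It is a simplified model, not the
loongarch implementation: the model_* names, the bit positions, and the
payload shift are all made up for illustration, and the real
pmd_present() has more cases than the single bit tested here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen for illustration only. */
#define MODEL_PRESENT (UINT64_C(1) << 0)
#define MODEL_HUGE    (UINT64_C(1) << 1)

typedef uint64_t model_pmd_t;

/* Leaf check that only tests the huge bit and never looks at "present". */
static bool model_pmd_leaf(model_pmd_t pmd)
{
	return (pmd & MODEL_HUGE) != 0;
}

/* Simplified present check: a single bit in this model. */
static bool model_pmd_present(model_pmd_t pmd)
{
	return (pmd & MODEL_PRESENT) != 0;
}

/*
 * Build a non-present (migration-style) entry that still carries the
 * huge bit, mirroring the behaviour described for __swp_entry_to_pmd().
 */
static model_pmd_t model_swp_entry_to_pmd(uint64_t payload)
{
	assert(payload < (UINT64_C(1) << 56));
	return (payload << 8) | MODEL_HUGE;
}

/* The combined check: only present entries may be treated as leaves. */
static bool model_is_present_leaf(model_pmd_t pmd)
{
	return model_pmd_present(pmd) && model_pmd_leaf(pmd);
}
```

In this model, model_pmd_leaf(model_swp_entry_to_pmd(42)) is true even
though the entry is not present, whereas model_is_present_leaf() rejects
it. Only in the latter case would reading a PFN out of the entry be
meaningful.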

> 
> pmd_leaf() simply checks to see if _PAGE_HUGE is set which should be
> retained on split so should all still have worked?
> 
> But anyway this is still worthwhile I think.
> 
>>
>> Let's check pmd_present()/pud_present() before assuming "this is a
>> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
>> table handling code that traverses user page tables does.
>>
>> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
>> (1) is likely more relevant than (2). It is questionable how often (1)
>> would actually trigger, but let's CC stable to be sure.
>>
>> This was found by code inspection.
>>
>> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> 
> This looks correct to me, so:
> 
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

Thanks!

> 
>> ---
>> Gave it a quick test in a VM with MM selftests etc, but I am not sure if
>> I actually trigger the follow_pfnmap machinery.
>> ---
>>  mm/memory.c | 18 +++++++++++++++---
>>  1 file changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 219b9bf6cae0..2921d35c50ae 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -6868,11 +6868,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
>>
>>  	pudp = pud_offset(p4dp, address);
>>  	pud = pudp_get(pudp);
>> -	if (pud_none(pud))
>> +	if (!pud_present(pud))
>>  		goto out;
>>  	if (pud_leaf(pud)) {
>>  		lock = pud_lock(mm, pudp);
>> -		if (!unlikely(pud_leaf(pud))) {
>> +		pud = pudp_get(pudp);
>> +
>> +		if (unlikely(!pud_present(pud))) {
>> +			spin_unlock(lock);
>> +			goto out;
>> +		} else if (unlikely(!pud_leaf(pud))) {
> 
> Tiny nit, but no need for else here. Sometimes compilers complain about
> this but not sure if such pedantry is enabled in default kernel compiler
> flags :)

You mean

if (unlikely(!pud_present(pud))) {
	spin_unlock(lock);
	goto out;
}
if (...) {

?

That just creates an additional LOC without any benefit IMHO. And we use
it all over the place :)

In fact, I will beat any C compiler with the C standard that complains
about that ;)
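
For reference, a small userspace sketch of the two forms under
discussion. It models the "take the lock, re-read the entry, then check
present and leaf" sequence from the hunk above. Everything named model_*
is hypothetical: the lock is just a counter and the entry is a plain
integer, so this only shows that the control flow is the same, nothing
about real locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen for illustration only. */
#define MODEL_PRESENT (UINT64_C(1) << 0)
#define MODEL_HUGE    (UINT64_C(1) << 1)

enum model_result { MODEL_NOT_FOUND, MODEL_RETRY_LOWER, MODEL_FOUND_LEAF };

struct model_slot {
	uint64_t entry;  /* stands in for *pudp */
	int lock_depth;  /* stands in for the page table lock */
};

static void model_lock(struct model_slot *s)
{
	s->lock_depth++;
}

static void model_unlock(struct model_slot *s)
{
	assert(s->lock_depth > 0);
	s->lock_depth--;
}

/* Form used in the patch: an else-if chain after the re-read. */
static enum model_result model_check_else_if(struct model_slot *s)
{
	uint64_t e;

	model_lock(s);
	e = s->entry; /* re-read under the lock */
	if (!(e & MODEL_PRESENT)) {
		model_unlock(s);
		return MODEL_NOT_FOUND;
	} else if (!(e & MODEL_HUGE)) {
		model_unlock(s);
		return MODEL_RETRY_LOWER;
	}
	return MODEL_FOUND_LEAF; /* lock stays held for the caller */
}

/* Suggested alternative: two independent if statements. */
static enum model_result model_check_separate_if(struct model_slot *s)
{
	uint64_t e;

	model_lock(s);
	e = s->entry; /* re-read under the lock */
	if (!(e & MODEL_PRESENT)) {
		model_unlock(s);
		return MODEL_NOT_FOUND;
	}
	if (!(e & MODEL_HUGE)) {
		model_unlock(s);
		return MODEL_RETRY_LOWER;
	}
	return MODEL_FOUND_LEAF; /* lock stays held for the caller */
}

/* True if both forms return the same result and lock state for an entry. */
static bool model_forms_agree(uint64_t entry)
{
	struct model_slot a = { .entry = entry, .lock_depth = 0 };
	struct model_slot b = { .entry = entry, .lock_depth = 0 };

	return model_check_else_if(&a) == model_check_separate_if(&b) &&
	       a.lock_depth == b.lock_depth;
}
```

Because the first branch always leaves the function, the "else" changes
nothing about behaviour. Both forms agree for every combination of the
two bits, so the choice is purely stylistic.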

-- 
Cheers,

David

