public inbox for linux-mm@kvack.org
* [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
@ 2026-03-23 20:20 David Hildenbrand (Arm)
  2026-03-24  7:33 ` Vlastimil Babka (SUSE)
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-23 20:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Peter Xu,
	linux-mm, Alex Williamson, Max Boone, stable,
	David Hildenbrand (Arm)

follow_pfnmap_start() suffers from two problems:

(1) We are not re-fetching the pmd/pud after taking the PTL

Therefore, we are not properly stabilizing what the lock lock actually
protects. If there is concurrent zapping, we would indicate to the
caller that we found an entry, however, that entry might already have
been invalidated, or contain a different PFN after taking the lock.

Properly use pmdp_get() / pudp_get() after taking the lock.

(2) pmd_leaf() / pud_leaf() are not well defined on non-present entries

pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.

There is no real guarantee that pmd_leaf()/pud_leaf() returns something
reasonable on non-present entries. Most architectures indeed either
perform a present check or make it work by smart use of flags.

However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereas
pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
do that.

Let's check pmd_present()/pud_present() before assuming "this is a
present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
table handling code that traverses user page tables does.

Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
(1) is likely more relevant than (2). It is questionable how often (1)
would actually trigger, but let's CC stable to be sure.

This was found by code inspection.

Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
Gave it a quick test in a VM with MM selftests etc, but I am not sure if
I actually trigger the follow_pfnmap machinery.
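As an aside, the stale-snapshot problem in (1) can be sketched in plain
userspace C (hypothetical names, nothing kernel-specific): a value read
before taking the lock must be re-read once the lock is held, because a
concurrent writer may have changed it in between.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a page-table slot; 0 means "zapped". */
static uint64_t pmd_slot = 0x1000;

/* Lockless snapshot, in the spirit of pmdp_get_lockless(). */
static uint64_t snapshot_lockless(void)
{
	return pmd_slot;
}

/* Simulated concurrent zap that sneaks in before the PTL is taken. */
static void concurrent_zap(void)
{
	pmd_slot = 0;
}

/* The fix: once the lock is held, re-fetch instead of trusting the
 * earlier snapshot. (The locking itself is elided in this sketch.) */
static uint64_t refetch_under_lock(void)
{
	return pmd_slot;
}
```

A snapshot taken before concurrent_zap() keeps reporting the old value,
while the re-fetch under the lock observes the zap.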
---
 mm/memory.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 219b9bf6cae0..2921d35c50ae 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6868,11 +6868,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
 
 	pudp = pud_offset(p4dp, address);
 	pud = pudp_get(pudp);
-	if (pud_none(pud))
+	if (!pud_present(pud))
 		goto out;
 	if (pud_leaf(pud)) {
 		lock = pud_lock(mm, pudp);
-		if (!unlikely(pud_leaf(pud))) {
+		pud = pudp_get(pudp);
+
+		if (unlikely(!pud_present(pud))) {
+			spin_unlock(lock);
+			goto out;
+		} else if (unlikely(!pud_leaf(pud))) {
 			spin_unlock(lock);
 			goto retry;
 		}
@@ -6884,9 +6889,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
 
 	pmdp = pmd_offset(pudp, address);
 	pmd = pmdp_get_lockless(pmdp);
+	if (!pmd_present(pmd))
+		goto out;
 	if (pmd_leaf(pmd)) {
 		lock = pmd_lock(mm, pmdp);
-		if (!unlikely(pmd_leaf(pmd))) {
+		pmd = pmdp_get(pmdp);
+
+		if (unlikely(!pmd_present(pmd))) {
+			spin_unlock(lock);
+			goto out;
+		} else if (unlikely(!pmd_leaf(pmd))) {
 			spin_unlock(lock);
 			goto retry;
 		}

---
base-commit: 3f4f1faa33544d0bd724e32980b6f211c3a9bc7b
change-id: 20260323-follow_pfnmap_fix-bab73335468a

Best regards,
-- 
David Hildenbrand (Arm) <david@kernel.org>



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
  2026-03-23 20:20 [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start() David Hildenbrand (Arm)
@ 2026-03-24  7:33 ` Vlastimil Babka (SUSE)
  2026-03-24  8:05   ` David Hildenbrand (Arm)
  2026-03-24  8:39 ` Mike Rapoport
  2026-03-24 11:04 ` Lorenzo Stoakes (Oracle)
  2 siblings, 1 reply; 8+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-03-24  7:33 UTC (permalink / raw)
  To: David Hildenbrand (Arm), linux-kernel
  Cc: Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Peter Xu, linux-mm,
	Alex Williamson, Max Boone, stable

On 3/23/26 21:20, David Hildenbrand (Arm) wrote:
> follow_pfnmap_start() suffers from two problems:
> 
> (1) We are not re-fetching the pmd/pud after taking the PTL
> 
> Therefore, we are not properly stabilizing what the lock lock actually
> protects. If there is concurrent zapping, we would indicate to the
> caller that we found an entry, however, that entry might already have
> been invalidated, or contain a different PFN after taking the lock.
> 
> Properly use pmdp_get() / pudp_get() after taking the lock.
> 
> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
> 
> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
> 
> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
> reasonable on non-present entries. Most architectures indeed either
> perform a present check or make it work by smart use of flags.
> 
> However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereas
> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
> do that.
> 
> Let's check pmd_present()/pud_present() before assuming "this is a
> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
> table handling code that traverses user page tables does.
> 
> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
> (1) is likely more relevant than (2). It is questionable how often (1)
> would actually trigger, but let's CC stable to be sure.
> 
> This was found by code inspection.
> 
> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>

Should we also convert pgd_none() and p4d_none() checks to X_present() checks?



* Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
  2026-03-24  7:33 ` Vlastimil Babka (SUSE)
@ 2026-03-24  8:05   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-24  8:05 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), linux-kernel
  Cc: Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Peter Xu, linux-mm,
	Alex Williamson, Max Boone, stable

On 3/24/26 08:33, Vlastimil Babka (SUSE) wrote:
> On 3/23/26 21:20, David Hildenbrand (Arm) wrote:
>> follow_pfnmap_start() suffers from two problems:
>>
>> (1) We are not re-fetching the pmd/pud after taking the PTL
>>
>> Therefore, we are not properly stabilizing what the lock lock actually
>> protects. If there is concurrent zapping, we would indicate to the
>> caller that we found an entry, however, that entry might already have
>> been invalidated, or contain a different PFN after taking the lock.
>>
>> Properly use pmdp_get() / pudp_get() after taking the lock.
>>
>> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
>>
>> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
>>
>> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
>> reasonable on non-present entries. Most architectures indeed either
>> perform a present check or make it work by smart use of flags.
>>
>> However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
>> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereas
>> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
>> do that.
>>
>> Let's check pmd_present()/pud_present() before assuming "this is a
>> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
>> table handling code that traverses user page tables does.
>>
>> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
>> (1) is likely more relevant than (2). It is questionable how often (1)
>> would actually trigger, but let's CC stable to be sure.
>>
>> This was found by code inspection.
>>
>> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> 
> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> 
> Should we also convert pgd_none() and p4d_none() checks to X_present() checks?

For these, we don't really support non-present entries yet (excluding
hugetlb, hmmm)..

I have some more cleanups laying around here, I'll throw that in as
well, thanks!

-- 
Cheers,

David



* Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
  2026-03-23 20:20 [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start() David Hildenbrand (Arm)
  2026-03-24  7:33 ` Vlastimil Babka (SUSE)
@ 2026-03-24  8:39 ` Mike Rapoport
  2026-03-24  9:26   ` David Hildenbrand (Arm)
  2026-03-24 11:04 ` Lorenzo Stoakes (Oracle)
  2 siblings, 1 reply; 8+ messages in thread
From: Mike Rapoport @ 2026-03-24  8:39 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, Andrew Morton, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Suren Baghdasaryan, Michal Hocko, Peter Xu,
	linux-mm, Alex Williamson, Max Boone, stable

On Mon, Mar 23, 2026 at 09:20:18PM +0100, David Hildenbrand (Arm) wrote:
> follow_pfnmap_start() suffers from two problems:
> 
> (1) We are not re-fetching the pmd/pud after taking the PTL
> 
> Therefore, we are not properly stabilizing what the lock lock actually

                                                    ^ lock lock

> protects. If there is concurrent zapping, we would indicate to the
> caller that we found an entry, however, that entry might already have
> been invalidated, or contain a different PFN after taking the lock.
> 
> Properly use pmdp_get() / pudp_get() after taking the lock.
> 
> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
> 
> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
> 
> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
> reasonable on non-present entries. Most architectures indeed either
> perform a present check or make it work by smart use of flags.
> 
> However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereas
> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
> do that.
> 
> Let's check pmd_present()/pud_present() before assuming "this is a
> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
> table handling code that traverses user page tables does.
> 
> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
> (1) is likely more relevant than (2). It is questionable how often (1)
> would actually trigger, but let's CC stable to be sure.
> 
> This was found by code inspection.
> 
> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
> Gave it a quick test in a VM with MM selftests etc, but I am not sure if
> I actually trigger the follow_pfnmap machinery.

Most probably not :)
KVM selftests might, didn't really dig into that. But I doubt any selftest
would trigger potential races there.

> ---
>  mm/memory.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
 
-- 
Sincerely yours,
Mike.



* Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
  2026-03-24  8:39 ` Mike Rapoport
@ 2026-03-24  9:26   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-24  9:26 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-kernel, Andrew Morton, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Suren Baghdasaryan, Michal Hocko, Peter Xu,
	linux-mm, Alex Williamson, Max Boone, stable

On 3/24/26 09:39, Mike Rapoport wrote:
> On Mon, Mar 23, 2026 at 09:20:18PM +0100, David Hildenbrand (Arm) wrote:
>> follow_pfnmap_start() suffers from two problems:
>>
>> (1) We are not re-fetching the pmd/pud after taking the PTL
>>
>> Therefore, we are not properly stabilizing what the lock lock actually
> 
>                                                     ^ lock lock

Thanks, Andrew can you fix that up? Thanks!

> 
>> protects. If there is concurrent zapping, we would indicate to the
>> caller that we found an entry, however, that entry might already have
>> been invalidated, or contain a different PFN after taking the lock.
>>
>> Properly use pmdp_get() / pudp_get() after taking the lock.
>>
>> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
>>
>> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
>>
>> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
>> reasonable on non-present entries. Most architectures indeed either
>> perform a present check or make it work by smart use of flags.
>>
>> However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
>> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereas
>> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
>> do that.
>>
>> Let's check pmd_present()/pud_present() before assuming "this is a
>> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
>> table handling code that traverses user page tables does.
>>
>> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
>> (1) is likely more relevant than (2). It is questionable how often (1)
>> would actually trigger, but let's CC stable to be sure.
>>
>> This was found by code inspection.
>>
>> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> 
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> 
>> ---
>> Gave it a quick test in a VM with MM selftests etc, but I am not sure if
>> I actually trigger the follow_pfnmap machinery.
> 
> Most probably not :)
> KVM selftests might, didn't really dig into that. But I doubt any selftest
> would trigger potential races there.

I was wondering whether generic_access_phys() would somehow trigger on
an ordinary system when doing some more involved activities in user space :)

-- 
Cheers,

David



* Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
  2026-03-23 20:20 [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start() David Hildenbrand (Arm)
  2026-03-24  7:33 ` Vlastimil Babka (SUSE)
  2026-03-24  8:39 ` Mike Rapoport
@ 2026-03-24 11:04 ` Lorenzo Stoakes (Oracle)
  2026-03-24 12:46   ` David Hildenbrand (Arm)
  2 siblings, 1 reply; 8+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-24 11:04 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Peter Xu,
	linux-mm, Alex Williamson, Max Boone, stable

On Mon, Mar 23, 2026 at 09:20:18PM +0100, David Hildenbrand (Arm) wrote:
> follow_pfnmap_start() suffers from two problems:
>
> (1) We are not re-fetching the pmd/pud after taking the PTL
>
> Therefore, we are not properly stabilizing what the lock lock actually
> protects. If there is concurrent zapping, we would indicate to the
> caller that we found an entry, however, that entry might already have
> been invalidated, or contain a different PFN after taking the lock.
>
> Properly use pmdp_get() / pudp_get() after taking the lock.
>
> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
>
> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
>
> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
> reasonable on non-present entries. Most architectures indeed either
> perform a present check or make it work by smart use of flags.

It seems huge page split is the main user via pmdp_invalidate() ->
pmd_mkinvalid().

And I guess this is the kind of thing you mean by smart use of flags, for
x86-64:

static inline int pmd_present(pmd_t pmd)
{
	/*
	 * Checking for _PAGE_PSE is needed too because
	 * split_huge_page will temporarily clear the present bit (but
	 * the _PAGE_PSE flag will remain set at all times while the
	 * _PAGE_PRESENT bit is clear).
	 */
	return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
}

So you might have missing _PAGE_PRESENT but still pmd_present() returns
true, as does pmd_leaf().

Seems the same for RISC-V.

And other arches play other games with the same result :)

So we probably shouldn't actually hit any problem with this from any other
source, but still good to do it.
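To make the flag trick concrete, here is a tiny standalone sketch of the
x86-64 behaviour quoted above (the flag values are made up for
illustration, not the real _PAGE_* definitions): after a
pmd_mkinvalid()-style invalidation the hardware present bit is gone, yet
the present check still returns true because the PSE bit survives.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values only -- not the real x86 definitions. */
#define F_PRESENT  (1u << 0)
#define F_PROTNONE (1u << 1)
#define F_PSE      (1u << 2)	/* huge-page bit, survives invalidation */

static int sketch_pmd_present(uint32_t pmd)
{
	/* Mirrors the quoted x86-64 pmd_present(): F_PSE keeps the entry
	 * "present" even while F_PRESENT is temporarily clear. */
	return (pmd & (F_PRESENT | F_PROTNONE | F_PSE)) != 0;
}

static uint32_t sketch_pmd_mkinvalid(uint32_t pmd)
{
	return pmd & ~F_PRESENT;	/* clears only the hardware bit */
}
```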

>
> However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereby
> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
> do that.

But pmd_present() checks for _PAGE_HUGE first, and if set checks
whether one of _PAGE_PRESENT, _PAGE_PROTNONE, _PAGE_PRESENT_INVALID is set,
and pmd_mkinvalid() sets _PAGE_PRESENT_INVALID (clearing _PAGE_PRESENT,
_VALID, _DIRTY, _PROTNONE) so it'd return true.

pmd_leaf() simply checks to see if _PAGE_HUGE is set which should be
retained on split so should all still have worked?

But anyway this is still worthwhile I think.

>
> Let's check pmd_present()/pud_present() before assuming "the is a
> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
> table handling code that traverses user page tables does.
>
> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
> (1) is likely more relevant than (2). It is questionable how often (1)
> would actually trigger, but let's CC stable to be sure.
>
> This was found by code inspection.
>
> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>

This looks correct to me, so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
> Gave it a quick test in a VM with MM selftests etc, but I am not sure if
> I actually trigger the follow_pfnmap machinery.
> ---
>  mm/memory.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 219b9bf6cae0..2921d35c50ae 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -6868,11 +6868,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
>
>  	pudp = pud_offset(p4dp, address);
>  	pud = pudp_get(pudp);
> -	if (pud_none(pud))
> +	if (!pud_present(pud))
>  		goto out;
>  	if (pud_leaf(pud)) {
>  		lock = pud_lock(mm, pudp);
> -		if (!unlikely(pud_leaf(pud))) {
> +		pud = pudp_get(pudp);
> +
> +		if (unlikely(!pud_present(pud))) {
> +			spin_unlock(lock);
> +			goto out;
> +		} else if (unlikely(!pud_leaf(pud))) {

Tiny nit, but no need for else here. Sometimes compilers complain about
this but not sure if such pedantry is enabled in default kernel compiler
flags :)

Obv. same for below.

>  			spin_unlock(lock);
>  			goto retry;
>  		}
> @@ -6884,9 +6889,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
>
>  	pmdp = pmd_offset(pudp, address);
>  	pmd = pmdp_get_lockless(pmdp);
> +	if (!pmd_present(pmd))
> +		goto out;
>  	if (pmd_leaf(pmd)) {
>  		lock = pmd_lock(mm, pmdp);
> -		if (!unlikely(pmd_leaf(pmd))) {
> +		pmd = pmdp_get(pmdp);
> +
> +		if (unlikely(!pmd_present(pmd))) {
> +			spin_unlock(lock);
> +			goto out;
> +		} else if (unlikely(!pmd_leaf(pmd))) {
>  			spin_unlock(lock);
>  			goto retry;
>  		}
>
> ---
> base-commit: 3f4f1faa33544d0bd724e32980b6f211c3a9bc7b
> change-id: 20260323-follow_pfnmap_fix-bab73335468a
>
> Best regards,
> --
> David Hildenbrand (Arm) <david@kernel.org>
>

Cheers, Lorenzo



* Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
  2026-03-24 11:04 ` Lorenzo Stoakes (Oracle)
@ 2026-03-24 12:46   ` David Hildenbrand (Arm)
  2026-03-24 13:06     ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-24 12:46 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: linux-kernel, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Peter Xu,
	linux-mm, Alex Williamson, Max Boone, stable

On 3/24/26 12:04, Lorenzo Stoakes (Oracle) wrote:
> On Mon, Mar 23, 2026 at 09:20:18PM +0100, David Hildenbrand (Arm) wrote:
>> follow_pfnmap_start() suffers from two problems:
>>
>> (1) We are not re-fetching the pmd/pud after taking the PTL
>>
>> Therefore, we are not properly stabilizing what the lock lock actually
>> protects. If there is concurrent zapping, we would indicate to the
>> caller that we found an entry, however, that entry might already have
>> been invalidated, or contain a different PFN after taking the lock.
>>
>> Properly use pmdp_get() / pudp_get() after taking the lock.
>>
>> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
>>
>> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
>>
>> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
>> reasonable on non-present entries. Most architectures indeed either
>> perform a present check or make it work by smart use of flags.
> 
> It seems huge page split is the main user via pmdp_invalidate() ->
> pmd_mkinvalid().
> 
> And I guess this is the kind of thing you mean by smart use of flags, for
> x86-64:

Exactly.

[...]

> 
>>
>> However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
>> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereas
>> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
>> do that.
> 
> But pmd_present() checks for _PAGE_HUGE first, and if set checks
> whether one of _PAGE_PRESENT, _PAGE_PROTNONE, _PAGE_PRESENT_INVALID is set,
> and pmd_mkinvalid() sets _PAGE_PRESENT_INVALID (clearing _PAGE_PRESENT,
> _VALID, _DIRTY, _PROTNONE) so it'd return true.

pmd_present() will correctly indicate "not present" for, say, a softleaf
migration entry.

However, pmd_leaf() will indicate "leaf" for a softleaf migration entry.

So not checking pmd_present() will actually treat non-present migration
entries as present leaves in this function, which is wrong in the context
of this function.

We're walking present entries where things like pmd_pfn(pmd) etc make sense.
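The point can be condensed into a standalone sketch (made-up flags,
loosely modeled on the loongarch example from the commit message): a
leaf check alone fires on a non-present swap/migration-style entry, so
the walk must gate it with a present check.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up flags for illustration; on loongarch the analogous bits are
 * the present-style bits and _PAGE_HUGE. */
#define F_PRESENT (1u << 0)
#define F_HUGE    (1u << 1)	/* set for leaf mappings AND swap entries */

static int sketch_leaf(uint32_t e)    { return (e & F_HUGE) != 0; }
static int sketch_present(uint32_t e) { return (e & F_PRESENT) != 0; }

/* The fixed walk: only a *present* leaf may be handed to pmd_pfn() etc. */
static int usable_leaf(uint32_t e)
{
	return sketch_present(e) && sketch_leaf(e);
}
```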

> 
> pmd_leaf() simply checks to see if _PAGE_HUGE is set which should be
> retained on split so should all still have worked?
> 
> But anyway this is still worthwhile I think.
> 
>>
>> Let's check pmd_present()/pud_present() before assuming "this is a
>> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
>> table handling code that traverses user page tables does.
>>
>> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
>> (1) is likely more relevant than (2). It is questionable how often (1)
>> would actually trigger, but let's CC stable to be sure.
>>
>> This was found by code inspection.
>>
>> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> 
> This looks correct to me, so:
> 
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

Thanks!

> 
>> ---
>> Gave it a quick test in a VM with MM selftests etc, but I am not sure if
>> I actually trigger the follow_pfnmap machinery.
>> ---
>>  mm/memory.c | 18 +++++++++++++++---
>>  1 file changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 219b9bf6cae0..2921d35c50ae 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -6868,11 +6868,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
>>
>>  	pudp = pud_offset(p4dp, address);
>>  	pud = pudp_get(pudp);
>> -	if (pud_none(pud))
>> +	if (!pud_present(pud))
>>  		goto out;
>>  	if (pud_leaf(pud)) {
>>  		lock = pud_lock(mm, pudp);
>> -		if (!unlikely(pud_leaf(pud))) {
>> +		pud = pudp_get(pudp);
>> +
>> +		if (unlikely(!pud_present(pud))) {
>> +			spin_unlock(lock);
>> +			goto out;
>> +		} else if (unlikely(!pud_leaf(pud))) {
> 
> Tiny nit, but no need for else here. Sometimes compilers complain about
> this but not sure if such pedantry is enabled in default kernel compiler
> flags :)

You mean

if (unlikely(!pud_present(pud))) {
	spin_unlock(lock);
	goto out;
}
if (...) {

?

That just creates an additional LOC without any benefit IMHO. And we use
it all over the place :)

In fact, I will beat any C compiler with the C standard that complains
about that ;)
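For what it's worth, the equivalence being argued here can be shown
mechanically with toy functions standing in for the unlock+goto arms
(-1 for "goto out", 0 for "goto retry", 1 for fall-through):

```c
#include <assert.h>

/* Form with the (redundant) else, as in the patch. */
static int with_else(int present, int leaf)
{
	if (!present) {
		return -1;	/* stands in for spin_unlock(); goto out; */
	} else if (!leaf) {
		return 0;	/* stands in for spin_unlock(); goto retry; */
	}
	return 1;
}

/* Form without the else; behaves identically because the first branch
 * always leaves the function. */
static int without_else(int present, int leaf)
{
	if (!present)
		return -1;
	if (!leaf)
		return 0;
	return 1;
}
```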

-- 
Cheers,

David



* Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()
  2026-03-24 12:46   ` David Hildenbrand (Arm)
@ 2026-03-24 13:06     ` Lorenzo Stoakes (Oracle)
  0 siblings, 0 replies; 8+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-24 13:06 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Peter Xu,
	linux-mm, Alex Williamson, Max Boone, stable

On Tue, Mar 24, 2026 at 01:46:20PM +0100, David Hildenbrand (Arm) wrote:
> On 3/24/26 12:04, Lorenzo Stoakes (Oracle) wrote:
> > On Mon, Mar 23, 2026 at 09:20:18PM +0100, David Hildenbrand (Arm) wrote:
> >> follow_pfnmap_start() suffers from two problems:
> >>
> >> (1) We are not re-fetching the pmd/pud after taking the PTL
> >>
> >> Therefore, we are not properly stabilizing what the lock lock actually
> >> protects. If there is concurrent zapping, we would indicate to the
> >> caller that we found an entry, however, that entry might already have
> >> been invalidated, or contain a different PFN after taking the lock.
> >>
> >> Properly use pmdp_get() / pudp_get() after taking the lock.
> >>
> >> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
> >>
> >> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
> >>
> >> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
> >> reasonable on non-present entries. Most architectures indeed either
> >> perform a present check or make it work by smart use of flags.
> >
> > It seems huge page split is the main user via pmdp_invalidate() ->
> > pmd_mkinvalid().
> >
> > And I guess this is the kind of thing you mean by smart use of flags, for
> > x86-64:
>
> Exactly.
>
> [...]
>
> >
> >>
> >> However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
> >> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereas
> >> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
> >> do that.
> >
> > But pmd_present() checks for _PAGE_HUGE first, and if set checks
> > whether one of _PAGE_PRESENT, _PAGE_PROTNONE, _PAGE_PRESENT_INVALID is set,
> > and pmd_mkinvalid() sets _PAGE_PRESENT_INVALID (clearing _PAGE_PRESENT,
> > _VALID, _DIRTY, _PROTNONE) so it'd return true.
>
> pmd_present() will correctly indicate "not present" for, say, a softleaf
> migration entry.
>
> However, pmd_leaf() will indicate "leaf" for a softleaf migration entry.

Right yeah that's true. By definition softleaves are non-present. But as they
are leaves, you'd expect pXX_leaf() to return true.

>
> So not checking pmd_present() will actually treat non-present migration
> entries as present leaves in this function, which is wrong in the context
> of this function.
>
> We're walking present entries where things like pmd_pfn(pmd) etc make sense.

Ack, makes sense, thanks!

>
> >
> > pmd_leaf() simply checks to see if _PAGE_HUGE is set which should be
> > retained on split so should all still have worked?
> >
> > But anyway this is still worthwhile I think.
> >
> >>
> >> Let's check pmd_present()/pud_present() before assuming "this is a
> >> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
> >> table handling code that traverses user page tables does.
> >>
> >> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
> >> (1) is likely more relevant than (2). It is questionable how often (1)
> >> would actually trigger, but let's CC stable to be sure.
> >>
> >> This was found by code inspection.
> >>
> >> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> >
> > This looks correct to me, so:
> >
> > Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
>
> Thanks!
>
> >
> >> ---
> >> Gave it a quick test in a VM with MM selftests etc, but I am not sure if
> >> I actually trigger the follow_pfnmap machinery.
> >> ---
> >>  mm/memory.c | 18 +++++++++++++++---
> >>  1 file changed, 15 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index 219b9bf6cae0..2921d35c50ae 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -6868,11 +6868,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
> >>
> >>  	pudp = pud_offset(p4dp, address);
> >>  	pud = pudp_get(pudp);
> >> -	if (pud_none(pud))
> >> +	if (!pud_present(pud))
> >>  		goto out;
> >>  	if (pud_leaf(pud)) {
> >>  		lock = pud_lock(mm, pudp);
> >> -		if (!unlikely(pud_leaf(pud))) {
> >> +		pud = pudp_get(pudp);
> >> +
> >> +		if (unlikely(!pud_present(pud))) {
> >> +			spin_unlock(lock);
> >> +			goto out;
> >> +		} else if (unlikely(!pud_leaf(pud))) {
> >
> > Tiny nit, but no need for else here. Sometimes compilers complain about
> > this but not sure if such pedantry is enabled in default kernel compiler
> > flags :)
>
> You mean
>
> if (unlikely(!pud_present(pud))) {
> 	spin_unlock(lock);
> 	goto out;
> }
> if (...) {
>
> ?
>
> That just creates an additional LOC without any benefit IMHO. And we use
> it all over the place :)

Yeah I think the argument is you don't want to imply that it could somehow _not_
be else. But I think it's the compiler being a wee bit pedantic... :)

>
> In fact, I will beat any C compiler with the C standard that complains
> about that ;)

Haha, I'd like to see that!

>
> --
> Cheers,
>
> David

Cheers, Lorenzo


