Linux-mm Archive on lore.kernel.org
* [PATCH] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd
@ 2026-05-08  1:37 Wei Yang
  2026-05-08 21:51 ` Andrew Morton
  2026-05-08 22:48 ` Balbir Singh
  0 siblings, 2 replies; 5+ messages in thread
From: Wei Yang @ 2026-05-08  1:37 UTC (permalink / raw)
  To: akpm, david, ljs, riel, liam, vbabka, harry, jannh, sj, ziy,
	balbirs
  Cc: linux-mm, Wei Yang, Lorenzo Stoakes, stable

For pmd_trans_huge() and pmd_is_migration_entry(), we do the following
before returning the pmd entry:

  * re-validate pmd entry
  * check PVMW_MIGRATION
  * check_pmd()
  * handle on pte level if split under us

But for a device-private pmd, we just return after pmd_lock(). This may
lead to an improper situation.

This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration
support device-private entries") by following the same pattern as
pmd_trans_huge() and pmd_is_migration_entry().

Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: <stable@vger.kernel.org>
---
 mm/page_vma_mapped.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index a4d52fdb3056..5d337ea43019 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -269,21 +269,33 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			spin_unlock(pvmw->ptl);
 			pvmw->ptl = NULL;
 		} else if (!pmd_present(pmde)) {
-			const softleaf_t entry = softleaf_from_pmd(pmde);
+			softleaf_t entry = softleaf_from_pmd(pmde);
 
 			if (softleaf_is_device_private(entry)) {
 				pvmw->ptl = pmd_lock(mm, pvmw->pmd);
-				return true;
-			}
-
-			if ((pvmw->flags & PVMW_SYNC) &&
-			    thp_vma_suitable_order(vma, pvmw->address,
-						   PMD_ORDER) &&
-			    (pvmw->nr_pages >= HPAGE_PMD_NR))
-				sync_with_folio_pmd_zap(mm, pvmw->pmd);
+				entry = softleaf_from_pmd(*pvmw->pmd);
+
+				if (softleaf_is_device_private(entry)) {
+					if (pvmw->flags & PVMW_MIGRATION)
+						return not_found(pvmw);
+					if (!check_pmd(softleaf_to_pfn(entry), pvmw))
+						return not_found(pvmw);
+					return true;
+				}
 
-			step_forward(pvmw, PMD_SIZE);
-			continue;
+				/* THP pmd was split under us: handle on pte level */
+				spin_unlock(pvmw->ptl);
+				pvmw->ptl = NULL;
+			} else {
+				if ((pvmw->flags & PVMW_SYNC) &&
+				    thp_vma_suitable_order(vma, pvmw->address,
+							   PMD_ORDER) &&
+				    (pvmw->nr_pages >= HPAGE_PMD_NR))
+					sync_with_folio_pmd_zap(mm, pvmw->pmd);
+
+				step_forward(pvmw, PMD_SIZE);
+				continue;
+			}
 		}
 		if (!map_pte(pvmw, &pmde, &ptl)) {
 			if (!pvmw->pte)
-- 
2.34.1




* Re: [PATCH] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd
  2026-05-08  1:37 [PATCH] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd Wei Yang
@ 2026-05-08 21:51 ` Andrew Morton
  2026-05-10  1:22   ` Wei Yang
  2026-05-08 22:48 ` Balbir Singh
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2026-05-08 21:51 UTC (permalink / raw)
  To: Wei Yang
  Cc: david, ljs, riel, liam, vbabka, harry, jannh, sj, ziy, balbirs,
	linux-mm, Lorenzo Stoakes, stable

On Fri,  8 May 2026 01:37:28 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:

> For pmd_trans_huge() and pmd_is_migration_entry(), we do the following
> before returning the pmd entry:
> 
>   * re-validate pmd entry
>   * check PVMW_MIGRATION
>   * check_pmd()
>   * handle on pte level if split under us
> 
> But for a device-private pmd, we just return after pmd_lock(). This may
> lead to an improper situation.

What is "improper situation"?

> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration
> support device-private entries") by following the same pattern as
> pmd_trans_huge() and pmd_is_migration_entry().
> 
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Balbir Singh <balbirs@nvidia.com>
> Cc: SeongJae Park <sj@kernel.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: <stable@vger.kernel.org>

If we're to propose a fix for -stable backporting I believe we should
fully explain to -stable maintainers *why* we're making that proposal.

IOW, and not for the first time(!), what are the worst-case
userspace-visible effects of this bug?





* Re: [PATCH] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd
  2026-05-08  1:37 [PATCH] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd Wei Yang
  2026-05-08 21:51 ` Andrew Morton
@ 2026-05-08 22:48 ` Balbir Singh
  2026-05-10  1:20   ` Wei Yang
  1 sibling, 1 reply; 5+ messages in thread
From: Balbir Singh @ 2026-05-08 22:48 UTC (permalink / raw)
  To: Wei Yang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, sj,
	ziy
  Cc: linux-mm, Lorenzo Stoakes, stable

On 5/8/26 11:37, Wei Yang wrote:
> For pmd_trans_huge() and pmd_is_migration_entry(), we do the following
> before returning the pmd entry:
> 
>   * re-validate pmd entry
>   * check PVMW_MIGRATION
>   * check_pmd()
>   * handle on pte level if split under us
> 
> But for a device-private pmd, we just return after pmd_lock(). This may
> lead to an improper situation.
> 

Could you elaborate a bit more on the improper situation?

> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration
> support device-private entries") by following the same pattern as
> pmd_trans_huge() and pmd_is_migration_entry().
> 
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Balbir Singh <balbirs@nvidia.com>
> Cc: SeongJae Park <sj@kernel.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: <stable@vger.kernel.org>
> ---
>  mm/page_vma_mapped.c | 34 +++++++++++++++++++++++-----------
>  1 file changed, 23 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index a4d52fdb3056..5d337ea43019 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -269,21 +269,33 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>  			spin_unlock(pvmw->ptl);
>  			pvmw->ptl = NULL;
>  		} else if (!pmd_present(pmde)) {
> -			const softleaf_t entry = softleaf_from_pmd(pmde);
> +			softleaf_t entry = softleaf_from_pmd(pmde);
>  
>  			if (softleaf_is_device_private(entry)) {
>  				pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> -				return true;
> -			}
> -
> -			if ((pvmw->flags & PVMW_SYNC) &&
> -			    thp_vma_suitable_order(vma, pvmw->address,
> -						   PMD_ORDER) &&
> -			    (pvmw->nr_pages >= HPAGE_PMD_NR))
> -				sync_with_folio_pmd_zap(mm, pvmw->pmd);
> +				entry = softleaf_from_pmd(*pvmw->pmd);
> +
> +				if (softleaf_is_device_private(entry)) {

Do we need to check softleaf_is_device_private() twice? Can't we hold the pmd
lock and check just once?

> +					if (pvmw->flags & PVMW_MIGRATION)
> +						return not_found(pvmw);

Double check: do we want to skip migration ptes (from remove_migration_pte())?

> +					if (!check_pmd(softleaf_to_pfn(entry), pvmw))
> +						return not_found(pvmw);
> +					return true;
> +				}
>  
> -			step_forward(pvmw, PMD_SIZE);
> -			continue;
> +				/* THP pmd was split under us: handle on pte level */
> +				spin_unlock(pvmw->ptl);
> +				pvmw->ptl = NULL;
> +			} else {
> +				if ((pvmw->flags & PVMW_SYNC) &&
> +				    thp_vma_suitable_order(vma, pvmw->address,
> +							   PMD_ORDER) &&
> +				    (pvmw->nr_pages >= HPAGE_PMD_NR))
> +					sync_with_folio_pmd_zap(mm, pvmw->pmd);
> +
> +				step_forward(pvmw, PMD_SIZE);
> +				continue;
> +			}
>  		}
>  		if (!map_pte(pvmw, &pmde, &ptl)) {
>  			if (!pvmw->pte)


How was this tested? Did you run hmm-tests? Is there a broken user space
that caught the issue?

Balbir Singh




* Re: [PATCH] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd
  2026-05-08 22:48 ` Balbir Singh
@ 2026-05-10  1:20   ` Wei Yang
  0 siblings, 0 replies; 5+ messages in thread
From: Wei Yang @ 2026-05-10  1:20 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Wei Yang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, sj,
	ziy, linux-mm, Lorenzo Stoakes, stable

On Sat, May 09, 2026 at 08:48:37AM +1000, Balbir Singh wrote:
>On 5/8/26 11:37, Wei Yang wrote:
>> For pmd_trans_huge() and pmd_is_migration_entry(), we do the following
>> before returning the pmd entry:
>> 
>>   * re-validate pmd entry
>>   * check PVMW_MIGRATION
>>   * check_pmd()
>>   * handle on pte level if split under us
>> 
>> But for a device-private pmd, we just return after pmd_lock(). This may
>> lead to an improper situation.
>> 
>
>Could you elaborate a more on the improper situation?
>

For example, in remove_migration_pte(), page_vma_mapped_walk() may return true
on a device-private entry even though it is not a migration entry.

>> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration
>> support device-private entries") by following the same pattern as
>> pmd_trans_huge() and pmd_is_migration_entry().
>> 
>> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Cc: David Hildenbrand <david@kernel.org>
>> Cc: Balbir Singh <balbirs@nvidia.com>
>> Cc: SeongJae Park <sj@kernel.org>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> Cc: <stable@vger.kernel.org>
>> ---
>>  mm/page_vma_mapped.c | 34 +++++++++++++++++++++++-----------
>>  1 file changed, 23 insertions(+), 11 deletions(-)
>> 
>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>> index a4d52fdb3056..5d337ea43019 100644
>> --- a/mm/page_vma_mapped.c
>> +++ b/mm/page_vma_mapped.c
>> @@ -269,21 +269,33 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>>  			spin_unlock(pvmw->ptl);
>>  			pvmw->ptl = NULL;
>>  		} else if (!pmd_present(pmde)) {
>> -			const softleaf_t entry = softleaf_from_pmd(pmde);
>> +			softleaf_t entry = softleaf_from_pmd(pmde);
>>  
>>  			if (softleaf_is_device_private(entry)) {
>>  				pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>> -				return true;
>> -			}
>> -
>> -			if ((pvmw->flags & PVMW_SYNC) &&
>> -			    thp_vma_suitable_order(vma, pvmw->address,
>> -						   PMD_ORDER) &&
>> -			    (pvmw->nr_pages >= HPAGE_PMD_NR))
>> -				sync_with_folio_pmd_zap(mm, pvmw->pmd);
>> +				entry = softleaf_from_pmd(*pvmw->pmd);
>> +
>> +				if (softleaf_is_device_private(entry)) {
>
>Do we need to check softleaf_is_device_private() twice, can't we hold the pmd
>lock and check once?
>

We discussed this code in [1], which spotted the difference between the
device-private pmd case and the other two pmd cases, which re-validate
after pmd_lock().

Doing the check after pmd_lock() follows the same pattern as the other two
pmd entry cases. Also, taking the lock is heavy; check & lock & re-validate
seems more friendly to the system. Otherwise we would always need to grab
the lock.

And David suggested using softleaf_is_device_private() again in [2].

>> +					if (pvmw->flags & PVMW_MIGRATION)
>> +						return not_found(pvmw);
>
>Double check, do we want to skip migration pte's (from remove_migration_pte)
>

Do you mean skipping the device-private entry?

remove_migration_pte() looks for a migration entry and tries to replace it.

The semantics above are: if the caller is looking for a migration entry,
return not_found() for a device-private entry, since a device-private entry
is not a migration entry, IIUC.

>> +					if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>> +						return not_found(pvmw);
>> +					return true;
>> +				}
>>  
>> -			step_forward(pvmw, PMD_SIZE);
>> -			continue;
>> +				/* THP pmd was split under us: handle on pte level */
>> +				spin_unlock(pvmw->ptl);
>> +				pvmw->ptl = NULL;
>> +			} else {
>> +				if ((pvmw->flags & PVMW_SYNC) &&
>> +				    thp_vma_suitable_order(vma, pvmw->address,
>> +							   PMD_ORDER) &&
>> +				    (pvmw->nr_pages >= HPAGE_PMD_NR))
>> +					sync_with_folio_pmd_zap(mm, pvmw->pmd);
>> +
>> +				step_forward(pvmw, PMD_SIZE);
>> +				continue;
>> +			}
>>  		}
>>  		if (!map_pte(pvmw, &pmde, &ptl)) {
>>  			if (!pvmw->pte)
>
>
>How was this tested? Did you run hmm-tests? Is there a broken user space
>that caught the issue?

I didn't do any device-private memory related tests.

IIUC, device-private memory is device related, and I don't have such devices.
Or do we have another way to test it? I'd be glad to know if so.

BTW, this fix comes from pure code analysis and discussion. Maybe it would
have been better to include you in the discussion first, so I could get more
background on it.

>
>Balbir Singh

[1]: https://lore.kernel.org/all/c71930ae-19d9-4b3b-a74d-3de3261c4d43@kernel.org/
[2]: https://lore.kernel.org/all/413feed4-6aab-43d9-b7e5-a9386fa79f4b@kernel.org/


-- 
Wei Yang
Help you, Help me



* Re: [PATCH] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd
  2026-05-08 21:51 ` Andrew Morton
@ 2026-05-10  1:22   ` Wei Yang
  0 siblings, 0 replies; 5+ messages in thread
From: Wei Yang @ 2026-05-10  1:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Wei Yang, david, ljs, riel, liam, vbabka, harry, jannh, sj, ziy,
	balbirs, linux-mm, Lorenzo Stoakes, stable

On Fri, May 08, 2026 at 02:51:21PM -0700, Andrew Morton wrote:
>On Fri,  8 May 2026 01:37:28 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> For pmd_trans_huge() and pmd_is_migration_entry(), we do the following
>> before returning the pmd entry:
>> 
>>   * re-validate pmd entry
>>   * check PVMW_MIGRATION
>>   * check_pmd()
>>   * handle on pte level if split under us
>> 
>> But for a device-private pmd, we just return after pmd_lock(). This may
>> lead to an improper situation.
>
>What is "improper situation"?
>

For example, in remove_migration_pte(), page_vma_mapped_walk() may return a
device-private entry which is not a migration entry.

>> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration
>> support device-private entries") by following the same pattern as
>> pmd_trans_huge() and pmd_is_migration_entry().
>> 
>> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Cc: David Hildenbrand <david@kernel.org>
>> Cc: Balbir Singh <balbirs@nvidia.com>
>> Cc: SeongJae Park <sj@kernel.org>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> Cc: <stable@vger.kernel.org>
>
>If we're to propose a fix for -stable backporting I believe we should
>fully explain to -stable maintainers *why* we're making that proposal.
>

IIUC, we may perform migration on the wrong pmd entry, which may corrupt data.

>IOW, and not for the first time(!), what are the worst-case
>userspace-visible effects of this bug?
>

Got it, will pay attention. Sorry for the trouble.

-- 
Wei Yang
Help you, Help me


