* [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
@ 2026-06-24 6:53 Wei Yang
2026-06-24 8:57 ` Lance Yang
` (3 more replies)
0 siblings, 4 replies; 19+ messages in thread
From: Wei Yang @ 2026-06-24 6:53 UTC (permalink / raw)
To: akpm, david, ljs, riel, liam, vbabka, harry, jannh, ziy, sj,
balbirs
Cc: linux-mm, linux-kernel, Wei Yang, stable, Lance Yang
Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
device-private entries") introduced the concept of device-private
PMD entries, but did not correctly update the rmap walk code to
account for them.
As a result, when page_vma_mapped_walk() encounters device-private
PMD entries, it takes no action other than to acquire the PMD lock
and exit.
However this is highly problematic for two reasons - firstly,
device private entries possess a PFN so check_pmd() needs to be
called to ensure an overlapping PFN range.
Secondly, and more importantly, if PVMW_MIGRATION is set the
caller assumes the returned entry is a migration entry, resulting
in memory corruption when the caller tries to interpret the device
private entry as such.
In addition, commit 146287290023 ("mm/huge_memory: implement
device-private THP splitting") allowed device private PMDs to be
split like THP mappings, but again did not update this code path.
As a result, we might race a PMD split prior to acquiring the PMD
lock.
This patch addresses all of these issues by invoking check_pmd(),
ensuring PMVW_MIGRATION is not set and checks whether a split raced
us we do for PMD THP and migration entries.
Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
Cc: <stable@vger.kernel.org>
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Suggested-by: David Hildenbrand <david@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>
---
v4:
* refine subject and commit log based on Lorenzo's suggestion
* put pmd device-private entry handling in its own if branch,
suggested by Lorenzo
v3:
* remove cleanup part, only fix the issue for device-private entry
* refine user effect description based on Lorenzo's suggestion
v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
* specify the possible error case of current code and user visible effect
* besides fix, cleanup the pmd entry handling based on David's suggestion
v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
---
mm/page_vma_mapped.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 2ccbabfb2cc1..17dff8aab9f9 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
/* THP pmd was split under us: handle on pte level */
spin_unlock(pvmw->ptl);
pvmw->ptl = NULL;
- } else if (!pmd_present(pmde)) {
- const softleaf_t entry = softleaf_from_pmd(pmde);
+ } else if (pmd_is_device_private_entry(pmde)) {
+ softleaf_t entry;
+
+ pvmw->ptl = pmd_lock(mm, pvmw->pmd);
+ pmde = *pvmw->pmd;
+ entry = softleaf_from_pmd(pmde);
- if (softleaf_is_device_private(entry)) {
- pvmw->ptl = pmd_lock(mm, pvmw->pmd);
+ if (likely(softleaf_is_device_private(entry))) {
+ if (pvmw->flags & PVMW_MIGRATION)
+ return not_found(pvmw);
+ if (!check_pmd(softleaf_to_pfn(entry), pvmw))
+ return not_found(pvmw);
return true;
}
-
+ /* device-private pmd was split under us: handle on pte level */
+ spin_unlock(pvmw->ptl);
+ pvmw->ptl = NULL;
+ } else if (!pmd_present(pmde)) {
if ((pvmw->flags & PVMW_SYNC) &&
thp_vma_suitable_order(vma, pvmw->address,
PMD_ORDER) &&
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-24 6:53 [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling Wei Yang
@ 2026-06-24 8:57 ` Lance Yang
2026-06-25 9:57 ` Wei Yang
` (2 more replies)
2026-06-25 11:12 ` Balbir Singh
` (2 subsequent siblings)
3 siblings, 3 replies; 19+ messages in thread
From: Lance Yang @ 2026-06-24 8:57 UTC (permalink / raw)
To: richard.weiyang
Cc: akpm, david, ljs, riel, liam, vbabka, harry, jannh, ziy, sj,
balbirs, linux-mm, linux-kernel, stable, lance.yang
On Wed, Jun 24, 2026 at 06:53:53AM +0000, Wei Yang wrote:
>Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
>device-private entries") introduced the concept of device-private
>PMD entries, but did not correctly update the rmap walk code to
>account for them.
>
>As a result, when page_vma_mapped_walk() encounters device-private
>PMD entries, it takes no action other than to acquire the PMD lock
>and exit.
>
>However this is highly problematic for two reasons - firstly,
>device private entries possess a PFN so check_pmd() needs to be
>called to ensure an overlapping PFN range.
>
>Secondly, and more importantly, if PVMW_MIGRATION is set the
>caller assumes the returned entry is a migration entry, resulting
>in memory corruption when the caller tries to interpret the device
>private entry as such.
>
>In addition, commit 146287290023 ("mm/huge_memory: implement
>device-private THP splitting") allowed device private PMDs to be
>split like THP mappings, but again did not update this code path.
>
>As a result, we might race a PMD split prior to acquiring the PMD
>lock.
>
>This patch addresses all of these issues by invoking check_pmd(),
>ensuring PMVW_MIGRATION is not set and checks whether a split raced
>us we do for PMD THP and migration entries.
>
>Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
>Cc: <stable@vger.kernel.org>
>Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>Suggested-by: David Hildenbrand <david@kernel.org>
Shouldn't we add
Suggested-by: Lorenzo Stoakes <ljs@kernel.org>
as well?
v4 mostly follows Lorenzo's comments, code bits included. Feels only fair.
>Cc: David Hildenbrand <david@kernel.org>
>Cc: Balbir Singh <balbirs@nvidia.com>
>Cc: SeongJae Park <sj@kernel.org>
>Cc: Zi Yan <ziy@nvidia.com>
>Cc: Lorenzo Stoakes <ljs@kernel.org>
>Cc: Lance Yang <lance.yang@linux.dev>
>
>---
>v4:
> * refine subject and commit log based on Lorenzo's suggestion
> * put pmd device-private entry handling in its own if branch,
> suggested by Lorenzo
>
>v3:
> * remove cleanup part, only fix the issue for device-private entry
> * refine user effect description based on Lorenzo's suggestion
>
>v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
> * specify the possible error case of current code and user visible effect
> * besides fix, cleanup the pmd entry handling based on David's suggestion
>
>v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
>---
> mm/page_vma_mapped.c | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>index 2ccbabfb2cc1..17dff8aab9f9 100644
>--- a/mm/page_vma_mapped.c
>+++ b/mm/page_vma_mapped.c
>@@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
Hmm ... looks like there may still be a race here ...
Current code picks the branch from the lockless PMD value:
pmde = pmdp_get_lockless(pvmw->pmd);
if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
pmde = *pvmw->pmd;
if (!pmd_present(pmde)) {
softleaf_t entry;
if (!thp_migration_supported() ||
!(pvmw->flags & PVMW_MIGRATION))
return not_found(pvmw);
entry = softleaf_from_pmd(pmde);
if (!softleaf_is_migration(entry) ||
!check_pmd(softleaf_to_pfn(entry), pvmw))
return not_found(pvmw);
return true;
}
}
But after taking PTL, the PMD may already be a different non-present PMD
type:
CPU0: pmde = pmdp_get_lockless(); // sees PMD migration entry
CPU1: remove_migration_ptes(src, dst /* device-private */)
... via rmap_walk(dst) ...
page_vma_mapped_walk(&pvmw /* src, PVMW_MIGRATION */)
returns with PTL held for the PMD migration entry
remove_migration_pmd(new = dst page)
installs a device-private PMD
next page_vma_mapped_walk()
drops PTL via not_found()
CPU0: takes PTL
pmde = *pvmw->pmd; // now device-private PMD
So when PVMW_MIGRATION is not set, current code can return not_found()
before we even decode the locked PMD as a device-private entry.
Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
device-private entries") made the
device-private PMD <-> PMD migration
transition possible.
set_pmd_migration_entry() can replace a device-private PMD with a PMD
migration entry, and remove_migration_pmd() can restore a PMD migration
entry back to a device-private PMD when the new folio is device-private.
Maybe decode the locked softleaf entry first, before the migration-only
checks? Something like this on top:
---8<---
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 17dff8aab9f9..97babd408dba 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -249,10 +249,18 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
if (!pmd_present(pmde)) {
softleaf_t entry;
+ entry = softleaf_from_pmd(pmde);
+ if (softleaf_is_device_private(entry)) {
+ if (pvmw->flags & PVMW_MIGRATION)
+ return not_found(pvmw);
+ if (!check_pmd(softleaf_to_pfn(entry), pvmw))
+ return not_found(pvmw);
+ return true;
+ }
+
if (!thp_migration_supported() ||
!(pvmw->flags & PVMW_MIGRATION))
return not_found(pvmw);
- entry = softleaf_from_pmd(pmde);
if (!softleaf_is_migration(entry) ||
!check_pmd(softleaf_to_pfn(entry), pvmw))
@@ -266,7 +274,10 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
return not_found(pvmw);
return true;
}
- /* THP pmd was split under us: handle on pte level */
+ /*
+ * THP pmd was split under us, or device-private PMD
+ * changed under us: handle on pte level.
+ */
spin_unlock(pvmw->ptl);
pvmw->ptl = NULL;
} else if (pmd_is_device_private_entry(pmde)) {
--
Anyway, that stuff is getting kinda messy now. Feels like it really needs
a cleanup on top before it bites us again :)
Cheers, Lance
> /* THP pmd was split under us: handle on pte level */
> spin_unlock(pvmw->ptl);
> pvmw->ptl = NULL;
>- } else if (!pmd_present(pmde)) {
>- const softleaf_t entry = softleaf_from_pmd(pmde);
>+ } else if (pmd_is_device_private_entry(pmde)) {
>+ softleaf_t entry;
>+
>+ pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>+ pmde = *pvmw->pmd;
>+ entry = softleaf_from_pmd(pmde);
>
>- if (softleaf_is_device_private(entry)) {
>- pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>+ if (likely(softleaf_is_device_private(entry))) {
>+ if (pvmw->flags & PVMW_MIGRATION)
>+ return not_found(pvmw);
>+ if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>+ return not_found(pvmw);
> return true;
> }
>-
>+ /* device-private pmd was split under us: handle on pte level */
>+ spin_unlock(pvmw->ptl);
>+ pvmw->ptl = NULL;
>+ } else if (!pmd_present(pmde)) {
> if ((pvmw->flags & PVMW_SYNC) &&
> thp_vma_suitable_order(vma, pvmw->address,
> PMD_ORDER) &&
>--
>2.34.1
>
>
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-24 8:57 ` Lance Yang
@ 2026-06-25 9:57 ` Wei Yang
2026-06-25 10:37 ` David Hildenbrand (Arm)
2026-06-25 11:42 ` Lance Yang
2026-06-25 13:12 ` Lorenzo Stoakes
2 siblings, 1 reply; 19+ messages in thread
From: Wei Yang @ 2026-06-25 9:57 UTC (permalink / raw)
To: Lance Yang
Cc: richard.weiyang, akpm, david, ljs, riel, liam, vbabka, harry,
jannh, ziy, sj, balbirs, linux-mm, linux-kernel, stable
On Wed, Jun 24, 2026 at 04:57:56PM +0800, Lance Yang wrote:
>
>On Wed, Jun 24, 2026 at 06:53:53AM +0000, Wei Yang wrote:
>>Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
>>device-private entries") introduced the concept of device-private
>>PMD entries, but did not correctly update the rmap walk code to
>>account for them.
>>
>>As a result, when page_vma_mapped_walk() encounters device-private
>>PMD entries, it takes no action other than to acquire the PMD lock
>>and exit.
>>
>>However this is highly problematic for two reasons - firstly,
>>device private entries possess a PFN so check_pmd() needs to be
>>called to ensure an overlapping PFN range.
>>
>>Secondly, and more importantly, if PVMW_MIGRATION is set the
>>caller assumes the returned entry is a migration entry, resulting
>>in memory corruption when the caller tries to interpret the device
>>private entry as such.
>>
>>In addition, commit 146287290023 ("mm/huge_memory: implement
>>device-private THP splitting") allowed device private PMDs to be
>>split like THP mappings, but again did not update this code path.
>>
>>As a result, we might race a PMD split prior to acquiring the PMD
>>lock.
>>
>>This patch addresses all of these issues by invoking check_pmd(),
>>ensuring PMVW_MIGRATION is not set and checks whether a split raced
>>us we do for PMD THP and migration entries.
>>
>>Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
>>Cc: <stable@vger.kernel.org>
>>Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>Suggested-by: David Hildenbrand <david@kernel.org>
>
>Shouldn't we add
>
>Suggested-by: Lorenzo Stoakes <ljs@kernel.org>
>
>as well?
>
>v4 mostly follows Lorenzo's comments, code bits included. Feels only fair.
Fair enough, added.
>
>>Cc: David Hildenbrand <david@kernel.org>
>>Cc: Balbir Singh <balbirs@nvidia.com>
>>Cc: SeongJae Park <sj@kernel.org>
>>Cc: Zi Yan <ziy@nvidia.com>
>>Cc: Lorenzo Stoakes <ljs@kernel.org>
>>Cc: Lance Yang <lance.yang@linux.dev>
>>
>>---
>>v4:
>> * refine subject and commit log based on Lorenzo's suggestion
>> * put pmd device-private entry handling in its own if branch,
>> suggested by Lorenzo
>>
>>v3:
>> * remove cleanup part, only fix the issue for device-private entry
>> * refine user effect description based on Lorenzo's suggestion
>>
>>v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
>> * specify the possible error case of current code and user visible effect
>> * besides fix, cleanup the pmd entry handling based on David's suggestion
>>
>>v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
>>---
>> mm/page_vma_mapped.c | 20 +++++++++++++++-----
>> 1 file changed, 15 insertions(+), 5 deletions(-)
>>
>>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>>index 2ccbabfb2cc1..17dff8aab9f9 100644
>>--- a/mm/page_vma_mapped.c
>>+++ b/mm/page_vma_mapped.c
>>@@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>
>
>Hmm ... looks like there may still be a race here ...
>
>Current code picks the branch from the lockless PMD value:
>
> pmde = pmdp_get_lockless(pvmw->pmd);
>
> if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) {
> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> pmde = *pvmw->pmd;
> if (!pmd_present(pmde)) {
> softleaf_t entry;
>
> if (!thp_migration_supported() ||
> !(pvmw->flags & PVMW_MIGRATION))
> return not_found(pvmw);
> entry = softleaf_from_pmd(pmde);
>
> if (!softleaf_is_migration(entry) ||
> !check_pmd(softleaf_to_pfn(entry), pvmw))
> return not_found(pvmw);
> return true;
> }
> }
>
>But after taking PTL, the PMD may already be a different non-present PMD
>type:
>
>CPU0: pmde = pmdp_get_lockless(); // sees PMD migration entry
>
>CPU1: remove_migration_ptes(src, dst /* device-private */)
> ... via rmap_walk(dst) ...
> page_vma_mapped_walk(&pvmw /* src, PVMW_MIGRATION */)
> returns with PTL held for the PMD migration entry
> remove_migration_pmd(new = dst page)
> installs a device-private PMD
> next page_vma_mapped_walk()
> drops PTL via not_found()
>
>CPU0: takes PTL
> pmde = *pvmw->pmd; // now device-private PMD
>
>So when PVMW_MIGRATION is not set, current code can return not_found()
>before we even decode the locked PMD as a device-private entry.
>
>Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
>device-private entries") made the
>
>device-private PMD <-> PMD migration
>
>transition possible.
>
>set_pmd_migration_entry() can replace a device-private PMD with a PMD
>migration entry, and remove_migration_pmd() can restore a PMD migration
>entry back to a device-private PMD when the new folio is device-private.
>
Nice catch.
But I think this matters if migration fail and restore the pmd to src folio.
When we successfully migrate to new folio, check_pmd() could catch it and
return not_found(). IIUC.
One more question: assume A unmap a folio, and B migrate the same one.
If B set_pmd_migration_entry() first, then A won't see this PMD from
page_vma_mapped_walk(), IIUC. Then B failed to migrate, and restore the folio
as this PMD migration entry is there. So A should check the status after
unmap, right? Would it see unstable status?
I am a little lost what is the correct way to do here.
>Maybe decode the locked softleaf entry first, before the migration-only
>checks? Something like this on top:
>
>---8<---
>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>index 17dff8aab9f9..97babd408dba 100644
>--- a/mm/page_vma_mapped.c
>+++ b/mm/page_vma_mapped.c
>@@ -249,10 +249,18 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> if (!pmd_present(pmde)) {
> softleaf_t entry;
>
>+ entry = softleaf_from_pmd(pmde);
>+ if (softleaf_is_device_private(entry)) {
>+ if (pvmw->flags & PVMW_MIGRATION)
>+ return not_found(pvmw);
>+ if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>+ return not_found(pvmw);
>+ return true;
>+ }
>+
If we have to do this, I am afraid we can put all three cases handling
here...
Not necessary to put pmd_is_device_private_entry() handling in two places.
> if (!thp_migration_supported() ||
> !(pvmw->flags & PVMW_MIGRATION))
> return not_found(pvmw);
>- entry = softleaf_from_pmd(pmde);
>
> if (!softleaf_is_migration(entry) ||
> !check_pmd(softleaf_to_pfn(entry), pvmw))
>@@ -266,7 +274,10 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> return not_found(pvmw);
> return true;
> }
>- /* THP pmd was split under us: handle on pte level */
>+ /*
>+ * THP pmd was split under us, or device-private PMD
>+ * changed under us: handle on pte level.
>+ */
> spin_unlock(pvmw->ptl);
> pvmw->ptl = NULL;
> } else if (pmd_is_device_private_entry(pmde)) {
>--
>
>Anyway, that stuff is getting kinda messy now. Feels like it really needs
>a cleanup on top before it bites us again :)
Agree.
I haven't imagined this would be more complicated than I thought :-)
>Cheers, Lance
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-25 9:57 ` Wei Yang
@ 2026-06-25 10:37 ` David Hildenbrand (Arm)
2026-06-25 11:25 ` Lance Yang
0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-25 10:37 UTC (permalink / raw)
To: Wei Yang, Lance Yang
Cc: akpm, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, balbirs,
linux-mm, linux-kernel, stable
>> CPU0: pmde = pmdp_get_lockless(); // sees PMD migration entry
>>
>> CPU1: remove_migration_ptes(src, dst /* device-private */)
>> ... via rmap_walk(dst) ...
>> page_vma_mapped_walk(&pvmw /* src, PVMW_MIGRATION */)
>> returns with PTL held for the PMD migration entry
>> remove_migration_pmd(new = dst page)
>> installs a device-private PMD
>> next page_vma_mapped_walk()
>> drops PTL via not_found()
>>
>> CPU0: takes PTL
>> pmde = *pvmw->pmd; // now device-private PMD
>>
>> So when PVMW_MIGRATION is not set, current code can return not_found()
>> before we even decode the locked PMD as a device-private entry.
>>
>> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
>> device-private entries") made the
>>
>> device-private PMD <-> PMD migration
>>
>> transition possible.
Doesn't the folio lock help here already?
--
Cheers,
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-24 6:53 [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling Wei Yang
2026-06-24 8:57 ` Lance Yang
@ 2026-06-25 11:12 ` Balbir Singh
2026-06-26 0:44 ` Wei Yang
2026-06-25 19:39 ` Zi Yan
2026-06-26 10:07 ` David Hildenbrand (Arm)
3 siblings, 1 reply; 19+ messages in thread
From: Balbir Singh @ 2026-06-25 11:12 UTC (permalink / raw)
To: Wei Yang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, ziy,
sj
Cc: linux-mm, linux-kernel, stable, Lance Yang
On 6/24/26 16:53, Wei Yang wrote:
> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") introduced the concept of device-private
> PMD entries, but did not correctly update the rmap walk code to
> account for them.
>
> As a result, when page_vma_mapped_walk() encounters device-private
> PMD entries, it takes no action other than to acquire the PMD lock
> and exit.
>
> However this is highly problematic for two reasons - firstly,
> device private entries possess a PFN so check_pmd() needs to be
> called to ensure an overlapping PFN range.
>
> Secondly, and more importantly, if PVMW_MIGRATION is set the
> caller assumes the returned entry is a migration entry, resulting
> in memory corruption when the caller tries to interpret the device
> private entry as such.
>
> In addition, commit 146287290023 ("mm/huge_memory: implement
> device-private THP splitting") allowed device private PMDs to be
> split like THP mappings, but again did not update this code path.
>
> As a result, we might race a PMD split prior to acquiring the PMD
> lock.
>
> This patch addresses all of these issues by invoking check_pmd(),
> ensuring PMVW_MIGRATION is not set and checks whether a split raced
> us we do for PMD THP and migration entries.
Should be PVMW_MIGRATION and "us we do" -> "as we do"
>
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Suggested-by: David Hildenbrand <david@kernel.org>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Balbir Singh <balbirs@nvidia.com>
> Cc: SeongJae Park <sj@kernel.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Lorenzo Stoakes <ljs@kernel.org>
> Cc: Lance Yang <lance.yang@linux.dev>
>
> ---
> v4:
> * refine subject and commit log based on Lorenzo's suggestion
> * put pmd device-private entry handling in its own if branch,
> suggested by Lorenzo
>
> v3:
> * remove cleanup part, only fix the issue for device-private entry
> * refine user effect description based on Lorenzo's suggestion
>
> v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
> * specify the possible error case of current code and user visible effect
> * besides fix, cleanup the pmd entry handling based on David's suggestion
>
> v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
> ---
> mm/page_vma_mapped.c | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 2ccbabfb2cc1..17dff8aab9f9 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> /* THP pmd was split under us: handle on pte level */
> spin_unlock(pvmw->ptl);
> pvmw->ptl = NULL;
> - } else if (!pmd_present(pmde)) {
> - const softleaf_t entry = softleaf_from_pmd(pmde);
> + } else if (pmd_is_device_private_entry(pmde)) {
> + softleaf_t entry;
> +
> + pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> + pmde = *pvmw->pmd;
> + entry = softleaf_from_pmd(pmde);
>
> - if (softleaf_is_device_private(entry)) {
> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> + if (likely(softleaf_is_device_private(entry))) {
> + if (pvmw->flags & PVMW_MIGRATION)
> + return not_found(pvmw);
> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
> + return not_found(pvmw);
> return true;
> }
> -
> + /* device-private pmd was split under us: handle on pte level */
> + spin_unlock(pvmw->ptl);
> + pvmw->ptl = NULL;
> + } else if (!pmd_present(pmde)) {
> if ((pvmw->flags & PVMW_SYNC) &&
> thp_vma_suitable_order(vma, pvmw->address,
> PMD_ORDER) &&
I looked at comments from Lance on "device-private PMD <-> PMD migration" and had
the same comment as David
Balbir
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-25 10:37 ` David Hildenbrand (Arm)
@ 2026-06-25 11:25 ` Lance Yang
0 siblings, 0 replies; 19+ messages in thread
From: Lance Yang @ 2026-06-25 11:25 UTC (permalink / raw)
To: David Hildenbrand (Arm), Wei Yang, balbirs
Cc: akpm, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, linux-mm,
linux-kernel, stable
On 2026/6/25 18:37, David Hildenbrand (Arm) wrote:
>
>>> CPU0: pmde = pmdp_get_lockless(); // sees PMD migration entry
>>>
>>> CPU1: remove_migration_ptes(src, dst /* device-private */)
>>> ... via rmap_walk(dst) ...
>>> page_vma_mapped_walk(&pvmw /* src, PVMW_MIGRATION */)
>>> returns with PTL held for the PMD migration entry
>>> remove_migration_pmd(new = dst page)
>>> installs a device-private PMD
>>> next page_vma_mapped_walk()
>>> drops PTL via not_found()
>>>
>>> CPU0: takes PTL
>>> pmde = *pvmw->pmd; // now device-private PMD
>>>
>>> So when PVMW_MIGRATION is not set, current code can return not_found()
>>> before we even decode the locked PMD as a device-private entry.
>>>
>>> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
>>> device-private entries") made the
>>>
>>> device-private PMD <-> PMD migration
>>>
>>> transition possible.
>
> Doesn't the folio lock help here already?
Ah, yeah, I was too focused on the PTL and missed the folio lock ...
Don't have a caller like that :) Went over the fix again, nothing
else jumped out.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-24 8:57 ` Lance Yang
2026-06-25 9:57 ` Wei Yang
@ 2026-06-25 11:42 ` Lance Yang
2026-06-25 21:07 ` Andrew Morton
2026-06-25 13:12 ` Lorenzo Stoakes
2 siblings, 1 reply; 19+ messages in thread
From: Lance Yang @ 2026-06-25 11:42 UTC (permalink / raw)
To: richard.weiyang, david, balbirs
Cc: akpm, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, linux-mm,
linux-kernel, stable, Lance Yang
On Wed, Jun 24, 2026 at 04:57:56PM +0800, Lance Yang wrote:
>
[...]
>>
>>Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
>>Cc: <stable@vger.kernel.org>
>>Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>Suggested-by: David Hildenbrand <david@kernel.org>
>
>Shouldn't we add
>
>Suggested-by: Lorenzo Stoakes <ljs@kernel.org>
>
>as well?
No need to resend. I think Andrew can add this when applying :)
>v4 mostly follows Lorenzo's comments, code bits included. Feels only fair.
>
>>Cc: David Hildenbrand <david@kernel.org>
>>Cc: Balbir Singh <balbirs@nvidia.com>
>>Cc: SeongJae Park <sj@kernel.org>
>>Cc: Zi Yan <ziy@nvidia.com>
>>Cc: Lorenzo Stoakes <ljs@kernel.org>
>>Cc: Lance Yang <lance.yang@linux.dev>
>>
>>---
>>v4:
>> * refine subject and commit log based on Lorenzo's suggestion
>> * put pmd device-private entry handling in its own if branch,
>> suggested by Lorenzo
>>
>>v3:
>> * remove cleanup part, only fix the issue for device-private entry
>> * refine user effect description based on Lorenzo's suggestion
>>
>>v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
>> * specify the possible error case of current code and user visible effect
>> * besides fix, cleanup the pmd entry handling based on David's suggestion
>>
>>v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
>>---
>> mm/page_vma_mapped.c | 20 +++++++++++++++-----
>> 1 file changed, 15 insertions(+), 5 deletions(-)
>>
>>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>>index 2ccbabfb2cc1..17dff8aab9f9 100644
>>--- a/mm/page_vma_mapped.c
>>+++ b/mm/page_vma_mapped.c
>>@@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>
Never mind my race comment below. Obviously missed folio lock there. My
bad. Don't have a caller like that. Nothing else jumped out, so:
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cheers, Lance
>
>Hmm ... looks like there may still be a race here ...
>
>Current code picks the branch from the lockless PMD value:
>
> pmde = pmdp_get_lockless(pvmw->pmd);
>
> if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) {
> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> pmde = *pvmw->pmd;
> if (!pmd_present(pmde)) {
> softleaf_t entry;
>
> if (!thp_migration_supported() ||
> !(pvmw->flags & PVMW_MIGRATION))
> return not_found(pvmw);
> entry = softleaf_from_pmd(pmde);
>
> if (!softleaf_is_migration(entry) ||
> !check_pmd(softleaf_to_pfn(entry), pvmw))
> return not_found(pvmw);
> return true;
> }
> }
>
>But after taking PTL, the PMD may already be a different non-present PMD
>type:
>
>CPU0: pmde = pmdp_get_lockless(); // sees PMD migration entry
>
>CPU1: remove_migration_ptes(src, dst /* device-private */)
> ... via rmap_walk(dst) ...
> page_vma_mapped_walk(&pvmw /* src, PVMW_MIGRATION */)
> returns with PTL held for the PMD migration entry
> remove_migration_pmd(new = dst page)
> installs a device-private PMD
> next page_vma_mapped_walk()
> drops PTL via not_found()
>
>CPU0: takes PTL
> pmde = *pvmw->pmd; // now device-private PMD
>
>So when PVMW_MIGRATION is not set, current code can return not_found()
>before we even decode the locked PMD as a device-private entry.
>
>Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
>device-private entries") made the
>
>device-private PMD <-> PMD migration
>
>transition possible.
>
>set_pmd_migration_entry() can replace a device-private PMD with a PMD
>migration entry, and remove_migration_pmd() can restore a PMD migration
>entry back to a device-private PMD when the new folio is device-private.
>
>Maybe decode the locked softleaf entry first, before the migration-only
>checks? Something like this on top:
>
>---8<---
>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>index 17dff8aab9f9..97babd408dba 100644
>--- a/mm/page_vma_mapped.c
>+++ b/mm/page_vma_mapped.c
>@@ -249,10 +249,18 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> if (!pmd_present(pmde)) {
> softleaf_t entry;
>
>+ entry = softleaf_from_pmd(pmde);
>+ if (softleaf_is_device_private(entry)) {
>+ if (pvmw->flags & PVMW_MIGRATION)
>+ return not_found(pvmw);
>+ if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>+ return not_found(pvmw);
>+ return true;
>+ }
>+
> if (!thp_migration_supported() ||
> !(pvmw->flags & PVMW_MIGRATION))
> return not_found(pvmw);
>- entry = softleaf_from_pmd(pmde);
>
> if (!softleaf_is_migration(entry) ||
> !check_pmd(softleaf_to_pfn(entry), pvmw))
>@@ -266,7 +274,10 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> return not_found(pvmw);
> return true;
> }
>- /* THP pmd was split under us: handle on pte level */
>+ /*
>+ * THP pmd was split under us, or device-private PMD
>+ * changed under us: handle on pte level.
>+ */
> spin_unlock(pvmw->ptl);
> pvmw->ptl = NULL;
> } else if (pmd_is_device_private_entry(pmde)) {
>--
>
>Anyway, that stuff is getting kinda messy now. Feels like it really needs
>a cleanup on top before it bites us again :)
>
>Cheers, Lance
>
>> /* THP pmd was split under us: handle on pte level */
>> spin_unlock(pvmw->ptl);
>> pvmw->ptl = NULL;
>>- } else if (!pmd_present(pmde)) {
>>- const softleaf_t entry = softleaf_from_pmd(pmde);
>>+ } else if (pmd_is_device_private_entry(pmde)) {
>>+ softleaf_t entry;
>>+
>>+ pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>>+ pmde = *pvmw->pmd;
>>+ entry = softleaf_from_pmd(pmde);
>>
>>- if (softleaf_is_device_private(entry)) {
>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>>+ if (likely(softleaf_is_device_private(entry))) {
>>+ if (pvmw->flags & PVMW_MIGRATION)
>>+ return not_found(pvmw);
>>+ if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>>+ return not_found(pvmw);
>> return true;
>> }
>>-
>>+ /* device-private pmd was split under us: handle on pte level */
>>+ spin_unlock(pvmw->ptl);
>>+ pvmw->ptl = NULL;
>>+ } else if (!pmd_present(pmde)) {
>> if ((pvmw->flags & PVMW_SYNC) &&
>> thp_vma_suitable_order(vma, pvmw->address,
>> PMD_ORDER) &&
>>--
>>2.34.1
>>
>>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-24 8:57 ` Lance Yang
2026-06-25 9:57 ` Wei Yang
2026-06-25 11:42 ` Lance Yang
@ 2026-06-25 13:12 ` Lorenzo Stoakes
2 siblings, 0 replies; 19+ messages in thread
From: Lorenzo Stoakes @ 2026-06-25 13:12 UTC (permalink / raw)
To: Lance Yang
Cc: richard.weiyang, akpm, david, riel, liam, vbabka, harry, jannh,
ziy, sj, balbirs, linux-mm, linux-kernel, stable
On Wed, Jun 24, 2026 at 04:57:56PM +0800, Lance Yang wrote:
>
> On Wed, Jun 24, 2026 at 06:53:53AM +0000, Wei Yang wrote:
> >Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> >device-private entries") introduced the concept of device-private
> >PMD entries, but did not correctly update the rmap walk code to
> >account for them.
> >
> >As a result, when page_vma_mapped_walk() encounters device-private
> >PMD entries, it takes no action other than to acquire the PMD lock
> >and exit.
> >
> >However this is highly problematic for two reasons - firstly,
> >device private entries possess a PFN so check_pmd() needs to be
> >called to ensure an overlapping PFN range.
> >
> >Secondly, and more importantly, if PVMW_MIGRATION is set the
> >caller assumes the returned entry is a migration entry, resulting
> >in memory corruption when the caller tries to interpret the device
> >private entry as such.
> >
> >In addition, commit 146287290023 ("mm/huge_memory: implement
> >device-private THP splitting") allowed device private PMDs to be
> >split like THP mappings, but again did not update this code path.
> >
> >As a result, we might race a PMD split prior to acquiring the PMD
> >lock.
> >
> >This patch addresses all of these issues by invoking check_pmd(),
> >ensuring PMVW_MIGRATION is not set and checks whether a split raced
> >us we do for PMD THP and migration entries.
> >
> >Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> >Cc: <stable@vger.kernel.org>
> >Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> >Suggested-by: David Hildenbrand <david@kernel.org>
>
> Shouldn't we add
>
> Suggested-by: Lorenzo Stoakes <ljs@kernel.org>
>
> as well?
>
> v4 mostly follows Lorenzo's comments, code bits included. Feels only fair.
Thanks Lance :)
I'm kinda indifferent about it really, I'm really keen to ensure people sending
patches get the credit for their work, so if I send a patch in reply as a
shorthand for 'I think this might work better', I don't expect/require any
credit at all, it's just sometimes a quicker way of responding!
But if Wei wants to add a S-b that's fine by me also! :)
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-24 6:53 [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling Wei Yang
2026-06-24 8:57 ` Lance Yang
2026-06-25 11:12 ` Balbir Singh
@ 2026-06-25 19:39 ` Zi Yan
2026-06-26 10:07 ` David Hildenbrand (Arm)
3 siblings, 0 replies; 19+ messages in thread
From: Zi Yan @ 2026-06-25 19:39 UTC (permalink / raw)
To: Wei Yang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, sj,
balbirs
Cc: linux-mm, linux-kernel, stable, Lance Yang
On Wed Jun 24, 2026 at 2:53 AM EDT, Wei Yang wrote:
> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") introduced the concept of device-private
> PMD entries, but did not correctly update the rmap walk code to
> account for them.
>
> As a result, when page_vma_mapped_walk() encounters device-private
> PMD entries, it takes no action other than to acquire the PMD lock
> and exit.
>
> However this is highly problematic for two reasons - firstly,
> device private entries possess a PFN so check_pmd() needs to be
> called to ensure an overlapping PFN range.
>
> Secondly, and more importantly, if PVMW_MIGRATION is set the
> caller assumes the returned entry is a migration entry, resulting
> in memory corruption when the caller tries to interpret the device
> private entry as such.
>
> In addition, commit 146287290023 ("mm/huge_memory: implement
> device-private THP splitting") allowed device private PMDs to be
> split like THP mappings, but again did not update this code path.
>
> As a result, we might race a PMD split prior to acquiring the PMD
> lock.
>
> This patch addresses all of these issues by invoking check_pmd(),
> ensuring PMVW_MIGRATION is not set and checks whether a split raced
> us we do for PMD THP and migration entries.
>
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Suggested-by: David Hildenbrand <david@kernel.org>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Balbir Singh <balbirs@nvidia.com>
> Cc: SeongJae Park <sj@kernel.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Lorenzo Stoakes <ljs@kernel.org>
> Cc: Lance Yang <lance.yang@linux.dev>
>
> ---
> v4:
> * refine subject and commit log based on Lorenzo's suggestion
> * put pmd device-private entry handling in its own if branch,
> suggested by Lorenzo
>
> v3:
> * remove cleanup part, only fix the issue for device-private entry
> * refine user effect description based on Lorenzo's suggestion
>
> v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
> * specify the possible error case of current code and user visible effect
> * besides fix, cleanup the pmd entry handling based on David's suggestion
>
> v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
> ---
> mm/page_vma_mapped.c | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
LGTM. The discussion from the patch history is very valuable. Thanks.
Acked-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-25 11:42 ` Lance Yang
@ 2026-06-25 21:07 ` Andrew Morton
0 siblings, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2026-06-25 21:07 UTC (permalink / raw)
To: Lance Yang
Cc: richard.weiyang, david, balbirs, ljs, riel, liam, vbabka, harry,
jannh, ziy, sj, linux-mm, linux-kernel, stable
On Thu, 25 Jun 2026 19:42:35 +0800 Lance Yang <lance.yang@linux.dev> wrote:
> >Suggested-by: Lorenzo Stoakes <ljs@kernel.org>
> >
> >as well?
>
> No need to resend. I think Andrew can add this when applying :)
I managed.
Thanks, I'll queue this as a hotfix. Somewhat tentatively - it's
unclear whether we'll be seeing a v2?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-25 11:12 ` Balbir Singh
@ 2026-06-26 0:44 ` Wei Yang
2026-06-26 0:58 ` Andrew Morton
0 siblings, 1 reply; 19+ messages in thread
From: Wei Yang @ 2026-06-26 0:44 UTC (permalink / raw)
To: Balbir Singh
Cc: Wei Yang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, ziy,
sj, linux-mm, linux-kernel, stable, Lance Yang
On Thu, Jun 25, 2026 at 09:12:23PM +1000, Balbir Singh wrote:
>On 6/24/26 16:53, Wei Yang wrote:
>> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
>> device-private entries") introduced the concept of device-private
>> PMD entries, but did not correctly update the rmap walk code to
>> account for them.
>>
>> As a result, when page_vma_mapped_walk() encounters device-private
>> PMD entries, it takes no action other than to acquire the PMD lock
>> and exit.
>>
>> However this is highly problematic for two reasons - firstly,
>> device private entries possess a PFN so check_pmd() needs to be
>> called to ensure an overlapping PFN range.
>>
>> Secondly, and more importantly, if PVMW_MIGRATION is set the
>> caller assumes the returned entry is a migration entry, resulting
>> in memory corruption when the caller tries to interpret the device
>> private entry as such.
>>
>> In addition, commit 146287290023 ("mm/huge_memory: implement
>> device-private THP splitting") allowed device private PMDs to be
>> split like THP mappings, but again did not update this code path.
>>
>> As a result, we might race a PMD split prior to acquiring the PMD
>> lock.
>>
>> This patch addresses all of these issues by invoking check_pmd(),
>> ensuring PMVW_MIGRATION is not set and checks whether a split raced
>> us we do for PMD THP and migration entries.
>
>Should be PVMW_MIGRATION and "us we do" -> "as we do"
>
Hi, Balbir
Sorry for missing your comment.
Hmm... looks you are right.
Andrew,
Would you mind handling it or prefer a v2?
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-26 0:44 ` Wei Yang
@ 2026-06-26 0:58 ` Andrew Morton
0 siblings, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2026-06-26 0:58 UTC (permalink / raw)
To: Wei Yang
Cc: Balbir Singh, david, ljs, riel, liam, vbabka, harry, jannh, ziy,
sj, linux-mm, linux-kernel, stable, Lance Yang
On Fri, 26 Jun 2026 00:44:16 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:
> >Should be PVMW_MIGRATION and "us we do" -> "as we do"
> >
>
> Hi, Balbir
>
> Sorry for missing your comment.
>
> Hmm... looks you are right.
>
> Andrew,
>
> Would you mind handling it or prefer a v2?
I have made those edits.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-24 6:53 [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling Wei Yang
` (2 preceding siblings ...)
2026-06-25 19:39 ` Zi Yan
@ 2026-06-26 10:07 ` David Hildenbrand (Arm)
2026-06-26 10:42 ` Lorenzo Stoakes
2026-06-26 13:27 ` Lance Yang
3 siblings, 2 replies; 19+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-26 10:07 UTC (permalink / raw)
To: Wei Yang, akpm, ljs, riel, liam, vbabka, harry, jannh, ziy, sj,
balbirs
Cc: linux-mm, linux-kernel, stable, Lance Yang
On 6/24/26 08:53, Wei Yang wrote:
> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") introduced the concept of device-private
> PMD entries, but did not correctly update the rmap walk code to
> account for them.
>
> As a result, when page_vma_mapped_walk() encounters device-private
> PMD entries, it takes no action other than to acquire the PMD lock
> and exit.
>
> However this is highly problematic for two reasons - firstly,
> device private entries possess a PFN so check_pmd() needs to be
> called to ensure an overlapping PFN range.
>
> Secondly, and more importantly, if PVMW_MIGRATION is set the
> caller assumes the returned entry is a migration entry, resulting
> in memory corruption when the caller tries to interpret the device
> private entry as such.
>
> In addition, commit 146287290023 ("mm/huge_memory: implement
> device-private THP splitting") allowed device private PMDs to be
> split like THP mappings, but again did not update this code path.
>
> As a result, we might race a PMD split prior to acquiring the PMD
> lock.
>
> This patch addresses all of these issues by invoking check_pmd(),
> ensuring PMVW_MIGRATION is not set and checks whether a split raced
> us we do for PMD THP and migration entries.
>
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Suggested-by: David Hildenbrand <david@kernel.org>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Balbir Singh <balbirs@nvidia.com>
> Cc: SeongJae Park <sj@kernel.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Lorenzo Stoakes <ljs@kernel.org>
> Cc: Lance Yang <lance.yang@linux.dev>
>
> ---
> v4:
> * refine subject and commit log based on Lorenzo's suggestion
> * put pmd device-private entry handling in its own if branch,
> suggested by Lorenzo
>
> v3:
> * remove cleanup part, only fix the issue for device-private entry
> * refine user effect description based on Lorenzo's suggestion
>
> v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
> * specify the possible error case of current code and user visible effect
> * besides fix, cleanup the pmd entry handling based on David's suggestion
>
> v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
> ---
> mm/page_vma_mapped.c | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 2ccbabfb2cc1..17dff8aab9f9 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> /* THP pmd was split under us: handle on pte level */
> spin_unlock(pvmw->ptl);
> pvmw->ptl = NULL;
> - } else if (!pmd_present(pmde)) {
> - const softleaf_t entry = softleaf_from_pmd(pmde);
> + } else if (pmd_is_device_private_entry(pmde)) {
> + softleaf_t entry;
> +
> + pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> + pmde = *pvmw->pmd;
> + entry = softleaf_from_pmd(pmde);
>
> - if (softleaf_is_device_private(entry)) {
> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> + if (likely(softleaf_is_device_private(entry))) {
> + if (pvmw->flags & PVMW_MIGRATION)
> + return not_found(pvmw);
> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
> + return not_found(pvmw);
> return true;
> }
> -
> + /* device-private pmd was split under us: handle on pte level */
> + spin_unlock(pvmw->ptl);
> + pvmw->ptl = NULL;
> + } else if (!pmd_present(pmde)) {
> if ((pvmw->flags & PVMW_SYNC) &&
> thp_vma_suitable_order(vma, pvmw->address,
> PMD_ORDER) &&
This is extremely hard to review given the existing crap handling here. I'm
really sorry, but it makes my head hurt (I'm not kidding :) ).
It's completely unclear why we only have to check for a subset of the cases
after taking the lock.
Could we simply extend the existing migration pmd handling and leave the
!pmd_present() case for pmd_none()?
That leaves no question to "which transitions are actually allowed", including
"could we accidentally assume something is a page table when really it isn't".
So what about something like the following?
The "thp_migration_supported()" is not required when checking for
pmd_is_migration_entry(), as that defaults to "false" when not compiled in.
Untested:
From 048ecd33673ec649e168fbbb97749a7c0e344fcd Mon Sep 17 00:00:00 2001
From: "David Hildenbrand (Arm)" <david@kernel.org>
Date: Fri, 26 Jun 2026 12:03:40 +0200
Subject: [PATCH] tmp
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
mm/page_vma_mapped.c | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 2ccbabfb2cc17..ed2a23a90e8dd 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -243,21 +243,31 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
*/
pmde = pmdp_get_lockless(pvmw->pmd);
- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) {
+ if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde) ||
+ pmd_is_device_private_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
pmde = *pvmw->pmd;
- if (!pmd_present(pmde)) {
+ if (pmd_is_migration_entry(pmde)) {
softleaf_t entry;
- if (!thp_migration_supported() ||
- !(pvmw->flags & PVMW_MIGRATION))
+ if (!(pvmw->flags & PVMW_MIGRATION))
return not_found(pvmw);
entry = softleaf_from_pmd(pmde);
+ if (!check_pmd(softleaf_to_pfn(entry), pvmw))
+ return not_found(pvmw);
+ return true;
+ } else if (pmd_is_device_private_entry(pmde)) {
+ softleaf_t entry;
- if (!softleaf_is_migration(entry) ||
- !check_pmd(softleaf_to_pfn(entry), pvmw))
+ if (pvmw->flags & PVMW_MIGRATION)
+ return not_found(pvmw);
+ entry = softleaf_from_pmd(pmde);
+ if (!check_pmd(softleaf_to_pfn(entry), pvmw))
return not_found(pvmw);
return true;
+ } else if (!pmd_present(pmde) ){
+ return not_found(pvmw);
}
if (likely(pmd_trans_huge(pmde))) {
if (pvmw->flags & PVMW_MIGRATION)
@@ -270,12 +280,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
spin_unlock(pvmw->ptl);
pvmw->ptl = NULL;
} else if (!pmd_present(pmde)) {
- const softleaf_t entry = softleaf_from_pmd(pmde);
-
- if (softleaf_is_device_private(entry)) {
- pvmw->ptl = pmd_lock(mm, pvmw->pmd);
- return true;
- }
if ((pvmw->flags & PVMW_SYNC) &&
thp_vma_suitable_order(vma, pvmw->address,
--
2.43.0
--
Cheers,
David
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-26 10:07 ` David Hildenbrand (Arm)
@ 2026-06-26 10:42 ` Lorenzo Stoakes
2026-06-26 11:31 ` David Hildenbrand (Arm)
2026-06-26 13:27 ` Lance Yang
1 sibling, 1 reply; 19+ messages in thread
From: Lorenzo Stoakes @ 2026-06-26 10:42 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Wei Yang, akpm, riel, liam, vbabka, harry, jannh, ziy, sj,
balbirs, linux-mm, linux-kernel, stable, Lance Yang
On Fri, Jun 26, 2026 at 12:07:56PM +0200, David Hildenbrand (Arm) wrote:
> On 6/24/26 08:53, Wei Yang wrote:
> > Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> > device-private entries") introduced the concept of device-private
> > PMD entries, but did not correctly update the rmap walk code to
> > account for them.
> >
> > As a result, when page_vma_mapped_walk() encounters device-private
> > PMD entries, it takes no action other than to acquire the PMD lock
> > and exit.
> >
> > However this is highly problematic for two reasons - firstly,
> > device private entries possess a PFN so check_pmd() needs to be
> > called to ensure an overlapping PFN range.
> >
> > Secondly, and more importantly, if PVMW_MIGRATION is set the
> > caller assumes the returned entry is a migration entry, resulting
> > in memory corruption when the caller tries to interpret the device
> > private entry as such.
> >
> > In addition, commit 146287290023 ("mm/huge_memory: implement
> > device-private THP splitting") allowed device private PMDs to be
> > split like THP mappings, but again did not update this code path.
> >
> > As a result, we might race a PMD split prior to acquiring the PMD
> > lock.
> >
> > This patch addresses all of these issues by invoking check_pmd(),
> > ensuring PMVW_MIGRATION is not set and checks whether a split raced
> > us we do for PMD THP and migration entries.
> >
> > Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> > Suggested-by: David Hildenbrand <david@kernel.org>
> > Cc: David Hildenbrand <david@kernel.org>
> > Cc: Balbir Singh <balbirs@nvidia.com>
> > Cc: SeongJae Park <sj@kernel.org>
> > Cc: Zi Yan <ziy@nvidia.com>
> > Cc: Lorenzo Stoakes <ljs@kernel.org>
> > Cc: Lance Yang <lance.yang@linux.dev>
> >
> > ---
> > v4:
> > * refine subject and commit log based on Lorenzo's suggestion
> > * put pmd device-private entry handling in its own if branch,
> > suggested by Lorenzo
> >
> > v3:
> > * remove cleanup part, only fix the issue for device-private entry
> > * refine user effect description based on Lorenzo's suggestion
> >
> > v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
> > * specify the possible error case of current code and user visible effect
> > * besides fix, cleanup the pmd entry handling based on David's suggestion
> >
> > v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
> > ---
> > mm/page_vma_mapped.c | 20 +++++++++++++++-----
> > 1 file changed, 15 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> > index 2ccbabfb2cc1..17dff8aab9f9 100644
> > --- a/mm/page_vma_mapped.c
> > +++ b/mm/page_vma_mapped.c
> > @@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> > /* THP pmd was split under us: handle on pte level */
> > spin_unlock(pvmw->ptl);
> > pvmw->ptl = NULL;
> > - } else if (!pmd_present(pmde)) {
> > - const softleaf_t entry = softleaf_from_pmd(pmde);
> > + } else if (pmd_is_device_private_entry(pmde)) {
> > + softleaf_t entry;
> > +
> > + pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> > + pmde = *pvmw->pmd;
> > + entry = softleaf_from_pmd(pmde);
> >
> > - if (softleaf_is_device_private(entry)) {
> > - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> > + if (likely(softleaf_is_device_private(entry))) {
> > + if (pvmw->flags & PVMW_MIGRATION)
> > + return not_found(pvmw);
> > + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
> > + return not_found(pvmw);
> > return true;
> > }
> > -
> > + /* device-private pmd was split under us: handle on pte level */
> > + spin_unlock(pvmw->ptl);
> > + pvmw->ptl = NULL;
> > + } else if (!pmd_present(pmde)) {
> > if ((pvmw->flags & PVMW_SYNC) &&
> > thp_vma_suitable_order(vma, pvmw->address,
> > PMD_ORDER) &&
>
> This is extremely hard to review given the existing crap handling here. I'm
> really sorry, but it makes my head hurt (I'm not kidding :) ).
>
> It's completely unclear why we only have to check for a subset of the cases
> after taking the lock.
>
> Could we simply extend the existing migration pmd handling and leave the
> !pmd_present() case for pmd_none()?
>
> That leaves no question to "which transitions are actually allowed", including
> "could we accidentally assume something is a page table when really it isn't".
>
>
> So what about something like the following?
>
> The "thp_migration_supported()" is not required when checking for
> pmd_is_migration_entry(), as that defaults to "false" when not compiled in.
>
> Untested:
>
>
> From 048ecd33673ec649e168fbbb97749a7c0e344fcd Mon Sep 17 00:00:00 2001
> From: "David Hildenbrand (Arm)" <david@kernel.org>
> Date: Fri, 26 Jun 2026 12:03:40 +0200
> Subject: [PATCH] tmp
>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> ---
> mm/page_vma_mapped.c | 29 +++++++++++++++++------------
> 1 file changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 2ccbabfb2cc17..ed2a23a90e8dd 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -243,21 +243,31 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> */
> pmde = pmdp_get_lockless(pvmw->pmd);
>
> - if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) {
> + if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde) ||
> + pmd_is_device_private_entry(pmde)) {
> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> pmde = *pvmw->pmd;
> - if (!pmd_present(pmde)) {
> + if (pmd_is_migration_entry(pmde)) {
> softleaf_t entry;
>
> - if (!thp_migration_supported() ||
Do we care about this? Or is !tmp_migration_supported() -> implies you
wouldn't see a migration entry here anyway?
Maybe worth a VM_WARN_ON_ONCE()?
> - !(pvmw->flags & PVMW_MIGRATION))
> + if (!(pvmw->flags & PVMW_MIGRATION))
> return not_found(pvmw);
> entry = softleaf_from_pmd(pmde);
> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
> + return not_found(pvmw);
> + return true;
> + } else if (pmd_is_device_private_entry(pmde)) {
> + softleaf_t entry;
>
> - if (!softleaf_is_migration(entry) ||
> - !check_pmd(softleaf_to_pfn(entry), pvmw))
> + if (pvmw->flags & PVMW_MIGRATION)
> + return not_found(pvmw);
> + entry = softleaf_from_pmd(pmde);
> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
> return not_found(pvmw);
> return true;
> + } else if (!pmd_present(pmde) ){
> + return not_found(pvmw);
> }
> if (likely(pmd_trans_huge(pmde))) {
> if (pvmw->flags & PVMW_MIGRATION)
> @@ -270,12 +280,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> spin_unlock(pvmw->ptl);
> pvmw->ptl = NULL;
> } else if (!pmd_present(pmde)) {
> - const softleaf_t entry = softleaf_from_pmd(pmde);
> -
> - if (softleaf_is_device_private(entry)) {
> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> - return true;
> - }
>
> if ((pvmw->flags & PVMW_SYNC) &&
> thp_vma_suitable_order(vma, pvmw->address,
Overall though this seems fine to me.
> --
> 2.43.0
>
>
> --
> Cheers,
>
> David
Thanks, Lorenzo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-26 10:42 ` Lorenzo Stoakes
@ 2026-06-26 11:31 ` David Hildenbrand (Arm)
2026-06-26 13:24 ` Zi Yan
0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-26 11:31 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Wei Yang, akpm, riel, liam, vbabka, harry, jannh, ziy, sj,
balbirs, linux-mm, linux-kernel, stable, Lance Yang
On 6/26/26 12:42, Lorenzo Stoakes wrote:
> On Fri, Jun 26, 2026 at 12:07:56PM +0200, David Hildenbrand (Arm) wrote:
>> On 6/24/26 08:53, Wei Yang wrote:
>>> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
>>> device-private entries") introduced the concept of device-private
>>> PMD entries, but did not correctly update the rmap walk code to
>>> account for them.
>>>
>>> As a result, when page_vma_mapped_walk() encounters device-private
>>> PMD entries, it takes no action other than to acquire the PMD lock
>>> and exit.
>>>
>>> However this is highly problematic for two reasons - firstly,
>>> device private entries possess a PFN so check_pmd() needs to be
>>> called to ensure an overlapping PFN range.
>>>
>>> Secondly, and more importantly, if PVMW_MIGRATION is set the
>>> caller assumes the returned entry is a migration entry, resulting
>>> in memory corruption when the caller tries to interpret the device
>>> private entry as such.
>>>
>>> In addition, commit 146287290023 ("mm/huge_memory: implement
>>> device-private THP splitting") allowed device private PMDs to be
>>> split like THP mappings, but again did not update this code path.
>>>
>>> As a result, we might race a PMD split prior to acquiring the PMD
>>> lock.
>>>
>>> This patch addresses all of these issues by invoking check_pmd(),
>>> ensuring PMVW_MIGRATION is not set and checks whether a split raced
>>> us we do for PMD THP and migration entries.
>>>
>>> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
>>> Cc: <stable@vger.kernel.org>
>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>> Suggested-by: David Hildenbrand <david@kernel.org>
>>> Cc: David Hildenbrand <david@kernel.org>
>>> Cc: Balbir Singh <balbirs@nvidia.com>
>>> Cc: SeongJae Park <sj@kernel.org>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Cc: Lorenzo Stoakes <ljs@kernel.org>
>>> Cc: Lance Yang <lance.yang@linux.dev>
>>>
>>> ---
>>> v4:
>>> * refine subject and commit log based on Lorenzo's suggestion
>>> * put pmd device-private entry handling in its own if branch,
>>> suggested by Lorenzo
>>>
>>> v3:
>>> * remove cleanup part, only fix the issue for device-private entry
>>> * refine user effect description based on Lorenzo's suggestion
>>>
>>> v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
>>> * specify the possible error case of current code and user visible effect
>>> * besides fix, cleanup the pmd entry handling based on David's suggestion
>>>
>>> v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
>>> ---
>>> mm/page_vma_mapped.c | 20 +++++++++++++++-----
>>> 1 file changed, 15 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>>> index 2ccbabfb2cc1..17dff8aab9f9 100644
>>> --- a/mm/page_vma_mapped.c
>>> +++ b/mm/page_vma_mapped.c
>>> @@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>>> /* THP pmd was split under us: handle on pte level */
>>> spin_unlock(pvmw->ptl);
>>> pvmw->ptl = NULL;
>>> - } else if (!pmd_present(pmde)) {
>>> - const softleaf_t entry = softleaf_from_pmd(pmde);
>>> + } else if (pmd_is_device_private_entry(pmde)) {
>>> + softleaf_t entry;
>>> +
>>> + pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>>> + pmde = *pvmw->pmd;
>>> + entry = softleaf_from_pmd(pmde);
>>>
>>> - if (softleaf_is_device_private(entry)) {
>>> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>>> + if (likely(softleaf_is_device_private(entry))) {
>>> + if (pvmw->flags & PVMW_MIGRATION)
>>> + return not_found(pvmw);
>>> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>>> + return not_found(pvmw);
>>> return true;
>>> }
>>> -
>>> + /* device-private pmd was split under us: handle on pte level */
>>> + spin_unlock(pvmw->ptl);
>>> + pvmw->ptl = NULL;
>>> + } else if (!pmd_present(pmde)) {
>>> if ((pvmw->flags & PVMW_SYNC) &&
>>> thp_vma_suitable_order(vma, pvmw->address,
>>> PMD_ORDER) &&
>>
>> This is extremely hard to review given the existing crap handling here. I'm
>> really sorry, but it makes my head hurt (I'm not kidding :) ).
>>
>> It's completely unclear why we only have to check for a subset of the cases
>> after taking the lock.
>>
>> Could we simply extend the existing migration pmd handling and leave the
>> !pmd_present() case for pmd_none()?
>>
>> That leaves no question to "which transitions are actually allowed", including
>> "could we accidentally assume something is a page table when really it isn't".
>>
>>
>> So what about something like the following?
>>
>> The "thp_migration_supported()" is not required when checking for
>> pmd_is_migration_entry(), as that defaults to "false" when not compiled in.
>>
>> Untested:
>>
>>
>> From 048ecd33673ec649e168fbbb97749a7c0e344fcd Mon Sep 17 00:00:00 2001
>> From: "David Hildenbrand (Arm)" <david@kernel.org>
>> Date: Fri, 26 Jun 2026 12:03:40 +0200
>> Subject: [PATCH] tmp
>>
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
>> ---
>> mm/page_vma_mapped.c | 29 +++++++++++++++++------------
>> 1 file changed, 17 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>> index 2ccbabfb2cc17..ed2a23a90e8dd 100644
>> --- a/mm/page_vma_mapped.c
>> +++ b/mm/page_vma_mapped.c
>> @@ -243,21 +243,31 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>> */
>> pmde = pmdp_get_lockless(pvmw->pmd);
>>
>> - if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) {
>> + if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde) ||
>> + pmd_is_device_private_entry(pmde)) {
>> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>> pmde = *pvmw->pmd;
>> - if (!pmd_present(pmde)) {
>> + if (pmd_is_migration_entry(pmde)) {
>> softleaf_t entry;
>>
>> - if (!thp_migration_supported() ||
>
> Do we care about this? Or is !tmp_migration_supported() -> implies you
> wouldn't see a migration entry here anyway?
Yeah, I noted above
"The "thp_migration_supported()" is not required when checking for
pmd_is_migration_entry(), as that defaults to "false" when not compiled in."
Given that
tmp_migration_supported() -> IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);$
And
pmd_is_migration_entry() -> softleaf_is_migration(softleaf_from_pmd(pmd));
whereby softleaf_from_pmd() only returns something non-none for
CONFIG_ARCH_ENABLE_THP_MIGRATION.
>
> Maybe worth a VM_WARN_ON_ONCE()?
I think it was primarily a a hack to slightly optimize code generated for
!CONFIG_ARCH_ENABLE_THP_MIGRATION, not really something for correctness as it seems.
So I think we can safely drop it. :)
--
Cheers,
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-26 11:31 ` David Hildenbrand (Arm)
@ 2026-06-26 13:24 ` Zi Yan
2026-06-26 13:32 ` Lorenzo Stoakes
0 siblings, 1 reply; 19+ messages in thread
From: Zi Yan @ 2026-06-26 13:24 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Lorenzo Stoakes, Wei Yang, akpm, riel, liam, vbabka, harry, jannh,
sj, balbirs, linux-mm, linux-kernel, stable, Lance Yang
On 26 Jun 2026, at 7:31, David Hildenbrand (Arm) wrote:
> On 6/26/26 12:42, Lorenzo Stoakes wrote:
>> On Fri, Jun 26, 2026 at 12:07:56PM +0200, David Hildenbrand (Arm) wrote:
>>> On 6/24/26 08:53, Wei Yang wrote:
>>>> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
>>>> device-private entries") introduced the concept of device-private
>>>> PMD entries, but did not correctly update the rmap walk code to
>>>> account for them.
>>>>
>>>> As a result, when page_vma_mapped_walk() encounters device-private
>>>> PMD entries, it takes no action other than to acquire the PMD lock
>>>> and exit.
>>>>
>>>> However this is highly problematic for two reasons - firstly,
>>>> device private entries possess a PFN so check_pmd() needs to be
>>>> called to ensure an overlapping PFN range.
>>>>
>>>> Secondly, and more importantly, if PVMW_MIGRATION is set the
>>>> caller assumes the returned entry is a migration entry, resulting
>>>> in memory corruption when the caller tries to interpret the device
>>>> private entry as such.
>>>>
>>>> In addition, commit 146287290023 ("mm/huge_memory: implement
>>>> device-private THP splitting") allowed device private PMDs to be
>>>> split like THP mappings, but again did not update this code path.
>>>>
>>>> As a result, we might race a PMD split prior to acquiring the PMD
>>>> lock.
>>>>
>>>> This patch addresses all of these issues by invoking check_pmd(),
>>>> ensuring PMVW_MIGRATION is not set and checks whether a split raced
>>>> us we do for PMD THP and migration entries.
>>>>
>>>> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
>>>> Cc: <stable@vger.kernel.org>
>>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>>> Suggested-by: David Hildenbrand <david@kernel.org>
>>>> Cc: David Hildenbrand <david@kernel.org>
>>>> Cc: Balbir Singh <balbirs@nvidia.com>
>>>> Cc: SeongJae Park <sj@kernel.org>
>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>> Cc: Lorenzo Stoakes <ljs@kernel.org>
>>>> Cc: Lance Yang <lance.yang@linux.dev>
>>>>
>>>> ---
>>>> v4:
>>>> * refine subject and commit log based on Lorenzo's suggestion
>>>> * put pmd device-private entry handling in its own if branch,
>>>> suggested by Lorenzo
>>>>
>>>> v3:
>>>> * remove cleanup part, only fix the issue for device-private entry
>>>> * refine user effect description based on Lorenzo's suggestion
>>>>
>>>> v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
>>>> * specify the possible error case of current code and user visible effect
>>>> * besides fix, cleanup the pmd entry handling based on David's suggestion
>>>>
>>>> v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
>>>> ---
>>>> mm/page_vma_mapped.c | 20 +++++++++++++++-----
>>>> 1 file changed, 15 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>>>> index 2ccbabfb2cc1..17dff8aab9f9 100644
>>>> --- a/mm/page_vma_mapped.c
>>>> +++ b/mm/page_vma_mapped.c
>>>> @@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>>>> /* THP pmd was split under us: handle on pte level */
>>>> spin_unlock(pvmw->ptl);
>>>> pvmw->ptl = NULL;
>>>> - } else if (!pmd_present(pmde)) {
>>>> - const softleaf_t entry = softleaf_from_pmd(pmde);
>>>> + } else if (pmd_is_device_private_entry(pmde)) {
>>>> + softleaf_t entry;
>>>> +
>>>> + pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>>>> + pmde = *pvmw->pmd;
>>>> + entry = softleaf_from_pmd(pmde);
>>>>
>>>> - if (softleaf_is_device_private(entry)) {
>>>> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>>>> + if (likely(softleaf_is_device_private(entry))) {
>>>> + if (pvmw->flags & PVMW_MIGRATION)
>>>> + return not_found(pvmw);
>>>> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>>>> + return not_found(pvmw);
>>>> return true;
>>>> }
>>>> -
>>>> + /* device-private pmd was split under us: handle on pte level */
>>>> + spin_unlock(pvmw->ptl);
>>>> + pvmw->ptl = NULL;
>>>> + } else if (!pmd_present(pmde)) {
>>>> if ((pvmw->flags & PVMW_SYNC) &&
>>>> thp_vma_suitable_order(vma, pvmw->address,
>>>> PMD_ORDER) &&
>>>
>>> This is extremely hard to review given the existing crap handling here. I'm
>>> really sorry, but it makes my head hurt (I'm not kidding :) ).
>>>
>>> It's completely unclear why we only have to check for a subset of the cases
>>> after taking the lock.
>>>
>>> Could we simply extend the existing migration pmd handling and leave the
>>> !pmd_present() case for pmd_none()?
>>>
>>> That leaves no question to "which transitions are actually allowed", including
>>> "could we accidentally assume something is a page table when really it isn't".
>>>
>>>
>>> So what about something like the following?
>>>
>>> The "thp_migration_supported()" is not required when checking for
>>> pmd_is_migration_entry(), as that defaults to "false" when not compiled in.
>>>
>>> Untested:
>>>
>>>
>>> From 048ecd33673ec649e168fbbb97749a7c0e344fcd Mon Sep 17 00:00:00 2001
>>> From: "David Hildenbrand (Arm)" <david@kernel.org>
>>> Date: Fri, 26 Jun 2026 12:03:40 +0200
>>> Subject: [PATCH] tmp
>>>
>>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
>>> ---
>>> mm/page_vma_mapped.c | 29 +++++++++++++++++------------
>>> 1 file changed, 17 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>>> index 2ccbabfb2cc17..ed2a23a90e8dd 100644
>>> --- a/mm/page_vma_mapped.c
>>> +++ b/mm/page_vma_mapped.c
>>> @@ -243,21 +243,31 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>>> */
>>> pmde = pmdp_get_lockless(pvmw->pmd);
>>>
>>> - if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) {
>>> + if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde) ||
>>> + pmd_is_device_private_entry(pmde)) {
>>> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>>> pmde = *pvmw->pmd;
>>> - if (!pmd_present(pmde)) {
>>> + if (pmd_is_migration_entry(pmde)) {
>>> softleaf_t entry;
>>>
>>> - if (!thp_migration_supported() ||
>>
>> Do we care about this? Or is !tmp_migration_supported() -> implies you
>> wouldn't see a migration entry here anyway?
>
> Yeah, I noted above
>
> "The "thp_migration_supported()" is not required when checking for
> pmd_is_migration_entry(), as that defaults to "false" when not compiled in."
>
> Given that
>
> tmp_migration_supported() -> IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);$
>
> And
>
> pmd_is_migration_entry() -> softleaf_is_migration(softleaf_from_pmd(pmd));
>
> whereby softleaf_from_pmd() only returns something non-none for
> CONFIG_ARCH_ENABLE_THP_MIGRATION.
>
>>
>> Maybe worth a VM_WARN_ON_ONCE()?
>
> I think it was primarily a a hack to slightly optimize code generated for
> !CONFIG_ARCH_ENABLE_THP_MIGRATION, not really something for correctness as it seems.
>
> So I think we can safely drop it. :)
thp_migration_supported() here is legacy code[1] from v4.14 when I added
the THP migration support. IIRC, the purpose was to avoid checking
PMD migration entry if the support is not enabled, but looking at it again
today, that thp_migration_supported() is unnecessary since
is_migration_entry(pmd_to_swp_entry(*pvmw->pmd)) returns false if
!CONFIG_ARCH_ENABLE_THP_MIGRATION.
[1] https://elixir.bootlin.com/linux/v4.14/source/mm/page_vma_mapped.c#L157
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-26 10:07 ` David Hildenbrand (Arm)
2026-06-26 10:42 ` Lorenzo Stoakes
@ 2026-06-26 13:27 ` Lance Yang
2026-06-26 13:51 ` David Hildenbrand (Arm)
1 sibling, 1 reply; 19+ messages in thread
From: Lance Yang @ 2026-06-26 13:27 UTC (permalink / raw)
To: david
Cc: richard.weiyang, akpm, ljs, riel, liam, vbabka, harry, jannh, ziy,
sj, balbirs, linux-mm, linux-kernel, stable, lance.yang
On Fri, Jun 26, 2026 at 12:07:56PM +0200, David Hildenbrand (Arm) wrote:
>On 6/24/26 08:53, Wei Yang wrote:
>> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
>> device-private entries") introduced the concept of device-private
>> PMD entries, but did not correctly update the rmap walk code to
>> account for them.
>>
>> As a result, when page_vma_mapped_walk() encounters device-private
>> PMD entries, it takes no action other than to acquire the PMD lock
>> and exit.
>>
>> However this is highly problematic for two reasons - firstly,
>> device private entries possess a PFN so check_pmd() needs to be
>> called to ensure an overlapping PFN range.
>>
>> Secondly, and more importantly, if PVMW_MIGRATION is set the
>> caller assumes the returned entry is a migration entry, resulting
>> in memory corruption when the caller tries to interpret the device
>> private entry as such.
>>
>> In addition, commit 146287290023 ("mm/huge_memory: implement
>> device-private THP splitting") allowed device private PMDs to be
>> split like THP mappings, but again did not update this code path.
>>
>> As a result, we might race a PMD split prior to acquiring the PMD
>> lock.
>>
>> This patch addresses all of these issues by invoking check_pmd(),
>> ensuring PMVW_MIGRATION is not set and checks whether a split raced
>> us we do for PMD THP and migration entries.
>>
>> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Suggested-by: David Hildenbrand <david@kernel.org>
>> Cc: David Hildenbrand <david@kernel.org>
>> Cc: Balbir Singh <balbirs@nvidia.com>
>> Cc: SeongJae Park <sj@kernel.org>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Lorenzo Stoakes <ljs@kernel.org>
>> Cc: Lance Yang <lance.yang@linux.dev>
>>
>> ---
>> v4:
>> * refine subject and commit log based on Lorenzo's suggestion
>> * put pmd device-private entry handling in its own if branch,
>> suggested by Lorenzo
>>
>> v3:
>> * remove cleanup part, only fix the issue for device-private entry
>> * refine user effect description based on Lorenzo's suggestion
>>
>> v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
>> * specify the possible error case of current code and user visible effect
>> * besides fix, cleanup the pmd entry handling based on David's suggestion
>>
>> v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
>> ---
>> mm/page_vma_mapped.c | 20 +++++++++++++++-----
>> 1 file changed, 15 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>> index 2ccbabfb2cc1..17dff8aab9f9 100644
>> --- a/mm/page_vma_mapped.c
>> +++ b/mm/page_vma_mapped.c
>> @@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>> /* THP pmd was split under us: handle on pte level */
>> spin_unlock(pvmw->ptl);
>> pvmw->ptl = NULL;
>> - } else if (!pmd_present(pmde)) {
>> - const softleaf_t entry = softleaf_from_pmd(pmde);
>> + } else if (pmd_is_device_private_entry(pmde)) {
>> + softleaf_t entry;
>> +
>> + pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>> + pmde = *pvmw->pmd;
>> + entry = softleaf_from_pmd(pmde);
>>
>> - if (softleaf_is_device_private(entry)) {
>> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>> + if (likely(softleaf_is_device_private(entry))) {
>> + if (pvmw->flags & PVMW_MIGRATION)
>> + return not_found(pvmw);
>> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>> + return not_found(pvmw);
>> return true;
>> }
>> -
>> + /* device-private pmd was split under us: handle on pte level */
>> + spin_unlock(pvmw->ptl);
>> + pvmw->ptl = NULL;
>> + } else if (!pmd_present(pmde)) {
>> if ((pvmw->flags & PVMW_SYNC) &&
>> thp_vma_suitable_order(vma, pvmw->address,
>> PMD_ORDER) &&
>
>This is extremely hard to review given the existing crap handling here. I'm
>really sorry, but it makes my head hurt (I'm not kidding :) ).
>
>It's completely unclear why we only have to check for a subset of the cases
>after taking the lock.
>
>Could we simply extend the existing migration pmd handling and leave the
>!pmd_present() case for pmd_none()?
>
>That leaves no question to "which transitions are actually allowed", including
>"could we accidentally assume something is a page table when really it isn't".
>
>
>So what about something like the following?
>
>The "thp_migration_supported()" is not required when checking for
>pmd_is_migration_entry(), as that defaults to "false" when not compiled in.
>
>Untested:
>
>
>>From 048ecd33673ec649e168fbbb97749a7c0e344fcd Mon Sep 17 00:00:00 2001
>From: "David Hildenbrand (Arm)" <david@kernel.org>
>Date: Fri, 26 Jun 2026 12:03:40 +0200
>Subject: [PATCH] tmp
>
>Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
>---
> mm/page_vma_mapped.c | 29 +++++++++++++++++------------
> 1 file changed, 17 insertions(+), 12 deletions(-)
>
>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>index 2ccbabfb2cc17..ed2a23a90e8dd 100644
>--- a/mm/page_vma_mapped.c
>+++ b/mm/page_vma_mapped.c
>@@ -243,21 +243,31 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> */
> pmde = pmdp_get_lockless(pvmw->pmd);
>
>- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) {
>+ if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde) ||
>+ pmd_is_device_private_entry(pmde)) {
> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> pmde = *pvmw->pmd;
>- if (!pmd_present(pmde)) {
>+ if (pmd_is_migration_entry(pmde)) {
> softleaf_t entry;
>
>- if (!thp_migration_supported() ||
>- !(pvmw->flags & PVMW_MIGRATION))
>+ if (!(pvmw->flags & PVMW_MIGRATION))
> return not_found(pvmw);
> entry = softleaf_from_pmd(pmde);
>+ if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>+ return not_found(pvmw);
>+ return true;
>+ } else if (pmd_is_device_private_entry(pmde)) {
>+ softleaf_t entry;
>
>- if (!softleaf_is_migration(entry) ||
>- !check_pmd(softleaf_to_pfn(entry), pvmw))
>+ if (pvmw->flags & PVMW_MIGRATION)
>+ return not_found(pvmw);
>+ entry = softleaf_from_pmd(pmde);
>+ if (!check_pmd(softleaf_to_pfn(entry), pvmw))
> return not_found(pvmw);
> return true;
>+ } else if (!pmd_present(pmde) ){
>+ return not_found(pvmw);
> }
> if (likely(pmd_trans_huge(pmde))) {
> if (pvmw->flags & PVMW_MIGRATION)
>@@ -270,12 +280,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> spin_unlock(pvmw->ptl);
> pvmw->ptl = NULL;
> } else if (!pmd_present(pmde)) {
>- const softleaf_t entry = softleaf_from_pmd(pmde);
>-
>- if (softleaf_is_device_private(entry)) {
>- pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>- return true;
>- }
>
> if ((pvmw->flags & PVMW_SYNC) &&
> thp_vma_suitable_order(vma, pvmw->address,
>--
Might be good with this on top:
---8<---
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index cfa1230c87bb..8b7c062bd81d 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -281,7 +281,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
return not_found(pvmw);
return true;
}
- /* THP pmd was split under us: handle on pte level */
+ /* THP/device-private pmd was split under us: handle on pte level */
spin_unlock(pvmw->ptl);
pvmw->ptl = NULL;
} else if (!pmd_present(pmde)) {
--
Looks good to me as well, thanks!
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-26 13:24 ` Zi Yan
@ 2026-06-26 13:32 ` Lorenzo Stoakes
0 siblings, 0 replies; 19+ messages in thread
From: Lorenzo Stoakes @ 2026-06-26 13:32 UTC (permalink / raw)
To: Zi Yan
Cc: David Hildenbrand (Arm), Wei Yang, akpm, riel, liam, vbabka,
harry, jannh, sj, balbirs, linux-mm, linux-kernel, stable,
Lance Yang
On Fri, Jun 26, 2026 at 09:24:06AM -0400, Zi Yan wrote:
> On 26 Jun 2026, at 7:31, David Hildenbrand (Arm) wrote:
>
> > On 6/26/26 12:42, Lorenzo Stoakes wrote:
> >> On Fri, Jun 26, 2026 at 12:07:56PM +0200, David Hildenbrand (Arm) wrote:
> >>> On 6/24/26 08:53, Wei Yang wrote:
> >>>> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> >>>> device-private entries") introduced the concept of device-private
> >>>> PMD entries, but did not correctly update the rmap walk code to
> >>>> account for them.
> >>>>
> >>>> As a result, when page_vma_mapped_walk() encounters device-private
> >>>> PMD entries, it takes no action other than to acquire the PMD lock
> >>>> and exit.
> >>>>
> >>>> However this is highly problematic for two reasons - firstly,
> >>>> device private entries possess a PFN so check_pmd() needs to be
> >>>> called to ensure an overlapping PFN range.
> >>>>
> >>>> Secondly, and more importantly, if PVMW_MIGRATION is set the
> >>>> caller assumes the returned entry is a migration entry, resulting
> >>>> in memory corruption when the caller tries to interpret the device
> >>>> private entry as such.
> >>>>
> >>>> In addition, commit 146287290023 ("mm/huge_memory: implement
> >>>> device-private THP splitting") allowed device private PMDs to be
> >>>> split like THP mappings, but again did not update this code path.
> >>>>
> >>>> As a result, we might race a PMD split prior to acquiring the PMD
> >>>> lock.
> >>>>
> >>>> This patch addresses all of these issues by invoking check_pmd(),
> >>>> ensuring PMVW_MIGRATION is not set and checks whether a split raced
> >>>> us we do for PMD THP and migration entries.
> >>>>
> >>>> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> >>>> Cc: <stable@vger.kernel.org>
> >>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> >>>> Suggested-by: David Hildenbrand <david@kernel.org>
> >>>> Cc: David Hildenbrand <david@kernel.org>
> >>>> Cc: Balbir Singh <balbirs@nvidia.com>
> >>>> Cc: SeongJae Park <sj@kernel.org>
> >>>> Cc: Zi Yan <ziy@nvidia.com>
> >>>> Cc: Lorenzo Stoakes <ljs@kernel.org>
> >>>> Cc: Lance Yang <lance.yang@linux.dev>
> >>>>
> >>>> ---
> >>>> v4:
> >>>> * refine subject and commit log based on Lorenzo's suggestion
> >>>> * put pmd device-private entry handling in its own if branch,
> >>>> suggested by Lorenzo
> >>>>
> >>>> v3:
> >>>> * remove cleanup part, only fix the issue for device-private entry
> >>>> * refine user effect description based on Lorenzo's suggestion
> >>>>
> >>>> v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
> >>>> * specify the possible error case of current code and user visible effect
> >>>> * besides fix, cleanup the pmd entry handling based on David's suggestion
> >>>>
> >>>> v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
> >>>> ---
> >>>> mm/page_vma_mapped.c | 20 +++++++++++++++-----
> >>>> 1 file changed, 15 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> >>>> index 2ccbabfb2cc1..17dff8aab9f9 100644
> >>>> --- a/mm/page_vma_mapped.c
> >>>> +++ b/mm/page_vma_mapped.c
> >>>> @@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> >>>> /* THP pmd was split under us: handle on pte level */
> >>>> spin_unlock(pvmw->ptl);
> >>>> pvmw->ptl = NULL;
> >>>> - } else if (!pmd_present(pmde)) {
> >>>> - const softleaf_t entry = softleaf_from_pmd(pmde);
> >>>> + } else if (pmd_is_device_private_entry(pmde)) {
> >>>> + softleaf_t entry;
> >>>> +
> >>>> + pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> >>>> + pmde = *pvmw->pmd;
> >>>> + entry = softleaf_from_pmd(pmde);
> >>>>
> >>>> - if (softleaf_is_device_private(entry)) {
> >>>> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> >>>> + if (likely(softleaf_is_device_private(entry))) {
> >>>> + if (pvmw->flags & PVMW_MIGRATION)
> >>>> + return not_found(pvmw);
> >>>> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
> >>>> + return not_found(pvmw);
> >>>> return true;
> >>>> }
> >>>> -
> >>>> + /* device-private pmd was split under us: handle on pte level */
> >>>> + spin_unlock(pvmw->ptl);
> >>>> + pvmw->ptl = NULL;
> >>>> + } else if (!pmd_present(pmde)) {
> >>>> if ((pvmw->flags & PVMW_SYNC) &&
> >>>> thp_vma_suitable_order(vma, pvmw->address,
> >>>> PMD_ORDER) &&
> >>>
> >>> This is extremely hard to review given the existing crap handling here. I'm
> >>> really sorry, but it makes my head hurt (I'm not kidding :) ).
> >>>
> >>> It's completely unclear why we only have to check for a subset of the cases
> >>> after taking the lock.
> >>>
> >>> Could we simply extend the existing migration pmd handling and leave the
> >>> !pmd_present() case for pmd_none()?
> >>>
> >>> That leaves no question to "which transitions are actually allowed", including
> >>> "could we accidentally assume something is a page table when really it isn't".
> >>>
> >>>
> >>> So what about something like the following?
> >>>
> >>> The "thp_migration_supported()" is not required when checking for
> >>> pmd_is_migration_entry(), as that defaults to "false" when not compiled in.
> >>>
> >>> Untested:
> >>>
> >>>
> >>> From 048ecd33673ec649e168fbbb97749a7c0e344fcd Mon Sep 17 00:00:00 2001
> >>> From: "David Hildenbrand (Arm)" <david@kernel.org>
> >>> Date: Fri, 26 Jun 2026 12:03:40 +0200
> >>> Subject: [PATCH] tmp
> >>>
> >>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> >>> ---
> >>> mm/page_vma_mapped.c | 29 +++++++++++++++++------------
> >>> 1 file changed, 17 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> >>> index 2ccbabfb2cc17..ed2a23a90e8dd 100644
> >>> --- a/mm/page_vma_mapped.c
> >>> +++ b/mm/page_vma_mapped.c
> >>> @@ -243,21 +243,31 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> >>> */
> >>> pmde = pmdp_get_lockless(pvmw->pmd);
> >>>
> >>> - if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) {
> >>> + if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde) ||
> >>> + pmd_is_device_private_entry(pmde)) {
> >>> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> >>> pmde = *pvmw->pmd;
> >>> - if (!pmd_present(pmde)) {
> >>> + if (pmd_is_migration_entry(pmde)) {
> >>> softleaf_t entry;
> >>>
> >>> - if (!thp_migration_supported() ||
> >>
> >> Do we care about this? Or is !tmp_migration_supported() -> implies you
> >> wouldn't see a migration entry here anyway?
> >
> > Yeah, I noted above
> >
> > "The "thp_migration_supported()" is not required when checking for
> > pmd_is_migration_entry(), as that defaults to "false" when not compiled in."
> >
> > Given that
> >
> > tmp_migration_supported() -> IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);$
> >
> > And
> >
> > pmd_is_migration_entry() -> softleaf_is_migration(softleaf_from_pmd(pmd));
> >
> > whereby softleaf_from_pmd() only returns something non-none for
> > CONFIG_ARCH_ENABLE_THP_MIGRATION.
> >
> >>
> >> Maybe worth a VM_WARN_ON_ONCE()?
> >
> > I think it was primarily a a hack to slightly optimize code generated for
> > !CONFIG_ARCH_ENABLE_THP_MIGRATION, not really something for correctness as it seems.
> >
> > So I think we can safely drop it. :)
>
> thp_migration_supported() here is legacy code[1] from v4.14 when I added
> the THP migration support. IIRC, the purpose was to avoid checking
> PMD migration entry if the support is not enabled, but looking at it again
> today, that thp_migration_supported() is unnecessary since
> is_migration_entry(pmd_to_swp_entry(*pvmw->pmd)) returns false if
> !CONFIG_ARCH_ENABLE_THP_MIGRATION.
>
> [1] https://elixir.bootlin.com/linux/v4.14/source/mm/page_vma_mapped.c#L157
Thanks guys, let's drop it then!
>
> Best Regards,
> Yan, Zi
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
2026-06-26 13:27 ` Lance Yang
@ 2026-06-26 13:51 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 19+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-26 13:51 UTC (permalink / raw)
To: Lance Yang
Cc: richard.weiyang, akpm, ljs, riel, liam, vbabka, harry, jannh, ziy,
sj, balbirs, linux-mm, linux-kernel, stable
On 6/26/26 15:27, Lance Yang wrote:
>
> On Fri, Jun 26, 2026 at 12:07:56PM +0200, David Hildenbrand (Arm) wrote:
>> On 6/24/26 08:53, Wei Yang wrote:
>>> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
>>> device-private entries") introduced the concept of device-private
>>> PMD entries, but did not correctly update the rmap walk code to
>>> account for them.
>>>
>>> As a result, when page_vma_mapped_walk() encounters device-private
>>> PMD entries, it takes no action other than to acquire the PMD lock
>>> and exit.
>>>
>>> However this is highly problematic for two reasons - firstly,
>>> device private entries possess a PFN so check_pmd() needs to be
>>> called to ensure an overlapping PFN range.
>>>
>>> Secondly, and more importantly, if PVMW_MIGRATION is set the
>>> caller assumes the returned entry is a migration entry, resulting
>>> in memory corruption when the caller tries to interpret the device
>>> private entry as such.
>>>
>>> In addition, commit 146287290023 ("mm/huge_memory: implement
>>> device-private THP splitting") allowed device private PMDs to be
>>> split like THP mappings, but again did not update this code path.
>>>
>>> As a result, we might race a PMD split prior to acquiring the PMD
>>> lock.
>>>
>>> This patch addresses all of these issues by invoking check_pmd(),
>>> ensuring PMVW_MIGRATION is not set and checks whether a split raced
>>> us we do for PMD THP and migration entries.
>>>
>>> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
>>> Cc: <stable@vger.kernel.org>
>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>> Suggested-by: David Hildenbrand <david@kernel.org>
>>> Cc: David Hildenbrand <david@kernel.org>
>>> Cc: Balbir Singh <balbirs@nvidia.com>
>>> Cc: SeongJae Park <sj@kernel.org>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Cc: Lorenzo Stoakes <ljs@kernel.org>
>>> Cc: Lance Yang <lance.yang@linux.dev>
>>>
>>> ---
>>> v4:
>>> * refine subject and commit log based on Lorenzo's suggestion
>>> * put pmd device-private entry handling in its own if branch,
>>> suggested by Lorenzo
>>>
>>> v3:
>>> * remove cleanup part, only fix the issue for device-private entry
>>> * refine user effect description based on Lorenzo's suggestion
>>>
>>> v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
>>> * specify the possible error case of current code and user visible effect
>>> * besides fix, cleanup the pmd entry handling based on David's suggestion
>>>
>>> v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
>>> ---
>>> mm/page_vma_mapped.c | 20 +++++++++++++++-----
>>> 1 file changed, 15 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>>> index 2ccbabfb2cc1..17dff8aab9f9 100644
>>> --- a/mm/page_vma_mapped.c
>>> +++ b/mm/page_vma_mapped.c
>>> @@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>>> /* THP pmd was split under us: handle on pte level */
>>> spin_unlock(pvmw->ptl);
>>> pvmw->ptl = NULL;
>>> - } else if (!pmd_present(pmde)) {
>>> - const softleaf_t entry = softleaf_from_pmd(pmde);
>>> + } else if (pmd_is_device_private_entry(pmde)) {
>>> + softleaf_t entry;
>>> +
>>> + pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>>> + pmde = *pvmw->pmd;
>>> + entry = softleaf_from_pmd(pmde);
>>>
>>> - if (softleaf_is_device_private(entry)) {
>>> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>>> + if (likely(softleaf_is_device_private(entry))) {
>>> + if (pvmw->flags & PVMW_MIGRATION)
>>> + return not_found(pvmw);
>>> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>>> + return not_found(pvmw);
>>> return true;
>>> }
>>> -
>>> + /* device-private pmd was split under us: handle on pte level */
>>> + spin_unlock(pvmw->ptl);
>>> + pvmw->ptl = NULL;
>>> + } else if (!pmd_present(pmde)) {
>>> if ((pvmw->flags & PVMW_SYNC) &&
>>> thp_vma_suitable_order(vma, pvmw->address,
>>> PMD_ORDER) &&
>>
>> This is extremely hard to review given the existing crap handling here. I'm
>> really sorry, but it makes my head hurt (I'm not kidding :) ).
>>
>> It's completely unclear why we only have to check for a subset of the cases
>> after taking the lock.
>>
>> Could we simply extend the existing migration pmd handling and leave the
>> !pmd_present() case for pmd_none()?
>>
>> That leaves no question to "which transitions are actually allowed", including
>> "could we accidentally assume something is a page table when really it isn't".
>>
>>
>> So what about something like the following?
>>
>> The "thp_migration_supported()" is not required when checking for
>> pmd_is_migration_entry(), as that defaults to "false" when not compiled in.
>>
>> Untested:
>>
>>
>> >From 048ecd33673ec649e168fbbb97749a7c0e344fcd Mon Sep 17 00:00:00 2001
>> From: "David Hildenbrand (Arm)" <david@kernel.org>
>> Date: Fri, 26 Jun 2026 12:03:40 +0200
>> Subject: [PATCH] tmp
>>
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
>> ---
>> mm/page_vma_mapped.c | 29 +++++++++++++++++------------
>> 1 file changed, 17 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>> index 2ccbabfb2cc17..ed2a23a90e8dd 100644
>> --- a/mm/page_vma_mapped.c
>> +++ b/mm/page_vma_mapped.c
>> @@ -243,21 +243,31 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>> */
>> pmde = pmdp_get_lockless(pvmw->pmd);
>>
>> - if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) {
>> + if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde) ||
>> + pmd_is_device_private_entry(pmde)) {
>> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>> pmde = *pvmw->pmd;
>> - if (!pmd_present(pmde)) {
>> + if (pmd_is_migration_entry(pmde)) {
>> softleaf_t entry;
>>
>> - if (!thp_migration_supported() ||
>> - !(pvmw->flags & PVMW_MIGRATION))
>> + if (!(pvmw->flags & PVMW_MIGRATION))
>> return not_found(pvmw);
>> entry = softleaf_from_pmd(pmde);
>> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>> + return not_found(pvmw);
>> + return true;
>> + } else if (pmd_is_device_private_entry(pmde)) {
>> + softleaf_t entry;
>>
>> - if (!softleaf_is_migration(entry) ||
>> - !check_pmd(softleaf_to_pfn(entry), pvmw))
>> + if (pvmw->flags & PVMW_MIGRATION)
>> + return not_found(pvmw);
>> + entry = softleaf_from_pmd(pmde);
>> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>> return not_found(pvmw);
>> return true;
>> + } else if (!pmd_present(pmde) ){
>> + return not_found(pvmw);
>> }
>> if (likely(pmd_trans_huge(pmde))) {
>> if (pvmw->flags & PVMW_MIGRATION)
>> @@ -270,12 +280,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>> spin_unlock(pvmw->ptl);
>> pvmw->ptl = NULL;
>> } else if (!pmd_present(pmde)) {
>> - const softleaf_t entry = softleaf_from_pmd(pmde);
>> -
>> - if (softleaf_is_device_private(entry)) {
>> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>> - return true;
>> - }
>>
>> if ((pvmw->flags & PVMW_SYNC) &&
>> thp_vma_suitable_order(vma, pvmw->address,
>> --
>
> Might be good with this on top:
>
> ---8<---
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index cfa1230c87bb..8b7c062bd81d 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -281,7 +281,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> return not_found(pvmw);
> return true;
> }
> - /* THP pmd was split under us: handle on pte level */
> + /* THP/device-private pmd was split under us: handle on pte level */
> spin_unlock(pvmw->ptl);
> pvmw->ptl = NULL;
> } else if (!pmd_present(pmde)) {
> --
>
> Looks good to me as well, thanks!
Makes sense, thanks!
--
Cheers,
David
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2026-06-26 13:51 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-24 6:53 [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling Wei Yang
2026-06-24 8:57 ` Lance Yang
2026-06-25 9:57 ` Wei Yang
2026-06-25 10:37 ` David Hildenbrand (Arm)
2026-06-25 11:25 ` Lance Yang
2026-06-25 11:42 ` Lance Yang
2026-06-25 21:07 ` Andrew Morton
2026-06-25 13:12 ` Lorenzo Stoakes
2026-06-25 11:12 ` Balbir Singh
2026-06-26 0:44 ` Wei Yang
2026-06-26 0:58 ` Andrew Morton
2026-06-25 19:39 ` Zi Yan
2026-06-26 10:07 ` David Hildenbrand (Arm)
2026-06-26 10:42 ` Lorenzo Stoakes
2026-06-26 11:31 ` David Hildenbrand (Arm)
2026-06-26 13:24 ` Zi Yan
2026-06-26 13:32 ` Lorenzo Stoakes
2026-06-26 13:27 ` Lance Yang
2026-06-26 13:51 ` David Hildenbrand (Arm)
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.