* [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
@ 2026-06-24 6:53 Wei Yang
2026-06-24 8:57 ` Lance Yang
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Wei Yang @ 2026-06-24 6:53 UTC (permalink / raw)
To: akpm, david, ljs, riel, liam, vbabka, harry, jannh, ziy, sj,
balbirs
Cc: linux-mm, linux-kernel, Wei Yang, stable, Lance Yang
Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
device-private entries") introduced the concept of device-private
PMD entries, but did not correctly update the rmap walk code to
account for them.
As a result, when page_vma_mapped_walk() encounters device-private
PMD entries, it takes no action other than to acquire the PMD lock
and exit.
However this is highly problematic for two reasons - firstly,
device private entries possess a PFN so check_pmd() needs to be
called to ensure an overlapping PFN range.
Secondly, and more importantly, if PVMW_MIGRATION is set the
caller assumes the returned entry is a migration entry, resulting
in memory corruption when the caller tries to interpret the device
private entry as such.
In addition, commit 146287290023 ("mm/huge_memory: implement
device-private THP splitting") allowed device private PMDs to be
split like THP mappings, but again did not update this code path.
As a result, we might race a PMD split prior to acquiring the PMD
lock.
This patch addresses all of these issues by invoking check_pmd(),
ensuring PMVW_MIGRATION is not set and checks whether a split raced
us we do for PMD THP and migration entries.
Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
Cc: <stable@vger.kernel.org>
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Suggested-by: David Hildenbrand <david@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>
---
v4:
* refine subject and commit log based on Lorenzo's suggestion
* put pmd device-private entry handling in its own if branch,
suggested by Lorenzo
v3:
* remove cleanup part, only fix the issue for device-private entry
* refine user effect description based on Lorenzo's suggestion
v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
* specify the possible error case of current code and user visible effect
* besides fix, cleanup the pmd entry handling based on David's suggestion
v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
---
mm/page_vma_mapped.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 2ccbabfb2cc1..17dff8aab9f9 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
/* THP pmd was split under us: handle on pte level */
spin_unlock(pvmw->ptl);
pvmw->ptl = NULL;
- } else if (!pmd_present(pmde)) {
- const softleaf_t entry = softleaf_from_pmd(pmde);
+ } else if (pmd_is_device_private_entry(pmde)) {
+ softleaf_t entry;
+
+ pvmw->ptl = pmd_lock(mm, pvmw->pmd);
+ pmde = *pvmw->pmd;
+ entry = softleaf_from_pmd(pmde);
- if (softleaf_is_device_private(entry)) {
- pvmw->ptl = pmd_lock(mm, pvmw->pmd);
+ if (likely(softleaf_is_device_private(entry))) {
+ if (pvmw->flags & PVMW_MIGRATION)
+ return not_found(pvmw);
+ if (!check_pmd(softleaf_to_pfn(entry), pvmw))
+ return not_found(pvmw);
return true;
}
-
+ /* device-private pmd was split under us: handle on pte level */
+ spin_unlock(pvmw->ptl);
+ pvmw->ptl = NULL;
+ } else if (!pmd_present(pmde)) {
if ((pvmw->flags & PVMW_SYNC) &&
thp_vma_suitable_order(vma, pvmw->address,
PMD_ORDER) &&
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling 2026-06-24 6:53 [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling Wei Yang @ 2026-06-24 8:57 ` Lance Yang 2026-06-25 9:57 ` Wei Yang ` (2 more replies) 2026-06-25 11:12 ` Balbir Singh 2026-06-25 19:39 ` Zi Yan 2 siblings, 3 replies; 12+ messages in thread From: Lance Yang @ 2026-06-24 8:57 UTC (permalink / raw) To: richard.weiyang Cc: akpm, david, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, balbirs, linux-mm, linux-kernel, stable, lance.yang On Wed, Jun 24, 2026 at 06:53:53AM +0000, Wei Yang wrote: >Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support >device-private entries") introduced the concept of device-private >PMD entries, but did not correctly update the rmap walk code to >account for them. > >As a result, when page_vma_mapped_walk() encounters device-private >PMD entries, it takes no action other than to acquire the PMD lock >and exit. > >However this is highly problematic for two reasons - firstly, >device private entries possess a PFN so check_pmd() needs to be >called to ensure an overlapping PFN range. > >Secondly, and more importantly, if PVMW_MIGRATION is set the >caller assumes the returned entry is a migration entry, resulting >in memory corruption when the caller tries to interpret the device >private entry as such. > >In addition, commit 146287290023 ("mm/huge_memory: implement >device-private THP splitting") allowed device private PMDs to be >split like THP mappings, but again did not update this code path. > >As a result, we might race a PMD split prior to acquiring the PMD >lock. > >This patch addresses all of these issues by invoking check_pmd(), >ensuring PMVW_MIGRATION is not set and checks whether a split raced >us we do for PMD THP and migration entries. > >Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries") >Cc: <stable@vger.kernel.org> >Signed-off-by: Wei Yang <richard.weiyang@gmail.com> >Suggested-by: David Hildenbrand <david@kernel.org> Shouldn't we add Suggested-by: Lorenzo Stoakes <ljs@kernel.org> as well? v4 mostly follows Lorenzo's comments, code bits included. Feels only fair. >Cc: David Hildenbrand <david@kernel.org> >Cc: Balbir Singh <balbirs@nvidia.com> >Cc: SeongJae Park <sj@kernel.org> >Cc: Zi Yan <ziy@nvidia.com> >Cc: Lorenzo Stoakes <ljs@kernel.org> >Cc: Lance Yang <lance.yang@linux.dev> > >--- >v4: > * refine subject and commit log based on Lorenzo's suggestion > * put pmd device-private entry handling in its own if branch, > suggested by Lorenzo > >v3: > * remove cleanup part, only fix the issue for device-private entry > * refine user effect description based on Lorenzo's suggestion > >v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u > * specify the possible error case of current code and user visible effect > * besides fix, cleanup the pmd entry handling based on David's suggestion > >v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/ >--- > mm/page_vma_mapped.c | 20 +++++++++++++++----- > 1 file changed, 15 insertions(+), 5 deletions(-) > >diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >index 2ccbabfb2cc1..17dff8aab9f9 100644 >--- a/mm/page_vma_mapped.c >+++ b/mm/page_vma_mapped.c >@@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) Hmm ... looks like there may still be a race here ... Current code picks the branch from the lockless PMD value: pmde = pmdp_get_lockless(pvmw->pmd); if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); pmde = *pvmw->pmd; if (!pmd_present(pmde)) { softleaf_t entry; if (!thp_migration_supported() || !(pvmw->flags & PVMW_MIGRATION)) return not_found(pvmw); entry = softleaf_from_pmd(pmde); if (!softleaf_is_migration(entry) || !check_pmd(softleaf_to_pfn(entry), pvmw)) return not_found(pvmw); return true; } } But after taking PTL, the PMD may already be a different non-present PMD type: CPU0: pmde = pmdp_get_lockless(); // sees PMD migration entry CPU1: remove_migration_ptes(src, dst /* device-private */) ... via rmap_walk(dst) ... page_vma_mapped_walk(&pvmw /* src, PVMW_MIGRATION */) returns with PTL held for the PMD migration entry remove_migration_pmd(new = dst page) installs a device-private PMD next page_vma_mapped_walk() drops PTL via not_found() CPU0: takes PTL pmde = *pvmw->pmd; // now device-private PMD So when PVMW_MIGRATION is not set, current code can return not_found() before we even decode the locked PMD as a device-private entry. Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries") made the device-private PMD <-> PMD migration transition possible. set_pmd_migration_entry() can replace a device-private PMD with a PMD migration entry, and remove_migration_pmd() can restore a PMD migration entry back to a device-private PMD when the new folio is device-private. Maybe decode the locked softleaf entry first, before the migration-only checks? Something like this on top: ---8<--- diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 17dff8aab9f9..97babd408dba 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -249,10 +249,18 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (!pmd_present(pmde)) { softleaf_t entry; + entry = softleaf_from_pmd(pmde); + if (softleaf_is_device_private(entry)) { + if (pvmw->flags & PVMW_MIGRATION) + return not_found(pvmw); + if (!check_pmd(softleaf_to_pfn(entry), pvmw)) + return not_found(pvmw); + return true; + } + if (!thp_migration_supported() || !(pvmw->flags & PVMW_MIGRATION)) return not_found(pvmw); - entry = softleaf_from_pmd(pmde); if (!softleaf_is_migration(entry) || !check_pmd(softleaf_to_pfn(entry), pvmw)) @@ -266,7 +274,10 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) return not_found(pvmw); return true; } - /* THP pmd was split under us: handle on pte level */ + /* + * THP pmd was split under us, or device-private PMD + * changed under us: handle on pte level. + */ spin_unlock(pvmw->ptl); pvmw->ptl = NULL; } else if (pmd_is_device_private_entry(pmde)) { -- Anyway, that stuff is getting kinda messy now. Feels like it really needs a cleanup on top before it bites us again :) Cheers, Lance > /* THP pmd was split under us: handle on pte level */ > spin_unlock(pvmw->ptl); > pvmw->ptl = NULL; >- } else if (!pmd_present(pmde)) { >- const softleaf_t entry = softleaf_from_pmd(pmde); >+ } else if (pmd_is_device_private_entry(pmde)) { >+ softleaf_t entry; >+ >+ pvmw->ptl = pmd_lock(mm, pvmw->pmd); >+ pmde = *pvmw->pmd; >+ entry = softleaf_from_pmd(pmde); > >- if (softleaf_is_device_private(entry)) { >- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >+ if (likely(softleaf_is_device_private(entry))) { >+ if (pvmw->flags & PVMW_MIGRATION) >+ return not_found(pvmw); >+ if (!check_pmd(softleaf_to_pfn(entry), pvmw)) >+ return not_found(pvmw); > return true; > } >- >+ /* device-private pmd was split under us: handle on pte level */ >+ spin_unlock(pvmw->ptl); >+ pvmw->ptl = NULL; >+ } else if (!pmd_present(pmde)) { > if ((pvmw->flags & PVMW_SYNC) && > thp_vma_suitable_order(vma, pvmw->address, > PMD_ORDER) && >-- >2.34.1 > > ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling 2026-06-24 8:57 ` Lance Yang @ 2026-06-25 9:57 ` Wei Yang 2026-06-25 10:37 ` David Hildenbrand (Arm) 2026-06-25 11:42 ` Lance Yang 2026-06-25 13:12 ` Lorenzo Stoakes 2 siblings, 1 reply; 12+ messages in thread From: Wei Yang @ 2026-06-25 9:57 UTC (permalink / raw) To: Lance Yang Cc: richard.weiyang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, balbirs, linux-mm, linux-kernel, stable On Wed, Jun 24, 2026 at 04:57:56PM +0800, Lance Yang wrote: > >On Wed, Jun 24, 2026 at 06:53:53AM +0000, Wei Yang wrote: >>Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support >>device-private entries") introduced the concept of device-private >>PMD entries, but did not correctly update the rmap walk code to >>account for them. >> >>As a result, when page_vma_mapped_walk() encounters device-private >>PMD entries, it takes no action other than to acquire the PMD lock >>and exit. >> >>However this is highly problematic for two reasons - firstly, >>device private entries possess a PFN so check_pmd() needs to be >>called to ensure an overlapping PFN range. >> >>Secondly, and more importantly, if PVMW_MIGRATION is set the >>caller assumes the returned entry is a migration entry, resulting >>in memory corruption when the caller tries to interpret the device >>private entry as such. >> >>In addition, commit 146287290023 ("mm/huge_memory: implement >>device-private THP splitting") allowed device private PMDs to be >>split like THP mappings, but again did not update this code path. >> >>As a result, we might race a PMD split prior to acquiring the PMD >>lock. >> >>This patch addresses all of these issues by invoking check_pmd(), >>ensuring PMVW_MIGRATION is not set and checks whether a split raced >>us we do for PMD THP and migration entries. >> >>Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries") >>Cc: <stable@vger.kernel.org> >>Signed-off-by: Wei Yang <richard.weiyang@gmail.com> >>Suggested-by: David Hildenbrand <david@kernel.org> > >Shouldn't we add > >Suggested-by: Lorenzo Stoakes <ljs@kernel.org> > >as well? > >v4 mostly follows Lorenzo's comments, code bits included. Feels only fair. Fair enough, added. > >>Cc: David Hildenbrand <david@kernel.org> >>Cc: Balbir Singh <balbirs@nvidia.com> >>Cc: SeongJae Park <sj@kernel.org> >>Cc: Zi Yan <ziy@nvidia.com> >>Cc: Lorenzo Stoakes <ljs@kernel.org> >>Cc: Lance Yang <lance.yang@linux.dev> >> >>--- >>v4: >> * refine subject and commit log based on Lorenzo's suggestion >> * put pmd device-private entry handling in its own if branch, >> suggested by Lorenzo >> >>v3: >> * remove cleanup part, only fix the issue for device-private entry >> * refine user effect description based on Lorenzo's suggestion >> >>v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u >> * specify the possible error case of current code and user visible effect >> * besides fix, cleanup the pmd entry handling based on David's suggestion >> >>v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/ >>--- >> mm/page_vma_mapped.c | 20 +++++++++++++++----- >> 1 file changed, 15 insertions(+), 5 deletions(-) >> >>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >>index 2ccbabfb2cc1..17dff8aab9f9 100644 >>--- a/mm/page_vma_mapped.c >>+++ b/mm/page_vma_mapped.c >>@@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > > >Hmm ... looks like there may still be a race here ... > >Current code picks the branch from the lockless PMD value: > > pmde = pmdp_get_lockless(pvmw->pmd); > > if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { > pvmw->ptl = pmd_lock(mm, pvmw->pmd); > pmde = *pvmw->pmd; > if (!pmd_present(pmde)) { > softleaf_t entry; > > if (!thp_migration_supported() || > !(pvmw->flags & PVMW_MIGRATION)) > return not_found(pvmw); > entry = softleaf_from_pmd(pmde); > > if (!softleaf_is_migration(entry) || > !check_pmd(softleaf_to_pfn(entry), pvmw)) > return not_found(pvmw); > return true; > } > } > >But after taking PTL, the PMD may already be a different non-present PMD >type: > >CPU0: pmde = pmdp_get_lockless(); // sees PMD migration entry > >CPU1: remove_migration_ptes(src, dst /* device-private */) > ... via rmap_walk(dst) ... > page_vma_mapped_walk(&pvmw /* src, PVMW_MIGRATION */) > returns with PTL held for the PMD migration entry > remove_migration_pmd(new = dst page) > installs a device-private PMD > next page_vma_mapped_walk() > drops PTL via not_found() > >CPU0: takes PTL > pmde = *pvmw->pmd; // now device-private PMD > >So when PVMW_MIGRATION is not set, current code can return not_found() >before we even decode the locked PMD as a device-private entry. > >Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support >device-private entries") made the > >device-private PMD <-> PMD migration > >transition possible. > >set_pmd_migration_entry() can replace a device-private PMD with a PMD >migration entry, and remove_migration_pmd() can restore a PMD migration >entry back to a device-private PMD when the new folio is device-private. > Nice catch. But I think this matters if migration fail and restore the pmd to src folio. When we successfully migrate to new folio, check_pmd() could catch it and return not_found(). IIUC. One more question: assume A unmap a folio, and B migrate the same one. If B set_pmd_migration_entry() first, then A won't see this PMD from page_vma_mapped_walk(), IIUC. Then B failed to migrate, and restore the folio as this PMD migration entry is there. So A should check the status after unmap, right? Would it see unstable status? I am a little lost what is the correct way to do here. >Maybe decode the locked softleaf entry first, before the migration-only >checks? Something like this on top: > >---8<--- >diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >index 17dff8aab9f9..97babd408dba 100644 >--- a/mm/page_vma_mapped.c >+++ b/mm/page_vma_mapped.c >@@ -249,10 +249,18 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > if (!pmd_present(pmde)) { > softleaf_t entry; > >+ entry = softleaf_from_pmd(pmde); >+ if (softleaf_is_device_private(entry)) { >+ if (pvmw->flags & PVMW_MIGRATION) >+ return not_found(pvmw); >+ if (!check_pmd(softleaf_to_pfn(entry), pvmw)) >+ return not_found(pvmw); >+ return true; >+ } >+ If we have to do this, I am afraid we can put all three cases handling here... Not necessary to put pmd_is_device_private_entry() handling in two places. > if (!thp_migration_supported() || > !(pvmw->flags & PVMW_MIGRATION)) > return not_found(pvmw); >- entry = softleaf_from_pmd(pmde); > > if (!softleaf_is_migration(entry) || > !check_pmd(softleaf_to_pfn(entry), pvmw)) >@@ -266,7 +274,10 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > return not_found(pvmw); > return true; > } >- /* THP pmd was split under us: handle on pte level */ >+ /* >+ * THP pmd was split under us, or device-private PMD >+ * changed under us: handle on pte level. >+ */ > spin_unlock(pvmw->ptl); > pvmw->ptl = NULL; > } else if (pmd_is_device_private_entry(pmde)) { >-- > >Anyway, that stuff is getting kinda messy now. Feels like it really needs >a cleanup on top before it bites us again :) Agree. I haven't imagined this would be more complicated than I thought :-) >Cheers, Lance -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling 2026-06-25 9:57 ` Wei Yang @ 2026-06-25 10:37 ` David Hildenbrand (Arm) 2026-06-25 11:25 ` Lance Yang 0 siblings, 1 reply; 12+ messages in thread From: David Hildenbrand (Arm) @ 2026-06-25 10:37 UTC (permalink / raw) To: Wei Yang, Lance Yang Cc: akpm, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, balbirs, linux-mm, linux-kernel, stable >> CPU0: pmde = pmdp_get_lockless(); // sees PMD migration entry >> >> CPU1: remove_migration_ptes(src, dst /* device-private */) >> ... via rmap_walk(dst) ... >> page_vma_mapped_walk(&pvmw /* src, PVMW_MIGRATION */) >> returns with PTL held for the PMD migration entry >> remove_migration_pmd(new = dst page) >> installs a device-private PMD >> next page_vma_mapped_walk() >> drops PTL via not_found() >> >> CPU0: takes PTL >> pmde = *pvmw->pmd; // now device-private PMD >> >> So when PVMW_MIGRATION is not set, current code can return not_found() >> before we even decode the locked PMD as a device-private entry. >> >> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support >> device-private entries") made the >> >> device-private PMD <-> PMD migration >> >> transition possible. Doesn't the folio lock help here already? -- Cheers, David ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling 2026-06-25 10:37 ` David Hildenbrand (Arm) @ 2026-06-25 11:25 ` Lance Yang 0 siblings, 0 replies; 12+ messages in thread From: Lance Yang @ 2026-06-25 11:25 UTC (permalink / raw) To: David Hildenbrand (Arm), Wei Yang, balbirs Cc: akpm, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, linux-mm, linux-kernel, stable On 2026/6/25 18:37, David Hildenbrand (Arm) wrote: > >>> CPU0: pmde = pmdp_get_lockless(); // sees PMD migration entry >>> >>> CPU1: remove_migration_ptes(src, dst /* device-private */) >>> ... via rmap_walk(dst) ... >>> page_vma_mapped_walk(&pvmw /* src, PVMW_MIGRATION */) >>> returns with PTL held for the PMD migration entry >>> remove_migration_pmd(new = dst page) >>> installs a device-private PMD >>> next page_vma_mapped_walk() >>> drops PTL via not_found() >>> >>> CPU0: takes PTL >>> pmde = *pvmw->pmd; // now device-private PMD >>> >>> So when PVMW_MIGRATION is not set, current code can return not_found() >>> before we even decode the locked PMD as a device-private entry. >>> >>> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support >>> device-private entries") made the >>> >>> device-private PMD <-> PMD migration >>> >>> transition possible. > > Doesn't the folio lock help here already? Ah, yeah, I was too focused on the PTL and missed the folio lock ... Don't have a caller like that :) Went over the fix again, nothing else jumped out. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling 2026-06-24 8:57 ` Lance Yang 2026-06-25 9:57 ` Wei Yang @ 2026-06-25 11:42 ` Lance Yang 2026-06-25 21:07 ` Andrew Morton 2026-06-25 13:12 ` Lorenzo Stoakes 2 siblings, 1 reply; 12+ messages in thread From: Lance Yang @ 2026-06-25 11:42 UTC (permalink / raw) To: richard.weiyang, david, balbirs Cc: akpm, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, linux-mm, linux-kernel, stable, Lance Yang On Wed, Jun 24, 2026 at 04:57:56PM +0800, Lance Yang wrote: > [...] >> >>Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries") >>Cc: <stable@vger.kernel.org> >>Signed-off-by: Wei Yang <richard.weiyang@gmail.com> >>Suggested-by: David Hildenbrand <david@kernel.org> > >Shouldn't we add > >Suggested-by: Lorenzo Stoakes <ljs@kernel.org> > >as well? No need to resend. I think Andrew can add this when applying :) >v4 mostly follows Lorenzo's comments, code bits included. Feels only fair. > >>Cc: David Hildenbrand <david@kernel.org> >>Cc: Balbir Singh <balbirs@nvidia.com> >>Cc: SeongJae Park <sj@kernel.org> >>Cc: Zi Yan <ziy@nvidia.com> >>Cc: Lorenzo Stoakes <ljs@kernel.org> >>Cc: Lance Yang <lance.yang@linux.dev> >> >>--- >>v4: >> * refine subject and commit log based on Lorenzo's suggestion >> * put pmd device-private entry handling in its own if branch, >> suggested by Lorenzo >> >>v3: >> * remove cleanup part, only fix the issue for device-private entry >> * refine user effect description based on Lorenzo's suggestion >> >>v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u >> * specify the possible error case of current code and user visible effect >> * besides fix, cleanup the pmd entry handling based on David's suggestion >> >>v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/ >>--- >> mm/page_vma_mapped.c | 20 +++++++++++++++----- >> 1 file changed, 15 insertions(+), 5 deletions(-) >> >>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >>index 2ccbabfb2cc1..17dff8aab9f9 100644 >>--- a/mm/page_vma_mapped.c >>+++ b/mm/page_vma_mapped.c >>@@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > Never mind my race comment below. Obviously missed folio lock there. My bad. Don't have a caller like that. Nothing else jumped out, so: Reviewed-by: Lance Yang <lance.yang@linux.dev> Cheers, Lance > >Hmm ... looks like there may still be a race here ... > >Current code picks the branch from the lockless PMD value: > > pmde = pmdp_get_lockless(pvmw->pmd); > > if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { > pvmw->ptl = pmd_lock(mm, pvmw->pmd); > pmde = *pvmw->pmd; > if (!pmd_present(pmde)) { > softleaf_t entry; > > if (!thp_migration_supported() || > !(pvmw->flags & PVMW_MIGRATION)) > return not_found(pvmw); > entry = softleaf_from_pmd(pmde); > > if (!softleaf_is_migration(entry) || > !check_pmd(softleaf_to_pfn(entry), pvmw)) > return not_found(pvmw); > return true; > } > } > >But after taking PTL, the PMD may already be a different non-present PMD >type: > >CPU0: pmde = pmdp_get_lockless(); // sees PMD migration entry > >CPU1: remove_migration_ptes(src, dst /* device-private */) > ... via rmap_walk(dst) ... > page_vma_mapped_walk(&pvmw /* src, PVMW_MIGRATION */) > returns with PTL held for the PMD migration entry > remove_migration_pmd(new = dst page) > installs a device-private PMD > next page_vma_mapped_walk() > drops PTL via not_found() > >CPU0: takes PTL > pmde = *pvmw->pmd; // now device-private PMD > >So when PVMW_MIGRATION is not set, current code can return not_found() >before we even decode the locked PMD as a device-private entry. > >Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support >device-private entries") made the > >device-private PMD <-> PMD migration > >transition possible. > >set_pmd_migration_entry() can replace a device-private PMD with a PMD >migration entry, and remove_migration_pmd() can restore a PMD migration >entry back to a device-private PMD when the new folio is device-private. > >Maybe decode the locked softleaf entry first, before the migration-only >checks? Something like this on top: > >---8<--- >diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >index 17dff8aab9f9..97babd408dba 100644 >--- a/mm/page_vma_mapped.c >+++ b/mm/page_vma_mapped.c >@@ -249,10 +249,18 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > if (!pmd_present(pmde)) { > softleaf_t entry; > >+ entry = softleaf_from_pmd(pmde); >+ if (softleaf_is_device_private(entry)) { >+ if (pvmw->flags & PVMW_MIGRATION) >+ return not_found(pvmw); >+ if (!check_pmd(softleaf_to_pfn(entry), pvmw)) >+ return not_found(pvmw); >+ return true; >+ } >+ > if (!thp_migration_supported() || > !(pvmw->flags & PVMW_MIGRATION)) > return not_found(pvmw); >- entry = softleaf_from_pmd(pmde); > > if (!softleaf_is_migration(entry) || > !check_pmd(softleaf_to_pfn(entry), pvmw)) >@@ -266,7 +274,10 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > return not_found(pvmw); > return true; > } >- /* THP pmd was split under us: handle on pte level */ >+ /* >+ * THP pmd was split under us, or device-private PMD >+ * changed under us: handle on pte level. >+ */ > spin_unlock(pvmw->ptl); > pvmw->ptl = NULL; > } else if (pmd_is_device_private_entry(pmde)) { >-- > >Anyway, that stuff is getting kinda messy now. Feels like it really needs >a cleanup on top before it bites us again :) > >Cheers, Lance > >> /* THP pmd was split under us: handle on pte level */ >> spin_unlock(pvmw->ptl); >> pvmw->ptl = NULL; >>- } else if (!pmd_present(pmde)) { >>- const softleaf_t entry = softleaf_from_pmd(pmde); >>+ } else if (pmd_is_device_private_entry(pmde)) { >>+ softleaf_t entry; >>+ >>+ pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>+ pmde = *pvmw->pmd; >>+ entry = softleaf_from_pmd(pmde); >> >>- if (softleaf_is_device_private(entry)) { >>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>+ if (likely(softleaf_is_device_private(entry))) { >>+ if (pvmw->flags & PVMW_MIGRATION) >>+ return not_found(pvmw); >>+ if (!check_pmd(softleaf_to_pfn(entry), pvmw)) >>+ return not_found(pvmw); >> return true; >> } >>- >>+ /* device-private pmd was split under us: handle on pte level */ >>+ spin_unlock(pvmw->ptl); >>+ pvmw->ptl = NULL; >>+ } else if (!pmd_present(pmde)) { >> if ((pvmw->flags & PVMW_SYNC) && >> thp_vma_suitable_order(vma, pvmw->address, >> PMD_ORDER) && >>-- >>2.34.1 >> >> > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling 2026-06-25 11:42 ` Lance Yang @ 2026-06-25 21:07 ` Andrew Morton 0 siblings, 0 replies; 12+ messages in thread From: Andrew Morton @ 2026-06-25 21:07 UTC (permalink / raw) To: Lance Yang Cc: richard.weiyang, david, balbirs, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, linux-mm, linux-kernel, stable On Thu, 25 Jun 2026 19:42:35 +0800 Lance Yang <lance.yang@linux.dev> wrote: > >Suggested-by: Lorenzo Stoakes <ljs@kernel.org> > > > >as well? > > No need to resend. I think Andrew can add this when applying :) I managed. Thanks, I'll queue this as a hotfix. Somewhat tentatively - it's unclear whether we'll be seeing a v2? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling 2026-06-24 8:57 ` Lance Yang 2026-06-25 9:57 ` Wei Yang 2026-06-25 11:42 ` Lance Yang @ 2026-06-25 13:12 ` Lorenzo Stoakes 2 siblings, 0 replies; 12+ messages in thread From: Lorenzo Stoakes @ 2026-06-25 13:12 UTC (permalink / raw) To: Lance Yang Cc: richard.weiyang, akpm, david, riel, liam, vbabka, harry, jannh, ziy, sj, balbirs, linux-mm, linux-kernel, stable On Wed, Jun 24, 2026 at 04:57:56PM +0800, Lance Yang wrote: > > On Wed, Jun 24, 2026 at 06:53:53AM +0000, Wei Yang wrote: > >Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support > >device-private entries") introduced the concept of device-private > >PMD entries, but did not correctly update the rmap walk code to > >account for them. > > > >As a result, when page_vma_mapped_walk() encounters device-private > >PMD entries, it takes no action other than to acquire the PMD lock > >and exit. > > > >However this is highly problematic for two reasons - firstly, > >device private entries possess a PFN so check_pmd() needs to be > >called to ensure an overlapping PFN range. > > > >Secondly, and more importantly, if PVMW_MIGRATION is set the > >caller assumes the returned entry is a migration entry, resulting > >in memory corruption when the caller tries to interpret the device > >private entry as such. > > > >In addition, commit 146287290023 ("mm/huge_memory: implement > >device-private THP splitting") allowed device private PMDs to be > >split like THP mappings, but again did not update this code path. > > > >As a result, we might race a PMD split prior to acquiring the PMD > >lock. > > > >This patch addresses all of these issues by invoking check_pmd(), > >ensuring PMVW_MIGRATION is not set and checks whether a split raced > >us we do for PMD THP and migration entries. > > > >Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries") > >Cc: <stable@vger.kernel.org> > >Signed-off-by: Wei Yang <richard.weiyang@gmail.com> > >Suggested-by: David Hildenbrand <david@kernel.org> > > Shouldn't we add > > Suggested-by: Lorenzo Stoakes <ljs@kernel.org> > > as well? > > v4 mostly follows Lorenzo's comments, code bits included. Feels only fair. Thanks Lance :) I'm kinda indifferent about it really, I'm really keen to ensure people sending patches get the credit for their work, so if I send a patch in reply as a shorthand for 'I think this might work better', I don't expect/require any credit at all, it's just sometimes a quicker way of responding! But if Wei wants to add a S-b that's fine by me also! :) Cheers, Lorenzo ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling 2026-06-24 6:53 [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling Wei Yang 2026-06-24 8:57 ` Lance Yang @ 2026-06-25 11:12 ` Balbir Singh 2026-06-26 0:44 ` Wei Yang 2026-06-25 19:39 ` Zi Yan 2 siblings, 1 reply; 12+ messages in thread From: Balbir Singh @ 2026-06-25 11:12 UTC (permalink / raw) To: Wei Yang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, ziy, sj Cc: linux-mm, linux-kernel, stable, Lance Yang On 6/24/26 16:53, Wei Yang wrote: > Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support > device-private entries") introduced the concept of device-private > PMD entries, but did not correctly update the rmap walk code to > account for them. > > As a result, when page_vma_mapped_walk() encounters device-private > PMD entries, it takes no action other than to acquire the PMD lock > and exit. > > However this is highly problematic for two reasons - firstly, > device private entries possess a PFN so check_pmd() needs to be > called to ensure an overlapping PFN range. > > Secondly, and more importantly, if PVMW_MIGRATION is set the > caller assumes the returned entry is a migration entry, resulting > in memory corruption when the caller tries to interpret the device > private entry as such. > > In addition, commit 146287290023 ("mm/huge_memory: implement > device-private THP splitting") allowed device private PMDs to be > split like THP mappings, but again did not update this code path. > > As a result, we might race a PMD split prior to acquiring the PMD > lock. > > This patch addresses all of these issues by invoking check_pmd(), > ensuring PMVW_MIGRATION is not set and checks whether a split raced > us we do for PMD THP and migration entries. Should be PVMW_MIGRATION and "us we do" -> "as we do" > > Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries") > Cc: <stable@vger.kernel.org> > Signed-off-by: Wei Yang <richard.weiyang@gmail.com> > Suggested-by: David Hildenbrand <david@kernel.org> > Cc: David Hildenbrand <david@kernel.org> > Cc: Balbir Singh <balbirs@nvidia.com> > Cc: SeongJae Park <sj@kernel.org> > Cc: Zi Yan <ziy@nvidia.com> > Cc: Lorenzo Stoakes <ljs@kernel.org> > Cc: Lance Yang <lance.yang@linux.dev> > > --- > v4: > * refine subject and commit log based on Lorenzo's suggestion > * put pmd device-private entry handling in its own if branch, > suggested by Lorenzo > > v3: > * remove cleanup part, only fix the issue for device-private entry > * refine user effect description based on Lorenzo's suggestion > > v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u > * specify the possible error case of current code and user visible effect > * besides fix, cleanup the pmd entry handling based on David's suggestion > > v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/ > --- > mm/page_vma_mapped.c | 20 +++++++++++++++----- > 1 file changed, 15 insertions(+), 5 deletions(-) > > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c > index 2ccbabfb2cc1..17dff8aab9f9 100644 > --- a/mm/page_vma_mapped.c > +++ b/mm/page_vma_mapped.c > @@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > /* THP pmd was split under us: handle on pte level */ > spin_unlock(pvmw->ptl); > pvmw->ptl = NULL; > - } else if (!pmd_present(pmde)) { > - const softleaf_t entry = softleaf_from_pmd(pmde); > + } else if (pmd_is_device_private_entry(pmde)) { > + softleaf_t entry; > + > + pvmw->ptl = pmd_lock(mm, pvmw->pmd); > + pmde = *pvmw->pmd; > + entry = softleaf_from_pmd(pmde); > > - if (softleaf_is_device_private(entry)) { > - pvmw->ptl = pmd_lock(mm, pvmw->pmd); > + if (likely(softleaf_is_device_private(entry))) { > + if (pvmw->flags & PVMW_MIGRATION) > + return not_found(pvmw); > + if (!check_pmd(softleaf_to_pfn(entry), pvmw)) > + return not_found(pvmw); > return true; > } > - > + /* device-private pmd was split under us: handle on pte level */ > + spin_unlock(pvmw->ptl); > + pvmw->ptl = NULL; > + } else if (!pmd_present(pmde)) { > if ((pvmw->flags & PVMW_SYNC) && > thp_vma_suitable_order(vma, pvmw->address, > PMD_ORDER) && I looked at comments from Lance on "device-private PMD <-> PMD migration" and had the same comment as David Balbir ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling 2026-06-25 11:12 ` Balbir Singh @ 2026-06-26 0:44 ` Wei Yang 2026-06-26 0:58 ` Andrew Morton 0 siblings, 1 reply; 12+ messages in thread From: Wei Yang @ 2026-06-26 0:44 UTC (permalink / raw) To: Balbir Singh Cc: Wei Yang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, linux-mm, linux-kernel, stable, Lance Yang On Thu, Jun 25, 2026 at 09:12:23PM +1000, Balbir Singh wrote: >On 6/24/26 16:53, Wei Yang wrote: >> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support >> device-private entries") introduced the concept of device-private >> PMD entries, but did not correctly update the rmap walk code to >> account for them. >> >> As a result, when page_vma_mapped_walk() encounters device-private >> PMD entries, it takes no action other than to acquire the PMD lock >> and exit. >> >> However this is highly problematic for two reasons - firstly, >> device private entries possess a PFN so check_pmd() needs to be >> called to ensure an overlapping PFN range. >> >> Secondly, and more importantly, if PVMW_MIGRATION is set the >> caller assumes the returned entry is a migration entry, resulting >> in memory corruption when the caller tries to interpret the device >> private entry as such. >> >> In addition, commit 146287290023 ("mm/huge_memory: implement >> device-private THP splitting") allowed device private PMDs to be >> split like THP mappings, but again did not update this code path. >> >> As a result, we might race a PMD split prior to acquiring the PMD >> lock. >> >> This patch addresses all of these issues by invoking check_pmd(), >> ensuring PMVW_MIGRATION is not set and checks whether a split raced >> us we do for PMD THP and migration entries. > >Should be PVMW_MIGRATION and "us we do" -> "as we do" > Hi, Balbir Sorry for missing your comment. Hmm... looks you are right. Andrew, Would you mind handling it or prefer a v2? -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling 2026-06-26 0:44 ` Wei Yang @ 2026-06-26 0:58 ` Andrew Morton 0 siblings, 0 replies; 12+ messages in thread From: Andrew Morton @ 2026-06-26 0:58 UTC (permalink / raw) To: Wei Yang Cc: Balbir Singh, david, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, linux-mm, linux-kernel, stable, Lance Yang On Fri, 26 Jun 2026 00:44:16 +0000 Wei Yang <richard.weiyang@gmail.com> wrote: > >Should be PVMW_MIGRATION and "us we do" -> "as we do" > > > > Hi, Balbir > > Sorry for missing your comment. > > Hmm... looks you are right. > > Andrew, > > Would you mind handling it or prefer a v2? I have made those edits. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling 2026-06-24 6:53 [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling Wei Yang 2026-06-24 8:57 ` Lance Yang 2026-06-25 11:12 ` Balbir Singh @ 2026-06-25 19:39 ` Zi Yan 2 siblings, 0 replies; 12+ messages in thread From: Zi Yan @ 2026-06-25 19:39 UTC (permalink / raw) To: Wei Yang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, sj, balbirs Cc: linux-mm, linux-kernel, stable, Lance Yang On Wed Jun 24, 2026 at 2:53 AM EDT, Wei Yang wrote: > Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support > device-private entries") introduced the concept of device-private > PMD entries, but did not correctly update the rmap walk code to > account for them. > > As a result, when page_vma_mapped_walk() encounters device-private > PMD entries, it takes no action other than to acquire the PMD lock > and exit. > > However this is highly problematic for two reasons - firstly, > device private entries possess a PFN so check_pmd() needs to be > called to ensure an overlapping PFN range. > > Secondly, and more importantly, if PVMW_MIGRATION is set the > caller assumes the returned entry is a migration entry, resulting > in memory corruption when the caller tries to interpret the device > private entry as such. > > In addition, commit 146287290023 ("mm/huge_memory: implement > device-private THP splitting") allowed device private PMDs to be > split like THP mappings, but again did not update this code path. > > As a result, we might race a PMD split prior to acquiring the PMD > lock. > > This patch addresses all of these issues by invoking check_pmd(), > ensuring PMVW_MIGRATION is not set and checks whether a split raced > us we do for PMD THP and migration entries. > > Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries") > Cc: <stable@vger.kernel.org> > Signed-off-by: Wei Yang <richard.weiyang@gmail.com> > Suggested-by: David Hildenbrand <david@kernel.org> > Cc: David Hildenbrand <david@kernel.org> > Cc: Balbir Singh <balbirs@nvidia.com> > Cc: SeongJae Park <sj@kernel.org> > Cc: Zi Yan <ziy@nvidia.com> > Cc: Lorenzo Stoakes <ljs@kernel.org> > Cc: Lance Yang <lance.yang@linux.dev> > > --- > v4: > * refine subject and commit log based on Lorenzo's suggestion > * put pmd device-private entry handling in its own if branch, > suggested by Lorenzo > > v3: > * remove cleanup part, only fix the issue for device-private entry > * refine user effect description based on Lorenzo's suggestion > > v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u > * specify the possible error case of current code and user visible effect > * besides fix, cleanup the pmd entry handling based on David's suggestion > > v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/ > --- > mm/page_vma_mapped.c | 20 +++++++++++++++----- > 1 file changed, 15 insertions(+), 5 deletions(-) > LGTM. The discussion from the patch history is very valuable. Thanks. Acked-by: Zi Yan <ziy@nvidia.com> -- Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-06-26 0:58 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-24 6:53 [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling Wei Yang 2026-06-24 8:57 ` Lance Yang 2026-06-25 9:57 ` Wei Yang 2026-06-25 10:37 ` David Hildenbrand (Arm) 2026-06-25 11:25 ` Lance Yang 2026-06-25 11:42 ` Lance Yang 2026-06-25 21:07 ` Andrew Morton 2026-06-25 13:12 ` Lorenzo Stoakes 2026-06-25 11:12 ` Balbir Singh 2026-06-26 0:44 ` Wei Yang 2026-06-26 0:58 ` Andrew Morton 2026-06-25 19:39 ` Zi Yan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox