* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd [not found] <20260616063436.20455-1-richard.weiyang@gmail.com> @ 2026-06-16 12:30 ` Lance Yang 2026-06-16 23:50 ` Wei Yang [not found] ` <666dc40b-e37a-46eb-af55-7a81bc1643f1@kernel.org> 2026-06-19 10:44 ` Lorenzo Stoakes 1 sibling, 2 replies; 16+ messages in thread From: Lance Yang @ 2026-06-16 12:30 UTC (permalink / raw) To: richard.weiyang Cc: akpm, david, ljs, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, lorenzo.stoakes, stable, Lance Yang On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: [...] >diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >index 2ccbabfb2cc1..21635fab209c 100644 >--- a/mm/page_vma_mapped.c >+++ b/mm/page_vma_mapped.c >@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > */ > pmde = pmdp_get_lockless(pvmw->pmd); > >- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >- pmde = *pvmw->pmd; >- if (!pmd_present(pmde)) { >- softleaf_t entry; >- >- if (!thp_migration_supported() || >- !(pvmw->flags & PVMW_MIGRATION)) >- return not_found(pvmw); >- entry = softleaf_from_pmd(pmde); >- >- if (!softleaf_is_migration(entry) || >- !check_pmd(softleaf_to_pfn(entry), pvmw)) >- return not_found(pvmw); >- return true; >- } >- if (likely(pmd_trans_huge(pmde))) { >- if (pvmw->flags & PVMW_MIGRATION) >- return not_found(pvmw); >- if (!check_pmd(pmd_pfn(pmde), pvmw)) >- return not_found(pvmw); >- return true; >- } >- /* THP pmd was split under us: handle on pte level */ >- spin_unlock(pvmw->ptl); >- pvmw->ptl = NULL; >- } else if (!pmd_present(pmde)) { >- const softleaf_t entry = softleaf_from_pmd(pmde); >- >- if (softleaf_is_device_private(entry)) { >- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >- return true; >- } >+ if (pmd_present(pmde)) { >+ if (!pmd_leaf(pmde)) >+ goto pte_table; >+ if (pvmw->flags & PVMW_MIGRATION) >+ return not_found(pvmw); >+ if (!check_pmd(pmd_pfn(pmde), pvmw)) >+ return not_found(pvmw); >+ } else if (pmd_is_migration_entry(pmde)) { >+ softleaf_t entry = softleaf_from_pmd(pmde); >+ >+ if (!(pvmw->flags & PVMW_MIGRATION)) >+ return not_found(pvmw); Looked at history a bit, and I wonder if this changed something old here ... Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD migration handling took PTL before doing PVMW_MIGRATION/PFN checks, including not_found() cases. So lockless PMD read was just a filter ... With this fix, true case gets final pmd_same() check, but this not_found() case happens before taking PTL. So a !PVMW_MIGRATION walker could race with someone, e.g. remove_migration_pmd(): we make the not_found() decision from old PMD value that still says "migration", while real *pvmw->pmd may already be present again. We return without ever taking PTL :) Not sure about practical fallout, but should these PMD-level not_found() cases also take PTL and restart if PMD changed? >+ if (!check_pmd(softleaf_to_pfn(entry), pvmw)) >+ return not_found(pvmw); >+ } else if (pmd_is_device_private_entry(pmde)) { >+ softleaf_t entry = softleaf_from_pmd(pmde); > >+ if (pvmw->flags & PVMW_MIGRATION) >+ return not_found(pvmw); >+ if (!check_pmd(softleaf_to_pfn(entry), pvmw)) >+ return not_found(pvmw); >+ } else { > if ((pvmw->flags & PVMW_SYNC) && > thp_vma_suitable_order(vma, pvmw->address, > PMD_ORDER) && >@@ -286,6 +274,15 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > step_forward(pvmw, PMD_SIZE); > continue; > } >+ >+ /* Double-check under PTL that the PMD didn't change. */ >+ pvmw->ptl = pmd_lock(mm, pvmw->pmd); >+ if (pmd_same(pmde, pmdp_get(pvmw->pmd))) >+ return true; >+ spin_unlock(pvmw->ptl); >+ pvmw->ptl = NULL; >+ goto restart; >+pte_table: > if (!map_pte(pvmw, &pmde, &ptl)) { > if (!pvmw->pte) > goto restart; >-- >2.34.1 > Cheers, Lance ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-16 12:30 ` [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd Lance Yang @ 2026-06-16 23:50 ` Wei Yang 2026-06-17 2:32 ` Lance Yang [not found] ` <666dc40b-e37a-46eb-af55-7a81bc1643f1@kernel.org> 1 sibling, 1 reply; 16+ messages in thread From: Wei Yang @ 2026-06-16 23:50 UTC (permalink / raw) To: Lance Yang Cc: richard.weiyang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, lorenzo.stoakes, stable On Tue, Jun 16, 2026 at 08:30:01PM +0800, Lance Yang wrote: > >On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >[...] >>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >>index 2ccbabfb2cc1..21635fab209c 100644 >>--- a/mm/page_vma_mapped.c >>+++ b/mm/page_vma_mapped.c >>@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) >> */ >> pmde = pmdp_get_lockless(pvmw->pmd); >> >>- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>- pmde = *pvmw->pmd; >>- if (!pmd_present(pmde)) { >>- softleaf_t entry; >>- >>- if (!thp_migration_supported() || >>- !(pvmw->flags & PVMW_MIGRATION)) >>- return not_found(pvmw); >>- entry = softleaf_from_pmd(pmde); >>- >>- if (!softleaf_is_migration(entry) || >>- !check_pmd(softleaf_to_pfn(entry), pvmw)) >>- return not_found(pvmw); >>- return true; >>- } >>- if (likely(pmd_trans_huge(pmde))) { >>- if (pvmw->flags & PVMW_MIGRATION) >>- return not_found(pvmw); >>- if (!check_pmd(pmd_pfn(pmde), pvmw)) >>- return not_found(pvmw); >>- return true; >>- } >>- /* THP pmd was split under us: handle on pte level */ >>- spin_unlock(pvmw->ptl); >>- pvmw->ptl = NULL; >>- } else if (!pmd_present(pmde)) { >>- const softleaf_t entry = softleaf_from_pmd(pmde); >>- >>- if (softleaf_is_device_private(entry)) { >>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>- return true; >>- } >>+ if (pmd_present(pmde)) { >>+ if (!pmd_leaf(pmde)) >>+ goto pte_table; >>+ if (pvmw->flags & PVMW_MIGRATION) >>+ return not_found(pvmw); >>+ if (!check_pmd(pmd_pfn(pmde), pvmw)) >>+ return not_found(pvmw); >>+ } else if (pmd_is_migration_entry(pmde)) { >>+ softleaf_t entry = softleaf_from_pmd(pmde); >>+ >>+ if (!(pvmw->flags & PVMW_MIGRATION)) >>+ return not_found(pvmw); > >Looked at history a bit, and I wonder if this changed something old >here ... > >Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD >migration handling took PTL before doing PVMW_MIGRATION/PFN checks, >including not_found() cases. So lockless PMD read was just a filter ... > >With this fix, true case gets final pmd_same() check, but this >not_found() case happens before taking PTL. > >So a !PVMW_MIGRATION walker could race with someone, e.g. >remove_migration_pmd(): we make the not_found() decision from old PMD >value that still says "migration", while real *pvmw->pmd may already be >present again. We return without ever taking PTL :) > Hi, Lance Thanks for take a look. I am trying to understand the scenario you mentioned. Let's say A migrate a pmd and B want to unmap the pmd. A B try to migrate a pmd pmd is set to migration entry unmap the pmd ... managed to finish migration ...still see migration entry, so skipped and unmap fail Would this be a timing case? Even B grab the PTL, it still could see migration entry if B visit pmd before A finish migration. Maybe I miss something, look forward your insight. >Not sure about practical fallout, but should these PMD-level not_found() >cases also take PTL and restart if PMD changed? > -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-16 23:50 ` Wei Yang @ 2026-06-17 2:32 ` Lance Yang 2026-06-17 8:18 ` Wei Yang 0 siblings, 1 reply; 16+ messages in thread From: Lance Yang @ 2026-06-17 2:32 UTC (permalink / raw) To: richard.weiyang Cc: lance.yang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, lorenzo.stoakes, stable On Tue, Jun 16, 2026 at 11:50:22PM +0000, Wei Yang wrote: >On Tue, Jun 16, 2026 at 08:30:01PM +0800, Lance Yang wrote: >> >>On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >>[...] >>>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >>>index 2ccbabfb2cc1..21635fab209c 100644 >>>--- a/mm/page_vma_mapped.c >>>+++ b/mm/page_vma_mapped.c >>>@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) >>> */ >>> pmde = pmdp_get_lockless(pvmw->pmd); >>> >>>- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>- pmde = *pvmw->pmd; >>>- if (!pmd_present(pmde)) { >>>- softleaf_t entry; >>>- >>>- if (!thp_migration_supported() || >>>- !(pvmw->flags & PVMW_MIGRATION)) >>>- return not_found(pvmw); >>>- entry = softleaf_from_pmd(pmde); >>>- >>>- if (!softleaf_is_migration(entry) || >>>- !check_pmd(softleaf_to_pfn(entry), pvmw)) >>>- return not_found(pvmw); >>>- return true; >>>- } >>>- if (likely(pmd_trans_huge(pmde))) { >>>- if (pvmw->flags & PVMW_MIGRATION) >>>- return not_found(pvmw); >>>- if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>- return not_found(pvmw); >>>- return true; >>>- } >>>- /* THP pmd was split under us: handle on pte level */ >>>- spin_unlock(pvmw->ptl); >>>- pvmw->ptl = NULL; >>>- } else if (!pmd_present(pmde)) { >>>- const softleaf_t entry = softleaf_from_pmd(pmde); >>>- >>>- if (softleaf_is_device_private(entry)) { >>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>- return true; >>>- } >>>+ if (pmd_present(pmde)) { >>>+ if (!pmd_leaf(pmde)) >>>+ goto pte_table; >>>+ if (pvmw->flags & PVMW_MIGRATION) >>>+ return not_found(pvmw); >>>+ if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>+ return not_found(pvmw); >>>+ } else if (pmd_is_migration_entry(pmde)) { >>>+ softleaf_t entry = softleaf_from_pmd(pmde); >>>+ >>>+ if (!(pvmw->flags & PVMW_MIGRATION)) >>>+ return not_found(pvmw); >> >>Looked at history a bit, and I wonder if this changed something old >>here ... >> >>Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD >>migration handling took PTL before doing PVMW_MIGRATION/PFN checks, >>including not_found() cases. So lockless PMD read was just a filter ... >> >>With this fix, true case gets final pmd_same() check, but this >>not_found() case happens before taking PTL. >> >>So a !PVMW_MIGRATION walker could race with someone, e.g. >>remove_migration_pmd(): we make the not_found() decision from old PMD >>value that still says "migration", while real *pvmw->pmd may already be >>present again. We return without ever taking PTL :) >> > >Hi, Lance > >Thanks for take a look. > >I am trying to understand the scenario you mentioned. Let's say A migrate a >pmd and B want to unmap the pmd. > > A B > > try to migrate a pmd > pmd is set to migration entry > unmap the pmd ... > managed to finish migration > ...still see migration entry, > so skipped and unmap fail > >Would this be a timing case? Even B grab the PTL, it still could see migration >entry if B visit pmd before A finish migration. > >Maybe I miss something, look forward your insight. Right, seeing migration entry while migration is still ongoing is fine. What I meant was this ordering: CPU 0: pmde = pmdp_get_lockless(...); /* migration */ CPU 1: remove_migration_pmd() restores PMD to present CPU 0: returns not_found() from old pmde, without ever taking PTL and rechecking *pvmw->pmd So issue is not seeing migration entry itself, but making final not_found() decision from stale lockless PMD value ... Before this patch, PMD migration case took PTL before making that decision ... >>Not sure about practical fallout, but should these PMD-level not_found() >>cases also take PTL and restart if PMD changed? >> > >-- >Wei Yang >Help you, Help me > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-17 2:32 ` Lance Yang @ 2026-06-17 8:18 ` Wei Yang 2026-06-19 2:30 ` Wei Yang 0 siblings, 1 reply; 16+ messages in thread From: Wei Yang @ 2026-06-17 8:18 UTC (permalink / raw) To: Lance Yang Cc: richard.weiyang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, lorenzo.stoakes, stable On Wed, Jun 17, 2026 at 10:32:11AM +0800, Lance Yang wrote: > >On Tue, Jun 16, 2026 at 11:50:22PM +0000, Wei Yang wrote: >>On Tue, Jun 16, 2026 at 08:30:01PM +0800, Lance Yang wrote: >>> >>>On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >>>[...] >>>>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >>>>index 2ccbabfb2cc1..21635fab209c 100644 >>>>--- a/mm/page_vma_mapped.c >>>>+++ b/mm/page_vma_mapped.c >>>>@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) >>>> */ >>>> pmde = pmdp_get_lockless(pvmw->pmd); >>>> >>>>- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>- pmde = *pvmw->pmd; >>>>- if (!pmd_present(pmde)) { >>>>- softleaf_t entry; >>>>- >>>>- if (!thp_migration_supported() || >>>>- !(pvmw->flags & PVMW_MIGRATION)) >>>>- return not_found(pvmw); >>>>- entry = softleaf_from_pmd(pmde); >>>>- >>>>- if (!softleaf_is_migration(entry) || >>>>- !check_pmd(softleaf_to_pfn(entry), pvmw)) >>>>- return not_found(pvmw); >>>>- return true; >>>>- } >>>>- if (likely(pmd_trans_huge(pmde))) { >>>>- if (pvmw->flags & PVMW_MIGRATION) >>>>- return not_found(pvmw); >>>>- if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>- return not_found(pvmw); >>>>- return true; >>>>- } >>>>- /* THP pmd was split under us: handle on pte level */ >>>>- spin_unlock(pvmw->ptl); >>>>- pvmw->ptl = NULL; >>>>- } else if (!pmd_present(pmde)) { >>>>- const softleaf_t entry = softleaf_from_pmd(pmde); >>>>- >>>>- if (softleaf_is_device_private(entry)) { >>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>- return true; >>>>- } >>>>+ if (pmd_present(pmde)) { >>>>+ if (!pmd_leaf(pmde)) >>>>+ goto pte_table; >>>>+ if (pvmw->flags & PVMW_MIGRATION) >>>>+ return not_found(pvmw); >>>>+ if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>+ return not_found(pvmw); >>>>+ } else if (pmd_is_migration_entry(pmde)) { >>>>+ softleaf_t entry = softleaf_from_pmd(pmde); >>>>+ >>>>+ if (!(pvmw->flags & PVMW_MIGRATION)) >>>>+ return not_found(pvmw); >>> >>>Looked at history a bit, and I wonder if this changed something old >>>here ... >>> >>>Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD >>>migration handling took PTL before doing PVMW_MIGRATION/PFN checks, >>>including not_found() cases. So lockless PMD read was just a filter ... >>> >>>With this fix, true case gets final pmd_same() check, but this >>>not_found() case happens before taking PTL. >>> >>>So a !PVMW_MIGRATION walker could race with someone, e.g. >>>remove_migration_pmd(): we make the not_found() decision from old PMD >>>value that still says "migration", while real *pvmw->pmd may already be >>>present again. We return without ever taking PTL :) >>> >> >>Hi, Lance >> >>Thanks for take a look. >> >>I am trying to understand the scenario you mentioned. Let's say A migrate a >>pmd and B want to unmap the pmd. >> >> A B >> >> try to migrate a pmd >> pmd is set to migration entry >> unmap the pmd ... >> managed to finish migration >> ...still see migration entry, >> so skipped and unmap fail >> >>Would this be a timing case? Even B grab the PTL, it still could see migration >>entry if B visit pmd before A finish migration. >> >>Maybe I miss something, look forward your insight. > >Right, seeing migration entry while migration is still ongoing is fine. > >What I meant was this ordering: > > CPU 0: pmde = pmdp_get_lockless(...); /* migration */ > CPU 1: remove_migration_pmd() restores PMD to present > CPU 0: returns not_found() from old pmde, without ever taking PTL and > rechecking *pvmw->pmd > >So issue is not seeing migration entry itself, but making final >not_found() decision from stale lockless PMD value ... > >Before this patch, PMD migration case took PTL before making that >decision ... > Yes, this patch changes the decision making condition for pmd entry. Thanks for pointing out. Hmm... I took another look into current pte handling and find for pte entry, we did two phase check: * map_pte() without ptl * check_pte() with ptl While check_pte() do extra pfn range check, map_pte() doesn't. This means for pte entry, we may face the same situation as you describe: make the decision before grab PTL. Till now, it looks reasonable. But one thing jumped at me, PVMW_SYNC. When this flag is specified, all check is done under PTL. But now for pmd entry, we don't have a chance to do so. And as the comment says in try_to_migrate_one() /* * When racing against e.g. zap_pte_range() on another cpu, * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(), * try_to_migrate() may return before folio_mapped() has become false, * if page table locking is skipped: use TTU_SYNC to wait for that. */ I tracked down to commit a98a2f0c8ce1 ('mm/rmap: split migration into its own function'), but not getting more detail on reasoning. Not fully understand it yet, but it seems there is some race between migration and unmap which is protected by PTL? Will look into this to get more detail. >>>Not sure about practical fallout, but should these PMD-level not_found() >>>cases also take PTL and restart if PMD changed? >>> >> >>-- >>Wei Yang >>Help you, Help me >> -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-17 8:18 ` Wei Yang @ 2026-06-19 2:30 ` Wei Yang 2026-06-19 12:19 ` Lance Yang 0 siblings, 1 reply; 16+ messages in thread From: Wei Yang @ 2026-06-19 2:30 UTC (permalink / raw) To: Wei Yang Cc: Lance Yang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, lorenzo.stoakes, stable On Wed, Jun 17, 2026 at 08:18:15AM +0000, Wei Yang wrote: >On Wed, Jun 17, 2026 at 10:32:11AM +0800, Lance Yang wrote: >> >>On Tue, Jun 16, 2026 at 11:50:22PM +0000, Wei Yang wrote: >>>On Tue, Jun 16, 2026 at 08:30:01PM +0800, Lance Yang wrote: >>>> >>>>On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >>>>[...] >>>>>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >>>>>index 2ccbabfb2cc1..21635fab209c 100644 >>>>>--- a/mm/page_vma_mapped.c >>>>>+++ b/mm/page_vma_mapped.c >>>>>@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) >>>>> */ >>>>> pmde = pmdp_get_lockless(pvmw->pmd); >>>>> >>>>>- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >>>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>>- pmde = *pvmw->pmd; >>>>>- if (!pmd_present(pmde)) { >>>>>- softleaf_t entry; >>>>>- >>>>>- if (!thp_migration_supported() || >>>>>- !(pvmw->flags & PVMW_MIGRATION)) >>>>>- return not_found(pvmw); >>>>>- entry = softleaf_from_pmd(pmde); >>>>>- >>>>>- if (!softleaf_is_migration(entry) || >>>>>- !check_pmd(softleaf_to_pfn(entry), pvmw)) >>>>>- return not_found(pvmw); >>>>>- return true; >>>>>- } >>>>>- if (likely(pmd_trans_huge(pmde))) { >>>>>- if (pvmw->flags & PVMW_MIGRATION) >>>>>- return not_found(pvmw); >>>>>- if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>>- return not_found(pvmw); >>>>>- return true; >>>>>- } >>>>>- /* THP pmd was split under us: handle on pte level */ >>>>>- spin_unlock(pvmw->ptl); >>>>>- pvmw->ptl = NULL; >>>>>- } else if (!pmd_present(pmde)) { >>>>>- const softleaf_t entry = softleaf_from_pmd(pmde); >>>>>- >>>>>- if (softleaf_is_device_private(entry)) { >>>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>>- return true; >>>>>- } >>>>>+ if (pmd_present(pmde)) { >>>>>+ if (!pmd_leaf(pmde)) >>>>>+ goto pte_table; >>>>>+ if (pvmw->flags & PVMW_MIGRATION) >>>>>+ return not_found(pvmw); >>>>>+ if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>>+ return not_found(pvmw); >>>>>+ } else if (pmd_is_migration_entry(pmde)) { >>>>>+ softleaf_t entry = softleaf_from_pmd(pmde); >>>>>+ >>>>>+ if (!(pvmw->flags & PVMW_MIGRATION)) >>>>>+ return not_found(pvmw); >>>> >>>>Looked at history a bit, and I wonder if this changed something old >>>>here ... >>>> >>>>Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD >>>>migration handling took PTL before doing PVMW_MIGRATION/PFN checks, >>>>including not_found() cases. So lockless PMD read was just a filter ... >>>> >>>>With this fix, true case gets final pmd_same() check, but this >>>>not_found() case happens before taking PTL. >>>> >>>>So a !PVMW_MIGRATION walker could race with someone, e.g. >>>>remove_migration_pmd(): we make the not_found() decision from old PMD >>>>value that still says "migration", while real *pvmw->pmd may already be >>>>present again. We return without ever taking PTL :) >>>> >>> >>>Hi, Lance >>> >>>Thanks for take a look. >>> >>>I am trying to understand the scenario you mentioned. Let's say A migrate a >>>pmd and B want to unmap the pmd. >>> >>> A B >>> >>> try to migrate a pmd >>> pmd is set to migration entry >>> unmap the pmd ... >>> managed to finish migration >>> ...still see migration entry, >>> so skipped and unmap fail >>> >>>Would this be a timing case? Even B grab the PTL, it still could see migration >>>entry if B visit pmd before A finish migration. >>> >>>Maybe I miss something, look forward your insight. >> >>Right, seeing migration entry while migration is still ongoing is fine. >> >>What I meant was this ordering: >> >> CPU 0: pmde = pmdp_get_lockless(...); /* migration */ >> CPU 1: remove_migration_pmd() restores PMD to present >> CPU 0: returns not_found() from old pmde, without ever taking PTL and >> rechecking *pvmw->pmd >> >>So issue is not seeing migration entry itself, but making final >>not_found() decision from stale lockless PMD value ... >> >>Before this patch, PMD migration case took PTL before making that >>decision ... >> > >Yes, this patch changes the decision making condition for pmd entry. Thanks >for pointing out. > >Hmm... I took another look into current pte handling and find for pte entry, >we did two phase check: > > * map_pte() without ptl > * check_pte() with ptl > >While check_pte() do extra pfn range check, map_pte() doesn't. > >This means for pte entry, we may face the same situation as you describe: >make the decision before grab PTL. Till now, it looks reasonable. > >But one thing jumped at me, PVMW_SYNC. When this flag is specified, all check >is done under PTL. But now for pmd entry, we don't have a chance to do so. > >And as the comment says in try_to_migrate_one() > > /* > * When racing against e.g. zap_pte_range() on another cpu, > * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(), > * try_to_migrate() may return before folio_mapped() has become false, > * if page table locking is skipped: use TTU_SYNC to wait for that. > */ > >I tracked down to commit a98a2f0c8ce1 ('mm/rmap: split migration into its own >function'), but not getting more detail on reasoning. Not fully understand it >yet, but it seems there is some race between migration and unmap which is >protected by PTL? > >Will look into this to get more detail. > After going through the history, I found this: commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 Author: Hugh Dickins <hughd@google.com> Date: Tue Jun 15 18:23:53 2021 -0700 mm/thp: try_to_unmap() use TTU_SYNC for safe splitting This one fix the race mentioned above: we expect mapcount is 0, but is not. IIUC, if we apply the change in this patch, the affected case is pmd_is_migration_entry(). In case someone else has cleared it but not update mapcount yet, try_to_migrate() would return before folio_mapped() is false. Thanks Lance for raise the question. If above analysis is true, I haven't got a neat way to take this into consideration. BTW, for a fix, I am thinking to keep it simple and direct. So how about leave the refactor as a followup cleanup? -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-19 2:30 ` Wei Yang @ 2026-06-19 12:19 ` Lance Yang 2026-06-20 2:02 ` Wei Yang 0 siblings, 1 reply; 16+ messages in thread From: Lance Yang @ 2026-06-19 12:19 UTC (permalink / raw) To: richard.weiyang Cc: lance.yang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable On Fri, Jun 19, 2026 at 02:30:25AM +0000, Wei Yang wrote: >On Wed, Jun 17, 2026 at 08:18:15AM +0000, Wei Yang wrote: >>On Wed, Jun 17, 2026 at 10:32:11AM +0800, Lance Yang wrote: >>> >>>On Tue, Jun 16, 2026 at 11:50:22PM +0000, Wei Yang wrote: >>>>On Tue, Jun 16, 2026 at 08:30:01PM +0800, Lance Yang wrote: >>>>> >>>>>On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >>>>>[...] >>>>>>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >>>>>>index 2ccbabfb2cc1..21635fab209c 100644 >>>>>>--- a/mm/page_vma_mapped.c >>>>>>+++ b/mm/page_vma_mapped.c >>>>>>@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) >>>>>> */ >>>>>> pmde = pmdp_get_lockless(pvmw->pmd); >>>>>> >>>>>>- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >>>>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>>>- pmde = *pvmw->pmd; >>>>>>- if (!pmd_present(pmde)) { >>>>>>- softleaf_t entry; >>>>>>- >>>>>>- if (!thp_migration_supported() || >>>>>>- !(pvmw->flags & PVMW_MIGRATION)) >>>>>>- return not_found(pvmw); >>>>>>- entry = softleaf_from_pmd(pmde); >>>>>>- >>>>>>- if (!softleaf_is_migration(entry) || >>>>>>- !check_pmd(softleaf_to_pfn(entry), pvmw)) >>>>>>- return not_found(pvmw); >>>>>>- return true; >>>>>>- } >>>>>>- if (likely(pmd_trans_huge(pmde))) { >>>>>>- if (pvmw->flags & PVMW_MIGRATION) >>>>>>- return not_found(pvmw); >>>>>>- if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>>>- return not_found(pvmw); >>>>>>- return true; >>>>>>- } >>>>>>- /* THP pmd was split under us: handle on pte level */ >>>>>>- spin_unlock(pvmw->ptl); >>>>>>- pvmw->ptl = NULL; >>>>>>- } else if (!pmd_present(pmde)) { >>>>>>- const softleaf_t entry = softleaf_from_pmd(pmde); >>>>>>- >>>>>>- if (softleaf_is_device_private(entry)) { >>>>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>>>- return true; >>>>>>- } >>>>>>+ if (pmd_present(pmde)) { >>>>>>+ if (!pmd_leaf(pmde)) >>>>>>+ goto pte_table; >>>>>>+ if (pvmw->flags & PVMW_MIGRATION) >>>>>>+ return not_found(pvmw); >>>>>>+ if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>>>+ return not_found(pvmw); >>>>>>+ } else if (pmd_is_migration_entry(pmde)) { >>>>>>+ softleaf_t entry = softleaf_from_pmd(pmde); >>>>>>+ >>>>>>+ if (!(pvmw->flags & PVMW_MIGRATION)) >>>>>>+ return not_found(pvmw); >>>>> >>>>>Looked at history a bit, and I wonder if this changed something old >>>>>here ... >>>>> >>>>>Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD >>>>>migration handling took PTL before doing PVMW_MIGRATION/PFN checks, >>>>>including not_found() cases. So lockless PMD read was just a filter ... >>>>> >>>>>With this fix, true case gets final pmd_same() check, but this >>>>>not_found() case happens before taking PTL. >>>>> >>>>>So a !PVMW_MIGRATION walker could race with someone, e.g. >>>>>remove_migration_pmd(): we make the not_found() decision from old PMD >>>>>value that still says "migration", while real *pvmw->pmd may already be >>>>>present again. We return without ever taking PTL :) >>>>> >>>> >>>>Hi, Lance >>>> >>>>Thanks for take a look. >>>> >>>>I am trying to understand the scenario you mentioned. Let's say A migrate a >>>>pmd and B want to unmap the pmd. >>>> >>>> A B >>>> >>>> try to migrate a pmd >>>> pmd is set to migration entry >>>> unmap the pmd ... >>>> managed to finish migration >>>> ...still see migration entry, >>>> so skipped and unmap fail >>>> >>>>Would this be a timing case? Even B grab the PTL, it still could see migration >>>>entry if B visit pmd before A finish migration. >>>> >>>>Maybe I miss something, look forward your insight. >>> >>>Right, seeing migration entry while migration is still ongoing is fine. >>> >>>What I meant was this ordering: >>> >>> CPU 0: pmde = pmdp_get_lockless(...); /* migration */ >>> CPU 1: remove_migration_pmd() restores PMD to present >>> CPU 0: returns not_found() from old pmde, without ever taking PTL and >>> rechecking *pvmw->pmd >>> >>>So issue is not seeing migration entry itself, but making final >>>not_found() decision from stale lockless PMD value ... >>> >>>Before this patch, PMD migration case took PTL before making that >>>decision ... >>> >> >>Yes, this patch changes the decision making condition for pmd entry. Thanks >>for pointing out. >> >>Hmm... I took another look into current pte handling and find for pte entry, >>we did two phase check: >> >> * map_pte() without ptl >> * check_pte() with ptl >> >>While check_pte() do extra pfn range check, map_pte() doesn't. >> >>This means for pte entry, we may face the same situation as you describe: >>make the decision before grab PTL. Till now, it looks reasonable. >> >>But one thing jumped at me, PVMW_SYNC. When this flag is specified, all check >>is done under PTL. But now for pmd entry, we don't have a chance to do so. >> >>And as the comment says in try_to_migrate_one() >> >> /* >> * When racing against e.g. zap_pte_range() on another cpu, >> * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(), >> * try_to_migrate() may return before folio_mapped() has become false, >> * if page table locking is skipped: use TTU_SYNC to wait for that. >> */ >> >>I tracked down to commit a98a2f0c8ce1 ('mm/rmap: split migration into its own >>function'), but not getting more detail on reasoning. Not fully understand it >>yet, but it seems there is some race between migration and unmap which is >>protected by PTL? >> >>Will look into this to get more detail. >> > >After going through the history, I found this: > > commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 > Author: Hugh Dickins <hughd@google.com> > Date: Tue Jun 15 18:23:53 2021 -0700 > > mm/thp: try_to_unmap() use TTU_SYNC for safe splitting > >This one fix the race mentioned above: we expect mapcount is 0, but is not. Cool, thanks! I do want to spend more time on this refactor. It is touching some subtle page_vma_mapped_walk() rules, so I don't want to skim and guess ... One case I can pin down now is device-private: the PTE side gives us a clear rule to compare against :) On the PTE side: 1) PVMW_SYNC set, PVMW_MIGRATION set map_pte() uses pte_offset_map_lock(), so it takes PTL first. check_pte() then runs under PTL. Since PVMW_MIGRATION is set, check_pte() requires a migration entry, so device-private is rejected. 2) PVMW_SYNC set, PVMW_MIGRATION clear map_pte() takes PTL first. check_pte() then runs under PTL. Since PVMW_MIGRATION is clear, device-private can be a normal mapping, but check_pte() still checks entry type and PFN range. 3) PVMW_SYNC clear, PVMW_MIGRATION set map_pte() first does a lockless read. A non-present, non-none PTE can still be a candidate, so map_pte() takes PTL. check_pte() then rejects device-private, because PVMW_MIGRATION requires a migration entry. 4) PVMW_SYNC clear, PVMW_MIGRATION clear map_pte() first does a lockless read. A device-private PTE can be a normal mapping candidate, so map_pte() takes PTL. check_pte() then checks entry type and PFN range under PTL. On the PMD device-private side, before this patch, all four cases go through the same code once the lockless PMD read sees a device-private entry: - lockless read PMD into pmde - pmde is non-present - decode pmde as a softleaf entry - entry is device-private - take pmd_lock() - return true So compared with the PTE side: A) PVMW_SYNC set, PVMW_MIGRATION set PTE rejects device-private under PTL. PMD returns true. This does not match. The PMD code misses the PVMW_MIGRATION direction check, and does not reread/revalidate PMD under pmd_lock(). B) PVMW_SYNC set, PVMW_MIGRATION clear PTE can accept device-private, but only after locked check_pte() validation. PMD also returns true. The direction is OK, but the final check is missing. PMD returns true from the lockless PMD classification, without PMD revalidation and without check_pmd() PFN-range check. C) PVMW_SYNC clear, PVMW_MIGRATION set PTE can reach locked check_pte() from the lockless candidate, but check_pte() rejects device-private. PMD returns true. Same mismatch as case A: missing PVMW_MIGRATION direction check, and no locked PMD revalidation. D) PVMW_SYNC clear, PVMW_MIGRATION clear PTE can accept device-private after locked validation. PMD also returns true. Direction is OK here as well, but the PMD code still has no final locked check matching check_pte(): no PMD reread/revalidation, and no check_pmd() PFN-range check. > >IIUC, if we apply the change in this patch, the affected case is >pmd_is_migration_entry(). In case someone else has cleared it but not update >mapcount yet, try_to_migrate() would return before folio_mapped() is false. > >Thanks Lance for raise the question. > >If above analysis is true, I haven't got a neat way to take this into >consideration. > >BTW, for a fix, I am thinking to keep it simple and direct. So how about leave >the refactor as a followup cleanup? So for a fix, let's line up the PTE and PMD rules first :D Cheers, Lance >-- >Wei Yang >Help you, Help me > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-19 12:19 ` Lance Yang @ 2026-06-20 2:02 ` Wei Yang 0 siblings, 0 replies; 16+ messages in thread From: Wei Yang @ 2026-06-20 2:02 UTC (permalink / raw) To: Lance Yang Cc: richard.weiyang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable On Fri, Jun 19, 2026 at 08:19:09PM +0800, Lance Yang wrote: > >On Fri, Jun 19, 2026 at 02:30:25AM +0000, Wei Yang wrote: >>On Wed, Jun 17, 2026 at 08:18:15AM +0000, Wei Yang wrote: >>>On Wed, Jun 17, 2026 at 10:32:11AM +0800, Lance Yang wrote: >>>> >>>>On Tue, Jun 16, 2026 at 11:50:22PM +0000, Wei Yang wrote: >>>>>On Tue, Jun 16, 2026 at 08:30:01PM +0800, Lance Yang wrote: >>>>>> >>>>>>On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >>>>>>[...] >>>>>>>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >>>>>>>index 2ccbabfb2cc1..21635fab209c 100644 >>>>>>>--- a/mm/page_vma_mapped.c >>>>>>>+++ b/mm/page_vma_mapped.c >>>>>>>@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) >>>>>>> */ >>>>>>> pmde = pmdp_get_lockless(pvmw->pmd); >>>>>>> >>>>>>>- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >>>>>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>>>>- pmde = *pvmw->pmd; >>>>>>>- if (!pmd_present(pmde)) { >>>>>>>- softleaf_t entry; >>>>>>>- >>>>>>>- if (!thp_migration_supported() || >>>>>>>- !(pvmw->flags & PVMW_MIGRATION)) >>>>>>>- return not_found(pvmw); >>>>>>>- entry = softleaf_from_pmd(pmde); >>>>>>>- >>>>>>>- if (!softleaf_is_migration(entry) || >>>>>>>- !check_pmd(softleaf_to_pfn(entry), pvmw)) >>>>>>>- return not_found(pvmw); >>>>>>>- return true; >>>>>>>- } >>>>>>>- if (likely(pmd_trans_huge(pmde))) { >>>>>>>- if (pvmw->flags & PVMW_MIGRATION) >>>>>>>- return not_found(pvmw); >>>>>>>- if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>>>>- return not_found(pvmw); >>>>>>>- return true; >>>>>>>- } >>>>>>>- /* THP pmd was split under us: handle on pte level */ >>>>>>>- spin_unlock(pvmw->ptl); >>>>>>>- pvmw->ptl = NULL; >>>>>>>- } else if (!pmd_present(pmde)) { >>>>>>>- const softleaf_t entry = softleaf_from_pmd(pmde); >>>>>>>- >>>>>>>- if (softleaf_is_device_private(entry)) { >>>>>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>>>>- return true; >>>>>>>- } >>>>>>>+ if (pmd_present(pmde)) { >>>>>>>+ if (!pmd_leaf(pmde)) >>>>>>>+ goto pte_table; >>>>>>>+ if (pvmw->flags & PVMW_MIGRATION) >>>>>>>+ return not_found(pvmw); >>>>>>>+ if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>>>>+ return not_found(pvmw); >>>>>>>+ } else if (pmd_is_migration_entry(pmde)) { >>>>>>>+ softleaf_t entry = softleaf_from_pmd(pmde); >>>>>>>+ >>>>>>>+ if (!(pvmw->flags & PVMW_MIGRATION)) >>>>>>>+ return not_found(pvmw); >>>>>> >>>>>>Looked at history a bit, and I wonder if this changed something old >>>>>>here ... >>>>>> >>>>>>Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD >>>>>>migration handling took PTL before doing PVMW_MIGRATION/PFN checks, >>>>>>including not_found() cases. So lockless PMD read was just a filter ... >>>>>> >>>>>>With this fix, true case gets final pmd_same() check, but this >>>>>>not_found() case happens before taking PTL. >>>>>> >>>>>>So a !PVMW_MIGRATION walker could race with someone, e.g. >>>>>>remove_migration_pmd(): we make the not_found() decision from old PMD >>>>>>value that still says "migration", while real *pvmw->pmd may already be >>>>>>present again. We return without ever taking PTL :) >>>>>> >>>>> >>>>>Hi, Lance >>>>> >>>>>Thanks for take a look. >>>>> >>>>>I am trying to understand the scenario you mentioned. Let's say A migrate a >>>>>pmd and B want to unmap the pmd. >>>>> >>>>> A B >>>>> >>>>> try to migrate a pmd >>>>> pmd is set to migration entry >>>>> unmap the pmd ... >>>>> managed to finish migration >>>>> ...still see migration entry, >>>>> so skipped and unmap fail >>>>> >>>>>Would this be a timing case? Even B grab the PTL, it still could see migration >>>>>entry if B visit pmd before A finish migration. >>>>> >>>>>Maybe I miss something, look forward your insight. >>>> >>>>Right, seeing migration entry while migration is still ongoing is fine. >>>> >>>>What I meant was this ordering: >>>> >>>> CPU 0: pmde = pmdp_get_lockless(...); /* migration */ >>>> CPU 1: remove_migration_pmd() restores PMD to present >>>> CPU 0: returns not_found() from old pmde, without ever taking PTL and >>>> rechecking *pvmw->pmd >>>> >>>>So issue is not seeing migration entry itself, but making final >>>>not_found() decision from stale lockless PMD value ... >>>> >>>>Before this patch, PMD migration case took PTL before making that >>>>decision ... >>>> >>> >>>Yes, this patch changes the decision making condition for pmd entry. Thanks >>>for pointing out. >>> >>>Hmm... I took another look into current pte handling and find for pte entry, >>>we did two phase check: >>> >>> * map_pte() without ptl >>> * check_pte() with ptl >>> >>>While check_pte() do extra pfn range check, map_pte() doesn't. >>> >>>This means for pte entry, we may face the same situation as you describe: >>>make the decision before grab PTL. Till now, it looks reasonable. >>> >>>But one thing jumped at me, PVMW_SYNC. When this flag is specified, all check >>>is done under PTL. But now for pmd entry, we don't have a chance to do so. >>> >>>And as the comment says in try_to_migrate_one() >>> >>> /* >>> * When racing against e.g. zap_pte_range() on another cpu, >>> * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(), >>> * try_to_migrate() may return before folio_mapped() has become false, >>> * if page table locking is skipped: use TTU_SYNC to wait for that. >>> */ >>> >>>I tracked down to commit a98a2f0c8ce1 ('mm/rmap: split migration into its own >>>function'), but not getting more detail on reasoning. Not fully understand it >>>yet, but it seems there is some race between migration and unmap which is >>>protected by PTL? >>> >>>Will look into this to get more detail. >>> >> >>After going through the history, I found this: >> >> commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 >> Author: Hugh Dickins <hughd@google.com> >> Date: Tue Jun 15 18:23:53 2021 -0700 >> >> mm/thp: try_to_unmap() use TTU_SYNC for safe splitting >> >>This one fix the race mentioned above: we expect mapcount is 0, but is not. > >Cool, thanks! > >I do want to spend more time on this refactor. It is touching some subtle >page_vma_mapped_walk() rules, so I don't want to skim and guess ... > >One case I can pin down now is device-private: the PTE side gives us a >clear rule to compare against :) > >On the PTE side: > >1) PVMW_SYNC set, PVMW_MIGRATION set > > map_pte() uses pte_offset_map_lock(), so it takes PTL first. > check_pte() then runs under PTL. Since PVMW_MIGRATION is set, > check_pte() requires a migration entry, so device-private is rejected. > >2) PVMW_SYNC set, PVMW_MIGRATION clear > > map_pte() takes PTL first. check_pte() then runs under PTL. > Since PVMW_MIGRATION is clear, device-private can be a normal mapping, > but check_pte() still checks entry type and PFN range. > >3) PVMW_SYNC clear, PVMW_MIGRATION set > > map_pte() first does a lockless read. A non-present, non-none PTE can > still be a candidate, so map_pte() takes PTL. check_pte() then rejects > device-private, because PVMW_MIGRATION requires a migration entry. > >4) PVMW_SYNC clear, PVMW_MIGRATION clear > > map_pte() first does a lockless read. A device-private PTE can be a > normal mapping candidate, so map_pte() takes PTL. check_pte() then > checks entry type and PFN range under PTL. > >On the PMD device-private side, before this patch, all four cases go >through the same code once the lockless PMD read sees a device-private >entry: > >- lockless read PMD into pmde >- pmde is non-present >- decode pmde as a softleaf entry >- entry is device-private >- take pmd_lock() >- return true > >So compared with the PTE side: > >A) PVMW_SYNC set, PVMW_MIGRATION set > > PTE rejects device-private under PTL. > > PMD returns true. > > This does not match. The PMD code misses the PVMW_MIGRATION direction > check, and does not reread/revalidate PMD under pmd_lock(). > >B) PVMW_SYNC set, PVMW_MIGRATION clear > > PTE can accept device-private, but only after locked check_pte() > validation. > > PMD also returns true. > > The direction is OK, but the final check is missing. PMD returns true > from the lockless PMD classification, without PMD revalidation and > without check_pmd() PFN-range check. > >C) PVMW_SYNC clear, PVMW_MIGRATION set > > PTE can reach locked check_pte() from the lockless candidate, but > check_pte() rejects device-private. > > PMD returns true. > > Same mismatch as case A: missing PVMW_MIGRATION direction check, and no > locked PMD revalidation. > >D) PVMW_SYNC clear, PVMW_MIGRATION clear > > PTE can accept device-private after locked validation. > > PMD also returns true. > > Direction is OK here as well, but the PMD code still has no final > locked check matching check_pte(): no PMD reread/revalidation, and no > check_pmd() PFN-range check. > Thanks for the detailed analysis. >> >>IIUC, if we apply the change in this patch, the affected case is >>pmd_is_migration_entry(). In case someone else has cleared it but not update >>mapcount yet, try_to_migrate() would return before folio_mapped() is false. >> >>Thanks Lance for raise the question. >> >>If above analysis is true, I haven't got a neat way to take this into >>consideration. >> >>BTW, for a fix, I am thinking to keep it simple and direct. So how about leave >>the refactor as a followup cleanup? > >So for a fix, let's line up the PTE and PMD rules first :D > Sure. Based on your above analysis, looks the change in v1 [1] is the right direction, IIUC. So I will send v3 based on this with comment adjust according to Lorenzo's comment. [1]: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/ >Cheers, Lance > >>-- >>Wei Yang >>Help you, Help me >> -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <666dc40b-e37a-46eb-af55-7a81bc1643f1@kernel.org>]
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd [not found] ` <666dc40b-e37a-46eb-af55-7a81bc1643f1@kernel.org> @ 2026-06-17 2:11 ` Balbir Singh 2026-06-17 3:14 ` Lance Yang 0 siblings, 1 reply; 16+ messages in thread From: Balbir Singh @ 2026-06-17 2:11 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Lance Yang, richard.weiyang, akpm, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, linux-mm, lorenzo.stoakes, stable On Tue, Jun 16, 2026 at 03:07:53PM +0200, David Hildenbrand (Arm) wrote: > On 6/16/26 14:30, Lance Yang wrote: > > > > On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: > > [...] > >> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c > >> index 2ccbabfb2cc1..21635fab209c 100644 > >> --- a/mm/page_vma_mapped.c > >> +++ b/mm/page_vma_mapped.c > >> @@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > >> */ > >> pmde = pmdp_get_lockless(pvmw->pmd); > >> > >> - if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { > >> - pvmw->ptl = pmd_lock(mm, pvmw->pmd); > >> - pmde = *pvmw->pmd; > >> - if (!pmd_present(pmde)) { > >> - softleaf_t entry; > >> - > >> - if (!thp_migration_supported() || > >> - !(pvmw->flags & PVMW_MIGRATION)) > >> - return not_found(pvmw); > >> - entry = softleaf_from_pmd(pmde); > >> - > >> - if (!softleaf_is_migration(entry) || > >> - !check_pmd(softleaf_to_pfn(entry), pvmw)) > >> - return not_found(pvmw); > >> - return true; > >> - } > >> - if (likely(pmd_trans_huge(pmde))) { > >> - if (pvmw->flags & PVMW_MIGRATION) > >> - return not_found(pvmw); > >> - if (!check_pmd(pmd_pfn(pmde), pvmw)) > >> - return not_found(pvmw); > >> - return true; > >> - } > >> - /* THP pmd was split under us: handle on pte level */ > >> - spin_unlock(pvmw->ptl); > >> - pvmw->ptl = NULL; > >> - } else if (!pmd_present(pmde)) { > >> - const softleaf_t entry = softleaf_from_pmd(pmde); > >> - > >> - if (softleaf_is_device_private(entry)) { > >> - pvmw->ptl = pmd_lock(mm, pvmw->pmd); > >> - return true; > >> - } > >> + if (pmd_present(pmde)) { > >> + if (!pmd_leaf(pmde)) > >> + goto pte_table; > >> + if (pvmw->flags & PVMW_MIGRATION) > >> + return not_found(pvmw); > >> + if (!check_pmd(pmd_pfn(pmde), pvmw)) > >> + return not_found(pvmw); > >> + } else if (pmd_is_migration_entry(pmde)) { > >> + softleaf_t entry = softleaf_from_pmd(pmde); > >> + > >> + if (!(pvmw->flags & PVMW_MIGRATION)) > >> + return not_found(pvmw); > > > > Looked at history a bit, and I wonder if this changed something old > > here ... > > > > Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD > > migration handling took PTL before doing PVMW_MIGRATION/PFN checks, > > including not_found() cases. So lockless PMD read was just a filter ... > > > > With this fix, true case gets final pmd_same() check, but this > > not_found() case happens before taking PTL. > > > > So a !PVMW_MIGRATION walker could race with someone, e.g. > > remove_migration_pmd(): we make the not_found() decision from old PMD > > value that still says "migration", while real *pvmw->pmd may already be > > present again. We return without ever taking PTL :) > > > > Not sure about practical fallout, but should these PMD-level not_found() > > cases also take PTL and restart if PMD changed? > I was hoping that we could so something similar to the PTE case. > > In map_pte(), we check whether the PMD changed, which is slightly different. > > The actual check happens in check_pte() after grabbing the PTL. > > For the case you describe, map_pte() would find !pte_none(ptent) ... > !pte_present(ptent) and !is_migration, and effectively grab the lock and proceed > to check_pte(). > > In check_pte() we re-check under lock indeed. > Thinking of the practical fallout, not finding the PMD for a non migration worker should be OK. Is there a case where it's not OK to report the old state. Balbir ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-17 2:11 ` Balbir Singh @ 2026-06-17 3:14 ` Lance Yang 0 siblings, 0 replies; 16+ messages in thread From: Lance Yang @ 2026-06-17 3:14 UTC (permalink / raw) To: balbirs Cc: david, lance.yang, richard.weiyang, akpm, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, linux-mm, lorenzo.stoakes, stable On Wed, Jun 17, 2026 at 12:11:14PM +1000, Balbir Singh wrote: >On Tue, Jun 16, 2026 at 03:07:53PM +0200, David Hildenbrand (Arm) wrote: >> On 6/16/26 14:30, Lance Yang wrote: >> > >> > On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >> > [...] >> >> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >> >> index 2ccbabfb2cc1..21635fab209c 100644 >> >> --- a/mm/page_vma_mapped.c >> >> +++ b/mm/page_vma_mapped.c >> >> @@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) >> >> */ >> >> pmde = pmdp_get_lockless(pvmw->pmd); >> >> >> >> - if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >> >> - pvmw->ptl = pmd_lock(mm, pvmw->pmd); >> >> - pmde = *pvmw->pmd; >> >> - if (!pmd_present(pmde)) { >> >> - softleaf_t entry; >> >> - >> >> - if (!thp_migration_supported() || >> >> - !(pvmw->flags & PVMW_MIGRATION)) >> >> - return not_found(pvmw); >> >> - entry = softleaf_from_pmd(pmde); >> >> - >> >> - if (!softleaf_is_migration(entry) || >> >> - !check_pmd(softleaf_to_pfn(entry), pvmw)) >> >> - return not_found(pvmw); >> >> - return true; >> >> - } >> >> - if (likely(pmd_trans_huge(pmde))) { >> >> - if (pvmw->flags & PVMW_MIGRATION) >> >> - return not_found(pvmw); >> >> - if (!check_pmd(pmd_pfn(pmde), pvmw)) >> >> - return not_found(pvmw); >> >> - return true; >> >> - } >> >> - /* THP pmd was split under us: handle on pte level */ >> >> - spin_unlock(pvmw->ptl); >> >> - pvmw->ptl = NULL; >> >> - } else if (!pmd_present(pmde)) { >> >> - const softleaf_t entry = softleaf_from_pmd(pmde); >> >> - >> >> - if (softleaf_is_device_private(entry)) { >> >> - pvmw->ptl = pmd_lock(mm, pvmw->pmd); >> >> - return true; >> >> - } >> >> + if (pmd_present(pmde)) { >> >> + if (!pmd_leaf(pmde)) >> >> + goto pte_table; >> >> + if (pvmw->flags & PVMW_MIGRATION) >> >> + return not_found(pvmw); >> >> + if (!check_pmd(pmd_pfn(pmde), pvmw)) >> >> + return not_found(pvmw); >> >> + } else if (pmd_is_migration_entry(pmde)) { >> >> + softleaf_t entry = softleaf_from_pmd(pmde); >> >> + >> >> + if (!(pvmw->flags & PVMW_MIGRATION)) >> >> + return not_found(pvmw); >> > >> > Looked at history a bit, and I wonder if this changed something old >> > here ... >> > >> > Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD >> > migration handling took PTL before doing PVMW_MIGRATION/PFN checks, >> > including not_found() cases. So lockless PMD read was just a filter ... >> > >> > With this fix, true case gets final pmd_same() check, but this >> > not_found() case happens before taking PTL. >> > >> > So a !PVMW_MIGRATION walker could race with someone, e.g. >> > remove_migration_pmd(): we make the not_found() decision from old PMD >> > value that still says "migration", while real *pvmw->pmd may already be >> > present again. We return without ever taking PTL :) >> > >> > Not sure about practical fallout, but should these PMD-level not_found() >> > cases also take PTL and restart if PMD changed? >> I was hoping that we could so something similar to the PTE case. >> >> In map_pte(), we check whether the PMD changed, which is slightly different. >> >> The actual check happens in check_pte() after grabbing the PTL. >> >> For the case you describe, map_pte() would find !pte_none(ptent) ... >> !pte_present(ptent) and !is_migration, and effectively grab the lock and proceed >> to check_pte(). >> >> In check_pte() we re-check under lock indeed. >> > >Thinking of the practical fallout, not finding the PMD for a non >migration worker should be OK. Is there a case where it's not OK to >report the old state. I was thinking the lockless value should only be used as a first filter, to see whether entry looks worth checking. PTE side is roughly lockless filter + PTL/check_pte(). PMD true case now gets PTL/pmd_same(), but some PMD not_found() cases still come straight from lockless "pmde". That mismatch felt odd to me ... Cheers, Lance ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd [not found] <20260616063436.20455-1-richard.weiyang@gmail.com> 2026-06-16 12:30 ` [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd Lance Yang @ 2026-06-19 10:44 ` Lorenzo Stoakes 2026-06-19 10:48 ` David Hildenbrand (Arm) 2026-06-20 2:11 ` Wei Yang 1 sibling, 2 replies; 16+ messages in thread From: Lorenzo Stoakes @ 2026-06-19 10:44 UTC (permalink / raw) To: Wei Yang Cc: akpm, david, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable -cc wrong email On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: > For pmd_trans_huge() and pmd_is_migration_entry(), we does following > before return the pmd entry: > > * re-validate pmd entry after PTL > * check PVMW_MIGRATION > * check_pmd() > * handle on pte level if split under us > > But for device-private pmd, we just return after pmd_lock(). > > This may return improper entry, e.g. if we are looking for a migration > entry, device-private entry could still be returned, which leads to data > corruption. I don't thik this is quite clear? How about: If a softleaf entry is present, the existing code simply acquires the PMD lock and returns success even if PVMW_MIGRATION is set (indicating a migration entry is sought), meaning that the caller can incorrectly interpret the entry as something it is not, causing data corruption. > > This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration > support device-private entries") by following the same pattern as > pmd_trans_huge() and pmd_is_migration_entry() for device private entry. > > While at it, it cleanups the pmd entry handling in page_vma_mapped_walk(). > > * Instead of handling trans huge/migration entry/device private entry > in a mixed manner, we put each case into its own if condition and > handle with the same pattern. > * Also we grab PTL and make sure pmd is not changed under us after > above check instead of do the check with PTL hold. > * restart the process if pmd is changed under us You're doing quite a bit for a fix and you're putting it all in one place. How about do the fix as 1 patch, and then cleanups as other ones? It helps with review too :) It's a general rule of thumb that if you do more than one of moving, refactoring or changing code, to do them as separate patches so a reviewer/somebody bisecting can clearly separate each. Also PLEASE do not add new functionality (this lock recheck) in a fixes patch. We'll end up backporting new logic that way. Make the fixes bit _minimal_. I think in general Andrew prefers separate fixes patches so I'd just make the _minimal_ change that fixes this for the backport, and the cleanup stuff as a separate series. > > Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries") Hmm seems the device private stuff has had a rocky road of late! I wonder if we need some more test coverage on this? > Signed-off-by: Wei Yang <richard.weiyang@gmail.com> > Suggested-by: David Hildenbrand <david@kernel.org> > Cc: David Hildenbrand <david@kernel.org> > Cc: Balbir Singh <balbirs@nvidia.com> > Cc: SeongJae Park <sj@kernel.org> > Cc: Zi Yan <ziy@nvidia.com> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Annoying nag: You sent to my correct email ljs@kernel.org (thanks!) but also cc'd the incorrect one, please only send to ljs@kernel.org thanks :) > Cc: <stable@vger.kernel.org> Be better to just have this with the Fixes tag, Andrew adds the Cc's from the actual cc- list anyway. > > --- > v2: > * specify the possible error case of current code and user visible effect > * besides fix, cleanup the pmd entry handling based on David's suggestion > > v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/ > --- > mm/page_vma_mapped.c | 63 +++++++++++++++++++++----------------------- > 1 file changed, 30 insertions(+), 33 deletions(-) > > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c > index 2ccbabfb2cc1..21635fab209c 100644 > --- a/mm/page_vma_mapped.c > +++ b/mm/page_vma_mapped.c > @@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > */ > pmde = pmdp_get_lockless(pvmw->pmd); > > - if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { > - pvmw->ptl = pmd_lock(mm, pvmw->pmd); > - pmde = *pvmw->pmd; > - if (!pmd_present(pmde)) { > - softleaf_t entry; > - > - if (!thp_migration_supported() || > - !(pvmw->flags & PVMW_MIGRATION)) > - return not_found(pvmw); > - entry = softleaf_from_pmd(pmde); > - > - if (!softleaf_is_migration(entry) || > - !check_pmd(softleaf_to_pfn(entry), pvmw)) > - return not_found(pvmw); > - return true; > - } > - if (likely(pmd_trans_huge(pmde))) { > - if (pvmw->flags & PVMW_MIGRATION) > - return not_found(pvmw); > - if (!check_pmd(pmd_pfn(pmde), pvmw)) > - return not_found(pvmw); > - return true; > - } > - /* THP pmd was split under us: handle on pte level */ Don't drop critical comments like this, that's very bad. > - spin_unlock(pvmw->ptl); > - pvmw->ptl = NULL; > - } else if (!pmd_present(pmde)) { > - const softleaf_t entry = softleaf_from_pmd(pmde); > - > - if (softleaf_is_device_private(entry)) { > - pvmw->ptl = pmd_lock(mm, pvmw->pmd); > - return true; > - } > + if (pmd_present(pmde)) { You're not checking pmd_trans_huge() at all now? Just assuming pmd_present() == pmd_trans_huge()? > + if (!pmd_leaf(pmde)) > + goto pte_table; OK now assuming pmd_leaf() == pmd_trans_huge()? You didn't mention this in the commit msg? Justificaiton please? > + if (pvmw->flags & PVMW_MIGRATION) > + return not_found(pvmw); > + if (!check_pmd(pmd_pfn(pmde), pvmw)) > + return not_found(pvmw); > + } else if (pmd_is_migration_entry(pmde)) { > + softleaf_t entry = softleaf_from_pmd(pmde); > + Err you dropped the thp_migration_supported() check? Why? > + if (!(pvmw->flags & PVMW_MIGRATION)) > + return not_found(pvmw); > + if (!check_pmd(softleaf_to_pfn(entry), pvmw)) > + return not_found(pvmw); > + } else if (pmd_is_device_private_entry(pmde)) { > + softleaf_t entry = softleaf_from_pmd(pmde); > > + if (pvmw->flags & PVMW_MIGRATION) > + return not_found(pvmw); > + if (!check_pmd(softleaf_to_pfn(entry), pvmw)) > + return not_found(pvmw); I mean it's less awful than what was there before, but as refactoring goes, putting a massive set of branches in the middle of a long function isn't really the best. Can we avoid this horrible goto by separating this out into a function? > + } else { Else means what exactly? A comment would be good. I feel like in an effort to keep all this at one level of indentation you've created a confusing else case. Sane thing would be to: a. have this as a separate function b. to do: if (pmd_none(pmde)) { ... return ...; } if (pmd_present(pmde)) { ... return ...; } softleaf = softleaf_from_pmd(pmde); /* softleaf stuff */ return ...; In this function. That way it's _very clear_ you're doing softleaf stuff. > if ((pvmw->flags & PVMW_SYNC) && > thp_vma_suitable_order(vma, pvmw->address, > PMD_ORDER) && Umm... previously this was done for all softleaf entries other than device-private, now not, and you haven't, why? > @@ -286,6 +274,15 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > step_forward(pvmw, PMD_SIZE); > continue; > } > + > + /* Double-check under PTL that the PMD didn't change. */ > + pvmw->ptl = pmd_lock(mm, pvmw->pmd); > + if (pmd_same(pmde, pmdp_get(pvmw->pmd))) > + return true; > + spin_unlock(pvmw->ptl); > + pvmw->ptl = NULL; > + goto restart; I'm not sure I really get the justification for this? seems you just added this arbitrarily? Again, you shouldn't be making changes like this in a fix patch. > +pte_table: > if (!map_pte(pvmw, &pmde, &ptl)) { > if (!pvmw->pte) > goto restart; > -- > 2.34.1 > This is really hard to review like this, as above please split this up properly. Thanks, Lorenzo ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-19 10:44 ` Lorenzo Stoakes @ 2026-06-19 10:48 ` David Hildenbrand (Arm) 2026-06-19 11:04 ` Lorenzo Stoakes 2026-06-20 2:13 ` Wei Yang 2026-06-20 2:11 ` Wei Yang 1 sibling, 2 replies; 16+ messages in thread From: David Hildenbrand (Arm) @ 2026-06-19 10:48 UTC (permalink / raw) To: Lorenzo Stoakes, Wei Yang Cc: akpm, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable On 6/19/26 12:44, Lorenzo Stoakes wrote: > -cc wrong email > > On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >> For pmd_trans_huge() and pmd_is_migration_entry(), we does following >> before return the pmd entry: >> >> * re-validate pmd entry after PTL >> * check PVMW_MIGRATION >> * check_pmd() >> * handle on pte level if split under us >> >> But for device-private pmd, we just return after pmd_lock(). >> >> This may return improper entry, e.g. if we are looking for a migration >> entry, device-private entry could still be returned, which leads to data >> corruption. > > I don't thik this is quite clear? > > How about: > > If a softleaf entry is present, the existing code simply acquires the > PMD lock and returns success even if PVMW_MIGRATION is set (indicating a > migration entry is sought), meaning that the caller can incorrectly > interpret the entry as something it is not, causing data corruption. > >> >> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration >> support device-private entries") by following the same pattern as >> pmd_trans_huge() and pmd_is_migration_entry() for device private entry. >> >> While at it, it cleanups the pmd entry handling in page_vma_mapped_walk(). >> >> * Instead of handling trans huge/migration entry/device private entry >> in a mixed manner, we put each case into its own if condition and >> handle with the same pattern. >> * Also we grab PTL and make sure pmd is not changed under us after >> above check instead of do the check with PTL hold. >> * restart the process if pmd is changed under us > > You're doing quite a bit for a fix and you're putting it all in one place. > > How about do the fix as 1 patch, and then cleanups as other ones? It helps with > review too :) > > It's a general rule of thumb that if you do more than one of moving, refactoring > or changing code, to do them as separate patches so a reviewer/somebody > bisecting can clearly separate each. > > Also PLEASE do not add new functionality (this lock recheck) in a fixes > patch. We'll end up backporting new logic that way. > > Make the fixes bit _minimal_. To be fair, I asked for this https://lore.kernel.org/all/2d48ef0d-1110-4a9d-adcb-f701a1ce2cfa@kernel.org/ But given that Wei mostly used my quick draft without properly checking the implications, yeah, let's fix it first separately. I can then follow up with a proper cleanup. > > I think in general Andrew prefers separate fixes patches so I'd just make the > _minimal_ change that fixes this for the backport, and the cleanup stuff as a > separate series. > The issue is that the existing handling is just crap, and to fix it, we're adding more crap. But yeah, let's add more crap first before we clean it up properly. -- Cheers, David ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-19 10:48 ` David Hildenbrand (Arm) @ 2026-06-19 11:04 ` Lorenzo Stoakes 2026-06-20 2:13 ` Wei Yang 1 sibling, 0 replies; 16+ messages in thread From: Lorenzo Stoakes @ 2026-06-19 11:04 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Wei Yang, akpm, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable On Fri, Jun 19, 2026 at 12:48:26PM +0200, David Hildenbrand (Arm) wrote: > On 6/19/26 12:44, Lorenzo Stoakes wrote: > > -cc wrong email > > > > On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: > >> For pmd_trans_huge() and pmd_is_migration_entry(), we does following > >> before return the pmd entry: > >> > >> * re-validate pmd entry after PTL > >> * check PVMW_MIGRATION > >> * check_pmd() > >> * handle on pte level if split under us > >> > >> But for device-private pmd, we just return after pmd_lock(). > >> > >> This may return improper entry, e.g. if we are looking for a migration > >> entry, device-private entry could still be returned, which leads to data > >> corruption. > > > > I don't thik this is quite clear? > > > > How about: > > > > If a softleaf entry is present, the existing code simply acquires the > > PMD lock and returns success even if PVMW_MIGRATION is set (indicating a > > migration entry is sought), meaning that the caller can incorrectly > > interpret the entry as something it is not, causing data corruption. > > > >> > >> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration > >> support device-private entries") by following the same pattern as > >> pmd_trans_huge() and pmd_is_migration_entry() for device private entry. > >> > >> While at it, it cleanups the pmd entry handling in page_vma_mapped_walk(). > >> > >> * Instead of handling trans huge/migration entry/device private entry > >> in a mixed manner, we put each case into its own if condition and > >> handle with the same pattern. > >> * Also we grab PTL and make sure pmd is not changed under us after > >> above check instead of do the check with PTL hold. > >> * restart the process if pmd is changed under us > > > > You're doing quite a bit for a fix and you're putting it all in one place. > > > > How about do the fix as 1 patch, and then cleanups as other ones? It helps with > > review too :) > > > > It's a general rule of thumb that if you do more than one of moving, refactoring > > or changing code, to do them as separate patches so a reviewer/somebody > > bisecting can clearly separate each. > > > > Also PLEASE do not add new functionality (this lock recheck) in a fixes > > patch. We'll end up backporting new logic that way. > > > > Make the fixes bit _minimal_. > > To be fair, I asked for this > > https://lore.kernel.org/all/2d48ef0d-1110-4a9d-adcb-f701a1ce2cfa@kernel.org/ > > But given that Wei mostly used my quick draft without properly checking the > implications, yeah, let's fix it first separately. Ack yeah sorry I mean I agree that it needs cleanup just has to be done in the right way which clearly I think we agree on :) > > I can then follow up with a proper cleanup. Thanks! > > > > > I think in general Andrew prefers separate fixes patches so I'd just make the > > _minimal_ change that fixes this for the backport, and the cleanup stuff as a > > separate series. > > > > The issue is that the existing handling is just crap, and to fix it, we're > adding more crap. But yeah, let's add more crap first before we clean it up > properly. I couldn't agree more and to be clear - I hate how this is right now. But I think for the fix we have to wade in the crap first then clean it up afterwards... :) > > > -- > Cheers, > > David Cheers, Lorenzo ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-19 10:48 ` David Hildenbrand (Arm) 2026-06-19 11:04 ` Lorenzo Stoakes @ 2026-06-20 2:13 ` Wei Yang 2026-06-22 11:21 ` Lance Yang 2026-06-22 11:33 ` David Hildenbrand (Arm) 1 sibling, 2 replies; 16+ messages in thread From: Wei Yang @ 2026-06-20 2:13 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Lorenzo Stoakes, Wei Yang, akpm, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable On Fri, Jun 19, 2026 at 12:48:26PM +0200, David Hildenbrand (Arm) wrote: >On 6/19/26 12:44, Lorenzo Stoakes wrote: >> -cc wrong email >> >> On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >>> For pmd_trans_huge() and pmd_is_migration_entry(), we does following >>> before return the pmd entry: >>> >>> * re-validate pmd entry after PTL >>> * check PVMW_MIGRATION >>> * check_pmd() >>> * handle on pte level if split under us >>> >>> But for device-private pmd, we just return after pmd_lock(). >>> >>> This may return improper entry, e.g. if we are looking for a migration >>> entry, device-private entry could still be returned, which leads to data >>> corruption. >> >> I don't thik this is quite clear? >> >> How about: >> >> If a softleaf entry is present, the existing code simply acquires the >> PMD lock and returns success even if PVMW_MIGRATION is set (indicating a >> migration entry is sought), meaning that the caller can incorrectly >> interpret the entry as something it is not, causing data corruption. >> >>> >>> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration >>> support device-private entries") by following the same pattern as >>> pmd_trans_huge() and pmd_is_migration_entry() for device private entry. >>> >>> While at it, it cleanups the pmd entry handling in page_vma_mapped_walk(). >>> >>> * Instead of handling trans huge/migration entry/device private entry >>> in a mixed manner, we put each case into its own if condition and >>> handle with the same pattern. >>> * Also we grab PTL and make sure pmd is not changed under us after >>> above check instead of do the check with PTL hold. >>> * restart the process if pmd is changed under us >> >> You're doing quite a bit for a fix and you're putting it all in one place. >> >> How about do the fix as 1 patch, and then cleanups as other ones? It helps with >> review too :) >> >> It's a general rule of thumb that if you do more than one of moving, refactoring >> or changing code, to do them as separate patches so a reviewer/somebody >> bisecting can clearly separate each. >> >> Also PLEASE do not add new functionality (this lock recheck) in a fixes >> patch. We'll end up backporting new logic that way. >> >> Make the fixes bit _minimal_. > >To be fair, I asked for this > >https://lore.kernel.org/all/2d48ef0d-1110-4a9d-adcb-f701a1ce2cfa@kernel.org/ > >But given that Wei mostly used my quick draft without properly checking the >implications, yeah, let's fix it first separately. Sorry, if I misunderstand your point. > >I can then follow up with a proper cleanup. > I would like to do a followup cleanup for this, may I have this chance? -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-20 2:13 ` Wei Yang @ 2026-06-22 11:21 ` Lance Yang 2026-06-22 11:33 ` David Hildenbrand (Arm) 1 sibling, 0 replies; 16+ messages in thread From: Lance Yang @ 2026-06-22 11:21 UTC (permalink / raw) To: richard.weiyang Cc: david, akpm, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable, ljs, Lance Yang On Sat, Jun 20, 2026 at 02:13:53AM +0000, Wei Yang wrote: >On Fri, Jun 19, 2026 at 12:48:26PM +0200, David Hildenbrand (Arm) wrote: >>On 6/19/26 12:44, Lorenzo Stoakes wrote: >>> -cc wrong email >>> >>> On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >>>> For pmd_trans_huge() and pmd_is_migration_entry(), we does following >>>> before return the pmd entry: >>>> >>>> * re-validate pmd entry after PTL >>>> * check PVMW_MIGRATION >>>> * check_pmd() >>>> * handle on pte level if split under us >>>> >>>> But for device-private pmd, we just return after pmd_lock(). >>>> >>>> This may return improper entry, e.g. if we are looking for a migration >>>> entry, device-private entry could still be returned, which leads to data >>>> corruption. >>> >>> I don't thik this is quite clear? >>> >>> How about: >>> >>> If a softleaf entry is present, the existing code simply acquires the >>> PMD lock and returns success even if PVMW_MIGRATION is set (indicating a >>> migration entry is sought), meaning that the caller can incorrectly >>> interpret the entry as something it is not, causing data corruption. >>> >>>> >>>> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration >>>> support device-private entries") by following the same pattern as >>>> pmd_trans_huge() and pmd_is_migration_entry() for device private entry. >>>> >>>> While at it, it cleanups the pmd entry handling in page_vma_mapped_walk(). >>>> >>>> * Instead of handling trans huge/migration entry/device private entry >>>> in a mixed manner, we put each case into its own if condition and >>>> handle with the same pattern. >>>> * Also we grab PTL and make sure pmd is not changed under us after >>>> above check instead of do the check with PTL hold. >>>> * restart the process if pmd is changed under us >>> >>> You're doing quite a bit for a fix and you're putting it all in one place. >>> >>> How about do the fix as 1 patch, and then cleanups as other ones? It helps with >>> review too :) >>> >>> It's a general rule of thumb that if you do more than one of moving, refactoring >>> or changing code, to do them as separate patches so a reviewer/somebody >>> bisecting can clearly separate each. >>> >>> Also PLEASE do not add new functionality (this lock recheck) in a fixes >>> patch. We'll end up backporting new logic that way. >>> >>> Make the fixes bit _minimal_. >> >>To be fair, I asked for this >> >>https://lore.kernel.org/all/2d48ef0d-1110-4a9d-adcb-f701a1ce2cfa@kernel.org/ >> >>But given that Wei mostly used my quick draft without properly checking the >>implications, yeah, let's fix it first separately. > >Sorry, if I misunderstand your point. > >> >>I can then follow up with a proper cleanup. FYI, spent a few days chasing the history here. Dropping my notes in case they save someone else some time reading or refactoring this code :P TL;DR below. For THP PMDs, the lockless pmd_trans_huge() check was only a candidate filter from day one. ace71a19cec5 [1] ("mm: introduce page_vma_mapped_walk()") already had the lock-and-recheck rule: the lockless check only got us into the branch, then the code took pmd_lock() before making any PMD-level decision: if (pmd_trans_huge(*pvmw->pmd)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); ... } else { ... } 616b8371539a [2] ("mm: thp: enable thp migration in generic path") later added PMD migration entries to that same locked branch: if (pmd_trans_huge(*pvmw->pmd) || is_pmd_migration_entry(*pvmw->pmd)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); ... } else { ... } Current code spells that helper as pmd_is_migration_entry(); that came from the later 0ac881efe164 softleaf cleanup, with no functional change intended. 3306d3119cea [3] ("mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd") later made the post-lock PMD value explicit as pmde. Same rule, just less repeated *pvmw->pmd dereferencing under PTL: pmde = READ_ONCE(*pvmw->pmd); if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); pmde = *pvmw->pmd; ... } One extra detail matters here. In ace71a19cec5 [1], the locked THP PMD branch rejected a non-present PMD before the present-THP check: if (pmd_trans_huge(*pvmw->pmd)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); if (!pmd_present(*pvmw->pmd)) return not_found(pvmw); if (likely(pmd_trans_huge(*pvmw->pmd))) { ... } else { ... } } 616b8371539a [2] could not keep that order after adding PMD migration entries, because a PMD migration entry is non-present. So the locked branch was reshaped to check the still-present THP PMD case first, and only then handle the non-present PMD as the PMD migration-entry case: if (pmd_trans_huge(*pvmw->pmd) || is_pmd_migration_entry(*pvmw->pmd)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); if (likely(pmd_trans_huge(*pvmw->pmd))) { ... } else if (!pmd_present(*pvmw->pmd)) { ... } else { ... } } Inside that non-present branch, 616b8371539a [2] made it a PMD-migration-only case: first require THP migration support, then require PVMW_MIGRATION. Otherwise it is not a match: if (pmd_trans_huge(*pvmw->pmd) || is_pmd_migration_entry(*pvmw->pmd)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); if (likely(pmd_trans_huge(*pvmw->pmd))) { ... } else if (!pmd_present(*pvmw->pmd)) { if (thp_migration_supported()) { if (!(pvmw->flags & PVMW_MIGRATION)) return not_found(pvmw); ... } else WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); return not_found(pvmw); } else { ... } } Note that PMD-level swap entries for anonymous THPs were not supported then, and still aren't supported today. After those gates, the entry still had to be a migration entry for the target THP. Functionally, that check goes back to 616b8371539a [2]: if (pmd_trans_huge(*pvmw->pmd) || is_pmd_migration_entry(*pvmw->pmd)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); if (likely(pmd_trans_huge(*pvmw->pmd))) { ... } else if (!pmd_present(*pvmw->pmd)) { if (thp_migration_supported()) { if (!(pvmw->flags & PVMW_MIGRATION)) return not_found(pvmw); if (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) { swp_entry_t entry = pmd_to_swp_entry(*pvmw->pmd); if (migration_entry_to_page(entry) != page) return not_found(pvmw); return true; } } else WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); return not_found(pvmw); } else { ... } } The later commits mostly changed how the same rule is written. e2e1d4076c77 rewrote it into the negative-check style; 2aff7a4755be changed the page comparison to check_pmd() when pvmw moved to pfn/nr_pages; 0d206b5d2e0d switched the PFN extraction to swp_offset_pfn(); and 0ac881efe164 moved the PMD swap-entry helpers over to softleaf helpers. e2e1d4076c77: nested positive style -> negative-check style 2aff7a4755be: page comparison -> check_pmd(PFN range) 0d206b5d2e0d: swp_offset() -> swp_offset_pfn() 0ac881efe164: pmd_to_swp_entry/is_migration_entry -> softleaf helpers For the PMD-mapped THP case, the positive decision was also made under pmd_lock() from day one. In ace71a19cec5 [1], after taking pmd_lock(), the PMD had to still be a THP PMD, the walk had to be a non-PVMW_MIGRATION walk, and the PMD had to map the target THP: if (pmd_trans_huge(*pvmw->pmd)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); ... if (likely(pmd_trans_huge(*pvmw->pmd))) { if (pvmw->flags & PVMW_MIGRATION) return not_found(pvmw); if (pmd_page(*pvmw->pmd) != page) return not_found(pvmw); return true; } else { ... } } else { ... } 2aff7a4755be later changed the target check from pmd_page(...) == page to check_pmd(pmd_pfn(pmde), pvmw), when pvmw moved to pfn/nr_pages. Same rule, but range-based now. The split fallback was there from ace71a19cec5 [1] as well. After taking pmd_lock(), if the locked PMD was no longer a THP PMD, the walker did not make a PMD-level true/not_found decision. It dropped pmd_lock(), cleared pvmw->ptl, and continued at PTE level: if (pmd_trans_huge(*pvmw->pmd)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); ... if (likely(pmd_trans_huge(*pvmw->pmd))) { ... } else { /* THP pmd was split under us: handle on pte level */ spin_unlock(pvmw->ptl); pvmw->ptl = NULL; } } else { ... } if (!map_pte(pvmw)) goto next_pte; 616b8371539a [2] kept that fallback after adding PMD migration entries. e2e1d4076c77 only rearranged the !pmd_present() block; the split fallback still drops pmd_lock() and falls through to map_pte(). Now the outer non-present PMD case. In ace71a19cec5 [1], this was not an explicit else-if branch yet. It was hidden in check_pmd(): static inline bool check_pmd(struct page_vma_mapped_walk *pvmw) { pmd_t pmde; /* * Make sure we don't re-load pmd between present and !trans_huge check. * We need a consistent view. */ pmde = READ_ONCE(*pvmw->pmd); return pmd_present(pmde) && !pmd_trans_huge(pmde); } and the caller was: if (pmd_trans_huge(*pvmw->pmd)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); ... } else { if (!check_pmd(pvmw)) return false; } So outside the THP PMD branch, a non-present PMD failed check_pmd() and ended the walk. a7b100953aa3 [4] ("mm: page_vma_mapped: ensure pmd is loaded with READ_ONCE outside of lock") then made that lockless PMD value explicit in page_vma_mapped_walk(): So this branch means: the lockless PMD read did not look like a THP PMD, did not look like a PMD migration entry, and was non-present. pmde = READ_ONCE(*pvmw->pmd); if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); ... } else if (!pmd_present(pmde)) { return false; } The point of that commit was to use the same READ_ONCE() PMD snapshot for the initial lockless checks, rather than letting the compiler reload or reuse a stale PMD value around check_pmd(). 732ed55823fc [5] ("mm/thp: try_to_unmap() use TTU_SYNC for safe splitting") then added the PVMW_SYNC wait to this same outer non-present PMD branch: pmde = READ_ONCE(*pvmw->pmd); if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); ... } else if (!pmd_present(pmde)) { /* * If PVMW_SYNC, take and drop THP pmd lock so that we * cannot return prematurely, while zap_huge_pmd() has * cleared *pmd but not decremented compound_mapcount(). */ if ((pvmw->flags & PVMW_SYNC) && PageTransCompound(pvmw->page)) { spinlock_t *ptl = pmd_lock(mm, pvmw->pmd); spin_unlock(ptl); } return false; } A lockless non-present PMD was still not a mapping. PVMW_SYNC only changed whether the walker could return immediately: under PVMW_SYNC, it had to take and drop pmd_lock() first, so it would not return while zap_huge_pmd() had cleared *pmd but not yet decremented compound_mapcount(). a9a7504d9bea [6] ("mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes") then changed the end of this branch from return false to step_forward(PMD_SIZE) + continue: pmde = READ_ONCE(*pvmw->pmd); if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); ... } else if (!pmd_present(pmde)) { /* * If PVMW_SYNC, take and drop THP pmd lock so that we * cannot return prematurely, while zap_huge_pmd() has * cleared *pmd but not decremented compound_mapcount(). */ if ((pvmw->flags & PVMW_SYNC) && PageTransCompound(page)) { spinlock_t *ptl = pmd_lock(mm, pvmw->pmd); spin_unlock(ptl); } step_forward(pvmw, PMD_SIZE); continue; } if (!map_pte(pvmw)) goto next_pte; That was about the walk range, not about making a non-present PMD a match. step_forward(pvmw, PMD_SIZE) advances pvmw->address to the next PMD boundary, and continue restarts the walk from there. So this branch only rules out the current PMD-sized slot. pmde = pmdp_get_lockless(pvmw->pmd); if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde) || (pmd_present(pmde) && pmd_devmap(pmde))) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); ... } else if (!pmd_present(pmde)) { /* * If PVMW_SYNC, take and drop THP pmd lock so that we * cannot return prematurely, while zap_huge_pmd() has * cleared *pmd but not decremented compound_mapcount(). */ if ((pvmw->flags & PVMW_SYNC) && thp_vma_suitable_order(vma, pvmw->address, PMD_ORDER) && (pvmw->nr_pages >= HPAGE_PMD_NR)) { spinlock_t *ptl = pmd_lock(mm, pvmw->pmd); spin_unlock(ptl); } step_forward(pvmw, PMD_SIZE); continue; } 2aff7a4755be later converted page_vma_mapped_walk() to pfn/nr_pages, so the PVMW_SYNC condition became VMA/range based. c453d8c7d138 tightened the VMA side to transhuge_vma_suitable(vma, pvmw->address), and 3485b88390b0 later made that helper order-aware, giving the current thp_vma_suitable_order(vma, pvmw->address, PMD_ORDER) spelling. if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde) || (pmd_present(pmde) && pmd_devmap(pmde))) { There was also a temporary devmap wrinkle. 6472f6d2f7d9 added pmd_devmap() here because huge devmap PMDs were not covered by the old pmd_trans_huge() test. 8a6a984c2e0e later removed the explicit pmd_devmap() checks because DAX no longer created pmd_devmap entries, and pXd_trans_huge() covered those mappings. This did not change the outer non-present PMD rule above. 65edfda6f3f2 [7] ("mm/rmap: extend rmap and migration support device-private entries") later added the device-private PMD case into this same outer non-present branch: pmde = pmdp_get_lockless(pvmw->pmd); if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); pmde = *pvmw->pmd; ... } else if (!pmd_present(pmde)) { /* * If PVMW_SYNC, take and drop THP pmd lock so that we * cannot return prematurely, while zap_huge_pmd() has * cleared *pmd but not decremented compound_mapcount(). */ swp_entry_t entry = pmd_to_swp_entry(pmde); if (is_device_private_entry(entry)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); return true; } ... } That changed the old rule for this branch: a lockless non-present PMD was no longer always skipped. If the lockless PMD decoded as device-private, the walker took pmd_lock() and returned true. 0ac881efe164 later converted this code to softleaf helpers, with no functional change intended. So the names changed, but the control flow stayed the same: pmde = pmdp_get_lockless(pvmw->pmd); if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); ... } else if (!pmd_present(pmde)) { const softleaf_t entry = softleaf_from_pmd(pmde); if (softleaf_is_device_private(entry)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); return true; } if ((pvmw->flags & PVMW_SYNC) && thp_vma_suitable_order(vma, pvmw->address, PMD_ORDER) && (pvmw->nr_pages >= HPAGE_PMD_NR)) sync_with_folio_pmd_zap(mm, pvmw->pmd); step_forward(pvmw, PMD_SIZE); continue; } 514c2fe9927e later moved the PVMW_SYNC take/drop pmd_lock() helper into sync_with_folio_pmd_zap(). Note that PMD device-private handling still did not line up with the PTE side, as I pointed out earlier[8]. [1] https://lore.kernel.org/all/20170129173858.45174-3-kirill.shutemov@linux.intel.com/T/#u [2] https://lore.kernel.org/all/20170717193955.20207-6-zi.yan@sent.com/ [3] https://lore.kernel.org/all/53fbc9d-891e-46b2-cb4b-468c3b19238e@google.com/ [4] https://lore.kernel.org/all/1507222630-5839-1-git-send-email-will.deacon@arm.com/ [5] https://lore.kernel.org/linux-mm/20210616012353._aWHpXxeZ%25akpm@linux-foundation.org/ [6] https://lore.kernel.org/all/fedb8632-1798-de42-f39e-873551d5bc81@google.com/ [7] https://lore.kernel.org/all/20251001065707.920170-5-balbirs@nvidia.com/ [8] https://lore.kernel.org/all/20260619121909.90510-1-lance.yang@linux.dev/ [...] Cheers, Lance ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-20 2:13 ` Wei Yang 2026-06-22 11:21 ` Lance Yang @ 2026-06-22 11:33 ` David Hildenbrand (Arm) 1 sibling, 0 replies; 16+ messages in thread From: David Hildenbrand (Arm) @ 2026-06-22 11:33 UTC (permalink / raw) To: Wei Yang Cc: Lorenzo Stoakes, akpm, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable > >> >> I can then follow up with a proper cleanup. >> > > I would like to do a followup cleanup for this, may I have this chance? Let's fix it first and then decide how to proceed with this code here. I get the feeling that some more of this code might deserve a cleanup. -- Cheers, David ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-19 10:44 ` Lorenzo Stoakes 2026-06-19 10:48 ` David Hildenbrand (Arm) @ 2026-06-20 2:11 ` Wei Yang 1 sibling, 0 replies; 16+ messages in thread From: Wei Yang @ 2026-06-20 2:11 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Wei Yang, akpm, david, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable On Fri, Jun 19, 2026 at 11:44:13AM +0100, Lorenzo Stoakes wrote: >-cc wrong email > >On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >> For pmd_trans_huge() and pmd_is_migration_entry(), we does following >> before return the pmd entry: >> >> * re-validate pmd entry after PTL >> * check PVMW_MIGRATION >> * check_pmd() >> * handle on pte level if split under us >> >> But for device-private pmd, we just return after pmd_lock(). >> >> This may return improper entry, e.g. if we are looking for a migration >> entry, device-private entry could still be returned, which leads to data >> corruption. > >I don't thik this is quite clear? > >How about: > > If a softleaf entry is present, the existing code simply acquires the How about: If a softleaf entry is present, e.g. device-private pmd, > PMD lock and returns success even if PVMW_MIGRATION is set (indicating a > migration entry is sought), meaning that the caller can incorrectly > interpret the entry as something it is not, causing data corruption. > Looks better, thanks. >> >> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration >> support device-private entries") by following the same pattern as >> pmd_trans_huge() and pmd_is_migration_entry() for device private entry. >> >> While at it, it cleanups the pmd entry handling in page_vma_mapped_walk(). >> >> * Instead of handling trans huge/migration entry/device private entry >> in a mixed manner, we put each case into its own if condition and >> handle with the same pattern. >> * Also we grab PTL and make sure pmd is not changed under us after >> above check instead of do the check with PTL hold. >> * restart the process if pmd is changed under us > >You're doing quite a bit for a fix and you're putting it all in one place. > >How about do the fix as 1 patch, and then cleanups as other ones? It helps with >review too :) > Got it. >It's a general rule of thumb that if you do more than one of moving, refactoring >or changing code, to do them as separate patches so a reviewer/somebody >bisecting can clearly separate each. > >Also PLEASE do not add new functionality (this lock recheck) in a fixes >patch. We'll end up backporting new logic that way. > >Make the fixes bit _minimal_. > >I think in general Andrew prefers separate fixes patches so I'd just make the >_minimal_ change that fixes this for the backport, and the cleanup stuff as a >separate series. > >> >> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries") > >Hmm seems the device private stuff has had a rocky road of late! > >I wonder if we need some more test coverage on this? > >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com> >> Suggested-by: David Hildenbrand <david@kernel.org> >> Cc: David Hildenbrand <david@kernel.org> >> Cc: Balbir Singh <balbirs@nvidia.com> >> Cc: SeongJae Park <sj@kernel.org> >> Cc: Zi Yan <ziy@nvidia.com> >> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > >Annoying nag: You sent to my correct email ljs@kernel.org (thanks!) but also >cc'd the incorrect one, please only send to ljs@kernel.org thanks :) > Yeah, I pasted it from commit log of the fix commit. But get your kernel.org email from get_maintainer.pl... Will pay attention >> Cc: <stable@vger.kernel.org> > >Be better to just have this with the Fixes tag, Andrew adds the Cc's from the >actual cc- list anyway. > OK, will move it close to Fixes tag. -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2026-06-22 11:33 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260616063436.20455-1-richard.weiyang@gmail.com>
2026-06-16 12:30 ` [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd Lance Yang
2026-06-16 23:50 ` Wei Yang
2026-06-17 2:32 ` Lance Yang
2026-06-17 8:18 ` Wei Yang
2026-06-19 2:30 ` Wei Yang
2026-06-19 12:19 ` Lance Yang
2026-06-20 2:02 ` Wei Yang
[not found] ` <666dc40b-e37a-46eb-af55-7a81bc1643f1@kernel.org>
2026-06-17 2:11 ` Balbir Singh
2026-06-17 3:14 ` Lance Yang
2026-06-19 10:44 ` Lorenzo Stoakes
2026-06-19 10:48 ` David Hildenbrand (Arm)
2026-06-19 11:04 ` Lorenzo Stoakes
2026-06-20 2:13 ` Wei Yang
2026-06-22 11:21 ` Lance Yang
2026-06-22 11:33 ` David Hildenbrand (Arm)
2026-06-20 2:11 ` Wei Yang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox