* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd [not found] <20260616063436.20455-1-richard.weiyang@gmail.com> @ 2026-06-16 12:30 ` Lance Yang 2026-06-16 23:50 ` Wei Yang [not found] ` <666dc40b-e37a-46eb-af55-7a81bc1643f1@kernel.org> 2026-06-19 10:44 ` Lorenzo Stoakes 1 sibling, 2 replies; 14+ messages in thread From: Lance Yang @ 2026-06-16 12:30 UTC (permalink / raw) To: richard.weiyang Cc: akpm, david, ljs, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, lorenzo.stoakes, stable, Lance Yang On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: [...] >diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >index 2ccbabfb2cc1..21635fab209c 100644 >--- a/mm/page_vma_mapped.c >+++ b/mm/page_vma_mapped.c >@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > */ > pmde = pmdp_get_lockless(pvmw->pmd); > >- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >- pmde = *pvmw->pmd; >- if (!pmd_present(pmde)) { >- softleaf_t entry; >- >- if (!thp_migration_supported() || >- !(pvmw->flags & PVMW_MIGRATION)) >- return not_found(pvmw); >- entry = softleaf_from_pmd(pmde); >- >- if (!softleaf_is_migration(entry) || >- !check_pmd(softleaf_to_pfn(entry), pvmw)) >- return not_found(pvmw); >- return true; >- } >- if (likely(pmd_trans_huge(pmde))) { >- if (pvmw->flags & PVMW_MIGRATION) >- return not_found(pvmw); >- if (!check_pmd(pmd_pfn(pmde), pvmw)) >- return not_found(pvmw); >- return true; >- } >- /* THP pmd was split under us: handle on pte level */ >- spin_unlock(pvmw->ptl); >- pvmw->ptl = NULL; >- } else if (!pmd_present(pmde)) { >- const softleaf_t entry = softleaf_from_pmd(pmde); >- >- if (softleaf_is_device_private(entry)) { >- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >- return true; >- } >+ if (pmd_present(pmde)) { >+ if (!pmd_leaf(pmde)) >+ goto pte_table; >+ if (pvmw->flags & PVMW_MIGRATION) >+ return not_found(pvmw); >+ if (!check_pmd(pmd_pfn(pmde), pvmw)) >+ return not_found(pvmw); >+ } else if (pmd_is_migration_entry(pmde)) { >+ softleaf_t entry = softleaf_from_pmd(pmde); >+ >+ if (!(pvmw->flags & PVMW_MIGRATION)) >+ return not_found(pvmw); Looked at history a bit, and I wonder if this changed something old here ... Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD migration handling took PTL before doing PVMW_MIGRATION/PFN checks, including not_found() cases. So lockless PMD read was just a filter ... With this fix, true case gets final pmd_same() check, but this not_found() case happens before taking PTL. So a !PVMW_MIGRATION walker could race with someone, e.g. remove_migration_pmd(): we make the not_found() decision from old PMD value that still says "migration", while real *pvmw->pmd may already be present again. We return without ever taking PTL :) Not sure about practical fallout, but should these PMD-level not_found() cases also take PTL and restart if PMD changed? >+ if (!check_pmd(softleaf_to_pfn(entry), pvmw)) >+ return not_found(pvmw); >+ } else if (pmd_is_device_private_entry(pmde)) { >+ softleaf_t entry = softleaf_from_pmd(pmde); > >+ if (pvmw->flags & PVMW_MIGRATION) >+ return not_found(pvmw); >+ if (!check_pmd(softleaf_to_pfn(entry), pvmw)) >+ return not_found(pvmw); >+ } else { > if ((pvmw->flags & PVMW_SYNC) && > thp_vma_suitable_order(vma, pvmw->address, > PMD_ORDER) && >@@ -286,6 +274,15 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > step_forward(pvmw, PMD_SIZE); > continue; > } >+ >+ /* Double-check under PTL that the PMD didn't change. */ >+ pvmw->ptl = pmd_lock(mm, pvmw->pmd); >+ if (pmd_same(pmde, pmdp_get(pvmw->pmd))) >+ return true; >+ spin_unlock(pvmw->ptl); >+ pvmw->ptl = NULL; >+ goto restart; >+pte_table: > if (!map_pte(pvmw, &pmde, &ptl)) { > if (!pvmw->pte) > goto restart; >-- >2.34.1 > Cheers, Lance ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-16 12:30 ` [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd Lance Yang @ 2026-06-16 23:50 ` Wei Yang 2026-06-17 2:32 ` Lance Yang [not found] ` <666dc40b-e37a-46eb-af55-7a81bc1643f1@kernel.org> 1 sibling, 1 reply; 14+ messages in thread From: Wei Yang @ 2026-06-16 23:50 UTC (permalink / raw) To: Lance Yang Cc: richard.weiyang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, lorenzo.stoakes, stable On Tue, Jun 16, 2026 at 08:30:01PM +0800, Lance Yang wrote: > >On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >[...] >>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >>index 2ccbabfb2cc1..21635fab209c 100644 >>--- a/mm/page_vma_mapped.c >>+++ b/mm/page_vma_mapped.c >>@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) >> */ >> pmde = pmdp_get_lockless(pvmw->pmd); >> >>- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>- pmde = *pvmw->pmd; >>- if (!pmd_present(pmde)) { >>- softleaf_t entry; >>- >>- if (!thp_migration_supported() || >>- !(pvmw->flags & PVMW_MIGRATION)) >>- return not_found(pvmw); >>- entry = softleaf_from_pmd(pmde); >>- >>- if (!softleaf_is_migration(entry) || >>- !check_pmd(softleaf_to_pfn(entry), pvmw)) >>- return not_found(pvmw); >>- return true; >>- } >>- if (likely(pmd_trans_huge(pmde))) { >>- if (pvmw->flags & PVMW_MIGRATION) >>- return not_found(pvmw); >>- if (!check_pmd(pmd_pfn(pmde), pvmw)) >>- return not_found(pvmw); >>- return true; >>- } >>- /* THP pmd was split under us: handle on pte level */ >>- spin_unlock(pvmw->ptl); >>- pvmw->ptl = NULL; >>- } else if (!pmd_present(pmde)) { >>- const softleaf_t entry = softleaf_from_pmd(pmde); >>- >>- if (softleaf_is_device_private(entry)) { >>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>- return true; >>- } >>+ if (pmd_present(pmde)) { >>+ if (!pmd_leaf(pmde)) >>+ goto pte_table; >>+ if (pvmw->flags & PVMW_MIGRATION) >>+ return not_found(pvmw); >>+ if (!check_pmd(pmd_pfn(pmde), pvmw)) >>+ return not_found(pvmw); >>+ } else if (pmd_is_migration_entry(pmde)) { >>+ softleaf_t entry = softleaf_from_pmd(pmde); >>+ >>+ if (!(pvmw->flags & PVMW_MIGRATION)) >>+ return not_found(pvmw); > >Looked at history a bit, and I wonder if this changed something old >here ... > >Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD >migration handling took PTL before doing PVMW_MIGRATION/PFN checks, >including not_found() cases. So lockless PMD read was just a filter ... > >With this fix, true case gets final pmd_same() check, but this >not_found() case happens before taking PTL. > >So a !PVMW_MIGRATION walker could race with someone, e.g. >remove_migration_pmd(): we make the not_found() decision from old PMD >value that still says "migration", while real *pvmw->pmd may already be >present again. We return without ever taking PTL :) > Hi, Lance Thanks for take a look. I am trying to understand the scenario you mentioned. Let's say A migrate a pmd and B want to unmap the pmd. A B try to migrate a pmd pmd is set to migration entry unmap the pmd ... managed to finish migration ...still see migration entry, so skipped and unmap fail Would this be a timing case? Even B grab the PTL, it still could see migration entry if B visit pmd before A finish migration. Maybe I miss something, look forward your insight. >Not sure about practical fallout, but should these PMD-level not_found() >cases also take PTL and restart if PMD changed? > -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-16 23:50 ` Wei Yang @ 2026-06-17 2:32 ` Lance Yang 2026-06-17 8:18 ` Wei Yang 0 siblings, 1 reply; 14+ messages in thread From: Lance Yang @ 2026-06-17 2:32 UTC (permalink / raw) To: richard.weiyang Cc: lance.yang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, lorenzo.stoakes, stable On Tue, Jun 16, 2026 at 11:50:22PM +0000, Wei Yang wrote: >On Tue, Jun 16, 2026 at 08:30:01PM +0800, Lance Yang wrote: >> >>On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >>[...] >>>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >>>index 2ccbabfb2cc1..21635fab209c 100644 >>>--- a/mm/page_vma_mapped.c >>>+++ b/mm/page_vma_mapped.c >>>@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) >>> */ >>> pmde = pmdp_get_lockless(pvmw->pmd); >>> >>>- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>- pmde = *pvmw->pmd; >>>- if (!pmd_present(pmde)) { >>>- softleaf_t entry; >>>- >>>- if (!thp_migration_supported() || >>>- !(pvmw->flags & PVMW_MIGRATION)) >>>- return not_found(pvmw); >>>- entry = softleaf_from_pmd(pmde); >>>- >>>- if (!softleaf_is_migration(entry) || >>>- !check_pmd(softleaf_to_pfn(entry), pvmw)) >>>- return not_found(pvmw); >>>- return true; >>>- } >>>- if (likely(pmd_trans_huge(pmde))) { >>>- if (pvmw->flags & PVMW_MIGRATION) >>>- return not_found(pvmw); >>>- if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>- return not_found(pvmw); >>>- return true; >>>- } >>>- /* THP pmd was split under us: handle on pte level */ >>>- spin_unlock(pvmw->ptl); >>>- pvmw->ptl = NULL; >>>- } else if (!pmd_present(pmde)) { >>>- const softleaf_t entry = softleaf_from_pmd(pmde); >>>- >>>- if (softleaf_is_device_private(entry)) { >>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>- return true; >>>- } >>>+ if (pmd_present(pmde)) { >>>+ if (!pmd_leaf(pmde)) >>>+ goto pte_table; >>>+ if (pvmw->flags & PVMW_MIGRATION) >>>+ return not_found(pvmw); >>>+ if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>+ return not_found(pvmw); >>>+ } else if (pmd_is_migration_entry(pmde)) { >>>+ softleaf_t entry = softleaf_from_pmd(pmde); >>>+ >>>+ if (!(pvmw->flags & PVMW_MIGRATION)) >>>+ return not_found(pvmw); >> >>Looked at history a bit, and I wonder if this changed something old >>here ... >> >>Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD >>migration handling took PTL before doing PVMW_MIGRATION/PFN checks, >>including not_found() cases. So lockless PMD read was just a filter ... >> >>With this fix, true case gets final pmd_same() check, but this >>not_found() case happens before taking PTL. >> >>So a !PVMW_MIGRATION walker could race with someone, e.g. >>remove_migration_pmd(): we make the not_found() decision from old PMD >>value that still says "migration", while real *pvmw->pmd may already be >>present again. We return without ever taking PTL :) >> > >Hi, Lance > >Thanks for take a look. > >I am trying to understand the scenario you mentioned. Let's say A migrate a >pmd and B want to unmap the pmd. > > A B > > try to migrate a pmd > pmd is set to migration entry > unmap the pmd ... > managed to finish migration > ...still see migration entry, > so skipped and unmap fail > >Would this be a timing case? Even B grab the PTL, it still could see migration >entry if B visit pmd before A finish migration. > >Maybe I miss something, look forward your insight. Right, seeing migration entry while migration is still ongoing is fine. What I meant was this ordering: CPU 0: pmde = pmdp_get_lockless(...); /* migration */ CPU 1: remove_migration_pmd() restores PMD to present CPU 0: returns not_found() from old pmde, without ever taking PTL and rechecking *pvmw->pmd So issue is not seeing migration entry itself, but making final not_found() decision from stale lockless PMD value ... Before this patch, PMD migration case took PTL before making that decision ... >>Not sure about practical fallout, but should these PMD-level not_found() >>cases also take PTL and restart if PMD changed? >> > >-- >Wei Yang >Help you, Help me > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-17 2:32 ` Lance Yang @ 2026-06-17 8:18 ` Wei Yang 2026-06-19 2:30 ` Wei Yang 0 siblings, 1 reply; 14+ messages in thread From: Wei Yang @ 2026-06-17 8:18 UTC (permalink / raw) To: Lance Yang Cc: richard.weiyang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, lorenzo.stoakes, stable On Wed, Jun 17, 2026 at 10:32:11AM +0800, Lance Yang wrote: > >On Tue, Jun 16, 2026 at 11:50:22PM +0000, Wei Yang wrote: >>On Tue, Jun 16, 2026 at 08:30:01PM +0800, Lance Yang wrote: >>> >>>On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >>>[...] >>>>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >>>>index 2ccbabfb2cc1..21635fab209c 100644 >>>>--- a/mm/page_vma_mapped.c >>>>+++ b/mm/page_vma_mapped.c >>>>@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) >>>> */ >>>> pmde = pmdp_get_lockless(pvmw->pmd); >>>> >>>>- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>- pmde = *pvmw->pmd; >>>>- if (!pmd_present(pmde)) { >>>>- softleaf_t entry; >>>>- >>>>- if (!thp_migration_supported() || >>>>- !(pvmw->flags & PVMW_MIGRATION)) >>>>- return not_found(pvmw); >>>>- entry = softleaf_from_pmd(pmde); >>>>- >>>>- if (!softleaf_is_migration(entry) || >>>>- !check_pmd(softleaf_to_pfn(entry), pvmw)) >>>>- return not_found(pvmw); >>>>- return true; >>>>- } >>>>- if (likely(pmd_trans_huge(pmde))) { >>>>- if (pvmw->flags & PVMW_MIGRATION) >>>>- return not_found(pvmw); >>>>- if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>- return not_found(pvmw); >>>>- return true; >>>>- } >>>>- /* THP pmd was split under us: handle on pte level */ >>>>- spin_unlock(pvmw->ptl); >>>>- pvmw->ptl = NULL; >>>>- } else if (!pmd_present(pmde)) { >>>>- const softleaf_t entry = softleaf_from_pmd(pmde); >>>>- >>>>- if (softleaf_is_device_private(entry)) { >>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>- return true; >>>>- } >>>>+ if (pmd_present(pmde)) { >>>>+ if (!pmd_leaf(pmde)) >>>>+ goto pte_table; >>>>+ if (pvmw->flags & PVMW_MIGRATION) >>>>+ return not_found(pvmw); >>>>+ if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>+ return not_found(pvmw); >>>>+ } else if (pmd_is_migration_entry(pmde)) { >>>>+ softleaf_t entry = softleaf_from_pmd(pmde); >>>>+ >>>>+ if (!(pvmw->flags & PVMW_MIGRATION)) >>>>+ return not_found(pvmw); >>> >>>Looked at history a bit, and I wonder if this changed something old >>>here ... >>> >>>Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD >>>migration handling took PTL before doing PVMW_MIGRATION/PFN checks, >>>including not_found() cases. So lockless PMD read was just a filter ... >>> >>>With this fix, true case gets final pmd_same() check, but this >>>not_found() case happens before taking PTL. >>> >>>So a !PVMW_MIGRATION walker could race with someone, e.g. >>>remove_migration_pmd(): we make the not_found() decision from old PMD >>>value that still says "migration", while real *pvmw->pmd may already be >>>present again. We return without ever taking PTL :) >>> >> >>Hi, Lance >> >>Thanks for take a look. >> >>I am trying to understand the scenario you mentioned. Let's say A migrate a >>pmd and B want to unmap the pmd. >> >> A B >> >> try to migrate a pmd >> pmd is set to migration entry >> unmap the pmd ... >> managed to finish migration >> ...still see migration entry, >> so skipped and unmap fail >> >>Would this be a timing case? Even B grab the PTL, it still could see migration >>entry if B visit pmd before A finish migration. >> >>Maybe I miss something, look forward your insight. > >Right, seeing migration entry while migration is still ongoing is fine. > >What I meant was this ordering: > > CPU 0: pmde = pmdp_get_lockless(...); /* migration */ > CPU 1: remove_migration_pmd() restores PMD to present > CPU 0: returns not_found() from old pmde, without ever taking PTL and > rechecking *pvmw->pmd > >So issue is not seeing migration entry itself, but making final >not_found() decision from stale lockless PMD value ... > >Before this patch, PMD migration case took PTL before making that >decision ... > Yes, this patch changes the decision making condition for pmd entry. Thanks for pointing out. Hmm... I took another look into current pte handling and find for pte entry, we did two phase check: * map_pte() without ptl * check_pte() with ptl While check_pte() do extra pfn range check, map_pte() doesn't. This means for pte entry, we may face the same situation as you describe: make the decision before grab PTL. Till now, it looks reasonable. But one thing jumped at me, PVMW_SYNC. When this flag is specified, all check is done under PTL. But now for pmd entry, we don't have a chance to do so. And as the comment says in try_to_migrate_one() /* * When racing against e.g. zap_pte_range() on another cpu, * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(), * try_to_migrate() may return before folio_mapped() has become false, * if page table locking is skipped: use TTU_SYNC to wait for that. */ I tracked down to commit a98a2f0c8ce1 ('mm/rmap: split migration into its own function'), but not getting more detail on reasoning. Not fully understand it yet, but it seems there is some race between migration and unmap which is protected by PTL? Will look into this to get more detail. >>>Not sure about practical fallout, but should these PMD-level not_found() >>>cases also take PTL and restart if PMD changed? >>> >> >>-- >>Wei Yang >>Help you, Help me >> -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-17 8:18 ` Wei Yang @ 2026-06-19 2:30 ` Wei Yang 2026-06-19 12:19 ` Lance Yang 0 siblings, 1 reply; 14+ messages in thread From: Wei Yang @ 2026-06-19 2:30 UTC (permalink / raw) To: Wei Yang Cc: Lance Yang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, lorenzo.stoakes, stable On Wed, Jun 17, 2026 at 08:18:15AM +0000, Wei Yang wrote: >On Wed, Jun 17, 2026 at 10:32:11AM +0800, Lance Yang wrote: >> >>On Tue, Jun 16, 2026 at 11:50:22PM +0000, Wei Yang wrote: >>>On Tue, Jun 16, 2026 at 08:30:01PM +0800, Lance Yang wrote: >>>> >>>>On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >>>>[...] >>>>>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >>>>>index 2ccbabfb2cc1..21635fab209c 100644 >>>>>--- a/mm/page_vma_mapped.c >>>>>+++ b/mm/page_vma_mapped.c >>>>>@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) >>>>> */ >>>>> pmde = pmdp_get_lockless(pvmw->pmd); >>>>> >>>>>- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >>>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>>- pmde = *pvmw->pmd; >>>>>- if (!pmd_present(pmde)) { >>>>>- softleaf_t entry; >>>>>- >>>>>- if (!thp_migration_supported() || >>>>>- !(pvmw->flags & PVMW_MIGRATION)) >>>>>- return not_found(pvmw); >>>>>- entry = softleaf_from_pmd(pmde); >>>>>- >>>>>- if (!softleaf_is_migration(entry) || >>>>>- !check_pmd(softleaf_to_pfn(entry), pvmw)) >>>>>- return not_found(pvmw); >>>>>- return true; >>>>>- } >>>>>- if (likely(pmd_trans_huge(pmde))) { >>>>>- if (pvmw->flags & PVMW_MIGRATION) >>>>>- return not_found(pvmw); >>>>>- if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>>- return not_found(pvmw); >>>>>- return true; >>>>>- } >>>>>- /* THP pmd was split under us: handle on pte level */ >>>>>- spin_unlock(pvmw->ptl); >>>>>- pvmw->ptl = NULL; >>>>>- } else if (!pmd_present(pmde)) { >>>>>- const softleaf_t entry = softleaf_from_pmd(pmde); >>>>>- >>>>>- if (softleaf_is_device_private(entry)) { >>>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>>- return true; >>>>>- } >>>>>+ if (pmd_present(pmde)) { >>>>>+ if (!pmd_leaf(pmde)) >>>>>+ goto pte_table; >>>>>+ if (pvmw->flags & PVMW_MIGRATION) >>>>>+ return not_found(pvmw); >>>>>+ if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>>+ return not_found(pvmw); >>>>>+ } else if (pmd_is_migration_entry(pmde)) { >>>>>+ softleaf_t entry = softleaf_from_pmd(pmde); >>>>>+ >>>>>+ if (!(pvmw->flags & PVMW_MIGRATION)) >>>>>+ return not_found(pvmw); >>>> >>>>Looked at history a bit, and I wonder if this changed something old >>>>here ... >>>> >>>>Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD >>>>migration handling took PTL before doing PVMW_MIGRATION/PFN checks, >>>>including not_found() cases. So lockless PMD read was just a filter ... >>>> >>>>With this fix, true case gets final pmd_same() check, but this >>>>not_found() case happens before taking PTL. >>>> >>>>So a !PVMW_MIGRATION walker could race with someone, e.g. >>>>remove_migration_pmd(): we make the not_found() decision from old PMD >>>>value that still says "migration", while real *pvmw->pmd may already be >>>>present again. We return without ever taking PTL :) >>>> >>> >>>Hi, Lance >>> >>>Thanks for take a look. >>> >>>I am trying to understand the scenario you mentioned. Let's say A migrate a >>>pmd and B want to unmap the pmd. >>> >>> A B >>> >>> try to migrate a pmd >>> pmd is set to migration entry >>> unmap the pmd ... >>> managed to finish migration >>> ...still see migration entry, >>> so skipped and unmap fail >>> >>>Would this be a timing case? Even B grab the PTL, it still could see migration >>>entry if B visit pmd before A finish migration. >>> >>>Maybe I miss something, look forward your insight. >> >>Right, seeing migration entry while migration is still ongoing is fine. >> >>What I meant was this ordering: >> >> CPU 0: pmde = pmdp_get_lockless(...); /* migration */ >> CPU 1: remove_migration_pmd() restores PMD to present >> CPU 0: returns not_found() from old pmde, without ever taking PTL and >> rechecking *pvmw->pmd >> >>So issue is not seeing migration entry itself, but making final >>not_found() decision from stale lockless PMD value ... >> >>Before this patch, PMD migration case took PTL before making that >>decision ... >> > >Yes, this patch changes the decision making condition for pmd entry. Thanks >for pointing out. > >Hmm... I took another look into current pte handling and find for pte entry, >we did two phase check: > > * map_pte() without ptl > * check_pte() with ptl > >While check_pte() do extra pfn range check, map_pte() doesn't. > >This means for pte entry, we may face the same situation as you describe: >make the decision before grab PTL. Till now, it looks reasonable. > >But one thing jumped at me, PVMW_SYNC. When this flag is specified, all check >is done under PTL. But now for pmd entry, we don't have a chance to do so. > >And as the comment says in try_to_migrate_one() > > /* > * When racing against e.g. zap_pte_range() on another cpu, > * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(), > * try_to_migrate() may return before folio_mapped() has become false, > * if page table locking is skipped: use TTU_SYNC to wait for that. > */ > >I tracked down to commit a98a2f0c8ce1 ('mm/rmap: split migration into its own >function'), but not getting more detail on reasoning. Not fully understand it >yet, but it seems there is some race between migration and unmap which is >protected by PTL? > >Will look into this to get more detail. > After going through the history, I found this: commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 Author: Hugh Dickins <hughd@google.com> Date: Tue Jun 15 18:23:53 2021 -0700 mm/thp: try_to_unmap() use TTU_SYNC for safe splitting This one fix the race mentioned above: we expect mapcount is 0, but is not. IIUC, if we apply the change in this patch, the affected case is pmd_is_migration_entry(). In case someone else has cleared it but not update mapcount yet, try_to_migrate() would return before folio_mapped() is false. Thanks Lance for raise the question. If above analysis is true, I haven't got a neat way to take this into consideration. BTW, for a fix, I am thinking to keep it simple and direct. So how about leave the refactor as a followup cleanup? -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-19 2:30 ` Wei Yang @ 2026-06-19 12:19 ` Lance Yang 2026-06-20 2:02 ` Wei Yang 0 siblings, 1 reply; 14+ messages in thread From: Lance Yang @ 2026-06-19 12:19 UTC (permalink / raw) To: richard.weiyang Cc: lance.yang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable On Fri, Jun 19, 2026 at 02:30:25AM +0000, Wei Yang wrote: >On Wed, Jun 17, 2026 at 08:18:15AM +0000, Wei Yang wrote: >>On Wed, Jun 17, 2026 at 10:32:11AM +0800, Lance Yang wrote: >>> >>>On Tue, Jun 16, 2026 at 11:50:22PM +0000, Wei Yang wrote: >>>>On Tue, Jun 16, 2026 at 08:30:01PM +0800, Lance Yang wrote: >>>>> >>>>>On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >>>>>[...] >>>>>>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >>>>>>index 2ccbabfb2cc1..21635fab209c 100644 >>>>>>--- a/mm/page_vma_mapped.c >>>>>>+++ b/mm/page_vma_mapped.c >>>>>>@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) >>>>>> */ >>>>>> pmde = pmdp_get_lockless(pvmw->pmd); >>>>>> >>>>>>- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >>>>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>>>- pmde = *pvmw->pmd; >>>>>>- if (!pmd_present(pmde)) { >>>>>>- softleaf_t entry; >>>>>>- >>>>>>- if (!thp_migration_supported() || >>>>>>- !(pvmw->flags & PVMW_MIGRATION)) >>>>>>- return not_found(pvmw); >>>>>>- entry = softleaf_from_pmd(pmde); >>>>>>- >>>>>>- if (!softleaf_is_migration(entry) || >>>>>>- !check_pmd(softleaf_to_pfn(entry), pvmw)) >>>>>>- return not_found(pvmw); >>>>>>- return true; >>>>>>- } >>>>>>- if (likely(pmd_trans_huge(pmde))) { >>>>>>- if (pvmw->flags & PVMW_MIGRATION) >>>>>>- return not_found(pvmw); >>>>>>- if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>>>- return not_found(pvmw); >>>>>>- return true; >>>>>>- } >>>>>>- /* THP pmd was split under us: handle on pte level */ >>>>>>- spin_unlock(pvmw->ptl); >>>>>>- pvmw->ptl = NULL; >>>>>>- } else if (!pmd_present(pmde)) { >>>>>>- const softleaf_t entry = softleaf_from_pmd(pmde); >>>>>>- >>>>>>- if (softleaf_is_device_private(entry)) { >>>>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>>>- return true; >>>>>>- } >>>>>>+ if (pmd_present(pmde)) { >>>>>>+ if (!pmd_leaf(pmde)) >>>>>>+ goto pte_table; >>>>>>+ if (pvmw->flags & PVMW_MIGRATION) >>>>>>+ return not_found(pvmw); >>>>>>+ if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>>>+ return not_found(pvmw); >>>>>>+ } else if (pmd_is_migration_entry(pmde)) { >>>>>>+ softleaf_t entry = softleaf_from_pmd(pmde); >>>>>>+ >>>>>>+ if (!(pvmw->flags & PVMW_MIGRATION)) >>>>>>+ return not_found(pvmw); >>>>> >>>>>Looked at history a bit, and I wonder if this changed something old >>>>>here ... >>>>> >>>>>Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD >>>>>migration handling took PTL before doing PVMW_MIGRATION/PFN checks, >>>>>including not_found() cases. So lockless PMD read was just a filter ... >>>>> >>>>>With this fix, true case gets final pmd_same() check, but this >>>>>not_found() case happens before taking PTL. >>>>> >>>>>So a !PVMW_MIGRATION walker could race with someone, e.g. >>>>>remove_migration_pmd(): we make the not_found() decision from old PMD >>>>>value that still says "migration", while real *pvmw->pmd may already be >>>>>present again. We return without ever taking PTL :) >>>>> >>>> >>>>Hi, Lance >>>> >>>>Thanks for take a look. >>>> >>>>I am trying to understand the scenario you mentioned. Let's say A migrate a >>>>pmd and B want to unmap the pmd. >>>> >>>> A B >>>> >>>> try to migrate a pmd >>>> pmd is set to migration entry >>>> unmap the pmd ... >>>> managed to finish migration >>>> ...still see migration entry, >>>> so skipped and unmap fail >>>> >>>>Would this be a timing case? Even B grab the PTL, it still could see migration >>>>entry if B visit pmd before A finish migration. >>>> >>>>Maybe I miss something, look forward your insight. >>> >>>Right, seeing migration entry while migration is still ongoing is fine. >>> >>>What I meant was this ordering: >>> >>> CPU 0: pmde = pmdp_get_lockless(...); /* migration */ >>> CPU 1: remove_migration_pmd() restores PMD to present >>> CPU 0: returns not_found() from old pmde, without ever taking PTL and >>> rechecking *pvmw->pmd >>> >>>So issue is not seeing migration entry itself, but making final >>>not_found() decision from stale lockless PMD value ... >>> >>>Before this patch, PMD migration case took PTL before making that >>>decision ... >>> >> >>Yes, this patch changes the decision making condition for pmd entry. Thanks >>for pointing out. >> >>Hmm... I took another look into current pte handling and find for pte entry, >>we did two phase check: >> >> * map_pte() without ptl >> * check_pte() with ptl >> >>While check_pte() do extra pfn range check, map_pte() doesn't. >> >>This means for pte entry, we may face the same situation as you describe: >>make the decision before grab PTL. Till now, it looks reasonable. >> >>But one thing jumped at me, PVMW_SYNC. When this flag is specified, all check >>is done under PTL. But now for pmd entry, we don't have a chance to do so. >> >>And as the comment says in try_to_migrate_one() >> >> /* >> * When racing against e.g. zap_pte_range() on another cpu, >> * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(), >> * try_to_migrate() may return before folio_mapped() has become false, >> * if page table locking is skipped: use TTU_SYNC to wait for that. >> */ >> >>I tracked down to commit a98a2f0c8ce1 ('mm/rmap: split migration into its own >>function'), but not getting more detail on reasoning. Not fully understand it >>yet, but it seems there is some race between migration and unmap which is >>protected by PTL? >> >>Will look into this to get more detail. >> > >After going through the history, I found this: > > commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 > Author: Hugh Dickins <hughd@google.com> > Date: Tue Jun 15 18:23:53 2021 -0700 > > mm/thp: try_to_unmap() use TTU_SYNC for safe splitting > >This one fix the race mentioned above: we expect mapcount is 0, but is not. Cool, thanks! I do want to spend more time on this refactor. It is touching some subtle page_vma_mapped_walk() rules, so I don't want to skim and guess ... One case I can pin down now is device-private: the PTE side gives us a clear rule to compare against :) On the PTE side: 1) PVMW_SYNC set, PVMW_MIGRATION set map_pte() uses pte_offset_map_lock(), so it takes PTL first. check_pte() then runs under PTL. Since PVMW_MIGRATION is set, check_pte() requires a migration entry, so device-private is rejected. 2) PVMW_SYNC set, PVMW_MIGRATION clear map_pte() takes PTL first. check_pte() then runs under PTL. Since PVMW_MIGRATION is clear, device-private can be a normal mapping, but check_pte() still checks entry type and PFN range. 3) PVMW_SYNC clear, PVMW_MIGRATION set map_pte() first does a lockless read. A non-present, non-none PTE can still be a candidate, so map_pte() takes PTL. check_pte() then rejects device-private, because PVMW_MIGRATION requires a migration entry. 4) PVMW_SYNC clear, PVMW_MIGRATION clear map_pte() first does a lockless read. A device-private PTE can be a normal mapping candidate, so map_pte() takes PTL. check_pte() then checks entry type and PFN range under PTL. On the PMD device-private side, before this patch, all four cases go through the same code once the lockless PMD read sees a device-private entry: - lockless read PMD into pmde - pmde is non-present - decode pmde as a softleaf entry - entry is device-private - take pmd_lock() - return true So compared with the PTE side: A) PVMW_SYNC set, PVMW_MIGRATION set PTE rejects device-private under PTL. PMD returns true. This does not match. The PMD code misses the PVMW_MIGRATION direction check, and does not reread/revalidate PMD under pmd_lock(). B) PVMW_SYNC set, PVMW_MIGRATION clear PTE can accept device-private, but only after locked check_pte() validation. PMD also returns true. The direction is OK, but the final check is missing. PMD returns true from the lockless PMD classification, without PMD revalidation and without check_pmd() PFN-range check. C) PVMW_SYNC clear, PVMW_MIGRATION set PTE can reach locked check_pte() from the lockless candidate, but check_pte() rejects device-private. PMD returns true. Same mismatch as case A: missing PVMW_MIGRATION direction check, and no locked PMD revalidation. D) PVMW_SYNC clear, PVMW_MIGRATION clear PTE can accept device-private after locked validation. PMD also returns true. Direction is OK here as well, but the PMD code still has no final locked check matching check_pte(): no PMD reread/revalidation, and no check_pmd() PFN-range check. > >IIUC, if we apply the change in this patch, the affected case is >pmd_is_migration_entry(). In case someone else has cleared it but not update >mapcount yet, try_to_migrate() would return before folio_mapped() is false. > >Thanks Lance for raise the question. > >If above analysis is true, I haven't got a neat way to take this into >consideration. > >BTW, for a fix, I am thinking to keep it simple and direct. So how about leave >the refactor as a followup cleanup? So for a fix, let's line up the PTE and PMD rules first :D Cheers, Lance >-- >Wei Yang >Help you, Help me > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-19 12:19 ` Lance Yang @ 2026-06-20 2:02 ` Wei Yang 0 siblings, 0 replies; 14+ messages in thread From: Wei Yang @ 2026-06-20 2:02 UTC (permalink / raw) To: Lance Yang Cc: richard.weiyang, akpm, david, ljs, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable On Fri, Jun 19, 2026 at 08:19:09PM +0800, Lance Yang wrote: > >On Fri, Jun 19, 2026 at 02:30:25AM +0000, Wei Yang wrote: >>On Wed, Jun 17, 2026 at 08:18:15AM +0000, Wei Yang wrote: >>>On Wed, Jun 17, 2026 at 10:32:11AM +0800, Lance Yang wrote: >>>> >>>>On Tue, Jun 16, 2026 at 11:50:22PM +0000, Wei Yang wrote: >>>>>On Tue, Jun 16, 2026 at 08:30:01PM +0800, Lance Yang wrote: >>>>>> >>>>>>On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >>>>>>[...] >>>>>>>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >>>>>>>index 2ccbabfb2cc1..21635fab209c 100644 >>>>>>>--- a/mm/page_vma_mapped.c >>>>>>>+++ b/mm/page_vma_mapped.c >>>>>>>@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) >>>>>>> */ >>>>>>> pmde = pmdp_get_lockless(pvmw->pmd); >>>>>>> >>>>>>>- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >>>>>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>>>>- pmde = *pvmw->pmd; >>>>>>>- if (!pmd_present(pmde)) { >>>>>>>- softleaf_t entry; >>>>>>>- >>>>>>>- if (!thp_migration_supported() || >>>>>>>- !(pvmw->flags & PVMW_MIGRATION)) >>>>>>>- return not_found(pvmw); >>>>>>>- entry = softleaf_from_pmd(pmde); >>>>>>>- >>>>>>>- if (!softleaf_is_migration(entry) || >>>>>>>- !check_pmd(softleaf_to_pfn(entry), pvmw)) >>>>>>>- return not_found(pvmw); >>>>>>>- return true; >>>>>>>- } >>>>>>>- if (likely(pmd_trans_huge(pmde))) { >>>>>>>- if (pvmw->flags & PVMW_MIGRATION) >>>>>>>- return not_found(pvmw); >>>>>>>- if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>>>>- return not_found(pvmw); >>>>>>>- return true; >>>>>>>- } >>>>>>>- /* THP pmd was split under us: handle on pte level */ >>>>>>>- spin_unlock(pvmw->ptl); >>>>>>>- pvmw->ptl = NULL; >>>>>>>- } else if (!pmd_present(pmde)) { >>>>>>>- const softleaf_t entry = softleaf_from_pmd(pmde); >>>>>>>- >>>>>>>- if (softleaf_is_device_private(entry)) { >>>>>>>- pvmw->ptl = pmd_lock(mm, pvmw->pmd); >>>>>>>- return true; >>>>>>>- } >>>>>>>+ if (pmd_present(pmde)) { >>>>>>>+ if (!pmd_leaf(pmde)) >>>>>>>+ goto pte_table; >>>>>>>+ if (pvmw->flags & PVMW_MIGRATION) >>>>>>>+ return not_found(pvmw); >>>>>>>+ if (!check_pmd(pmd_pfn(pmde), pvmw)) >>>>>>>+ return not_found(pvmw); >>>>>>>+ } else if (pmd_is_migration_entry(pmde)) { >>>>>>>+ softleaf_t entry = softleaf_from_pmd(pmde); >>>>>>>+ >>>>>>>+ if (!(pvmw->flags & PVMW_MIGRATION)) >>>>>>>+ return not_found(pvmw); >>>>>> >>>>>>Looked at history a bit, and I wonder if this changed something old >>>>>>here ... >>>>>> >>>>>>Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD >>>>>>migration handling took PTL before doing PVMW_MIGRATION/PFN checks, >>>>>>including not_found() cases. So lockless PMD read was just a filter ... >>>>>> >>>>>>With this fix, true case gets final pmd_same() check, but this >>>>>>not_found() case happens before taking PTL. >>>>>> >>>>>>So a !PVMW_MIGRATION walker could race with someone, e.g. >>>>>>remove_migration_pmd(): we make the not_found() decision from old PMD >>>>>>value that still says "migration", while real *pvmw->pmd may already be >>>>>>present again. We return without ever taking PTL :) >>>>>> >>>>> >>>>>Hi, Lance >>>>> >>>>>Thanks for take a look. >>>>> >>>>>I am trying to understand the scenario you mentioned. Let's say A migrate a >>>>>pmd and B want to unmap the pmd. >>>>> >>>>> A B >>>>> >>>>> try to migrate a pmd >>>>> pmd is set to migration entry >>>>> unmap the pmd ... >>>>> managed to finish migration >>>>> ...still see migration entry, >>>>> so skipped and unmap fail >>>>> >>>>>Would this be a timing case? Even B grab the PTL, it still could see migration >>>>>entry if B visit pmd before A finish migration. >>>>> >>>>>Maybe I miss something, look forward your insight. >>>> >>>>Right, seeing migration entry while migration is still ongoing is fine. >>>> >>>>What I meant was this ordering: >>>> >>>> CPU 0: pmde = pmdp_get_lockless(...); /* migration */ >>>> CPU 1: remove_migration_pmd() restores PMD to present >>>> CPU 0: returns not_found() from old pmde, without ever taking PTL and >>>> rechecking *pvmw->pmd >>>> >>>>So issue is not seeing migration entry itself, but making final >>>>not_found() decision from stale lockless PMD value ... >>>> >>>>Before this patch, PMD migration case took PTL before making that >>>>decision ... >>>> >>> >>>Yes, this patch changes the decision making condition for pmd entry. Thanks >>>for pointing out. >>> >>>Hmm... I took another look into current pte handling and find for pte entry, >>>we did two phase check: >>> >>> * map_pte() without ptl >>> * check_pte() with ptl >>> >>>While check_pte() do extra pfn range check, map_pte() doesn't. >>> >>>This means for pte entry, we may face the same situation as you describe: >>>make the decision before grab PTL. Till now, it looks reasonable. >>> >>>But one thing jumped at me, PVMW_SYNC. When this flag is specified, all check >>>is done under PTL. But now for pmd entry, we don't have a chance to do so. >>> >>>And as the comment says in try_to_migrate_one() >>> >>> /* >>> * When racing against e.g. zap_pte_range() on another cpu, >>> * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(), >>> * try_to_migrate() may return before folio_mapped() has become false, >>> * if page table locking is skipped: use TTU_SYNC to wait for that. >>> */ >>> >>>I tracked down to commit a98a2f0c8ce1 ('mm/rmap: split migration into its own >>>function'), but not getting more detail on reasoning. Not fully understand it >>>yet, but it seems there is some race between migration and unmap which is >>>protected by PTL? >>> >>>Will look into this to get more detail. >>> >> >>After going through the history, I found this: >> >> commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 >> Author: Hugh Dickins <hughd@google.com> >> Date: Tue Jun 15 18:23:53 2021 -0700 >> >> mm/thp: try_to_unmap() use TTU_SYNC for safe splitting >> >>This one fix the race mentioned above: we expect mapcount is 0, but is not. > >Cool, thanks! > >I do want to spend more time on this refactor. It is touching some subtle >page_vma_mapped_walk() rules, so I don't want to skim and guess ... > >One case I can pin down now is device-private: the PTE side gives us a >clear rule to compare against :) > >On the PTE side: > >1) PVMW_SYNC set, PVMW_MIGRATION set > > map_pte() uses pte_offset_map_lock(), so it takes PTL first. > check_pte() then runs under PTL. Since PVMW_MIGRATION is set, > check_pte() requires a migration entry, so device-private is rejected. > >2) PVMW_SYNC set, PVMW_MIGRATION clear > > map_pte() takes PTL first. check_pte() then runs under PTL. > Since PVMW_MIGRATION is clear, device-private can be a normal mapping, > but check_pte() still checks entry type and PFN range. > >3) PVMW_SYNC clear, PVMW_MIGRATION set > > map_pte() first does a lockless read. A non-present, non-none PTE can > still be a candidate, so map_pte() takes PTL. check_pte() then rejects > device-private, because PVMW_MIGRATION requires a migration entry. > >4) PVMW_SYNC clear, PVMW_MIGRATION clear > > map_pte() first does a lockless read. A device-private PTE can be a > normal mapping candidate, so map_pte() takes PTL. check_pte() then > checks entry type and PFN range under PTL. > >On the PMD device-private side, before this patch, all four cases go >through the same code once the lockless PMD read sees a device-private >entry: > >- lockless read PMD into pmde >- pmde is non-present >- decode pmde as a softleaf entry >- entry is device-private >- take pmd_lock() >- return true > >So compared with the PTE side: > >A) PVMW_SYNC set, PVMW_MIGRATION set > > PTE rejects device-private under PTL. > > PMD returns true. > > This does not match. The PMD code misses the PVMW_MIGRATION direction > check, and does not reread/revalidate PMD under pmd_lock(). > >B) PVMW_SYNC set, PVMW_MIGRATION clear > > PTE can accept device-private, but only after locked check_pte() > validation. > > PMD also returns true. > > The direction is OK, but the final check is missing. PMD returns true > from the lockless PMD classification, without PMD revalidation and > without check_pmd() PFN-range check. > >C) PVMW_SYNC clear, PVMW_MIGRATION set > > PTE can reach locked check_pte() from the lockless candidate, but > check_pte() rejects device-private. > > PMD returns true. > > Same mismatch as case A: missing PVMW_MIGRATION direction check, and no > locked PMD revalidation. > >D) PVMW_SYNC clear, PVMW_MIGRATION clear > > PTE can accept device-private after locked validation. > > PMD also returns true. > > Direction is OK here as well, but the PMD code still has no final > locked check matching check_pte(): no PMD reread/revalidation, and no > check_pmd() PFN-range check. > Thanks for the detailed analysis. >> >>IIUC, if we apply the change in this patch, the affected case is >>pmd_is_migration_entry(). In case someone else has cleared it but not update >>mapcount yet, try_to_migrate() would return before folio_mapped() is false. >> >>Thanks Lance for raise the question. >> >>If above analysis is true, I haven't got a neat way to take this into >>consideration. >> >>BTW, for a fix, I am thinking to keep it simple and direct. So how about leave >>the refactor as a followup cleanup? > >So for a fix, let's line up the PTE and PMD rules first :D > Sure. Based on your above analysis, looks the change in v1 [1] is the right direction, IIUC. So I will send v3 based on this with comment adjust according to Lorenzo's comment. [1]: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/ >Cheers, Lance > >>-- >>Wei Yang >>Help you, Help me >> -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <666dc40b-e37a-46eb-af55-7a81bc1643f1@kernel.org>]
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd [not found] ` <666dc40b-e37a-46eb-af55-7a81bc1643f1@kernel.org> @ 2026-06-17 2:11 ` Balbir Singh 2026-06-17 3:14 ` Lance Yang 0 siblings, 1 reply; 14+ messages in thread From: Balbir Singh @ 2026-06-17 2:11 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Lance Yang, richard.weiyang, akpm, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, linux-mm, lorenzo.stoakes, stable On Tue, Jun 16, 2026 at 03:07:53PM +0200, David Hildenbrand (Arm) wrote: > On 6/16/26 14:30, Lance Yang wrote: > > > > On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: > > [...] > >> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c > >> index 2ccbabfb2cc1..21635fab209c 100644 > >> --- a/mm/page_vma_mapped.c > >> +++ b/mm/page_vma_mapped.c > >> @@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > >> */ > >> pmde = pmdp_get_lockless(pvmw->pmd); > >> > >> - if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { > >> - pvmw->ptl = pmd_lock(mm, pvmw->pmd); > >> - pmde = *pvmw->pmd; > >> - if (!pmd_present(pmde)) { > >> - softleaf_t entry; > >> - > >> - if (!thp_migration_supported() || > >> - !(pvmw->flags & PVMW_MIGRATION)) > >> - return not_found(pvmw); > >> - entry = softleaf_from_pmd(pmde); > >> - > >> - if (!softleaf_is_migration(entry) || > >> - !check_pmd(softleaf_to_pfn(entry), pvmw)) > >> - return not_found(pvmw); > >> - return true; > >> - } > >> - if (likely(pmd_trans_huge(pmde))) { > >> - if (pvmw->flags & PVMW_MIGRATION) > >> - return not_found(pvmw); > >> - if (!check_pmd(pmd_pfn(pmde), pvmw)) > >> - return not_found(pvmw); > >> - return true; > >> - } > >> - /* THP pmd was split under us: handle on pte level */ > >> - spin_unlock(pvmw->ptl); > >> - pvmw->ptl = NULL; > >> - } else if (!pmd_present(pmde)) { > >> - const softleaf_t entry = softleaf_from_pmd(pmde); > >> - > >> - if (softleaf_is_device_private(entry)) { > >> - pvmw->ptl = pmd_lock(mm, pvmw->pmd); > >> - return true; > >> - } > >> + if (pmd_present(pmde)) { > >> + if (!pmd_leaf(pmde)) > >> + goto pte_table; > >> + if (pvmw->flags & PVMW_MIGRATION) > >> + return not_found(pvmw); > >> + if (!check_pmd(pmd_pfn(pmde), pvmw)) > >> + return not_found(pvmw); > >> + } else if (pmd_is_migration_entry(pmde)) { > >> + softleaf_t entry = softleaf_from_pmd(pmde); > >> + > >> + if (!(pvmw->flags & PVMW_MIGRATION)) > >> + return not_found(pvmw); > > > > Looked at history a bit, and I wonder if this changed something old > > here ... > > > > Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD > > migration handling took PTL before doing PVMW_MIGRATION/PFN checks, > > including not_found() cases. So lockless PMD read was just a filter ... > > > > With this fix, true case gets final pmd_same() check, but this > > not_found() case happens before taking PTL. > > > > So a !PVMW_MIGRATION walker could race with someone, e.g. > > remove_migration_pmd(): we make the not_found() decision from old PMD > > value that still says "migration", while real *pvmw->pmd may already be > > present again. We return without ever taking PTL :) > > > > Not sure about practical fallout, but should these PMD-level not_found() > > cases also take PTL and restart if PMD changed? > I was hoping that we could so something similar to the PTE case. > > In map_pte(), we check whether the PMD changed, which is slightly different. > > The actual check happens in check_pte() after grabbing the PTL. > > For the case you describe, map_pte() would find !pte_none(ptent) ... > !pte_present(ptent) and !is_migration, and effectively grab the lock and proceed > to check_pte(). > > In check_pte() we re-check under lock indeed. > Thinking of the practical fallout, not finding the PMD for a non migration worker should be OK. Is there a case where it's not OK to report the old state. Balbir ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-17 2:11 ` Balbir Singh @ 2026-06-17 3:14 ` Lance Yang 0 siblings, 0 replies; 14+ messages in thread From: Lance Yang @ 2026-06-17 3:14 UTC (permalink / raw) To: balbirs Cc: david, lance.yang, richard.weiyang, akpm, ljs, riel, liam, vbabka, harry, jannh, ziy, sj, linux-mm, lorenzo.stoakes, stable On Wed, Jun 17, 2026 at 12:11:14PM +1000, Balbir Singh wrote: >On Tue, Jun 16, 2026 at 03:07:53PM +0200, David Hildenbrand (Arm) wrote: >> On 6/16/26 14:30, Lance Yang wrote: >> > >> > On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >> > [...] >> >> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c >> >> index 2ccbabfb2cc1..21635fab209c 100644 >> >> --- a/mm/page_vma_mapped.c >> >> +++ b/mm/page_vma_mapped.c >> >> @@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) >> >> */ >> >> pmde = pmdp_get_lockless(pvmw->pmd); >> >> >> >> - if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { >> >> - pvmw->ptl = pmd_lock(mm, pvmw->pmd); >> >> - pmde = *pvmw->pmd; >> >> - if (!pmd_present(pmde)) { >> >> - softleaf_t entry; >> >> - >> >> - if (!thp_migration_supported() || >> >> - !(pvmw->flags & PVMW_MIGRATION)) >> >> - return not_found(pvmw); >> >> - entry = softleaf_from_pmd(pmde); >> >> - >> >> - if (!softleaf_is_migration(entry) || >> >> - !check_pmd(softleaf_to_pfn(entry), pvmw)) >> >> - return not_found(pvmw); >> >> - return true; >> >> - } >> >> - if (likely(pmd_trans_huge(pmde))) { >> >> - if (pvmw->flags & PVMW_MIGRATION) >> >> - return not_found(pvmw); >> >> - if (!check_pmd(pmd_pfn(pmde), pvmw)) >> >> - return not_found(pvmw); >> >> - return true; >> >> - } >> >> - /* THP pmd was split under us: handle on pte level */ >> >> - spin_unlock(pvmw->ptl); >> >> - pvmw->ptl = NULL; >> >> - } else if (!pmd_present(pmde)) { >> >> - const softleaf_t entry = softleaf_from_pmd(pmde); >> >> - >> >> - if (softleaf_is_device_private(entry)) { >> >> - pvmw->ptl = pmd_lock(mm, pvmw->pmd); >> >> - return true; >> >> - } >> >> + if (pmd_present(pmde)) { >> >> + if (!pmd_leaf(pmde)) >> >> + goto pte_table; >> >> + if (pvmw->flags & PVMW_MIGRATION) >> >> + return not_found(pvmw); >> >> + if (!check_pmd(pmd_pfn(pmde), pvmw)) >> >> + return not_found(pvmw); >> >> + } else if (pmd_is_migration_entry(pmde)) { >> >> + softleaf_t entry = softleaf_from_pmd(pmde); >> >> + >> >> + if (!(pvmw->flags & PVMW_MIGRATION)) >> >> + return not_found(pvmw); >> > >> > Looked at history a bit, and I wonder if this changed something old >> > here ... >> > >> > Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD >> > migration handling took PTL before doing PVMW_MIGRATION/PFN checks, >> > including not_found() cases. So lockless PMD read was just a filter ... >> > >> > With this fix, true case gets final pmd_same() check, but this >> > not_found() case happens before taking PTL. >> > >> > So a !PVMW_MIGRATION walker could race with someone, e.g. >> > remove_migration_pmd(): we make the not_found() decision from old PMD >> > value that still says "migration", while real *pvmw->pmd may already be >> > present again. We return without ever taking PTL :) >> > >> > Not sure about practical fallout, but should these PMD-level not_found() >> > cases also take PTL and restart if PMD changed? >> I was hoping that we could so something similar to the PTE case. >> >> In map_pte(), we check whether the PMD changed, which is slightly different. >> >> The actual check happens in check_pte() after grabbing the PTL. >> >> For the case you describe, map_pte() would find !pte_none(ptent) ... >> !pte_present(ptent) and !is_migration, and effectively grab the lock and proceed >> to check_pte(). >> >> In check_pte() we re-check under lock indeed. >> > >Thinking of the practical fallout, not finding the PMD for a non >migration worker should be OK. Is there a case where it's not OK to >report the old state. I was thinking the lockless value should only be used as a first filter, to see whether entry looks worth checking. PTE side is roughly lockless filter + PTL/check_pte(). PMD true case now gets PTL/pmd_same(), but some PMD not_found() cases still come straight from lockless "pmde". That mismatch felt odd to me ... Cheers, Lance ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd [not found] <20260616063436.20455-1-richard.weiyang@gmail.com> 2026-06-16 12:30 ` [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd Lance Yang @ 2026-06-19 10:44 ` Lorenzo Stoakes 2026-06-19 10:48 ` David Hildenbrand (Arm) 2026-06-20 2:11 ` Wei Yang 1 sibling, 2 replies; 14+ messages in thread From: Lorenzo Stoakes @ 2026-06-19 10:44 UTC (permalink / raw) To: Wei Yang Cc: akpm, david, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable -cc wrong email On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: > For pmd_trans_huge() and pmd_is_migration_entry(), we does following > before return the pmd entry: > > * re-validate pmd entry after PTL > * check PVMW_MIGRATION > * check_pmd() > * handle on pte level if split under us > > But for device-private pmd, we just return after pmd_lock(). > > This may return improper entry, e.g. if we are looking for a migration > entry, device-private entry could still be returned, which leads to data > corruption. I don't thik this is quite clear? How about: If a softleaf entry is present, the existing code simply acquires the PMD lock and returns success even if PVMW_MIGRATION is set (indicating a migration entry is sought), meaning that the caller can incorrectly interpret the entry as something it is not, causing data corruption. > > This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration > support device-private entries") by following the same pattern as > pmd_trans_huge() and pmd_is_migration_entry() for device private entry. > > While at it, it cleanups the pmd entry handling in page_vma_mapped_walk(). > > * Instead of handling trans huge/migration entry/device private entry > in a mixed manner, we put each case into its own if condition and > handle with the same pattern. > * Also we grab PTL and make sure pmd is not changed under us after > above check instead of do the check with PTL hold. > * restart the process if pmd is changed under us You're doing quite a bit for a fix and you're putting it all in one place. How about do the fix as 1 patch, and then cleanups as other ones? It helps with review too :) It's a general rule of thumb that if you do more than one of moving, refactoring or changing code, to do them as separate patches so a reviewer/somebody bisecting can clearly separate each. Also PLEASE do not add new functionality (this lock recheck) in a fixes patch. We'll end up backporting new logic that way. Make the fixes bit _minimal_. I think in general Andrew prefers separate fixes patches so I'd just make the _minimal_ change that fixes this for the backport, and the cleanup stuff as a separate series. > > Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries") Hmm seems the device private stuff has had a rocky road of late! I wonder if we need some more test coverage on this? > Signed-off-by: Wei Yang <richard.weiyang@gmail.com> > Suggested-by: David Hildenbrand <david@kernel.org> > Cc: David Hildenbrand <david@kernel.org> > Cc: Balbir Singh <balbirs@nvidia.com> > Cc: SeongJae Park <sj@kernel.org> > Cc: Zi Yan <ziy@nvidia.com> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Annoying nag: You sent to my correct email ljs@kernel.org (thanks!) but also cc'd the incorrect one, please only send to ljs@kernel.org thanks :) > Cc: <stable@vger.kernel.org> Be better to just have this with the Fixes tag, Andrew adds the Cc's from the actual cc- list anyway. > > --- > v2: > * specify the possible error case of current code and user visible effect > * besides fix, cleanup the pmd entry handling based on David's suggestion > > v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/ > --- > mm/page_vma_mapped.c | 63 +++++++++++++++++++++----------------------- > 1 file changed, 30 insertions(+), 33 deletions(-) > > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c > index 2ccbabfb2cc1..21635fab209c 100644 > --- a/mm/page_vma_mapped.c > +++ b/mm/page_vma_mapped.c > @@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > */ > pmde = pmdp_get_lockless(pvmw->pmd); > > - if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) { > - pvmw->ptl = pmd_lock(mm, pvmw->pmd); > - pmde = *pvmw->pmd; > - if (!pmd_present(pmde)) { > - softleaf_t entry; > - > - if (!thp_migration_supported() || > - !(pvmw->flags & PVMW_MIGRATION)) > - return not_found(pvmw); > - entry = softleaf_from_pmd(pmde); > - > - if (!softleaf_is_migration(entry) || > - !check_pmd(softleaf_to_pfn(entry), pvmw)) > - return not_found(pvmw); > - return true; > - } > - if (likely(pmd_trans_huge(pmde))) { > - if (pvmw->flags & PVMW_MIGRATION) > - return not_found(pvmw); > - if (!check_pmd(pmd_pfn(pmde), pvmw)) > - return not_found(pvmw); > - return true; > - } > - /* THP pmd was split under us: handle on pte level */ Don't drop critical comments like this, that's very bad. > - spin_unlock(pvmw->ptl); > - pvmw->ptl = NULL; > - } else if (!pmd_present(pmde)) { > - const softleaf_t entry = softleaf_from_pmd(pmde); > - > - if (softleaf_is_device_private(entry)) { > - pvmw->ptl = pmd_lock(mm, pvmw->pmd); > - return true; > - } > + if (pmd_present(pmde)) { You're not checking pmd_trans_huge() at all now? Just assuming pmd_present() == pmd_trans_huge()? > + if (!pmd_leaf(pmde)) > + goto pte_table; OK now assuming pmd_leaf() == pmd_trans_huge()? You didn't mention this in the commit msg? Justificaiton please? > + if (pvmw->flags & PVMW_MIGRATION) > + return not_found(pvmw); > + if (!check_pmd(pmd_pfn(pmde), pvmw)) > + return not_found(pvmw); > + } else if (pmd_is_migration_entry(pmde)) { > + softleaf_t entry = softleaf_from_pmd(pmde); > + Err you dropped the thp_migration_supported() check? Why? > + if (!(pvmw->flags & PVMW_MIGRATION)) > + return not_found(pvmw); > + if (!check_pmd(softleaf_to_pfn(entry), pvmw)) > + return not_found(pvmw); > + } else if (pmd_is_device_private_entry(pmde)) { > + softleaf_t entry = softleaf_from_pmd(pmde); > > + if (pvmw->flags & PVMW_MIGRATION) > + return not_found(pvmw); > + if (!check_pmd(softleaf_to_pfn(entry), pvmw)) > + return not_found(pvmw); I mean it's less awful than what was there before, but as refactoring goes, putting a massive set of branches in the middle of a long function isn't really the best. Can we avoid this horrible goto by separating this out into a function? > + } else { Else means what exactly? A comment would be good. I feel like in an effort to keep all this at one level of indentation you've created a confusing else case. Sane thing would be to: a. have this as a separate function b. to do: if (pmd_none(pmde)) { ... return ...; } if (pmd_present(pmde)) { ... return ...; } softleaf = softleaf_from_pmd(pmde); /* softleaf stuff */ return ...; In this function. That way it's _very clear_ you're doing softleaf stuff. > if ((pvmw->flags & PVMW_SYNC) && > thp_vma_suitable_order(vma, pvmw->address, > PMD_ORDER) && Umm... previously this was done for all softleaf entries other than device-private, now not, and you haven't, why? > @@ -286,6 +274,15 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > step_forward(pvmw, PMD_SIZE); > continue; > } > + > + /* Double-check under PTL that the PMD didn't change. */ > + pvmw->ptl = pmd_lock(mm, pvmw->pmd); > + if (pmd_same(pmde, pmdp_get(pvmw->pmd))) > + return true; > + spin_unlock(pvmw->ptl); > + pvmw->ptl = NULL; > + goto restart; I'm not sure I really get the justification for this? seems you just added this arbitrarily? Again, you shouldn't be making changes like this in a fix patch. > +pte_table: > if (!map_pte(pvmw, &pmde, &ptl)) { > if (!pvmw->pte) > goto restart; > -- > 2.34.1 > This is really hard to review like this, as above please split this up properly. Thanks, Lorenzo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-19 10:44 ` Lorenzo Stoakes @ 2026-06-19 10:48 ` David Hildenbrand (Arm) 2026-06-19 11:04 ` Lorenzo Stoakes 2026-06-20 2:13 ` Wei Yang 2026-06-20 2:11 ` Wei Yang 1 sibling, 2 replies; 14+ messages in thread From: David Hildenbrand (Arm) @ 2026-06-19 10:48 UTC (permalink / raw) To: Lorenzo Stoakes, Wei Yang Cc: akpm, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable On 6/19/26 12:44, Lorenzo Stoakes wrote: > -cc wrong email > > On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >> For pmd_trans_huge() and pmd_is_migration_entry(), we does following >> before return the pmd entry: >> >> * re-validate pmd entry after PTL >> * check PVMW_MIGRATION >> * check_pmd() >> * handle on pte level if split under us >> >> But for device-private pmd, we just return after pmd_lock(). >> >> This may return improper entry, e.g. if we are looking for a migration >> entry, device-private entry could still be returned, which leads to data >> corruption. > > I don't thik this is quite clear? > > How about: > > If a softleaf entry is present, the existing code simply acquires the > PMD lock and returns success even if PVMW_MIGRATION is set (indicating a > migration entry is sought), meaning that the caller can incorrectly > interpret the entry as something it is not, causing data corruption. > >> >> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration >> support device-private entries") by following the same pattern as >> pmd_trans_huge() and pmd_is_migration_entry() for device private entry. >> >> While at it, it cleanups the pmd entry handling in page_vma_mapped_walk(). >> >> * Instead of handling trans huge/migration entry/device private entry >> in a mixed manner, we put each case into its own if condition and >> handle with the same pattern. >> * Also we grab PTL and make sure pmd is not changed under us after >> above check instead of do the check with PTL hold. >> * restart the process if pmd is changed under us > > You're doing quite a bit for a fix and you're putting it all in one place. > > How about do the fix as 1 patch, and then cleanups as other ones? It helps with > review too :) > > It's a general rule of thumb that if you do more than one of moving, refactoring > or changing code, to do them as separate patches so a reviewer/somebody > bisecting can clearly separate each. > > Also PLEASE do not add new functionality (this lock recheck) in a fixes > patch. We'll end up backporting new logic that way. > > Make the fixes bit _minimal_. To be fair, I asked for this https://lore.kernel.org/all/2d48ef0d-1110-4a9d-adcb-f701a1ce2cfa@kernel.org/ But given that Wei mostly used my quick draft without properly checking the implications, yeah, let's fix it first separately. I can then follow up with a proper cleanup. > > I think in general Andrew prefers separate fixes patches so I'd just make the > _minimal_ change that fixes this for the backport, and the cleanup stuff as a > separate series. > The issue is that the existing handling is just crap, and to fix it, we're adding more crap. But yeah, let's add more crap first before we clean it up properly. -- Cheers, David ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-19 10:48 ` David Hildenbrand (Arm) @ 2026-06-19 11:04 ` Lorenzo Stoakes 2026-06-20 2:13 ` Wei Yang 1 sibling, 0 replies; 14+ messages in thread From: Lorenzo Stoakes @ 2026-06-19 11:04 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Wei Yang, akpm, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable On Fri, Jun 19, 2026 at 12:48:26PM +0200, David Hildenbrand (Arm) wrote: > On 6/19/26 12:44, Lorenzo Stoakes wrote: > > -cc wrong email > > > > On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: > >> For pmd_trans_huge() and pmd_is_migration_entry(), we does following > >> before return the pmd entry: > >> > >> * re-validate pmd entry after PTL > >> * check PVMW_MIGRATION > >> * check_pmd() > >> * handle on pte level if split under us > >> > >> But for device-private pmd, we just return after pmd_lock(). > >> > >> This may return improper entry, e.g. if we are looking for a migration > >> entry, device-private entry could still be returned, which leads to data > >> corruption. > > > > I don't thik this is quite clear? > > > > How about: > > > > If a softleaf entry is present, the existing code simply acquires the > > PMD lock and returns success even if PVMW_MIGRATION is set (indicating a > > migration entry is sought), meaning that the caller can incorrectly > > interpret the entry as something it is not, causing data corruption. > > > >> > >> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration > >> support device-private entries") by following the same pattern as > >> pmd_trans_huge() and pmd_is_migration_entry() for device private entry. > >> > >> While at it, it cleanups the pmd entry handling in page_vma_mapped_walk(). > >> > >> * Instead of handling trans huge/migration entry/device private entry > >> in a mixed manner, we put each case into its own if condition and > >> handle with the same pattern. > >> * Also we grab PTL and make sure pmd is not changed under us after > >> above check instead of do the check with PTL hold. > >> * restart the process if pmd is changed under us > > > > You're doing quite a bit for a fix and you're putting it all in one place. > > > > How about do the fix as 1 patch, and then cleanups as other ones? It helps with > > review too :) > > > > It's a general rule of thumb that if you do more than one of moving, refactoring > > or changing code, to do them as separate patches so a reviewer/somebody > > bisecting can clearly separate each. > > > > Also PLEASE do not add new functionality (this lock recheck) in a fixes > > patch. We'll end up backporting new logic that way. > > > > Make the fixes bit _minimal_. > > To be fair, I asked for this > > https://lore.kernel.org/all/2d48ef0d-1110-4a9d-adcb-f701a1ce2cfa@kernel.org/ > > But given that Wei mostly used my quick draft without properly checking the > implications, yeah, let's fix it first separately. Ack yeah sorry I mean I agree that it needs cleanup just has to be done in the right way which clearly I think we agree on :) > > I can then follow up with a proper cleanup. Thanks! > > > > > I think in general Andrew prefers separate fixes patches so I'd just make the > > _minimal_ change that fixes this for the backport, and the cleanup stuff as a > > separate series. > > > > The issue is that the existing handling is just crap, and to fix it, we're > adding more crap. But yeah, let's add more crap first before we clean it up > properly. I couldn't agree more and to be clear - I hate how this is right now. But I think for the fix we have to wade in the crap first then clean it up afterwards... :) > > > -- > Cheers, > > David Cheers, Lorenzo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-19 10:48 ` David Hildenbrand (Arm) 2026-06-19 11:04 ` Lorenzo Stoakes @ 2026-06-20 2:13 ` Wei Yang 1 sibling, 0 replies; 14+ messages in thread From: Wei Yang @ 2026-06-20 2:13 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Lorenzo Stoakes, Wei Yang, akpm, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable On Fri, Jun 19, 2026 at 12:48:26PM +0200, David Hildenbrand (Arm) wrote: >On 6/19/26 12:44, Lorenzo Stoakes wrote: >> -cc wrong email >> >> On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >>> For pmd_trans_huge() and pmd_is_migration_entry(), we does following >>> before return the pmd entry: >>> >>> * re-validate pmd entry after PTL >>> * check PVMW_MIGRATION >>> * check_pmd() >>> * handle on pte level if split under us >>> >>> But for device-private pmd, we just return after pmd_lock(). >>> >>> This may return improper entry, e.g. if we are looking for a migration >>> entry, device-private entry could still be returned, which leads to data >>> corruption. >> >> I don't thik this is quite clear? >> >> How about: >> >> If a softleaf entry is present, the existing code simply acquires the >> PMD lock and returns success even if PVMW_MIGRATION is set (indicating a >> migration entry is sought), meaning that the caller can incorrectly >> interpret the entry as something it is not, causing data corruption. >> >>> >>> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration >>> support device-private entries") by following the same pattern as >>> pmd_trans_huge() and pmd_is_migration_entry() for device private entry. >>> >>> While at it, it cleanups the pmd entry handling in page_vma_mapped_walk(). >>> >>> * Instead of handling trans huge/migration entry/device private entry >>> in a mixed manner, we put each case into its own if condition and >>> handle with the same pattern. >>> * Also we grab PTL and make sure pmd is not changed under us after >>> above check instead of do the check with PTL hold. >>> * restart the process if pmd is changed under us >> >> You're doing quite a bit for a fix and you're putting it all in one place. >> >> How about do the fix as 1 patch, and then cleanups as other ones? It helps with >> review too :) >> >> It's a general rule of thumb that if you do more than one of moving, refactoring >> or changing code, to do them as separate patches so a reviewer/somebody >> bisecting can clearly separate each. >> >> Also PLEASE do not add new functionality (this lock recheck) in a fixes >> patch. We'll end up backporting new logic that way. >> >> Make the fixes bit _minimal_. > >To be fair, I asked for this > >https://lore.kernel.org/all/2d48ef0d-1110-4a9d-adcb-f701a1ce2cfa@kernel.org/ > >But given that Wei mostly used my quick draft without properly checking the >implications, yeah, let's fix it first separately. Sorry, if I misunderstand your point. > >I can then follow up with a proper cleanup. > I would like to do a followup cleanup for this, may I have this chance? -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd 2026-06-19 10:44 ` Lorenzo Stoakes 2026-06-19 10:48 ` David Hildenbrand (Arm) @ 2026-06-20 2:11 ` Wei Yang 1 sibling, 0 replies; 14+ messages in thread From: Wei Yang @ 2026-06-20 2:11 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Wei Yang, akpm, david, riel, liam, vbabka, harry, jannh, balbirs, ziy, sj, linux-mm, stable On Fri, Jun 19, 2026 at 11:44:13AM +0100, Lorenzo Stoakes wrote: >-cc wrong email > >On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote: >> For pmd_trans_huge() and pmd_is_migration_entry(), we does following >> before return the pmd entry: >> >> * re-validate pmd entry after PTL >> * check PVMW_MIGRATION >> * check_pmd() >> * handle on pte level if split under us >> >> But for device-private pmd, we just return after pmd_lock(). >> >> This may return improper entry, e.g. if we are looking for a migration >> entry, device-private entry could still be returned, which leads to data >> corruption. > >I don't thik this is quite clear? > >How about: > > If a softleaf entry is present, the existing code simply acquires the How about: If a softleaf entry is present, e.g. device-private pmd, > PMD lock and returns success even if PVMW_MIGRATION is set (indicating a > migration entry is sought), meaning that the caller can incorrectly > interpret the entry as something it is not, causing data corruption. > Looks better, thanks. >> >> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration >> support device-private entries") by following the same pattern as >> pmd_trans_huge() and pmd_is_migration_entry() for device private entry. >> >> While at it, it cleanups the pmd entry handling in page_vma_mapped_walk(). >> >> * Instead of handling trans huge/migration entry/device private entry >> in a mixed manner, we put each case into its own if condition and >> handle with the same pattern. >> * Also we grab PTL and make sure pmd is not changed under us after >> above check instead of do the check with PTL hold. >> * restart the process if pmd is changed under us > >You're doing quite a bit for a fix and you're putting it all in one place. > >How about do the fix as 1 patch, and then cleanups as other ones? It helps with >review too :) > Got it. >It's a general rule of thumb that if you do more than one of moving, refactoring >or changing code, to do them as separate patches so a reviewer/somebody >bisecting can clearly separate each. > >Also PLEASE do not add new functionality (this lock recheck) in a fixes >patch. We'll end up backporting new logic that way. > >Make the fixes bit _minimal_. > >I think in general Andrew prefers separate fixes patches so I'd just make the >_minimal_ change that fixes this for the backport, and the cleanup stuff as a >separate series. > >> >> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries") > >Hmm seems the device private stuff has had a rocky road of late! > >I wonder if we need some more test coverage on this? > >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com> >> Suggested-by: David Hildenbrand <david@kernel.org> >> Cc: David Hildenbrand <david@kernel.org> >> Cc: Balbir Singh <balbirs@nvidia.com> >> Cc: SeongJae Park <sj@kernel.org> >> Cc: Zi Yan <ziy@nvidia.com> >> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > >Annoying nag: You sent to my correct email ljs@kernel.org (thanks!) but also >cc'd the incorrect one, please only send to ljs@kernel.org thanks :) > Yeah, I pasted it from commit log of the fix commit. But get your kernel.org email from get_maintainer.pl... Will pay attention >> Cc: <stable@vger.kernel.org> > >Be better to just have this with the Fixes tag, Andrew adds the Cc's from the >actual cc- list anyway. > OK, will move it close to Fixes tag. -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-06-20 2:14 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260616063436.20455-1-richard.weiyang@gmail.com>
2026-06-16 12:30 ` [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd Lance Yang
2026-06-16 23:50 ` Wei Yang
2026-06-17 2:32 ` Lance Yang
2026-06-17 8:18 ` Wei Yang
2026-06-19 2:30 ` Wei Yang
2026-06-19 12:19 ` Lance Yang
2026-06-20 2:02 ` Wei Yang
[not found] ` <666dc40b-e37a-46eb-af55-7a81bc1643f1@kernel.org>
2026-06-17 2:11 ` Balbir Singh
2026-06-17 3:14 ` Lance Yang
2026-06-19 10:44 ` Lorenzo Stoakes
2026-06-19 10:48 ` David Hildenbrand (Arm)
2026-06-19 11:04 ` Lorenzo Stoakes
2026-06-20 2:13 ` Wei Yang
2026-06-20 2:11 ` Wei Yang
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.