* Re: [v2 11/16] mm: handle PMD swap entries in non-present PMD walkers
       [not found] <20260602142537.198755-12-usama.arif@linux.dev>
From: Lance Yang @ 2026-06-12  6:45 UTC
To: usama.arif
Cc: akpm, david, chrisl, kasong, ljs, ziy, ying.huang, baoquan.he,
	willy, youngjun.park, hannes, riel, shakeel.butt, alex, kas,
	baohua, dev.jain, baolin.wang, npache, liam, ryan.roberts, vbabka,
	lance.yang, linux-kernel, nphamcs, shikemeng, kernel-team, linux-mm

+Cc linux-mm

Please Cc linux-mm next time. Pretty clearly MM work ...

On Tue, Jun 02, 2026 at 07:24:19AM -0700, Usama Arif wrote:
[...]
>diff --git a/mm/mincore.c b/mm/mincore.c
>index e5d13eea9234..3fee8a7b9d9d 100644
>--- a/mm/mincore.c
>+++ b/mm/mincore.c
>@@ -172,7 +172,19 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>
> 	ptl = pmd_trans_huge_lock(pmd, vma);
> 	if (ptl) {
>-		memset(vec, 1, nr);
>+		if (pmd_present(*pmd)) {
>+			memset(vec, 1, nr);
>+		} else {
>+			/*
>+			 * Non-present PMD: migration, device-private, or PMD
>+			 * swap entry. Route through mincore_swap() the same way
>+			 * the PTE path does -- the swap entry covers all 512
>+			 * slots, so the whole vec gets the same answer.
>+			 */
>+			softleaf_t entry = softleaf_from_pmd(*pmd);
>+
>+			memset(vec, mincore_swap(entry, false), nr);

Looks buggy ...

That assumes one swap-cache lookup is enough for the whole PMD-sized range.
I don't think that always holds ...

See do_huge_pmd_swap_page():

---8<---
	folio = swap_cache_get_folio(swp_entry);
[...]
	/*
	 * Folio should be PMD-sized; if not (e.g. split in swap cache),
	 * split the PMD swap entry and retry at PTE level.
	 */
	if (folio_nr_pages(folio) != HPAGE_PMD_NR) {
		folio_unlock(folio);
		folio_put(folio);
		goto split_fallback;
	}
---

it handles the case where swap_cache_get_folio() returns a folio that
is no longer PMD-sized, e.g. because it was split in the swap cache
while the PMD swap entry was installed. Then it splits the PMD swap entry
and retries at PTE level :)

unuse_pmd_entry() has the same fallback. Can mincore hit that case?

Maybe the comment right above should say something like:

"
One lookup is enough for a PMD-sized swapcache folio. If the swapcache
was split, check the per-page swap slots.
"

Hopefully, I'm not missing something here :D

Cheers,
Lance
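
The concern raised in the message above can be modelled in a few lines of
userspace C. This is only a toy sketch, not kernel code: `toy_slot`,
`toy_fill_single_lookup()`, `toy_fill_checked()` and `TOY_PMD_NR` are all
hypothetical names invented for illustration, and the model ignores locking,
folio refcounts and the uptodate flag. It assumes that after a split in the
swap cache each slot of the range can be cached, or not, independently of the
others.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for HPAGE_PMD_NR (kept small so the example is easy to read). */
#define TOY_PMD_NR 8

/*
 * Hypothetical per-slot swap-cache state: the number of pages in the folio
 * cached at this slot, or 0 if nothing is cached there.
 */
struct toy_slot {
	unsigned int folio_pages;
};

/* One lookup at the first slot, with the answer copied to every page. */
static void toy_fill_single_lookup(const struct toy_slot *slots,
				   unsigned char *vec, size_t nr)
{
	assert(nr <= TOY_PMD_NR);
	memset(vec, slots[0].folio_pages != 0, nr);
}

/* Trust one lookup only for a PMD-sized folio; otherwise check each slot. */
static void toy_fill_checked(const struct toy_slot *slots,
			     unsigned char *vec, size_t nr)
{
	size_t i;

	assert(nr <= TOY_PMD_NR);
	if (slots[0].folio_pages == TOY_PMD_NR) {
		memset(vec, 1, nr);
		return;
	}
	for (i = 0; i < nr; i++)
		vec[i] = slots[i].folio_pages != 0;
}
```

With a split range in which only some slots are still cached, the
single-lookup variant reports every page with the state of slot 0, while the
checked variant reports each page separately. When the folio at slot 0 is
still PMD-sized, both agree.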
* Re: [v2 11/16] mm: handle PMD swap entries in non-present PMD walkers
From: Usama Arif @ 2026-06-12 15:05 UTC
To: Lance Yang
Cc: akpm, david, chrisl, kasong, ljs, ziy, ying.huang, baoquan.he,
	willy, youngjun.park, hannes, riel, shakeel.butt, alex, kas,
	baohua, dev.jain, baolin.wang, npache, liam, ryan.roberts, vbabka,
	linux-kernel, nphamcs, shikemeng, kernel-team, linux-mm

On 12/06/2026 07:45, Lance Yang wrote:
> +Cc linux-mm
>
> Please Cc linux-mm next time. Pretty clearly MM work ...

Yes, thanks for this! I forgot, will be careful in v3.

>
> On Tue, Jun 02, 2026 at 07:24:19AM -0700, Usama Arif wrote:
> [...]
>> diff --git a/mm/mincore.c b/mm/mincore.c
>> index e5d13eea9234..3fee8a7b9d9d 100644
>> --- a/mm/mincore.c
>> +++ b/mm/mincore.c
>> @@ -172,7 +172,19 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>
>>  	ptl = pmd_trans_huge_lock(pmd, vma);
>>  	if (ptl) {
>> -		memset(vec, 1, nr);
>> +		if (pmd_present(*pmd)) {
>> +			memset(vec, 1, nr);
>> +		} else {
>> +			/*
>> +			 * Non-present PMD: migration, device-private, or PMD
>> +			 * swap entry. Route through mincore_swap() the same way
>> +			 * the PTE path does -- the swap entry covers all 512
>> +			 * slots, so the whole vec gets the same answer.
>> +			 */
>> +			softleaf_t entry = softleaf_from_pmd(*pmd);
>> +
>> +			memset(vec, mincore_swap(entry, false), nr);
>
> Looks buggy ...
>
> That assumes one swap-cache lookup is enough for the whole PMD-sized range.
> I don't think that always holds ...
>
> See do_huge_pmd_swap_page():
>
> ---8<---
> 	folio = swap_cache_get_folio(swp_entry);
> [...]
> 	/*
> 	 * Folio should be PMD-sized; if not (e.g. split in swap cache),
> 	 * split the PMD swap entry and retry at PTE level.
> 	 */
> 	if (folio_nr_pages(folio) != HPAGE_PMD_NR) {
> 		folio_unlock(folio);
> 		folio_put(folio);
> 		goto split_fallback;
> 	}
> ---
>
> it handles the case where swap_cache_get_folio() returns a folio that
> is no longer PMD-sized, e.g. because it was split in the swap cache
> while the PMD swap entry was installed. Then it splits the PMD swap entry
> and retries at PTE level :)
>
> unuse_pmd_entry() has the same fallback. Can mincore hit that case?
>
> Maybe the comment right above should say something like:
>
> "
> One lookup is enough for a PMD-sized swapcache folio. If the swapcache
> was split, check the per-page swap slots.
> "
>
> Hopefully, I'm not missing something here :D
>
> Cheers,
> Lance

Good catch! Thanks for pointing this out.

I think the below diff over this commit should be ok. I will add
it to the next revision. It's slower, but it shouldn't be an issue
as it's just mincore:

diff --git a/mm/mincore.c b/mm/mincore.c
index 3fee8a7b9d9d..975513fff336 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -175,15 +175,42 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		if (pmd_present(*pmd)) {
 			memset(vec, 1, nr);
 		} else {
-			/*
-			 * Non-present PMD: migration, device-private, or PMD
-			 * swap entry. Route through mincore_swap() the same way
-			 * the PTE path does -- the swap entry covers all 512
-			 * slots, so the whole vec gets the same answer.
-			 */
 			softleaf_t entry = softleaf_from_pmd(*pmd);
 
-			memset(vec, mincore_swap(entry, false), nr);
+			/*
+			 * Non-present PMD: migration, device-private, or
+			 * PMD swap entry. Migration / device-private cover
+			 * the whole PMD range with a single answer.
+			 */
+			if (!softleaf_is_swap(entry)) {
+				memset(vec, mincore_swap(entry, false), nr);
+			} else {
+				struct folio *folio = swap_cache_get_folio(entry);
+
+				/*
+				 * One lookup is enough for a PMD-sized
+				 * swapcache folio. If the swapcache was split
+				 * (e.g. by deferred_split_scan() or
+				 * memory_failure()) while the PMD swap entry
+				 * was installed, check the per-page swap slots.
+				 */
+				if (folio && folio_nr_pages(folio) == HPAGE_PMD_NR) {
+					memset(vec, folio_test_uptodate(folio), nr);
+					folio_put(folio);
+				} else {
+					unsigned long haddr = addr & HPAGE_PMD_MASK;
+					pgoff_t off = swp_offset(entry) +
+						((addr - haddr) >> PAGE_SHIFT);
+
+					if (folio)
+						folio_put(folio);
+					for (i = 0; i < nr; i++)
+						vec[i] = mincore_swap(
+							swp_entry(swp_type(entry),
+								  off + i),
+							false);
+				}
+			}
 		}
 		spin_unlock(ptl);
 		goto out;
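
The per-page fallback in the diff above depends on one piece of arithmetic:
turning the PMD swap entry's offset plus the position of `addr` inside the
huge-page range into the first per-page swap slot. A userspace sketch of that
calculation follows. It is not kernel code: the `TOY_*` constants and the
`toy_*` helpers are hypothetical, the shifts assume 4 KiB pages and 2 MiB
PMDs, and it assumes (as the diff appears to) that the PMD swap entry names
the slot of the first page and that the remaining slots follow contiguously.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed geometry: 4 KiB base pages, 2 MiB PMD-sized huge pages. */
#define TOY_PAGE_SHIFT	12UL
#define TOY_PMD_SHIFT	21UL
#define TOY_PMD_SIZE	(1UL << TOY_PMD_SHIFT)
#define TOY_PMD_MASK	(~(TOY_PMD_SIZE - 1))

/* Swap slot backing the page at addr, given the offset of the PMD swap entry. */
static unsigned long toy_first_slot(unsigned long pmd_swp_offset,
				    unsigned long addr)
{
	unsigned long haddr = addr & TOY_PMD_MASK;	/* start of the huge page */

	return pmd_swp_offset + ((addr - haddr) >> TOY_PAGE_SHIFT);
}

/* Fill out[] with the slot for each page in [addr, end). */
static void toy_slots_for_range(unsigned long pmd_swp_offset,
				unsigned long addr, unsigned long end,
				unsigned long *out)
{
	size_t i, nr;
	unsigned long off;

	/* The range must be non-empty and stay inside one PMD-sized region. */
	assert(end > addr);
	assert(end - (addr & TOY_PMD_MASK) <= TOY_PMD_SIZE);

	nr = (end - addr) >> TOY_PAGE_SHIFT;
	off = toy_first_slot(pmd_swp_offset, addr);
	for (i = 0; i < nr; i++)
		out[i] = off + i;
}
```

For a walk that starts three pages into the huge page, the first slot is the
entry's offset plus 3. This is why the diff computes `haddr` rather than using
`swp_offset(entry)` directly: the walker's `addr` need not be PMD-aligned.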
* Re: [v2 11/16] mm: handle PMD swap entries in non-present PMD walkers
From: Lance Yang @ 2026-06-12 15:21 UTC
To: Usama Arif
Cc: akpm, david, chrisl, kasong, ljs, ziy, ying.huang, baoquan.he,
	willy, youngjun.park, hannes, riel, shakeel.butt, alex, kas,
	baohua, dev.jain, baolin.wang, npache, liam, ryan.roberts, vbabka,
	linux-kernel, nphamcs, shikemeng, kernel-team, linux-mm

On 2026/6/12 23:05, Usama Arif wrote:
>
>
> On 12/06/2026 07:45, Lance Yang wrote:
>> +Cc linux-mm
>>
>> Please Cc linux-mm next time. Pretty clearly MM work ...
>
> Yes, thanks for this! I forgot, will be careful in v3.

Cool.

>>
>> On Tue, Jun 02, 2026 at 07:24:19AM -0700, Usama Arif wrote:
>> [...]
>>> diff --git a/mm/mincore.c b/mm/mincore.c
>>> index e5d13eea9234..3fee8a7b9d9d 100644
>>> --- a/mm/mincore.c
>>> +++ b/mm/mincore.c
>>> @@ -172,7 +172,19 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>>
>>>  	ptl = pmd_trans_huge_lock(pmd, vma);
>>>  	if (ptl) {
>>> -		memset(vec, 1, nr);
>>> +		if (pmd_present(*pmd)) {
>>> +			memset(vec, 1, nr);
>>> +		} else {
>>> +			/*
>>> +			 * Non-present PMD: migration, device-private, or PMD
>>> +			 * swap entry. Route through mincore_swap() the same way
>>> +			 * the PTE path does -- the swap entry covers all 512
>>> +			 * slots, so the whole vec gets the same answer.
>>> +			 */
>>> +			softleaf_t entry = softleaf_from_pmd(*pmd);
>>> +
>>> +			memset(vec, mincore_swap(entry, false), nr);
>>
>> Looks buggy ...
>>
>> That assumes one swap-cache lookup is enough for the whole PMD-sized range.
>> I don't think that always holds ...
>>
>> See do_huge_pmd_swap_page():
>>
>> ---8<---
>> 	folio = swap_cache_get_folio(swp_entry);
>> [...]
>> 	/*
>> 	 * Folio should be PMD-sized; if not (e.g. split in swap cache),
>> 	 * split the PMD swap entry and retry at PTE level.
>> 	 */
>> 	if (folio_nr_pages(folio) != HPAGE_PMD_NR) {
>> 		folio_unlock(folio);
>> 		folio_put(folio);
>> 		goto split_fallback;
>> 	}
>> ---
>>
>> it handles the case where swap_cache_get_folio() returns a folio that
>> is no longer PMD-sized, e.g. because it was split in the swap cache
>> while the PMD swap entry was installed. Then it splits the PMD swap entry
>> and retries at PTE level :)
>>
>> unuse_pmd_entry() has the same fallback. Can mincore hit that case?
>>
>> Maybe the comment right above should say something like:
>>
>> "
>> One lookup is enough for a PMD-sized swapcache folio. If the swapcache
>> was split, check the per-page swap slots.
>> "
>>
>> Hopefully, I'm not missing something here :D
>>
>> Cheers,
>> Lance
>
> Good catch! Thanks for pointing this out.
>
> I think the below diff over this commit should be ok. I will add
> it to the next revision. It's slower, but it shouldn't be an issue
> as it's just mincore:

Just skimmed it. That should do the trick. Will go through it
properly in v3 :)

Thanks,
Lance

>
> diff --git a/mm/mincore.c b/mm/mincore.c
> index 3fee8a7b9d9d..975513fff336 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -175,15 +175,42 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  		if (pmd_present(*pmd)) {
>  			memset(vec, 1, nr);
>  		} else {
> -			/*
> -			 * Non-present PMD: migration, device-private, or PMD
> -			 * swap entry. Route through mincore_swap() the same way
> -			 * the PTE path does -- the swap entry covers all 512
> -			 * slots, so the whole vec gets the same answer.
> -			 */
>  			softleaf_t entry = softleaf_from_pmd(*pmd);
>
> -			memset(vec, mincore_swap(entry, false), nr);
> +			/*
> +			 * Non-present PMD: migration, device-private, or
> +			 * PMD swap entry. Migration / device-private cover
> +			 * the whole PMD range with a single answer.
> +			 */
> +			if (!softleaf_is_swap(entry)) {
> +				memset(vec, mincore_swap(entry, false), nr);
> +			} else {
> +				struct folio *folio = swap_cache_get_folio(entry);
> +
> +				/*
> +				 * One lookup is enough for a PMD-sized
> +				 * swapcache folio. If the swapcache was split
> +				 * (e.g. by deferred_split_scan() or
> +				 * memory_failure()) while the PMD swap entry
> +				 * was installed, check the per-page swap slots.
> +				 */
> +				if (folio && folio_nr_pages(folio) == HPAGE_PMD_NR) {
> +					memset(vec, folio_test_uptodate(folio), nr);
> +					folio_put(folio);
> +				} else {
> +					unsigned long haddr = addr & HPAGE_PMD_MASK;
> +					pgoff_t off = swp_offset(entry) +
> +						((addr - haddr) >> PAGE_SHIFT);
> +
> +					if (folio)
> +						folio_put(folio);
> +					for (i = 0; i < nr; i++)
> +						vec[i] = mincore_swap(
> +							swp_entry(swp_type(entry),
> +								  off + i),
> +							false);
> +				}
> +			}
>  		}
>  		spin_unlock(ptl);
>  		goto out;