* Re: [v2 11/16] mm: handle PMD swap entries in non-present PMD walkers
[not found] <20260602142537.198755-12-usama.arif@linux.dev>
@ 2026-06-12 6:45 ` Lance Yang
2026-06-12 15:05 ` Usama Arif
0 siblings, 1 reply; 3+ messages in thread
From: Lance Yang @ 2026-06-12 6:45 UTC
To: usama.arif
Cc: akpm, david, chrisl, kasong, ljs, ziy, ying.huang, baoquan.he,
willy, youngjun.park, hannes, riel, shakeel.butt, alex, kas,
baohua, dev.jain, baolin.wang, npache, liam, ryan.roberts, vbabka,
lance.yang, linux-kernel, nphamcs, shikemeng, kernel-team,
linux-mm
+Cc linux-mm
Please Cc linux-mm next time. Pretty clearly MM work ...
On Tue, Jun 02, 2026 at 07:24:19AM -0700, Usama Arif wrote:
[...]
>diff --git a/mm/mincore.c b/mm/mincore.c
>index e5d13eea9234..3fee8a7b9d9d 100644
>--- a/mm/mincore.c
>+++ b/mm/mincore.c
>@@ -172,7 +172,19 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>
> 	ptl = pmd_trans_huge_lock(pmd, vma);
> 	if (ptl) {
>-		memset(vec, 1, nr);
>+		if (pmd_present(*pmd)) {
>+			memset(vec, 1, nr);
>+		} else {
>+			/*
>+			 * Non-present PMD: migration, device-private, or PMD
>+			 * swap entry. Route through mincore_swap() the same way
>+			 * the PTE path does -- the swap entry covers all 512
>+			 * slots, so the whole vec gets the same answer.
>+			 */
>+			softleaf_t entry = softleaf_from_pmd(*pmd);
>+
>+			memset(vec, mincore_swap(entry, false), nr);
Looks buggy ...
That assumes one swap-cache lookup is enough for the whole PMD-sized range.
I don't think that always holds ...
See do_huge_pmd_swap_page():
---8<---
	folio = swap_cache_get_folio(swp_entry);
[...]
	/*
	 * Folio should be PMD-sized; if not (e.g. split in swap cache),
	 * split the PMD swap entry and retry at PTE level.
	 */
	if (folio_nr_pages(folio) != HPAGE_PMD_NR) {
		folio_unlock(folio);
		folio_put(folio);
		goto split_fallback;
	}
---
It handles the case where swap_cache_get_folio() returns a folio that
is no longer PMD-sized, e.g. because it was split in the swap cache
while the PMD swap entry was installed. It then splits the PMD swap entry
and retries at PTE level :)
unuse_pmd_entry() has the same fallback. Can mincore hit that case?
Maybe the comment right above should say something like:
"
One lookup is enough for a PMD-sized swapcache folio. If the swapcache
was split, check the per-page swap slots.
"
Hopefully, I'm not missing something here :D
Cheers, Lance
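
A minimal user-space sketch of the check suggested above, for illustration
only. It is not kernel code: struct model_folio, MODEL_PMD_NR and
model_fill_vec() are hypothetical names, and the 512-pages-per-PMD geometry
is an assumption (4 KiB pages, 2 MiB PMDs). It only models the rule "one
lookup is enough for a PMD-sized swapcache folio, otherwise check each slot".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed geometry: 4 KiB pages, 2 MiB PMDs, so 512 pages per PMD. */
#define MODEL_PMD_NR 512

/* Hypothetical stand-in for a swap-cache folio; not the kernel's struct folio. */
struct model_folio {
	size_t nr_pages;
	bool uptodate;
};

/*
 * Fill vec[0..nr) for a range covered by one PMD swap entry.
 * head:          result of the single lookup on the PMD entry (NULL if nothing cached)
 * slot_resident: per-slot answers, consulted only on the fallback path
 */
void model_fill_vec(unsigned char *vec, size_t nr,
		    const struct model_folio *head,
		    const unsigned char *slot_resident)
{
	assert(vec && slot_resident && nr <= MODEL_PMD_NR);

	if (head && head->nr_pages == MODEL_PMD_NR) {
		/* Folio still PMD-sized: one answer covers every page. */
		memset(vec, head->uptodate ? 1 : 0, nr);
		return;
	}

	/* Folio missing or split: each slot has to be checked on its own. */
	for (size_t i = 0; i < nr; i++)
		vec[i] = slot_resident[i];
}
```

With a PMD-sized head folio every byte of vec gets the same value; with a
split (or absent) folio the per-slot answers are copied through unchanged.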
* Re: [v2 11/16] mm: handle PMD swap entries in non-present PMD walkers
2026-06-12 6:45 ` [v2 11/16] mm: handle PMD swap entries in non-present PMD walkers Lance Yang
@ 2026-06-12 15:05 ` Usama Arif
2026-06-12 15:21 ` Lance Yang
0 siblings, 1 reply; 3+ messages in thread
From: Usama Arif @ 2026-06-12 15:05 UTC
To: Lance Yang
Cc: akpm, david, chrisl, kasong, ljs, ziy, ying.huang, baoquan.he,
willy, youngjun.park, hannes, riel, shakeel.butt, alex, kas,
baohua, dev.jain, baolin.wang, npache, liam, ryan.roberts, vbabka,
linux-kernel, nphamcs, shikemeng, kernel-team, linux-mm
On 12/06/2026 07:45, Lance Yang wrote:
> +Cc linux-mm
>
> Please Cc linux-mm next time. Pretty clearly MM work ...
Yes, thanks for this! I forgot, will be careful in v3.
>
> On Tue, Jun 02, 2026 at 07:24:19AM -0700, Usama Arif wrote:
> [...]
>> diff --git a/mm/mincore.c b/mm/mincore.c
>> index e5d13eea9234..3fee8a7b9d9d 100644
>> --- a/mm/mincore.c
>> +++ b/mm/mincore.c
>> @@ -172,7 +172,19 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>
>>  	ptl = pmd_trans_huge_lock(pmd, vma);
>>  	if (ptl) {
>> -		memset(vec, 1, nr);
>> +		if (pmd_present(*pmd)) {
>> +			memset(vec, 1, nr);
>> +		} else {
>> +			/*
>> +			 * Non-present PMD: migration, device-private, or PMD
>> +			 * swap entry. Route through mincore_swap() the same way
>> +			 * the PTE path does -- the swap entry covers all 512
>> +			 * slots, so the whole vec gets the same answer.
>> +			 */
>> +			softleaf_t entry = softleaf_from_pmd(*pmd);
>> +
>> +			memset(vec, mincore_swap(entry, false), nr);
>
> Looks buggy ...
>
> That assumes one swap-cache lookup is enough for the whole PMD-sized range.
> I don't think that always holds ...
>
> See do_huge_pmd_swap_page():
>
> ---8<---
> 	folio = swap_cache_get_folio(swp_entry);
> [...]
> 	/*
> 	 * Folio should be PMD-sized; if not (e.g. split in swap cache),
> 	 * split the PMD swap entry and retry at PTE level.
> 	 */
> 	if (folio_nr_pages(folio) != HPAGE_PMD_NR) {
> 		folio_unlock(folio);
> 		folio_put(folio);
> 		goto split_fallback;
> 	}
> ---
>
> It handles the case where swap_cache_get_folio() returns a folio that
> is no longer PMD-sized, e.g. because it was split in the swap cache
> while the PMD swap entry was installed. It then splits the PMD swap entry
> and retries at PTE level :)
>
> unuse_pmd_entry() has the same fallback. Can mincore hit that case?
>
> Maybe the comment right above should say something like:
>
> "
> One lookup is enough for a PMD-sized swapcache folio. If the swapcache
> was split, check the per-page swap slots.
> "
>
> Hopefully, I'm not missing something here :D
>
> Cheers, Lance
Good catch! Thanks for pointing this out.
I think the diff below, on top of this commit, should be OK. I will add
it to the next revision. It's slower, but that shouldn't be an issue
as it's just mincore:
diff --git a/mm/mincore.c b/mm/mincore.c
index 3fee8a7b9d9d..975513fff336 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -175,15 +175,42 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		if (pmd_present(*pmd)) {
 			memset(vec, 1, nr);
 		} else {
-			/*
-			 * Non-present PMD: migration, device-private, or PMD
-			 * swap entry. Route through mincore_swap() the same way
-			 * the PTE path does -- the swap entry covers all 512
-			 * slots, so the whole vec gets the same answer.
-			 */
 			softleaf_t entry = softleaf_from_pmd(*pmd);
 
-			memset(vec, mincore_swap(entry, false), nr);
+			/*
+			 * Non-present PMD: migration, device-private, or
+			 * PMD swap entry. Migration / device-private cover
+			 * the whole PMD range with a single answer.
+			 */
+			if (!softleaf_is_swap(entry)) {
+				memset(vec, mincore_swap(entry, false), nr);
+			} else {
+				struct folio *folio = swap_cache_get_folio(entry);
+
+				/*
+				 * One lookup is enough for a PMD-sized
+				 * swapcache folio. If the swapcache was split
+				 * (e.g. by deferred_split_scan() or
+				 * memory_failure()) while the PMD swap entry
+				 * was installed, check the per-page swap slots.
+				 */
+				if (folio && folio_nr_pages(folio) == HPAGE_PMD_NR) {
+					memset(vec, folio_test_uptodate(folio), nr);
+					folio_put(folio);
+				} else {
+					unsigned long haddr = addr & HPAGE_PMD_MASK;
+					pgoff_t off = swp_offset(entry) +
+						      ((addr - haddr) >> PAGE_SHIFT);
+
+					if (folio)
+						folio_put(folio);
+					for (i = 0; i < nr; i++)
+						vec[i] = mincore_swap(
+							swp_entry(swp_type(entry),
+								  off + i),
+							false);
+				}
+			}
 		}
 		spin_unlock(ptl);
 		goto out;
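
To make the offset arithmetic in the fallback path easier to follow, here is
a small user-space sketch. It is not kernel code: the MODEL_* constants and
both function names are hypothetical, and it assumes 4 KiB pages and 2 MiB
PMDs. It mirrors only the computation of "off" and the "off + i" walk from
the diff above, where the PMD swap entry's offset corresponds to the first
page of the PMD-aligned range.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 4 KiB pages, 2 MiB PMD-mapped ranges. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PMD_SIZE   (UINT64_C(1) << 21)
#define MODEL_PMD_MASK   (~(MODEL_PMD_SIZE - 1))

/*
 * Swap offset of the slot backing the page at addr, given the offset stored
 * in the PMD swap entry (which describes the page at the PMD-aligned base).
 */
uint64_t model_first_slot(uint64_t addr, uint64_t pmd_swap_offset)
{
	uint64_t haddr = addr & MODEL_PMD_MASK;

	return pmd_swap_offset + ((addr - haddr) >> MODEL_PAGE_SHIFT);
}

/*
 * Write the per-page swap offsets for [addr, end) into out and return how
 * many were written. The caller must keep the range within one PMD.
 */
uint64_t model_slot_offsets(uint64_t addr, uint64_t end,
			    uint64_t pmd_swap_offset, uint64_t *out)
{
	uint64_t nr, off;

	assert(addr <= end);
	nr = (end - addr) >> MODEL_PAGE_SHIFT;
	off = model_first_slot(addr, pmd_swap_offset);

	for (uint64_t i = 0; i < nr; i++)
		out[i] = off + i;
	return nr;
}
```

For example, with a PMD swap entry at offset 1024 and addr three pages past
the PMD base, the first slot checked is offset 1027.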
* Re: [v2 11/16] mm: handle PMD swap entries in non-present PMD walkers
2026-06-12 15:05 ` Usama Arif
@ 2026-06-12 15:21 ` Lance Yang
0 siblings, 0 replies; 3+ messages in thread
From: Lance Yang @ 2026-06-12 15:21 UTC
To: Usama Arif
Cc: akpm, david, chrisl, kasong, ljs, ziy, ying.huang, baoquan.he,
willy, youngjun.park, hannes, riel, shakeel.butt, alex, kas,
baohua, dev.jain, baolin.wang, npache, liam, ryan.roberts, vbabka,
linux-kernel, nphamcs, shikemeng, kernel-team, linux-mm
On 2026/6/12 23:05, Usama Arif wrote:
>
>
> On 12/06/2026 07:45, Lance Yang wrote:
>> +Cc linux-mm
>>
>> Please Cc linux-mm next time. Pretty clearly MM work ...
>
> Yes, thanks for this! I forgot, will be careful in v3.
Cool.
>>
>> On Tue, Jun 02, 2026 at 07:24:19AM -0700, Usama Arif wrote:
>> [...]
>>> diff --git a/mm/mincore.c b/mm/mincore.c
>>> index e5d13eea9234..3fee8a7b9d9d 100644
>>> --- a/mm/mincore.c
>>> +++ b/mm/mincore.c
>>> @@ -172,7 +172,19 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>>
>>>  	ptl = pmd_trans_huge_lock(pmd, vma);
>>>  	if (ptl) {
>>> -		memset(vec, 1, nr);
>>> +		if (pmd_present(*pmd)) {
>>> +			memset(vec, 1, nr);
>>> +		} else {
>>> +			/*
>>> +			 * Non-present PMD: migration, device-private, or PMD
>>> +			 * swap entry. Route through mincore_swap() the same way
>>> +			 * the PTE path does -- the swap entry covers all 512
>>> +			 * slots, so the whole vec gets the same answer.
>>> +			 */
>>> +			softleaf_t entry = softleaf_from_pmd(*pmd);
>>> +
>>> +			memset(vec, mincore_swap(entry, false), nr);
>>
>> Looks buggy ...
>>
>> That assumes one swap-cache lookup is enough for the whole PMD-sized range.
>> I don't think that always holds ...
>>
>> See do_huge_pmd_swap_page():
>>
>> ---8<---
>> 	folio = swap_cache_get_folio(swp_entry);
>> [...]
>> 	/*
>> 	 * Folio should be PMD-sized; if not (e.g. split in swap cache),
>> 	 * split the PMD swap entry and retry at PTE level.
>> 	 */
>> 	if (folio_nr_pages(folio) != HPAGE_PMD_NR) {
>> 		folio_unlock(folio);
>> 		folio_put(folio);
>> 		goto split_fallback;
>> 	}
>> ---
>>
>> It handles the case where swap_cache_get_folio() returns a folio that
>> is no longer PMD-sized, e.g. because it was split in the swap cache
>> while the PMD swap entry was installed. It then splits the PMD swap entry
>> and retries at PTE level :)
>>
>> unuse_pmd_entry() has the same fallback. Can mincore hit that case?
>>
>> Maybe the comment right above should say something like:
>>
>> "
>> One lookup is enough for a PMD-sized swapcache folio. If the swapcache
>> was split, check the per-page swap slots.
>> "
>>
>> Hopefully, I'm not missing something here :D
>>
>> Cheers, Lance
>
> Good catch! Thanks for pointing this out.
>
> I think the diff below, on top of this commit, should be OK. I will add
> it to the next revision. It's slower, but that shouldn't be an issue
> as it's just mincore:
Just skimmed it. That should do the trick. Will go through it
properly in v3 :)
Thanks, Lance
>
> diff --git a/mm/mincore.c b/mm/mincore.c
> index 3fee8a7b9d9d..975513fff336 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -175,15 +175,42 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  		if (pmd_present(*pmd)) {
>  			memset(vec, 1, nr);
>  		} else {
> -			/*
> -			 * Non-present PMD: migration, device-private, or PMD
> -			 * swap entry. Route through mincore_swap() the same way
> -			 * the PTE path does -- the swap entry covers all 512
> -			 * slots, so the whole vec gets the same answer.
> -			 */
>  			softleaf_t entry = softleaf_from_pmd(*pmd);
>
> -			memset(vec, mincore_swap(entry, false), nr);
> +			/*
> +			 * Non-present PMD: migration, device-private, or
> +			 * PMD swap entry. Migration / device-private cover
> +			 * the whole PMD range with a single answer.
> +			 */
> +			if (!softleaf_is_swap(entry)) {
> +				memset(vec, mincore_swap(entry, false), nr);
> +			} else {
> +				struct folio *folio = swap_cache_get_folio(entry);
> +
> +				/*
> +				 * One lookup is enough for a PMD-sized
> +				 * swapcache folio. If the swapcache was split
> +				 * (e.g. by deferred_split_scan() or
> +				 * memory_failure()) while the PMD swap entry
> +				 * was installed, check the per-page swap slots.
> +				 */
> +				if (folio && folio_nr_pages(folio) == HPAGE_PMD_NR) {
> +					memset(vec, folio_test_uptodate(folio), nr);
> +					folio_put(folio);
> +				} else {
> +					unsigned long haddr = addr & HPAGE_PMD_MASK;
> +					pgoff_t off = swp_offset(entry) +
> +						      ((addr - haddr) >> PAGE_SHIFT);
> +
> +					if (folio)
> +						folio_put(folio);
> +					for (i = 0; i < nr; i++)
> +						vec[i] = mincore_swap(
> +							swp_entry(swp_type(entry),
> +								  off + i),
> +							false);
> +				}
> +			}
>  		}
>  		spin_unlock(ptl);
>  		goto out;
>