From: Lance Yang <lance.yang@linux.dev>
To: Usama Arif <usama.arif@linux.dev>
Cc: akpm@linux-foundation.org, david@kernel.org, chrisl@kernel.org,
	kasong@tencent.com, ljs@kernel.org, ziy@nvidia.com,
	ying.huang@linux.alibaba.com, baoquan.he@linux.dev,
	willy@infradead.org, youngjun.park@lge.com, hannes@cmpxchg.org,
	riel@surriel.com, shakeel.butt@linux.dev, alex@ghiti.fr,
	kas@kernel.org, baohua@kernel.org, dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	liam@infradead.org, ryan.roberts@arm.com, vbabka@kernel.org,
	linux-kernel@vger.kernel.org, nphamcs@gmail.com,
	shikemeng@huaweicloud.com, kernel-team@meta.com,
	linux-mm@kvack.org
Subject: Re: [v2 11/16] mm: handle PMD swap entries in non-present PMD walkers
Date: Fri, 12 Jun 2026 23:21:22 +0800
Message-ID: <c25fba1a-97fa-4d06-a129-2d0949eac95b@linux.dev>
In-Reply-To: <e99fcf49-c01b-45a1-aecd-c4406ffdf5dc@linux.dev>



On 2026/6/12 23:05, Usama Arif wrote:
> 
> 
> On 12/06/2026 07:45, Lance Yang wrote:
>> +Cc linux-mm
>>
>> Please Cc linux-mm next time. Pretty clearly MM work ...
> 
> Yes, thanks for this! I forgot, will be careful in v3.

Cool.

>>
>> On Tue, Jun 02, 2026 at 07:24:19AM -0700, Usama Arif wrote:
>> [...]
>>> diff --git a/mm/mincore.c b/mm/mincore.c
>>> index e5d13eea9234..3fee8a7b9d9d 100644
>>> --- a/mm/mincore.c
>>> +++ b/mm/mincore.c
>>> @@ -172,7 +172,19 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>>
>>> 	ptl = pmd_trans_huge_lock(pmd, vma);
>>> 	if (ptl) {
>>> -		memset(vec, 1, nr);
>>> +		if (pmd_present(*pmd)) {
>>> +			memset(vec, 1, nr);
>>> +		} else {
>>> +			/*
>>> +			 * Non-present PMD: migration, device-private, or PMD
>>> +			 * swap entry. Route through mincore_swap() the same way
>>> +			 * the PTE path does -- the swap entry covers all 512
>>> +			 * slots, so the whole vec gets the same answer.
>>> +			 */
>>> +			softleaf_t entry = softleaf_from_pmd(*pmd);
>>> +
>>> +			memset(vec, mincore_swap(entry, false), nr);
>>
>> Looks buggy ...
>>
>> That assumes one swap-cache lookup is enough for the whole PMD-sized range.
>> I don't think that always holds ...
>>
>> See do_huge_pmd_swap_page():
>>
>> ---8<---
>> 	folio = swap_cache_get_folio(swp_entry);
>> [...]
>> 	/*
>> 	 * Folio should be PMD-sized; if not (e.g. split in swap cache),
>> 	 * split the PMD swap entry and retry at PTE level.
>> 	 */
>> 	if (folio_nr_pages(folio) != HPAGE_PMD_NR) {
>> 		folio_unlock(folio);
>> 		folio_put(folio);
>> 		goto split_fallback;
>> 	}
>> ---
>>
>> It handles the case where swap_cache_get_folio() returns a folio that
>> is no longer PMD-sized, e.g. because it was split in the swap cache
>> while the PMD swap entry was installed. It then splits the PMD swap entry
>> and retries at PTE level :)
>>
>> unuse_pmd_entry() has the same fallback. Can mincore hit that case?
>>
>> Maybe the comment right above should say something like:
>>
>> "
>> One lookup is enough for a PMD-sized swapcache folio. If the swapcache
>> was split, check the per-page swap slots.
>> "
>>
>> Hopefully, I'm not missing something here :D
>>
>> Cheers, Lance
> 
> Good catch! Thanks for pointing this out.
> 
> I think the below diff over this commit should be ok. I will add
> it to the next revision. It's slower, but it shouldn't be an issue
> as it's just mincore:

Just skimmed it. That should do the trick. Will go through it
properly in v3 :)

Thanks, Lance

> 
> diff --git a/mm/mincore.c b/mm/mincore.c
> index 3fee8a7b9d9d..975513fff336 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -175,15 +175,42 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>                  if (pmd_present(*pmd)) {
>                          memset(vec, 1, nr);
>                  } else {
> -                       /*
> -                        * Non-present PMD: migration, device-private, or PMD
> -                        * swap entry. Route through mincore_swap() the same way
> -                        * the PTE path does -- the swap entry covers all 512
> -                        * slots, so the whole vec gets the same answer.
> -                        */
>                          softleaf_t entry = softleaf_from_pmd(*pmd);
>   
> -                       memset(vec, mincore_swap(entry, false), nr);
> +                       /*
> +                        * Non-present PMD: migration, device-private, or
> +                        * PMD swap entry. Migration / device-private cover
> +                        * the whole PMD range with a single answer.
> +                        */
> +                       if (!softleaf_is_swap(entry)) {
> +                               memset(vec, mincore_swap(entry, false), nr);
> +                       } else {
> +                               struct folio *folio = swap_cache_get_folio(entry);
> +
> +                               /*
> +                                * One lookup is enough for a PMD-sized
> +                                * swapcache folio. If the swapcache was split
> +                                * (e.g. by deferred_split_scan() or
> +                                * memory_failure()) while the PMD swap entry
> +                                * was installed, check the per-page swap slots.
> +                                */
> +                               if (folio && folio_nr_pages(folio) == HPAGE_PMD_NR) {
> +                                       memset(vec, folio_test_uptodate(folio), nr);
> +                                       folio_put(folio);
> +                               } else {
> +                                       unsigned long haddr = addr & HPAGE_PMD_MASK;
> +                                       pgoff_t off = swp_offset(entry) +
> +                                               ((addr - haddr) >> PAGE_SHIFT);
> +
> +                                       if (folio)
> +                                               folio_put(folio);
> +                                       for (i = 0; i < nr; i++)
> +                                               vec[i] = mincore_swap(
> +                                                       swp_entry(swp_type(entry),
> +                                                                 off + i),
> +                                                       false);
> +                               }
> +                       }
>                  }
>                  spin_unlock(ptl);
>                  goto out;
> 
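
For illustration only, the offset arithmetic of the fallback in the diff above can be modelled in user space. This is a rough sketch, not kernel code: it assumes 4 KiB pages and 512 slots per PMD, and `toy_swap_cache`, `toy_slot_resident()` and `toy_mincore_pmd_swap()` are hypothetical stand-ins for the swap cache, mincore_swap() and the new branch in mincore_pte_range().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed geometry: 4 KiB pages, 512 pages per PMD (2 MiB). */
#define TOY_PAGE_SHIFT 12
#define TOY_PMD_NR     512UL
#define TOY_PMD_SIZE   (TOY_PMD_NR << TOY_PAGE_SHIFT)
#define TOY_PMD_MASK   (~(TOY_PMD_SIZE - 1))

/* Hypothetical stand-in for the swap cache: one residency flag per slot. */
struct toy_swap_cache {
	const unsigned char *resident; /* resident[off] != 0: slot is cached */
	size_t nr_slots;
	int pmd_sized_folio;           /* nonzero: folio was not split */
};

/* Hypothetical per-slot lookup, playing the role of mincore_swap(). */
static unsigned char toy_slot_resident(const struct toy_swap_cache *c,
				       unsigned long off)
{
	return off < c->nr_slots && c->resident[off];
}

/*
 * Fill vec[0..nr) for the range starting at addr inside one PMD whose
 * swap entry starts at slot first_off.
 */
static void toy_mincore_pmd_swap(const struct toy_swap_cache *c,
				 unsigned long first_off, unsigned long addr,
				 size_t nr, unsigned char *vec)
{
	unsigned long haddr = addr & TOY_PMD_MASK;
	unsigned long idx = (addr - haddr) >> TOY_PAGE_SHIFT;
	size_t i;

	/* The walked range must stay within this PMD. */
	assert(idx + nr <= TOY_PMD_NR);

	if (c->pmd_sized_folio) {
		/* One lookup answers for every page in the range. */
		memset(vec, toy_slot_resident(c, first_off), nr);
		return;
	}

	/* Folio was split: each page maps to its own swap slot. */
	for (i = 0; i < nr; i++)
		vec[i] = toy_slot_resident(c, first_off + idx + i);
}
```

With a split folio, a range starting two pages into the PMD begins at slot `first_off + 2`, so each byte of `vec` reflects its own slot rather than the first one.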


