The Linux Kernel Mailing List
From: Usama Arif <usama.arif@linux.dev>
To: Lance Yang <lance.yang@linux.dev>
Cc: akpm@linux-foundation.org, david@kernel.org, chrisl@kernel.org,
	kasong@tencent.com, ljs@kernel.org, ziy@nvidia.com,
	ying.huang@linux.alibaba.com, baoquan.he@linux.dev,
	willy@infradead.org, youngjun.park@lge.com, hannes@cmpxchg.org,
	riel@surriel.com, shakeel.butt@linux.dev, alex@ghiti.fr,
	kas@kernel.org, baohua@kernel.org, dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	liam@infradead.org, ryan.roberts@arm.com, vbabka@kernel.org,
	linux-kernel@vger.kernel.org, nphamcs@gmail.com,
	shikemeng@huaweicloud.com, kernel-team@meta.com,
	linux-mm@kvack.org
Subject: Re: [v2 11/16] mm: handle PMD swap entries in non-present PMD walkers
Date: Fri, 12 Jun 2026 16:05:57 +0100
Message-ID: <e99fcf49-c01b-45a1-aecd-c4406ffdf5dc@linux.dev>
In-Reply-To: <20260612064550.54968-1-lance.yang@linux.dev>



On 12/06/2026 07:45, Lance Yang wrote:
> +Cc linux-mm
> 
> Please Cc linux-mm next time. Pretty clearly MM work ...

Yes, thanks for this! I forgot, will be careful in v3.

> 
> On Tue, Jun 02, 2026 at 07:24:19AM -0700, Usama Arif wrote:
> [...]
>> diff --git a/mm/mincore.c b/mm/mincore.c
>> index e5d13eea9234..3fee8a7b9d9d 100644
>> --- a/mm/mincore.c
>> +++ b/mm/mincore.c
>> @@ -172,7 +172,19 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>
>> 	ptl = pmd_trans_huge_lock(pmd, vma);
>> 	if (ptl) {
>> -		memset(vec, 1, nr);
>> +		if (pmd_present(*pmd)) {
>> +			memset(vec, 1, nr);
>> +		} else {
>> +			/*
>> +			 * Non-present PMD: migration, device-private, or PMD
>> +			 * swap entry. Route through mincore_swap() the same way
>> +			 * the PTE path does -- the swap entry covers all 512
>> +			 * slots, so the whole vec gets the same answer.
>> +			 */
>> +			softleaf_t entry = softleaf_from_pmd(*pmd);
>> +
>> +			memset(vec, mincore_swap(entry, false), nr);
> 
> Looks buggy ...
> 
> That assumes one swap-cache lookup is enough for whole PMD-sized range.
> I don't think that always holds ...
> 
> See do_huge_pmd_swap_page():
> 
> ---8<---
> 	folio = swap_cache_get_folio(swp_entry);
> [...]
> 	/*
> 	 * Folio should be PMD-sized; if not (e.g. split in swap cache),
> 	 * split the PMD swap entry and retry at PTE level.
> 	 */
> 	if (folio_nr_pages(folio) != HPAGE_PMD_NR) {
> 		folio_unlock(folio);
> 		folio_put(folio);
> 		goto split_fallback;
> 	}
> ---
> 
> It handles the case where swap_cache_get_folio() returns a folio that
> is no longer PMD-sized, e.g. because it was split in the swap cache
> while the PMD swap entry was installed. It then splits the PMD swap
> entry and retries at PTE level :)
> 
> unuse_pmd_entry() has the same fallback. Can mincore hit that case?
> 
> Maybe the comment right above should say something like:
> 
> "
> One lookup is enough for a PMD-sized swapcache folio. If the swapcache
> was split, check the per-page swap slots.
> "
> 
> Hopefully, I'm not missing something here :D
> 
> Cheers, Lance

Good catch! Thanks for pointing this out.

I think the diff below, applied on top of this commit, should be ok. I
will add it to the next revision. It's slower, but that shouldn't be an
issue, as it's just mincore:


diff --git a/mm/mincore.c b/mm/mincore.c
index 3fee8a7b9d9d..975513fff336 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -175,15 +175,42 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
                if (pmd_present(*pmd)) {
                        memset(vec, 1, nr);
                } else {
-                       /*
-                        * Non-present PMD: migration, device-private, or PMD
-                        * swap entry. Route through mincore_swap() the same way
-                        * the PTE path does -- the swap entry covers all 512
-                        * slots, so the whole vec gets the same answer.
-                        */
                        softleaf_t entry = softleaf_from_pmd(*pmd);
 
-                       memset(vec, mincore_swap(entry, false), nr);
+                       /*
+                        * Non-present PMD: migration, device-private, or
+                        * PMD swap entry. Migration / device-private cover
+                        * the whole PMD range with a single answer.
+                        */
+                       if (!softleaf_is_swap(entry)) {
+                               memset(vec, mincore_swap(entry, false), nr);
+                       } else {
+                               struct folio *folio = swap_cache_get_folio(entry);
+
+                               /*
+                                * One lookup is enough for a PMD-sized
+                                * swapcache folio. If the swapcache was split
+                                * (e.g. by deferred_split_scan() or
+                                * memory_failure()) while the PMD swap entry
+                                * was installed, check the per-page swap slots.
+                                */
+                               if (folio && folio_nr_pages(folio) == HPAGE_PMD_NR) {
+                                       memset(vec, folio_test_uptodate(folio), nr);
+                                       folio_put(folio);
+                               } else {
+                                       unsigned long haddr = addr & HPAGE_PMD_MASK;
+                                       pgoff_t off = swp_offset(entry) +
+                                               ((addr - haddr) >> PAGE_SHIFT);
+
+                                       if (folio)
+                                               folio_put(folio);
+                                       for (i = 0; i < nr; i++)
+                                               vec[i] = mincore_swap(
+                                                       swp_entry(swp_type(entry),
+                                                                 off + i),
+                                                       false);
+                               }
+                       }
                }
                spin_unlock(ptl);
                goto out;
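
For illustration only (not part of the patch): the per-page fallback
above comes down to offset arithmetic, which can be sketched in user
space. The sketch assumes 4 KiB base pages and 2 MiB PMDs, and assumes,
as the diff does, that the offset in the PMD swap entry names the first
slot of the PMD-sized range. All sketch_* names are hypothetical, and
the slot_resident callback is only a stand-in for mincore_swap().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed geometry: 4 KiB base pages, 2 MiB PMDs. Other configs differ. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PMD_SHIFT	21
#define SKETCH_PMD_SIZE		(1UL << SKETCH_PMD_SHIFT)
#define SKETCH_PMD_MASK		(~(SKETCH_PMD_SIZE - 1))
#define SKETCH_PMD_NR		(1UL << (SKETCH_PMD_SHIFT - SKETCH_PAGE_SHIFT))

/*
 * Swap offset of the slot backing the page at addr, given the offset
 * stored in the PMD swap entry (assumed to name the first slot).
 */
static unsigned long sketch_slot_offset(unsigned long pmd_off,
					unsigned long addr)
{
	unsigned long haddr = addr & SKETCH_PMD_MASK;

	return pmd_off + ((addr - haddr) >> SKETCH_PAGE_SHIFT);
}

/* Toy residency predicate for demonstration: even slots are "resident". */
static int sketch_even_slot(unsigned long off)
{
	return (off & 1) == 0;
}

/*
 * Fill vec[0..nr) with one answer per page, starting at addr.
 * slot_resident() is a hypothetical stand-in for mincore_swap().
 */
static void sketch_fill_vec(unsigned char *vec, size_t nr,
			    unsigned long pmd_off, unsigned long addr,
			    int (*slot_resident)(unsigned long off))
{
	unsigned long off = sketch_slot_offset(pmd_off, addr);
	size_t i;

	/* The walked range must stay inside a single PMD. */
	assert(((addr & ~SKETCH_PMD_MASK) >> SKETCH_PAGE_SHIFT) + nr
	       <= SKETCH_PMD_NR);

	for (i = 0; i < nr; i++)
		vec[i] = slot_resident(off + i) ? 1 : 0;
}
```

With a walk that starts three pages into the PMD, the first slot checked
is pmd_off + 3, which is why the diff derives off from addr rather than
starting at swp_offset(entry).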


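For reference, a minimal user-space sketch of the interface this code
path feeds: mincore(2) writes one byte per page into vec, and only bit 0
of each byte is defined. The sketch does not force PMD swap entries to
be installed; it only shows how the per-page answers are consumed. The
sketch_* helper names and the 2 MiB region size are assumptions made for
the example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count the pages that mincore() reports as resident in
 * [addr, addr + len). addr must be page-aligned.
 * Returns -1 on error.
 */
static long sketch_count_resident(void *addr, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t psz, npages, i;
	unsigned char *vec;
	long resident = 0;

	if (page <= 0)
		return -1;
	psz = (size_t)page;
	assert((uintptr_t)addr % psz == 0);

	npages = (len + psz - 1) / psz;
	vec = malloc(npages);
	if (!vec)
		return -1;
	if (mincore(addr, len, vec) != 0) {
		free(vec);
		return -1;
	}
	for (i = 0; i < npages; i++)
		resident += vec[i] & 1;	/* only bit 0 is defined */
	free(vec);
	return resident;
}

/*
 * Map a 2 MiB anonymous region, touch every byte, and return the
 * number of pages reported resident, or -1 on error.
 */
static long sketch_touch_and_count(void)
{
	size_t len = 2UL << 20;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	long n;

	if (p == MAP_FAILED)
		return -1;
	memset(p, 0x5a, len);
	n = sketch_count_resident(p, len);
	munmap(p, len);
	return n;
}
```

A real test for the split-swapcache case would additionally need to get
the range swapped out as a THP and the swapcache folio split before
calling mincore(), which this sketch does not attempt.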