From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, ljs@kernel.org, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, lance.yang@linux.dev, riel@surriel.com,
vbabka@kernel.org, harry@kernel.org, jannh@google.com,
rppt@kernel.org, surenb@google.com, mhocko@suse.com,
shuah@kernel.org, linux-mm@kvack.org,
Gavin Guo <gavinguo@igalia.com>
Subject: Re: [PATCH 1/2] mm/huge_memory: return true if split_huge_pmd_locked() split PMD to migration entry
Date: Wed, 29 Apr 2026 08:55:07 +0200 [thread overview]
Message-ID: <413feed4-6aab-43d9-b7e5-a9386fa79f4b@kernel.org> (raw)
In-Reply-To: <20260429024913.iepoi7cit3xnwca2@master>
On 4/29/26 04:49, Wei Yang wrote:
> On Tue, Apr 28, 2026 at 10:24:42AM +0200, David Hildenbrand (Arm) wrote:
>> On 4/26/26 11:19, Wei Yang wrote:
>>>
>>> Will adjust related places.
>>>
>>>
>>> Got it.
>>>
>>>
>>> Here is my understanding.
>>>
>>> We get here when page_vma_mapped_walk() touches a pmd entry, which can be one of three cases:
>>>
>>> * pmd_trans_huge()
>>> * pmd_is_migration_entry()
>>> * pmd_is_device_private_entry()
>>>
>>> For the first two cases, we grab pmd_lock() and then check that the condition
>>> still holds before returning. But for case 3, after grabbing pmd_lock(), it
>>> returns directly.
>>>
>>> This leaves a window for another thread to split the device-private PMD into
>>> pte mappings, IIUC. For this case, we should restart the walk here.
>>
>>
>> So what you are saying is that we should re-validate in page_vma_mapped_walk()
>> that we indeed still have a device-private entry after grabbing the lock?
>>
>> That's what we do in map_pte() through the pmd_same() check.
>>
>> Likely we should apply the same model here!
>>
>
> Below is my proposed change:
>
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index a4d52fdb3056..6e915d35ae54 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -273,17 +273,21 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>
> if (softleaf_is_device_private(entry)) {
> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> - return true;
> + if (pmd_same(pmde, pmdp_get_lockless(pvmw->pmd)))
> + return true;
As we have a softleaf entry, I assume we wouldn't expect any other bits
(access/dirty) to get set before we grab the lock. Verifying
softleaf_is_device_private() again would be cleaner, though.

But really, I do wonder if we should just have a "goto retry" back to the
"pmde = pmdp_get_lockless(pvmw->pmd);" instead?

And now I wonder why we don't have any check_pmd() handling in there? :/
Should we check for the pfn here?
> + /* THP pmd was split under us: handle on pte level */
> + spin_unlock(pvmw->ptl);
> + pvmw->ptl = NULL;
> + } else {
> + if ((pvmw->flags & PVMW_SYNC) &&
> + thp_vma_suitable_order(vma, pvmw->address,
> + PMD_ORDER) &&
> + (pvmw->nr_pages >= HPAGE_PMD_NR))
> + sync_with_folio_pmd_zap(mm, pvmw->pmd);
> +
> + step_forward(pvmw, PMD_SIZE);
> + continue;
> }
> -
> - if ((pvmw->flags & PVMW_SYNC) &&
> - thp_vma_suitable_order(vma, pvmw->address,
> - PMD_ORDER) &&
> - (pvmw->nr_pages >= HPAGE_PMD_NR))
> - sync_with_folio_pmd_zap(mm, pvmw->pmd);
> -
> - step_forward(pvmw, PMD_SIZE);
> - continue;
> }
> if (!map_pte(pvmw, &pmde, &ptl)) {
> if (!pvmw->pte)
>
> After this, we could simplify the logic in try_to_migrate_one() as:
>
> @@ -2471,14 +2471,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
> * so we can detect this scenario and properly
> * abort the walk.
> */
> - if (split_huge_pmd_locked(vma, pvmw.address,
> - pvmw.pmd, true)) {
> - page_vma_mapped_walk_done(&pvmw);
> - break;
> - }
> - flags &= ~TTU_SPLIT_HUGE_PMD;
> - page_vma_mapped_walk_restart(&pvmw);
> - continue;
> + ret = split_huge_pmd_locked(vma, pvmw.address,
> + pvmw.pmd, true);
> + page_vma_mapped_walk_done(&pvmw);
> + break;
> }
Right. But just to be clear: let's split the page_vma_mapped_walk()
re-validation (which looks like a bugfix to me) from the try_to_migrate_one()
optimization.
--
Cheers,
David
Thread overview: 11+ messages
2026-04-15 1:08 [PATCH 0/2] mm/huge_memory: optimize migration when huge PMD needs split Wei Yang
2026-04-15 1:08 ` [PATCH 1/2] mm/huge_memory: return true if split_huge_pmd_locked() split PMD to migration entry Wei Yang
2026-04-24 19:29 ` David Hildenbrand (Arm)
2026-04-26 9:19 ` Wei Yang
2026-04-28 8:24 ` David Hildenbrand (Arm)
2026-04-29 2:49 ` Wei Yang
2026-04-29 6:55 ` David Hildenbrand (Arm) [this message]
2026-05-03 0:38 ` Wei Yang
2026-05-04 12:44 ` David Hildenbrand (Arm)
2026-05-05 3:15 ` Wei Yang
2026-04-15 1:08 ` [PATCH 2/2] mm/selftests: add split_shared_pmd() Wei Yang