From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, ljs@kernel.org, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, lance.yang@linux.dev, riel@surriel.com,
vbabka@kernel.org, harry@kernel.org, jannh@google.com,
rppt@kernel.org, surenb@google.com, mhocko@suse.com,
shuah@kernel.org, linux-mm@kvack.org,
Gavin Guo <gavinguo@igalia.com>
Subject: Re: [PATCH 1/2] mm/huge_memory: return true if split_huge_pmd_locked() split PMD to migration entry
Date: Mon, 4 May 2026 14:44:43 +0200
Message-ID: <ccb23360-d257-491e-926e-2a7f7b9c98c6@kernel.org>
In-Reply-To: <20260503003818.t35q5roc7osx6se2@master>
On 5/3/26 02:38, Wei Yang wrote:
> On Wed, Apr 29, 2026 at 08:55:07AM +0200, David Hildenbrand (Arm) wrote:
>> On 4/29/26 04:49, Wei Yang wrote:
>>>
>>> Below is my proposed change:
>>>
>>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>>> index a4d52fdb3056..6e915d35ae54 100644
>>> --- a/mm/page_vma_mapped.c
>>> +++ b/mm/page_vma_mapped.c
>>> @@ -273,17 +273,21 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>>>
>>>                  if (softleaf_is_device_private(entry)) {
>>>                          pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>>> -                        return true;
>>> +                        if (pmd_same(pmde, pmdp_get_lockless(pvmw->pmd)))
>>> +                                return true;
>>
>> As we have a softleaf entry, I assume we wouldn't expect to see any
>> other bits (access/dirty) get set until we grab the lock. Verifying
>> softleaf_is_device_private() again would be cleaner, though.
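>>
>> Something like this (completely untested sketch, assuming re-reading
>> the entry under the lock via pmdp_get() is the right approach here):
>>
>>         pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>>         entry = softleaf_from_pmd(pmdp_get(pvmw->pmd));
>>         if (softleaf_is_device_private(entry))
>>                 return true;
>>         /* Entry changed under us: fall back to pte level / retry. */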
>>
>
> Got it.
>
>> But really, I do wonder if we should just have a "goto retry" back to the "pmde
>> = pmdp_get_lockless(pvmw->pmd);" instead?
>>
>
> Sounds reasonable. See below.
>
>>
>> And now I wonder why we don't have check_pmd() handling in there? :/
>>
>> Should we check for the pfn here?
>
> Thanks for pointing that out. I think you are right.
>
> After re-reading the code, more questions came to mind. I am afraid we
> need more cleanup in page_vma_mapped_walk().
>
> Below are my findings, based on my current understanding:
>
> 1. thp_migration_supported() seems unnecessary
>
>    Reaching this code means pmd_is_migration_entry() returned true,
>    which means CONFIG_ARCH_ENABLE_THP_MIGRATION is set; otherwise
>    softleaf_from_pmd() would return softleaf_mk_none(), which is not a
>    migration softleaf.
>
>    CONFIG_ARCH_ENABLE_THP_MIGRATION being set in turn means
>    CONFIG_TRANSPARENT_HUGEPAGE is set, so thp_migration_supported()
>    must return true.
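>
>    For reference, thp_migration_supported() is just a config check
>    (quoting include/linux/huge_mm.h from memory, so please double-check
>    the current definition):
>
>        static inline bool thp_migration_supported(void)
>        {
>                return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
>        }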
>
> 2. if the migration entry changed under us, we may need to handle it
>    at the pte level
>
>    In the pmd_is_migration_entry() -> !pmd_present() branch, we have:
>
>        if (!softleaf_is_migration(entry) ||
>            !check_pmd(softleaf_to_pfn(entry), pvmw))
>                return not_found(pvmw);
>        return true;
>
>    But I think we need to do this:
>
>        if (softleaf_is_migration(entry)) {
>                if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>                        return not_found(pvmw);
>                return true;
>        }
>
>    Per my understanding, if the migration entry changed under us, we
>    need to handle it at the pte level, just like in the
>    pmd_trans_huge() case. Breaking the loop and returning false seems
>    inconsistent.
>
> 3. add proper checks for device private entries
>
>    For a device private entry, we currently just grab the lock and
>    return. To be consistent with the handling of pmd_trans_huge() and
>    pmd_is_migration_entry(), we should do the following (rough sketch
>    after the list):
>
>    * re-validate that it is still a device private entry after pmd_lock()
>    * check PVMW_MIGRATION
>    * check_pmd()
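>
>    A rough sketch of what I mean (untested; "goto retry" refers to
>    point 5 below, and I am assuming a device private entry is never
>    what a PVMW_MIGRATION walk is looking for):
>
>        if (softleaf_is_device_private(entry)) {
>                pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>                entry = softleaf_from_pmd(pmdp_get(pvmw->pmd));
>                if (!softleaf_is_device_private(entry)) {
>                        /* Changed under us. */
>                        spin_unlock(pvmw->ptl);
>                        pvmw->ptl = NULL;
>                        goto retry;
>                }
>                if ((pvmw->flags & PVMW_MIGRATION) ||
>                    !check_pmd(softleaf_to_pfn(entry), pvmw))
>                        return not_found(pvmw);
>                return true;
>        }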
>
> 4. consolidate pmd entry handling
>
>    Per my understanding, there are 4 cases for pmd entry handling:
>
>    * pmd_trans_huge()
>    * pmd_is_migration_entry()
>    * pmd_is_device_private_entry()
>    * !pmd_present()
>
>    Right now we handle them with intermixed state checks, which
>    complicates the logic. And the first three share similar logic (if
>    my analysis above is correct):
>
>    * grab pmd_lock()
>    * re-validate pmde
>    * check PVMW_MIGRATION
>    * check_pmd()
>
>    Here I would like to take a bolder step: consolidate the handling
>    of these three cases.
>
>    Below is what it would look like.
>
>        pmde = pmdp_get_lockless(pvmw->pmd);
>
>        if (pmd_trans_huge(pmde) || pmd_is_valid_softleaf(pmde)) {
>                unsigned long pfn;
>                bool is_migration;
>                bool for_migration;
>
>                pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>                if (pmd_same(pmde, pmdp_get_lockless(pvmw->pmd))) {
>                        is_migration = pmd_is_migration_entry(pmde);
>                        for_migration = !!(pvmw->flags & PVMW_MIGRATION);
>
>                        if (is_migration != for_migration)
>                                return not_found(pvmw);
>
>                        if (pmd_trans_huge(pmde))
>                                pfn = pmd_pfn(pmde);
>                        else
>                                pfn = softleaf_to_pfn(softleaf_from_pmd(pmde));
>
>                        if (!check_pmd(pfn, pvmw))
>                                return not_found(pvmw);
>
>                        return true;
>                }
>                /* THP pmd was split under us: handle on pte level */
>                spin_unlock(pvmw->ptl);
>                pvmw->ptl = NULL;
>        } else if (!pmd_present(pmde)) {
>                if ((pvmw->flags & PVMW_SYNC) &&
>                    thp_vma_suitable_order(vma, pvmw->address,
>                                           PMD_ORDER) &&
>                    (pvmw->nr_pages >= HPAGE_PMD_NR))
>                        sync_with_folio_pmd_zap(mm, pvmw->pmd);
>
>                step_forward(pvmw, PMD_SIZE);
>                continue;
>        }
>
> 5. use "goto retry"
>
>    As you mentioned above: instead of handling it at the pte level, go
>    back to "pmde = pmdp_get_lockless(pvmw->pmd);" and retry. This
>    looks more reasonable to me.
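>
>    Roughly (sketch only; where exactly the retry label sits relative
>    to the surrounding loop would need care):
>
>        retry:
>        pmde = pmdp_get_lockless(pvmw->pmd);
>
>        if (pmd_trans_huge(pmde) || pmd_is_valid_softleaf(pmde)) {
>                pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>                if (!pmd_same(pmde, pmdp_get(pvmw->pmd))) {
>                        /* Changed under us: retry instead of pte level. */
>                        spin_unlock(pvmw->ptl);
>                        pvmw->ptl = NULL;
>                        goto retry;
>                }
>                /* ... the checks from point 4 above ... */
>        }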
>>
>>> +                        /* THP pmd was split under us: handle on pte level */
>>> +                        spin_unlock(pvmw->ptl);
>>> +                        pvmw->ptl = NULL;
>>
>>
>>
>>> +                } else {
>>> +                        if ((pvmw->flags & PVMW_SYNC) &&
>>> +                            thp_vma_suitable_order(vma, pvmw->address,
>>> +                                                   PMD_ORDER) &&
>>> +                            (pvmw->nr_pages >= HPAGE_PMD_NR))
>>> +                                sync_with_folio_pmd_zap(mm, pvmw->pmd);
>>> +
>>> +                        step_forward(pvmw, PMD_SIZE);
>>> +                        continue;
>>>                  }
>>> -
>>> -                if ((pvmw->flags & PVMW_SYNC) &&
>>> -                    thp_vma_suitable_order(vma, pvmw->address,
>>> -                                           PMD_ORDER) &&
>>> -                    (pvmw->nr_pages >= HPAGE_PMD_NR))
>>> -                        sync_with_folio_pmd_zap(mm, pvmw->pmd);
>>> -
>>> -                step_forward(pvmw, PMD_SIZE);
>>> -                continue;
>>>          }
>>>
>>>          if (!map_pte(pvmw, &pmde, &ptl)) {
>>>                  if (!pvmw->pte)
>>>
>>> After this, we could simplify the logic in try_to_migrate_one() as:
>>>
>>> @@ -2471,14 +2471,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>>                           * so we can detect this scenario and properly
>>>                           * abort the walk.
>>>                           */
>>> -                        if (split_huge_pmd_locked(vma, pvmw.address,
>>> -                                                  pvmw.pmd, true)) {
>>> -                                page_vma_mapped_walk_done(&pvmw);
>>> -                                break;
>>> -                        }
>>> -                        flags &= ~TTU_SPLIT_HUGE_PMD;
>>> -                        page_vma_mapped_walk_restart(&pvmw);
>>> -                        continue;
>>> +                        ret = split_huge_pmd_locked(vma, pvmw.address,
>>> +                                                    pvmw.pmd, true);
>>> +                        page_vma_mapped_walk_done(&pvmw);
>>> +                        break;
>>>                  }
>>
>> Right. But just to be clear: Let's split the page_vma_mapped_walk() validation
>> (which looks like a bugfix to me) from the other optimization.
>>
>
> Sure, maybe we can split the page_vma_mapped_walk() cleanup out into a
> separate patch set for easier review?
Yes, but I assume it could even be fixes?
--
Cheers,
David