From: Wei Yang <richard.weiyang@gmail.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>,
akpm@linux-foundation.org, ljs@kernel.org, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, lance.yang@linux.dev, riel@surriel.com,
vbabka@kernel.org, harry@kernel.org, jannh@google.com,
rppt@kernel.org, surenb@google.com, mhocko@suse.com,
shuah@kernel.org, linux-mm@kvack.org,
Gavin Guo <gavinguo@igalia.com>
Subject: Re: [PATCH 1/2] mm/huge_memory: return true if split_huge_pmd_locked() split PMD to migration entry
Date: Sun, 3 May 2026 00:38:18 +0000 [thread overview]
Message-ID: <20260503003818.t35q5roc7osx6se2@master> (raw)
In-Reply-To: <413feed4-6aab-43d9-b7e5-a9386fa79f4b@kernel.org>
On Wed, Apr 29, 2026 at 08:55:07AM +0200, David Hildenbrand (Arm) wrote:
>On 4/29/26 04:49, Wei Yang wrote:
>> On Tue, Apr 28, 2026 at 10:24:42AM +0200, David Hildenbrand (Arm) wrote:
>>> On 4/26/26 11:19, Wei Yang wrote:
>>>>
>>>> Will adjust related places.
>>>>
>>>>
>>>> Got it.
>>>>
>>>>
>>>> Here is my understanding.
>>>>
>>>> We get here when page_vma_mapped_walk() encounters a pmd entry, with three cases:
>>>>
>>>> * pmd_trans_huge()
>>>> * pmd_is_migration_entry()
>>>> * pmd_is_device_private_entry()
>>>>
>>>> For the first two cases, we grab pmd_lock() and then check that the
>>>> condition is still valid before returning. But for case 3, after grabbing
>>>> pmd_lock(), we return directly.
>>>>
>>>> This may give another thread a chance to split the device-private entry
>>>> into pte mappings, IIUC. For this case, we should restart the walk here.
>>>
>>>
>>> So what you are saying is that we should re-validate in page_vma_mapped_walk()
>>> that we indeed still have a device-private entry after grabbing the lock?
>>>
>>> That's what we do in map_pte() through pmd_same() check.
>>>
>>> Likely we should apply the same model here!
>>>
>>
>> Below is my proposed change:
>>
>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>> index a4d52fdb3056..6e915d35ae54 100644
>> --- a/mm/page_vma_mapped.c
>> +++ b/mm/page_vma_mapped.c
>> @@ -273,17 +273,21 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>>
>> if (softleaf_is_device_private(entry)) {
>> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>> - return true;
>> + if (pmd_same(pmde, pmdp_get_lockless(pvmw->pmd)))
>> + return true;
>
>As we have a softleaf entry, I assume we wouldn't expect to get any other bits
>(access/dirty) set until we grab the lock. Verifying
>softleaf_is_device_private() again would be cleaner, though.
>
Got it.
>But really, I do wonder if we should just have a "goto retry" back to the "pmde
>= pmdp_get_lockless(pvmw->pmd);" instead?
>
Sounds reasonable. See below.
>
>And now I wonder why we don't have a check_pmd() handling in there? :/
>
>Should we check for the pfn here?
Thanks for pointing that out. I think you are right.

After re-reading the code, more questions came to mind. I am afraid we need
more cleanup in page_vma_mapped_walk().

Below are my findings, based on my current understanding:
1. thp_migration_supported() seems unnecessary

   Reaching this code means pmd_is_migration_entry() returned true, which
   means CONFIG_ARCH_ENABLE_THP_MIGRATION is set; otherwise
   softleaf_from_pmd() would return softleaf_mk_none(), which is not a
   migration softleaf.

   CONFIG_ARCH_ENABLE_THP_MIGRATION being set in turn means
   CONFIG_TRANSPARENT_HUGEPAGE is set, so thp_migration_supported() must
   return true.
2. if the migration entry changes under us, we may need to handle it at pte level

   In the pmd_is_migration_entry() -> !pmd_present() branch, we have:

	if (!softleaf_is_migration(entry) ||
	    !check_pmd(softleaf_to_pfn(entry), pvmw))
		return not_found(pvmw);
	return true;

   But I think we need to do this:

	if (softleaf_is_migration(entry)) {
		if (!check_pmd(softleaf_to_pfn(entry), pvmw))
			return not_found(pvmw);
		return true;
	}

   Per my understanding, if the migration entry changes under us, we need to
   handle it at pte level, just like the pmd_trans_huge() case. Breaking the
   loop and returning false seems inconsistent.
3. add a proper check for device-private entries

   For a device-private entry, we currently just grab the lock and return.
   But following the handling of pmd_trans_huge() and
   pmd_is_migration_entry(), we should:

   * re-validate that it is still a device-private entry after pmd_lock()
   * check PVMW_MIGRATION
   * check_pmd()
4. consolidate pmd entry handling

   Per my understanding, there are 4 cases for pmd entry handling:

   * pmd_trans_huge()
   * pmd_is_migration_entry()
   * pmd_is_device_private_entry()
   * !pmd_present()

   Now we handle them with mixed state checks, which complicates the logic,
   and the first three share similar steps (if my analysis above is correct):

   * grab pmd_lock()
   * re-validate pmde
   * check PVMW_MIGRATION
   * check_pmd()

   Here I would like to take a bolder step: consolidate the handling of these
   three cases. Below is what it would look like:
	pmde = pmdp_get_lockless(pvmw->pmd);
	if (pmd_trans_huge(pmde) || pmd_is_valid_softleaf(pmde)) {
		unsigned long pfn;
		bool is_migration;
		bool for_migration;

		pvmw->ptl = pmd_lock(mm, pvmw->pmd);
		if (pmd_same(pmde, pmdp_get_lockless(pvmw->pmd))) {
			is_migration = pmd_is_migration_entry(pmde);
			for_migration = !!(pvmw->flags & PVMW_MIGRATION);
			if (is_migration != for_migration)
				return not_found(pvmw);
			if (pmd_trans_huge(pmde))
				pfn = pmd_pfn(pmde);
			else
				pfn = softleaf_to_pfn(softleaf_from_pmd(pmde));
			if (!check_pmd(pfn, pvmw))
				return not_found(pvmw);
			return true;
		}
		/* THP pmd was split under us: handle on pte level */
		spin_unlock(pvmw->ptl);
		pvmw->ptl = NULL;
	} else if (!pmd_present(pmde)) {
		if ((pvmw->flags & PVMW_SYNC) &&
		    thp_vma_suitable_order(vma, pvmw->address, PMD_ORDER) &&
		    (pvmw->nr_pages >= HPAGE_PMD_NR))
			sync_with_folio_pmd_zap(mm, pvmw->pmd);

		step_forward(pvmw, PMD_SIZE);
		continue;
	}
5. use "goto retry"

   As you mentioned above, instead of "handle on pte level", jump back to
   "pmde = pmdp_get_lockless(pvmw->pmd);" and retry. This looks more
   reasonable to me.
>
>> + /* THP pmd was split under us: handle on pte level */
>> + spin_unlock(pvmw->ptl);
>> + pvmw->ptl = NULL;
>
>
>
>> + } else {
>> + if ((pvmw->flags & PVMW_SYNC) &&
>> + thp_vma_suitable_order(vma, pvmw->address,
>> + PMD_ORDER) &&
>> + (pvmw->nr_pages >= HPAGE_PMD_NR))
>> + sync_with_folio_pmd_zap(mm, pvmw->pmd);
>> +
>> + step_forward(pvmw, PMD_SIZE);
>> + continue;
>> }
>> -
>> - if ((pvmw->flags & PVMW_SYNC) &&
>> - thp_vma_suitable_order(vma, pvmw->address,
>> - PMD_ORDER) &&
>> - (pvmw->nr_pages >= HPAGE_PMD_NR))
>> - sync_with_folio_pmd_zap(mm, pvmw->pmd);
>> -
>> - step_forward(pvmw, PMD_SIZE);
>> - continue;
>> }
>> if (!map_pte(pvmw, &pmde, &ptl)) {
>> if (!pvmw->pte)
>>
>> After this, we could simplify the logic in try_to_migrate_one() as:
>>
>> @@ -2471,14 +2471,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>> * so we can detect this scenario and properly
>> * abort the walk.
>> */
>> - if (split_huge_pmd_locked(vma, pvmw.address,
>> - pvmw.pmd, true)) {
>> - page_vma_mapped_walk_done(&pvmw);
>> - break;
>> - }
>> - flags &= ~TTU_SPLIT_HUGE_PMD;
>> - page_vma_mapped_walk_restart(&pvmw);
>> - continue;
>> + ret = split_huge_pmd_locked(vma, pvmw.address,
>> + pvmw.pmd, true);
>> + page_vma_mapped_walk_done(&pvmw);
>> + break;
>> }
>
>Right. But just to be clear: Let's split the page_vma_mapped_walk() validation
>(which looks like a bugfix to me) from the other optimization.
>
Sure, maybe we should split the page_vma_mapped_walk() cleanup out into a
separate patch series for easier review?
>--
>Cheers,
>
>David
--
Wei Yang
Help you, Help me