From: Lance Yang <lance.yang@linux.dev>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: akpm@linux-foundation.org, david@redhat.com,
Liam.Howlett@oracle.com, baohua@kernel.org,
baolin.wang@linux.alibaba.com, dev.jain@arm.com,
hughd@google.com, ioworker0@gmail.com, kirill@shutemov.name,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
mpenttil@redhat.com, npache@redhat.com, ryan.roberts@arm.com,
ziy@nvidia.com, richard.weiyang@gmail.com
Subject: Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
Date: Tue, 14 Oct 2025 23:01:13 +0800 [thread overview]
Message-ID: <d2e5b099-94bd-444d-9946-182807443539@linux.dev> (raw)
In-Reply-To: <f7392f43-b8f1-4e6a-b9c8-25ad8a47f82c@lucifer.local>
On 2025/10/14 22:39, Lorenzo Stoakes wrote:
> On Tue, Oct 14, 2025 at 10:26:20PM +0800, Lance Yang wrote:
>>
>>
>> On 2025/10/14 19:08, Lorenzo Stoakes wrote:
>>> On Wed, Oct 08, 2025 at 11:26:57AM +0800, Lance Yang wrote:
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index abe54f0043c7..bec3e268dc76 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
>>>> if (!is_swap_pte(vmf.orig_pte))
>>>> continue;
>>>>
>>>> + if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
>>>> + result = SCAN_PTE_NON_PRESENT;
>>>> + goto out;
>>>> + }
>>>
>>> OK seems in line with what we were discussing before...
>>
>> Yep. That's the idea :)
>>
>>>
>>>> +
>>>> vmf.pte = pte;
>>>> vmf.ptl = ptl;
>>>> ret = do_swap_page(&vmf);
>>>> @@ -1281,7 +1286,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>>> for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>>> _pte++, addr += PAGE_SIZE) {
>>>> pte_t pteval = ptep_get(_pte);
>>>> - if (is_swap_pte(pteval)) {
>>>> + if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>>>> + ++none_or_zero;
>>>> + if (!userfaultfd_armed(vma) &&
>>>> + (!cc->is_khugepaged ||
>>>> + none_or_zero <= khugepaged_max_ptes_none)) {
>>>> + continue;
>>>> + } else {
>>>> + result = SCAN_EXCEED_NONE_PTE;
>>>> + count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>>>> + goto out_unmap;
>>>> + }
>>>> + } else if (!pte_present(pteval)) {
>>>> + if (non_swap_entry(pte_to_swp_entry(pteval))) {
>>>
>>
>> Thanks for pointing that out!
>
> You've deleted what I've said here and also not indicated whether you'll do what
> I asked :)
>
> Please be clearer...
Oh, I didn't delete your comment at all ... It's just below ...
>
>>>>> Hm but can't this be pte_protnone() at this stage (or something
else)? And then <-- Here!
>>
>> Yeah. The funny thing is, a protnone pte cannot actually get here, IIUC.
>>
>> ```
>> static inline int pte_protnone(pte_t pte)
>> {
>> return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
>> == _PAGE_PROTNONE;
>> }
>>
>> static inline int pte_present(pte_t a)
>> {
>> return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
>> }
>> ```
>>
>> On x86, pte_present() returns true for a protnone pte. And I'd assume
>> other archs behave similarly ...
>
> This was one example, we may make changes in the future that result in entries
> that are non-present but also non-swap.
>
> I don't see the point in eliminating this check based on an implicit, open-coded
> assumption that this can never be the case, this is just asking for trouble.
>
>>
>>> we're just assuming pte_to_swp_entry() is operating on a swap entry when it in
>>> fact might not be?
>>>
>>> Couldn't we end up with false positives here?
>>
>> Emm, I think we're good here and the code is doing the right thing.
>
> I mean sorry but just - NO - to doing swap operations based on open-coded checks
> that you implicitly feel must imply a swap entry.
>
> This makes the code a lot more confusing, it opens us up to accidentally
> breaking things in future and has little to no benefit, I don't see why we're
> doing it.
>
> I don't think every little 'aha X must imply Y so just eliminate Z' idea need be
> implemented, this feels like a sort of 'mathematical reduction of code ignoring
> all other factors'.
Understood. Changing !pte_present() to is_swap_pte() will resolve all your
concerns, right?
```
if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
[...]
} else if (is_swap_pte(pteval)) { <-- Here
if (non_swap_entry(pte_to_swp_entry(pteval))) {
[...]
}
[...]}
```
>
>>
>>>
>>>> + result = SCAN_PTE_NON_PRESENT;
>>>> + goto out_unmap;
>>>> + }
>>>> +
>>>> ++unmapped;
>>>> if (!cc->is_khugepaged ||
>>>> unmapped <= khugepaged_max_ptes_swap) {
>>>> @@ -1290,7 +1311,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>>> * enabled swap entries. Please see
>>>> * comment below for pte_uffd_wp().
>>>> */
>>>> - if (pte_swp_uffd_wp_any(pteval)) {
>>>> + if (pte_swp_uffd_wp(pteval)) {
>>>
>>> Again you're assuming it's a swap entry but you're not asserting this is a swap
>>> entry in this branch?
>>
>> As we discussed above, the non_swap_entry() check has already kicked out
>> anything that isn't a true swap entry, right?
>
> This is a different function?
>
> Actually I'm mistaken here I think - you check in the code above:
>
> if (is_swap_pte(pteval)) {
> ...
> }
>
> So this is fine, please ignore sorry :)
No worries at all, thanks for double-checking and clarifying!
>
>>
>>>
>>> Also an aside - I hate, hate, hate how this uffd wp stuff has infiltrated all
>>> kinds of open-coded stuff. It's so gross (not your fault, just a general
>>> comment...)
>>
>> Haha, tell me about it. No argument from me there ;)
>
> :)
>
>>
>> Thanks,
>> Lance
>
> Cheers, Lorenzo
next prev parent reply other threads:[~2025-10-14 15:02 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-08 3:26 [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries Lance Yang
2025-10-08 8:18 ` David Hildenbrand
2025-10-10 3:44 ` Barry Song
2025-10-10 15:21 ` Zi Yan
2025-10-10 15:34 ` Lance Yang
2025-10-10 15:35 ` Zi Yan
2025-10-14 11:08 ` Lorenzo Stoakes
2025-10-14 14:26 ` Lance Yang
2025-10-14 14:32 ` David Hildenbrand
2025-10-14 14:37 ` Lance Yang
2025-10-14 14:43 ` Lorenzo Stoakes
2025-10-14 14:39 ` Lorenzo Stoakes
2025-10-14 15:01 ` Lance Yang [this message]
2025-10-14 15:12 ` David Hildenbrand
2025-10-14 15:33 ` David Hildenbrand
2025-10-14 16:10 ` Lorenzo Stoakes
2025-10-14 15:41 ` Lorenzo Stoakes
2025-10-14 15:48 ` David Hildenbrand
2025-10-14 15:57 ` Lorenzo Stoakes
2025-10-14 15:11 ` David Hildenbrand
2025-10-14 15:52 ` Lorenzo Stoakes
2025-10-14 16:09 ` Lance Yang
2025-10-14 16:27 ` Lorenzo Stoakes
2025-10-15 1:52 ` Lance Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d2e5b099-94bd-444d-9946-182807443539@linux.dev \
--to=lance.yang@linux.dev \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@redhat.com \
--cc=dev.jain@arm.com \
--cc=hughd@google.com \
--cc=ioworker0@gmail.com \
--cc=kirill@shutemov.name \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mpenttil@redhat.com \
--cc=npache@redhat.com \
--cc=richard.weiyang@gmail.com \
--cc=ryan.roberts@arm.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.