From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Barry Song <baohua@kernel.org>
Cc: Kairui Song <ryncsn@gmail.com>,
akpm@linux-foundation.org, linux-mm@kvack.org,
baoquan.he@linux.dev, chrisl@kernel.org, jp.kobryn@linux.dev,
kasong@tencent.com, liam@infradead.org,
linux-kernel@vger.kernel.org, ljs@kernel.org, mhocko@suse.com,
nphamcs@gmail.com, rppt@kernel.org, shakeel.butt@linux.dev,
shikemeng@huaweicloud.com, surenb@google.com,
usama.arif@linux.dev, vbabka@kernel.org, youngjun.park@lge.com
Subject: Re: [PATCH v2 4/4] mm: try to free swapcache for non-LRU folios
Date: Tue, 30 Jun 2026 07:48:32 +0200 [thread overview]
Message-ID: <1e7a712b-33f4-44c2-85ed-6333ddca421c@kernel.org> (raw)
In-Reply-To: <CAGsJ_4xh1P1yw3JogQxTS=Vu8kLEiMMh8si2YnCc9qnoKWLwjA@mail.gmail.com>
On 6/30/26 01:59, Barry Song wrote:
> On Sat, Jun 27, 2026 at 12:36 AM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>>>
>>> That's awkward, my bad, terribly sorry about this :(
>>>
>>> I sincerely apologize for the oversight and the trouble this has caused.
>>>
>>
>> All good, I was just surprised to see a previous optimization partially
>> reverted without a clear reasoning :)
>>
>> Because it should have removed the handling in should_try_to_free_swap() as well.
>>
>> It's good that we are discussing it now!
>>
>>> I haven't seen any performance regression in any workload recently
>>> though, or any correctness issue, perhaps the round trip of
>>> do_wp_page wasn't that bad. It should still catch the reuse folios,
>>> just more costly than doing things in-place.
>>
>> Right, do_wp_page() handles it, after the page was mapped. It adds some overhead,
>> but fortunately no TLB flush if we're just upgrading write permissions.
>>
>> The optimization dates back to pre PageAnonExclusive handling.
>>
>>>
>>> I think we should restore the original check first. We might also want to
>>> avoid dropping the swap cache if the folio will not be reused, which
>>> was discussed here:
>>>
>>> https://lore.kernel.org/linux-mm/CAMgjq7BDfvNXdWH0cqarsujjUn3i3tDDhDkmSg01TR4h-tDorQ@mail.gmail.com/
>>>
>>> Maybe extracting some common part into a helper can help make this
>>> cleaner.
>>>
>>>
>>> Hi Barry,
>>>
>>> The problem is more than that, the `exclusive || folio_ref_count(folio) == 1`
>>> in do_swap_page is also ineffective now.
>>
>> Exactly.
>>
>> If the roundtrip through do_wp_page() is good enough today, we can just do
>>
>> @@ -4512,7 +4516,6 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
>> static inline bool should_try_to_free_swap(struct swap_info_struct *si,
>> struct folio *folio,
>> struct vm_area_struct *vma,
>> - unsigned int extra_refs,
>> unsigned int fault_flags)
>> {
>> if (!folio_test_swapcache(folio))
>> @@ -4528,14 +4531,7 @@ static inline bool should_try_to_free_swap(struct swap_info_struct *si,
>> if (mem_cgroup_swap_full(folio) || (vma->vm_flags & VM_LOCKED) ||
>> folio_test_mlocked(folio))
>> return true;
>> - /*
>> - * If we want to map a page that's in the swapcache writable, we
>> - * have to detect via the refcount if we're really the exclusive
>> - * user. Try freeing the swapcache to get rid of the swapcache
>> - * reference only in case it's likely that we'll be the exclusive user.
>> - */
>> - return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
>> - folio_ref_count(folio) == (extra_refs + folio_nr_pages(folio));
>> + return false;
>> }
>>
>> static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
>> @@ -5095,7 +5091,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>> * Do it after mapping, so raced page faults will likely see the folio
>> * in swap cache and wait on the folio lock.
>> */
>> - if (should_try_to_free_swap(si, folio, vma, nr_pages, vmf->flags))
>> + if (should_try_to_free_swap(si, folio, vma, vmf->flags))
>> folio_free_swap(folio);
>>
>> folio_unlock(folio);
>>
>> But then, one question is whether we'd actually want to try removing the swapcache when
>> we mapped the page writable (iow: exclusive)?
>
> I guess this question comes from my earlier commit c18160dba5ff
> ("mm: swap: reuse exclusive folio directly instead of wp page faults"),
> where the folio is reused even for READ faults. In that case, we
> would miss do_wp_page(), which could later drop the swapcache?
> Also, again, nobody has reported any regression for this.
>
> Holding swapcache for a clean folio for non-sync swap I/O has the
> benefit of avoiding a potential pageout(). Now, reuse even for read
> faults and always dropping swapcache seems to somewhat defeat that
> benefit. On the other hand, we always drop swapcache for sync I/O
> to avoid the copy in zRAM or elsewhere consuming memory, so it seems
> safe enough to always enable reuse in do_swap_page() for sync I/O.
Right, I meant during write faults, when a write is expected. See below.
>
>>
>>
>> @@ -4512,7 +4516,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
>> static inline bool should_try_to_free_swap(struct swap_info_struct *si,
>> struct folio *folio,
>> struct vm_area_struct *vma,
>> - unsigned int extra_refs,
>> + bool exclusive,
>> unsigned int fault_flags)
>> {
>> if (!folio_test_swapcache(folio))
>> @@ -4529,13 +4533,11 @@ static inline bool should_try_to_free_swap(struct swap_info_struct *si,
>> folio_test_mlocked(folio))
>> return true;
>> /*
>> - * If we want to map a page that's in the swapcache writable, we
>> - * have to detect via the refcount if we're really the exclusive
>> - * user. Try freeing the swapcache to get rid of the swapcache
>> - * reference only in case it's likely that we'll be the exclusive user.
>> + * We have an exclusive page that was mapped writable or will soon
>> + * be mapped writable (as we are in a write fault). Let's just try
>> + * to reclaim swap immediately.
>> */
>> - return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
>> - folio_ref_count(folio) == (extra_refs + folio_nr_pages(folio));
>> + return (fault_flags & FAULT_FLAG_WRITE) && exclusive;
>
> I assume you mean (fault_flags & FAULT_FLAG_WRITE) || exclusive, or
No, I meant "we are serving a write fault (write is definitely going to happen)
and we are definitely reusing the page (exclusive).
> we can just remove (fault_flags & FAULT_FLAG_WRITE) and use
> "exclusive" instead since we are always using do_wp_page() now,
If you drop the "FAULT_FLAG_WRITE", you'd remove clean pages (that will likely
stay clean as no write fault) from the swapcache, As you correctly say above,
can avoid a pageout(), so I think we should keep that.
> and FAULT_FLAG_WRITE in fault_flags could have been cleared
> by the reuse of do_swap_page().
Ah, yes, that existing handling is nasty. We should look into not messing with
fault flags. Something like the following
diff --git a/mm/memory.c b/mm/memory.c
index ff338c2abe923..b0d8f3674525b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5052,10 +5052,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
if ((vma->vm_flags & VM_WRITE) && !userfaultfd_pte_wp(vma, pte) &&
!pte_needs_soft_dirty_wp(vma, pte)) {
pte = pte_mkwrite(pte, vma);
- if (vmf->flags & FAULT_FLAG_WRITE) {
+ if (vmf->flags & FAULT_FLAG_WRITE)
pte = pte_mkdirty(pte);
- vmf->flags &= ~FAULT_FLAG_WRITE;
- }
}
rmap_flags |= RMAP_EXCLUSIVE;
}
@@ -5112,7 +5110,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
folio_put(swapcache);
}
- if (vmf->flags & FAULT_FLAG_WRITE) {
+ if ((vmf->flags & FAULT_FLAG_WRITE) && !pte_write(pte)) {
ret |= do_wp_page(vmf);
if (ret & VM_FAULT_ERROR)
ret &= VM_FAULT_ERROR;
--
Cheers,
David
next prev parent reply other threads:[~2026-06-30 5:48 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-23 23:16 [PATCH v2 0/4] mm: drop redundant lru_add_drain in anon folio reuse paths Barry Song (Xiaomi)
2026-06-23 23:16 ` [PATCH v2 1/4] mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio() Barry Song (Xiaomi)
2026-06-24 10:14 ` Kairui Song
2026-06-24 15:02 ` David Hildenbrand (Arm)
2026-06-24 21:04 ` Barry Song
2026-06-26 16:25 ` David Hildenbrand (Arm)
2026-06-27 2:44 ` Shakeel Butt
2026-06-27 7:20 ` Barry Song
2026-06-29 7:52 ` David Hildenbrand (Arm)
2026-06-29 23:16 ` Barry Song
2026-06-30 5:38 ` David Hildenbrand (Arm)
2026-06-23 23:16 ` [PATCH v2 2/4] mm: drop stale folio_ref_count()==1 check in do_swap_page reuse logic Barry Song (Xiaomi)
2026-06-24 15:07 ` David Hildenbrand (Arm)
2026-06-24 21:29 ` Barry Song
2026-06-23 23:16 ` [PATCH v2 3/4] mm: entirely remove lru_add_drain in do_swap_page Barry Song (Xiaomi)
2026-06-24 10:16 ` Kairui Song
2026-06-24 15:10 ` David Hildenbrand (Arm)
2026-06-23 23:16 ` [PATCH v2 4/4] mm: try to free swapcache for non-LRU folios Barry Song (Xiaomi)
2026-06-24 15:20 ` David Hildenbrand (Arm)
2026-06-24 21:14 ` Barry Song
2026-06-25 14:40 ` Kairui Song
2026-06-26 16:35 ` David Hildenbrand (Arm)
2026-06-29 23:59 ` Barry Song
2026-06-30 5:48 ` David Hildenbrand (Arm) [this message]
2026-06-30 22:36 ` Barry Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1e7a712b-33f4-44c2-85ed-6333ddca421c@kernel.org \
--to=david@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=baoquan.he@linux.dev \
--cc=chrisl@kernel.org \
--cc=jp.kobryn@linux.dev \
--cc=kasong@tencent.com \
--cc=liam@infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ljs@kernel.org \
--cc=mhocko@suse.com \
--cc=nphamcs@gmail.com \
--cc=rppt@kernel.org \
--cc=ryncsn@gmail.com \
--cc=shakeel.butt@linux.dev \
--cc=shikemeng@huaweicloud.com \
--cc=surenb@google.com \
--cc=usama.arif@linux.dev \
--cc=vbabka@kernel.org \
--cc=youngjun.park@lge.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox