Re: [PATCH v2 4/4] mm: try to free swapcache for non-LRU folios

Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Barry Song <baohua@kernel.org>
Cc: Kairui Song <ryncsn@gmail.com>,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	baoquan.he@linux.dev, chrisl@kernel.org, jp.kobryn@linux.dev,
	kasong@tencent.com, liam@infradead.org,
	linux-kernel@vger.kernel.org, ljs@kernel.org, mhocko@suse.com,
	nphamcs@gmail.com, rppt@kernel.org, shakeel.butt@linux.dev,
	shikemeng@huaweicloud.com, surenb@google.com,
	usama.arif@linux.dev, vbabka@kernel.org, youngjun.park@lge.com
Subject: Re: [PATCH v2 4/4] mm: try to free swapcache for non-LRU folios
Date: Tue, 30 Jun 2026 07:48:32 +0200	[thread overview]
Message-ID: <1e7a712b-33f4-44c2-85ed-6333ddca421c@kernel.org> (raw)
In-Reply-To: <CAGsJ_4xh1P1yw3JogQxTS=Vu8kLEiMMh8si2YnCc9qnoKWLwjA@mail.gmail.com>

On 6/30/26 01:59, Barry Song wrote:
> On Sat, Jun 27, 2026 at 12:36 AM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>>>
>>> That's awkward, my bad, terribly sorry about this :(
>>>
>>> I sincerely apologize for the oversight and the trouble this has caused.
>>>
>>
>> All good, I was just surprised to see a previous optimization partially
>> reverted without a clear reasoning :)
>>
>> Because it should have removed the handling in should_try_to_free_swap() as well.
>>
>> It's good that we are discussing it now!
>>
>>> I haven't seen any performance regression in any workload recently
>>> though, or any correctness issue, perhaps the round trip of
>>> do_wp_page wasn't that bad. It should still catch the reuse folios,
>>> just more costly than doing things in-place.
>>
>> Right, do_wp_page() handles it, after the page was mapped. It adds some overhead,
>> but fortunately no TLB flush if we're just upgrading write permissions.
>>
>> The optimization dates back to pre PageAnonExclusive handling.
>>
>>>
>>> I think we should restore the original check first. We might also want to
>>> avoid dropping the swap cache if the folio will not be reused, which
>>> was discussed here:
>>>
>>> https://lore.kernel.org/linux-mm/CAMgjq7BDfvNXdWH0cqarsujjUn3i3tDDhDkmSg01TR4h-tDorQ@mail.gmail.com/
>>>
>>> Maybe extracting some common part into a helper can help make this
>>> cleaner.
>>>
>>>
>>> Hi Barry,
>>>
>>> The problem is more than that, the `exclusive || folio_ref_count(folio) == 1`
>>> in do_swap_page is also ineffective now.
>>
>> Exactly.
>>
>> If the roundtrip through do_wp_page() is good enough today, we can just do
>>
>> @@ -4512,7 +4516,6 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
>>  static inline bool should_try_to_free_swap(struct swap_info_struct *si,
>>                                            struct folio *folio,
>>                                            struct vm_area_struct *vma,
>> -                                          unsigned int extra_refs,
>>                                            unsigned int fault_flags)
>>  {
>>         if (!folio_test_swapcache(folio))
>> @@ -4528,14 +4531,7 @@ static inline bool should_try_to_free_swap(struct swap_info_struct *si,
>>         if (mem_cgroup_swap_full(folio) || (vma->vm_flags & VM_LOCKED) ||
>>             folio_test_mlocked(folio))
>>                 return true;
>> -       /*
>> -        * If we want to map a page that's in the swapcache writable, we
>> -        * have to detect via the refcount if we're really the exclusive
>> -        * user. Try freeing the swapcache to get rid of the swapcache
>> -        * reference only in case it's likely that we'll be the exclusive user.
>> -        */
>> -       return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
>> -               folio_ref_count(folio) == (extra_refs + folio_nr_pages(folio));
>> +       return false;
>>  }
>>
>>  static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
>> @@ -5095,7 +5091,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>          * Do it after mapping, so raced page faults will likely see the folio
>>          * in swap cache and wait on the folio lock.
>>          */
>> -       if (should_try_to_free_swap(si, folio, vma, nr_pages, vmf->flags))
>> +       if (should_try_to_free_swap(si, folio, vma, vmf->flags))
>>                 folio_free_swap(folio);
>>
>>         folio_unlock(folio);
>>
>> But then, one question is whether we'd actually want to try removing the swapcache when
>> we mapped the page writable (iow: exclusive)?
> 
> I guess this question comes from my earlier commit c18160dba5ff
> ("mm: swap: reuse exclusive folio directly instead of wp page faults"),
> where the folio is reused even for READ faults. In that case, we
> would miss do_wp_page(), which could later drop the swapcache?
> Also, again, nobody has reported any regression for this.
> 
> Holding swapcache for a clean folio for non-sync swap I/O has the
> benefit of avoiding a potential pageout(). Now, reuse even for read
> faults and always dropping swapcache seems to somewhat defeat that
> benefit. On the other hand, we always drop swapcache for sync I/O
> to avoid the copy in zRAM or elsewhere consuming memory, so it seems
> safe enough to always enable reuse in do_swap_page() for sync I/O.

Right, I meant during write faults, when a write is expected. See below.

> 
>>
>>
>> @@ -4512,7 +4516,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
>>  static inline bool should_try_to_free_swap(struct swap_info_struct *si,
>>                                            struct folio *folio,
>>                                            struct vm_area_struct *vma,
>> -                                          unsigned int extra_refs,
>> +                                          bool exclusive,
>>                                            unsigned int fault_flags)
>>  {
>>         if (!folio_test_swapcache(folio))
>> @@ -4529,13 +4533,11 @@ static inline bool should_try_to_free_swap(struct swap_info_struct *si,
>>             folio_test_mlocked(folio))
>>                 return true;
>>         /*
>> -        * If we want to map a page that's in the swapcache writable, we
>> -        * have to detect via the refcount if we're really the exclusive
>> -        * user. Try freeing the swapcache to get rid of the swapcache
>> -        * reference only in case it's likely that we'll be the exclusive user.
>> +        * We have an exclusive page that was mapped writable or will soon
>> +        * be mapped writable (as we are in a write fault). Let's just try
>> +        * to reclaim swap immediately.
>>          */
>> -       return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
>> -               folio_ref_count(folio) == (extra_refs + folio_nr_pages(folio));
>> +       return (fault_flags & FAULT_FLAG_WRITE) && exclusive;
> 
> I assume you mean (fault_flags & FAULT_FLAG_WRITE) || exclusive, or

No, I meant "we are serving a write fault (write is definitely going to happen)
and we are definitely reusing the page (exclusive).

> we can just remove (fault_flags & FAULT_FLAG_WRITE) and use
> "exclusive" instead since we are always using do_wp_page() now,

If you drop the "FAULT_FLAG_WRITE", you'd remove clean pages (that will likely
stay clean as no write fault) from the swapcache, As you correctly say above,
can avoid a pageout(), so I think we should keep that.

> and FAULT_FLAG_WRITE in fault_flags could have been cleared
> by the reuse of do_swap_page().

Ah, yes, that existing handling is nasty. We should look into not messing with
fault flags. Something like the following

diff --git a/mm/memory.c b/mm/memory.c
index ff338c2abe923..b0d8f3674525b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5052,10 +5052,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
                if ((vma->vm_flags & VM_WRITE) && !userfaultfd_pte_wp(vma, pte) &&
                    !pte_needs_soft_dirty_wp(vma, pte)) {
                        pte = pte_mkwrite(pte, vma);
-                       if (vmf->flags & FAULT_FLAG_WRITE) {
+                       if (vmf->flags & FAULT_FLAG_WRITE)
                                pte = pte_mkdirty(pte);
-                               vmf->flags &= ~FAULT_FLAG_WRITE;
-                       }
                }
                rmap_flags |= RMAP_EXCLUSIVE;
        }
@@ -5112,7 +5110,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
                folio_put(swapcache);
        }

-       if (vmf->flags & FAULT_FLAG_WRITE) {
+       if ((vmf->flags & FAULT_FLAG_WRITE) && !pte_write(pte)) {
                ret |= do_wp_page(vmf);
                if (ret & VM_FAULT_ERROR)
                        ret &= VM_FAULT_ERROR;


-- 
Cheers,

David

next prev parent reply	other threads:[~2026-06-30  5:48 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-23 23:16 [PATCH v2 0/4] mm: drop redundant lru_add_drain in anon folio reuse paths Barry Song (Xiaomi)
2026-06-23 23:16 ` [PATCH v2 1/4] mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio() Barry Song (Xiaomi)
2026-06-24 10:14   ` Kairui Song
2026-06-24 15:02   ` David Hildenbrand (Arm)
2026-06-24 21:04     ` Barry Song
2026-06-26 16:25       ` David Hildenbrand (Arm)
2026-06-27  2:44         ` Shakeel Butt
2026-06-27  7:20           ` Barry Song
2026-06-29  7:52             ` David Hildenbrand (Arm)
2026-06-29 23:16               ` Barry Song
2026-06-30  5:38                 ` David Hildenbrand (Arm)
2026-06-23 23:16 ` [PATCH v2 2/4] mm: drop stale folio_ref_count()==1 check in do_swap_page reuse logic Barry Song (Xiaomi)
2026-06-24 15:07   ` David Hildenbrand (Arm)
2026-06-24 21:29     ` Barry Song
2026-06-23 23:16 ` [PATCH v2 3/4] mm: entirely remove lru_add_drain in do_swap_page Barry Song (Xiaomi)
2026-06-24 10:16   ` Kairui Song
2026-06-24 15:10   ` David Hildenbrand (Arm)
2026-06-23 23:16 ` [PATCH v2 4/4] mm: try to free swapcache for non-LRU folios Barry Song (Xiaomi)
2026-06-24 15:20   ` David Hildenbrand (Arm)
2026-06-24 21:14     ` Barry Song
2026-06-25 14:40       ` Kairui Song
2026-06-26 16:35         ` David Hildenbrand (Arm)
2026-06-29 23:59           ` Barry Song
2026-06-30  5:48             ` David Hildenbrand (Arm) [this message]
2026-06-30 22:36               ` Barry Song

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:ff338c2abe92 dfblob:b0d8f3674525 )
 OR (
bs:"Re: [PATCH v2 4/4] mm: try to free swapcache for non-LRU folios" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1e7a712b-33f4-44c2-85ed-6333ddca421c@kernel.org \
    --to=david@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=baoquan.he@linux.dev \
    --cc=chrisl@kernel.org \
    --cc=jp.kobryn@linux.dev \
    --cc=kasong@tencent.com \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=mhocko@suse.com \
    --cc=nphamcs@gmail.com \
    --cc=rppt@kernel.org \
    --cc=ryncsn@gmail.com \
    --cc=shakeel.butt@linux.dev \
    --cc=shikemeng@huaweicloud.com \
    --cc=surenb@google.com \
    --cc=usama.arif@linux.dev \
    --cc=vbabka@kernel.org \
    --cc=youngjun.park@lge.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox