Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Barry Song <baohua@kernel.org>, Shakeel Butt <shakeel.butt@linux.dev>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	baoquan.he@linux.dev, chrisl@kernel.org, jp.kobryn@linux.dev,
	kasong@tencent.com, liam@infradead.org,
	linux-kernel@vger.kernel.org, ljs@kernel.org, mhocko@suse.com,
	nphamcs@gmail.com, rppt@kernel.org, shikemeng@huaweicloud.com,
	surenb@google.com, usama.arif@linux.dev, vbabka@kernel.org,
	youngjun.park@lge.com
Subject: Re: [PATCH v2 1/4] mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio()
Date: Mon, 29 Jun 2026 09:52:38 +0200	[thread overview]
Message-ID: <3903def5-e823-4fe3-9784-b6bd63470516@kernel.org> (raw)
In-Reply-To: <CAGsJ_4xbYOVvonZg6d8eeuKcY9SUic3zXLbWUvxJrbf=U9sQnw@mail.gmail.com>

On 6/27/26 09:20, Barry Song wrote:
> On Sat, Jun 27, 2026 at 10:44 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>>
>> On Fri, Jun 26, 2026 at 06:25:48PM +0200, David Hildenbrand (Arm) wrote:
>>>
>>> is this with this patch only?
> 
> Right. wp_reuse_skipped_drain accounts for the behavior
> introduced by this patch, while do_swap_skipped_drain comes
> from patch 3/3.
> 
>>>
>>>
>>> This is all data that belongs into this patch description, not hidden
>>> somewhere on the internet :)
> 
> Agreed. Sorry, I was being a bit lazy.
> 
>>
>> I asked the same i.e. to include this information in the commit message [1].
>>
>> [1] https://lore.kernel.org/linux-mm/ajK4N2zK3sPZFQuf@linux.dev/
> 
> Sorry, I missed this one. I'll add it in v3.
> 
>>
>>>
>>>
>>> And why do we care about local draining? This is not a drain-all.
>>
>> Let me take a stab at it. Local draining can potentially make the lru cache's
>> batching ineffective. The whole reason for lru caches is to batch the lru
>> operations to keep the lru lock contention low but if we keep draining
>> prematurely we will continue to make lru lock contention worse. This is
>> particularly bad for workloads which chrun a lot of LRU pages through
>> allocation, free and/or reclaim.
> 
> I actually asked the same question myself in the RFC here[1]:
> 
> "On the other hand, the folio may be sitting in the lru_cache of
> another CPU, which the current drain cannot flush. As things stand
> today, the drain only succeeds if the folio happens to be queued on
> the current CPU's lru_cache, giving it roughly a 1/nr_cpus chance of
> working. drain_all would clearly be too expensive to avoid.
> So another possibility is to drop this drain as well. The folio can
> be released later, at the cost of missing some opportunities for
> reuse."
> 
> After thinking about it for a couple of days, my gut feeling is
> that keeping it might increase the chances of reuse in at least
> two cases:
> 1. It could be a newly allocated folio in do_swap_page(), and
> do_swap_page() may subsequently call do_wp_page(). In
> that case, task migration is less likely to have occurred.
> 2. On systems with fewer CPUs (and typically less memory), the
> chance of having the folio in the same CPU's lru_cache,
> which is roughly 1 / nr_cpus, may still be reasonably high.
> 
> I'm not entirely sure about this, so I chose a relatively
> conservative approach: making an obviously correct improvement
> without introducing dramatic changes. Another reason is that it's
> also hard to evaluate the impact of removing the local drain
> across all systems and workloads.
> 
> Shakeel, David, what are your thoughts on this? Should we go
> with a code refinement in v3 as David suggested, or completely
> remove this drain?

When we added that code, we didn't have PageAnonExclusive yet. I documented that in:

commit d4c470970d45c863fafc757521a82be2f80b1232
Author: David Hildenbrand <david@kernel.org>
Date:   Thu Mar 24 18:13:34 2022 -0700

    mm: optimize do_wp_page() for fresh pages in local LRU pagevecs

    For example, if a page just got swapped in via a read fault, the LRU
    pagevecs might still hold a reference to the page.  If we trigger a write
    fault on such a page, the additional reference from the LRU pagevecs will
    prohibit reusing the page.

    Let's conditionally drain the local LRU pagevecs when we stumble over a
    !PageLRU() page.  We cannot easily drain remote LRU pagevecs and it might
    not be desirable performance-wise.  Consequently, this will only avoid
    copying in some cases.

    Add a simple "page_count(page) > 3" check first but keep the
    "page_count(page) > 1 + PageSwapCache(page)" check in place, as we want to
    minimize cases where we remove a page from the swapcache but won't be able
    to reuse it, for example, because another process has it mapped R/O, to
    not affect reclaim.

    We cannot easily handle the following cases and we will always have to
    copy:

    (1) The page is referenced in the LRU pagevecs of other CPUs. We really
        would have to drain the LRU pagevecs of all CPUs -- most probably
        copying is much cheaper.

    (2) The page is already PageLRU() but is getting moved between LRU
        lists, for example, for activation (e.g., mark_page_accessed()),
        deactivation (MADV_COLD), or lazyfree (MADV_FREE). We'd have to
        drain mostly unconditionally, which might be bad performance-wise.
        Most probably this won't happen too often in practice.

    Note that there are other reasons why an anon page might temporarily not
    be PageLRU(): for example, compaction and migration have to isolate LRU
    pages from the LRU lists first (isolate_lru_page()), moving them to
    temporary local lists and clearing PageLRU() and holding an additional
    reference on the page.  In that case, we'll always copy.

    This change seems to be fairly effective with the reproducer [1] shared by
    Nadav, as long as writeback is done synchronously, for example, using
    zram.  However, with asynchronous writeback, we'll usually fail to free
    the swapcache because the page is still under writeback: something we
    cannot easily optimize for, and maybe it's not really relevant in
    practice.

    [1] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com

With PAE, this case would now be handled differently in the simple alloc+swapin
scenario. Throw in a fork()+execve() in between and the optimization would still
matter.

So yeah, if Nadav's reproducer works even without the drain (which I assume), we
could indeed think about just removing this optimization here.

-- 
Cheers,

David


  reply	other threads:[~2026-06-29  7:52 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-23 23:16 [PATCH v2 0/4] mm: drop redundant lru_add_drain in anon folio reuse paths Barry Song (Xiaomi)
2026-06-23 23:16 ` [PATCH v2 1/4] mm: avoid unnecessary lru drain for wp_can_reuse_anon_folio() Barry Song (Xiaomi)
2026-06-24 10:14   ` Kairui Song
2026-06-24 15:02   ` David Hildenbrand (Arm)
2026-06-24 21:04     ` Barry Song
2026-06-26 16:25       ` David Hildenbrand (Arm)
2026-06-27  2:44         ` Shakeel Butt
2026-06-27  7:20           ` Barry Song
2026-06-29  7:52             ` David Hildenbrand (Arm) [this message]
2026-06-23 23:16 ` [PATCH v2 2/4] mm: drop stale folio_ref_count()==1 check in do_swap_page reuse logic Barry Song (Xiaomi)
2026-06-24 15:07   ` David Hildenbrand (Arm)
2026-06-24 21:29     ` Barry Song
2026-06-23 23:16 ` [PATCH v2 3/4] mm: entirely remove lru_add_drain in do_swap_page Barry Song (Xiaomi)
2026-06-24 10:16   ` Kairui Song
2026-06-24 15:10   ` David Hildenbrand (Arm)
2026-06-23 23:16 ` [PATCH v2 4/4] mm: try to free swapcache for non-LRU folios Barry Song (Xiaomi)
2026-06-24 15:20   ` David Hildenbrand (Arm)
2026-06-24 21:14     ` Barry Song
2026-06-25 14:40       ` Kairui Song
2026-06-26 16:35         ` David Hildenbrand (Arm)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3903def5-e823-4fe3-9784-b6bd63470516@kernel.org \
    --to=david@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=baoquan.he@linux.dev \
    --cc=chrisl@kernel.org \
    --cc=jp.kobryn@linux.dev \
    --cc=kasong@tencent.com \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=mhocko@suse.com \
    --cc=nphamcs@gmail.com \
    --cc=rppt@kernel.org \
    --cc=shakeel.butt@linux.dev \
    --cc=shikemeng@huaweicloud.com \
    --cc=surenb@google.com \
    --cc=usama.arif@linux.dev \
    --cc=vbabka@kernel.org \
    --cc=youngjun.park@lge.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox