From: Michal Hocko <mhocko@suse.com>
To: "JP Kobryn (Meta)" <jp.kobryn@linux.dev>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, vbabka@kernel.org,
willy@infradead.org, hannes@cmpxchg.org, shakeel.butt@linux.dev,
riel@surriel.com, chrisl@kernel.org, kasong@tencent.com,
shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com,
baohua@kernel.org, youngjun.park@lge.com, qi.zheng@linux.dev,
axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH] mm/lruvec: preemptively free dead folios during lru_add drain
Date: Fri, 24 Apr 2026 10:32:27 +0200
Message-ID: <aesqm1S6otma3c11@tiehlicka>
In-Reply-To: <20260423164307.29805-1-jp.kobryn@linux.dev>

On Thu 23-04-26 09:43:07, JP Kobryn (Meta) wrote:
> Of all observable lruvec lock contention in our fleet, we find that ~24%
> occurs when dead folios are present in lru_add batches at drain time. This
> is wasteful in the sense that the folio is added to the LRU just to be
> immediately removed via folios_put_refs(), incurring two unnecessary lock
> acquisitions.
>
> Eliminate this overhead by preemptively cleaning up dead folios before they
> make it into the LRU. Use folio_ref_freeze() to filter folios whose only
> remaining refcount is the batch ref. When dead folios are found, move them
> off the add batch and onto a temporary batch to be freed.
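
[Editorial aside: a minimal userspace model of that filtering step, for
illustration only. The struct and helper names below (fake_folio,
ref_freeze, drain_batch) are invented; the real kernel primitives are
folio_ref_freeze() and folio_batch_add(). Builds with gcc -std=c11.]

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_folio {
	atomic_int refcount;	/* 1 == only the drain batch holds a ref */
};

/* Like folio_ref_freeze(): succeed only if refcount == expected. */
static bool ref_freeze(struct fake_folio *f, int expected)
{
	return atomic_compare_exchange_strong(&f->refcount, &expected, 0);
}

static void drain_batch(struct fake_folio **batch, int n)
{
	for (int i = 0; i < n; i++) {
		if (ref_freeze(batch[i], 1)) {
			/* Dead: divert for freeing, never touch the LRU. */
			batch[i] = NULL;
			printf("folio %d: freed, LRU bypassed\n", i);
			continue;
		}
		/* Live: this is where the lruvec lock would be taken. */
		printf("folio %d: added to the LRU\n", i);
	}
}

int main(void)
{
	struct fake_folio dead = { 1 }, live = { 2 };
	struct fake_folio *batch[] = { &dead, &live };

	drain_batch(batch, 2);
	return 0;
}

If the compare-and-swap from 1 to 0 succeeds, no one else can gain a
reference afterwards, so the folio can be freed without ever taking the
lruvec lock.
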
>
> During A/B testing on one of our prod instagram workloads (high-frequency
> short-lived requests), the patch intercepted almost all dead folios before
> they entered the LRU. Data collected using the mm_lru_insertion tracepoint
> shows the effectiveness of the patch:
>
> Per-host LRU add averages at 95% CPU load
> (60 hosts each side, 3 x 60s intervals)
>
>                 dead folios/min   total folios/min    dead %
>   unpatched:          1,297,785         19,341,986   6.7097%
>   patched:                   14         19,039,996   0.0001%
>
> Within this workload, we save ~2.6M lock acquisitions per minute per
> host as a result: each of the ~1.3M intercepted dead folios per minute
> would otherwise have taken the lruvec lock twice, once on add and once
> on removal.
>
> System-wide memory stats likewise improved on the patched side at 95% CPU load:
> - direct reclaim scanning reduced 7%
> - allocation stalls reduced 5.2%
> - compaction stalls reduced 12.3%
> - page frees reduced 4.9%
>
> No regressions were observed in requests served per second or request tail
> latency (p99). Both metrics showed directional improvement at higher CPU
> utilization (comparing 85% to 95%).
>
> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/swap.c | 36 +++++++++++++++++++++++++++++++++++-
>  1 file changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 5cc44f0de9877..71607b0ce3d18 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -160,13 +160,36 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
>  	int i;
>  	struct lruvec *lruvec = NULL;
>  	unsigned long flags = 0;
> +	struct folio_batch free_fbatch;
> +	bool is_lru_add = (move_fn == lru_add);
> +
> +	/*
> +	 * If we're adding to the LRU, preemptively filter dead folios. Use
> +	 * this dedicated folio batch for temp storage and deferred cleanup.
> +	 */
> +	if (is_lru_add)
> +		folio_batch_init(&free_fbatch);
>
>  	for (i = 0; i < folio_batch_count(fbatch); i++) {
>  		struct folio *folio = fbatch->folios[i];
>
>  		/* block memcg migration while the folio moves between lru */
> -		if (move_fn != lru_add && !folio_test_clear_lru(folio))
> +		if (!is_lru_add && !folio_test_clear_lru(folio))
> +			continue;
> +
> +		/*
> +		 * Filter dead folios by moving them from the add batch to the temp
> +		 * batch for freeing after this loop.
> +		 *
> +		 * Since the folio may be part of a huge page, unqueue from
> +		 * deferred split list to avoid a dangling list entry.
> +		 */
> +		if (is_lru_add && folio_ref_freeze(folio, 1)) {
> +			folio_unqueue_deferred_split(folio);
> +			fbatch->folios[i] = NULL;
> +			folio_batch_add(&free_fbatch, folio);
>  			continue;
> +		}
>
>  		folio_lruvec_relock_irqsave(folio, &lruvec, &flags);
>  		move_fn(lruvec, folio);
> @@ -176,6 +199,13 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
>
>  	if (lruvec)
>  		lruvec_unlock_irqrestore(lruvec, flags);
> +
> +	/* Cleanup filtered dead folios. */
> +	if (is_lru_add) {
> +		mem_cgroup_uncharge_folios(&free_fbatch);
> +		free_unref_folios(&free_fbatch);
> +	}
> +
>  	folios_put(fbatch);
>  }
>
> @@ -964,6 +994,10 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
>  		struct folio *folio = folios->folios[i];
>  		unsigned int nr_refs = refs ? refs[i] : 1;
>
> +		/* Folio batch entry may have been preemptively removed during drain. */
> +		if (!folio)
> +			continue;
> +
>  		if (is_huge_zero_folio(folio))
>  			continue;
>
> --
> 2.52.0
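
[Editorial aside: the folios_put_refs() hunk above is what lets the
drain leave holes in the batch, since folio_batch_move_lru() still
hands the same batch to folios_put() afterwards. A tiny standalone
sketch of that contract; fake_batch and put_all are invented names,
not kernel API:]

#include <stdio.h>

#define BATCH_MAX 15

struct fake_batch {
	int nr;
	const char *slots[BATCH_MAX];
};

/* Like the patched folios_put_refs(): tolerate holes left by the drain. */
static void put_all(const struct fake_batch *b)
{
	for (int i = 0; i < b->nr; i++) {
		/* Entry was diverted to free_fbatch and already freed. */
		if (!b->slots[i])
			continue;
		printf("dropping batch ref on %s\n", b->slots[i]);
	}
}

int main(void)
{
	struct fake_batch b = {
		.nr = 3,
		.slots = { "folio A", NULL /* dead, diverted */, "folio C" },
	};

	put_all(&b);
	return 0;
}

Leaving a NULL hole rather than compacting the batch presumably keeps
the surviving entries' indexes stable, which matters when a parallel
refs[] array indexes the same slots.
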
--
Michal Hocko
SUSE Labs