Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Usama Arif <usama.arif@linux.dev>
To: Barry Song <baohua@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	riel@surriel.com, david@kernel.org, baoquan.he@linux.dev,
	chrisl@kernel.org, kasong@tencent.com, linux-mm@kvack.org,
	hannes@cmpxchg.org, shakeel.butt@linux.dev, nphamcs@gmail.com,
	shikemeng@huaweicloud.com, youngjun.park@lge.com,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH] mm/swap_state: remove unnecessary lru_add_drain() from readahead
Date: Tue, 9 Jun 2026 10:29:53 +0100	[thread overview]
Message-ID: <50c03b64-1809-4000-88dd-ab147ddc6620@linux.dev> (raw)
In-Reply-To: <CAGsJ_4xOcTTkpVnno9RgTAkj5-62dLASUa9YnJYSLOEiU1ULww@mail.gmail.com>



On 09/06/2026 09:01, Barry Song wrote:
> On Mon, Jun 8, 2026 at 10:33 PM Usama Arif <usama.arif@linux.dev> wrote:
>>
>> swap_cluster_readahead() and swap_vma_readahead() end the readahead
>> loop with an explicit lru_add_drain() call. That drain is a leftover
>> from 2.6.12 era code and serves no functional purpose for the callers:
>>
>> - do_swap_page() ignores LRU residency for the readahead folios;
>>   it only needs the target folio it called swapin_readahead() for,
>>   and if the write-fault path needs the target folio on the LRU to count
>>   references accurately, it runs its own lru_add_drain() at the
>>   wp_can_reuse_anon_folio() and do_swap_page() sites.
> 
> right. as i can see the below in do_swap_page():
> 
>         /*
>          * If we want to map a page that's in the swapcache writable, we
>          * have to detect via the refcount if we're really the exclusive
>          * owner. Try removing the extra reference from the local LRU
>          * caches if required.
>          */
>         if ((vmf->flags & FAULT_FLAG_WRITE) &&
>             !folio_test_ksm(folio) && !folio_test_lru(folio))
>                 lru_add_drain();
> 
> and the below in wp_can_reuse_anon_folio():
> 
>         if (!folio_test_lru(folio))
>                 /*
>                  * We cannot easily detect+handle references from
>                  * remote LRU caches or references to LRU folios.
>                  */
>                 lru_add_drain();
> 
>>
>> - shmem_swapin_cluster() immediately locks the  returned folio, waits
>>   for writeback, then operates on it - LRU residency of either the target
>>   or the readahead folios is irrelevant.
>>
>> - try_to_unuse() likewise locks the folio and calls unuse_pte() without
>>   depending on LRU presence.
>>
>> Folios newly added to the swap cache by the readahead loop sit in
>> the per-CPU LRU folio_batch and will be drained naturally as the
>> batch fills (FOLIO_BATCH_SIZE),by the next reclaim/compaction
>> lru_add_drain_all() and so on.  The unconditional drain only
>> synchronously flushes a partial batch and forces contention on
>> lruvec_lock.
>>
>> On a 176-CPU production host running a memory-pressured workload, this
>> path was observed to call folio_batch_move_lru() from
>> swap_cluster_readahead() ~28K/min, a very large source of LRU lock
>> traffic.
>>
> 
> Do we see a workload improvement? If yes, can we put the data?
> 

Hello Barry!

So lru lock contention is a source of issue in the meta fleet.

This problem was specifically seen when I ran `perf lock contention -a -b`
in production on a workload that has a really big anon heap and heavy swap
activity.

When I tried to trace with bpftrace who was the biggest consumer, it was
readahead.

It is easy to run perf and bpftrace on prod on this specific workload, but
more difficult to flash a new kernel and see results. The easiest would be
when kernel upgrade happens and this patch lands to see the difference and
I can report back.

Thanks,
Usama

> Thanks
> Barry



  reply	other threads:[~2026-06-09  9:30 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-08 14:32 [PATCH] mm/swap_state: remove unnecessary lru_add_drain() from readahead Usama Arif
2026-06-08 16:06 ` Shakeel Butt
2026-06-08 17:39   ` JP Kobryn
2026-06-09  8:01 ` Barry Song
2026-06-09  9:29   ` Usama Arif [this message]
2026-06-09 10:03     ` Barry Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50c03b64-1809-4000-88dd-ab147ddc6620@linux.dev \
    --to=usama.arif@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=baoquan.he@linux.dev \
    --cc=chrisl@kernel.org \
    --cc=david@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kasong@tencent.com \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nphamcs@gmail.com \
    --cc=riel@surriel.com \
    --cc=shakeel.butt@linux.dev \
    --cc=shikemeng@huaweicloud.com \
    --cc=youngjun.park@lge.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox