* [PATCH] mm/swap_state: remove unnecessary lru_add_drain() from readahead
@ 2026-06-08 14:32 Usama Arif
2026-06-08 16:06 ` Shakeel Butt
0 siblings, 1 reply; 3+ messages in thread
From: Usama Arif @ 2026-06-08 14:32 UTC (permalink / raw)
To: Andrew Morton, riel, david, baohua, baoquan.he, chrisl, kasong,
linux-mm
Cc: hannes, shakeel.butt, nphamcs, shikemeng, youngjun.park,
linux-kernel, kernel-team, Usama Arif
swap_cluster_readahead() and swap_vma_readahead() end the readahead
loop with an explicit lru_add_drain() call. That drain is a leftover
from 2.6.12 era code and serves no functional purpose for the callers:
- do_swap_page() ignores LRU residency for the readahead folios;
it only needs the target folio it called swapin_readahead() for,
and if the write-fault path needs the target folio on the LRU to count
references accurately, it runs its own lru_add_drain() at the
wp_can_reuse_anon_folio() and do_swap_page() sites.
- shmem_swapin_cluster() immediately locks the returned folio, waits
for writeback, then operates on it - LRU residency of either the target
or the readahead folios is irrelevant.
- try_to_unuse() likewise locks the folio and calls unuse_pte() without
depending on LRU presence.
Folios newly added to the swap cache by the readahead loop sit in
the per-CPU LRU folio_batch and will be drained naturally as the
batch fills (FOLIO_BATCH_SIZE),by the next reclaim/compaction
lru_add_drain_all() and so on. The unconditional drain only
synchronously flushes a partial batch and forces contention on
lruvec_lock.
On a 176-CPU production host running a memory-pressured workload, this
path was observed to call folio_batch_move_lru() from
swap_cluster_readahead() ~28K/min, a very large source of LRU lock
traffic.
This is a direct continuation of the cleanup started in commit
1aa43598c03b ("mm: remove unnecessary calls to lru_add_drain") which
removed the equivalent drain from free_pages_and_swap_cache() with
the same rationale. A detailed reasoning for this is present in [1].
Remove both drains.
[1] https://lore.kernel.org/all/dca2824e8e88e826c6b260a831d79089b5b9c79d.camel@surriel.com/T/#u
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
mm/swap_state.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 9c3a5cf99778..6fd6e3415b71 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -836,7 +836,6 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
}
blk_finish_plug(&plug);
swap_read_unplug(splug);
- lru_add_drain(); /* Push any new pages onto the LRU now */
skip:
/* The page was likely read above, so no need for plugging here */
return swap_cache_read_folio(entry, gfp_mask, mpol, ilx, NULL, false);
@@ -951,7 +950,6 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
pte_unmap(pte);
blk_finish_plug(&plug);
swap_read_unplug(splug);
- lru_add_drain();
skip:
/* The folio was likely read above, so no need for plugging here */
folio = swap_cache_read_folio(targ_entry, gfp_mask, mpol, targ_ilx,
--
2.52.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] mm/swap_state: remove unnecessary lru_add_drain() from readahead
2026-06-08 14:32 [PATCH] mm/swap_state: remove unnecessary lru_add_drain() from readahead Usama Arif
@ 2026-06-08 16:06 ` Shakeel Butt
2026-06-08 17:39 ` JP Kobryn
0 siblings, 1 reply; 3+ messages in thread
From: Shakeel Butt @ 2026-06-08 16:06 UTC (permalink / raw)
To: Usama Arif
Cc: Andrew Morton, riel, david, baohua, baoquan.he, chrisl, kasong,
linux-mm, hannes, nphamcs, shikemeng, youngjun.park, linux-kernel,
kernel-team, inwardvessel
On Mon, Jun 08, 2026 at 07:32:42AM -0700, Usama Arif wrote:
> swap_cluster_readahead() and swap_vma_readahead() end the readahead
> loop with an explicit lru_add_drain() call. That drain is a leftover
> from 2.6.12 era code and serves no functional purpose for the callers:
>
> - do_swap_page() ignores LRU residency for the readahead folios;
> it only needs the target folio it called swapin_readahead() for,
> and if the write-fault path needs the target folio on the LRU to count
> references accurately, it runs its own lru_add_drain() at the
> wp_can_reuse_anon_folio() and do_swap_page() sites.
>
> - shmem_swapin_cluster() immediately locks the returned folio, waits
> for writeback, then operates on it - LRU residency of either the target
> or the readahead folios is irrelevant.
>
> - try_to_unuse() likewise locks the folio and calls unuse_pte() without
> depending on LRU presence.
>
> Folios newly added to the swap cache by the readahead loop sit in
> the per-CPU LRU folio_batch and will be drained naturally as the
> batch fills (FOLIO_BATCH_SIZE),by the next reclaim/compaction
> lru_add_drain_all() and so on. The unconditional drain only
> synchronously flushes a partial batch and forces contention on
> lruvec_lock.
>
> On a 176-CPU production host running a memory-pressured workload, this
> path was observed to call folio_batch_move_lru() from
> swap_cluster_readahead() ~28K/min, a very large source of LRU lock
> traffic.
>
> This is a direct continuation of the cleanup started in commit
> 1aa43598c03b ("mm: remove unnecessary calls to lru_add_drain") which
> removed the equivalent drain from free_pages_and_swap_cache() with
> the same rationale. A detailed reasoning for this is present in [1].
>
> Remove both drains.
>
> [1] https://lore.kernel.org/all/dca2824e8e88e826c6b260a831d79089b5b9c79d.camel@surriel.com/T/#u
>
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Thanks for pushing this. JP was also looking into LRU lock contention sources.
Particularly we lack visibiluty into the lru_add_drain_all() callers. The idea
was to add tracepoints to tracks such callers. (Just nudging you towards it :P)
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] mm/swap_state: remove unnecessary lru_add_drain() from readahead
2026-06-08 16:06 ` Shakeel Butt
@ 2026-06-08 17:39 ` JP Kobryn
0 siblings, 0 replies; 3+ messages in thread
From: JP Kobryn @ 2026-06-08 17:39 UTC (permalink / raw)
To: Shakeel Butt, Usama Arif
Cc: Andrew Morton, riel, david, baohua, baoquan.he, chrisl, kasong,
linux-mm, hannes, nphamcs, shikemeng, youngjun.park, linux-kernel,
kernel-team
On 6/8/26 9:06 AM, Shakeel Butt wrote:
> On Mon, Jun 08, 2026 at 07:32:42AM -0700, Usama Arif wrote:
>> swap_cluster_readahead() and swap_vma_readahead() end the readahead
>> loop with an explicit lru_add_drain() call. That drain is a leftover
>> from 2.6.12 era code and serves no functional purpose for the callers:
>>
>> - do_swap_page() ignores LRU residency for the readahead folios;
>> it only needs the target folio it called swapin_readahead() for,
>> and if the write-fault path needs the target folio on the LRU to count
>> references accurately, it runs its own lru_add_drain() at the
>> wp_can_reuse_anon_folio() and do_swap_page() sites.
>>
>> - shmem_swapin_cluster() immediately locks the returned folio, waits
>> for writeback, then operates on it - LRU residency of either the target
>> or the readahead folios is irrelevant.
>>
>> - try_to_unuse() likewise locks the folio and calls unuse_pte() without
>> depending on LRU presence.
>>
>> Folios newly added to the swap cache by the readahead loop sit in
>> the per-CPU LRU folio_batch and will be drained naturally as the
>> batch fills (FOLIO_BATCH_SIZE),by the next reclaim/compaction
>> lru_add_drain_all() and so on. The unconditional drain only
>> synchronously flushes a partial batch and forces contention on
>> lruvec_lock.
>>
>> On a 176-CPU production host running a memory-pressured workload, this
>> path was observed to call folio_batch_move_lru() from
>> swap_cluster_readahead() ~28K/min, a very large source of LRU lock
>> traffic.
>>
>> This is a direct continuation of the cleanup started in commit
>> 1aa43598c03b ("mm: remove unnecessary calls to lru_add_drain") which
>> removed the equivalent drain from free_pages_and_swap_cache() with
>> the same rationale. A detailed reasoning for this is present in [1].
>>
>> Remove both drains.
>>
>> [1] https://lore.kernel.org/all/dca2824e8e88e826c6b260a831d79089b5b9c79d.camel@surriel.com/T/#u
>>
>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
>
> Thanks for pushing this. JP was also looking into LRU lock contention sources.
> Particularly we lack visibiluty into the lru_add_drain_all() callers. The idea
> was to add tracepoints to tracks such callers. (Just nudging you towards it :P)
Yup, I'll send a patch for that one.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-06-08 17:39 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-08 14:32 [PATCH] mm/swap_state: remove unnecessary lru_add_drain() from readahead Usama Arif
2026-06-08 16:06 ` Shakeel Butt
2026-06-08 17:39 ` JP Kobryn
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox