* [PATCH] memcg: use ratelimited stats flush in the reclaim
@ 2024-06-15  8:12 Shakeel Butt
  2024-06-16  0:28 ` Yosry Ahmed
  0 siblings, 1 reply; 15+ messages in thread
From: Shakeel Butt @ 2024-06-15  8:12 UTC
  To: Andrew Morton, Johannes Weiner
  Cc: Michal Hocko, Roman Gushchin, Yosry Ahmed, Jesper Dangaard Brouer,
	Yu Zhao, Muchun Song, Facebook Kernel Team, linux-mm,
	linux-kernel

Meta production is seeing a large number of stalls in the memcg stats
flush from the memcg reclaim code path. At the moment, this specific
callsite does a synchronous memcg stats flush. The rstat flush is an
expensive and time-consuming operation, so concurrent reclaimers will
busy-wait on the lock, potentially for a long time. This issue is not
unique to Meta and has been observed by Cloudflare [1] as well. In the
Cloudflare case, the stalls were due to contention between kswapd
threads running on their 8 NUMA node machines. That contention does not
make sense, as the rstat flush is global and a flush from one kswapd
thread should be sufficient for all. Simply replace the synchronous
flush with the ratelimited one.

One may raise a concern about potentially using stats that are up to
2 sec stale (at worst) for heuristics like the desirable inactive:active
ratio and preferring inactive file pages over anon pages. However, these
specific heuristics do not require very precise stats and are also
ignored under severe memory pressure. This patch has been running on the
Meta fleet for more than a month and we have not observed any issues.
Please note that MGLRU is not impacted by this issue at all, as it
avoids rstat flushing completely.

Link: https://lore.kernel.org/all/6ee2518b-81dd-4082-bdf5-322883895ffc@kernel.org [1]
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c0429fd6c573..bda4f92eba71 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2263,7 +2263,7 @@ static void prepare_scan_control(pg_data_t *pgdat, struct scan_control *sc)
 	 * Flush the memory cgroup stats, so that we read accurate per-memcg
 	 * lruvec stats for heuristics.
 	 */
-	mem_cgroup_flush_stats(sc->target_mem_cgroup);
+	mem_cgroup_flush_stats_ratelimited(sc->target_mem_cgroup);
 
 	/*
 	 * Determine the scan balance between anon and file LRUs.
-- 
2.43.0
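
The idea behind replacing a synchronous flush with a ratelimited one can
be illustrated with a small userspace sketch. This is not the kernel's
implementation of mem_cgroup_flush_stats_ratelimited(); the struct, the
function names, the millisecond clock passed in by the caller, and the
2000 ms budget (taken from the "2 sec stale" figure in the commit
message) are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical staleness budget, based on the "2 sec" figure above. */
#define FLUSH_INTERVAL_MS 2000u

/* Hypothetical bookkeeping for a shared stats flusher. */
struct stats_flusher {
	uint64_t last_flush_ms;   /* time of the most recent flush */
	unsigned long nr_flushes; /* how many expensive flushes actually ran */
};

/* Stand-in for the expensive synchronous flush: always does the work. */
static void flush_stats(struct stats_flusher *f, uint64_t now_ms)
{
	f->last_flush_ms = now_ms;
	f->nr_flushes++;
}

/*
 * Ratelimited variant: skip the expensive flush when the stats were
 * flushed less than FLUSH_INTERVAL_MS ago. Returns true if a flush ran.
 */
static bool flush_stats_ratelimited(struct stats_flusher *f, uint64_t now_ms)
{
	if (now_ms - f->last_flush_ms < FLUSH_INTERVAL_MS)
		return false; /* recent enough, reuse slightly stale stats */

	flush_stats(f, now_ms);
	return true;
}
```

In this sketch, callers that arrive shortly after a flush return
immediately and work with slightly stale stats instead of repeating the
expensive operation, which is the trade-off the commit message argues is
acceptable for the reclaim heuristics.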




end of thread, other threads:[~2024-06-24 21:41 UTC | newest]

Thread overview: 15+ messages
2024-06-15  8:12 [PATCH] memcg: use ratelimited stats flush in the reclaim Shakeel Butt
2024-06-16  0:28 ` Yosry Ahmed
2024-06-17 15:31   ` Jesper Dangaard Brouer
2024-06-17 18:01     ` Shakeel Butt
2024-06-18 15:53       ` Jesper Dangaard Brouer
2024-06-18 18:07         ` Shakeel Butt
2024-06-21  7:35         ` [PATCH RFC] cgroup/rstat: avoid thundering herd problem on root cgrp Jesper Dangaard Brouer
2024-06-17 17:20   ` [PATCH] memcg: use ratelimited stats flush in the reclaim Shakeel Butt
2024-06-24 12:57     ` Yosry Ahmed
2024-06-24 17:02       ` Shakeel Butt
2024-06-24 17:15         ` Yosry Ahmed
2024-06-24 18:59           ` Shakeel Butt
2024-06-24 19:06             ` Yosry Ahmed
2024-06-24 20:01               ` Shakeel Butt
2024-06-24 21:41                 ` Yosry Ahmed
