From: Usama Arif <usama.arif@linux.dev>
To: brauner@kernel.org, jack@suse.cz, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>,
linux-mm@kvack.org
Cc: hughd@google.com, boris@bur.io, clm@fb.com, dsterba@suse.com,
linux-btrfs@vger.kernel.org, cem@kernel.org,
linux-xfs@vger.kernel.org, shakeel.butt@linux.dev,
hannes@cmpxchg.org, riel@surriel.com, kernel-team@meta.com,
Usama Arif <usama.arif@linux.dev>
Subject: [PATCH] fs/super: skip non-memcg-aware nr_cached_objects in memcg slab shrink
Date: Tue, 9 Jun 2026 05:30:47 -0700
Message-ID: <20260609123047.1948242-1-usama.arif@linux.dev>

The super_block shrinker is registered with SHRINKER_MEMCG_AWARE because its
dentry and inode LRUs are memcg-aware (via list_lru). But the optional
->nr_cached_objects() hooks that the shrinker also drives are not memcg-aware:
btrfs extent maps and xfs inode reclaim operate on filesystem-global
state, and shmem's unused-huge shrinker walks a per-superblock shrinklist.
None of them filter by sc->memcg.

The mismatch shows up under memcg-heavy slab reclaim. shrink_slab_memcg()
calls do_shrink_slab() once for every (memcg, NUMA node) pair whose
shrinker map has the superblock shrinker's bit set. On a busy host that
means hundreds of calls per reclaim pass, and each resulting scan queues
the same global shrinker work item that the root path has already kicked.

Because the btrfs/xfs global counts are typically non-zero on any in-use
filesystem, the returned total stays positive even when a memcg's own
dentry/inode LRUs are empty. shrink_slab_memcg() only clears a shrinker's
bit in the memcg bitmap when ->count_objects() returns SHRINK_EMPTY, so the
superblock shrinker's bit is never cleared, and subsequent reclaim passes
for the same memcg re-enter super_cache_count() and pay for the global
counter walk again.

Restrict ->nr_cached_objects() to the global shrink path (sc->memcg NULL
or root). The memcg-aware dentry/inode LRUs are still counted and scanned
per memcg as before, and the root/global path still drives the
fs-specific hooks; only their invocation from non-root memcg slab reclaim
is removed.
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
fs/super.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index 378e81efe643..5216c5dbd4c4 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -24,6 +24,7 @@
#include <linux/export.h>
#include <linux/slab.h>
#include <linux/blkdev.h>
+#include <linux/memcontrol.h>
#include <linux/mount.h>
#include <linux/security.h>
#include <linux/writeback.h> /* for the emergency remount stuff */
@@ -169,6 +170,19 @@ static void super_wake(struct super_block *sb, unsigned int flag)
wake_up_var(&sb->s_flags);
}
+/*
+ * The s_op->nr_cached_objects hooks (used for example by btrfs and xfs)
+ * operate on filesystem-global state and ignore sc->memcg. Driving them
+ * from per-memcg shrink_slab_memcg() invocations only burns CPU walking
+ * per-cpu counters and queueing duplicate work: the actual reclaim happens on
+ * the global path (kswapd or root direct reclaim) regardless. Restrict them
+ * to that path.
+ */
+static inline bool super_fs_objects_eligible(struct shrink_control *sc)
+{
+ return !sc->memcg || mem_cgroup_is_root(sc->memcg);
+}
+
/*
* One thing we have to be careful of with a per-sb shrinker is that we don't
* drop the last active reference to the superblock from within the shrinker.
@@ -198,7 +212,7 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
if (!super_trylock_shared(sb))
return SHRINK_STOP;
- if (sb->s_op->nr_cached_objects)
+ if (sb->s_op->nr_cached_objects && super_fs_objects_eligible(sc))
fs_objects = sb->s_op->nr_cached_objects(sb, sc);
inodes = list_lru_shrink_count(&sb->s_inode_lru, sc);
@@ -259,7 +273,8 @@ static unsigned long super_cache_count(struct shrinker *shrink,
return 0;
smp_rmb();
- if (sb->s_op && sb->s_op->nr_cached_objects)
+ if (sb->s_op && sb->s_op->nr_cached_objects &&
+ super_fs_objects_eligible(sc))
total_objects = sb->s_op->nr_cached_objects(sb, sc);
total_objects += list_lru_shrink_count(&sb->s_dentry_lru, sc);
--
2.52.0
Thread overview: 5+ messages
2026-06-09 12:30 Usama Arif [this message]
2026-06-09 12:52 ` [PATCH] fs/super: skip non-memcg-aware nr_cached_objects in memcg slab shrink Jan Kara
2026-06-09 13:38 ` Usama Arif
2026-06-23 10:14 ` Christian Brauner
2026-06-23 18:08 ` Shakeel Butt