* [PATCH] fs/super: skip non-memcg-aware nr_cached_objects in memcg slab shrink
@ 2026-06-09 12:30 Usama Arif
2026-06-09 12:52 ` Jan Kara
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Usama Arif @ 2026-06-09 12:30 UTC (permalink / raw)
To: brauner, jack, linux-fsdevel, linux-kernel, Al Viro, linux-mm
Cc: hughd, boris, clm, dsterba, linux-btrfs, cem, linux-xfs,
shakeel.butt, hannes, riel, kernel-team, Usama Arif

The super_block shrinker is registered with SHRINKER_MEMCG_AWARE because its
dentry and inode LRUs are memcg-aware (via list_lru). But the optional
->nr_cached_objects() hooks that the shrinker also drives are not memcg-aware:
btrfs extent maps and xfs inode reclaim operate on filesystem-global
state, and shmem's unused-huge shrinker walks a per-superblock shrinklist.
None of them filter by sc->memcg.

The mismatch shows up under memcg-heavy slab reclaim. shrink_slab_memcg()
calls do_shrink_slab() once per (memcg, NUMA node) pair for every memcg
whose bit is set in the per-superblock shrinker bitmap, which on a busy
host means hundreds of calls per reclaim pass. Each scan queues the same
global shrinker work item that's already kicked from the root path.

Because btrfs/xfs global count is typically non-zero on any in-use filesystem,
the returned total stays positive even if a memcg's own dentry/inode LRUs
are empty. shrink_slab_memcg() therefore never clears the SB shrinker bit
in the memcg bitmap, so subsequent reclaim passes from the same memcg
re-enter super_cache_count() and pay for the global counter walk again.

Restrict ->nr_cached_objects() to the global shrink path (sc->memcg NULL
or root). The memcg-aware dentry/inode LRUs keep being counted and
scanned per memcg as before; only the global fs-specific hooks are skipped.
The root/global shrink path still drives those hooks; only their
invocation from non-root memcg slab reclaim is removed.

Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
fs/super.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index 378e81efe643..5216c5dbd4c4 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -24,6 +24,7 @@
 #include <linux/export.h>
 #include <linux/slab.h>
 #include <linux/blkdev.h>
+#include <linux/memcontrol.h>
 #include <linux/mount.h>
 #include <linux/security.h>
 #include <linux/writeback.h>		/* for the emergency remount stuff */
@@ -169,6 +170,19 @@ static void super_wake(struct super_block *sb, unsigned int flag)
 	wake_up_var(&sb->s_flags);
 }
 
+/*
+ * The s_op->nr_cached_objects hooks (used for example by btrfs and xfs)
+ * operate on filesystem-global state and ignore sc->memcg. Driving them
+ * from per-memcg shrink_slab_memcg() invocations only burns CPU walking
+ * per-cpu counters and queueing duplicate work: the actual reclaim happens on
+ * the global path (kswapd or root direct reclaim) regardless. Restrict them
+ * to that path.
+ */
+static inline bool super_fs_objects_eligible(struct shrink_control *sc)
+{
+	return !sc->memcg || mem_cgroup_is_root(sc->memcg);
+}
+
 /*
  * One thing we have to be careful of with a per-sb shrinker is that we don't
  * drop the last active reference to the superblock from within the shrinker.
@@ -198,7 +212,7 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
 	if (!super_trylock_shared(sb))
 		return SHRINK_STOP;
 
-	if (sb->s_op->nr_cached_objects)
+	if (sb->s_op->nr_cached_objects && super_fs_objects_eligible(sc))
 		fs_objects = sb->s_op->nr_cached_objects(sb, sc);
 
 	inodes = list_lru_shrink_count(&sb->s_inode_lru, sc);
@@ -259,7 +273,8 @@ static unsigned long super_cache_count(struct shrinker *shrink,
 		return 0;
 	smp_rmb();
 
-	if (sb->s_op && sb->s_op->nr_cached_objects)
+	if (sb->s_op && sb->s_op->nr_cached_objects &&
+	    super_fs_objects_eligible(sc))
 		total_objects = sb->s_op->nr_cached_objects(sb, sc);
 
 	total_objects += list_lru_shrink_count(&sb->s_dentry_lru, sc);
--
2.52.0
* Re: [PATCH] fs/super: skip non-memcg-aware nr_cached_objects in memcg slab shrink
2026-06-09 12:30 [PATCH] fs/super: skip non-memcg-aware nr_cached_objects in memcg slab shrink Usama Arif
@ 2026-06-09 12:52 ` Jan Kara
2026-06-09 13:38 ` Usama Arif
2026-06-23 10:14 ` Christian Brauner
2026-06-23 18:08 ` Shakeel Butt
2 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2026-06-09 12:52 UTC (permalink / raw)
To: Usama Arif
Cc: brauner, jack, linux-fsdevel, linux-kernel, Al Viro, linux-mm,
hughd, boris, clm, dsterba, linux-btrfs, cem, linux-xfs,
shakeel.butt, hannes, riel, kernel-team
On Tue 09-06-26 05:30:47, Usama Arif wrote:
> The super_block shrinker is registered with SHRINKER_MEMCG_AWARE because its
> dentry and inode LRUs are memcg-aware (via list_lru). But the optional
> ->nr_cached_objects() hooks that the shrinker also drives are not memcg-aware:
> btrfs extent maps and xfs inode reclaim operate on filesystem-global
> state, and shmem's unused-huge shrinker walks a per-superblock shrinklist.
> None of them filter by sc->memcg.
>
> The mismatch shows up under memcg-heavy slab reclaim. shrink_slab_memcg()
> calls do_shrink_slab() once per (memcg, NUMA node) pair for every memcg
> whose bit is set in the per-superblock shrinker bitmap, which on a busy
> host means hundreds of calls per reclaim pass. Each scan queues the same
> global shrinker work item that's already kicked from the root path.
>
> Because btrfs/xfs global count is typically non-zero on any in-use filesystem,
> the returned total stays positive even if a memcg's own dentry/inode LRUs
> are empty. shrink_slab_memcg() therefore never clears the SB shrinker bit
> in the memcg bitmap, so subsequent reclaim passes from the same memcg
> re-enter super_cache_count() and pay for the global counter walk again.
>
> Restrict ->nr_cached_objects() to the global shrink path (sc->memcg NULL
> or root). The memcg-aware dentry/inode LRUs keep being counted and
> scanned per memcg as before; only the global fs-specific hooks are skipped.
> The root/global shrink path still drives those hooks; only their
> invocation from non-root memcg slab reclaim is removed.
>
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
To me this makes sense. However, I'm a bit surprised that the XFS inode shrinker
(which is what gets counted in nr_cached_objects for XFS) isn't memcg
aware. I guess since these inodes are on their way to a relatively quick
destruction, nobody really bothered. So I'm fine with the change, just
I'd like to make sure XFS folks are aware and don't plan anything in this
area.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH] fs/super: skip non-memcg-aware nr_cached_objects in memcg slab shrink
2026-06-09 12:52 ` Jan Kara
@ 2026-06-09 13:38 ` Usama Arif
0 siblings, 0 replies; 5+ messages in thread
From: Usama Arif @ 2026-06-09 13:38 UTC (permalink / raw)
To: Jan Kara
Cc: brauner, linux-fsdevel, linux-kernel, Al Viro, linux-mm, hughd,
boris, clm, dsterba, linux-btrfs, cem, linux-xfs, shakeel.butt,
hannes, riel, kernel-team
On 09/06/2026 13:52, Jan Kara wrote:
> On Tue 09-06-26 05:30:47, Usama Arif wrote:
>> The super_block shrinker is registered with SHRINKER_MEMCG_AWARE because its
>> dentry and inode LRUs are memcg-aware (via list_lru). But the optional
>> ->nr_cached_objects() hooks that the shrinker also drives are not memcg-aware:
>> btrfs extent maps and xfs inode reclaim operate on filesystem-global
>> state, and shmem's unused-huge shrinker walks a per-superblock shrinklist.
>> None of them filter by sc->memcg.
>>
>> The mismatch shows up under memcg-heavy slab reclaim. shrink_slab_memcg()
>> calls do_shrink_slab() once per (memcg, NUMA node) pair for every memcg
>> whose bit is set in the per-superblock shrinker bitmap, which on a busy
>> host means hundreds of calls per reclaim pass. Each scan queues the same
>> global shrinker work item that's already kicked from the root path.
>>
>> Because btrfs/xfs global count is typically non-zero on any in-use filesystem,
>> the returned total stays positive even if a memcg's own dentry/inode LRUs
>> are empty. shrink_slab_memcg() therefore never clears the SB shrinker bit
>> in the memcg bitmap, so subsequent reclaim passes from the same memcg
>> re-enter super_cache_count() and pay for the global counter walk again.
>>
>> Restrict ->nr_cached_objects() to the global shrink path (sc->memcg NULL
>> or root). The memcg-aware dentry/inode LRUs keep being counted and
>> scanned per memcg as before; only the global fs-specific hooks are skipped.
>> The root/global shrink path still drives those hooks; only their
>> invocation from non-root memcg slab reclaim is removed.
>>
>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>
> To me this makes sense. However, I'm a bit surprised that the XFS inode shrinker
> (which is what gets counted in nr_cached_objects for XFS) isn't memcg
> aware. I guess since these inodes are on their way to a relatively quick
> destruction, nobody really bothered. So I'm fine with the change, just
> I'd like to make sure XFS folks are aware and don't plan anything in this
> area.
>
Thanks for the review!
Yes, I am hoping that any objections from xfs or btrfs get raised. I have
cc'ed the btrfs and xfs maintainers and reviewers.
Thanks!
Usama
> Honza
>
* Re: [PATCH] fs/super: skip non-memcg-aware nr_cached_objects in memcg slab shrink
2026-06-09 12:30 [PATCH] fs/super: skip non-memcg-aware nr_cached_objects in memcg slab shrink Usama Arif
2026-06-09 12:52 ` Jan Kara
@ 2026-06-23 10:14 ` Christian Brauner
2026-06-23 18:08 ` Shakeel Butt
2 siblings, 0 replies; 5+ messages in thread
From: Christian Brauner @ 2026-06-23 10:14 UTC (permalink / raw)
To: jack, linux-fsdevel, linux-kernel, Al Viro, linux-mm, Usama Arif
Cc: hughd, boris, clm, dsterba, linux-btrfs, cem, linux-xfs,
shakeel.butt, hannes, riel, kernel-team
On Tue, 09 Jun 2026 05:30:47 -0700, Usama Arif wrote:
> fs/super: skip non-memcg-aware nr_cached_objects in memcg slab shrink
Applied to the vfs-7.3.misc branch of the vfs/vfs.git tree.
Patches in the vfs-7.3.misc branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-7.3.misc
[1/1] fs/super: skip non-memcg-aware nr_cached_objects in memcg slab shrink
https://git.kernel.org/vfs/vfs/c/5eacee52be6b
* Re: [PATCH] fs/super: skip non-memcg-aware nr_cached_objects in memcg slab shrink
2026-06-09 12:30 [PATCH] fs/super: skip non-memcg-aware nr_cached_objects in memcg slab shrink Usama Arif
2026-06-09 12:52 ` Jan Kara
2026-06-23 10:14 ` Christian Brauner
@ 2026-06-23 18:08 ` Shakeel Butt
2 siblings, 0 replies; 5+ messages in thread
From: Shakeel Butt @ 2026-06-23 18:08 UTC (permalink / raw)
To: Usama Arif
Cc: brauner, jack, linux-fsdevel, linux-kernel, Al Viro, linux-mm,
hughd, boris, clm, dsterba, linux-btrfs, cem, linux-xfs, hannes,
riel, kernel-team
On Tue, Jun 09, 2026 at 05:30:47AM -0700, Usama Arif wrote:
> The super_block shrinker is registered with SHRINKER_MEMCG_AWARE because its
> dentry and inode LRUs are memcg-aware (via list_lru). But the optional
> ->nr_cached_objects() hooks that the shrinker also drives are not memcg-aware:
> btrfs extent maps and xfs inode reclaim operate on filesystem-global
> state, and shmem's unused-huge shrinker walks a per-superblock shrinklist.
> None of them filter by sc->memcg.
I see that the underlying objects whose count is returned by the
->nr_cached_objects() hook are memcg charged for shmem and xfs but not for
btrfs. Do you envision a rare scenario where a lot of memory charged to a
memcg is consumed by objects that ->nr_cached_objects() tracks, and that
memory becomes unreclaimable due to this patch?
>
> The mismatch shows up under memcg-heavy slab reclaim. shrink_slab_memcg()
> calls do_shrink_slab() once per (memcg, NUMA node) pair for every memcg
> whose bit is set in the per-superblock shrinker bitmap, which on a busy
> host means hundreds of calls per reclaim pass. Each scan queues the same
> global shrinker work item that's already kicked from the root path.
>
> Because btrfs/xfs global count is typically non-zero on any in-use filesystem,
> the returned total stays positive even if a memcg's own dentry/inode LRUs
> are empty. shrink_slab_memcg() therefore never clears the SB shrinker bit
> in the memcg bitmap, so subsequent reclaim passes from the same memcg
> re-enter super_cache_count() and pay for the global counter walk again.
What is the main concern? Is it the amount of CPU wasted, or are we
over-reclaiming or reclaiming from unrelated memcgs?
>
> Restrict ->nr_cached_objects() to the global shrink path (sc->memcg NULL
> or root). The memcg-aware dentry/inode LRUs keep being counted and
> scanned per memcg as before; only the global fs-specific hooks are skipped.
> The root/global shrink path still drives those hooks; only their
> invocation from non-root memcg slab reclaim is removed.
>
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
I am fine with the stopgap, but it would be nice to have proper memcg awareness
in the xfs and shmem callbacks. For btrfs, I am not sure if it makes sense to
memcg charge btrfs_extent_map objects, but at least the decision to skip memcg
reclaim would then be inside the fs callbacks, i.e. nr_cached_objects.