* [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
@ 2026-06-10 16:05 Zenghui Yu
2026-06-10 16:38 ` Nhat Pham
0 siblings, 1 reply; 10+ messages in thread
From: Zenghui Yu @ 2026-06-10 16:05 UTC (permalink / raw)
To: linux-mm; +Cc: hannes, yosry, nphamcs, chengming.zhou
Hi all,
The following splat was triggered on the mainline kernel:
BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
Call trace:
show_stack+0x18/0x24 (C)
dump_stack_lvl+0x78/0x90
dump_stack+0x18/0x24
__might_resched+0x114/0x170
__might_sleep+0x48/0x98
css_rstat_flush+0x54/0x564
mem_cgroup_flush_stats+0x9c/0xb0
zswap_shrinker_count+0xe4/0x1e4
shrinker_debugfs_count_show+0xd8/0x268
seq_read_iter+0x1b8/0x4ac
seq_read+0xe0/0x11c
full_proxy_read+0x6c/0xa8
vfs_read+0xc0/0x2fc
ksys_read+0x68/0xfc
__arm64_sys_read+0x1c/0x28
invoke_syscall+0x54/0x110
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x38/0x128
el0t_64_sync_handler+0xa0/0xe4
el0t_64_sync+0x198/0x19c
The kernel is built with arm64's virt.config plus
+CONFIG_DEBUG_ATOMIC_SLEEP=y
+CONFIG_SHRINKER_DEBUG=y
+CONFIG_ZSWAP=y
I can reproduce the issue with the following steps:
$ echo Y > /sys/module/zswap/parameters/enabled
$ echo Y > /sys/module/zswap/parameters/shrinker_enabled
$ cat /sys/kernel/debug/shrinker/mm-zswap-60/count
Please have a look.
Thanks,
Zenghui
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-10 16:05 [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421 Zenghui Yu
@ 2026-06-10 16:38 ` Nhat Pham
2026-06-10 16:47 ` Nhat Pham
2026-06-10 17:31 ` Shakeel Butt
0 siblings, 2 replies; 10+ messages in thread
From: Nhat Pham @ 2026-06-10 16:38 UTC (permalink / raw)
To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt
On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
Thanks for reporting, Zenghui.
>
> Hi all,
>
> The following splat was triggered on the mainline kernel:
>
> BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> preempt_count: 0, expected: 0
> RCU nest depth: 1, expected: 0
> CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> Call trace:
> show_stack+0x18/0x24 (C)
> dump_stack_lvl+0x78/0x90
> dump_stack+0x18/0x24
> __might_resched+0x114/0x170
> __might_sleep+0x48/0x98
> css_rstat_flush+0x54/0x564
> mem_cgroup_flush_stats+0x9c/0xb0
> zswap_shrinker_count+0xe4/0x1e4
> shrinker_debugfs_count_show+0xd8/0x268
Ah, this seems a bit tricky.
Seems like shrinker_debugfs_count_show() is invoking
zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
triggers a stats flushing, which might sleep. Not ideal.
Is the rcu_read_section() here to protect memcg or shrinker? For
memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
memcg before returning.
(memcg maintainers please fact check me).
If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
rcu_read_lock();
list_for_each_entry_rcu(shrinker, &shrinker_list, list)
{
if (!shrinker_try_get(shrinker))
continue;
rcu_read_unlock();
}
But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
rcu_read_lock();
memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
We get the shrinker reference outside of the rcu_read_section(), and
just dereference it without any checking inside of the section.
I think we can just remove the rcu_read_(un)lock() here?
Long term, I still think we'd be better off getting rid of this stats
flushing. Seems expensive either way.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-10 16:38 ` Nhat Pham
@ 2026-06-10 16:47 ` Nhat Pham
2026-06-10 16:48 ` Nhat Pham
2026-06-10 17:31 ` Shakeel Butt
1 sibling, 1 reply; 10+ messages in thread
From: Nhat Pham @ 2026-06-10 16:47 UTC (permalink / raw)
To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt
On Wed, Jun 10, 2026 at 9:38 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
>
> Thanks for reporting, Zenghui.
>
>
> >
> > Hi all,
> >
> > The following splat was triggered on the mainline kernel:
> >
> > BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> > preempt_count: 0, expected: 0
> > RCU nest depth: 1, expected: 0
> > CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> > Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > Call trace:
> > show_stack+0x18/0x24 (C)
> > dump_stack_lvl+0x78/0x90
> > dump_stack+0x18/0x24
> > __might_resched+0x114/0x170
> > __might_sleep+0x48/0x98
> > css_rstat_flush+0x54/0x564
> > mem_cgroup_flush_stats+0x9c/0xb0
> > zswap_shrinker_count+0xe4/0x1e4
> > shrinker_debugfs_count_show+0xd8/0x268
>
> Ah, this seems a bit tricky.
>
> Seems like shrinker_debugfs_count_show() is invoking
> zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> triggers a stats flushing, which might sleep. Not ideal.
>
> Is the rcu_read_section() here to protect memcg or shrinker? For
> memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> memcg before returning.
>
> (memcg maintainers please fact check me).
>
> If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
>
> rcu_read_lock();
> list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> {
> if (!shrinker_try_get(shrinker))
> continue;
> rcu_read_unlock();
> }
>
> But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
>
> rcu_read_lock();
> memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
>
> We get the shrinker reference outside of the rcu_read_section(), and
> just dereference it without any checking inside of the section.
>
> I think we can just remove the rcu_read_(un)lock() here?
>
Also, looking at the code a bit closer - if (!shrinker->flags &
SHRINKER_MEMCG_AWARE), we shouldn't be getting into this loop at all
and inducing all the memcg-related overhead at all...
The code really should be structure as:
if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
total = shrinker_count_objects(shrinker, NULL, count_per_node);
if (total)
...
} else {
memcg = mem_cgroup_iter(NULL, NULL, NULL);
do {
} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
}
...
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-10 16:47 ` Nhat Pham
@ 2026-06-10 16:48 ` Nhat Pham
0 siblings, 0 replies; 10+ messages in thread
From: Nhat Pham @ 2026-06-10 16:48 UTC (permalink / raw)
To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt
On Wed, Jun 10, 2026 at 9:47 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
>
> Also, looking at the code a bit closer - if (!shrinker->flags &
> SHRINKER_MEMCG_AWARE), we shouldn't be getting into this loop at all
> and inducing all the memcg-related overhead at all...
>
> The code really should be structure as:
>
> if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
As usual, i flip the conditionals :( But you get the idea...
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-10 16:38 ` Nhat Pham
2026-06-10 16:47 ` Nhat Pham
@ 2026-06-10 17:31 ` Shakeel Butt
2026-06-10 18:38 ` Nhat Pham
1 sibling, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2026-06-10 17:31 UTC (permalink / raw)
To: Nhat Pham
Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
roman.gushchin, qi.zheng
+Roman, Qi
On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
>
> Thanks for reporting, Zenghui.
>
>
> >
> > Hi all,
> >
> > The following splat was triggered on the mainline kernel:
> >
> > BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> > preempt_count: 0, expected: 0
> > RCU nest depth: 1, expected: 0
> > CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> > Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > Call trace:
> > show_stack+0x18/0x24 (C)
> > dump_stack_lvl+0x78/0x90
> > dump_stack+0x18/0x24
> > __might_resched+0x114/0x170
> > __might_sleep+0x48/0x98
> > css_rstat_flush+0x54/0x564
> > mem_cgroup_flush_stats+0x9c/0xb0
> > zswap_shrinker_count+0xe4/0x1e4
> > shrinker_debugfs_count_show+0xd8/0x268
>
> Ah, this seems a bit tricky.
>
> Seems like shrinker_debugfs_count_show() is invoking
> zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> triggers a stats flushing, which might sleep. Not ideal.
>
> Is the rcu_read_section() here to protect memcg or shrinker? For
> memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> memcg before returning.
>
> (memcg maintainers please fact check me).
mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
read section for memcg.
>
> If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
>
> rcu_read_lock();
> list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> {
> if (!shrinker_try_get(shrinker))
> continue;
> rcu_read_unlock();
> }
>
> But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
Shouldn't the caller already holds the reference to the shrinker which it is
giving to this function? Does debugfs file entry holds a reference to the
shrinker which it is giving.
After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
shrinker_free_rcu_cb), so this rcu read section is against that.
I think we can simply use shrinker_try_get() here as Nhat said.
>
> rcu_read_lock();
> memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
>
> We get the shrinker reference outside of the rcu_read_section(), and
> just dereference it without any checking inside of the section.
>
> I think we can just remove the rcu_read_(un)lock() here?
>
> Long term, I still think we'd be better off getting rid of this stats
> flushing. Seems expensive either way.
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-10 17:31 ` Shakeel Butt
@ 2026-06-10 18:38 ` Nhat Pham
2026-06-10 22:08 ` Shakeel Butt
0 siblings, 1 reply; 10+ messages in thread
From: Nhat Pham @ 2026-06-10 18:38 UTC (permalink / raw)
To: Shakeel Butt
Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
roman.gushchin, qi.zheng
On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> +Roman, Qi
>
> On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> > On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
> >
> > Thanks for reporting, Zenghui.
> >
> >
> > >
> > > Hi all,
> > >
> > > The following splat was triggered on the mainline kernel:
> > >
> > > BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> > > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> > > preempt_count: 0, expected: 0
> > > RCU nest depth: 1, expected: 0
> > > CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> > > Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > > Call trace:
> > > show_stack+0x18/0x24 (C)
> > > dump_stack_lvl+0x78/0x90
> > > dump_stack+0x18/0x24
> > > __might_resched+0x114/0x170
> > > __might_sleep+0x48/0x98
> > > css_rstat_flush+0x54/0x564
> > > mem_cgroup_flush_stats+0x9c/0xb0
> > > zswap_shrinker_count+0xe4/0x1e4
> > > shrinker_debugfs_count_show+0xd8/0x268
> >
> > Ah, this seems a bit tricky.
> >
> > Seems like shrinker_debugfs_count_show() is invoking
> > zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> > triggers a stats flushing, which might sleep. Not ideal.
> >
> > Is the rcu_read_section() here to protect memcg or shrinker? For
> > memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> > memcg before returning.
> >
> > (memcg maintainers please fact check me).
>
> mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
> read section for memcg.
>
> >
> > If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
> >
> > rcu_read_lock();
> > list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> > {
> > if (!shrinker_try_get(shrinker))
> > continue;
> > rcu_read_unlock();
> > }
> >
> > But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
>
> Shouldn't the caller already holds the reference to the shrinker which it is
> giving to this function? Does debugfs file entry holds a reference to the
> shrinker which it is giving.
>
> After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
> shrinker_free_rcu_cb), so this rcu read section is against that.
>
> I think we can simply use shrinker_try_get() here as Nhat said.
Hmm, so is this unsafe even with the current rcu shennanigans? What's
stopping shrinker to be freed by that callback before we enter
rcu_read_section()?
Seems like this is just implicitly correct - shrinker_debugfs_detach()
and shrinker_debugfs_remove() happens before call_rcu(&shrinker->rcu,
shrinker_free_rcu_cb);, so if you're reading this file, then it's
before shrinker_free_rcu_cb() is even registered?
Do we still need rcu or shrinker_try_get() here?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-10 18:38 ` Nhat Pham
@ 2026-06-10 22:08 ` Shakeel Butt
2026-06-11 18:34 ` Roman Gushchin
0 siblings, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2026-06-10 22:08 UTC (permalink / raw)
To: Nhat Pham
Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
roman.gushchin, qi.zheng
On Wed, Jun 10, 2026 at 11:38:29AM -0700, Nhat Pham wrote:
> On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > +Roman, Qi
> >
> > On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> > > On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
> > >
> > > Thanks for reporting, Zenghui.
> > >
> > >
> > > >
> > > > Hi all,
> > > >
> > > > The following splat was triggered on the mainline kernel:
> > > >
> > > > BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> > > > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> > > > preempt_count: 0, expected: 0
> > > > RCU nest depth: 1, expected: 0
> > > > CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> > > > Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > > > Call trace:
> > > > show_stack+0x18/0x24 (C)
> > > > dump_stack_lvl+0x78/0x90
> > > > dump_stack+0x18/0x24
> > > > __might_resched+0x114/0x170
> > > > __might_sleep+0x48/0x98
> > > > css_rstat_flush+0x54/0x564
> > > > mem_cgroup_flush_stats+0x9c/0xb0
> > > > zswap_shrinker_count+0xe4/0x1e4
> > > > shrinker_debugfs_count_show+0xd8/0x268
> > >
> > > Ah, this seems a bit tricky.
> > >
> > > Seems like shrinker_debugfs_count_show() is invoking
> > > zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> > > triggers a stats flushing, which might sleep. Not ideal.
> > >
> > > Is the rcu_read_section() here to protect memcg or shrinker? For
> > > memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> > > memcg before returning.
> > >
> > > (memcg maintainers please fact check me).
> >
> > mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
> > read section for memcg.
> >
> > >
> > > If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
> > >
> > > rcu_read_lock();
> > > list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> > > {
> > > if (!shrinker_try_get(shrinker))
> > > continue;
> > > rcu_read_unlock();
> > > }
> > >
> > > But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
> >
> > Shouldn't the caller already holds the reference to the shrinker which it is
> > giving to this function? Does debugfs file entry holds a reference to the
> > shrinker which it is giving.
> >
> > After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
> > shrinker_free_rcu_cb), so this rcu read section is against that.
> >
> > I think we can simply use shrinker_try_get() here as Nhat said.
>
> Hmm, so is this unsafe even with the current rcu shennanigans? What's
> stopping shrinker to be freed by that callback before we enter
> rcu_read_section()?
>
> Seems like this is just implicitly correct - shrinker_debugfs_detach()
> and shrinker_debugfs_remove() happens before call_rcu(&shrinker->rcu,
> shrinker_free_rcu_cb);, so if you're reading this file, then it's
> before shrinker_free_rcu_cb() is even registered?
>
> Do we still need rcu or shrinker_try_get() here?
I think you are right that we don't need rcu or shrinker_try_get() but it is
more about an active debugfs file reader. Suppose we are sleeping within rstat
flush from shrinker_debugfs_count_show() and there is a parallel
shrinker_debugfs_remove() call.
shrinker_debugfs_remove calls debugfs_remove_recursive and deep in the stack
there is a call wait_for_completion(&fsd->active_users_drained) which will wait
for active users, one of which is sleeping within rstat flush.
So, let's simply remove rcu read here.
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-10 22:08 ` Shakeel Butt
@ 2026-06-11 18:34 ` Roman Gushchin
2026-06-11 18:38 ` Shakeel Butt
0 siblings, 1 reply; 10+ messages in thread
From: Roman Gushchin @ 2026-06-11 18:34 UTC (permalink / raw)
To: Shakeel Butt
Cc: Nhat Pham, Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
qi.zheng
Shakeel Butt <shakeel.butt@linux.dev> writes:
> On Wed, Jun 10, 2026 at 11:38:29AM -0700, Nhat Pham wrote:
>> On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>> >
>> > +Roman, Qi
>> >
>> > On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
>> > > On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
>> > >
>> > > Thanks for reporting, Zenghui.
>> > >
>> > >
>> > > >
>> > > > Hi all,
>> > > >
>> > > > The following splat was triggered on the mainline kernel:
>> > > >
>> > > > BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
>> > > > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
>> > > > preempt_count: 0, expected: 0
>> > > > RCU nest depth: 1, expected: 0
>> > > > CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
>> > > > Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
>> > > > Call trace:
>> > > > show_stack+0x18/0x24 (C)
>> > > > dump_stack_lvl+0x78/0x90
>> > > > dump_stack+0x18/0x24
>> > > > __might_resched+0x114/0x170
>> > > > __might_sleep+0x48/0x98
>> > > > css_rstat_flush+0x54/0x564
>> > > > mem_cgroup_flush_stats+0x9c/0xb0
>> > > > zswap_shrinker_count+0xe4/0x1e4
>> > > > shrinker_debugfs_count_show+0xd8/0x268
>> > >
>> > > Ah, this seems a bit tricky.
>> > >
>> > > Seems like shrinker_debugfs_count_show() is invoking
>> > > zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
>> > > triggers a stats flushing, which might sleep. Not ideal.
>> > >
>> > > Is the rcu_read_section() here to protect memcg or shrinker? For
>> > > memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
>> > > memcg before returning.
>> > >
>> > > (memcg maintainers please fact check me).
>> >
>> > mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
>> > read section for memcg.
>> >
>> > >
>> > > If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
>> > >
>> > > rcu_read_lock();
>> > > list_for_each_entry_rcu(shrinker, &shrinker_list, list)
>> > > {
>> > > if (!shrinker_try_get(shrinker))
>> > > continue;
>> > > rcu_read_unlock();
>> > > }
>> > >
>> > > But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
>> >
>> > Shouldn't the caller already holds the reference to the shrinker which it is
>> > giving to this function? Does debugfs file entry holds a reference to the
>> > shrinker which it is giving.
>> >
>> > After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
>> > shrinker_free_rcu_cb), so this rcu read section is against that.
>> >
>> > I think we can simply use shrinker_try_get() here as Nhat said.
>>
>> Hmm, so is this unsafe even with the current rcu shennanigans? What's
>> stopping shrinker to be freed by that callback before we enter
>> rcu_read_section()?
>>
>> Seems like this is just implicitly correct - shrinker_debugfs_detach()
>> and shrinker_debugfs_remove() happens before call_rcu(&shrinker->rcu,
>> shrinker_free_rcu_cb);, so if you're reading this file, then it's
>> before shrinker_free_rcu_cb() is even registered?
>>
>> Do we still need rcu or shrinker_try_get() here?
>
> I think you are right that we don't need rcu or shrinker_try_get() but it is
> more about an active debugfs file reader. Suppose we are sleeping within rstat
> flush from shrinker_debugfs_count_show() and there is a parallel
> shrinker_debugfs_remove() call.
>
> shrinker_debugfs_remove calls debugfs_remove_recursive and deep in the stack
> there is a call wait_for_completion(&fsd->active_users_drained) which will wait
> for active users, one of which is sleeping within rstat flush.
>
> So, let's simply remove rcu read here.
+1 to this. How about this version?
--
From a4a018d026c9a39ec15a5b30014e81ce2381e281 Mon Sep 17 00:00:00 2001
From: Roman Gushchin <roman.gushchin@linux.dev>
Date: Thu, 11 Jun 2026 17:40:21 +0000
Subject: [PATCH] mm: shrinkers: remove unnecessary rcu_read_lock()
Zenghui Yu reported a lockdep splat caused by sleeping within a
rcu_read_lock section in shrinker_debugfs_count_show():
BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
Call trace:
show_stack+0x18/0x24 (C)
dump_stack_lvl+0x78/0x90
dump_stack+0x18/0x24
__might_resched+0x114/0x170
__might_sleep+0x48/0x98
css_rstat_flush+0x54/0x564
mem_cgroup_flush_stats+0x9c/0xb0
zswap_shrinker_count+0xe4/0x1e4
shrinker_debugfs_count_show+0xd8/0x268
seq_read_iter+0x1b8/0x4ac
seq_read+0xe0/0x11c
full_proxy_read+0x6c/0xa8
vfs_read+0xc0/0x2fc
ksys_read+0x68/0xfc
__arm64_sys_read+0x1c/0x28
invoke_syscall+0x54/0x110
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x38/0x128
el0t_64_sync_handler+0xa0/0xe4
el0t_64_sync+0x198/0x19c
Fix it by removing the rcu_read_lock()/unlock() entirely.
Indeed: it's not needed here: memcg's are protected by
mem_cgroup_iter() and shrinker by being attached to the debugfs file.
Reported-by: Zenghui Yu <zenghui.yu@linux.dev>
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Change-Id: Ifa381f7983491cde61354af9b1cb14ca373c1f75
---
mm/shrinker_debug.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index affa64437302..cda4e86428c8 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -57,8 +57,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
if (!count_per_node)
return -ENOMEM;
- rcu_read_lock();
-
memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
memcg = mem_cgroup_iter(NULL, NULL, NULL);
@@ -88,8 +86,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
}
} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
- rcu_read_unlock();
-
kfree(count_per_node);
return ret;
}
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-11 18:34 ` Roman Gushchin
@ 2026-06-11 18:38 ` Shakeel Butt
2026-06-11 18:53 ` Roman Gushchin
0 siblings, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2026-06-11 18:38 UTC (permalink / raw)
To: Roman Gushchin
Cc: Nhat Pham, Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
qi.zheng
On Thu, Jun 11, 2026 at 06:34:15PM +0000, Roman Gushchin wrote:
> Shakeel Butt <shakeel.butt@linux.dev> writes:
>
> > On Wed, Jun 10, 2026 at 11:38:29AM -0700, Nhat Pham wrote:
> >> On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >> >
> >> > +Roman, Qi
> >> >
> >> > On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> >> > > On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
> >> > >
> >> > > Thanks for reporting, Zenghui.
> >> > >
> >> > >
> >> > > >
> >> > > > Hi all,
> >> > > >
> >> > > > The following splat was triggered on the mainline kernel:
> >> > > >
> >> > > > BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> >> > > > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> >> > > > preempt_count: 0, expected: 0
> >> > > > RCU nest depth: 1, expected: 0
> >> > > > CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> >> > > > Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> >> > > > Call trace:
> >> > > > show_stack+0x18/0x24 (C)
> >> > > > dump_stack_lvl+0x78/0x90
> >> > > > dump_stack+0x18/0x24
> >> > > > __might_resched+0x114/0x170
> >> > > > __might_sleep+0x48/0x98
> >> > > > css_rstat_flush+0x54/0x564
> >> > > > mem_cgroup_flush_stats+0x9c/0xb0
> >> > > > zswap_shrinker_count+0xe4/0x1e4
> >> > > > shrinker_debugfs_count_show+0xd8/0x268
> >> > >
> >> > > Ah, this seems a bit tricky.
> >> > >
> >> > > Seems like shrinker_debugfs_count_show() is invoking
> >> > > zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> >> > > triggers a stats flushing, which might sleep. Not ideal.
> >> > >
> >> > > Is the rcu_read_section() here to protect memcg or shrinker? For
> >> > > memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> >> > > memcg before returning.
> >> > >
> >> > > (memcg maintainers please fact check me).
> >> >
> >> > mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
> >> > read section for memcg.
> >> >
> >> > >
> >> > > If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
> >> > >
> >> > > rcu_read_lock();
> >> > > list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> >> > > {
> >> > > if (!shrinker_try_get(shrinker))
> >> > > continue;
> >> > > rcu_read_unlock();
> >> > > }
> >> > >
> >> > > But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
> >> >
> >> > Shouldn't the caller already holds the reference to the shrinker which it is
> >> > giving to this function? Does debugfs file entry holds a reference to the
> >> > shrinker which it is giving.
> >> >
> >> > After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
> >> > shrinker_free_rcu_cb), so this rcu read section is against that.
> >> >
> >> > I think we can simply use shrinker_try_get() here as Nhat said.
> >>
> >> Hmm, so is this unsafe even with the current rcu shennanigans? What's
> >> stopping shrinker to be freed by that callback before we enter
> >> rcu_read_section()?
> >>
> >> Seems like this is just implicitly correct - shrinker_debugfs_detach()
> >> and shrinker_debugfs_remove() happens before call_rcu(&shrinker->rcu,
> >> shrinker_free_rcu_cb);, so if you're reading this file, then it's
> >> before shrinker_free_rcu_cb() is even registered?
> >>
> >> Do we still need rcu or shrinker_try_get() here?
> >
> > I think you are right that we don't need rcu or shrinker_try_get() but it is
> > more about an active debugfs file reader. Suppose we are sleeping within rstat
> > flush from shrinker_debugfs_count_show() and there is a parallel
> > shrinker_debugfs_remove() call.
> >
> > shrinker_debugfs_remove calls debugfs_remove_recursive and deep in the stack
> > there is a call wait_for_completion(&fsd->active_users_drained) which will wait
> > for active users, one of which is sleeping within rstat flush.
> >
> > So, let's simply remove rcu read here.
>
> +1 to this. How about this version?
>
Thanks, Andrew already picked up
https://lore.kernel.org/all/20260610232048.62930-1-shakeel.butt@linux.dev/
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-11 18:38 ` Shakeel Butt
@ 2026-06-11 18:53 ` Roman Gushchin
0 siblings, 0 replies; 10+ messages in thread
From: Roman Gushchin @ 2026-06-11 18:53 UTC (permalink / raw)
To: Shakeel Butt
Cc: Nhat Pham, Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
qi.zheng
Shakeel Butt <shakeel.butt@linux.dev> writes:
> On Thu, Jun 11, 2026 at 06:34:15PM +0000, Roman Gushchin wrote:
>> Shakeel Butt <shakeel.butt@linux.dev> writes:
>>
>> > On Wed, Jun 10, 2026 at 11:38:29AM -0700, Nhat Pham wrote:
>> >> On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>> >> >
>> >> > +Roman, Qi
>> >> >
>> >> > On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
>> >> > > On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
>> >> > >
>> >> > > Thanks for reporting, Zenghui.
>> >> > >
>> >> > >
>> >> > > >
>> >> > > > Hi all,
>> >> > > >
>> >> > > > The following splat was triggered on the mainline kernel:
>> >> > > >
>> >> > > > BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
>> >> > > > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
>> >> > > > preempt_count: 0, expected: 0
>> >> > > > RCU nest depth: 1, expected: 0
>> >> > > > CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
>> >> > > > Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
>> >> > > > Call trace:
>> >> > > > show_stack+0x18/0x24 (C)
>> >> > > > dump_stack_lvl+0x78/0x90
>> >> > > > dump_stack+0x18/0x24
>> >> > > > __might_resched+0x114/0x170
>> >> > > > __might_sleep+0x48/0x98
>> >> > > > css_rstat_flush+0x54/0x564
>> >> > > > mem_cgroup_flush_stats+0x9c/0xb0
>> >> > > > zswap_shrinker_count+0xe4/0x1e4
>> >> > > > shrinker_debugfs_count_show+0xd8/0x268
>> >> > >
>> >> > > Ah, this seems a bit tricky.
>> >> > >
>> >> > > Seems like shrinker_debugfs_count_show() is invoking
>> >> > > zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
>> >> > > triggers a stats flushing, which might sleep. Not ideal.
>> >> > >
>> >> > > Is the rcu_read_section() here to protect memcg or shrinker? For
>> >> > > memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
>> >> > > memcg before returning.
>> >> > >
>> >> > > (memcg maintainers please fact check me).
>> >> >
>> >> > mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
>> >> > read section for memcg.
>> >> >
>> >> > >
>> >> > > If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
>> >> > >
>> >> > > rcu_read_lock();
>> >> > > list_for_each_entry_rcu(shrinker, &shrinker_list, list)
>> >> > > {
>> >> > > if (!shrinker_try_get(shrinker))
>> >> > > continue;
>> >> > > rcu_read_unlock();
>> >> > > }
>> >> > >
>> >> > > But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
>> >> >
>> >> > Shouldn't the caller already holds the reference to the shrinker which it is
>> >> > giving to this function? Does debugfs file entry holds a reference to the
>> >> > shrinker which it is giving.
>> >> >
>> >> > After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
>> >> > shrinker_free_rcu_cb), so this rcu read section is against that.
>> >> >
>> >> > I think we can simply use shrinker_try_get() here as Nhat said.
>> >>
>> >> Hmm, so is this unsafe even with the current rcu shennanigans? What's
>> >> stopping shrinker to be freed by that callback before we enter
>> >> rcu_read_section()?
>> >>
>> >> Seems like this is just implicitly correct - shrinker_debugfs_detach()
>> >> and shrinker_debugfs_remove() happens before call_rcu(&shrinker->rcu,
>> >> shrinker_free_rcu_cb);, so if you're reading this file, then it's
>> >> before shrinker_free_rcu_cb() is even registered?
>> >>
>> >> Do we still need rcu or shrinker_try_get() here?
>> >
>> > I think you are right that we don't need rcu or shrinker_try_get() but it is
>> > more about an active debugfs file reader. Suppose we are sleeping within rstat
>> > flush from shrinker_debugfs_count_show() and there is a parallel
>> > shrinker_debugfs_remove() call.
>> >
>> > shrinker_debugfs_remove calls debugfs_remove_recursive and deep in the stack
>> > there is a call wait_for_completion(&fsd->active_users_drained) which will wait
>> > for active users, one of which is sleeping within rstat flush.
>> >
>> > So, let's simply remove rcu read here.
>>
>> +1 to this. How about this version?
>>
>
> Thanks, Andrew already picked up
> https://lore.kernel.org/all/20260610232048.62930-1-shakeel.butt@linux.dev/
Awesome, thanks!
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-06-11 18:54 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-10 16:05 [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421 Zenghui Yu
2026-06-10 16:38 ` Nhat Pham
2026-06-10 16:47 ` Nhat Pham
2026-06-10 16:48 ` Nhat Pham
2026-06-10 17:31 ` Shakeel Butt
2026-06-10 18:38 ` Nhat Pham
2026-06-10 22:08 ` Shakeel Butt
2026-06-11 18:34 ` Roman Gushchin
2026-06-11 18:38 ` Shakeel Butt
2026-06-11 18:53 ` Roman Gushchin
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.