* [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
@ 2026-06-10 16:05 Zenghui Yu
2026-06-10 16:38 ` Nhat Pham
0 siblings, 1 reply; 7+ messages in thread
From: Zenghui Yu @ 2026-06-10 16:05 UTC (permalink / raw)
To: linux-mm; +Cc: hannes, yosry, nphamcs, chengming.zhou
Hi all,
The following splat was triggered on the mainline kernel:
BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
Call trace:
show_stack+0x18/0x24 (C)
dump_stack_lvl+0x78/0x90
dump_stack+0x18/0x24
__might_resched+0x114/0x170
__might_sleep+0x48/0x98
css_rstat_flush+0x54/0x564
mem_cgroup_flush_stats+0x9c/0xb0
zswap_shrinker_count+0xe4/0x1e4
shrinker_debugfs_count_show+0xd8/0x268
seq_read_iter+0x1b8/0x4ac
seq_read+0xe0/0x11c
full_proxy_read+0x6c/0xa8
vfs_read+0xc0/0x2fc
ksys_read+0x68/0xfc
__arm64_sys_read+0x1c/0x28
invoke_syscall+0x54/0x110
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x38/0x128
el0t_64_sync_handler+0xa0/0xe4
el0t_64_sync+0x198/0x19c
The kernel is built with arm64's virt.config plus
+CONFIG_DEBUG_ATOMIC_SLEEP=y
+CONFIG_SHRINKER_DEBUG=y
+CONFIG_ZSWAP=y
I can reproduce the issue with the following steps:
$ echo Y > /sys/module/zswap/parameters/enabled
$ echo Y > /sys/module/zswap/parameters/shrinker_enabled
$ cat /sys/kernel/debug/shrinker/mm-zswap-60/count
Please have a look.
Thanks,
Zenghui
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-10 16:05 [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421 Zenghui Yu
@ 2026-06-10 16:38 ` Nhat Pham
2026-06-10 16:47 ` Nhat Pham
2026-06-10 17:31 ` Shakeel Butt
0 siblings, 2 replies; 7+ messages in thread
From: Nhat Pham @ 2026-06-10 16:38 UTC (permalink / raw)
To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt
On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
Thanks for reporting, Zenghui.
>
> Hi all,
>
> The following splat was triggered on the mainline kernel:
>
> BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> preempt_count: 0, expected: 0
> RCU nest depth: 1, expected: 0
> CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> Call trace:
> show_stack+0x18/0x24 (C)
> dump_stack_lvl+0x78/0x90
> dump_stack+0x18/0x24
> __might_resched+0x114/0x170
> __might_sleep+0x48/0x98
> css_rstat_flush+0x54/0x564
> mem_cgroup_flush_stats+0x9c/0xb0
> zswap_shrinker_count+0xe4/0x1e4
> shrinker_debugfs_count_show+0xd8/0x268
Ah, this seems a bit tricky.
Seems like shrinker_debugfs_count_show() is invoking
zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
triggers a stats flushing, which might sleep. Not ideal.
Is the rcu_read_section() here to protect memcg or shrinker? For
memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
memcg before returning.
(memcg maintainers please fact check me).
If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
rcu_read_lock();
list_for_each_entry_rcu(shrinker, &shrinker_list, list)
{
if (!shrinker_try_get(shrinker))
continue;
rcu_read_unlock();
}
But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
rcu_read_lock();
memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
We get the shrinker reference outside of the rcu_read_section(), and
just dereference it without any checking inside of the section.
I think we can just remove the rcu_read_(un)lock() here?
Long term, I still think we'd be better off getting rid of this stats
flushing. Seems expensive either way.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-10 16:38 ` Nhat Pham
@ 2026-06-10 16:47 ` Nhat Pham
2026-06-10 16:48 ` Nhat Pham
2026-06-10 17:31 ` Shakeel Butt
1 sibling, 1 reply; 7+ messages in thread
From: Nhat Pham @ 2026-06-10 16:47 UTC (permalink / raw)
To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt
On Wed, Jun 10, 2026 at 9:38 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
>
> Thanks for reporting, Zenghui.
>
>
> >
> > Hi all,
> >
> > The following splat was triggered on the mainline kernel:
> >
> > BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> > preempt_count: 0, expected: 0
> > RCU nest depth: 1, expected: 0
> > CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> > Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > Call trace:
> > show_stack+0x18/0x24 (C)
> > dump_stack_lvl+0x78/0x90
> > dump_stack+0x18/0x24
> > __might_resched+0x114/0x170
> > __might_sleep+0x48/0x98
> > css_rstat_flush+0x54/0x564
> > mem_cgroup_flush_stats+0x9c/0xb0
> > zswap_shrinker_count+0xe4/0x1e4
> > shrinker_debugfs_count_show+0xd8/0x268
>
> Ah, this seems a bit tricky.
>
> Seems like shrinker_debugfs_count_show() is invoking
> zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> triggers a stats flushing, which might sleep. Not ideal.
>
> Is the rcu_read_section() here to protect memcg or shrinker? For
> memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> memcg before returning.
>
> (memcg maintainers please fact check me).
>
> If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
>
> rcu_read_lock();
> list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> {
> if (!shrinker_try_get(shrinker))
> continue;
> rcu_read_unlock();
> }
>
> But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
>
> rcu_read_lock();
> memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
>
> We get the shrinker reference outside of the rcu_read_section(), and
> just dereference it without any checking inside of the section.
>
> I think we can just remove the rcu_read_(un)lock() here?
>
Also, looking at the code a bit closer - if (!shrinker->flags &
SHRINKER_MEMCG_AWARE), we shouldn't be getting into this loop at all
and inducing all the memcg-related overhead at all...
The code really should be structure as:
if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
total = shrinker_count_objects(shrinker, NULL, count_per_node);
if (total)
...
} else {
memcg = mem_cgroup_iter(NULL, NULL, NULL);
do {
} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
}
...
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-10 16:47 ` Nhat Pham
@ 2026-06-10 16:48 ` Nhat Pham
0 siblings, 0 replies; 7+ messages in thread
From: Nhat Pham @ 2026-06-10 16:48 UTC (permalink / raw)
To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt
On Wed, Jun 10, 2026 at 9:47 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
>
> Also, looking at the code a bit closer - if (!shrinker->flags &
> SHRINKER_MEMCG_AWARE), we shouldn't be getting into this loop at all
> and inducing all the memcg-related overhead at all...
>
> The code really should be structure as:
>
> if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
As usual, i flip the conditionals :( But you get the idea...
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-10 16:38 ` Nhat Pham
2026-06-10 16:47 ` Nhat Pham
@ 2026-06-10 17:31 ` Shakeel Butt
2026-06-10 18:38 ` Nhat Pham
1 sibling, 1 reply; 7+ messages in thread
From: Shakeel Butt @ 2026-06-10 17:31 UTC (permalink / raw)
To: Nhat Pham
Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
roman.gushchin, qi.zheng
+Roman, Qi
On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
>
> Thanks for reporting, Zenghui.
>
>
> >
> > Hi all,
> >
> > The following splat was triggered on the mainline kernel:
> >
> > BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> > preempt_count: 0, expected: 0
> > RCU nest depth: 1, expected: 0
> > CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> > Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > Call trace:
> > show_stack+0x18/0x24 (C)
> > dump_stack_lvl+0x78/0x90
> > dump_stack+0x18/0x24
> > __might_resched+0x114/0x170
> > __might_sleep+0x48/0x98
> > css_rstat_flush+0x54/0x564
> > mem_cgroup_flush_stats+0x9c/0xb0
> > zswap_shrinker_count+0xe4/0x1e4
> > shrinker_debugfs_count_show+0xd8/0x268
>
> Ah, this seems a bit tricky.
>
> Seems like shrinker_debugfs_count_show() is invoking
> zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> triggers a stats flushing, which might sleep. Not ideal.
>
> Is the rcu_read_section() here to protect memcg or shrinker? For
> memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> memcg before returning.
>
> (memcg maintainers please fact check me).
mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
read section for memcg.
>
> If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
>
> rcu_read_lock();
> list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> {
> if (!shrinker_try_get(shrinker))
> continue;
> rcu_read_unlock();
> }
>
> But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
Shouldn't the caller already holds the reference to the shrinker which it is
giving to this function? Does debugfs file entry holds a reference to the
shrinker which it is giving.
After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
shrinker_free_rcu_cb), so this rcu read section is against that.
I think we can simply use shrinker_try_get() here as Nhat said.
>
> rcu_read_lock();
> memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
>
> We get the shrinker reference outside of the rcu_read_section(), and
> just dereference it without any checking inside of the section.
>
> I think we can just remove the rcu_read_(un)lock() here?
>
> Long term, I still think we'd be better off getting rid of this stats
> flushing. Seems expensive either way.
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-10 17:31 ` Shakeel Butt
@ 2026-06-10 18:38 ` Nhat Pham
2026-06-10 22:08 ` Shakeel Butt
0 siblings, 1 reply; 7+ messages in thread
From: Nhat Pham @ 2026-06-10 18:38 UTC (permalink / raw)
To: Shakeel Butt
Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
roman.gushchin, qi.zheng
On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> +Roman, Qi
>
> On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> > On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
> >
> > Thanks for reporting, Zenghui.
> >
> >
> > >
> > > Hi all,
> > >
> > > The following splat was triggered on the mainline kernel:
> > >
> > > BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> > > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> > > preempt_count: 0, expected: 0
> > > RCU nest depth: 1, expected: 0
> > > CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> > > Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > > Call trace:
> > > show_stack+0x18/0x24 (C)
> > > dump_stack_lvl+0x78/0x90
> > > dump_stack+0x18/0x24
> > > __might_resched+0x114/0x170
> > > __might_sleep+0x48/0x98
> > > css_rstat_flush+0x54/0x564
> > > mem_cgroup_flush_stats+0x9c/0xb0
> > > zswap_shrinker_count+0xe4/0x1e4
> > > shrinker_debugfs_count_show+0xd8/0x268
> >
> > Ah, this seems a bit tricky.
> >
> > Seems like shrinker_debugfs_count_show() is invoking
> > zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> > triggers a stats flushing, which might sleep. Not ideal.
> >
> > Is the rcu_read_section() here to protect memcg or shrinker? For
> > memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> > memcg before returning.
> >
> > (memcg maintainers please fact check me).
>
> mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
> read section for memcg.
>
> >
> > If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
> >
> > rcu_read_lock();
> > list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> > {
> > if (!shrinker_try_get(shrinker))
> > continue;
> > rcu_read_unlock();
> > }
> >
> > But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
>
> Shouldn't the caller already holds the reference to the shrinker which it is
> giving to this function? Does debugfs file entry holds a reference to the
> shrinker which it is giving.
>
> After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
> shrinker_free_rcu_cb), so this rcu read section is against that.
>
> I think we can simply use shrinker_try_get() here as Nhat said.
Hmm, so is this unsafe even with the current rcu shennanigans? What's
stopping shrinker to be freed by that callback before we enter
rcu_read_section()?
Seems like this is just implicitly correct - shrinker_debugfs_detach()
and shrinker_debugfs_remove() happens before call_rcu(&shrinker->rcu,
shrinker_free_rcu_cb);, so if you're reading this file, then it's
before shrinker_free_rcu_cb() is even registered?
Do we still need rcu or shrinker_try_get() here?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
2026-06-10 18:38 ` Nhat Pham
@ 2026-06-10 22:08 ` Shakeel Butt
0 siblings, 0 replies; 7+ messages in thread
From: Shakeel Butt @ 2026-06-10 22:08 UTC (permalink / raw)
To: Nhat Pham
Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
roman.gushchin, qi.zheng
On Wed, Jun 10, 2026 at 11:38:29AM -0700, Nhat Pham wrote:
> On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > +Roman, Qi
> >
> > On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> > > On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
> > >
> > > Thanks for reporting, Zenghui.
> > >
> > >
> > > >
> > > > Hi all,
> > > >
> > > > The following splat was triggered on the mainline kernel:
> > > >
> > > > BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> > > > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> > > > preempt_count: 0, expected: 0
> > > > RCU nest depth: 1, expected: 0
> > > > CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> > > > Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > > > Call trace:
> > > > show_stack+0x18/0x24 (C)
> > > > dump_stack_lvl+0x78/0x90
> > > > dump_stack+0x18/0x24
> > > > __might_resched+0x114/0x170
> > > > __might_sleep+0x48/0x98
> > > > css_rstat_flush+0x54/0x564
> > > > mem_cgroup_flush_stats+0x9c/0xb0
> > > > zswap_shrinker_count+0xe4/0x1e4
> > > > shrinker_debugfs_count_show+0xd8/0x268
> > >
> > > Ah, this seems a bit tricky.
> > >
> > > Seems like shrinker_debugfs_count_show() is invoking
> > > zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> > > triggers a stats flushing, which might sleep. Not ideal.
> > >
> > > Is the rcu_read_section() here to protect memcg or shrinker? For
> > > memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> > > memcg before returning.
> > >
> > > (memcg maintainers please fact check me).
> >
> > mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
> > read section for memcg.
> >
> > >
> > > If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
> > >
> > > rcu_read_lock();
> > > list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> > > {
> > > if (!shrinker_try_get(shrinker))
> > > continue;
> > > rcu_read_unlock();
> > > }
> > >
> > > But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
> >
> > Shouldn't the caller already holds the reference to the shrinker which it is
> > giving to this function? Does debugfs file entry holds a reference to the
> > shrinker which it is giving.
> >
> > After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
> > shrinker_free_rcu_cb), so this rcu read section is against that.
> >
> > I think we can simply use shrinker_try_get() here as Nhat said.
>
> Hmm, so is this unsafe even with the current rcu shennanigans? What's
> stopping shrinker to be freed by that callback before we enter
> rcu_read_section()?
>
> Seems like this is just implicitly correct - shrinker_debugfs_detach()
> and shrinker_debugfs_remove() happens before call_rcu(&shrinker->rcu,
> shrinker_free_rcu_cb);, so if you're reading this file, then it's
> before shrinker_free_rcu_cb() is even registered?
>
> Do we still need rcu or shrinker_try_get() here?
I think you are right that we don't need rcu or shrinker_try_get() but it is
more about an active debugfs file reader. Suppose we are sleeping within rstat
flush from shrinker_debugfs_count_show() and there is a parallel
shrinker_debugfs_remove() call.
shrinker_debugfs_remove calls debugfs_remove_recursive and deep in the stack
there is a call wait_for_completion(&fsd->active_users_drained) which will wait
for active users, one of which is sleeping within rstat flush.
So, let's simply remove rcu read here.
>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-06-10 22:08 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-10 16:05 [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421 Zenghui Yu
2026-06-10 16:38 ` Nhat Pham
2026-06-10 16:47 ` Nhat Pham
2026-06-10 16:48 ` Nhat Pham
2026-06-10 17:31 ` Shakeel Butt
2026-06-10 18:38 ` Nhat Pham
2026-06-10 22:08 ` Shakeel Butt
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.