* [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
From: Zenghui Yu @ 2026-06-10 16:05 UTC
To: linux-mm; +Cc: hannes, yosry, nphamcs, chengming.zhou
Hi all,
The following splat was triggered on the mainline kernel:
BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
Call trace:
show_stack+0x18/0x24 (C)
dump_stack_lvl+0x78/0x90
dump_stack+0x18/0x24
__might_resched+0x114/0x170
__might_sleep+0x48/0x98
css_rstat_flush+0x54/0x564
mem_cgroup_flush_stats+0x9c/0xb0
zswap_shrinker_count+0xe4/0x1e4
shrinker_debugfs_count_show+0xd8/0x268
seq_read_iter+0x1b8/0x4ac
seq_read+0xe0/0x11c
full_proxy_read+0x6c/0xa8
vfs_read+0xc0/0x2fc
ksys_read+0x68/0xfc
__arm64_sys_read+0x1c/0x28
invoke_syscall+0x54/0x110
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x38/0x128
el0t_64_sync_handler+0xa0/0xe4
el0t_64_sync+0x198/0x19c
The kernel is built with arm64's virt.config plus
+CONFIG_DEBUG_ATOMIC_SLEEP=y
+CONFIG_SHRINKER_DEBUG=y
+CONFIG_ZSWAP=y
I can reproduce the issue with the following steps:
$ echo Y > /sys/module/zswap/parameters/enabled
$ echo Y > /sys/module/zswap/parameters/shrinker_enabled
$ cat /sys/kernel/debug/shrinker/mm-zswap-60/count
Please have a look.
Thanks,
Zenghui
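
For readers less familiar with this class of splat, the check that fired can be sketched in plain userspace C. This is a toy model and not kernel code: model_rcu_read_lock(), model_might_sleep(), and model_flush_stats() are hypothetical stand-ins. The only thing modeled is that a function annotated as sleepable complains when the RCU read-side nesting depth is nonzero, which is what the "RCU nest depth: 1, expected: 0" line in the report points at.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for the per-task RCU read-side nesting counter. */
static int model_rcu_depth;

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);	/* unbalanced unlock is a bug in the model */
	model_rcu_depth--;
}

/* Returns 0 if sleeping is allowed here, -1 (and prints a report) if not. */
static int model_might_sleep(const char *where)
{
	if (model_rcu_depth != 0) {
		printf("BUG: sleeping function called from invalid context at %s\n", where);
		printf("RCU nest depth: %d, expected: 0\n", model_rcu_depth);
		return -1;
	}
	return 0;
}

/* Hypothetical sleepable operation, standing in for a stats flush. */
static int model_flush_stats(void)
{
	return model_might_sleep("model_flush_stats");
}

/* Shape of the reported path: flush while inside the read-side section. */
static int model_count_under_rcu(void)
{
	int ret;

	model_rcu_read_lock();
	ret = model_flush_stats();
	model_rcu_read_unlock();
	return ret;
}

/* Same work with the read-side section removed. */
static int model_count_without_rcu(void)
{
	return model_flush_stats();
}
```

Calling model_count_under_rcu() prints the report and returns -1, while model_count_without_rcu() returns 0. The model says nothing about whether the read-side section was needed for object lifetime.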

* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
From: Nhat Pham @ 2026-06-10 16:38 UTC
To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt

On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:

Thanks for reporting, Zenghui.

> Hi all,
>
> The following splat was triggered on the mainline kernel:
>
> BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> preempt_count: 0, expected: 0
> RCU nest depth: 1, expected: 0
> CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> Call trace:
> show_stack+0x18/0x24 (C)
> dump_stack_lvl+0x78/0x90
> dump_stack+0x18/0x24
> __might_resched+0x114/0x170
> __might_sleep+0x48/0x98
> css_rstat_flush+0x54/0x564
> mem_cgroup_flush_stats+0x9c/0xb0
> zswap_shrinker_count+0xe4/0x1e4
> shrinker_debugfs_count_show+0xd8/0x268

Ah, this seems a bit tricky.

Seems like shrinker_debugfs_count_show() is invoking
zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
triggers a stats flushing, which might sleep. Not ideal.

Is the rcu_read_section() here to protect memcg or shrinker? For
memcg, I don't think it's necessary, no? mem_cgroup_iter() pins the
memcg before returning.

(memcg maintainers please fact check me).

If this is for the shrinker, I think this needs to follow shrink_slab()'s pattern:

    rcu_read_lock();
    list_for_each_entry_rcu(shrinker, &shrinker_list, list)
    {
        if (!shrinker_try_get(shrinker))
            continue;
        rcu_read_unlock();
    }

But OTOH, doesn't seem like rcu_read_section() is what is keeping it safe:

    rcu_read_lock();
    memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;

We get the shrinker reference outside of the rcu_read_section(), and
just dereference it without any checking inside of the section.

I think we can just remove the rcu_read_(un)lock() here?

Long term, I still think we'd be better off getting rid of this stats
flushing. Seems expensive either way.
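
The "take a reference under the read-side section, then leave the section before doing anything that may sleep" idea from the message above can be sketched as a single-threaded userspace model. Everything here is hypothetical and for illustration only: struct model_shrinker, model_try_get(), model_put(), and the plain int counters are not the kernel's types or helpers, and a real implementation would use atomic reference counts and has more steps in its list walk than this sketch shows.

```c
#include <assert.h>
#include <stdbool.h>

struct model_shrinker {
	int refcount;	/* 0 means the object is already being torn down */
};

/* Toy stand-in for the RCU read-side nesting depth. */
static int model_rcu_depth;

/* Pin the object unless it is already going away. */
static bool model_try_get(struct model_shrinker *s)
{
	if (s->refcount == 0)
		return false;
	s->refcount++;
	return true;
}

static void model_put(struct model_shrinker *s)
{
	assert(s->refcount > 0);
	s->refcount--;
}

/* Sleepable work: returns -1 if called inside the read-side section. */
static int model_sleepable_count(const struct model_shrinker *s)
{
	(void)s;
	return model_rcu_depth == 0 ? 0 : -1;
}

/* Returns 0 on success, 1 if the object was skipped, -1 on a bad context. */
static int model_count_one(struct model_shrinker *s)
{
	int ret;

	model_rcu_depth++;		/* models rcu_read_lock() */
	if (!model_try_get(s)) {
		model_rcu_depth--;	/* models rcu_read_unlock() */
		return 1;		/* skipped: object is going away */
	}
	model_rcu_depth--;		/* pinned, so leave the section first */

	ret = model_sleepable_count(s);
	model_put(s);
	return ret;
}
```

The point of the ordering is that the reference, not the read-side section, keeps the object alive while the sleepable work runs.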

* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
From: Nhat Pham @ 2026-06-10 16:47 UTC
To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt

On Wed, Jun 10, 2026 at 9:38 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> But OTOH, doesn't seem like rcu_read_section() is what is keeping it safe:
>
>     rcu_read_lock();
>     memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
>
> We get the shrinker reference outside of the rcu_read_section(), and
> just dereference it without any checking inside of the section.
>
> I think we can just remove the rcu_read_(un)lock() here?
>

Also, looking at the code a bit closer - if (!(shrinker->flags &
SHRINKER_MEMCG_AWARE)), we shouldn't be getting into this loop at all
and inducing all the memcg-related overhead...

The code really should be structured as:

    if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
        total = shrinker_count_objects(shrinker, NULL, count_per_node);
        if (total)
            ...
    } else {
        memcg = mem_cgroup_iter(NULL, NULL, NULL);
        do {
        } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
    }
    ...

* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
From: Nhat Pham @ 2026-06-10 16:48 UTC
To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt

On Wed, Jun 10, 2026 at 9:47 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> Also, looking at the code a bit closer - if (!(shrinker->flags &
> SHRINKER_MEMCG_AWARE)), we shouldn't be getting into this loop at all
> and inducing all the memcg-related overhead...
>
> The code really should be structured as:
>
>     if (shrinker->flags & SHRINKER_MEMCG_AWARE) {

As usual, I flip the conditionals :( But you get the idea...
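
Since the two messages above turn on a negated flag test, here is a small sketch of how such a test behaves in C depending on parenthesization. The names and the bit value are hypothetical: MODEL_MEMCG_AWARE is an illustrative mask, not the kernel's definition, and the result of the first helper depends on the mask not being bit 0.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MEMCG_AWARE (1u << 2)	/* hypothetical bit value, for illustration only */

/*
 * How "!flags & MASK" is parsed: "!" binds tighter than "&", so the
 * expression is (!flags) & MASK. Since !flags is 0 or 1 and the mask
 * here is not bit 0, the result is always false.
 */
static bool model_unaware_as_parsed(unsigned int flags)
{
	return ((!flags) & MODEL_MEMCG_AWARE) != 0;
}

/* Intended test: true when the flag bit is not set. */
static bool model_unaware(unsigned int flags)
{
	return !(flags & MODEL_MEMCG_AWARE);
}
```

With these definitions, model_unaware(0) is true and model_unaware(MODEL_MEMCG_AWARE) is false, while model_unaware_as_parsed() is false for both inputs.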

* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
From: Shakeel Butt @ 2026-06-10 17:31 UTC
To: Nhat Pham
Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou, roman.gushchin, qi.zheng

+Roman, Qi

On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> Ah, this seems a bit tricky.
>
> Seems like shrinker_debugfs_count_show() is invoking
> zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> triggers a stats flushing, which might sleep. Not ideal.
>
> Is the rcu_read_section() here to protect memcg or shrinker? For
> memcg, I don't think it's necessary, no? mem_cgroup_iter() pins the
> memcg before returning.
>
> (memcg maintainers please fact check me).

mem_cgroup_iter() handles the lifetime of memcg, so there is no need for an rcu
read section for memcg.

> If this is for the shrinker, I think this needs to follow shrink_slab()'s pattern:
>
>     rcu_read_lock();
>     list_for_each_entry_rcu(shrinker, &shrinker_list, list)
>     {
>         if (!shrinker_try_get(shrinker))
>             continue;
>         rcu_read_unlock();
>     }
>
> But OTOH, doesn't seem like rcu_read_section() is what is keeping it safe:

Shouldn't the caller already hold a reference to the shrinker which it is
giving to this function? Does the debugfs file entry hold a reference to the
shrinker which it is giving?

After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
shrinker_free_rcu_cb), so this rcu read section is against that.

I think we can simply use shrinker_try_get() here as Nhat said.

>     rcu_read_lock();
>     memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
>
> We get the shrinker reference outside of the rcu_read_section(), and
> just dereference it without any checking inside of the section.
>
> I think we can just remove the rcu_read_(un)lock() here?
>
> Long term, I still think we'd be better off getting rid of this stats
> flushing. Seems expensive either way.

* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
From: Nhat Pham @ 2026-06-10 18:38 UTC
To: Shakeel Butt
Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou, roman.gushchin, qi.zheng

On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> mem_cgroup_iter() handles the lifetime of memcg, so there is no need for an rcu
> read section for memcg.
>
> Shouldn't the caller already hold a reference to the shrinker which it is
> giving to this function? Does the debugfs file entry hold a reference to the
> shrinker which it is giving?
>
> After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
> shrinker_free_rcu_cb), so this rcu read section is against that.
>
> I think we can simply use shrinker_try_get() here as Nhat said.

Hmm, so is this unsafe even with the current rcu shenanigans? What's
stopping the shrinker from being freed by that callback before we enter
rcu_read_section()?

Seems like this is just implicitly correct - shrinker_debugfs_detach()
and shrinker_debugfs_remove() happen before call_rcu(&shrinker->rcu,
shrinker_free_rcu_cb);, so if you're reading this file, then it's
before shrinker_free_rcu_cb() is even registered?

Do we still need rcu or shrinker_try_get() here?

* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
From: Shakeel Butt @ 2026-06-10 22:08 UTC
To: Nhat Pham
Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou, roman.gushchin, qi.zheng

On Wed, Jun 10, 2026 at 11:38:29AM -0700, Nhat Pham wrote:
> Hmm, so is this unsafe even with the current rcu shenanigans? What's
> stopping the shrinker from being freed by that callback before we enter
> rcu_read_section()?
>
> Seems like this is just implicitly correct - shrinker_debugfs_detach()
> and shrinker_debugfs_remove() happen before call_rcu(&shrinker->rcu,
> shrinker_free_rcu_cb);, so if you're reading this file, then it's
> before shrinker_free_rcu_cb() is even registered?
>
> Do we still need rcu or shrinker_try_get() here?

I think you are right that we don't need rcu or shrinker_try_get(), but it is
more about an active debugfs file reader. Suppose we are sleeping within rstat
flush from shrinker_debugfs_count_show() and there is a parallel
shrinker_debugfs_remove() call.

shrinker_debugfs_remove() calls debugfs_remove_recursive(), and deep in the stack
there is a call to wait_for_completion(&fsd->active_users_drained) which will wait
for active users, one of which is sleeping within rstat flush.

So, let's simply remove rcu read here.
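
The argument above is that removal of the debugfs file waits for readers that are still inside a file operation, so a reader sleeping in the flush keeps the shrinker alive without any read-side section. A minimal single-threaded sketch of that "drain active users before finishing removal" idea follows. The names struct model_file, model_file_get(), model_file_put(), and model_file_remove_done() are hypothetical, and polling a counter stands in for the blocking wait mentioned in the message; this is not how debugfs is implemented.

```c
#include <assert.h>
#include <stdbool.h>

struct model_file {
	int active_users;	/* readers currently inside a file operation */
	bool removed;		/* set once removal has started */
};

/* A reader enters a file operation; this fails once removal has begun. */
static bool model_file_get(struct model_file *f)
{
	if (f->removed)
		return false;
	f->active_users++;
	return true;
}

/* A reader leaves the file operation. */
static void model_file_put(struct model_file *f)
{
	assert(f->active_users > 0);
	f->active_users--;
}

/*
 * Start (or continue) removal: new readers are refused from now on, and
 * removal is only complete once every active reader has left.
 */
static bool model_file_remove_done(struct model_file *f)
{
	f->removed = true;
	return f->active_users == 0;
}
```

In this model, teardown that runs only after model_file_remove_done() returns true cannot race with a reader that entered before removal started.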

* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
From: Roman Gushchin @ 2026-06-11 18:34 UTC
To: Shakeel Butt
Cc: Nhat Pham, Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou, qi.zheng

Shakeel Butt <shakeel.butt@linux.dev> writes:

> I think you are right that we don't need rcu or shrinker_try_get(), but it is
> more about an active debugfs file reader. Suppose we are sleeping within rstat
> flush from shrinker_debugfs_count_show() and there is a parallel
> shrinker_debugfs_remove() call.
>
> shrinker_debugfs_remove() calls debugfs_remove_recursive(), and deep in the stack
> there is a call to wait_for_completion(&fsd->active_users_drained) which will wait
> for active users, one of which is sleeping within rstat flush.
>
> So, let's simply remove rcu read here.

+1 to this. How about this version?

--

From a4a018d026c9a39ec15a5b30014e81ce2381e281 Mon Sep 17 00:00:00 2001
From: Roman Gushchin <roman.gushchin@linux.dev>
Date: Thu, 11 Jun 2026 17:40:21 +0000
Subject: [PATCH] mm: shrinkers: remove unnecessary rcu_read_lock()

Zenghui Yu reported a lockdep splat caused by sleeping within a
rcu_read_lock section in shrinker_debugfs_count_show():

BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
Call trace:
show_stack+0x18/0x24 (C)
dump_stack_lvl+0x78/0x90
dump_stack+0x18/0x24
__might_resched+0x114/0x170
__might_sleep+0x48/0x98
css_rstat_flush+0x54/0x564
mem_cgroup_flush_stats+0x9c/0xb0
zswap_shrinker_count+0xe4/0x1e4
shrinker_debugfs_count_show+0xd8/0x268
seq_read_iter+0x1b8/0x4ac
seq_read+0xe0/0x11c
full_proxy_read+0x6c/0xa8
vfs_read+0xc0/0x2fc
ksys_read+0x68/0xfc
__arm64_sys_read+0x1c/0x28
invoke_syscall+0x54/0x110
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x38/0x128
el0t_64_sync_handler+0xa0/0xe4
el0t_64_sync+0x198/0x19c

Fix it by removing the rcu_read_lock()/unlock() entirely. Indeed: it's not
needed here: memcg's are protected by mem_cgroup_iter() and shrinker by
being attached to the debugfs file.

Reported-by: Zenghui Yu <zenghui.yu@linux.dev>
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Change-Id: Ifa381f7983491cde61354af9b1cb14ca373c1f75
---
 mm/shrinker_debug.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index affa64437302..cda4e86428c8 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -57,8 +57,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
 	if (!count_per_node)
 		return -ENOMEM;
 
-	rcu_read_lock();
-
 	memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
 
 	memcg = mem_cgroup_iter(NULL, NULL, NULL);
@@ -88,8 +86,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
 		}
 	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
 
-	rcu_read_unlock();
-
 	kfree(count_per_node);
 	return ret;
 }
-- 
2.54.0.1136.gdb2ca164c4-goog

* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
From: Shakeel Butt @ 2026-06-11 18:38 UTC
To: Roman Gushchin
Cc: Nhat Pham, Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou, qi.zheng

On Thu, Jun 11, 2026 at 06:34:15PM +0000, Roman Gushchin wrote:
> Shakeel Butt <shakeel.butt@linux.dev> writes:
>
> > So, let's simply remove rcu read here.
>
> +1 to this. How about this version?
>

Thanks, Andrew already picked up
https://lore.kernel.org/all/20260610232048.62930-1-shakeel.butt@linux.dev/

* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
From: Roman Gushchin @ 2026-06-11 18:53 UTC
To: Shakeel Butt
Cc: Nhat Pham, Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou, qi.zheng

Shakeel Butt <shakeel.butt@linux.dev> writes:

> On Thu, Jun 11, 2026 at 06:34:15PM +0000, Roman Gushchin wrote:
>>
>> +1 to this. How about this version?
>>
>
> Thanks, Andrew already picked up
> https://lore.kernel.org/all/20260610232048.62930-1-shakeel.butt@linux.dev/

Awesome, thanks!