[zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421

* [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
@ 2026-06-10 16:05 Zenghui Yu
  2026-06-10 16:38 ` Nhat Pham
  0 siblings, 1 reply; 10+ messages in thread
From: Zenghui Yu @ 2026-06-10 16:05 UTC (permalink / raw)
  To: linux-mm; +Cc: hannes, yosry, nphamcs, chengming.zhou

Hi all,

The following splat was triggered on the mainline kernel:

 BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
 preempt_count: 0, expected: 0
 RCU nest depth: 1, expected: 0
 CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT 
 Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
 Call trace:
  show_stack+0x18/0x24 (C)
  dump_stack_lvl+0x78/0x90
  dump_stack+0x18/0x24
  __might_resched+0x114/0x170
  __might_sleep+0x48/0x98
  css_rstat_flush+0x54/0x564
  mem_cgroup_flush_stats+0x9c/0xb0
  zswap_shrinker_count+0xe4/0x1e4
  shrinker_debugfs_count_show+0xd8/0x268
  seq_read_iter+0x1b8/0x4ac
  seq_read+0xe0/0x11c
  full_proxy_read+0x6c/0xa8
  vfs_read+0xc0/0x2fc
  ksys_read+0x68/0xfc
  __arm64_sys_read+0x1c/0x28
  invoke_syscall+0x54/0x110
  el0_svc_common.constprop.0+0x40/0xe0
  do_el0_svc+0x1c/0x28
  el0_svc+0x38/0x128
  el0t_64_sync_handler+0xa0/0xe4
  el0t_64_sync+0x198/0x19c

The kernel is built with arm64's virt.config plus

+CONFIG_DEBUG_ATOMIC_SLEEP=y
+CONFIG_SHRINKER_DEBUG=y
+CONFIG_ZSWAP=y

I can reproduce the issue with the following steps:

    $ echo Y > /sys/module/zswap/parameters/enabled
    $ echo Y > /sys/module/zswap/parameters/shrinker_enabled
    $ cat /sys/kernel/debug/shrinker/mm-zswap-60/count

Please have a look.

Thanks,
Zenghui


* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-10 16:05 [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421 Zenghui Yu
@ 2026-06-10 16:38 ` Nhat Pham
  2026-06-10 16:47   ` Nhat Pham
  2026-06-10 17:31   ` Shakeel Butt
  0 siblings, 2 replies; 10+ messages in thread
From: Nhat Pham @ 2026-06-10 16:38 UTC (permalink / raw)
  To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt

On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:

Thanks for reporting, Zenghui.

>
> Hi all,
>
> The following splat was triggered on the mainline kernel:
>
>  BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
>  in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
>  preempt_count: 0, expected: 0
>  RCU nest depth: 1, expected: 0
>  CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
>  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
>  Call trace:
>   show_stack+0x18/0x24 (C)
>   dump_stack_lvl+0x78/0x90
>   dump_stack+0x18/0x24
>   __might_resched+0x114/0x170
>   __might_sleep+0x48/0x98
>   css_rstat_flush+0x54/0x564
>   mem_cgroup_flush_stats+0x9c/0xb0
>   zswap_shrinker_count+0xe4/0x1e4
>   shrinker_debugfs_count_show+0xd8/0x268

Ah, this seems a bit tricky.

Seems like shrinker_debugfs_count_show() is invoking
zswap_shrinker_count() inside an RCU read-side critical section.
zswap_shrinker_count() triggers a stats flush, which might sleep. Not
ideal.

Is the RCU read-side critical section here to protect the memcg or the
shrinker? For the memcg, I don't think it's necessary, no?
mem_cgroup_iter() pins the memcg before returning.

(memcg maintainers please fact check me).

If this is for the shrinker, I think this needs to follow shrink_slab()'s
pattern:

rcu_read_lock();
list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
    if (!shrinker_try_get(shrinker))
        continue;
    rcu_read_unlock();

    /* ... work that may sleep ... */

    rcu_read_lock();
    shrinker_put(shrinker);
}
rcu_read_unlock();

But OTOH, it doesn't seem like the RCU read-side critical section is
what's keeping it safe:

rcu_read_lock();
memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;

We get the shrinker reference outside of the RCU read-side critical
section, and just dereference it inside the section without any checking.

I think we can just remove the rcu_read_(un)lock() here?

Long term, I still think we'd be better off getting rid of this stats
flushing. Seems expensive either way.

* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-10 16:38 ` Nhat Pham
@ 2026-06-10 16:47   ` Nhat Pham
  2026-06-10 16:48     ` Nhat Pham
  2026-06-10 17:31   ` Shakeel Butt
  1 sibling, 1 reply; 10+ messages in thread
From: Nhat Pham @ 2026-06-10 16:47 UTC (permalink / raw)
  To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt

On Wed, Jun 10, 2026 at 9:38 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
>
> Thanks for reporting, Zenghui.
>
>
> >
> > Hi all,
> >
> > The following splat was triggered on the mainline kernel:
> >
> >  BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> >  in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> >  preempt_count: 0, expected: 0
> >  RCU nest depth: 1, expected: 0
> >  CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> >  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> >  Call trace:
> >   show_stack+0x18/0x24 (C)
> >   dump_stack_lvl+0x78/0x90
> >   dump_stack+0x18/0x24
> >   __might_resched+0x114/0x170
> >   __might_sleep+0x48/0x98
> >   css_rstat_flush+0x54/0x564
> >   mem_cgroup_flush_stats+0x9c/0xb0
> >   zswap_shrinker_count+0xe4/0x1e4
> >   shrinker_debugfs_count_show+0xd8/0x268
>
> Ah, this seems a bit tricky.
>
> Seems like shrinker_debugfs_count_show() is invoking
> zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> triggers a stats flushing, which might sleep. Not ideal.
>
> Is the rcu_read_section() here to protect memcg or shrinker? For
> memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> memcg before returning.
>
> (memcg maintainers please fact check me).
>
> If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
>
> rcu_read_lock();
> list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> {
>     if (!shrinker_try_get(shrinker))
>         continue;
>     rcu_read_unlock();
> }
>
> But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
>
> rcu_read_lock();
> memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
>
> We get the shrinker reference outside of the rcu_read_section(), and
> just dereference it without any checking inside of the section.
>
> I think we can just remove the rcu_read_(un)lock() here?
>

Also, looking at the code a bit closer - if !(shrinker->flags &
SHRINKER_MEMCG_AWARE), we shouldn't be getting into this loop and
incurring all the memcg-related overhead at all...

The code really should be structured as:

if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
    total = shrinker_count_objects(shrinker, NULL, count_per_node);
    if (total)
      ...
} else {
   memcg = mem_cgroup_iter(NULL, NULL, NULL);
   do {
   } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
}
...


* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-10 16:47   ` Nhat Pham
@ 2026-06-10 16:48     ` Nhat Pham
  0 siblings, 0 replies; 10+ messages in thread
From: Nhat Pham @ 2026-06-10 16:48 UTC (permalink / raw)
  To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt

On Wed, Jun 10, 2026 at 9:47 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
>
> Also, looking at the code a bit closer - if (!shrinker->flags &
> SHRINKER_MEMCG_AWARE), we shouldn't be getting into this loop at all
> and inducing all the memcg-related overhead at all...
>
> The code really should be structure as:
>
> if (shrinker->flags & SHRINKER_MEMCG_AWARE) {

As usual, I flipped the conditionals :( The memcg-aware branch is the one
that should walk the memcgs. But you get the idea...


* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-10 16:38 ` Nhat Pham
  2026-06-10 16:47   ` Nhat Pham
@ 2026-06-10 17:31   ` Shakeel Butt
  2026-06-10 18:38     ` Nhat Pham
  1 sibling, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2026-06-10 17:31 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
	roman.gushchin, qi.zheng

+Roman, Qi

On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
> 
> Thanks for reporting, Zenghui.
> 
> 
> >
> > Hi all,
> >
> > The following splat was triggered on the mainline kernel:
> >
> >  BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> >  in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> >  preempt_count: 0, expected: 0
> >  RCU nest depth: 1, expected: 0
> >  CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> >  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> >  Call trace:
> >   show_stack+0x18/0x24 (C)
> >   dump_stack_lvl+0x78/0x90
> >   dump_stack+0x18/0x24
> >   __might_resched+0x114/0x170
> >   __might_sleep+0x48/0x98
> >   css_rstat_flush+0x54/0x564
> >   mem_cgroup_flush_stats+0x9c/0xb0
> >   zswap_shrinker_count+0xe4/0x1e4
> >   shrinker_debugfs_count_show+0xd8/0x268
> 
> Ah, this seems a bit tricky.
> 
> Seems like shrinker_debugfs_count_show() is invoking
> zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> triggers a stats flushing, which might sleep. Not ideal.
> 
> Is the rcu_read_section() here to protect memcg or shrinker? For
> memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> memcg before returning.
> 
> (memcg maintainers please fact check me).

mem_cgroup_iter() handles the lifetime of the memcg, so there is no need for
an RCU read-side section for the memcg.

> 
> If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
> 
> rcu_read_lock();
> list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> {
>     if (!shrinker_try_get(shrinker))
>         continue;
>     rcu_read_unlock();
> }
> 
> But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:

Shouldn't the caller already hold a reference to the shrinker that it is
passing to this function? Does the debugfs file entry hold a reference to
the shrinker?

After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
shrinker_free_rcu_cb), so this RCU read section is there to guard against
that.

I think we can simply use shrinker_try_get() here as Nhat said.

> 
> rcu_read_lock();
> memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
> 
> We get the shrinker reference outside of the rcu_read_section(), and
> just dereference it without any checking inside of the section.
> 
> I think we can just remove the rcu_read_(un)lock() here?
> 
> Long term, I still think we'd be better off getting rid of this stats
> flushing. Seems expensive either way.
> 


* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-10 17:31   ` Shakeel Butt
@ 2026-06-10 18:38     ` Nhat Pham
  2026-06-10 22:08       ` Shakeel Butt
  0 siblings, 1 reply; 10+ messages in thread
From: Nhat Pham @ 2026-06-10 18:38 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
	roman.gushchin, qi.zheng

On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> +Roman, Qi
>
> On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> > On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
> >
> > Thanks for reporting, Zenghui.
> >
> >
> > >
> > > Hi all,
> > >
> > > The following splat was triggered on the mainline kernel:
> > >
> > >  BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> > >  in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> > >  preempt_count: 0, expected: 0
> > >  RCU nest depth: 1, expected: 0
> > >  CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> > >  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > >  Call trace:
> > >   show_stack+0x18/0x24 (C)
> > >   dump_stack_lvl+0x78/0x90
> > >   dump_stack+0x18/0x24
> > >   __might_resched+0x114/0x170
> > >   __might_sleep+0x48/0x98
> > >   css_rstat_flush+0x54/0x564
> > >   mem_cgroup_flush_stats+0x9c/0xb0
> > >   zswap_shrinker_count+0xe4/0x1e4
> > >   shrinker_debugfs_count_show+0xd8/0x268
> >
> > Ah, this seems a bit tricky.
> >
> > Seems like shrinker_debugfs_count_show() is invoking
> > zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> > triggers a stats flushing, which might sleep. Not ideal.
> >
> > Is the rcu_read_section() here to protect memcg or shrinker? For
> > memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> > memcg before returning.
> >
> > (memcg maintainers please fact check me).
>
> mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
> read section for memcg.
>
> >
> > If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
> >
> > rcu_read_lock();
> > list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> > {
> >     if (!shrinker_try_get(shrinker))
> >         continue;
> >     rcu_read_unlock();
> > }
> >
> > But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
>
> Shouldn't the caller already holds the reference to the shrinker which it is
> giving to this function? Does debugfs file entry holds a reference to the
> shrinker which it is giving.
>
> After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
> shrinker_free_rcu_cb), so this rcu read section is against that.
>
> I think we can simply use shrinker_try_get() here as Nhat said.

Hmm, so is this unsafe even with the current RCU shenanigans? What's
stopping the shrinker from being freed by that callback before we enter
the RCU read-side critical section?

Seems like this is just implicitly correct - shrinker_debugfs_detach()
and shrinker_debugfs_remove() happen before call_rcu(&shrinker->rcu,
shrinker_free_rcu_cb), so if you're reading this file, then it's
before shrinker_free_rcu_cb() is even registered?

Do we still need RCU or shrinker_try_get() here?


* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-10 18:38     ` Nhat Pham
@ 2026-06-10 22:08       ` Shakeel Butt
  2026-06-11 18:34         ` Roman Gushchin
  0 siblings, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2026-06-10 22:08 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
	roman.gushchin, qi.zheng

On Wed, Jun 10, 2026 at 11:38:29AM -0700, Nhat Pham wrote:
> On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > +Roman, Qi
> >
> > On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> > > On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
> > >
> > > Thanks for reporting, Zenghui.
> > >
> > >
> > > >
> > > > Hi all,
> > > >
> > > > The following splat was triggered on the mainline kernel:
> > > >
> > > >  BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> > > >  in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> > > >  preempt_count: 0, expected: 0
> > > >  RCU nest depth: 1, expected: 0
> > > >  CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> > > >  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > > >  Call trace:
> > > >   show_stack+0x18/0x24 (C)
> > > >   dump_stack_lvl+0x78/0x90
> > > >   dump_stack+0x18/0x24
> > > >   __might_resched+0x114/0x170
> > > >   __might_sleep+0x48/0x98
> > > >   css_rstat_flush+0x54/0x564
> > > >   mem_cgroup_flush_stats+0x9c/0xb0
> > > >   zswap_shrinker_count+0xe4/0x1e4
> > > >   shrinker_debugfs_count_show+0xd8/0x268
> > >
> > > Ah, this seems a bit tricky.
> > >
> > > Seems like shrinker_debugfs_count_show() is invoking
> > > zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> > > triggers a stats flushing, which might sleep. Not ideal.
> > >
> > > Is the rcu_read_section() here to protect memcg or shrinker? For
> > > memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> > > memcg before returning.
> > >
> > > (memcg maintainers please fact check me).
> >
> > mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
> > read section for memcg.
> >
> > >
> > > If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
> > >
> > > rcu_read_lock();
> > > list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> > > {
> > >     if (!shrinker_try_get(shrinker))
> > >         continue;
> > >     rcu_read_unlock();
> > > }
> > >
> > > But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
> >
> > Shouldn't the caller already holds the reference to the shrinker which it is
> > giving to this function? Does debugfs file entry holds a reference to the
> > shrinker which it is giving.
> >
> > After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
> > shrinker_free_rcu_cb), so this rcu read section is against that.
> >
> > I think we can simply use shrinker_try_get() here as Nhat said.
> 
> Hmm, so is this unsafe even with the current rcu shennanigans? What's
> stopping shrinker to be freed by that callback before we enter
> rcu_read_section()?
> 
> Seems like this is just implicitly correct - shrinker_debugfs_detach()
> and shrinker_debugfs_remove() happens before call_rcu(&shrinker->rcu,
> shrinker_free_rcu_cb);, so if you're reading this file, then it's
> before shrinker_free_rcu_cb() is even registered?
> 
> Do we still need rcu or shrinker_try_get() here?

I think you are right that we don't need RCU or shrinker_try_get(), but it
is more about an active debugfs file reader. Suppose we are sleeping within
the rstat flush from shrinker_debugfs_count_show() and there is a parallel
shrinker_debugfs_remove() call.

shrinker_debugfs_remove() calls debugfs_remove_recursive(), and deep in the
stack there is a wait_for_completion(&fsd->active_users_drained) call, which
waits for active users, one of which is sleeping within the rstat flush.

So, let's simply remove the RCU read lock here.

> 


* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-10 22:08       ` Shakeel Butt
@ 2026-06-11 18:34         ` Roman Gushchin
  2026-06-11 18:38           ` Shakeel Butt
  0 siblings, 1 reply; 10+ messages in thread
From: Roman Gushchin @ 2026-06-11 18:34 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Nhat Pham, Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
	qi.zheng

Shakeel Butt <shakeel.butt@linux.dev> writes:

> On Wed, Jun 10, 2026 at 11:38:29AM -0700, Nhat Pham wrote:
>> On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>> >
>> > +Roman, Qi
>> >
>> > On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
>> > > On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
>> > >
>> > > Thanks for reporting, Zenghui.
>> > >
>> > >
>> > > >
>> > > > Hi all,
>> > > >
>> > > > The following splat was triggered on the mainline kernel:
>> > > >
>> > > >  BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
>> > > >  in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
>> > > >  preempt_count: 0, expected: 0
>> > > >  RCU nest depth: 1, expected: 0
>> > > >  CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
>> > > >  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
>> > > >  Call trace:
>> > > >   show_stack+0x18/0x24 (C)
>> > > >   dump_stack_lvl+0x78/0x90
>> > > >   dump_stack+0x18/0x24
>> > > >   __might_resched+0x114/0x170
>> > > >   __might_sleep+0x48/0x98
>> > > >   css_rstat_flush+0x54/0x564
>> > > >   mem_cgroup_flush_stats+0x9c/0xb0
>> > > >   zswap_shrinker_count+0xe4/0x1e4
>> > > >   shrinker_debugfs_count_show+0xd8/0x268
>> > >
>> > > Ah, this seems a bit tricky.
>> > >
>> > > Seems like shrinker_debugfs_count_show() is invoking
>> > > zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
>> > > triggers a stats flushing, which might sleep. Not ideal.
>> > >
>> > > Is the rcu_read_section() here to protect memcg or shrinker? For
>> > > memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
>> > > memcg before returning.
>> > >
>> > > (memcg maintainers please fact check me).
>> >
>> > mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
>> > read section for memcg.
>> >
>> > >
>> > > If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
>> > >
>> > > rcu_read_lock();
>> > > list_for_each_entry_rcu(shrinker, &shrinker_list, list)
>> > > {
>> > >     if (!shrinker_try_get(shrinker))
>> > >         continue;
>> > >     rcu_read_unlock();
>> > > }
>> > >
>> > > But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
>> >
>> > Shouldn't the caller already holds the reference to the shrinker which it is
>> > giving to this function? Does debugfs file entry holds a reference to the
>> > shrinker which it is giving.
>> >
>> > After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
>> > shrinker_free_rcu_cb), so this rcu read section is against that.
>> >
>> > I think we can simply use shrinker_try_get() here as Nhat said.
>> 
>> Hmm, so is this unsafe even with the current rcu shennanigans? What's
>> stopping shrinker to be freed by that callback before we enter
>> rcu_read_section()?
>> 
>> Seems like this is just implicitly correct - shrinker_debugfs_detach()
>> and shrinker_debugfs_remove() happens before call_rcu(&shrinker->rcu,
>> shrinker_free_rcu_cb);, so if you're reading this file, then it's
>> before shrinker_free_rcu_cb() is even registered?
>> 
>> Do we still need rcu or shrinker_try_get() here?
>
> I think you are right that we don't need rcu or shrinker_try_get() but it is
> more about an active debugfs file reader. Suppose we are sleeping within rstat
> flush from shrinker_debugfs_count_show() and there is a parallel
> shrinker_debugfs_remove() call.
>
> shrinker_debugfs_remove calls debugfs_remove_recursive and deep in the stack
> there is a call wait_for_completion(&fsd->active_users_drained) which will wait
> for active users, one of which is sleeping within rstat flush.
>
> So, let's simply remove rcu read here.

+1 to this. How about this version?

--

From a4a018d026c9a39ec15a5b30014e81ce2381e281 Mon Sep 17 00:00:00 2001
From: Roman Gushchin <roman.gushchin@linux.dev>
Date: Thu, 11 Jun 2026 17:40:21 +0000
Subject: [PATCH] mm: shrinkers: remove unnecessary rcu_read_lock()

Zenghui Yu reported a might_sleep() splat caused by calling a sleeping
function within an rcu_read_lock() section in
shrinker_debugfs_count_show():

 BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
 preempt_count: 0, expected: 0
 RCU nest depth: 1, expected: 0
 CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
 Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
 Call trace:
  show_stack+0x18/0x24 (C)
  dump_stack_lvl+0x78/0x90
  dump_stack+0x18/0x24
  __might_resched+0x114/0x170
  __might_sleep+0x48/0x98
  css_rstat_flush+0x54/0x564
  mem_cgroup_flush_stats+0x9c/0xb0
  zswap_shrinker_count+0xe4/0x1e4
  shrinker_debugfs_count_show+0xd8/0x268
  seq_read_iter+0x1b8/0x4ac
  seq_read+0xe0/0x11c
  full_proxy_read+0x6c/0xa8
  vfs_read+0xc0/0x2fc
  ksys_read+0x68/0xfc
  __arm64_sys_read+0x1c/0x28
  invoke_syscall+0x54/0x110
  el0_svc_common.constprop.0+0x40/0xe0
  do_el0_svc+0x1c/0x28
  el0_svc+0x38/0x128
  el0t_64_sync_handler+0xa0/0xe4
  el0t_64_sync+0x198/0x19c

Fix it by removing the rcu_read_lock()/rcu_read_unlock() pair entirely.
It is not needed here: memcgs are protected by mem_cgroup_iter(), and
the shrinker is protected by being attached to the debugfs file, whose
removal waits for active readers to drain.

Reported-by: Zenghui Yu <zenghui.yu@linux.dev>
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Change-Id: Ifa381f7983491cde61354af9b1cb14ca373c1f75
---
 mm/shrinker_debug.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index affa64437302..cda4e86428c8 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -57,8 +57,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
 	if (!count_per_node)
 		return -ENOMEM;
 
-	rcu_read_lock();
-
 	memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
 
 	memcg = mem_cgroup_iter(NULL, NULL, NULL);
@@ -88,8 +86,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
 		}
 	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
 
-	rcu_read_unlock();
-
 	kfree(count_per_node);
 	return ret;
 }
-- 
2.54.0.1136.gdb2ca164c4-goog



* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-11 18:34         ` Roman Gushchin
@ 2026-06-11 18:38           ` Shakeel Butt
  2026-06-11 18:53             ` Roman Gushchin
  0 siblings, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2026-06-11 18:38 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Nhat Pham, Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
	qi.zheng

On Thu, Jun 11, 2026 at 06:34:15PM +0000, Roman Gushchin wrote:
> Shakeel Butt <shakeel.butt@linux.dev> writes:
> 
> > On Wed, Jun 10, 2026 at 11:38:29AM -0700, Nhat Pham wrote:
> >> On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >> >
> >> > +Roman, Qi
> >> >
> >> > On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> >> > > On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
> >> > >
> >> > > Thanks for reporting, Zenghui.
> >> > >
> >> > >
> >> > > >
> >> > > > Hi all,
> >> > > >
> >> > > > The following splat was triggered on the mainline kernel:
> >> > > >
> >> > > >  BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> >> > > >  in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> >> > > >  preempt_count: 0, expected: 0
> >> > > >  RCU nest depth: 1, expected: 0
> >> > > >  CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> >> > > >  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> >> > > >  Call trace:
> >> > > >   show_stack+0x18/0x24 (C)
> >> > > >   dump_stack_lvl+0x78/0x90
> >> > > >   dump_stack+0x18/0x24
> >> > > >   __might_resched+0x114/0x170
> >> > > >   __might_sleep+0x48/0x98
> >> > > >   css_rstat_flush+0x54/0x564
> >> > > >   mem_cgroup_flush_stats+0x9c/0xb0
> >> > > >   zswap_shrinker_count+0xe4/0x1e4
> >> > > >   shrinker_debugfs_count_show+0xd8/0x268
> >> > >
> >> > > Ah, this seems a bit tricky.
> >> > >
> >> > > Seems like shrinker_debugfs_count_show() is invoking
> >> > > zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> >> > > triggers a stats flushing, which might sleep. Not ideal.
> >> > >
> >> > > Is the rcu_read_section() here to protect memcg or shrinker? For
> >> > > memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> >> > > memcg before returning.
> >> > >
> >> > > (memcg maintainers please fact check me).
> >> >
> >> > mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
> >> > read section for memcg.
> >> >
> >> > >
> >> > > If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
> >> > >
> >> > > rcu_read_lock();
> >> > > list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> >> > > {
> >> > >     if (!shrinker_try_get(shrinker))
> >> > >         continue;
> >> > >     rcu_read_unlock();
> >> > > }
> >> > >
> >> > > But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
> >> >
> >> > Shouldn't the caller already holds the reference to the shrinker which it is
> >> > giving to this function? Does debugfs file entry holds a reference to the
> >> > shrinker which it is giving.
> >> >
> >> > After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
> >> > shrinker_free_rcu_cb), so this rcu read section is against that.
> >> >
> >> > I think we can simply use shrinker_try_get() here as Nhat said.
> >> 
> >> Hmm, so is this unsafe even with the current rcu shennanigans? What's
> >> stopping shrinker to be freed by that callback before we enter
> >> rcu_read_section()?
> >> 
> >> Seems like this is just implicitly correct - shrinker_debugfs_detach()
> >> and shrinker_debugfs_remove() happens before call_rcu(&shrinker->rcu,
> >> shrinker_free_rcu_cb);, so if you're reading this file, then it's
> >> before shrinker_free_rcu_cb() is even registered?
> >> 
> >> Do we still need rcu or shrinker_try_get() here?
> >
> > I think you are right that we don't need rcu or shrinker_try_get() but it is
> > more about an active debugfs file reader. Suppose we are sleeping within rstat
> > flush from shrinker_debugfs_count_show() and there is a parallel
> > shrinker_debugfs_remove() call.
> >
> > shrinker_debugfs_remove calls debugfs_remove_recursive and deep in the stack
> > there is a call wait_for_completion(&fsd->active_users_drained) which will wait
> > for active users, one of which is sleeping within rstat flush.
> >
> > So, let's simply remove rcu read here.
> 
> +1 to this. How about this version?
> 

Thanks, Andrew already picked up
https://lore.kernel.org/all/20260610232048.62930-1-shakeel.butt@linux.dev/



* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-11 18:38           ` Shakeel Butt
@ 2026-06-11 18:53             ` Roman Gushchin
  0 siblings, 0 replies; 10+ messages in thread
From: Roman Gushchin @ 2026-06-11 18:53 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Nhat Pham, Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
	qi.zheng

Shakeel Butt <shakeel.butt@linux.dev> writes:

> On Thu, Jun 11, 2026 at 06:34:15PM +0000, Roman Gushchin wrote:
>> Shakeel Butt <shakeel.butt@linux.dev> writes:
>> 
>> > On Wed, Jun 10, 2026 at 11:38:29AM -0700, Nhat Pham wrote:
>> >> On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>> >> >
>> >> > +Roman, Qi
>> >> >
>> >> > On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
>> >> > > On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
>> >> > >
>> >> > > Thanks for reporting, Zenghui.
>> >> > >
>> >> > >
>> >> > > >
>> >> > > > Hi all,
>> >> > > >
>> >> > > > The following splat was triggered on the mainline kernel:
>> >> > > >
>> >> > > >  BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
>> >> > > >  in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
>> >> > > >  preempt_count: 0, expected: 0
>> >> > > >  RCU nest depth: 1, expected: 0
>> >> > > >  CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
>> >> > > >  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
>> >> > > >  Call trace:
>> >> > > >   show_stack+0x18/0x24 (C)
>> >> > > >   dump_stack_lvl+0x78/0x90
>> >> > > >   dump_stack+0x18/0x24
>> >> > > >   __might_resched+0x114/0x170
>> >> > > >   __might_sleep+0x48/0x98
>> >> > > >   css_rstat_flush+0x54/0x564
>> >> > > >   mem_cgroup_flush_stats+0x9c/0xb0
>> >> > > >   zswap_shrinker_count+0xe4/0x1e4
>> >> > > >   shrinker_debugfs_count_show+0xd8/0x268
>> >> > >
>> >> > > Ah, this seems a bit tricky.
>> >> > >
>> >> > > Seems like shrinker_debugfs_count_show() is invoking
>> >> > > zswap_shrinker_count() inside an RCU read-side critical section.
>> >> > > zswap_shrinker_count() triggers a stats flush, which might sleep. Not
>> >> > > ideal.
>> >> > >
>> >> > > Is the RCU read-side section here to protect the memcg or the
>> >> > > shrinker? For the memcg, I don't think it's necessary, no?
>> >> > > mem_cgroup_iter() pins the memcg before returning.
>> >> > >
>> >> > > (memcg maintainers please fact check me).
>> >> >
>> >> > mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
>> >> > read section for memcg.
>> >> >
>> >> > >
>> >> > > If this is for the shrinker, I think this needs to follow shrink_slab()'s pattern:
>> >> > >
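>> >> > > rcu_read_lock();
>> >> > > list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
>> >> > >     if (!shrinker_try_get(shrinker))
>> >> > >         continue;
>> >> > >     rcu_read_unlock();
>> >> > >     /* ... work that may sleep, with the shrinker pinned ... */
>> >> > >     rcu_read_lock();
>> >> > >     shrinker_put(shrinker);
>> >> > > }
>> >> > > rcu_read_unlock();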
>> >> > >
>> >> > > But OTOH, it doesn't seem like the RCU read-side section is what's keeping it safe:
>> >> >
>> >> > Shouldn't the caller already hold a reference to the shrinker that it
>> >> > passes to this function? Does the debugfs file entry hold a reference to
>> >> > the shrinker that it passes?
>> >> >
>> >> > After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
>> >> > shrinker_free_rcu_cb), so this RCU read section protects against that.
>> >> >
>> >> > I think we can simply use shrinker_try_get() here as Nhat said.
>> >> 
>> >> Hmm, so is this unsafe even with the current RCU shenanigans? What's
>> >> stopping the shrinker from being freed by that callback before we enter
>> >> the RCU read-side section?
>> >>
>> >> Seems like this is just implicitly correct - shrinker_debugfs_detach()
>> >> and shrinker_debugfs_remove() happen before call_rcu(&shrinker->rcu,
>> >> shrinker_free_rcu_cb), so if you're reading this file, then it's
>> >> before shrinker_free_rcu_cb() is even registered?
>> >> 
>> >> Do we still need rcu or shrinker_try_get() here?
>> >
>> > I think you are right that we don't need rcu or shrinker_try_get() but it is
>> > more about an active debugfs file reader. Suppose we are sleeping within rstat
>> > flush from shrinker_debugfs_count_show() and there is a parallel
>> > shrinker_debugfs_remove() call.
>> >
>> > shrinker_debugfs_remove() calls debugfs_remove_recursive(), and deep in the
>> > stack there is a call to wait_for_completion(&fsd->active_users_drained),
>> > which waits for active users, one of which is sleeping within the rstat flush.
>> >
>> > So, let's simply remove the RCU read lock here.
>> 
>> +1 to this. How about this version?
>> 
>
> Thanks, Andrew already picked up
> https://lore.kernel.org/all/20260610232048.62930-1-shakeel.butt@linux.dev/

Awesome, thanks!



end of thread, other threads:[~2026-06-11 18:54 UTC | newest]

Thread overview: 10+ messages
2026-06-10 16:05 [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421 Zenghui Yu
2026-06-10 16:38 ` Nhat Pham
2026-06-10 16:47   ` Nhat Pham
2026-06-10 16:48     ` Nhat Pham
2026-06-10 17:31   ` Shakeel Butt
2026-06-10 18:38     ` Nhat Pham
2026-06-10 22:08       ` Shakeel Butt
2026-06-11 18:34         ` Roman Gushchin
2026-06-11 18:38           ` Shakeel Butt
2026-06-11 18:53             ` Roman Gushchin
