Linux-mm Archive on lore.kernel.org
* [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
@ 2026-06-10 16:05 Zenghui Yu
  2026-06-10 16:38 ` Nhat Pham
  0 siblings, 1 reply; 7+ messages in thread
From: Zenghui Yu @ 2026-06-10 16:05 UTC (permalink / raw)
  To: linux-mm; +Cc: hannes, yosry, nphamcs, chengming.zhou

Hi all,

The following splat was triggered on the mainline kernel:

 BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
 preempt_count: 0, expected: 0
 RCU nest depth: 1, expected: 0
 CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT 
 Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
 Call trace:
  show_stack+0x18/0x24 (C)
  dump_stack_lvl+0x78/0x90
  dump_stack+0x18/0x24
  __might_resched+0x114/0x170
  __might_sleep+0x48/0x98
  css_rstat_flush+0x54/0x564
  mem_cgroup_flush_stats+0x9c/0xb0
  zswap_shrinker_count+0xe4/0x1e4
  shrinker_debugfs_count_show+0xd8/0x268
  seq_read_iter+0x1b8/0x4ac
  seq_read+0xe0/0x11c
  full_proxy_read+0x6c/0xa8
  vfs_read+0xc0/0x2fc
  ksys_read+0x68/0xfc
  __arm64_sys_read+0x1c/0x28
  invoke_syscall+0x54/0x110
  el0_svc_common.constprop.0+0x40/0xe0
  do_el0_svc+0x1c/0x28
  el0_svc+0x38/0x128
  el0t_64_sync_handler+0xa0/0xe4
  el0t_64_sync+0x198/0x19c

The kernel is built with arm64's virt.config plus

+CONFIG_DEBUG_ATOMIC_SLEEP=y
+CONFIG_SHRINKER_DEBUG=y
+CONFIG_ZSWAP=y

I can reproduce the issue with the following steps:

    $ echo Y > /sys/module/zswap/parameters/enabled
    $ echo Y > /sys/module/zswap/parameters/shrinker_enabled
    $ cat /sys/kernel/debug/shrinker/mm-zswap-60/count

Please have a look.

Thanks,
Zenghui
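
A minimal userspace sketch of why the check fires, assuming a simplified
model. The splat reports "RCU nest depth: 1, expected: 0" with
preempt_count 0, so the only thing forbidding sleep is the enclosing RCU
read-side critical section. struct task_ctx and the model_* helpers are
hypothetical stand-ins, not kernel APIs, and the real __might_resched()
inspects more state than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state; the kernel tracks this differently. */
struct task_ctx {
    int preempt_count;   /* non-zero while preemption is disabled */
    int rcu_nest_depth;  /* non-zero inside an RCU read-side section */
};

static void model_rcu_read_lock(struct task_ctx *t)
{
    t->rcu_nest_depth++;
}

static void model_rcu_read_unlock(struct task_ctx *t)
{
    assert(t->rcu_nest_depth > 0);
    t->rcu_nest_depth--;
}

/* True when a sleeping call made now would be reported as a bug. */
static bool model_sleep_is_invalid(const struct task_ctx *t)
{
    return t->preempt_count != 0 || t->rcu_nest_depth != 0;
}
```

With both counters at zero a sleeping call is allowed in this model; after
model_rcu_read_lock() it is flagged, which corresponds to the state shown
in the splat.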



* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-10 16:05 [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421 Zenghui Yu
@ 2026-06-10 16:38 ` Nhat Pham
  2026-06-10 16:47   ` Nhat Pham
  2026-06-10 17:31   ` Shakeel Butt
  0 siblings, 2 replies; 7+ messages in thread
From: Nhat Pham @ 2026-06-10 16:38 UTC (permalink / raw)
  To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt

On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:

Thanks for reporting, Zenghui.


>
> Hi all,
>
> The following splat was triggered on the mainline kernel:
>
>  BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
>  in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
>  preempt_count: 0, expected: 0
>  RCU nest depth: 1, expected: 0
>  CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
>  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
>  Call trace:
>   show_stack+0x18/0x24 (C)
>   dump_stack_lvl+0x78/0x90
>   dump_stack+0x18/0x24
>   __might_resched+0x114/0x170
>   __might_sleep+0x48/0x98
>   css_rstat_flush+0x54/0x564
>   mem_cgroup_flush_stats+0x9c/0xb0
>   zswap_shrinker_count+0xe4/0x1e4
>   shrinker_debugfs_count_show+0xd8/0x268

Ah, this seems a bit tricky.

It looks like shrinker_debugfs_count_show() invokes
zswap_shrinker_count() inside an RCU read-side critical section.
zswap_shrinker_count() triggers a stats flush, which might sleep.
Not ideal.

Is the RCU read-side critical section here meant to protect the memcg
or the shrinker? For the memcg, I don't think it's necessary, no?
mem_cgroup_iter() pins the memcg before returning.

(memcg maintainers please fact check me).

If this is for the shrinker, I think this needs to follow shrink_slab()'s pattern:

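rcu_read_lock();
list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
    if (!shrinker_try_get(shrinker))
        continue;
    rcu_read_unlock();

    /* ... sleepable work on the pinned shrinker ... */

    rcu_read_lock();
    shrinker_put(shrinker);
}
rcu_read_unlock();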

But OTOH, it doesn't seem like the RCU read-side critical section is what keeps it safe:

rcu_read_lock();
memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;

We get the shrinker reference outside of the RCU read-side critical
section, and just dereference it inside the section without any checking.

I think we can just remove the rcu_read_(un)lock() here?

Long term, I still think we'd be better off getting rid of this stats
flushing. Seems expensive either way.
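
A userspace sketch of the get-then-drop-RCU pattern mentioned above,
assuming a simplified single-threaded model. The point it illustrates is
that the reference pins the object, so the read-side section can be left
before any sleepable work and re-entered before the list is touched again.
struct model_shrinker, rcu_depth, and the model_* helpers are hypothetical;
they are not the kernel's shrinker API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_shrinker {
    int refcount;   /* 0 means the shrinker is being torn down */
    int work_done;  /* number of times the sleepable work ran */
};

static int rcu_depth;         /* models the task's RCU nest depth */
static int sleeps_under_rcu;  /* sleepable calls made with rcu_depth != 0 */

static bool model_try_get(struct model_shrinker *s)
{
    if (s->refcount == 0)
        return false;
    s->refcount++;
    return true;
}

static void model_put(struct model_shrinker *s)
{
    assert(s->refcount > 0);
    s->refcount--;
}

static void model_sleepable_work(struct model_shrinker *s)
{
    if (rcu_depth != 0)
        sleeps_under_rcu++;
    s->work_done++;
}

static void model_walk(struct model_shrinker *list, size_t n)
{
    rcu_depth++;                    /* stands in for rcu_read_lock() */
    for (size_t i = 0; i < n; i++) {
        struct model_shrinker *s = &list[i];

        if (!model_try_get(s))
            continue;               /* dying shrinker: skip it */
        rcu_depth--;                /* leave the section; sleeping is fine */

        model_sleepable_work(s);

        rcu_depth++;                /* re-enter before touching the list */
        model_put(s);
    }
    rcu_depth--;                    /* stands in for rcu_read_unlock() */
}
```

In this model the work never runs with rcu_depth != 0, and an entry whose
refcount is already zero is skipped rather than used.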



* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-10 16:38 ` Nhat Pham
@ 2026-06-10 16:47   ` Nhat Pham
  2026-06-10 16:48     ` Nhat Pham
  2026-06-10 17:31   ` Shakeel Butt
  1 sibling, 1 reply; 7+ messages in thread
From: Nhat Pham @ 2026-06-10 16:47 UTC (permalink / raw)
  To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt

On Wed, Jun 10, 2026 at 9:38 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
>
> Thanks for reporting, Zenghui.
>
>
> >
> > Hi all,
> >
> > The following splat was triggered on the mainline kernel:
> >
> >  BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> >  in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> >  preempt_count: 0, expected: 0
> >  RCU nest depth: 1, expected: 0
> >  CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> >  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> >  Call trace:
> >   show_stack+0x18/0x24 (C)
> >   dump_stack_lvl+0x78/0x90
> >   dump_stack+0x18/0x24
> >   __might_resched+0x114/0x170
> >   __might_sleep+0x48/0x98
> >   css_rstat_flush+0x54/0x564
> >   mem_cgroup_flush_stats+0x9c/0xb0
> >   zswap_shrinker_count+0xe4/0x1e4
> >   shrinker_debugfs_count_show+0xd8/0x268
>
> Ah, this seems a bit tricky.
>
> Seems like shrinker_debugfs_count_show() is invoking
> zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> triggers a stats flushing, which might sleep. Not ideal.
>
> Is the rcu_read_section() here to protect memcg or shrinker? For
> memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> memcg before returning.
>
> (memcg maintainers please fact check me).
>
> If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
>
> rcu_read_lock();
> list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> {
>     if (!shrinker_try_get(shrinker))
>         continue;
>     rcu_read_unlock();
> }
>
> But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
>
> rcu_read_lock();
> memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
>
> We get the shrinker reference outside of the rcu_read_section(), and
> just dereference it without any checking inside of the section.
>
> I think we can just remove the rcu_read_(un)lock() here?
>

Also, looking at the code a bit closer - if
!(shrinker->flags & SHRINKER_MEMCG_AWARE), we shouldn't be entering this
loop and incurring all the memcg-related overhead at all...

The code really should be structured as:

if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
    total = shrinker_count_objects(shrinker, NULL, count_per_node);
    if (total)
      ...
} else {
   memcg = mem_cgroup_iter(NULL, NULL, NULL);
   do {
   } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
}
...



* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-10 16:47   ` Nhat Pham
@ 2026-06-10 16:48     ` Nhat Pham
  0 siblings, 0 replies; 7+ messages in thread
From: Nhat Pham @ 2026-06-10 16:48 UTC (permalink / raw)
  To: Zenghui Yu; +Cc: linux-mm, hannes, yosry, chengming.zhou, Shakeel Butt

On Wed, Jun 10, 2026 at 9:47 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
>
> Also, looking at the code a bit closer - if (!shrinker->flags &
> SHRINKER_MEMCG_AWARE), we shouldn't be getting into this loop at all
> and inducing all the memcg-related overhead at all...
>
> The code really should be structure as:
>
> if (shrinker->flags & SHRINKER_MEMCG_AWARE) {

As usual, I flipped the conditionals :( But you get the idea...
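
A userspace sketch of the intended structure, with the condition oriented
the way the correction above implies: a shrinker that is not memcg aware
is counted once with no memcg, and only a memcg-aware shrinker walks the
memcgs. model_count_show(), sample_count(), and the integer memcg IDs are
hypothetical; the real function prints per-node counts and iterates with
mem_cgroup_iter().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical count callback; memcg_id < 0 stands for "no memcg". */
typedef unsigned long (*count_fn)(int memcg_id);

/* Sample callback used only for illustration. */
static unsigned long sample_count(int memcg_id)
{
    return memcg_id < 0 ? 100UL : (unsigned long)memcg_id;
}

/*
 * Stores the summed object count in *sum and returns how many times the
 * callback was invoked.
 */
static size_t model_count_show(bool memcg_aware, const int *memcg_ids,
                               size_t n, count_fn count, unsigned long *sum)
{
    size_t calls = 0;

    assert(count != NULL && sum != NULL);
    *sum = 0;

    if (!memcg_aware) {
        /* One global count; no memcg iteration at all. */
        *sum = count(-1);
        return 1;
    }

    /* Memcg-aware: one count per memcg. */
    for (size_t i = 0; i < n; i++) {
        *sum += count(memcg_ids[i]);
        calls++;
    }
    return calls;
}
```

The non-memcg-aware branch never touches memcg_ids, which is the overhead
the restructuring is meant to avoid.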



* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-10 16:38 ` Nhat Pham
  2026-06-10 16:47   ` Nhat Pham
@ 2026-06-10 17:31   ` Shakeel Butt
  2026-06-10 18:38     ` Nhat Pham
  1 sibling, 1 reply; 7+ messages in thread
From: Shakeel Butt @ 2026-06-10 17:31 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
	roman.gushchin, qi.zheng

+Roman, Qi

On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
> 
> Thanks for reporting, Zenghui.
> 
> 
> >
> > Hi all,
> >
> > The following splat was triggered on the mainline kernel:
> >
> >  BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> >  in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> >  preempt_count: 0, expected: 0
> >  RCU nest depth: 1, expected: 0
> >  CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> >  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> >  Call trace:
> >   show_stack+0x18/0x24 (C)
> >   dump_stack_lvl+0x78/0x90
> >   dump_stack+0x18/0x24
> >   __might_resched+0x114/0x170
> >   __might_sleep+0x48/0x98
> >   css_rstat_flush+0x54/0x564
> >   mem_cgroup_flush_stats+0x9c/0xb0
> >   zswap_shrinker_count+0xe4/0x1e4
> >   shrinker_debugfs_count_show+0xd8/0x268
> 
> Ah, this seems a bit tricky.
> 
> Seems like shrinker_debugfs_count_show() is invoking
> zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> triggers a stats flushing, which might sleep. Not ideal.
> 
> Is the rcu_read_section() here to protect memcg or shrinker? For
> memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> memcg before returning.
> 
> (memcg maintainers please fact check me).

mem_cgroup_iter() handles the lifetime of the memcg, so there is no need for an
RCU read-side critical section for the memcg.

> 
> If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
> 
> rcu_read_lock();
> list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> {
>     if (!shrinker_try_get(shrinker))
>         continue;
>     rcu_read_unlock();
> }
> 
> But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:

Shouldn't the caller already hold a reference to the shrinker that it
passes to this function? Does the debugfs file entry hold a reference to
the shrinker it passes in?

After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
shrinker_free_rcu_cb), so this RCU read-side critical section guards
against that.

I think we can simply use shrinker_try_get() here as Nhat said.

> 
> rcu_read_lock();
> memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
> 
> We get the shrinker reference outside of the rcu_read_section(), and
> just dereference it without any checking inside of the section.
> 
> I think we can just remove the rcu_read_(un)lock() here?
> 
> Long term, I still think we'd be better off getting rid of this stats
> flushing. Seems expensive either way.
> 



* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-10 17:31   ` Shakeel Butt
@ 2026-06-10 18:38     ` Nhat Pham
  2026-06-10 22:08       ` Shakeel Butt
  0 siblings, 1 reply; 7+ messages in thread
From: Nhat Pham @ 2026-06-10 18:38 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
	roman.gushchin, qi.zheng

On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> +Roman, Qi
>
> On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> > On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
> >
> > Thanks for reporting, Zenghui.
> >
> >
> > >
> > > Hi all,
> > >
> > > The following splat was triggered on the mainline kernel:
> > >
> > >  BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> > >  in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> > >  preempt_count: 0, expected: 0
> > >  RCU nest depth: 1, expected: 0
> > >  CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> > >  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > >  Call trace:
> > >   show_stack+0x18/0x24 (C)
> > >   dump_stack_lvl+0x78/0x90
> > >   dump_stack+0x18/0x24
> > >   __might_resched+0x114/0x170
> > >   __might_sleep+0x48/0x98
> > >   css_rstat_flush+0x54/0x564
> > >   mem_cgroup_flush_stats+0x9c/0xb0
> > >   zswap_shrinker_count+0xe4/0x1e4
> > >   shrinker_debugfs_count_show+0xd8/0x268
> >
> > Ah, this seems a bit tricky.
> >
> > Seems like shrinker_debugfs_count_show() is invoking
> > zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> > triggers a stats flushing, which might sleep. Not ideal.
> >
> > Is the rcu_read_section() here to protect memcg or shrinker? For
> > memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> > memcg before returning.
> >
> > (memcg maintainers please fact check me).
>
> mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
> read section for memcg.
>
> >
> > If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
> >
> > rcu_read_lock();
> > list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> > {
> >     if (!shrinker_try_get(shrinker))
> >         continue;
> >     rcu_read_unlock();
> > }
> >
> > But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
>
> Shouldn't the caller already holds the reference to the shrinker which it is
> giving to this function? Does debugfs file entry holds a reference to the
> shrinker which it is giving.
>
> After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
> shrinker_free_rcu_cb), so this rcu read section is against that.
>
> I think we can simply use shrinker_try_get() here as Nhat said.

Hmm, so is this unsafe even with the current RCU shenanigans? What's
stopping the shrinker from being freed by that callback before we enter
the RCU read-side critical section?

Seems like this is just implicitly correct - shrinker_debugfs_detach()
and shrinker_debugfs_remove() happen before call_rcu(&shrinker->rcu,
shrinker_free_rcu_cb), so if you're reading this file, then it's
before shrinker_free_rcu_cb() is even registered?

Do we still need RCU or shrinker_try_get() here?



* Re: [zswap?] BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
  2026-06-10 18:38     ` Nhat Pham
@ 2026-06-10 22:08       ` Shakeel Butt
  0 siblings, 0 replies; 7+ messages in thread
From: Shakeel Butt @ 2026-06-10 22:08 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Zenghui Yu, linux-mm, hannes, yosry, chengming.zhou,
	roman.gushchin, qi.zheng

On Wed, Jun 10, 2026 at 11:38:29AM -0700, Nhat Pham wrote:
> On Wed, Jun 10, 2026 at 10:31 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > +Roman, Qi
> >
> > On Wed, Jun 10, 2026 at 09:38:03AM -0700, Nhat Pham wrote:
> > > On Wed, Jun 10, 2026 at 9:05 AM Zenghui Yu <zenghui.yu@linux.dev> wrote:
> > >
> > > Thanks for reporting, Zenghui.
> > >
> > >
> > > >
> > > > Hi all,
> > > >
> > > > The following splat was triggered on the mainline kernel:
> > > >
> > > >  BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
> > > >  in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1126, name: cat
> > > >  preempt_count: 0, expected: 0
> > > >  RCU nest depth: 1, expected: 0
> > > >  CPU: 7 UID: 0 PID: 1126 Comm: cat Kdump: loaded Not tainted 7.1.0-rc7-00056-gacb7500801e9-dirty #304 PREEMPT
> > > >  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > > >  Call trace:
> > > >   show_stack+0x18/0x24 (C)
> > > >   dump_stack_lvl+0x78/0x90
> > > >   dump_stack+0x18/0x24
> > > >   __might_resched+0x114/0x170
> > > >   __might_sleep+0x48/0x98
> > > >   css_rstat_flush+0x54/0x564
> > > >   mem_cgroup_flush_stats+0x9c/0xb0
> > > >   zswap_shrinker_count+0xe4/0x1e4
> > > >   shrinker_debugfs_count_show+0xd8/0x268
> > >
> > > Ah, this seems a bit tricky.
> > >
> > > Seems like shrinker_debugfs_count_show() is invoking
> > > zswap_shrinker_count() in rcu_read_section(). zswap_shrinker_count()
> > > triggers a stats flushing, which might sleep. Not ideal.
> > >
> > > Is the rcu_read_section() here to protect memcg or shrinker? For
> > > memcg, i dont think it's necessary, no? mem_cgroup_iter() pins the
> > > memcg before returning.
> > >
> > > (memcg maintainers please fact check me).
> >
> > mem_cgroup_iter() handles the lifetime of memcg, so there is no need for rcu
> > read section for memcg.
> >
> > >
> > > If this is for the shrinker think this needs to follow shrink_slab()'s pattern.:
> > >
> > > rcu_read_lock();
> > > list_for_each_entry_rcu(shrinker, &shrinker_list, list)
> > > {
> > >     if (!shrinker_try_get(shrinker))
> > >         continue;
> > >     rcu_read_unlock();
> > > }
> > >
> > > But OTOH, doesn't seem like rcu_read_section() is what keeping it safe:
> >
> > Shouldn't the caller already holds the reference to the shrinker which it is
> > giving to this function? Does debugfs file entry holds a reference to the
> > shrinker which it is giving.
> >
> > After looking at shrinker_free(), it has call_rcu(&shrinker->rcu,
> > shrinker_free_rcu_cb), so this rcu read section is against that.
> >
> > I think we can simply use shrinker_try_get() here as Nhat said.
> 
> Hmm, so is this unsafe even with the current rcu shennanigans? What's
> stopping shrinker to be freed by that callback before we enter
> rcu_read_section()?
> 
> Seems like this is just implicitly correct - shrinker_debugfs_detach()
> and shrinker_debugfs_remove() happens before call_rcu(&shrinker->rcu,
> shrinker_free_rcu_cb);, so if you're reading this file, then it's
> before shrinker_free_rcu_cb() is even registered?
> 
> Do we still need rcu or shrinker_try_get() here?

I think you are right that we don't need RCU or shrinker_try_get(); what
matters here is an active debugfs file reader. Suppose we are sleeping within
the rstat flush from shrinker_debugfs_count_show() and there is a parallel
shrinker_debugfs_remove() call.

shrinker_debugfs_remove() calls debugfs_remove_recursive(), and deep in the
stack there is a wait_for_completion(&fsd->active_users_drained) call, which
waits for active users, one of which is sleeping within the rstat flush.

So, let's simply remove the RCU read lock here.
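
A single-threaded sketch of the draining behaviour described above,
assuming a simplified model: removal cannot complete while a reader is
still inside the show callback, and new readers are refused once removal
has started. struct model_file and the model_file_* helpers are
hypothetical; the kernel blocks in wait_for_completion() where this model
just returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a debugfs file's reader accounting. */
struct model_file {
    int active_users;  /* readers currently inside the show callback */
    bool removing;     /* set once removal has started */
};

/* A reader enters; refused once removal has begun. */
static bool model_file_get(struct model_file *f)
{
    if (f->removing)
        return false;
    f->active_users++;
    return true;
}

/* A reader leaves, e.g. after the rstat flush returns. */
static void model_file_put(struct model_file *f)
{
    assert(f->active_users > 0);
    f->active_users--;
}

/*
 * Removal starts or is retried: returns true if it can finish now, false
 * if it would have to wait for readers to drain.
 */
static bool model_file_remove(struct model_file *f)
{
    f->removing = true;
    return f->active_users == 0;
}
```

In this model a reader that is asleep keeps model_file_remove() from
finishing until it calls model_file_put(), so whatever the removal path
frees afterwards is not touched by that reader.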

