public inbox for linux-mm@kvack.org
From: Qi Zheng <qi.zheng@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>, shakeel.butt@linux.dev
Cc: syzbot <syzbot+7d60b33a8a546263da7c@syzkaller.appspotmail.com>,
	Liam.Howlett@oracle.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, ljs@kernel.org, surenb@google.com,
	syzkaller-bugs@googlegroups.com, vbabka@kernel.org,
	Muchun Song <songmuchun@bytedance.com>
Subject: Re: [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page
Date: Mon, 27 Apr 2026 17:43:38 +0800
Message-ID: <f0ab0c7a-c597-41a8-92b3-4424b12b1b1e@linux.dev>
In-Reply-To: <3591c663-a4a9-4c22-97cf-b58b2e7d8a41@linux.dev>



On 4/27/26 3:24 PM, Qi Zheng wrote:
> 
> 
> On 4/27/26 1:55 AM, Andrew Morton wrote:
>> On Sun, 26 Apr 2026 23:57:42 +0800 Qi Zheng <qi.zheng@linux.dev> wrote:
>>
>>> Hi Andrew,
>>>
>>> On 4/26/26 6:49 PM, Andrew Morton wrote:
>>>> On Sun, 26 Apr 2026 01:17:25 -0700 syzbot <syzbot+7d60b33a8a546263da7c@syzkaller.appspotmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> syzbot found the following issue on:
>>>>>
>>>>> HEAD commit:    6596a02b2078 Merge tag 'drm-next-2026-04-22' of https://gi..
>>>>> git tree:       upstream
>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12483702580000
>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=24c8da4692f901cb
>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=7d60b33a8a546263da7c
>>>>> compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
>>>>> userspace arch: i386
>>>>>
>>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>>
>>>> argh, that dreaded sentence.
>>>>
>>>> Thanks.
>>>>
>>>> Something's definitely amiss.  This is at least the fifth report of
>>>> rcu_read_lock() imbalance post-7.0.  Others:
>>>>
>>>> https://lore.kernel.org/69eab803.a00a0220.17a17.004a.GAE@google.com
>>>> https://lore.kernel.org/69eab803.a00a0220.17a17.004b.GAE@google.com
>>>> https://lore.kernel.org/69eafb0e.a00a0220.9259.0031.GAE@google.com
>>>> https://lore.kernel.org/69ebcbe2.a00a0220.7773.0005.GAE@google.com
>>>
>>> All the kernel configs mentioned above include 'CONFIG_MEMCG_V1=y'.
>>>
>>> Theoretically, a rebind_subsystems() call can lead to an RCU imbalance; see my
>>> previous discussion with Shakeel for details:
>>>
>>> https://lore.kernel.org/all/358c60e1-fa91-40a1-9e00-84c93340c04e@linux.dev/
>>
>> Right, that looks similar.
>>
>> The rcu locking under lruvec_stat_mod_folio() is very simple, and that
>> return in get_non_dying_memcg_end() does look super suspicious.  Why
>> does it omit the unlock?
>>
>> otoh, in
>> https://lore.kernel.org/all/69eafb0e.a00a0220.9259.0031.GAE@google.com/
>> we're trying to release an rcu_read_lock() which isn't presently held.
>> But if cgroup_subsys_on_dfl() were to become false between the
>> get_non_dying_memcg_start/end pair, that's what would happen.
>>
>> So yup, I agree, concurrent rebind_subsystems() activity could cause
>> all of this.  The reports are pretty common - is there some debugging
>> patch we can temporarily add to confirm this theory?  And/or is it
>> possible to cook up a selftest which will trigger this?
> 
> I've been trying to reproduce this locally, but unfortunately I haven't
> succeeded yet.

Alright, it seems I have successfully reproduced it:
(The reproducer is attached at the bottom of this email.)

[   43.883623][  T270] mod_memcg_lruvec_state: key_on_dfl=0 rcu_locked=0 depth_before=2 depth_now=2
[   43.884267][  T270] ------------[ cut here ]------------
[   43.884663][  T270] WARNING: mm/memcontrol.c:850 at mod_memcg_lruvec_state+0x94/0x130, CPU#0: memcg-repro/270
[   43.885375][  T270] Modules linked in:
[   43.885704][  T270] CPU: 0 UID: 0 PID: 270 Comm: memcg-repro Tainted: G        W           7.0.0-next-20260420+ #
[   43.886554][  T270] Tainted: [W]=WARN
[   43.886833][  T270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[   43.887490][  T270] RIP: 0010:mod_memcg_lruvec_state+0x94/0x130
[   43.887932][  T270] Code: 5c 41 5d 41 5e 41 5f e9 4a 52 a3 00 48 8d 
b3 58 09 00 00 b9 0c 00 00 00 48 c7 c7 72 de f
[   43.889319][  T270] RSP: 0000:ffffc900041bfc38 EFLAGS: 00010246
[   43.889763][  T270] RAX: 0000000000000000 RBX: ffff888104619bc0 RCX: 
0000000000000000
[   43.890332][  T270] RDX: 0000000000000619 RSI: ffff88810461a524 RDI: 
ffffffff827bde7e
[   43.890908][  T270] RBP: 0000000000000001 R08: ffffffff83549028 R09: 
0000000000000001
[   43.891481][  T270] R10: ffffffffffffdfff R11: ffffc900041bfa78 R12: 
0000000000000011
[   43.892051][  T270] R13: ffff8882bfffa1c0 R14: 0000000000000002 R15: 
ffff88810203a7c0
[   43.892629][  T270] FS:  00007f73c4641740(0000) 
GS:ffff8883324cb000(0000) knlGS:0000000000000000
[   43.893262][  T270] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   43.893737][  T270] CR2: 00005590e4eb8000 CR3: 00000001040d2000 CR4: 
00000000000006f0
[   43.894300][  T270] Call Trace:
[   43.894548][  T270]  <TASK>
[   43.894767][  T270]  lruvec_stat_mod_folio+0xc2/0x1a0
[   43.895138][  T270]  __folio_mod_stat+0x25/0x80
[   43.895483][  T270]  folio_add_new_anon_rmap+0xb1/0x2b0
[   43.895880][  T270]  map_anon_folio_pte_nopf+0xa3/0x120
[   43.896267][  T270]  do_pte_missing+0xad5/0xb40
[   43.896620][  T270]  __handle_mm_fault+0x80e/0xcd0
[   43.896983][  T270]  handle_mm_fault+0x146/0x310
[   43.897332][  T270]  do_user_addr_fault+0x303/0x880
[   43.897708][  T270]  exc_page_fault+0x9b/0x270
[   43.898042][  T270]  asm_exc_page_fault+0x26/0x30
[   43.898387][  T270] RIP: 0033:0x5590e4eb41ea
[   43.898722][  T270] Code: 61 cc 66 0f 6f e0 66 0f 61 c2 66 0f db cd 
66 0f 69 e2 66 0f 6f d0 66 0f 69 d4 66 0f 61 0
[   43.900107][  T270] RSP: 002b:00007ffcad25f030 EFLAGS: 00010202
[   43.900546][  T270] RAX: 00005590e4eb8010 RBX: 00007ffcad260f7d RCX: 
00007f73c474d44d
[   43.901114][  T270] RDX: 00005590e4eb80a0 RSI: 00005590e4eb503c RDI: 
000000000000000f
[   43.901691][  T270] RBP: 00005590e4eb70a0 R08: 0000000000000000 R09: 
00007f73c483a680
[   43.902257][  T270] R10: 0000000000000000 R11: 0000000000000246 R12: 
0000000000000000
[   43.902831][  T270] R13: 00007ffcad25f180 R14: 00005590e4eb6dd8 R15: 
00007f73c4869020
[   43.903407][  T270]  </TASK>
[   43.903637][  T270] irq event stamp: 2919
[   43.903933][  T270] hardirqs last  enabled at (2927): 
[<ffffffff8137acfe>] __up_console_sem+0x5e/0x70
[   43.904605][  T270] hardirqs last disabled at (2936): 
[<ffffffff8137ace3>] __up_console_sem+0x43/0x70
[   43.905264][  T270] softirqs last  enabled at (2048): 
[<ffffffff812c7f1e>] handle_softirqs+0x38e/0x460
[   43.905952][  T270] softirqs last disabled at (2031): 
[<ffffffff812c84c9>] irq_exit_rcu+0xe9/0x160
[   43.906606][  T270] ---[ end trace 0000000000000000 ]---
[   43.907004][  T270]
[   43.907174][  T270] =====================================
[   43.907565][  T270] WARNING: bad unlock balance detected!
[   43.907954][  T270] 7.0.0-next-20260420+ #83 Tainted: G        W
[   43.908450][  T270] -------------------------------------
[   43.908845][  T270] memcg-repro/270 is trying to release lock (rcu_read_lock) at:
[   43.909382][  T270] [<ffffffff815f57f7>] rcu_read_unlock+0x17/0x60
[   43.909830][  T270] but there are no more locks to release!
[   43.910234][  T270]
[   43.910234][  T270] other info that might help us debug this:
[   43.910807][  T270] 1 lock held by memcg-repro/270:
[   43.911163][  T270]  #0: ffff888102fa2088 (vm_lock){++++}-{0:0}, at: do_user_addr_fault+0x285/0x880
[   43.911820][  T270]
[   43.911820][  T270] stack backtrace:
[   43.912237][  T270] CPU: 0 UID: 0 PID: 270 Comm: memcg-repro Tainted: 
G        W           7.0.0-next-20260420+ #
[   43.912239][  T270] Tainted: [W]=WARN
[   43.912240][  T270] Hardware name: QEMU Standard PC (i440FX + PIIX, 
1996), BIOS 1.12.0-1 04/01/2014
[   43.912240][  T270] Call Trace:
[   43.912241][  T270]  <TASK>
[   43.912242][  T270]  ? rcu_read_unlock+0x17/0x60
[   43.912244][  T270]  dump_stack_lvl+0x77/0xb0
[   43.912248][  T270]  print_unlock_imbalance_bug+0xe0/0xf0
[   43.912251][  T270]  ? rcu_read_unlock+0x17/0x60
[   43.912253][  T270]  lock_release+0x21d/0x2a0
[   43.912256][  T270]  rcu_read_unlock+0x1c/0x60
[   43.912258][  T270]  do_pte_missing+0x233/0xb40
[   43.912260][  T270]  __handle_mm_fault+0x80e/0xcd0
[   43.912265][  T270]  handle_mm_fault+0x146/0x310
[   43.912268][  T270]  do_user_addr_fault+0x303/0x880
[   43.912271][  T270]  exc_page_fault+0x9b/0x270
[   43.912273][  T270]  asm_exc_page_fault+0x26/0x30
[   43.912274][  T270] RIP: 0033:0x5590e4eb41ea
[   43.912276][  T270] Code: 61 cc 66 0f 6f e0 66 0f 61 c2 66 0f db cd 
66 0f 69 e2 66 0f 6f d0 66 0f 69 d4 66 0f 61 0
[   43.912277][  T270] RSP: 002b:00007ffcad25f030 EFLAGS: 00010202
[   43.912278][  T270] RAX: 00005590e4eb8010 RBX: 00007ffcad260f7d RCX: 
00007f73c474d44d
[   43.912278][  T270] RDX: 00005590e4eb80a0 RSI: 00005590e4eb503c RDI: 
000000000000000f
[   43.912279][  T270] RBP: 00005590e4eb70a0 R08: 0000000000000000 R09: 
00007f73c483a680
[   43.912280][  T270] R10: 0000000000000000 R11: 0000000000000246 R12: 
0000000000000000
[   43.912280][  T270] R13: 00007ffcad25f180 R14: 00005590e4eb6dd8 R15: 
00007f73c4869020
[   43.912284][  T270]  </TASK>
[   43.923741][  T270] ------------[ cut here ]------------
[   43.924127][  T270] WARNING: kernel/rcu/tree_plugin.h:443 at __rcu_read_unlock+0x117/0x210, CPU#0: memcg-repro/270
[   43.924968][  T270] Modules linked in:
[   43.925251][  T270] CPU: 0 UID: 0 PID: 270 Comm: memcg-repro Tainted: 
G        W           7.0.0-next-20260420+ #
[   43.926102][  T270] Tainted: [W]=WARN
[   43.926376][  T270] Hardware name: QEMU Standard PC (i440FX + PIIX, 
1996), BIOS 1.12.0-1 04/01/2014
[   43.927038][  T270] RIP: 0010:__rcu_read_unlock+0x117/0x210
[   43.927469][  T270] Code: 68 56 83 01 00 00 00 bf 09 00 00 00 e8 62 
da f1 ff 4d 85 ed 0f 84 27 ff ff ff e8 24 f7 5
[   43.928861][  T270] RSP: 0000:ffffc900041bfcf8 EFLAGS: 00010286
[   43.929292][  T270] RAX: 00000000ffffffff RBX: ffff888104619bc0 RCX: 
0000000000000027
[   43.929876][  T270] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
ffff8882b5a19780
[   43.930431][  T270] RBP: 0000000000000000 R08: 0000000000000000 R09: 
0000000000000001
[   43.931012][  T270] R10: ffffffffffffdfff R11: ffffc900041bf920 R12: 
ffff8881000f3ac0
[   43.931611][  T270] R13: 00005590e4eb8000 R14: 0000000000000001 R15: 
ffff888102fa2000
[   43.932188][  T270] FS:  00007f73c4641740(0000) 
GS:ffff8883324cb000(0000) knlGS:0000000000000000
[   43.932838][  T270] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   43.933301][  T270] CR2: 00005590e4eb8000 CR3: 00000001040d2000 CR4: 
00000000000006f0
[   43.933882][  T270] Call Trace:
[   43.934124][  T270]  <TASK>
[   43.934472][  T270]  do_pte_missing+0x233/0xb40
[   43.935004][  T270]  __handle_mm_fault+0x80e/0xcd0
[   43.935953][  T270]  handle_mm_fault+0x146/0x310
[   43.936462][  T270]  do_user_addr_fault+0x303/0x880
[   43.937078][  T270]  exc_page_fault+0x9b/0x270
[   43.937552][  T270]  asm_exc_page_fault+0x26/0x30
[   43.937918][  T270] RIP: 0033:0x5590e4eb41ea
[   43.938246][  T270] Code: 61 cc 66 0f 6f e0 66 0f 61 c2 66 0f db cd 
66 0f 69 e2 66 0f 6f d0 66 0f 69 d4 66 0f 61 0
[   43.939645][  T270] RSP: 002b:00007ffcad25f030 EFLAGS: 00010202
[   43.940075][  T270] RAX: 00005590e4eb8010 RBX: 00007ffcad260f7d RCX: 
00007f73c474d44d
[   43.940644][  T270] RDX: 00005590e4eb80a0 RSI: 00005590e4eb503c RDI: 
000000000000000f
[   43.941210][  T270] RBP: 00005590e4eb70a0 R08: 0000000000000000 R09: 
00007f73c483a680
[   43.941786][  T270] R10: 0000000000000000 R11: 0000000000000246 R12: 
0000000000000000
[   43.942351][  T270] R13: 00007ffcad25f180 R14: 00005590e4eb6dd8 R15: 
00007f73c4869020
[   43.943383][  T270]  </TASK>
[   43.943620][  T270] irq event stamp: 2975
[   43.943912][  T270] hardirqs last  enabled at (2975): 
[<ffffffff81312500>] raw_spin_rq_unlock_irq+0x10/0x30
[   43.944626][  T270] hardirqs last disabled at (2974): 
[<ffffffff820e83e5>] __schedule+0xd35/0x1df0
[   43.945270][  T270] softirqs last  enabled at (2048): 
[<ffffffff812c7f1e>] handle_softirqs+0x38e/0x460
[   43.945956][  T270] softirqs last disabled at (2031): 
[<ffffffff812c84c9>] irq_exit_rcu+0xe9/0x160
[   43.946625][  T270] ---[ end trace 0000000000000000 ]---
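
To make the log above easier to read, here is a minimal user-space sketch of
the suspected race. It is only a model, not kernel code: on_dfl stands in for
cgroup_subsys_on_dfl(memory_cgrp_subsys), rcu_depth for the RCU read-side
nesting depth, and model_start()/model_end()/model_pair() are hypothetical
names. It assumes, as discussed earlier in the thread, that both helpers skip
the RCU lock/unlock while memory is on the default hierarchy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state (assumptions, not kernel symbols). */
static bool on_dfl;   /* models cgroup_subsys_on_dfl(memory_cgrp_subsys) */
static int rcu_depth; /* models the RCU read-side nesting depth */

/* Models get_non_dying_memcg_start(): lock only when not on the default hierarchy. */
static void model_start(void)
{
	if (on_dfl)
		return;
	rcu_depth++; /* rcu_read_lock() */
}

/* Models get_non_dying_memcg_end(): re-reads the flag instead of remembering it. */
static void model_end(void)
{
	if (on_dfl)
		return;
	rcu_depth--; /* rcu_read_unlock() */
}

/*
 * Runs one start/end pair, optionally flipping the hierarchy in between
 * (this models a concurrent rebind_subsystems()).
 * Returns the net change in nesting depth; 0 means balanced.
 */
static int model_pair(bool dfl_at_start, bool rebind_in_between)
{
	on_dfl = dfl_at_start;
	rcu_depth = 0;

	model_start();
	if (rebind_in_between)
		on_dfl = !on_dfl;
	model_end();

	return rcu_depth;
}
```

In this model, model_pair(true, true) returns -1, which corresponds to the
"bad unlock balance" splat above (an unlock with no matching lock), and
model_pair(false, true) returns 1, i.e. a read-side critical section that is
never closed. Without a rebind in between, both cases return 0.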

> 
>>
>>> However, in a production environment, this is practically impossible.
>>
>> Can you expand on this?
>>
>> syzbot isn't a production environment ;)
> 
> Rebinding only works when the hierarchy is completely empty. This is
> generally not the case in a production environment (e.g. when systemd
> is used).
> 
> BTW, it seems rebinding is about to be deprecated:
> 
> cgroup1_reconfigure
> --> pr_warn("option changes via remount are deprecated (pid=%d comm=%s)\n",
>              task_tgid_nr(current), current->comm);
> 
> Also, it appears the current memcg subsystem assumes that
> cgroup_subsys_on_dfl(memory_cgrp_subsys) cannot be changed at runtime.
> (Please correct me if I missed anything.)
> 
> If we can get a reproducer, we can try the following fix, or simply drop
> rebinding altogether?
> 
>  From 6ae41b91339625dd7bf0f819f775f26e78171a73 Mon Sep 17 00:00:00 2001
> From: Qi Zheng <zhengqi.arch@bytedance.com>
> Date: Mon, 27 Apr 2026 11:20:21 +0800
> Subject: [PATCH] mm: memcontrol: fix rcu unbalance in
>   get_non_dying_memcg_end()
> 
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
>   mm/memcontrol.c | 30 ++++++++++++++++++++----------
>   1 file changed, 20 insertions(+), 10 deletions(-)

With the above patch applied, the warnings are gone.

If no one objects, I'll submit the formal fix. Or should we actually
just remove rebinding instead?
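
For illustration only, one possible shape of a fix is to have the start helper
report whether it actually took the RCU read lock, and have the end helper act
on that instead of re-reading the hierarchy state. The sketch below is a
user-space model under that assumption; it is not the patch referenced above,
and all names in it (on_dfl, rcu_depth, model_*_tracked) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state (assumptions, not kernel symbols). */
static bool on_dfl;   /* models cgroup_subsys_on_dfl(memory_cgrp_subsys) */
static int rcu_depth; /* models the RCU read-side nesting depth */

/* Start helper: returns true only if it took the (modelled) RCU read lock. */
static bool model_start_tracked(void)
{
	if (on_dfl)
		return false;
	rcu_depth++; /* rcu_read_lock() */
	return true;
}

/* End helper: trusts the caller-provided decision, never re-reads on_dfl. */
static void model_end_tracked(bool locked)
{
	if (!locked)
		return;
	assert(rcu_depth > 0); /* an unlock must always match a lock */
	rcu_depth--; /* rcu_read_unlock() */
}

/*
 * Runs one start/end pair, optionally flipping the hierarchy in between.
 * Returns the net change in nesting depth; 0 means balanced.
 */
static int model_pair_tracked(bool dfl_at_start, bool rebind_in_between)
{
	bool locked;

	on_dfl = dfl_at_start;
	rcu_depth = 0;

	locked = model_start_tracked();
	if (rebind_in_between)
		on_dfl = !on_dfl;
	model_end_tracked(locked);

	return rcu_depth;
}
```

All four combinations of arguments return 0 here. Note that this only models
the lock balance; whether the memcg picked at the start is still the right one
after a rebind is a separate question that the model does not cover.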

Thanks,
Qi

=====
Repro
=====


kernel diff
-----------
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c3d98ab41f1f1..419883a483e32 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -36,6 +36,7 @@
  #include <linux/pagemap.h>
  #include <linux/folio_batch.h>
  #include <linux/vm_event_item.h>
+#include <linux/delay.h>
  #include <linux/smp.h>
  #include <linux/page-flags.h>
  #include <linux/backing-dev.h>
@@ -805,6 +806,28 @@ static long memcg_state_val_in_pages(int idx, long val)
   * Used in mod_memcg_state() and mod_memcg_lruvec_state() to avoid race with
   * reparenting of non-hierarchical state_locals.
   */
+static __always_inline bool memcg_rcu_repro_task(void)
+{
+       return !strncmp(current->comm, "memcg-repro", TASK_COMM_LEN);
+}
+
+static noinline void memcg_rcu_repro_pause(void)
+{
+       if (memcg_rcu_repro_task())
+               mdelay(200);
+}
+
+static noinline void memcg_rcu_repro_check(const char *site, int depth_before)
+{
+       bool key_on_dfl = cgroup_subsys_on_dfl(memory_cgrp_subsys);
+       bool rcu_locked = rcu_preempt_depth() != depth_before;
+
+       WARN_ON_ONCE(memcg_rcu_repro_task() && key_on_dfl == rcu_locked);
+       if (memcg_rcu_repro_task() && key_on_dfl == rcu_locked)
+               pr_warn("%s: key_on_dfl=%d rcu_locked=%d depth_before=%d depth_now=%d\n",
+                       site, key_on_dfl, rcu_locked, depth_before, rcu_preempt_depth());
+}
+
  static inline struct mem_cgroup *get_non_dying_memcg_start(struct mem_cgroup *memcg)
  {
         if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
@@ -865,10 +888,15 @@ static void __mod_memcg_state(struct mem_cgroup *memcg,
  void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
                        int val)
  {
+       int depth_before;
+
         if (mem_cgroup_disabled())
                 return;

+       depth_before = rcu_preempt_depth();
         memcg = get_non_dying_memcg_start(memcg);
+       memcg_rcu_repro_pause();
+       memcg_rcu_repro_check(__func__, depth_before);
         __mod_memcg_state(memcg, idx, val);
         get_non_dying_memcg_end();
  }
@@ -932,10 +960,14 @@ static void mod_memcg_lruvec_state(struct lruvec *lruvec,
  {
         struct pglist_data *pgdat = lruvec_pgdat(lruvec);
         struct mem_cgroup_per_node *pn;
+       int depth_before;
         struct mem_cgroup *memcg;

         pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+       depth_before = rcu_preempt_depth();
         memcg = get_non_dying_memcg_start(pn->memcg);
+       memcg_rcu_repro_pause();
+       memcg_rcu_repro_check(__func__, depth_before);
         pn = memcg->nodeinfo[pgdat->node_id];

         __mod_memcg_lruvec_state(pn, idx, val);
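
The predicate used by memcg_rcu_repro_check() in the diff above can be read in
isolation. The following is a small user-space restatement of it, assuming
that a consistent state is "on the default hierarchy and no lock taken" or
"on a v1 hierarchy and lock taken". The function name repro_state_inconsistent()
is hypothetical; only the comparison itself is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space restatement of the check in the debug diff:
 * the start helper is expected to take the RCU read lock exactly when
 * memory is NOT on the default hierarchy, so key_on_dfl and rcu_locked
 * should always differ. Equal values mean the hierarchy changed after
 * the start helper made its decision.
 */
static bool repro_state_inconsistent(bool key_on_dfl, int depth_before,
				     int depth_now)
{
	/* The lock was taken iff the nesting depth changed across start. */
	bool rcu_locked = depth_now != depth_before;

	return key_on_dfl == rcu_locked;
}
```

With the values from the log (key_on_dfl=0, depth_before=2, depth_now=2) this
returns true: the memory controller is on a v1 hierarchy, yet no lock was
taken. The check depends on rcu_preempt_depth() actually tracking nesting,
which is presumably why the script recommends CONFIG_PREEMPT_RCU=y.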


/root/memcg-rcu-unbalance-repro.c
---------------------------------
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/prctl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static void die(const char *msg)
{
         perror(msg);
         exit(1);
}

static void ensure_parent_dir(const char *path)
{
         char tmp[PATH_MAX];
         char *slash;

         if (strlen(path) >= sizeof(tmp)) {
                 /* die() uses perror(), so set a matching errno first */
                 errno = ENAMETOOLONG;
                 die("path too long");
         }

         strcpy(tmp, path);
         slash = strrchr(tmp, '/');
         if (!slash)
                 return;

         while (slash > tmp && *slash == '/')
                 *slash-- = '\0';
         if (slash < tmp)
                 return;
         *++slash = '\0';

         for (slash = tmp + 1; *slash; slash++) {
                 if (*slash != '/')
                         continue;
                 *slash = '\0';
                 if (mkdir(tmp, 0755) < 0 && errno != EEXIST)
                         die("mkdir");
                 *slash = '/';
         }

         if (mkdir(tmp, 0755) < 0 && errno != EEXIST)
                 die("mkdir");
}

static void reset_file(int fd, off_t *off)
{
         if (ftruncate(fd, 0) < 0)
                 die("ftruncate");
         *off = 0;
}

static void socket_roundtrip(int txfd, int rxfd, const void *buf, size_t len)
{
         char rxbuf[4096];
         ssize_t n;

         for (;;) {
                 n = send(txfd, buf, len, 0);
                 if (n >= 0)
                         break;
                 if (errno != EINTR)
                         die("send");
         }
         if ((size_t)n != len) {
                 errno = EIO;
                 die("send");
         }

         for (;;) {
                 n = recv(rxfd, rxbuf, sizeof(rxbuf), 0);
                 if (n >= 0)
                         break;
                 if (errno != EINTR)
                         die("recv");
         }
         if ((size_t)n != len) {
                 errno = EIO;
                 die("recv");
         }
}

int main(int argc, char **argv)
{
         const char *path = argc > 1 ? argv[1] : "/tmp/memcg-rcu-repro.file";
         static char buf[4096];
         off_t off = 0;
         off_t max = 16LL * 1024 * 1024;
         int fd;
         int sv[2];
         int i;

         if (prctl(PR_SET_NAME, "memcg-repro", 0, 0, 0) < 0)
                 die("prctl(PR_SET_NAME)");

         for (i = 0; i < (int)sizeof(buf); i++)
                 buf[i] = (char)i;

         ensure_parent_dir(path);
         fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
         if (fd < 0)
                 die("open");
         if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
                 die("socketpair");

         for (;;) {
                 ssize_t n = pwrite(fd, buf, sizeof(buf), off);

                 if (n != (ssize_t)sizeof(buf)) {
                         if (n < 0 && errno == EINTR)
                                 continue;
                         if (n < 0 && (errno == ENOSPC || errno == EDQUOT)) {
                                 reset_file(fd, &off);
                                 continue;
                         }
                         die("pwrite");
                 }

                 off += sizeof(buf);
                 if ((off & ((1 << 20) - 1)) == 0) {
                         if (fsync(fd) < 0) {
                                 if (errno == EINTR)
                                         continue;
                                 if (errno == ENOSPC || errno == EDQUOT) {
                                         reset_file(fd, &off);
                                         continue;
                                 }
                                 die("fsync");
                         }
                 }

                 if (off >= max)
                         reset_file(fd, &off);

                 for (i = 0; i < 16; i++)
                         socket_roundtrip(sv[0], sv[1], buf, sizeof(buf));
         }
}

/root/memcg-rcu-unbalance-repro.sh
----------------------------------

#!/bin/sh
set -eu

WORKER_SRC="/root/memcg-rcu-unbalance-repro.c"
WORKER_BIN="/root/memcg-rcu-unbalance-repro"
WORKER_BIN_FALLBACK="/tmp/memcg-rcu-unbalance-repro"
WORKDIR="/tmp/memcg-rcu-repro"
CGV2_PROBE_MNT="$WORKDIR/cgv2-probe"
DATA_FILE="$WORKDIR/repro.file"
CG_MNT="/sys/fs/cgroup"
REPRO_HIER_NAME="memcg-rcu-repro"
RESTORE_CGROUP2_ON_EXIT=0
WORKER_CPU=""
V1_HOLD_MS="${V1_HOLD_MS:-800}"
V2_HOLD_MS="${V2_HOLD_MS:-50}"

need_root() {
     if [ "$(id -u)" -ne 0 ]; then
         echo "must run as root" >&2
         exit 1
     fi
}

is_mounted() {
     grep -Fqs " $1 " /proc/self/mountinfo
}

mount_fstype() {
     awk -v mountpoint="$1" '
         $5 == mountpoint {
             for (i = 1; i <= NF; i++) {
                 if ($i == "-") {
                     print $(i + 1)
                     exit
                 }
             }
         }
     ' /proc/self/mountinfo
}

setup_early_boot_env() {
     mount -o remount,rw / >/dev/null 2>&1 || true

     [ -d /proc ] || mkdir -p /proc
     [ -d /sys ] || mkdir -p /sys
     [ -d /dev ] || mkdir -p /dev
     [ -d /tmp ] || mkdir -p /tmp

     is_mounted /proc || mount -t proc proc /proc
     is_mounted /sys || mount -t sysfs sysfs /sys

     if ! is_mounted /dev && grep -qw devtmpfs /proc/filesystems 2>/dev/null; then
         mount -t devtmpfs devtmpfs /dev >/dev/null 2>&1 || true
     fi
}

need_memory_controller() {
     if [ -r /proc/cgroups ] &&
        awk '$1 == "memory" && $4 == 1 { found = 1 } END { exit found ? 0 : 1 }' /proc/cgroups; then
         return 0
     fi

     echo "memory controller not available; expected an enabled memory entry in /proc/cgroups" >&2
     exit 1
}

count_child_cgroups() {
     mountpoint="$1"
     count=0

     for d in "$mountpoint"/*; do
         [ -d "$d" ] || continue
         count=$((count + 1))
     done

     echo "$count"
}

umount_if_mounted() {
     if is_mounted "$1"; then
         umount "$1"
     fi
}

mount_cgroup2_probe() {
     if [ "$(mount_fstype "$CG_MNT")" = "cgroup2" ]; then
         echo "$CG_MNT"
         return 0
     fi

     umount_if_mounted "$CGV2_PROBE_MNT"
     mount -t cgroup2 none "$CGV2_PROBE_MNT"
     echo "$CGV2_PROBE_MNT"
}

mount_named_cgroup1_root() {
     umount_if_mounted "$CG_MNT"
     mount -t cgroup -o "none,name=$REPRO_HIER_NAME" none "$CG_MNT"
}

remount_memory_to_v1() {
     mount -t cgroup -o "remount,memory,name=$REPRO_HIER_NAME" none "$CG_MNT"
}

remount_memory_to_v2() {
     mount -t cgroup -o "remount,none,name=$REPRO_HIER_NAME" none "$CG_MNT"
}

sleep_ms() {
     ms="$1"

     if [ "$ms" -le 0 ]; then
         return 0
     fi

     if command -v usleep >/dev/null 2>&1; then
         usleep $((ms * 1000))
         return 0
     fi

     if command -v busybox >/dev/null 2>&1 && busybox usleep 1000 >/dev/null 2>&1; then
         busybox usleep $((ms * 1000))
         return 0
     fi

     if [ $((ms % 1000)) -eq 0 ]; then
         sleep $((ms / 1000))
         return 0
     fi

     sleep "$(printf '%d.%03d' $((ms / 1000)) $((ms % 1000)))"
}

cleanup() {
     set +e
     if [ -n "${WORKER_PID:-}" ]; then
         kill "$WORKER_PID" 2>/dev/null || true
         wait "$WORKER_PID" 2>/dev/null || true
     fi
     umount_if_mounted "$CGV2_PROBE_MNT"
     if [ "$RESTORE_CGROUP2_ON_EXIT" -eq 1 ]; then
         umount_if_mounted "$CG_MNT"
         mount -t cgroup2 none "$CG_MNT" >/dev/null 2>&1 || true
     fi
}

prepare_worker() {
     if [ -x "$WORKER_BIN" ]; then
         return 0
     fi

     if [ -x "$WORKER_BIN_FALLBACK" ]; then
         WORKER_BIN="$WORKER_BIN_FALLBACK"
         return 0
     fi

     if ! command -v cc >/dev/null 2>&1; then
         echo "no usable worker binary and no compiler in current environment" >&2
         echo "prebuild it before reboot with:" >&2
         echo "  cc -O2 -Wall -Wextra -o $WORKER_BIN $WORKER_SRC" >&2
         exit 1
     fi

     if cc -O2 -Wall -Wextra -o "$WORKER_BIN" "$WORKER_SRC"; then
         return 0
     fi

     echo "failed to compile worker in early-boot shell" >&2
     echo "prebuild it before reboot with:" >&2
     echo "  cc -O2 -Wall -Wextra -o $WORKER_BIN $WORKER_SRC" >&2
     exit 1
}

wait_for_worker_ready() {
     tries=0

     while [ "$tries" -lt 5 ]; do
         if kill -0 "$WORKER_PID" 2>/dev/null &&
            [ -r "/proc/$WORKER_PID/comm" ] &&
            grep -qx "memcg-repro" "/proc/$WORKER_PID/comm" &&
            [ -s "$DATA_FILE" ]; then
             return 0
         fi
         tries=$((tries + 1))
         sleep 1
     done

     echo "worker failed to become ready before remount loop" >&2
     if [ -r "/proc/$WORKER_PID/comm" ]; then
         echo "worker pid=$WORKER_PID comm=$(cat "/proc/$WORKER_PID/comm")" >&2
     else
         echo "worker pid=$WORKER_PID is not alive" >&2
     fi
     exit 1
}

need_root
setup_early_boot_env
mkdir -p "$WORKDIR" "$CGV2_PROBE_MNT"
trap cleanup EXIT INT TERM

if [ ! -d "$CG_MNT" ]; then
     mkdir -p "$CG_MNT"
fi

need_memory_controller
CGV2_CHECK_MNT="$(mount_cgroup2_probe)"
if [ ! -r "$CGV2_CHECK_MNT/cgroup.controllers" ] ||
    ! grep -qw memory "$CGV2_CHECK_MNT/cgroup.controllers"; then
     echo "memory controller is not on the default cgroup v2 hierarchy before repro" >&2
     echo "run this in early boot before anything binds memory to a legacy v1 hierarchy" >&2
     exit 1
fi

child_count="$(count_child_cgroups "$CGV2_CHECK_MNT")"
if [ "$child_count" -ne 0 ]; then
     echo "cgroup2 root already has child cgroups; memory rebind to v1 will likely hit -EBUSY" >&2
     echo "run this in a minimal initramfs or early-boot shell with no non-root cgroups" >&2
     exit 1
fi

if [ "$CGV2_CHECK_MNT" = "$CGV2_PROBE_MNT" ]; then
     umount_if_mounted "$CGV2_PROBE_MNT"
fi

mount_named_cgroup1_root
RESTORE_CGROUP2_ON_EXIT=1

prepare_worker

if command -v nproc >/dev/null 2>&1 && command -v taskset >/dev/null 2>&1; then
     if [ "$(nproc)" -ge 2 ]; then
         taskset -pc 1 $$ >/dev/null 2>&1 || true
         WORKER_CPU="0"
     else
         WORKER_CPU=""
     fi
else
     WORKER_CPU=""
fi

echo "apply the kernel patch in /root/memcg-rcu-unbalance-repro.patch before running this script"
echo "recommended kernel config: CONFIG_MEMCG=y CONFIG_MEMCG_V1=y CONFIG_PREEMPT_RCU=y"
echo "recommended boot param: panic_on_warn=1"
echo "worker binary: $WORKER_BIN"
echo "repro hierarchy: name=$REPRO_HIER_NAME mountpoint=$CG_MNT"
echo "remount cadence: v2=${V2_HOLD_MS}ms v1=${V1_HOLD_MS}ms"

if [ -n "$WORKER_CPU" ]; then
     taskset -c "$WORKER_CPU" "$WORKER_BIN" "$DATA_FILE" &
else
     "$WORKER_BIN" "$DATA_FILE" &
fi
WORKER_PID=$!
wait_for_worker_ready

echo "worker pid=$WORKER_PID comm=$(cat "/proc/$WORKER_PID/comm") data_file=$DATA_FILE"
echo "cgroup v1 remount/rebind loop starting; watch dmesg for:"
echo "  option changes via remount are deprecated"
echo "  mod_memcg_state: key_on_dfl=0 rcu_locked=0 depth_before=0 depth_now=0"
echo "  WARN.*memcg_rcu_repro_check"
echo "  Voluntary context switch within RCU read-side critical section"
echo "  rcu_read_unlock.*underflow / bad unlock"

i=0
while :; do
     i=$((i + 1))
     remount_memory_to_v2
     sleep_ms "$V2_HOLD_MS"
     remount_memory_to_v1
     sleep_ms "$V1_HOLD_MS"
     if [ $((i % 10)) -eq 0 ]; then
         echo "completed $i rebind cycles"
     fi
done



Thread overview: 10+ messages
2026-04-26  8:17 [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page syzbot
2026-04-26 10:49 ` Andrew Morton
2026-04-26 15:57   ` Qi Zheng
2026-04-26 17:55     ` Andrew Morton
2026-04-27  7:24       ` Qi Zheng
2026-04-27  9:43         ` Qi Zheng [this message]
2026-04-27 10:44           ` Andrew Morton
2026-04-27 10:57             ` Qi Zheng
2026-04-27 10:43         ` Andrew Morton
2026-04-27 10:54           ` Qi Zheng
