* [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page
@ 2026-04-26 8:17 syzbot
2026-04-26 10:49 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: syzbot @ 2026-04-26 8:17 UTC (permalink / raw)
To: Liam.Howlett, akpm, linux-kernel, linux-mm, ljs, shakeel.butt,
surenb, syzkaller-bugs, vbabka
Hello,
syzbot found the following issue on:
HEAD commit: 6596a02b2078 Merge tag 'drm-next-2026-04-22' of https://gi..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12483702580000
kernel config: https://syzkaller.appspot.com/x/.config?x=24c8da4692f901cb
dashboard link: https://syzkaller.appspot.com/bug?extid=7d60b33a8a546263da7c
compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
userspace arch: i386
Unfortunately, I don't have any reproducer for this issue yet.
Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-6596a02b.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/c5a50c04af50/vmlinux-6596a02b.xz
kernel image: https://storage.googleapis.com/syzbot-assets/70da0dbf8561/bzImage-6596a02b.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+7d60b33a8a546263da7c@syzkaller.appspotmail.com
=====================================
WARNING: bad unlock balance detected!
syzkaller #0 Not tainted
-------------------------------------
dhcpcd-run-hook/5941 is trying to release lock (rcu_read_lock) at:
[<ffffffff8258a32d>] rcu_read_unlock+0x2d/0xb0 include/linux/rcupdate.h:867
but there are no more locks to release!
other info that might help us debug this:
1 lock held by dhcpcd-run-hook/5941:
#0: ffff8880440f3d48 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0x11d/0x590 mm/mmap_lock.c:310
stack backtrace:
CPU: 2 UID: 0 PID: 5941 Comm: dhcpcd-run-hook Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
print_unlock_imbalance_bug.part.0+0xfb/0x106 kernel/locking/lockdep.c:5298
print_unlock_imbalance_bug kernel/locking/lockdep.c:5278 [inline]
__lock_release kernel/locking/lockdep.c:5537 [inline]
lock_release kernel/locking/lockdep.c:5889 [inline]
lock_release+0x28d/0x310 kernel/locking/lockdep.c:5875
rcu_read_unlock+0x32/0xb0 include/linux/rcupdate.h:867
pte_unmap include/linux/pgtable.h:117 [inline]
wp_page_copy mm/memory.c:3960 [inline]
do_wp_page+0x13d7/0x4350 mm/memory.c:4320
handle_pte_fault mm/memory.c:6427 [inline]
__handle_mm_fault+0x1ab6/0x2a00 mm/memory.c:6549
handle_mm_fault+0x36d/0xa20 mm/memory.c:6718
do_user_addr_fault+0x5a3/0x12f0 arch/x86/mm/fault.c:1334
handle_page_fault arch/x86/mm/fault.c:1474 [inline]
exc_page_fault+0x6f/0xd0 arch/x86/mm/fault.c:1527
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618
RIP: 0033:0x7f7501c33f87
Code: 5c 25 28 49 8b 57 10 48 85 db 74 27 8b 74 24 0c 23 73 08 44 39 f6 75 16 48 39 c2 75 05 e8 62 ff ff ff 48 8b 53 10 48 83 c0 08 <48> 89 50 f8 48 8b 1b eb d0 49 83 c4 08 49 81 fc 38 01 00 00 75 be
RSP: 002b:00007ffd29921270 EFLAGS: 00010212
RAX: 00005645dcc81140 RBX: 00005645dcc716d0 RCX: 0000000000000002
RDX: 00007ffd29925f83 RSI: 0000000000000001 RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000008
R13: 00005645dcc70e60 R14: 0000000000000001 R15: 00005645dcc70c30
</TASK>
------------[ cut here ]------------
rrln < 0 || rrln > RCU_NEST_PMAX
WARNING: kernel/rcu/tree_plugin.h:443 at __rcu_read_unlock kernel/rcu/tree_plugin.h:443 [inline], CPU#3: dhcpcd-run-hook/5941
WARNING: kernel/rcu/tree_plugin.h:443 at __rcu_read_unlock+0x235/0x5e0 kernel/rcu/tree_plugin.h:430, CPU#3: dhcpcd-run-hook/5941
Modules linked in:
CPU: 3 UID: 0 PID: 5941 Comm: dhcpcd-run-hook Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:__rcu_read_unlock kernel/rcu/tree_plugin.h:443 [inline]
RIP: 0010:__rcu_read_unlock+0x235/0x5e0 kernel/rcu/tree_plugin.h:430
Code: 74 11 c7 45 58 01 00 00 00 bf 09 00 00 00 e8 12 a5 da ff e8 9d e2 22 00 9c 58 f6 c4 02 0f 85 dd 02 00 00 fb e9 57 fe ff ff 90 <0f> 0b 90 5b 5d 41 5c 41 5d 41 5e 41 5f e9 d9 14 ad 09 e8 44 64 87
RSP: 0000:ffffc900049c7af0 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff888029ec4a00 RCX: ffffffff81e80bfe
RDX: 0000000000000000 RSI: ffffffff8df2fec2 RDI: ffff888029ec4ec4
RBP: 0000000000000001 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000012 R12: ffff88802ad93408
R13: ffffea0000ad7800 R14: 0000000000000000 R15: ffffea0000ad7800
FS: 00007f7501945c80(0000) GS:ffff8880973e2000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005645dcc70950 CR3: 00000000297d6000 CR4: 0000000000352ef0
Call Trace:
<TASK>
pte_unmap include/linux/pgtable.h:117 [inline]
wp_page_copy mm/memory.c:3960 [inline]
do_wp_page+0x13d7/0x4350 mm/memory.c:4320
handle_pte_fault mm/memory.c:6427 [inline]
__handle_mm_fault+0x1ab6/0x2a00 mm/memory.c:6549
handle_mm_fault+0x36d/0xa20 mm/memory.c:6718
do_user_addr_fault+0x5a3/0x12f0 arch/x86/mm/fault.c:1334
handle_page_fault arch/x86/mm/fault.c:1474 [inline]
exc_page_fault+0x6f/0xd0 arch/x86/mm/fault.c:1527
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618
RIP: 0033:0x7f7501c33f87
Code: 5c 25 28 49 8b 57 10 48 85 db 74 27 8b 74 24 0c 23 73 08 44 39 f6 75 16 48 39 c2 75 05 e8 62 ff ff ff 48 8b 53 10 48 83 c0 08 <48> 89 50 f8 48 8b 1b eb d0 49 83 c4 08 49 81 fc 38 01 00 00 75 be
RSP: 002b:00007ffd29921270 EFLAGS: 00010212
RAX: 00005645dcc81140 RBX: 00005645dcc716d0 RCX: 0000000000000002
RDX: 00007ffd29925f83 RSI: 0000000000000001 RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000008
R13: 00005645dcc70e60 R14: 0000000000000001 R15: 00005645dcc70c30
</TASK>
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page
  2026-04-26  8:17 [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page syzbot
@ 2026-04-26 10:49 ` Andrew Morton
  2026-04-26 15:57   ` Qi Zheng
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2026-04-26 10:49 UTC (permalink / raw)
  To: syzbot
  Cc: Liam.Howlett, linux-kernel, linux-mm, ljs, shakeel.butt, surenb,
	syzkaller-bugs, vbabka, Muchun Song, Qi Zheng

On Sun, 26 Apr 2026 01:17:25 -0700 syzbot <syzbot+7d60b33a8a546263da7c@syzkaller.appspotmail.com> wrote:

> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    6596a02b2078 Merge tag 'drm-next-2026-04-22' of https://gi..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12483702580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=24c8da4692f901cb
> dashboard link: https://syzkaller.appspot.com/bug?extid=7d60b33a8a546263da7c
> compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> userspace arch: i386
>
> Unfortunately, I don't have any reproducer for this issue yet.

argh, that dreaded sentence.

Thanks.

Something's definitely amiss.  This is at least the fifth report of
rcu_read_lock() imbalance post-7.0.  Others:

https://lore.kernel.org/69eab803.a00a0220.17a17.004a.GAE@google.com
https://lore.kernel.org/69eab803.a00a0220.17a17.004b.GAE@google.com
https://lore.kernel.org/69eafb0e.a00a0220.9259.0031.GAE@google.com
https://lore.kernel.org/69ebcbe2.a00a0220.7773.0005.GAE@google.com

In some cases we released it too often, in other cases we failed to
release it.

The first one is slightly more useful in that it tells us that the
not-released rcu_read_lock() was taken in folio_lruvec_lock_irqsave().

Muchun & Qi: you played with that rcu locking in 31b54a5e8916.  Can you
please double-check that we didn't miss something?

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page
  2026-04-26 10:49 ` Andrew Morton
@ 2026-04-26 15:57   ` Qi Zheng
  2026-04-26 17:55     ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Qi Zheng @ 2026-04-26 15:57 UTC (permalink / raw)
  To: Andrew Morton, shakeel.butt, syzbot
  Cc: Liam.Howlett, linux-kernel, linux-mm, ljs, surenb,
	syzkaller-bugs, vbabka, Muchun Song

Hi Andrew,

On 4/26/26 6:49 PM, Andrew Morton wrote:
> On Sun, 26 Apr 2026 01:17:25 -0700 syzbot <syzbot+7d60b33a8a546263da7c@syzkaller.appspotmail.com> wrote:
> 
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    6596a02b2078 Merge tag 'drm-next-2026-04-22' of https://gi..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=12483702580000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=24c8da4692f901cb
>> dashboard link: https://syzkaller.appspot.com/bug?extid=7d60b33a8a546263da7c
>> compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
>> userspace arch: i386
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
> 
> argh, that dreaded sentence.
> 
> Thanks.
> 
> Something's definitely amiss.  This is at least the fifth report of
> rcu_read_lock() imbalance post-7.0.  Others:
> 
> https://lore.kernel.org/69eab803.a00a0220.17a17.004a.GAE@google.com
> https://lore.kernel.org/69eab803.a00a0220.17a17.004b.GAE@google.com
> https://lore.kernel.org/69eafb0e.a00a0220.9259.0031.GAE@google.com
> https://lore.kernel.org/69ebcbe2.a00a0220.7773.0005.GAE@google.com

All the kernel configs mentioned above include 'CONFIG_MEMCG_V1=y'.

Theoretically, a rebind_subsystems() can lead to an rcu imbalance, see my
previous discussion with Shakeel for details:

https://lore.kernel.org/all/358c60e1-fa91-40a1-9e00-84c93340c04e@linux.dev/

However, in a production environment, this is practically impossible.
So Shakeel and I chose to wait for a reproducer at the time. :(

> 
> In some cases we released it too often, in other cases we failed to
> release it.
> 
> The first one is slightly more useful in that it tells us that the
> not-released rcu_read_lock() was taken in folio_lruvec_lock_irqsave().

I double-checked some callers of folio_lruvec_lock_irqsave() (such as
folios_put_refs()), but didn't find anything suspicious. :(

> 
> Muchun & Qi: you played with that rcu locking in 31b54a5e8916.  Can you
> please double-check that we didn't miss something?

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page
  2026-04-26 15:57   ` Qi Zheng
@ 2026-04-26 17:55     ` Andrew Morton
  2026-04-27  7:24       ` Qi Zheng
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2026-04-26 17:55 UTC (permalink / raw)
  To: Qi Zheng
  Cc: shakeel.butt, syzbot, Liam.Howlett, linux-kernel, linux-mm, ljs,
	surenb, syzkaller-bugs, vbabka, Muchun Song

On Sun, 26 Apr 2026 23:57:42 +0800 Qi Zheng <qi.zheng@linux.dev> wrote:

> Hi Andrew,
> 
> On 4/26/26 6:49 PM, Andrew Morton wrote:
> > On Sun, 26 Apr 2026 01:17:25 -0700 syzbot <syzbot+7d60b33a8a546263da7c@syzkaller.appspotmail.com> wrote:
> > 
> >> Hello,
> >>
> >> syzbot found the following issue on:
> >>
> >> HEAD commit:    6596a02b2078 Merge tag 'drm-next-2026-04-22' of https://gi..
> >> git tree:       upstream
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=12483702580000
> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=24c8da4692f901cb
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=7d60b33a8a546263da7c
> >> compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> >> userspace arch: i386
> >>
> >> Unfortunately, I don't have any reproducer for this issue yet.
> > 
> > argh, that dreaded sentence.
> > 
> > Thanks.
> > 
> > Something's definitely amiss.  This is at least the fifth report of
> > rcu_read_lock() imbalance post-7.0.  Others:
> > 
> > https://lore.kernel.org/69eab803.a00a0220.17a17.004a.GAE@google.com
> > https://lore.kernel.org/69eab803.a00a0220.17a17.004b.GAE@google.com
> > https://lore.kernel.org/69eafb0e.a00a0220.9259.0031.GAE@google.com
> > https://lore.kernel.org/69ebcbe2.a00a0220.7773.0005.GAE@google.com
> 
> All the kernel configs mentioned above include 'CONFIG_MEMCG_V1=y'.
> 
> Theoretically, a rebind_subsystems() can lead to an rcu imbalance, see my
> previous discussion with Shakeel for details:
> 
> https://lore.kernel.org/all/358c60e1-fa91-40a1-9e00-84c93340c04e@linux.dev/

Right, that looks similar.

The rcu locking under lruvec_stat_mod_folio() is very simple, and that
return in get_non_dying_memcg_end() does look super suspicious.  Why
does it omit the unlock?

otoh, in
https://lore.kernel.org/all/69eafb0e.a00a0220.9259.0031.GAE@google.com/
we're trying to release an rcu_read_lock() which isn't presently held.
But if cgroup_subsys_on_dfl() were to become false between the
get_non_dying_memcg_start/end pair, that's what would happen.

So yup, I agree, concurrent rebind_subsystems() activity could cause
all of this.  The reports are pretty common - is there some debugging
patch we can temporarily add to confirm this theory?  And/or is it
possible to cook up a selftest which will trigger this?

> However, in a production environment, this is practically impossible.

Can you expand on this?

syzbot isn't a production environment ;)

> So Shakeel and I chose to wait for a reproducer at the time. :(
> 
> > 
> > In some cases we released it too often, in other cases we failed to
> > release it.
> > 
> > The first one is slightly more useful in that it tells us that the
> > not-released rcu_read_lock() was taken in folio_lruvec_lock_irqsave().
> 
> I double-checked some callers of folio_lruvec_lock_irqsave() (such as
> folios_put_refs()), but didn't find anything suspicious. :(

Right - it's rare and smells of a race condition.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page
  2026-04-26 17:55     ` Andrew Morton
@ 2026-04-27  7:24       ` Qi Zheng
  2026-04-27  9:43         ` Qi Zheng
  2026-04-27 10:43         ` Andrew Morton
  1 sibling, 2 replies; 10+ messages in thread
From: Qi Zheng @ 2026-04-27 7:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: shakeel.butt, syzbot, Liam.Howlett, linux-kernel, linux-mm, ljs,
	surenb, syzkaller-bugs, vbabka, Muchun Song

On 4/27/26 1:55 AM, Andrew Morton wrote:
> On Sun, 26 Apr 2026 23:57:42 +0800 Qi Zheng <qi.zheng@linux.dev> wrote:
> 
>> Hi Andrew,
>>
>> On 4/26/26 6:49 PM, Andrew Morton wrote:
>>> On Sun, 26 Apr 2026 01:17:25 -0700 syzbot <syzbot+7d60b33a8a546263da7c@syzkaller.appspotmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> syzbot found the following issue on:
>>>>
>>>> HEAD commit:    6596a02b2078 Merge tag 'drm-next-2026-04-22' of https://gi..
>>>> git tree:       upstream
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12483702580000
>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=24c8da4692f901cb
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=7d60b33a8a546263da7c
>>>> compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
>>>> userspace arch: i386
>>>>
>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>
>>> argh, that dreaded sentence.
>>>
>>> Thanks.
>>>
>>> Something's definitely amiss.  This is at least the fifth report of
>>> rcu_read_lock() imbalance post-7.0.  Others:
>>>
>>> https://lore.kernel.org/69eab803.a00a0220.17a17.004a.GAE@google.com
>>> https://lore.kernel.org/69eab803.a00a0220.17a17.004b.GAE@google.com
>>> https://lore.kernel.org/69eafb0e.a00a0220.9259.0031.GAE@google.com
>>> https://lore.kernel.org/69ebcbe2.a00a0220.7773.0005.GAE@google.com
>>
>> All the kernel configs mentioned above include 'CONFIG_MEMCG_V1=y'.
>>
>> Theoretically, a rebind_subsystems() can lead to an rcu imbalance, see my
>> previous discussion with Shakeel for details:
>>
>> https://lore.kernel.org/all/358c60e1-fa91-40a1-9e00-84c93340c04e@linux.dev/
> 
> Right, that looks similar.
> 
> The rcu locking under lruvec_stat_mod_folio() is very simple, and that
> return in get_non_dying_memcg_end() does look super suspicious.  Why
> does it omit the unlock?
> 
> otoh, in
> https://lore.kernel.org/all/69eafb0e.a00a0220.9259.0031.GAE@google.com/
> we're trying to release an rcu_read_lock() which isn't presently held.
> But if cgroup_subsys_on_dfl() were to become false between the
> get_non_dying_memcg_start/end pair, that's what would happen.
> 
> So yup, I agree, concurrent rebind_subsystems() activity could cause
> all of this.  The reports are pretty common - is there some debugging
> patch we can temporarily add to confirm this theory?  And/or is it
> possible to cook up a selftest which will trigger this?

I've been trying to reproduce this locally, but unfortunately I haven't
succeeded yet.

> 
>> However, in a production environment, this is practically impossible.
> 
> Can you expand on this?
> 
> syzbot isn't a production environment ;)

Rebinding only works when the hierarchy is completely empty. This is
generally not the case in a production environment (e.g. when systemd
is used).

BTW, it seems rebinding is about to be deprecated:

cgroup1_reconfigure
  --> pr_warn("option changes via remount are deprecated (pid=%d comm=%s)\n",
	      task_tgid_nr(current), current->comm);

Also, it appears the current memcg subsystem assumes that
cgroup_subsys_on_dfl(memory_cgrp_subsys) cannot be changed at runtime.
(Please correct me if I missed anything.)

If we can get a reproducer, we can try the following fix, or simply drop
rebinding altogether?

From 6ae41b91339625dd7bf0f819f775f26e78171a73 Mon Sep 17 00:00:00 2001
From: Qi Zheng <zhengqi.arch@bytedance.com>
Date: Mon, 27 Apr 2026 11:20:21 +0800
Subject: [PATCH] mm: memcontrol: fix rcu unbalance in
 get_non_dying_memcg_end()

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 mm/memcontrol.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c3d98ab41f1f1..46ff40faf295a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -805,10 +805,15 @@ static long memcg_state_val_in_pages(int idx, long val)
  * Used in mod_memcg_state() and mod_memcg_lruvec_state() to avoid race with
  * reparenting of non-hierarchical state_locals.
  */
-static inline struct mem_cgroup *get_non_dying_memcg_start(struct mem_cgroup *memcg)
+static inline struct mem_cgroup *get_non_dying_memcg_start(struct mem_cgroup *memcg,
+							   bool *locked)
 {
-	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
+	if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
+		*locked = false;
 		return memcg;
+	}
+
+	*locked = true;
 
 	rcu_read_lock();
 
@@ -818,20 +823,22 @@ static inline struct mem_cgroup *get_non_dying_memcg_start(struct mem_cgroup *me
 	return memcg;
 }
 
-static inline void get_non_dying_memcg_end(void)
+static inline void get_non_dying_memcg_end(bool rcu_locked)
 {
-	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
+	if (!rcu_locked)
 		return;
 
 	rcu_read_unlock();
 }
 #else
-static inline struct mem_cgroup *get_non_dying_memcg_start(struct mem_cgroup *memcg)
+static inline struct mem_cgroup *get_non_dying_memcg_start(struct mem_cgroup *memcg,
+							   bool *locked)
 {
+	*locked = false;
 	return memcg;
 }
 
-static inline void get_non_dying_memcg_end(void)
+static inline void get_non_dying_memcg_end(bool rcu_locked)
 {
 }
 #endif
@@ -865,12 +872,14 @@ static void __mod_memcg_state(struct mem_cgroup *memcg,
 void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 		     int val)
 {
+	bool locked;
+
 	if (mem_cgroup_disabled())
 		return;
 
-	memcg = get_non_dying_memcg_start(memcg);
+	memcg = get_non_dying_memcg_start(memcg, &locked);
 	__mod_memcg_state(memcg, idx, val);
-	get_non_dying_memcg_end();
+	get_non_dying_memcg_end(locked);
 }
 
 #ifdef CONFIG_MEMCG_V1
@@ -933,14 +942,15 @@ static void mod_memcg_lruvec_state(struct lruvec *lruvec,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct mem_cgroup_per_node *pn;
 	struct mem_cgroup *memcg;
+	bool locked;
 
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
 
-	memcg = get_non_dying_memcg_start(pn->memcg);
+	memcg = get_non_dying_memcg_start(pn->memcg, &locked);
 	pn = memcg->nodeinfo[pgdat->node_id];
 
 	__mod_memcg_lruvec_state(pn, idx, val);
-	get_non_dying_memcg_end();
+	get_non_dying_memcg_end(locked);
 }
 
 /**
-- 
2.20.1

Thanks,
Qi

> 
>> So Shakeel and I chose to wait for a reproducer at the time. :(
>>
>>>
>>> In some cases we released it too often, in other cases we failed to
>>> release it.
>>>
>>> The first one is slightly more useful in that it tells us that the
>>> not-released rcu_read_lock() was taken in folio_lruvec_lock_irqsave().
>>
>> I double-checked some callers of folio_lruvec_lock_irqsave() (such as
>> folios_put_refs()), but didn't find anything suspicious. :(
> 
> Right - it's rare and smells of a race condition.

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page 2026-04-27 7:24 ` Qi Zheng @ 2026-04-27 9:43 ` Qi Zheng 2026-04-27 10:44 ` Andrew Morton 2026-04-27 10:43 ` Andrew Morton 1 sibling, 1 reply; 10+ messages in thread From: Qi Zheng @ 2026-04-27 9:43 UTC (permalink / raw) To: Andrew Morton, shakeel.butt Cc: syzbot, Liam.Howlett, linux-kernel, linux-mm, ljs, surenb, syzkaller-bugs, vbabka, Muchun Song On 4/27/26 3:24 PM, Qi Zheng wrote: > > > On 4/27/26 1:55 AM, Andrew Morton wrote: >> On Sun, 26 Apr 2026 23:57:42 +0800 Qi Zheng <qi.zheng@linux.dev> wrote: >> >>> Hi Andrew, >>> >>> On 4/26/26 6:49 PM, Andrew Morton wrote: >>>> On Sun, 26 Apr 2026 01:17:25 -0700 syzbot >>>> <syzbot+7d60b33a8a546263da7c@syzkaller.appspotmail.com> wrote: >>>> >>>>> Hello, >>>>> >>>>> syzbot found the following issue on: >>>>> >>>>> HEAD commit: 6596a02b2078 Merge tag 'drm-next-2026-04-22' of >>>>> https://gi.. >>>>> git tree: upstream >>>>> console output: https://syzkaller.appspot.com/x/log.txt? >>>>> x=12483702580000 >>>>> kernel config: https://syzkaller.appspot.com/x/.config? >>>>> x=24c8da4692f901cb >>>>> dashboard link: https://syzkaller.appspot.com/bug? >>>>> extid=7d60b33a8a546263da7c >>>>> compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils >>>>> for Debian) 2.44 >>>>> userspace arch: i386 >>>>> >>>>> Unfortunately, I don't have any reproducer for this issue yet. >>>> >>>> argh, that dreaded sentence. >>>> >>>> Thanks. >>>> >>>> Something's definitely amiss. This is at least the fifth report of >>>> rcu_read_lock() imbalance post-7.0. Others: >>>> >>>> https://lore.kernel.org/69eab803.a00a0220.17a17.004a.GAE@google.com >>>> https://lore.kernel.org/69eab803.a00a0220.17a17.004b.GAE@google.com >>>> https://lore.kernel.org/69eafb0e.a00a0220.9259.0031.GAE@google.com >>>> https://lore.kernel.org/69ebcbe2.a00a0220.7773.0005.GAE@google.com >>> >>> All the kernel configs mentioned above include 'CONFIG_MEMCG_V1=y'. 
>>> >>> Theoretically, a rebind_subsystems() can lead a rcu unbalance, see my >>> previous discussion with Shakeel for details: >>> >>> https://lore.kernel.org/all/358c60e1- >>> fa91-40a1-9e00-84c93340c04e@linux.dev/ >> >> Right, that looks similar. >> >> The rcu locking under lruvec_stat_mod_folio() is very simple, and that >> return in get_non_dying_memcg_end() does look super suspicious. Why >> does it omit the unlock? >> >> otoh, in >> https://lore.kernel.org/all/69eafb0e.a00a0220.9259.0031.GAE@google.com/ >> we're trying to release an rcu_read_lock() which isn't presently held. >> But if cgroup_subsys_on_dfl() were to become false between the >> get_non_dying_memcg_start/end pair, that's what would happen. >> >> So yup, I agree, concurrent rebind_subsystems() activity could cause >> all of this. The reports are pretty common - is there some debugging >> patch we can temporarily add to confirm this theory? And/or is it >> possible to cook up a selftest which will trigger this? > > I've been trying to reproduce this locally, but unfortunately I haven't > succeeded yet. Alright, it seems I have successfully reproduced it: (The reproducer is attached at the bottom of this email.) 
[ 43.883623][ T270] mod_memcg_lruvec_state: key_on_dfl=0 rcu_locked=0 depth_before=2 depth_now=2 [ 43.884267][ T270] ------------[ cut here ]------------ [ 43.884663][ T270] WARNING: mm/memcontrol.c:850 at mod_memcg_lruvec_state+0x94/0x130, CPU#0: memcg-repro/270 [ 43.885375][ T270] Modules linked in: [ 43.885704][ T270] CPU: 0 UID: 0 PID: 270 Comm: memcg-repro Tainted: G W 7.0.0-next-20260420+ # [ 43.886554][ T270] Tainted: [W]=WARN [ 43.886833][ T270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 43.887490][ T270] RIP: 0010:mod_memcg_lruvec_state+0x94/0x130 [ 43.887932][ T270] Code: 5c 41 5d 41 5e 41 5f e9 4a 52 a3 00 48 8d b3 58 09 00 00 b9 0c 00 00 00 48 c7 c7 72 de f [ 43.889319][ T270] RSP: 0000:ffffc900041bfc38 EFLAGS: 00010246 [ 43.889763][ T270] RAX: 0000000000000000 RBX: ffff888104619bc0 RCX: 0000000000000000 [ 43.890332][ T270] RDX: 0000000000000619 RSI: ffff88810461a524 RDI: ffffffff827bde7e [ 43.890908][ T270] RBP: 0000000000000001 R08: ffffffff83549028 R09: 0000000000000001 [ 43.891481][ T270] R10: ffffffffffffdfff R11: ffffc900041bfa78 R12: 0000000000000011 [ 43.892051][ T270] R13: ffff8882bfffa1c0 R14: 0000000000000002 R15: ffff88810203a7c0 [ 43.892629][ T270] FS: 00007f73c4641740(0000) GS:ffff8883324cb000(0000) knlGS:0000000000000000 [ 43.893262][ T270] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 43.893737][ T270] CR2: 00005590e4eb8000 CR3: 00000001040d2000 CR4: 00000000000006f0 [ 43.894300][ T270] Call Trace: [ 43.894548][ T270] <TASK> [ 43.894767][ T270] lruvec_stat_mod_folio+0xc2/0x1a0 [ 43.895138][ T270] __folio_mod_stat+0x25/0x80 [ 43.895483][ T270] folio_add_new_anon_rmap+0xb1/0x2b0 [ 43.895880][ T270] map_anon_folio_pte_nopf+0xa3/0x120 [ 43.896267][ T270] do_pte_missing+0xad5/0xb40 [ 43.896620][ T270] __handle_mm_fault+0x80e/0xcd0 [ 43.896983][ T270] handle_mm_fault+0x146/0x310 [ 43.897332][ T270] do_user_addr_fault+0x303/0x880 [ 43.897708][ T270] exc_page_fault+0x9b/0x270 [ 43.898042][ T270] 
asm_exc_page_fault+0x26/0x30 [ 43.898387][ T270] RIP: 0033:0x5590e4eb41ea [ 43.898722][ T270] Code: 61 cc 66 0f 6f e0 66 0f 61 c2 66 0f db cd 66 0f 69 e2 66 0f 6f d0 66 0f 69 d4 66 0f 61 0 [ 43.900107][ T270] RSP: 002b:00007ffcad25f030 EFLAGS: 00010202 [ 43.900546][ T270] RAX: 00005590e4eb8010 RBX: 00007ffcad260f7d RCX: 00007f73c474d44d [ 43.901114][ T270] RDX: 00005590e4eb80a0 RSI: 00005590e4eb503c RDI: 000000000000000f [ 43.901691][ T270] RBP: 00005590e4eb70a0 R08: 0000000000000000 R09: 00007f73c483a680 [ 43.902257][ T270] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 43.902831][ T270] R13: 00007ffcad25f180 R14: 00005590e4eb6dd8 R15: 00007f73c4869020 [ 43.903407][ T270] </TASK> [ 43.903637][ T270] irq event stamp: 2919 [ 43.903933][ T270] hardirqs last enabled at (2927): [<ffffffff8137acfe>] __up_console_sem+0x5e/0x70 [ 43.904605][ T270] hardirqs last disabled at (2936): [<ffffffff8137ace3>] __up_console_sem+0x43/0x70 [ 43.905264][ T270] softirqs last enabled at (2048): [<ffffffff812c7f1e>] handle_softirqs+0x38e/0x460 [ 43.905952][ T270] softirqs last disabled at (2031): [<ffffffff812c84c9>] irq_exit_rcu+0xe9/0x160 [ 43.906606][ T270] ---[ end trace 0000000000000000 ]--- [ 43.907004][ T270] [ 43.907174][ T270] ===================================== [ 43.907565][ T270] WARNING: bad unlock balance detected! [ 43.907954][ T270] 7.0.0-next-20260420+ #83 Tainted: G W [ 43.908450][ T270] ------------------------------------- [ 43.908845][ T270] memcg-repro/270 is trying to release lock (rcu_read_lock) at: [ 43.909382][ T270] [<ffffffff815f57f7>] rcu_read_unlock+0x17/0x60 [ 43.909830][ T270] but there are no more locks to release! 
[ 43.910234][ T270]
[ 43.910234][ T270] other info that might help us debug this:
[ 43.910807][ T270] 1 lock held by memcg-repro/270:
[ 43.911163][ T270]  #0: ffff888102fa2088 (vm_lock){++++}-{0:0}, at: do_user_addr_fault+0x285/0x880
[ 43.911820][ T270]
[ 43.911820][ T270] stack backtrace:
[ 43.912237][ T270] CPU: 0 UID: 0 PID: 270 Comm: memcg-repro Tainted: G W 7.0.0-next-20260420+ #
[ 43.912239][ T270] Tainted: [W]=WARN
[ 43.912240][ T270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 43.912240][ T270] Call Trace:
[ 43.912241][ T270]  <TASK>
[ 43.912242][ T270]  ? rcu_read_unlock+0x17/0x60
[ 43.912244][ T270]  dump_stack_lvl+0x77/0xb0
[ 43.912248][ T270]  print_unlock_imbalance_bug+0xe0/0xf0
[ 43.912251][ T270]  ? rcu_read_unlock+0x17/0x60
[ 43.912253][ T270]  lock_release+0x21d/0x2a0
[ 43.912256][ T270]  rcu_read_unlock+0x1c/0x60
[ 43.912258][ T270]  do_pte_missing+0x233/0xb40
[ 43.912260][ T270]  __handle_mm_fault+0x80e/0xcd0
[ 43.912265][ T270]  handle_mm_fault+0x146/0x310
[ 43.912268][ T270]  do_user_addr_fault+0x303/0x880
[ 43.912271][ T270]  exc_page_fault+0x9b/0x270
[ 43.912273][ T270]  asm_exc_page_fault+0x26/0x30
[ 43.912274][ T270] RIP: 0033:0x5590e4eb41ea
[ 43.912276][ T270] Code: 61 cc 66 0f 6f e0 66 0f 61 c2 66 0f db cd 66 0f 69 e2 66 0f 6f d0 66 0f 69 d4 66 0f 61 0
[ 43.912277][ T270] RSP: 002b:00007ffcad25f030 EFLAGS: 00010202
[ 43.912278][ T270] RAX: 00005590e4eb8010 RBX: 00007ffcad260f7d RCX: 00007f73c474d44d
[ 43.912278][ T270] RDX: 00005590e4eb80a0 RSI: 00005590e4eb503c RDI: 000000000000000f
[ 43.912279][ T270] RBP: 00005590e4eb70a0 R08: 0000000000000000 R09: 00007f73c483a680
[ 43.912280][ T270] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 43.912280][ T270] R13: 00007ffcad25f180 R14: 00005590e4eb6dd8 R15: 00007f73c4869020
[ 43.912284][ T270]  </TASK>
[ 43.923741][ T270] ------------[ cut here ]------------
[ 43.924127][ T270] WARNING: kernel/rcu/tree_plugin.h:443 at __rcu_read_unlock+0x117/0x210, CPU#0: memcg-repro/270
[ 43.924968][ T270] Modules linked in:
[ 43.925251][ T270] CPU: 0 UID: 0 PID: 270 Comm: memcg-repro Tainted: G W 7.0.0-next-20260420+ #
[ 43.926102][ T270] Tainted: [W]=WARN
[ 43.926376][ T270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 43.927038][ T270] RIP: 0010:__rcu_read_unlock+0x117/0x210
[ 43.927469][ T270] Code: 68 56 83 01 00 00 00 bf 09 00 00 00 e8 62 da f1 ff 4d 85 ed 0f 84 27 ff ff ff e8 24 f7 5
[ 43.928861][ T270] RSP: 0000:ffffc900041bfcf8 EFLAGS: 00010286
[ 43.929292][ T270] RAX: 00000000ffffffff RBX: ffff888104619bc0 RCX: 0000000000000027
[ 43.929876][ T270] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8882b5a19780
[ 43.930431][ T270] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 43.931012][ T270] R10: ffffffffffffdfff R11: ffffc900041bf920 R12: ffff8881000f3ac0
[ 43.931611][ T270] R13: 00005590e4eb8000 R14: 0000000000000001 R15: ffff888102fa2000
[ 43.932188][ T270] FS:  00007f73c4641740(0000) GS:ffff8883324cb000(0000) knlGS:0000000000000000
[ 43.932838][ T270] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 43.933301][ T270] CR2: 00005590e4eb8000 CR3: 00000001040d2000 CR4: 00000000000006f0
[ 43.933882][ T270] Call Trace:
[ 43.934124][ T270]  <TASK>
[ 43.934472][ T270]  do_pte_missing+0x233/0xb40
[ 43.935004][ T270]  __handle_mm_fault+0x80e/0xcd0
[ 43.935953][ T270]  handle_mm_fault+0x146/0x310
[ 43.936462][ T270]  do_user_addr_fault+0x303/0x880
[ 43.937078][ T270]  exc_page_fault+0x9b/0x270
[ 43.937552][ T270]  asm_exc_page_fault+0x26/0x30
[ 43.937918][ T270] RIP: 0033:0x5590e4eb41ea
[ 43.938246][ T270] Code: 61 cc 66 0f 6f e0 66 0f 61 c2 66 0f db cd 66 0f 69 e2 66 0f 6f d0 66 0f 69 d4 66 0f 61 0
[ 43.939645][ T270] RSP: 002b:00007ffcad25f030 EFLAGS: 00010202
[ 43.940075][ T270] RAX: 00005590e4eb8010 RBX: 00007ffcad260f7d RCX: 00007f73c474d44d
[ 43.940644][ T270] RDX: 00005590e4eb80a0 RSI: 00005590e4eb503c RDI: 000000000000000f
[ 43.941210][ T270] RBP: 00005590e4eb70a0 R08: 0000000000000000 R09: 00007f73c483a680
[ 43.941786][ T270] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 43.942351][ T270] R13: 00007ffcad25f180 R14: 00005590e4eb6dd8 R15: 00007f73c4869020
[ 43.943383][ T270]  </TASK>
[ 43.943620][ T270] irq event stamp: 2975
[ 43.943912][ T270] hardirqs last  enabled at (2975): [<ffffffff81312500>] raw_spin_rq_unlock_irq+0x10/0x30
[ 43.944626][ T270] hardirqs last disabled at (2974): [<ffffffff820e83e5>] __schedule+0xd35/0x1df0
[ 43.945270][ T270] softirqs last  enabled at (2048): [<ffffffff812c7f1e>] handle_softirqs+0x38e/0x460
[ 43.945956][ T270] softirqs last disabled at (2031): [<ffffffff812c84c9>] irq_exit_rcu+0xe9/0x160
[ 43.946625][ T270] ---[ end trace 0000000000000000 ]---

>
>>
>>> However, in a production environment, this is practically impossible.
>>
>> Can you expand on this?
>>
>> sysbot isn't a production environment ;)
>
> Rebinding only works when the hierarchy is completely empty. This is
> generally not the case in a production environment (e.g. when systemd
> is used).
>
> BTW, it seems rebinding is about to be deprecated:
>
> cgroup1_reconfigure
> --> pr_warn("option changes via remount are deprecated (pid=%d comm=%s)\n",
>             task_tgid_nr(current), current->comm);
>
> Also, it appears the current memcg subsystem assumes that
> cgroup_subsys_on_dfl(memory_cgrp_subsys) cannot be changed at runtime.
> (Please correct me if I missed anything.)
>
> If we can get a reproducer, we can try the following fix, or simply drop
> rebinding altogether?
>
> From 6ae41b91339625dd7bf0f819f775f26e78171a73 Mon Sep 17 00:00:00 2001
> From: Qi Zheng <zhengqi.arch@bytedance.com>
> Date: Mon, 27 Apr 2026 11:20:21 +0800
> Subject: [PATCH] mm: memcontrol: fix rcu unbalance in
>  get_non_dying_memcg_end()
>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
>  mm/memcontrol.c | 30 ++++++++++++++++++++----------
>  1 file changed, 20 insertions(+), 10 deletions(-)

With the above patch applied, the warnings are gone.

If no one objects, I'll submit the formal fix. Or should we actually
just remove rebinding instead?

Thanks,
Qi

===== Repro =====

kernel diff
-----------

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c3d98ab41f1f1..419883a483e32 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -36,6 +36,7 @@
 #include <linux/pagemap.h>
 #include <linux/folio_batch.h>
 #include <linux/vm_event_item.h>
+#include <linux/delay.h>
 #include <linux/smp.h>
 #include <linux/page-flags.h>
 #include <linux/backing-dev.h>
@@ -805,6 +806,28 @@ static long memcg_state_val_in_pages(int idx, long val)
  * Used in mod_memcg_state() and mod_memcg_lruvec_state() to avoid race with
  * reparenting of non-hierarchical state_locals.
  */
+static __always_inline bool memcg_rcu_repro_task(void)
+{
+	return !strncmp(current->comm, "memcg-repro", TASK_COMM_LEN);
+}
+
+static noinline void memcg_rcu_repro_pause(void)
+{
+	if (memcg_rcu_repro_task())
+		mdelay(200);
+}
+
+static noinline void memcg_rcu_repro_check(const char *site, int depth_before)
+{
+	bool key_on_dfl = cgroup_subsys_on_dfl(memory_cgrp_subsys);
+	bool rcu_locked = rcu_preempt_depth() != depth_before;
+
+	WARN_ON_ONCE(memcg_rcu_repro_task() && key_on_dfl == rcu_locked);
+	if (memcg_rcu_repro_task() && key_on_dfl == rcu_locked)
+		pr_warn("%s: key_on_dfl=%d rcu_locked=%d depth_before=%d depth_now=%d\n",
+			site, key_on_dfl, rcu_locked, depth_before, rcu_preempt_depth());
+}
+
 static inline struct mem_cgroup *get_non_dying_memcg_start(struct mem_cgroup *memcg)
 {
 	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
@@ -865,10 +888,15 @@ static void __mod_memcg_state(struct mem_cgroup *memcg,
 void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 		     int val)
 {
+	int depth_before;
+
 	if (mem_cgroup_disabled())
 		return;
 
+	depth_before = rcu_preempt_depth();
 	memcg = get_non_dying_memcg_start(memcg);
+	memcg_rcu_repro_pause();
+	memcg_rcu_repro_check(__func__, depth_before);
 	__mod_memcg_state(memcg, idx, val);
 	get_non_dying_memcg_end();
 }
@@ -932,10 +960,14 @@ static void mod_memcg_lruvec_state(struct lruvec *lruvec,
 {
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct mem_cgroup_per_node *pn;
+	int depth_before;
 	struct mem_cgroup *memcg;
 
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+	depth_before = rcu_preempt_depth();
 	memcg = get_non_dying_memcg_start(pn->memcg);
+	memcg_rcu_repro_pause();
+	memcg_rcu_repro_check(__func__, depth_before);
 	pn = memcg->nodeinfo[pgdat->node_id];
 	__mod_memcg_lruvec_state(pn, idx, val);

/root/memcg-rcu-unbalance-repro.c
---------------------------------

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/prctl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static void die(const char *msg)
{
	perror(msg);
	exit(1);
}

static void ensure_parent_dir(const char *path)
{
	char tmp[PATH_MAX];
	char *slash;

	if (strlen(path) >= sizeof(tmp))
		die("path too long");
	strcpy(tmp, path);
	slash = strrchr(tmp, '/');
	if (!slash)
		return;
	while (slash > tmp && *slash == '/')
		*slash-- = '\0';
	if (slash < tmp)
		return;
	*++slash = '\0';
	for (slash = tmp + 1; *slash; slash++) {
		if (*slash != '/')
			continue;
		*slash = '\0';
		if (mkdir(tmp, 0755) < 0 && errno != EEXIST)
			die("mkdir");
		*slash = '/';
	}
	if (mkdir(tmp, 0755) < 0 && errno != EEXIST)
		die("mkdir");
}

static void reset_file(int fd, off_t *off)
{
	if (ftruncate(fd, 0) < 0)
		die("ftruncate");
	*off = 0;
}

static void socket_roundtrip(int txfd, int rxfd, const void *buf, size_t len)
{
	char rxbuf[4096];
	ssize_t n;

	for (;;) {
		n = send(txfd, buf, len, 0);
		if (n >= 0)
			break;
		if (errno != EINTR)
			die("send");
	}
	if ((size_t)n != len) {
		errno = EIO;
		die("send");
	}
	for (;;) {
		n = recv(rxfd, rxbuf, sizeof(rxbuf), 0);
		if (n >= 0)
			break;
		if (errno != EINTR)
			die("recv");
	}
	if ((size_t)n != len) {
		errno = EIO;
		die("recv");
	}
}

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/tmp/memcg-rcu-repro.file";
	static char buf[4096];
	off_t off = 0;
	off_t max = 16LL * 1024 * 1024;
	int fd;
	int sv[2];
	int i;

	if (prctl(PR_SET_NAME, "memcg-repro", 0, 0, 0) < 0)
		die("prctl(PR_SET_NAME)");
	for (i = 0; i < (int)sizeof(buf); i++)
		buf[i] = (char)i;
	ensure_parent_dir(path);
	fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
	if (fd < 0)
		die("open");
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		die("socketpair");

	for (;;) {
		ssize_t n = pwrite(fd, buf, sizeof(buf), off);

		if (n != (ssize_t)sizeof(buf)) {
			if (n < 0 && errno == EINTR)
				continue;
			if (n < 0 && (errno == ENOSPC || errno == EDQUOT)) {
				reset_file(fd, &off);
				continue;
			}
			die("pwrite");
		}
		off += sizeof(buf);
		if ((off & ((1 << 20) - 1)) == 0) {
			if (fsync(fd) < 0) {
				if (errno == EINTR)
					continue;
				if (errno == ENOSPC || errno == EDQUOT) {
					reset_file(fd, &off);
					continue;
				}
				die("fsync");
			}
		}
		if (off >= max)
			reset_file(fd, &off);
		for (i = 0; i < 16; i++)
			socket_roundtrip(sv[0], sv[1], buf, sizeof(buf));
	}
}

/root/memcg-rcu-unbalance-repro.sh
----------------------------------

#!/bin/sh
set -eu

WORKER_SRC="/root/memcg-rcu-unbalance-repro.c"
WORKER_BIN="/root/memcg-rcu-unbalance-repro"
WORKER_BIN_FALLBACK="/tmp/memcg-rcu-unbalance-repro"
WORKDIR="/tmp/memcg-rcu-repro"
CGV2_PROBE_MNT="$WORKDIR/cgv2-probe"
DATA_FILE="$WORKDIR/repro.file"
CG_MNT="/sys/fs/cgroup"
REPRO_HIER_NAME="memcg-rcu-repro"
RESTORE_CGROUP2_ON_EXIT=0
WORKER_CPU=""
V1_HOLD_MS="${V1_HOLD_MS:-800}"
V2_HOLD_MS="${V2_HOLD_MS:-50}"

need_root() {
	if [ "$(id -u)" -ne 0 ]; then
		echo "must run as root" >&2
		exit 1
	fi
}

is_mounted() {
	grep -Fqs " $1 " /proc/self/mountinfo
}

mount_fstype() {
	awk -v mountpoint="$1" '
		$5 == mountpoint {
			for (i = 1; i <= NF; i++) {
				if ($i == "-") {
					print $(i + 1)
					exit
				}
			}
		}
	' /proc/self/mountinfo
}

setup_early_boot_env() {
	mount -o remount,rw / >/dev/null 2>&1 || true
	[ -d /proc ] || mkdir -p /proc
	[ -d /sys ] || mkdir -p /sys
	[ -d /dev ] || mkdir -p /dev
	[ -d /tmp ] || mkdir -p /tmp
	is_mounted /proc || mount -t proc proc /proc
	is_mounted /sys || mount -t sysfs sysfs /sys
	if ! is_mounted /dev && grep -qw devtmpfs /proc/filesystems 2>/dev/null; then
		mount -t devtmpfs devtmpfs /dev >/dev/null 2>&1 || true
	fi
}

need_memory_controller() {
	if [ -r /proc/cgroups ] && awk '$1 == "memory" && $4 == 1 { found = 1 } END { exit found ? 0 : 1 }' /proc/cgroups; then
		return 0
	fi
	echo "memory controller not available; expected an enabled memory entry in /proc/cgroups" >&2
	exit 1
}

count_child_cgroups() {
	mountpoint="$1"
	count=0
	for d in "$mountpoint"/*; do
		[ -d "$d" ] || continue
		count=$((count + 1))
	done
	echo "$count"
}

umount_if_mounted() {
	if is_mounted "$1"; then
		umount "$1"
	fi
}

mount_cgroup2_probe() {
	if [ "$(mount_fstype "$CG_MNT")" = "cgroup2" ]; then
		echo "$CG_MNT"
		return 0
	fi
	umount_if_mounted "$CGV2_PROBE_MNT"
	mount -t cgroup2 none "$CGV2_PROBE_MNT"
	echo "$CGV2_PROBE_MNT"
}

mount_named_cgroup1_root() {
	umount_if_mounted "$CG_MNT"
	mount -t cgroup -o "none,name=$REPRO_HIER_NAME" none "$CG_MNT"
}

remount_memory_to_v1() {
	mount -t cgroup -o "remount,memory,name=$REPRO_HIER_NAME" none "$CG_MNT"
}

remount_memory_to_v2() {
	mount -t cgroup -o "remount,none,name=$REPRO_HIER_NAME" none "$CG_MNT"
}

sleep_ms() {
	ms="$1"
	if [ "$ms" -le 0 ]; then
		return 0
	fi
	if command -v usleep >/dev/null 2>&1; then
		usleep $((ms * 1000))
		return 0
	fi
	if command -v busybox >/dev/null 2>&1 && busybox usleep 1000 >/dev/null 2>&1; then
		busybox usleep $((ms * 1000))
		return 0
	fi
	if [ $((ms % 1000)) -eq 0 ]; then
		sleep $((ms / 1000))
		return 0
	fi
	sleep "$(printf '%d.%03d' $((ms / 1000)) $((ms % 1000)))"
}

cleanup() {
	set +e
	if [ -n "${WORKER_PID:-}" ]; then
		kill "$WORKER_PID" 2>/dev/null || true
		wait "$WORKER_PID" 2>/dev/null || true
	fi
	umount_if_mounted "$CGV2_PROBE_MNT"
	if [ "$RESTORE_CGROUP2_ON_EXIT" -eq 1 ]; then
		umount_if_mounted "$CG_MNT"
		mount -t cgroup2 none "$CG_MNT" >/dev/null 2>&1 || true
	fi
}

prepare_worker() {
	if [ -x "$WORKER_BIN" ]; then
		return 0
	fi
	if [ -x "$WORKER_BIN_FALLBACK" ]; then
		WORKER_BIN="$WORKER_BIN_FALLBACK"
		return 0
	fi
	if ! command -v cc >/dev/null 2>&1; then
		echo "no usable worker binary and no compiler in current environment" >&2
		echo "prebuild it before reboot with:" >&2
		echo "  cc -O2 -Wall -Wextra -o $WORKER_BIN $WORKER_SRC" >&2
		exit 1
	fi
	if cc -O2 -Wall -Wextra -o "$WORKER_BIN" "$WORKER_SRC"; then
		return 0
	fi
	echo "failed to compile worker in early-boot shell" >&2
	echo "prebuild it before reboot with:" >&2
	echo "  cc -O2 -Wall -Wextra -o $WORKER_BIN $WORKER_SRC" >&2
	exit 1
}

wait_for_worker_ready() {
	tries=0
	while [ "$tries" -lt 5 ]; do
		if kill -0 "$WORKER_PID" 2>/dev/null &&
		   [ -r "/proc/$WORKER_PID/comm" ] &&
		   grep -qx "memcg-repro" "/proc/$WORKER_PID/comm" &&
		   [ -s "$DATA_FILE" ]; then
			return 0
		fi
		tries=$((tries + 1))
		sleep 1
	done
	echo "worker failed to become ready before remount loop" >&2
	if [ -r "/proc/$WORKER_PID/comm" ]; then
		echo "worker pid=$WORKER_PID comm=$(cat "/proc/$WORKER_PID/comm")" >&2
	else
		echo "worker pid=$WORKER_PID is not alive" >&2
	fi
	exit 1
}

need_root
setup_early_boot_env
mkdir -p "$WORKDIR" "$CGV2_PROBE_MNT"
trap cleanup EXIT INT TERM

if [ ! -d "$CG_MNT" ]; then
	mkdir -p "$CG_MNT"
fi

need_memory_controller

CGV2_CHECK_MNT="$(mount_cgroup2_probe)"
if [ ! -r "$CGV2_CHECK_MNT/cgroup.controllers" ] ||
   ! grep -qw memory "$CGV2_CHECK_MNT/cgroup.controllers"; then
	echo "memory controller is not on the default cgroup v2 hierarchy before repro" >&2
	echo "run this in early boot before anything binds memory to a legacy v1 hierarchy" >&2
	exit 1
fi

child_count="$(count_child_cgroups "$CGV2_CHECK_MNT")"
if [ "$child_count" -ne 0 ]; then
	echo "cgroup2 root already has child cgroups; memory rebind to v1 will likely hit -EBUSY" >&2
	echo "run this in a minimal initramfs or early-boot shell with no non-root cgroups" >&2
	exit 1
fi

if [ "$CGV2_CHECK_MNT" = "$CGV2_PROBE_MNT" ]; then
	umount_if_mounted "$CGV2_PROBE_MNT"
fi

mount_named_cgroup1_root
RESTORE_CGROUP2_ON_EXIT=1

prepare_worker

if command -v nproc >/dev/null 2>&1 && command -v taskset >/dev/null 2>&1; then
	if [ "$(nproc)" -ge 2 ]; then
		taskset -pc 1 $$ >/dev/null 2>&1 || true
		WORKER_CPU="0"
	else
		WORKER_CPU=""
	fi
else
	WORKER_CPU=""
fi

echo "apply the kernel patch in /root/memcg-rcu-unbalance-repro.patch before running this script"
echo "recommended kernel config: CONFIG_MEMCG=y CONFIG_MEMCG_V1=y CONFIG_PREEMPT_RCU=y"
echo "recommended boot param: panic_on_warn=1"
echo "worker binary: $WORKER_BIN"
echo "repro hierarchy: name=$REPRO_HIER_NAME mountpoint=$CG_MNT"
echo "remount cadence: v2=${V2_HOLD_MS}ms v1=${V1_HOLD_MS}ms"

if [ -n "$WORKER_CPU" ]; then
	taskset -c "$WORKER_CPU" "$WORKER_BIN" "$DATA_FILE" &
else
	"$WORKER_BIN" "$DATA_FILE" &
fi
WORKER_PID=$!

wait_for_worker_ready
echo "worker pid=$WORKER_PID comm=$(cat "/proc/$WORKER_PID/comm") data_file=$DATA_FILE"

echo "cgroup v1 remount/rebind loop starting; watch dmesg for:"
echo "  option changes via remount are deprecated"
echo "  mod_memcg_state: key_on_dfl=0 rcu_locked=0 depth_before=0 depth_now=0"
echo "  WARN.*memcg_rcu_repro_check"
echo "  Voluntary context switch within RCU read-side critical section"
echo "  rcu_read_unlock.*underflow / bad unlock"

i=0
while :; do
	i=$((i + 1))
	remount_memory_to_v2
	sleep_ms "$V2_HOLD_MS"
	remount_memory_to_v1
	sleep_ms "$V1_HOLD_MS"
	if [ $((i % 10)) -eq 0 ]; then
		echo "completed $i rebind cycles"
	fi
done

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page
  2026-04-27  9:43 ` Qi Zheng
@ 2026-04-27 10:44 ` Andrew Morton
  2026-04-27 10:57 ` Qi Zheng
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2026-04-27 10:44 UTC (permalink / raw)
To: Qi Zheng
Cc: shakeel.butt, syzbot, Liam.Howlett, linux-kernel, linux-mm, ljs,
    surenb, syzkaller-bugs, vbabka, Muchun Song

On Mon, 27 Apr 2026 17:43:38 +0800 Qi Zheng <qi.zheng@linux.dev> wrote:

>
> Alright, it seems I have successfully reproduced it:
> (The reproducer is attached at the bottom of this email.)

That's a lot of code. Thanks for doing that. Maybe there's something
here we can put into selftests/

> >>
> >>> However, in a production environment, this is practically impossible.
> >>
> >> Can you expand on this?
> >>
> >> sysbot isn't a production environment ;)
> >
> > Rebinding only works when the hierarchy is completely empty. This is
> > generally not the case in a production environment (e.g. when systemd
> > is used).
> >
> > BTW, it seems rebinding is about to be deprecated:
> >
> > cgroup1_reconfigure
> > --> pr_warn("option changes via remount are deprecated (pid=%d comm=%s)\n",
> >             task_tgid_nr(current), current->comm);
> >
> > Also, it appears the current memcg subsystem assumes that
> > cgroup_subsys_on_dfl(memory_cgrp_subsys) cannot be changed at runtime.
> > (Please correct me if I missed anything.)
> >
> > If we can get a reproducer, we can try the following fix, or simply drop
> > rebinding altogether?

We'll want something which is applicable to 7.1-rcX please. Removal of
rebinding sounds like something we'd address in 7.2 or later.

> > From 6ae41b91339625dd7bf0f819f775f26e78171a73 Mon Sep 17 00:00:00 2001
> > From: Qi Zheng <zhengqi.arch@bytedance.com>
> > Date: Mon, 27 Apr 2026 11:20:21 +0800
> > Subject: [PATCH] mm: memcontrol: fix rcu unbalance in
> >  get_non_dying_memcg_end()
> >
> > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> > ---
> >  mm/memcontrol.c | 30 ++++++++++++++++++++----------
> >  1 file changed, 20 insertions(+), 10 deletions(-)
>
> With the above patch applied, the warnings are gone.
>
> If no one objects, I'll submit the formal fix. Or should we actually
> just remove rebinding instead?

I suggest we just fix up current -rc please.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page
  2026-04-27 10:44 ` Andrew Morton
@ 2026-04-27 10:57 ` Qi Zheng
  0 siblings, 0 replies; 10+ messages in thread
From: Qi Zheng @ 2026-04-27 10:57 UTC (permalink / raw)
To: Andrew Morton
Cc: shakeel.butt, syzbot, Liam.Howlett, linux-kernel, linux-mm, ljs,
    surenb, syzkaller-bugs, vbabka, Muchun Song

On 4/27/26 6:44 PM, Andrew Morton wrote:
> On Mon, 27 Apr 2026 17:43:38 +0800 Qi Zheng <qi.zheng@linux.dev> wrote:
>
>>
>> Alright, it seems I have successfully reproduced it:
>> (The reproducer is attached at the bottom of this email.)
>
> That's a lot of code. Thanks for doing that. Maybe there's something
> here we can put into selftests/

If we are inclined to drop rebinding, then perhaps it's unnecessary to
add selftest for it. ;)

>
>>>>
>>>>> However, in a production environment, this is practically impossible.
>>>>
>>>> Can you expand on this?
>>>>
>>>> sysbot isn't a production environment ;)
>>>
>>> Rebinding only works when the hierarchy is completely empty. This is
>>> generally not the case in a production environment (e.g. when systemd
>>> is used).
>>>
>>> BTW, it seems rebinding is about to be deprecated:
>>>
>>> cgroup1_reconfigure
>>> --> pr_warn("option changes via remount are deprecated (pid=%d comm=%s)\n",
>>>             task_tgid_nr(current), current->comm);
>>>
>>> Also, it appears the current memcg subsystem assumes that
>>> cgroup_subsys_on_dfl(memory_cgrp_subsys) cannot be changed at runtime.
>>> (Please correct me if I missed anything.)
>>>
>>> If we can get a reproducer, we can try the following fix, or simply drop
>>> rebinding altogether?
>
> We'll want something which is applicable to 7.1-rcX please. Removal of
> rebinding sounds like something we'd address in 7.2 or later.

Got it.

>
>>> From 6ae41b91339625dd7bf0f819f775f26e78171a73 Mon Sep 17 00:00:00 2001
>>> From: Qi Zheng <zhengqi.arch@bytedance.com>
>>> Date: Mon, 27 Apr 2026 11:20:21 +0800
>>> Subject: [PATCH] mm: memcontrol: fix rcu unbalance in
>>>  get_non_dying_memcg_end()
>>>
>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>>> ---
>>>  mm/memcontrol.c | 30 ++++++++++++++++++++----------
>>>  1 file changed, 20 insertions(+), 10 deletions(-)
>>
>> With the above patch applied, the warnings are gone.
>>
>> If no one objects, I'll submit the formal fix. Or should we actually
>> just remove rebinding instead?
>
> I suggest we just fix up current -rc please.

OK, will do.

Thanks,
Qi

>

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page
  2026-04-27  7:24 ` Qi Zheng
  2026-04-27  9:43 ` Qi Zheng
@ 2026-04-27 10:43 ` Andrew Morton
  2026-04-27 10:54 ` Qi Zheng
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2026-04-27 10:43 UTC (permalink / raw)
To: Qi Zheng
Cc: shakeel.butt, syzbot, Liam.Howlett, linux-kernel, linux-mm, ljs,
    surenb, syzkaller-bugs, vbabka, Muchun Song

On Mon, 27 Apr 2026 15:24:10 +0800 Qi Zheng <qi.zheng@linux.dev> wrote:

>
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -805,10 +805,15 @@ static long memcg_state_val_in_pages(int idx, long val)
>   * Used in mod_memcg_state() and mod_memcg_lruvec_state() to avoid race with
>   * reparenting of non-hierarchical state_locals.
>   */
> -static inline struct mem_cgroup *get_non_dying_memcg_start(struct mem_cgroup *memcg)
> +static inline struct mem_cgroup *get_non_dying_memcg_start(struct mem_cgroup *memcg,
> +							    bool *locked)

Sometimes this is called "locked", sometimes "rcu_locked". Let's be
consistent please.

> {
> -	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
> +	if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> +		*locked = false;
> 		return memcg;
> +	}

I was thinking we could use a single bit in the task_struct for this:

--- a/mm/memcontrol.c~a
+++ a/mm/memcontrol.c
@@ -810,6 +810,8 @@ static inline struct mem_cgroup *get_non
 	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return memcg;
 
+	current->foo = 1;
+
 	rcu_read_lock();
 
 	while (memcg_is_dying(memcg))
@@ -820,9 +822,11 @@ static inline struct mem_cgroup *get_non
 
 static inline void get_non_dying_memcg_end(void)
 {
-	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
+	if (current->foo == 0)
 		return;
 
+	current->foo = 0;
+
 	rcu_read_unlock();
 }
 #else
_

That doesn't work if get_non_dying_memcg_start/get_non_dying_memcg_end
calls can nest?  If they can nest then we'd need a counter in the
task_struct.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page
  2026-04-27 10:43 ` Andrew Morton
@ 2026-04-27 10:54 ` Qi Zheng
  0 siblings, 0 replies; 10+ messages in thread
From: Qi Zheng @ 2026-04-27 10:54 UTC (permalink / raw)
To: Andrew Morton
Cc: shakeel.butt, syzbot, Liam.Howlett, linux-kernel, linux-mm, ljs,
    surenb, syzkaller-bugs, vbabka, Muchun Song

On 4/27/26 6:43 PM, Andrew Morton wrote:
> On Mon, 27 Apr 2026 15:24:10 +0800 Qi Zheng <qi.zheng@linux.dev> wrote:
>
>>
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -805,10 +805,15 @@ static long memcg_state_val_in_pages(int idx, long val)
>>   * Used in mod_memcg_state() and mod_memcg_lruvec_state() to avoid race with
>>   * reparenting of non-hierarchical state_locals.
>>   */
>> -static inline struct mem_cgroup *get_non_dying_memcg_start(struct mem_cgroup *memcg)
>> +static inline struct mem_cgroup *get_non_dying_memcg_start(struct mem_cgroup *memcg,
>> +							     bool *locked)
>
> Sometimes this is called "locked", sometimes "rcu_locked". Let's be
> consistent please.

OK.

>
>> {
>> -	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
>> +	if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
>> +		*locked = false;
>> 		return memcg;
>> +	}
>
> I was thinking we could use a single bit in the task_struct for this:

Maybe this isn't a good idea. It feels a bit weird to add a new member
to task_struct just to fix a minor bug in mm. :(

>
> --- a/mm/memcontrol.c~a
> +++ a/mm/memcontrol.c
> @@ -810,6 +810,8 @@ static inline struct mem_cgroup *get_non
>  	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
>  		return memcg;
>  
> +	current->foo = 1;
> +
>  	rcu_read_lock();
>  
>  	while (memcg_is_dying(memcg))
> @@ -820,9 +822,11 @@ static inline struct mem_cgroup *get_non
>  
>  static inline void get_non_dying_memcg_end(void)
>  {
> -	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
> +	if (current->foo == 0)
>  		return;
>  
> +	current->foo = 0;
> +
>  	rcu_read_unlock();
>  }
>  #else
> _
>
> That doesn't work if get_non_dying_memcg_start/get_non_dying_memcg_end
> calls can nest?  If they can nest then we'd need a counter in the
> task_struct.
>

^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-04-27 10:58 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz  follow: Atom feed
-- links below jump to the message on this page --
2026-04-26  8:17 [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page syzbot
2026-04-26 10:49 ` Andrew Morton
2026-04-26 15:57   ` Qi Zheng
2026-04-26 17:55     ` Andrew Morton
2026-04-27  7:24       ` Qi Zheng
2026-04-27  9:43         ` Qi Zheng
2026-04-27 10:44           ` Andrew Morton
2026-04-27 10:57             ` Qi Zheng
2026-04-27 10:43         ` Andrew Morton
2026-04-27 10:54           ` Qi Zheng