* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
2024-12-14 3:56 [syzbot] [mm?] " syzbot
@ 2025-02-14 18:11 ` syzbot
2025-02-14 23:23 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: syzbot @ 2025-02-14 18:11 UTC (permalink / raw)
To: akpm, chengming.zhou, hannes, kasong, kent.overstreet,
linux-bcachefs, linux-kernel, linux-mm, mhocko, muchun.song,
roman.gushchin, ryncsn, sashal, shakeel.butt, syzkaller-bugs,
willy, yuzhao, zhengqi.arch
syzbot has found a reproducer for the following issue on:
HEAD commit: 128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
kernel config: https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
------------[ cut here ]------------
WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
Modules linked in:
CPU: 0 UID: 0 PID: 5459 Comm: syz-executor Not tainted 6.14.0-rc2-syzkaller-00185-g128c8f96eb86 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
Code: e9 19 fe ff ff e8 72 f2 b5 ff 4c 8b 7c 24 08 45 84 f6 0f 84 40 ff ff ff e9 22 01 00 00 e8 5a f2 b5 ff eb 05 e8 53 f2 b5 ff 90 <0f> 0b 90 eb 97 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 71 fd ff ff 48
RSP: 0018:ffffc9000d70f3a0 EFLAGS: 00010293
RAX: ffffffff820bc50d RBX: 0000000000000000 RCX: ffff8880382d4880
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff8880351ac054 R08: ffffffff820bc49f R09: 1ffffffff2079b8e
R10: dffffc0000000000 R11: fffffbfff2079b8f R12: ffffffff820bc19e
R13: ffff88801ee9a798 R14: 0000000000000000 R15: ffff8880351ac000
FS: 000055557d70b500(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff6826de40 CR3: 000000005680c000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
list_lru_del+0x58/0x1f0 mm/list_lru.c:202
list_lru_del_obj+0x17b/0x250 mm/list_lru.c:223
d_lru_del fs/dcache.c:481 [inline]
to_shrink_list+0x136/0x340 fs/dcache.c:904
select_collect+0xce/0x1b0 fs/dcache.c:1472
d_walk+0x1f5/0x750 fs/dcache.c:1295
shrink_dcache_parent+0x144/0x3b0 fs/dcache.c:1527
d_invalidate+0x11c/0x2d0 fs/dcache.c:1632
proc_invalidate_siblings_dcache+0x3fb/0x6e0 fs/proc/inode.c:142
release_task+0x168e/0x1830 kernel/exit.c:279
wait_task_zombie kernel/exit.c:1249 [inline]
wait_consider_task+0x1a14/0x2e60 kernel/exit.c:1476
do_wait_thread kernel/exit.c:1539 [inline]
__do_wait+0x1b0/0x850 kernel/exit.c:1657
do_wait+0x1e9/0x550 kernel/exit.c:1691
kernel_wait4+0x2a7/0x3e0 kernel/exit.c:1850
__do_sys_wait4 kernel/exit.c:1878 [inline]
__se_sys_wait4 kernel/exit.c:1874 [inline]
__x64_sys_wait4+0x134/0x1e0 kernel/exit.c:1874
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f93f3983057
Code: 89 7c 24 10 48 89 4c 24 18 e8 45 1b 03 00 4c 8b 54 24 18 8b 54 24 14 41 89 c0 48 8b 74 24 08 8b 7c 24 10 b8 3d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 44 24 10 e8 95 1b 03 00 8b 44
RSP: 002b:00007fff6826e9b0 EFLAGS: 00000293 ORIG_RAX: 000000000000003d
RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007f93f3983057
RDX: 0000000040000001 RSI: 00007fff6826ea1c RDI: 00000000ffffffff
RBP: 00007fff6826ea1c R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000001388
R13: 00000000000927c0 R14: 000000000002f011 R15: 00007fff6826ea70
</TASK>
---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
2025-02-14 18:11 ` [syzbot] [mm?] [bcachefs?] " syzbot
@ 2025-02-14 23:23 ` Andrew Morton
2025-02-16 16:13 ` Kairui Song
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2025-02-14 23:23 UTC (permalink / raw)
To: syzbot
Cc: chengming.zhou, hannes, kasong, kent.overstreet, linux-bcachefs,
linux-kernel, linux-mm, mhocko, muchun.song, roman.gushchin,
ryncsn, sashal, shakeel.butt, syzkaller-bugs, willy, yuzhao,
zhengqi.arch
On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
> syzbot has found a reproducer for the following issue on:
Thanks. I doubt if bcachefs is implicated in this?
> HEAD commit: 128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
> dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
> mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
VM_WARN_ON(!css_is_dying(&memcg->css));
>
> ...
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
2025-02-14 23:23 ` Andrew Morton
@ 2025-02-16 16:13 ` Kairui Song
2025-02-17 17:12 ` Kairui Song
0 siblings, 1 reply; 12+ messages in thread
From: Kairui Song @ 2025-02-16 16:13 UTC (permalink / raw)
To: Andrew Morton
Cc: syzbot, chengming.zhou, hannes, kent.overstreet, linux-bcachefs,
linux-kernel, linux-mm, mhocko, muchun.song, roman.gushchin,
sashal, shakeel.butt, syzkaller-bugs, willy, yuzhao, zhengqi.arch
On Sat, Feb 15, 2025 at 7:24 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
>
> > syzbot has found a reproducer for the following issue on:
>
> Thanks. I doubt if bcachefs is implicated in this?
>
> > HEAD commit: 128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
> > dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
> >
> > Downloadable assets:
> > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
> > mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
>
> VM_WARN_ON(!css_is_dying(&memcg->css));
I'm checking this. The last time this was triggered, it was caused by
a list_lru user that did not initialize the memcg list_lru properly before
list_lru reclaim started, and it was fixed by:
https://lore.kernel.org/all/20241222122936.67501-1-ryncsn@gmail.com/T/
This shouldn't be a big issue. Maybe there are leaks that will be
fixed upon reparenting, and this newly added sanity check might be too
lenient; I'm not 100% sure though.
Unfortunately I haven't been able to reproduce the issue locally with the
reproducer yet. I will keep the test running and see if it can hit this
WARN_ON.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
2025-02-16 16:13 ` Kairui Song
@ 2025-02-17 17:12 ` Kairui Song
2025-02-17 18:09 ` Alan Huang
0 siblings, 1 reply; 12+ messages in thread
From: Kairui Song @ 2025-02-17 17:12 UTC (permalink / raw)
To: Andrew Morton, kent.overstreet
Cc: syzbot, chengming.zhou, hannes, linux-bcachefs, linux-kernel,
linux-mm, mhocko, muchun.song, roman.gushchin, sashal,
shakeel.butt, syzkaller-bugs, willy, yuzhao, zhengqi.arch
On Mon, Feb 17, 2025 at 12:13 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Sat, Feb 15, 2025 at 7:24 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
> >
> > > syzbot has found a reproducer for the following issue on:
> >
> > Thanks. I doubt if bcachefs is implicated in this?
> >
> > > HEAD commit: 128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
> > > git tree: upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
> > >
> > > Downloadable assets:
> > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
> > > mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> > >
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
> >
> > VM_WARN_ON(!css_is_dying(&memcg->css));
>
> I'm checking this, when last time this was triggered, it was caused by
> a list_lru user did not initialize the memcg list_lru properly before
> list_lru reclaim started, and fixed by:
> https://lore.kernel.org/all/20241222122936.67501-1-ryncsn@gmail.com/T/
>
> This shouldn't be a big issue, maybe there are leaks that will be
> fixed upon reparenting, and this new added sanity check might be too
> lenient, I'm not 100% sure though.
>
> Unfortunately I couldn't reproduce the issue locally with the
> reproducer yet. will keep the test running and see if it can hit this
> WARN_ON.
So far I am still unable to trigger this VM_WARN_ON using the
reproducer, and I'm seeing many other random crashes.
But after I changed the .config a bit to add more debug options
(SLAB_FREELIST_HARDENED, DEBUG_PAGEALLOC), the following crash showed up,
and it is triggered immediately after I start the test:
[ T1242] BUG: unable to handle page fault for address: ffff888054c60000
[ T1242] #PF: supervisor read access in kernel mode
[ T1242] #PF: error_code(0x0000) - not-present page
[ T1242] PGD 19e01067 P4D 19e01067 PUD 19e04067 PMD 7fc5c067 PTE 800fffffab39f060
[ T1242] Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
[ T1242] CPU: 1 UID: 0 PID: 1242 Comm: kworker/1:1H Not tainted 6.14.0-rc2-00185-g128c8f96eb86 #2
[ T1242] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-4.module+el8.8.0+664+0a3d6c83 04/01/2014
[ T1242] Workqueue: bcachefs_btree_read_complete btree_node_read_work
[ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
[ T6058] bcachefs (loop2): empty btree root xattrs
[ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89 ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
[ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
[ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
[ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
[ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
[ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
[ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
[ T1242] FS: 0000000000000000(0000) GS:ffff88807ea00000(0000) knlGS:0000000000000000
[ T1242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
[ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ T1242] Call Trace:
[ T1242] <TASK>
[ T1242] bch2_btree_node_read_done+0x1d20/0x53a0
[ T1242] btree_node_read_work+0x54d/0xdc0
[ T1242] process_scheduled_works+0xaf8/0x17f0
[ T1242] worker_thread+0x89d/0xd60
[ T1242] kthread+0x722/0x890
[ T1242] ret_from_fork+0x4e/0x80
[ T1242] ret_from_fork_asm+0x1a/0x30
[ T1242] </TASK>
[ T1242] Modules linked in:
[ T1242] ---[ end trace 0000000000000000 ]---
[ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
[ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89 ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
[ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
[ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
[ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
[ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
[ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
[ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
[ T1242] FS: 0000000000000000(0000) GS:ffff88807ea00000(0000) knlGS:0000000000000000
[ T1242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
[ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ T1242] Kernel panic - not syncing: Fatal exception
[ T1242] Kernel Offset: disabled
[ T1242] Rebooting in 86400 seconds..
It's caused by the memmove_u64s_down in validate_bset_keys of
fs/bcachefs/btree_io.c:
-> memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
The bkey_p_next(k) is RSI: ffff888054c60000 and it's causing an
out-of-bounds access.
(u64 *) vstruct_end(i) - (u64 *) k is RCX: 0000000000006c31; if added
to RDI, this should cause an out-of-bounds write as well.
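The register arithmetic above can be checked with a few lines of C. This is only a sketch under stated assumptions: the faulting instruction is taken to be a `rep movsq` (which is how the `<f3> 48 a5` bytes in the Code: line decode), so RCX is read as the number of u64s still to be copied and RSI/RDI as the current source and destination pointers. The helper names `copy_end()` and `shift_u64s()` are made up for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Register values copied from the oops above. */
#define OOPS_RSI 0xffff888054c60000ULL /* current source (the faulting read) */
#define OOPS_RDI 0xffff888054c5ff90ULL /* current destination */
#define OOPS_RCX 0x6c31ULL             /* u64s still to be copied (assumed) */

/*
 * Address one past the last byte that would be touched if `count` more
 * u64s were copied starting at `ptr`. Hypothetical helper.
 */
static uint64_t copy_end(uint64_t ptr, uint64_t count)
{
	assert(ptr % sizeof(uint64_t) == 0); /* both pointers are 8-byte aligned here */
	return ptr + count * sizeof(uint64_t);
}

/* Distance between source and destination, in u64s. Hypothetical helper. */
static uint64_t shift_u64s(uint64_t src, uint64_t dst)
{
	return (src - dst) / sizeof(uint64_t);
}
```

With these inputs, `copy_end(OOPS_RDI, OOPS_RCX)` evaluates to 0xffff888054c96118, which is well past the faulting address ffff888054c60000. If that address marks the end of the buffer, the remaining writes would have gone out of bounds too, had the read not faulted first. `shift_u64s(OOPS_RSI, OOPS_RDI)` gives 14, the distance the keys were being moved down.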
This seems to indicate that there is an out-of-bounds memory modification.
Maybe it also corrupted other subsystems? The slight change to .config
changed the layout so that it now causes a fault; maybe previously this just
went on silently.
I don't know much about bcachefs, so I would be grateful if the bcachefs
people could take a look.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
2025-02-17 17:12 ` Kairui Song
@ 2025-02-17 18:09 ` Alan Huang
2025-02-18 11:40 ` Kairui Song
0 siblings, 1 reply; 12+ messages in thread
From: Alan Huang @ 2025-02-17 18:09 UTC (permalink / raw)
To: Kairui Song
Cc: Andrew Morton, kent.overstreet, syzbot, chengming.zhou, hannes,
linux-bcachefs, linux-kernel, linux-mm, mhocko, muchun.song,
roman.gushchin, sashal, shakeel.butt, syzkaller-bugs, willy,
yuzhao, zhengqi.arch
On Feb 18, 2025, at 01:12, Kairui Song <ryncsn@gmail.com> wrote:
>
> On Mon, Feb 17, 2025 at 12:13 AM Kairui Song <ryncsn@gmail.com> wrote:
>>
>> On Sat, Feb 15, 2025 at 7:24 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>>>
>>> On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
>>>
>>>> syzbot has found a reproducer for the following issue on:
>>>
>>> Thanks. I doubt if bcachefs is implicated in this?
>>>
>>>> HEAD commit: 128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
>>>> git tree: upstream
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
>>>> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
>>>>
>>>> Downloadable assets:
>>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
>>>> kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
>>>> mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
>>>>
>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>> Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
>>>>
>>>> ------------[ cut here ]------------
>>>> WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
>>>
>>> VM_WARN_ON(!css_is_dying(&memcg->css));
>>
>> I'm checking this, when last time this was triggered, it was caused by
>> a list_lru user did not initialize the memcg list_lru properly before
>> list_lru reclaim started, and fixed by:
>> https://lore.kernel.org/all/20241222122936.67501-1-ryncsn@gmail.com/T/
>>
>> This shouldn't be a big issue, maybe there are leaks that will be
>> fixed upon reparenting, and this new added sanity check might be too
>> lenient, I'm not 100% sure though.
>>
>> Unfortunately I couldn't reproduce the issue locally with the
>> reproducer yet. will keep the test running and see if it can hit this
>> WARN_ON.
>
> So far I am still unable to trigger this VM_WARN_ON using the
> reproducer, and I'm seeing many other random crashes.
>
> But after I changed the .config a bit adding more debug configs
> (SLAB_FREELIST_HARDENED, DEBUG_PAGEALLOC), following crash showed up
> and will be triggered immediately after I start the test:
>
> [ T1242] BUG: unable to handle page fault for address: ffff888054c60000
> [ T1242] #PF: supervisor read access in kernel mode
> [ T1242] #PF: error_code(0x0000) - not-present page
> [ T1242] PGD 19e01067 P4D 19e01067 PUD 19e04067 PMD 7fc5c067 PTE
> 800fffffab39f060
> [ T1242] Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
> [ T1242] CPU: 1 UID: 0 PID: 1242 Comm: kworker/1:1H Not tainted
> 6.14.0-rc2-00185-g128c8f96eb86 #2
> [ T1242] Hardware name: Red Hat KVM/RHEL-AV, BIOS
> 1.16.0-4.module+el8.8.0+664+0a3d6c83 04/01/2014
> [ T1242] Workqueue: bcachefs_btree_read_complete btree_node_read_work
> [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
> [ T6058] bcachefs (loop2): empty btree root xattrs
> [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
> 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
> ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
> [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
> [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
> [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
> [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
> [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
> [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
> [ T1242] FS: 0000000000000000(0000) GS:ffff88807ea00000(0000)
> knlGS:0000000000000000
> [ T1242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
> [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ T1242] Call Trace:
> [ T1242] <TASK>
> [ T1242] bch2_btree_node_read_done+0x1d20/0x53a0
> [ T1242] btree_node_read_work+0x54d/0xdc0
> [ T1242] process_scheduled_works+0xaf8/0x17f0
> [ T1242] worker_thread+0x89d/0xd60
> [ T1242] kthread+0x722/0x890
> [ T1242] ret_from_fork+0x4e/0x80
> [ T1242] ret_from_fork_asm+0x1a/0x30
> [ T1242] </TASK>
> [ T1242] Modules linked in:
> [ T1242] ---[ end trace 0000000000000000 ]---
> [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
> [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
> 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
> ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
> [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
> [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
> [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
> [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
> [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
> [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
> [ T1242] FS: 0000000000000000(0000) GS:ffff88807ea00000(0000)
> knlGS:0000000000000000
> [ T1242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
> [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ T1242] Kernel panic - not syncing: Fatal exception
> [ T1242] Kernel Offset: disabled
> [ T1242] Rebooting in 86400 seconds..
>
> It's caused by the memmove_u64s_down in validate_bset_keys of
> fs/bcachefs/btree_io.c:
> -> memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
Might need this.
diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
index e71b278672b6..fb53174cb735 100644
--- a/fs/bcachefs/btree_io.c
+++ b/fs/bcachefs/btree_io.c
@@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
}
got_good_key:
le16_add_cpu(&i->u64s, -next_good_key);
- memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
+ memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) bkey_p_next(k));
set_btree_node_need_rewrite(b);
}
fsck_err:
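The idea behind the changed count can be illustrated outside the kernel. The following is a minimal userspace sketch, not bcachefs code: `drop_u64s()` is a hypothetical helper that removes a run of u64s from a packed array by moving the tail down, and it measures the count from the source pointer so that source plus count stops exactly at the end of the valid data. Whether `vstruct_end(i)` already reflects the shortened length after the preceding `le16_add_cpu()` is not modeled here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Drop `skip` u64s starting at index `pos` from a packed array holding
 * `len` valid u64s, by moving the tail down over them.
 * Returns the new number of valid u64s.
 * Hypothetical helper for illustration only.
 */
static size_t drop_u64s(uint64_t *buf, size_t len, size_t pos, size_t skip)
{
	assert(pos + skip <= len);

	uint64_t *dst = buf + pos;   /* where the tail should land */
	uint64_t *src = dst + skip;  /* first u64 that is kept */
	uint64_t *end = buf + len;   /* one past the last valid u64 */

	/*
	 * Count from src, not from dst. Using (end - dst) here would read
	 * `skip` u64s beyond `end`.
	 */
	size_t count = (size_t)(end - src);

	memmove(dst, src, count * sizeof(*buf));
	return len - skip;
}
```

For example, calling `drop_u64s(a, 6, 1, 2)` on an array `{10, 11, 12, 13, 14, 15}` leaves `{10, 13, 14, 15}` in the first four slots and returns 4.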
>
> The bkey_p_next(k) is RSI: ffff888054c60000 and it's causing an out of
> border access.
> (u64 *) vstruct_end(i) - (u64 *) k is RCX: 0000000000006c31, if added
> to RDI this should cause an out of border write as well.
>
> This seems to indicate there is an out of border memory modification?
> And maybe it corrupted other subsystems? The slight change to .config
> changed the layout so it's causing a fault, maybe previously this just
> went on silently.
> I don't know much about bcachefs, will be grateful if bcachefs people
> could help have a look.
>
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
2025-02-17 18:09 ` Alan Huang
@ 2025-02-18 11:40 ` Kairui Song
2025-02-18 12:16 ` Alan Huang
0 siblings, 1 reply; 12+ messages in thread
From: Kairui Song @ 2025-02-18 11:40 UTC (permalink / raw)
To: Alan Huang
Cc: Andrew Morton, kent.overstreet, syzbot, chengming.zhou, hannes,
linux-bcachefs, linux-kernel, linux-mm, mhocko, muchun.song,
roman.gushchin, sashal, shakeel.butt, syzkaller-bugs, willy,
yuzhao, zhengqi.arch
On Tue, Feb 18, 2025 at 2:09 AM Alan Huang <mmpgouride@gmail.com> wrote:
>
> On Feb 18, 2025, at 01:12, Kairui Song <ryncsn@gmail.com> wrote:
> >
> > On Mon, Feb 17, 2025 at 12:13 AM Kairui Song <ryncsn@gmail.com> wrote:
> >>
> >> On Sat, Feb 15, 2025 at 7:24 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >>>
> >>> On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
> >>>
> >>>> syzbot has found a reproducer for the following issue on:
> >>>
> >>> Thanks. I doubt if bcachefs is implicated in this?
> >>>
> >>>> HEAD commit: 128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
> >>>> git tree: upstream
> >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
> >>>> kernel config: https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
> >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> >>>> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
> >>>>
> >>>> Downloadable assets:
> >>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
> >>>> vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
> >>>> kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
> >>>> mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
> >>>>
> >>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >>>> Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> >>>>
> >>>> ------------[ cut here ]------------
> >>>> WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
> >>>
> >>> VM_WARN_ON(!css_is_dying(&memcg->css));
> >>
> >> I'm checking this, when last time this was triggered, it was caused by
> >> a list_lru user did not initialize the memcg list_lru properly before
> >> list_lru reclaim started, and fixed by:
> >> https://lore.kernel.org/all/20241222122936.67501-1-ryncsn@gmail.com/T/
> >>
> >> This shouldn't be a big issue, maybe there are leaks that will be
> >> fixed upon reparenting, and this new added sanity check might be too
> >> lenient, I'm not 100% sure though.
> >>
> >> Unfortunately I couldn't reproduce the issue locally with the
> >> reproducer yet. will keep the test running and see if it can hit this
> >> WARN_ON.
> >
> > So far I am still unable to trigger this VM_WARN_ON using the
> > reproducer, and I'm seeing many other random crashes.
> >
> > But after I changed the .config a bit adding more debug configs
> > (SLAB_FREELIST_HARDENED, DEBUG_PAGEALLOC), following crash showed up
> > and will be triggered immediately after I start the test:
> >
> > [ T1242] BUG: unable to handle page fault for address: ffff888054c60000
> > [ T1242] #PF: supervisor read access in kernel mode
> > [ T1242] #PF: error_code(0x0000) - not-present page
> > [ T1242] PGD 19e01067 P4D 19e01067 PUD 19e04067 PMD 7fc5c067 PTE
> > 800fffffab39f060
> > [ T1242] Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
> > [ T1242] CPU: 1 UID: 0 PID: 1242 Comm: kworker/1:1H Not tainted
> > 6.14.0-rc2-00185-g128c8f96eb86 #2
> > [ T1242] Hardware name: Red Hat KVM/RHEL-AV, BIOS
> > 1.16.0-4.module+el8.8.0+664+0a3d6c83 04/01/2014
> > [ T1242] Workqueue: bcachefs_btree_read_complete btree_node_read_work
> > [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
> > [ T6058] bcachefs (loop2): empty btree root xattrs
> > [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
> > 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
> > ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
> > [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
> > [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
> > [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
> > [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
> > [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
> > [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
> > [ T1242] FS: 0000000000000000(0000) GS:ffff88807ea00000(0000)
> > knlGS:0000000000000000
> > [ T1242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
> > [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ T1242] Call Trace:
> > [ T1242] <TASK>
> > [ T1242] bch2_btree_node_read_done+0x1d20/0x53a0
> > [ T1242] btree_node_read_work+0x54d/0xdc0
> > [ T1242] process_scheduled_works+0xaf8/0x17f0
> > [ T1242] worker_thread+0x89d/0xd60
> > [ T1242] kthread+0x722/0x890
> > [ T1242] ret_from_fork+0x4e/0x80
> > [ T1242] ret_from_fork_asm+0x1a/0x30
> > [ T1242] </TASK>
> > [ T1242] Modules linked in:
> > [ T1242] ---[ end trace 0000000000000000 ]---
> > [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
> > [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
> > 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
> > ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
> > [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
> > [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
> > [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
> > [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
> > [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
> > [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
> > [ T1242] FS: 0000000000000000(0000) GS:ffff88807ea00000(0000)
> > knlGS:0000000000000000
> > [ T1242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
> > [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ T1242] Kernel panic - not syncing: Fatal exception
> > [ T1242] Kernel Offset: disabled
> > [ T1242] Rebooting in 86400 seconds..
> >
> > It's caused by the memmove_u64s_down in validate_bset_keys of
> > fs/bcachefs/btree_io.c:
> > -> memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
>
>
> Might need this.
>
> diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
> index e71b278672b6..fb53174cb735 100644
> --- a/fs/bcachefs/btree_io.c
> +++ b/fs/bcachefs/btree_io.c
> @@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
> }
> got_good_key:
> le16_add_cpu(&i->u64s, -next_good_key);
> - memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
> + memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) bkey_p_next(k));
> set_btree_node_need_rewrite(b);
> }
> fsck_err:
>
Thanks, but this didn't fix everything. I think the problem is more
complex: syzbot seems to be mounting a damaged bcachefs image (on
purpose, I think), so vstruct_end(i) already returns an address that
is out of bounds.
I retriggered it and printed some more debug info: i->_data is
ffff88806d5c00a0, i->u64s is 60928, and the faulting address is
ffff88806d600000.
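To make the numbers above easier to check, here is a minimal userspace sketch of the arithmetic. It assumes the end of the set is simply _data plus u64s 8-byte words; the helper names claimed_end() and addr_in_claimed_range() are made up for illustration and are not bcachefs functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address one past the last u64 a set claims to hold: data + u64s words. */
static uint64_t claimed_end(uint64_t data, uint64_t u64s)
{
	/* Guard against address wraparound in this userspace model. */
	assert(u64s <= (UINT64_MAX - data) / sizeof(uint64_t));
	return data + u64s * sizeof(uint64_t);
}

/* True if addr lies inside the half-open range [data, claimed end). */
static bool addr_in_claimed_range(uint64_t data, uint64_t u64s, uint64_t addr)
{
	return addr >= data && addr < claimed_end(data, u64s);
}
```

With data = 0xffff88806d5c00a0 and u64s = 60928, claimed_end() gives 0xffff88806d6370a0, so the faulting address ffff88806d600000 falls inside the claimed range, 0x370a0 bytes before its end.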
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
2025-02-18 11:40 ` Kairui Song
@ 2025-02-18 12:16 ` Alan Huang
2025-02-18 17:47 ` Kairui Song
0 siblings, 1 reply; 12+ messages in thread
From: Alan Huang @ 2025-02-18 12:16 UTC (permalink / raw)
To: Kairui Song
Cc: Andrew Morton, kent.overstreet, syzbot, chengming.zhou, hannes,
linux-bcachefs, linux-kernel, linux-mm, mhocko, muchun.song,
roman.gushchin, sashal, shakeel.butt, syzkaller-bugs, willy,
yuzhao, zhengqi.arch
On Feb 18, 2025, at 19:40, Kairui Song <ryncsn@gmail.com> wrote:
>
> On Tue, Feb 18, 2025 at 2:09 AM Alan Huang <mmpgouride@gmail.com> wrote:
>>
>> On Feb 18, 2025, at 01:12, Kairui Song <ryncsn@gmail.com> wrote:
>>>
>>> On Mon, Feb 17, 2025 at 12:13 AM Kairui Song <ryncsn@gmail.com> wrote:
>>>>
>>>> On Sat, Feb 15, 2025 at 7:24 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>>>>>
>>>>> On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
>>>>>
>>>>>> syzbot has found a reproducer for the following issue on:
>>>>>
>>>>> Thanks. I doubt if bcachefs is implicated in this?
>>>>>
>>>>>> HEAD commit: 128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
>>>>>> git tree: upstream
>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
>>>>>> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>>>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
>>>>>>
>>>>>> Downloadable assets:
>>>>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
>>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
>>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
>>>>>> mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
>>>>>>
>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>>> Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
>>>>>>
>>>>>> ------------[ cut here ]------------
>>>>>> WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
>>>>>
>>>>> VM_WARN_ON(!css_is_dying(&memcg->css));
>>>>
>>>> I'm checking this, when last time this was triggered, it was caused by
>>>> a list_lru user did not initialize the memcg list_lru properly before
>>>> list_lru reclaim started, and fixed by:
>>>> https://lore.kernel.org/all/20241222122936.67501-1-ryncsn@gmail.com/T/
>>>>
>>>> This shouldn't be a big issue, maybe there are leaks that will be
>>>> fixed upon reparenting, and this new added sanity check might be too
>>>> lenient, I'm not 100% sure though.
>>>>
>>>> Unfortunately I couldn't reproduce the issue locally with the
>>>> reproducer yet. will keep the test running and see if it can hit this
>>>> WARN_ON.
>>>
>>> So far I am still unable to trigger this VM_WARN_ON using the
>>> reproducer, and I'm seeing many other random crashes.
>>>
>>> But after I changed the .config a bit adding more debug configs
>>> (SLAB_FREELIST_HARDENED, DEBUG_PAGEALLOC), following crash showed up
>>> and will be triggered immediately after I start the test:
>>>
>>> [ T1242] BUG: unable to handle page fault for address: ffff888054c60000
>>> [ T1242] #PF: supervisor read access in kernel mode
>>> [ T1242] #PF: error_code(0x0000) - not-present page
>>> [ T1242] PGD 19e01067 P4D 19e01067 PUD 19e04067 PMD 7fc5c067 PTE
>>> 800fffffab39f060
>>> [ T1242] Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
>>> [ T1242] CPU: 1 UID: 0 PID: 1242 Comm: kworker/1:1H Not tainted
>>> 6.14.0-rc2-00185-g128c8f96eb86 #2
>>> [ T1242] Hardware name: Red Hat KVM/RHEL-AV, BIOS
>>> 1.16.0-4.module+el8.8.0+664+0a3d6c83 04/01/2014
>>> [ T1242] Workqueue: bcachefs_btree_read_complete btree_node_read_work
>>> [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
>>> [ T6058] bcachefs (loop2): empty btree root xattrs
>>> [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
>>> 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
>>> ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
>>> [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
>>> [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
>>> [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
>>> [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
>>> [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
>>> [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
>>> [ T1242] FS: 0000000000000000(0000) GS:ffff88807ea00000(0000)
>>> knlGS:0000000000000000
>>> [ T1242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
>>> [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [ T1242] Call Trace:
>>> [ T1242] <TASK>
>>> [ T1242] bch2_btree_node_read_done+0x1d20/0x53a0
>>> [ T1242] btree_node_read_work+0x54d/0xdc0
>>> [ T1242] process_scheduled_works+0xaf8/0x17f0
>>> [ T1242] worker_thread+0x89d/0xd60
>>> [ T1242] kthread+0x722/0x890
>>> [ T1242] ret_from_fork+0x4e/0x80
>>> [ T1242] ret_from_fork_asm+0x1a/0x30
>>> [ T1242] </TASK>
>>> [ T1242] Modules linked in:
>>> [ T1242] ---[ end trace 0000000000000000 ]---
>>> [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
>>> [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
>>> 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
>>> ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
>>> [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
>>> [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
>>> [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
>>> [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
>>> [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
>>> [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
>>> [ T1242] FS: 0000000000000000(0000) GS:ffff88807ea00000(0000)
>>> knlGS:0000000000000000
>>> [ T1242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
>>> [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [ T1242] Kernel panic - not syncing: Fatal exception
>>> [ T1242] Kernel Offset: disabled
>>> [ T1242] Rebooting in 86400 seconds..
>>>
>>> It's caused by the memmove_u64s_down in validate_bset_keys of
>>> fs/bcachefs/btree_io.c:
>>> -> memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
>>
>>
>> Might need this.
>>
>> diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
>> index e71b278672b6..fb53174cb735 100644
>> --- a/fs/bcachefs/btree_io.c
>> +++ b/fs/bcachefs/btree_io.c
>> @@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
>> }
>> got_good_key:
>> le16_add_cpu(&i->u64s, -next_good_key);
>> - memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
>> + memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) bkey_p_next(k));
>> set_btree_node_need_rewrite(b);
>> }
>> fsck_err:
>>
>
> Thanks, but this didn't fix everything. I think the problem is more
> complex, syzbot seems to be trying to mount damaged bcachefs (on
> purpose I think), so the vstruct_end(i) is already returning an offset
> that is out of border.
Could you try this (I need to go out now):
diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
index e71b278672b6..80a0094be356 100644
--- a/fs/bcachefs/btree_io.c
+++ b/fs/bcachefs/btree_io.c
@@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
}
got_good_key:
le16_add_cpu(&i->u64s, -next_good_key);
- memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
+ memmove_u64s_down(k, (u64 *) k + next_good_key, (u64 *) vstruct_end(i) - (u64 *) k);
set_btree_node_need_rewrite(b);
}
fsck_err:
>
> I retriggered it and print some more debug info: i->_data is
> ffff88806d5c00a0, i->u64s is 60928, and the faulting address is
> ffff88806d600000.
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
[not found] <7ADD7E39-E7DC-4A40-8236-CD5C90112C96@gmail.com>
@ 2025-02-18 17:12 ` syzbot
0 siblings, 0 replies; 12+ messages in thread
From: syzbot @ 2025-02-18 17:12 UTC (permalink / raw)
To: linux-kernel, mmpgouride, syzkaller-bugs
Hello,
syzbot tried to test the proposed patch but the build/boot failed:
failed to apply patch:
checking file fs/bcachefs/btree_io.c
Hunk #1 FAILED at 997.
1 out of 1 hunk FAILED
Tested on:
commit: 2408a807 Merge tag 'vfs-6.14-rc4.fixes' of git://git.k..
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
compiler:
patch: https://syzkaller.appspot.com/x/patch.diff?x=139b65b0580000
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
[not found] <B2F37A7F-BE18-495F-9350-6D7D47198FFD@gmail.com>
@ 2025-02-18 17:38 ` syzbot
0 siblings, 0 replies; 12+ messages in thread
From: syzbot @ 2025-02-18 17:38 UTC (permalink / raw)
To: linux-kernel, mmpgouride, syzkaller-bugs
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
BUG: unable to handle kernel NULL pointer dereference in qlist_free_all
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 3f9de067 P4D 3f9de067 PUD 40014067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5357 Comm: kworker/0:3 Not tainted 6.14.0-rc3-syzkaller-g2408a807bfc3-dirty #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Workqueue: events nsim_fib_event_work
RIP: 0010:qlink_to_cache mm/kasan/quarantine.c:131 [inline]
RIP: 0010:qlist_free_all+0x69/0x140 mm/kasan/quarantine.c:176
Code: e8 06 48 83 e0 c0 49 8b 4c 05 08 f6 c1 01 0f 85 a8 00 00 00 4c 01 e8 66 90 0f b6 48 33 c1 e1 18 81 f9 00 00 00 f5 48 0f 45 c5 <48> 8b 58 08 4d 8b 34 24 48 63 83 c0 00 00 00 49 29 c4 48 89 df 4c
RSP: 0018:ffffc9000d477568 EFLAGS: 00010206
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ff000000
RDX: 0000000000000000 RSI: 0000000013ec5780 RDI: 000000001fffffff
RBP: 0000000000000000 R08: ffffffff816eac95 R09: ffffffff82290c0f
R10: dffffc0000000000 R11: fffffbfff28a8d0f R12: ffffffff93ec5780
R13: ffffea0000000000 R14: ffffffff93ec5780 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000000365b0000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329
kasan_slab_alloc include/linux/kasan.h:250 [inline]
slab_post_alloc_hook mm/slub.c:4115 [inline]
slab_alloc_node mm/slub.c:4164 [inline]
__kmalloc_cache_noprof+0x1d9/0x390 mm/slub.c:4320
kmalloc_noprof include/linux/slab.h:901 [inline]
kzalloc_noprof include/linux/slab.h:1037 [inline]
nsim_fib4_rt_create drivers/net/netdevsim/fib.c:280 [inline]
nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:426 [inline]
nsim_fib4_event drivers/net/netdevsim/fib.c:464 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:884 [inline]
nsim_fib_event_work+0xe02/0x3f00 drivers/net/netdevsim/fib.c:1493
process_one_work kernel/workqueue.c:3236 [inline]
process_scheduled_works+0xabe/0x18e0 kernel/workqueue.c:3317
worker_thread+0x870/0xd30 kernel/workqueue.c:3398
kthread+0x7a9/0x920 kernel/kthread.c:464
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Modules linked in:
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
RIP: 0010:qlink_to_cache mm/kasan/quarantine.c:131 [inline]
RIP: 0010:qlist_free_all+0x69/0x140 mm/kasan/quarantine.c:176
Code: e8 06 48 83 e0 c0 49 8b 4c 05 08 f6 c1 01 0f 85 a8 00 00 00 4c 01 e8 66 90 0f b6 48 33 c1 e1 18 81 f9 00 00 00 f5 48 0f 45 c5 <48> 8b 58 08 4d 8b 34 24 48 63 83 c0 00 00 00 49 29 c4 48 89 df 4c
RSP: 0018:ffffc9000d477568 EFLAGS: 00010206
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ff000000
RDX: 0000000000000000 RSI: 0000000013ec5780 RDI: 000000001fffffff
RBP: 0000000000000000 R08: ffffffff816eac95 R09: ffffffff82290c0f
R10: dffffc0000000000 R11: fffffbfff28a8d0f R12: ffffffff93ec5780
R13: ffffea0000000000 R14: ffffffff93ec5780 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000000365b0000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: e8 06 48 83 e0 call 0xe083480b
5: c0 49 8b 4c rorb $0x4c,-0x75(%rcx)
9: 05 08 f6 c1 01 add $0x1c1f608,%eax
e: 0f 85 a8 00 00 00 jne 0xbc
14: 4c 01 e8 add %r13,%rax
17: 66 90 xchg %ax,%ax
19: 0f b6 48 33 movzbl 0x33(%rax),%ecx
1d: c1 e1 18 shl $0x18,%ecx
20: 81 f9 00 00 00 f5 cmp $0xf5000000,%ecx
26: 48 0f 45 c5 cmovne %rbp,%rax
* 2a: 48 8b 58 08 mov 0x8(%rax),%rbx <-- trapping instruction
2e: 4d 8b 34 24 mov (%r12),%r14
32: 48 63 83 c0 00 00 00 movslq 0xc0(%rbx),%rax
39: 49 29 c4 sub %rax,%r12
3c: 48 89 df mov %rbx,%rdi
3f: 4c rex.WR
Tested on:
commit: 2408a807 Merge tag 'vfs-6.14-rc4.fixes' of git://git.k..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=113477df980000
kernel config: https://syzkaller.appspot.com/x/.config?x=b7bde34acd8f53b1
dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=114603a4580000
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
2025-02-18 12:16 ` Alan Huang
@ 2025-02-18 17:47 ` Kairui Song
0 siblings, 0 replies; 12+ messages in thread
From: Kairui Song @ 2025-02-18 17:47 UTC (permalink / raw)
To: Alan Huang
Cc: Andrew Morton, kent.overstreet, syzbot, linux-bcachefs,
linux-kernel, linux-mm, syzkaller-bugs
On Tue, Feb 18, 2025 at 8:17 PM Alan Huang <mmpgouride@gmail.com> wrote:
>
> On Feb 18, 2025, at 19:40, Kairui Song <ryncsn@gmail.com> wrote:
> >
> > On Tue, Feb 18, 2025 at 2:09 AM Alan Huang <mmpgouride@gmail.com> wrote:
> >>
> >> On Feb 18, 2025, at 01:12, Kairui Song <ryncsn@gmail.com> wrote:
> >>>
> >>> On Mon, Feb 17, 2025 at 12:13 AM Kairui Song <ryncsn@gmail.com> wrote:
> >>>>
> >>>> On Sat, Feb 15, 2025 at 7:24 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >>>>>
> >>>>> On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
> >>>>>
> >>>>>> syzbot has found a reproducer for the following issue on:
> >>>>>
> >>>>> Thanks. I doubt if bcachefs is implicated in this?
> >>>>>
> >>>>>> HEAD commit: 128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
> >>>>>> git tree: upstream
> >>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
> >>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
> >>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> >>>>>> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >>>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
> >>>>>>
> >>>>>> Downloadable assets:
> >>>>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
> >>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
> >>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
> >>>>>> mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
> >>>>>>
> >>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >>>>>> Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> >>>>>>
> >>>>>> ------------[ cut here ]------------
> >>>>>> WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
> >>>>>
> >>>>> VM_WARN_ON(!css_is_dying(&memcg->css));
> >>>>
> >>>> I'm checking this, when last time this was triggered, it was caused by
> >>>> a list_lru user did not initialize the memcg list_lru properly before
> >>>> list_lru reclaim started, and fixed by:
> >>>> https://lore.kernel.org/all/20241222122936.67501-1-ryncsn@gmail.com/T/
> >>>>
> >>>> This shouldn't be a big issue, maybe there are leaks that will be
> >>>> fixed upon reparenting, and this new added sanity check might be too
> >>>> lenient, I'm not 100% sure though.
> >>>>
> >>>> Unfortunately I couldn't reproduce the issue locally with the
> >>>> reproducer yet. will keep the test running and see if it can hit this
> >>>> WARN_ON.
> >>>
> >>> So far I am still unable to trigger this VM_WARN_ON using the
> >>> reproducer, and I'm seeing many other random crashes.
> >>>
> >>> But after I changed the .config a bit adding more debug configs
> >>> (SLAB_FREELIST_HARDENED, DEBUG_PAGEALLOC), following crash showed up
> >>> and will be triggered immediately after I start the test:
> >>>
> >>> [ T1242] BUG: unable to handle page fault for address: ffff888054c60000
> >>> [ T1242] #PF: supervisor read access in kernel mode
> >>> [ T1242] #PF: error_code(0x0000) - not-present page
> >>> [ T1242] PGD 19e01067 P4D 19e01067 PUD 19e04067 PMD 7fc5c067 PTE
> >>> 800fffffab39f060
> >>> [ T1242] Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
> >>> [ T1242] CPU: 1 UID: 0 PID: 1242 Comm: kworker/1:1H Not tainted
> >>> 6.14.0-rc2-00185-g128c8f96eb86 #2
> >>> [ T1242] Hardware name: Red Hat KVM/RHEL-AV, BIOS
> >>> 1.16.0-4.module+el8.8.0+664+0a3d6c83 04/01/2014
> >>> [ T1242] Workqueue: bcachefs_btree_read_complete btree_node_read_work
> >>> [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
> >>> [ T6058] bcachefs (loop2): empty btree root xattrs
> >>> [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
> >>> 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
> >>> ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
> >>> [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
> >>> [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
> >>> [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
> >>> [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
> >>> [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
> >>> [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
> >>> [ T1242] FS: 0000000000000000(0000) GS:ffff88807ea00000(0000)
> >>> knlGS:0000000000000000
> >>> [ T1242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
> >>> [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> [ T1242] Call Trace:
> >>> [ T1242] <TASK>
> >>> [ T1242] bch2_btree_node_read_done+0x1d20/0x53a0
> >>> [ T1242] btree_node_read_work+0x54d/0xdc0
> >>> [ T1242] process_scheduled_works+0xaf8/0x17f0
> >>> [ T1242] worker_thread+0x89d/0xd60
> >>> [ T1242] kthread+0x722/0x890
> >>> [ T1242] ret_from_fork+0x4e/0x80
> >>> [ T1242] ret_from_fork_asm+0x1a/0x30
> >>> [ T1242] </TASK>
> >>> [ T1242] Modules linked in:
> >>> [ T1242] ---[ end trace 0000000000000000 ]---
> >>> [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
> >>> [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
> >>> 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
> >>> ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
> >>> [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
> >>> [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
> >>> [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
> >>> [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
> >>> [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
> >>> [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
> >>> [ T1242] FS: 0000000000000000(0000) GS:ffff88807ea00000(0000)
> >>> knlGS:0000000000000000
> >>> [ T1242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
> >>> [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> [ T1242] Kernel panic - not syncing: Fatal exception
> >>> [ T1242] Kernel Offset: disabled
> >>> [ T1242] Rebooting in 86400 seconds..
> >>>
> >>> It's caused by the memmove_u64s_down in validate_bset_keys of
> >>> fs/bcachefs/btree_io.c:
> >>> -> memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
> >>
> >>
> >> Might need this.
> >>
> >> diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
> >> index e71b278672b6..fb53174cb735 100644
> >> --- a/fs/bcachefs/btree_io.c
> >> +++ b/fs/bcachefs/btree_io.c
> >> @@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
> >> }
> >> got_good_key:
> >> le16_add_cpu(&i->u64s, -next_good_key);
> >> - memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
> >> + memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) bkey_p_next(k));
> >> set_btree_node_need_rewrite(b);
> >> }
> >> fsck_err:
> >>
> >
> > Thanks, but this didn't fix everything. I think the problem is more
> > complex, syzbot seems to be trying to mount damaged bcachefs (on
> > purpose I think), so the vstruct_end(i) is already returning an offset
> > that is out of border.
>
> Could you try this (I need to go out now):
>
> diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
> index e71b278672b6..80a0094be356 100644
> --- a/fs/bcachefs/btree_io.c
> +++ b/fs/bcachefs/btree_io.c
> @@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
> }
> got_good_key:
> le16_add_cpu(&i->u64s, -next_good_key);
> - memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
> + memmove_u64s_down(k, (u64 *) k + next_good_key, (u64 *) vstruct_end(i) - (u64 *) k);
> set_btree_node_need_rewrite(b);
> }
> fsck_err:
>
> >
> > I retriggered it and print some more debug info: i->_data is
> > ffff88806d5c00a0, i->u64s is 60928, and the faulting address is
> > ffff88806d600000.
>
Hi Alan,
This didn't help either. Unless I'm badly mistaken, the
problem is that the content of the `struct bset` is corrupted (I'm not
exactly sure how this happens, but it is probably related to the damaged
bcachefs image from syzbot), so calculations based on it won't
help.
If I add some prints before the memmove_u64s_down, like this:
pr_err("DEBUG: k: 0x%lx - 0x%lx, len %ld", (unsigned long)k, (unsigned long)bkey_p_next(k), bkey_p_next(k) - k);
pr_err("DEBUG: i: 0x%lx - 0x%lx, len %ld", (unsigned long)i->start, (unsigned long)vstruct_end(i), i->u64s);
pr_err("DEBUG: next_good_key * 8: %ld, k + next_good_key: 0x%lx", next_good_key * sizeof(u64*), (u64 *) k + next_good_key);
le16_add_cpu(&i->u64s, -next_good_key);
pr_err("DEBUG: copying 0x%lx from 0x%lx, len %ld", k, (u64 *) k + next_good_key, (u64 *) vstruct_end(i) - (u64 *) k);
memmove_u64s_down(k, (u64 *) k + next_good_key, (u64 *) vstruct_end(i) - (u64 *) k);
Then I got:
[ 57.100623][ T1222] bcachefs: validate_bset_keys() DEBUG: k: 0xffff88806f2200a0 - 0xffff88806f220110, len 2
[ 57.101323][ T1222] bcachefs: validate_bset_keys() DEBUG: i: 0xffff88806f2200a0 - 0xffff88806f2970a0, len 60928
[ 57.101990][ T1222] bcachefs: validate_bset_keys() DEBUG: next_good_key * 8: 3976, k + next_good_key: 0xffff88806f221028
[ 57.102712][ T1222] bcachefs: validate_bset_keys() DEBUG: copying 0xffff88806f2200a0 from 0xffff88806f221028, len 60431
[ 57.103437][ T1222] BUG: unable to handle page fault for address: ffff88806f260000
`struct bset i` spans an invalid area.
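As a rough cross-check of the debug output, the sketch below models the source range that the copy reads and compares it against a buffer end. Treating the faulting address ffff88806f260000 as the first byte past the mapped buffer is an assumption here, and struct read_range, copy_source_range() and read_fits() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open byte range [start, end) read by a copy of nr_u64s words. */
struct read_range {
	uint64_t start;
	uint64_t end;
};

static struct read_range copy_source_range(uint64_t src, uint64_t nr_u64s)
{
	struct read_range r = { src, src + nr_u64s * sizeof(uint64_t) };

	/* Reject address wraparound in this userspace model. */
	assert(r.end >= r.start);
	return r;
}

/* Hypothetical check: does the whole read stay below buf_end? */
static bool read_fits(struct read_range r, uint64_t buf_end)
{
	return r.end <= buf_end;
}
```

For src = 0xffff88806f221028 and len = 60431, the range ends at 0xffff88806f2970a0, which is the same address printed for vstruct_end(i) before i->u64s was decremented. Under the assumption above, read_fits() is false for buf_end = 0xffff88806f260000, which would be consistent with the page fault.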
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
[not found] <180B5E33-351D-4A3F-8948-02AF10DBA3D8@gmail.com>
@ 2025-02-18 20:25 ` syzbot
0 siblings, 0 replies; 12+ messages in thread
From: syzbot @ 2025-02-18 20:25 UTC (permalink / raw)
To: linux-kernel, mmpgouride, syzkaller-bugs
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
general protection fault in call_rcu
Oops: general protection fault, probably for non-canonical address 0xef11a50f048e916c: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: maybe wild-memory-access in range [0x788d487824748b60-0x788d487824748b67]
CPU: 0 UID: 0 PID: 5968 Comm: modprobe Not tainted 6.14.0-rc3-syzkaller-g6537cfb395f3-dirty #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:lookup_object lib/debugobjects.c:423 [inline]
RIP: 0010:lookup_object_or_alloc lib/debugobjects.c:662 [inline]
RIP: 0010:debug_object_activate+0x1bd/0x580 lib/debugobjects.c:820
Code: 89 df e8 06 d5 2c fd 48 89 5c 24 30 4c 8b 3b 45 31 e4 eb 06 4d 8b 3f 41 ff c4 4d 85 ff 74 40 49 8d 5f 18 48 89 d8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 df e8 d4 d4 2c fd 48 8b 44 24 10 48 39
RSP: 0018:ffffc9000d09f2c0 EFLAGS: 00010006
RAX: 0f11a90f048e916c RBX: 788d487824748b60 RCX: 0000000000000001
RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffffc9000d09f1a0
RBP: ffffc9000d09f3d0 R08: 0000000000000003 R09: fffff52001a13e34
R10: dffffc0000000000 R11: fffff52001a13e34 R12: 0000000000000001
R13: 1ffff92001a13e60 R14: dffffc0000000000 R15: 788d487824748b48
FS: 0000000000000000(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f7221a9008 CR3: 0000000053bc0000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
debug_rcu_head_queue kernel/rcu/rcu.h:224 [inline]
__call_rcu_common kernel/rcu/tree.c:3050 [inline]
call_rcu+0x97/0xac0 kernel/rcu/tree.c:3172
remove_vma mm/vma.c:419 [inline]
vms_complete_munmap_vmas+0x65b/0x8f0 mm/vma.c:1202
do_vmi_align_munmap+0x5ef/0x6f0 mm/vma.c:1445
do_vmi_munmap+0x24e/0x2d0 mm/vma.c:1493
__vm_munmap+0x372/0x510 mm/vma.c:2951
elf_map fs/binfmt_elf.c:389 [inline]
elf_load+0x2d6/0x700 fs/binfmt_elf.c:414
load_elf_interp+0x440/0xac0 fs/binfmt_elf.c:681
load_elf_binary+0x1a87/0x2820 fs/binfmt_elf.c:1241
search_binary_handler fs/exec.c:1775 [inline]
exec_binprm fs/exec.c:1807 [inline]
bprm_execve+0x979/0x1430 fs/exec.c:1859
kernel_execve+0x931/0xa50 fs/exec.c:2026
call_usermodehelper_exec_async+0x237/0x380 kernel/umh.c:109
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:lookup_object lib/debugobjects.c:423 [inline]
RIP: 0010:lookup_object_or_alloc lib/debugobjects.c:662 [inline]
RIP: 0010:debug_object_activate+0x1bd/0x580 lib/debugobjects.c:820
Code: 89 df e8 06 d5 2c fd 48 89 5c 24 30 4c 8b 3b 45 31 e4 eb 06 4d 8b 3f 41 ff c4 4d 85 ff 74 40 49 8d 5f 18 48 89 d8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 df e8 d4 d4 2c fd 48 8b 44 24 10 48 39
RSP: 0018:ffffc9000d09f2c0 EFLAGS: 00010006
RAX: 0f11a90f048e916c RBX: 788d487824748b60 RCX: 0000000000000001
RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffffc9000d09f1a0
RBP: ffffc9000d09f3d0 R08: 0000000000000003 R09: fffff52001a13e34
R10: dffffc0000000000 R11: fffff52001a13e34 R12: 0000000000000001
R13: 1ffff92001a13e60 R14: dffffc0000000000 R15: 788d487824748b48
FS: 0000000000000000(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f7221a9008 CR3: 0000000053bc0000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: 89 df mov %ebx,%edi
2: e8 06 d5 2c fd call 0xfd2cd50d
7: 48 89 5c 24 30 mov %rbx,0x30(%rsp)
c: 4c 8b 3b mov (%rbx),%r15
f: 45 31 e4 xor %r12d,%r12d
12: eb 06 jmp 0x1a
14: 4d 8b 3f mov (%r15),%r15
17: 41 ff c4 inc %r12d
1a: 4d 85 ff test %r15,%r15
1d: 74 40 je 0x5f
1f: 49 8d 5f 18 lea 0x18(%r15),%rbx
23: 48 89 d8 mov %rbx,%rax
26: 48 c1 e8 03 shr $0x3,%rax
* 2a: 42 80 3c 30 00 cmpb $0x0,(%rax,%r14,1) <-- trapping instruction
2f: 74 08 je 0x39
31: 48 89 df mov %rbx,%rdi
34: e8 d4 d4 2c fd call 0xfd2cd50d
39: 48 8b 44 24 10 mov 0x10(%rsp),%rax
3e: 48 rex.W
3f: 39 .byte 0x39
Tested on:
commit: 6537cfb3 Merge tag 'sound-6.14-rc4' of git://git.kerne..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13732498580000
kernel config: https://syzkaller.appspot.com/x/.config?x=b7bde34acd8f53b1
dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=10f8e5b0580000
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
[not found] <984ABB17-1380-4581-B8AD-5E233B167856@gmail.com>
@ 2025-02-19 16:37 ` syzbot
0 siblings, 0 replies; 12+ messages in thread
From: syzbot @ 2025-02-19 16:37 UTC (permalink / raw)
To: linux-kernel, mmpgouride, syzkaller-bugs
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
BUG: corrupted list in new_inode
slab debugfs_inode_cache start ffff888045d00a30 pointer offset 448 size 1176
list_add corruption. next->prev should be prev (ffff888030eb09c0), but was 0000000000000000. (next=ffff888045d00bf0).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:31!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5848 Comm: syz-executor Not tainted 6.14.0-rc3-syzkaller-g6537cfb395f3-dirty #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:__list_add_valid_or_report+0xf3/0x130 lib/list_debug.c:29
Code: e9 0a fd 42 80 7c 2d 00 00 74 08 48 89 df e8 64 dd 2c fd 49 8b 56 08 48 c7 c7 80 ef 80 8c 4c 89 e6 4c 89 f1 e8 ce e8 2a fc 90 <0f> 0b 4c 89 e7 e8 53 e9 0a fd 42 80 3c 2b 00 74 08 4c 89 e7 e8 34
RSP: 0018:ffffc9000cdced78 EFLAGS: 00010246
RAX: 0000000000000075 RBX: ffff888045d00bf8 RCX: 77c72d62d2106c00
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: 1ffff11008ba017f R08: ffffffff81a1108c R09: fffffbfff1d3a614
R10: dffffc0000000000 R11: fffffbfff1d3a614 R12: ffff888030eb09c0
R13: dffffc0000000000 R14: ffff888045d00bf0 R15: ffff888045d01108
FS: 0000555584544500(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffa5c4d56c0 CR3: 000000003f29a000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__list_add_valid include/linux/list.h:88 [inline]
__list_add include/linux/list.h:150 [inline]
list_add include/linux/list.h:169 [inline]
inode_sb_list_add fs/inode.c:617 [inline]
new_inode+0xc7/0x1d0 fs/inode.c:1195
debugfs_get_inode fs/debugfs/inode.c:72 [inline]
debugfs_create_dir+0xf6/0x430 fs/debugfs/inode.c:597
wiphy_register+0x1a59/0x2650 net/wireless/core.c:1015
ieee80211_register_hw+0x35d9/0x42e0 net/mac80211/main.c:1587
mac80211_hwsim_new_radio+0x2ae8/0x4a40 drivers/net/wireless/virtual/mac80211_hwsim.c:5558
hwsim_new_radio_nl+0xece/0x2290 drivers/net/wireless/virtual/mac80211_hwsim.c:6242
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0xb1f/0xec0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x206/0x480 net/netlink/af_netlink.c:2543
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline]
netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1348
netlink_sendmsg+0x8de/0xcb0 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:718 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:733
__sys_sendto+0x363/0x4c0 net/socket.c:2187
__do_sys_sendto net/socket.c:2194 [inline]
__se_sys_sendto net/socket.c:2190 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2190
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fd5e5b8ec7c
Code: 2a 5f 02 00 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 70 5f 02 00 48 8b
RSP: 002b:00007ffc35ff7120 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007fd5e68d4620 RCX: 00007fd5e5b8ec7c
RDX: 0000000000000024 RSI: 00007fd5e68d4670 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00007ffc35ff7174 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
R13: 0000000000000000 R14: 00007fd5e68d4670 R15: 0000000000000000
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__list_add_valid_or_report+0xf3/0x130 lib/list_debug.c:29
Code: e9 0a fd 42 80 7c 2d 00 00 74 08 48 89 df e8 64 dd 2c fd 49 8b 56 08 48 c7 c7 80 ef 80 8c 4c 89 e6 4c 89 f1 e8 ce e8 2a fc 90 <0f> 0b 4c 89 e7 e8 53 e9 0a fd 42 80 3c 2b 00 74 08 4c 89 e7 e8 34
RSP: 0018:ffffc9000cdced78 EFLAGS: 00010246
RAX: 0000000000000075 RBX: ffff888045d00bf8 RCX: 77c72d62d2106c00
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: 1ffff11008ba017f R08: ffffffff81a1108c R09: fffffbfff1d3a614
R10: dffffc0000000000 R11: fffffbfff1d3a614 R12: ffff888030eb09c0
R13: dffffc0000000000 R14: ffff888045d00bf0 R15: ffff888045d01108
FS: 0000555584544500(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffa5c4d56c0 CR3: 000000003f29a000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Tested on:
commit: 6537cfb3 Merge tag 'sound-6.14-rc4' of git://git.kerne..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1334b5b0580000
kernel config: https://syzkaller.appspot.com/x/.config?x=b7bde34acd8f53b1
dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=142f4ba4580000
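A note on reading the two diagnostic lines at the top of this report. The sketch below is only an illustration, not the kernel's lib/list_debug.c code: it takes the numbers printed in the log and checks two things. It assumes that "start" in the slab line is the base address of the object and that "pointer offset" is counted from that base. The helper names pointer_in_object and list_add_is_valid are made up for this example.

```python
# Values copied from the report above.
SLAB_START = 0xFFFF888045D00A30          # "slab debugfs_inode_cache start ..."
POINTER_OFFSET = 448                     # "pointer offset 448"
OBJECT_SIZE = 1176                       # "size 1176"
REPORTED_NEXT = 0xFFFF888045D00BF0       # "(next=ffff888045d00bf0)"
REPORTED_PREV = 0xFFFF888030EB09C0       # "should be prev (ffff888030eb09c0)"
OBSERVED_NEXT_PREV = 0x0000000000000000  # "but was 0000000000000000"


def pointer_in_object(start, offset, size):
    """Return start + offset if the offset lies inside the object, else None.

    Assumption: 'start' is the object's base address and 'offset' is
    measured from that base.
    """
    if 0 <= offset < size:
        return start + offset
    return None


def list_add_is_valid(prev_addr, next_prev_field):
    """Simplified form of the invariant named in the message.

    Before inserting between prev and next, next->prev must still point
    back at prev. Only that one condition is modelled here.
    """
    return next_prev_field == prev_addr


if __name__ == "__main__":
    addr = pointer_in_object(SLAB_START, POINTER_OFFSET, OBJECT_SIZE)
    print("address described by the slab line:", hex(addr))
    print("matches reported next:", addr == REPORTED_NEXT)
    print("next->prev == prev:",
          list_add_is_valid(REPORTED_PREV, OBSERVED_NEXT_PREV))
```

Under those assumptions the slab line and the list_add line refer to the same address: 0xa30 + 448 (0x1c0) = 0xbf0, so the "next" node sits inside a debugfs_inode_cache object, and its prev field reads as zero rather than pointing back at prev.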
end of thread (newest: 2025-02-19 16:37 UTC)
Thread overview: 12+ messages
[not found] <7ADD7E39-E7DC-4A40-8236-CD5C90112C96@gmail.com>
2025-02-18 17:12 ` [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg syzbot
[not found] <984ABB17-1380-4581-B8AD-5E233B167856@gmail.com>
2025-02-19 16:37 ` syzbot
[not found] <180B5E33-351D-4A3F-8948-02AF10DBA3D8@gmail.com>
2025-02-18 20:25 ` syzbot
[not found] <B2F37A7F-BE18-495F-9350-6D7D47198FFD@gmail.com>
2025-02-18 17:38 ` syzbot
2024-12-14 3:56 [syzbot] [mm?] " syzbot
2025-02-14 18:11 ` [syzbot] [mm?] [bcachefs?] " syzbot
2025-02-14 23:23 ` Andrew Morton
2025-02-16 16:13 ` Kairui Song
2025-02-17 17:12 ` Kairui Song
2025-02-17 18:09 ` Alan Huang
2025-02-18 11:40 ` Kairui Song
2025-02-18 12:16 ` Alan Huang
2025-02-18 17:47 ` Kairui Song