linux-kernel.vger.kernel.org archive mirror
* [syzbot] [mm?] WARNING in lock_list_lru_of_memcg
@ 2024-12-14  3:56 syzbot
  2024-12-14  6:05 ` Yu Zhao
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: syzbot @ 2024-12-14  3:56 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    7cb1b4663150 Merge tag 'locking_urgent_for_v6.13_rc3' of g..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16e96b30580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=fee25f93665c89ac
dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-7cb1b466.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/13e083329dab/vmlinux-7cb1b466.xz
kernel image: https://storage.googleapis.com/syzbot-assets/fe3847d08513/bzImage-7cb1b466.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 0 PID: 80 at mm/list_lru.c:97 lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
Modules linked in:
CPU: 0 UID: 0 PID: 80 Comm: kswapd0 Not tainted 6.13.0-rc2-syzkaller-00018-g7cb1b4663150 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
Code: e9 22 fe ff ff e8 9b cc b6 ff 4c 8b 7c 24 10 45 84 f6 0f 84 40 ff ff ff e9 37 01 00 00 e8 83 cc b6 ff eb 05 e8 7c cc b6 ff 90 <0f> 0b 90 eb 97 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 7a fd ff ff 48
RSP: 0018:ffffc9000105e798 EFLAGS: 00010093
RAX: ffffffff81e891c4 RBX: 0000000000000000 RCX: ffff88801f53a440
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff888042e70054 R08: ffffffff81e89156 R09: 1ffffffff2032cae
R10: dffffc0000000000 R11: fffffbfff2032caf R12: ffffffff81e88e5e
R13: ffffffff9a3feb20 R14: 0000000000000000 R15: ffff888042e70000
FS:  0000000000000000(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020161000 CR3: 0000000032d12000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 list_lru_add+0x59/0x270 mm/list_lru.c:164
 list_lru_add_obj+0x17b/0x250 mm/list_lru.c:187
 workingset_update_node+0x1af/0x230 mm/workingset.c:634
 xas_update lib/xarray.c:355 [inline]
 update_node lib/xarray.c:758 [inline]
 xas_store+0xb8f/0x1890 lib/xarray.c:845
 page_cache_delete mm/filemap.c:149 [inline]
 __filemap_remove_folio+0x4e9/0x670 mm/filemap.c:232
 __remove_mapping+0x86f/0xad0 mm/vmscan.c:791
 shrink_folio_list+0x30a6/0x5ca0 mm/vmscan.c:1467
 evict_folios+0x3c86/0x5800 mm/vmscan.c:4593
 try_to_shrink_lruvec+0x9a6/0xc70 mm/vmscan.c:4789
 shrink_one+0x3b9/0x850 mm/vmscan.c:4834
 shrink_many mm/vmscan.c:4897 [inline]
 lru_gen_shrink_node mm/vmscan.c:4975 [inline]
 shrink_node+0x37c5/0x3e50 mm/vmscan.c:5956
 kswapd_shrink_node mm/vmscan.c:6785 [inline]
 balance_pgdat mm/vmscan.c:6977 [inline]
 kswapd+0x1ca9/0x36f0 mm/vmscan.c:7246
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Re: [syzbot] [mm?] WARNING in lock_list_lru_of_memcg
  2024-12-14  3:56 [syzbot] [mm?] WARNING in lock_list_lru_of_memcg syzbot
@ 2024-12-14  6:05 ` Yu Zhao
  2024-12-14 19:43   ` Kairui Song
  2025-02-14 18:11 ` [syzbot] [mm?] [bcachefs?] " syzbot
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Yu Zhao @ 2024-12-14  6:05 UTC (permalink / raw)
  To: syzbot, Kairui Song; +Cc: akpm, linux-kernel, linux-mm, syzkaller-bugs

On Fri, Dec 13, 2024 at 8:56 PM syzbot
<syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    7cb1b4663150 Merge tag 'locking_urgent_for_v6.13_rc3' of g..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16e96b30580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=fee25f93665c89ac
> dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-7cb1b466.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/13e083329dab/vmlinux-7cb1b466.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/fe3847d08513/bzImage-7cb1b466.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 80 at mm/list_lru.c:97 lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
> Modules linked in:
> CPU: 0 UID: 0 PID: 80 Comm: kswapd0 Not tainted 6.13.0-rc2-syzkaller-00018-g7cb1b4663150 #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> RIP: 0010:lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
> Code: e9 22 fe ff ff e8 9b cc b6 ff 4c 8b 7c 24 10 45 84 f6 0f 84 40 ff ff ff e9 37 01 00 00 e8 83 cc b6 ff eb 05 e8 7c cc b6 ff 90 <0f> 0b 90 eb 97 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 7a fd ff ff 48
> RSP: 0018:ffffc9000105e798 EFLAGS: 00010093
> RAX: ffffffff81e891c4 RBX: 0000000000000000 RCX: ffff88801f53a440
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffff888042e70054 R08: ffffffff81e89156 R09: 1ffffffff2032cae
> R10: dffffc0000000000 R11: fffffbfff2032caf R12: ffffffff81e88e5e
> R13: ffffffff9a3feb20 R14: 0000000000000000 R15: ffff888042e70000
> FS:  0000000000000000(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020161000 CR3: 0000000032d12000 CR4: 0000000000352ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  list_lru_add+0x59/0x270 mm/list_lru.c:164
>  list_lru_add_obj+0x17b/0x250 mm/list_lru.c:187
>  workingset_update_node+0x1af/0x230 mm/workingset.c:634
>  xas_update lib/xarray.c:355 [inline]
>  update_node lib/xarray.c:758 [inline]
>  xas_store+0xb8f/0x1890 lib/xarray.c:845
>  page_cache_delete mm/filemap.c:149 [inline]
>  __filemap_remove_folio+0x4e9/0x670 mm/filemap.c:232
>  __remove_mapping+0x86f/0xad0 mm/vmscan.c:791
>  shrink_folio_list+0x30a6/0x5ca0 mm/vmscan.c:1467
>  evict_folios+0x3c86/0x5800 mm/vmscan.c:4593
>  try_to_shrink_lruvec+0x9a6/0xc70 mm/vmscan.c:4789
>  shrink_one+0x3b9/0x850 mm/vmscan.c:4834
>  shrink_many mm/vmscan.c:4897 [inline]
>  lru_gen_shrink_node mm/vmscan.c:4975 [inline]
>  shrink_node+0x37c5/0x3e50 mm/vmscan.c:5956
>  kswapd_shrink_node mm/vmscan.c:6785 [inline]
>  balance_pgdat mm/vmscan.c:6977 [inline]
>  kswapd+0x1ca9/0x36f0 mm/vmscan.c:7246
>  kthread+0x2f0/0x390 kernel/kthread.c:389
>  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>  </TASK>

This one seems to be related to "mm/list_lru: split the lock to
per-cgroup scope".

Kairui, can you please take a look? Thanks.


* Re: [syzbot] [mm?] WARNING in lock_list_lru_of_memcg
  2024-12-14  6:05 ` Yu Zhao
@ 2024-12-14 19:43   ` Kairui Song
  2024-12-15 17:44     ` Kairui Song
  0 siblings, 1 reply; 20+ messages in thread
From: Kairui Song @ 2024-12-14 19:43 UTC (permalink / raw)
  To: Yu Zhao; +Cc: syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs

On Sat, Dec 14, 2024 at 2:06 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Fri, Dec 13, 2024 at 8:56 PM syzbot
> <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    7cb1b4663150 Merge tag 'locking_urgent_for_v6.13_rc3' of g..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16e96b30580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=fee25f93665c89ac
> > dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-7cb1b466.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/13e083329dab/vmlinux-7cb1b466.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/fe3847d08513/bzImage-7cb1b466.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 80 at mm/list_lru.c:97 lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
> > Modules linked in:
> > CPU: 0 UID: 0 PID: 80 Comm: kswapd0 Not tainted 6.13.0-rc2-syzkaller-00018-g7cb1b4663150 #0
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> > RIP: 0010:lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
> > Code: e9 22 fe ff ff e8 9b cc b6 ff 4c 8b 7c 24 10 45 84 f6 0f 84 40 ff ff ff e9 37 01 00 00 e8 83 cc b6 ff eb 05 e8 7c cc b6 ff 90 <0f> 0b 90 eb 97 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 7a fd ff ff 48
> > RSP: 0018:ffffc9000105e798 EFLAGS: 00010093
> > RAX: ffffffff81e891c4 RBX: 0000000000000000 RCX: ffff88801f53a440
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: ffff888042e70054 R08: ffffffff81e89156 R09: 1ffffffff2032cae
> > R10: dffffc0000000000 R11: fffffbfff2032caf R12: ffffffff81e88e5e
> > R13: ffffffff9a3feb20 R14: 0000000000000000 R15: ffff888042e70000
> > FS:  0000000000000000(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000020161000 CR3: 0000000032d12000 CR4: 0000000000352ef0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  <TASK>
> >  list_lru_add+0x59/0x270 mm/list_lru.c:164
> >  list_lru_add_obj+0x17b/0x250 mm/list_lru.c:187
> >  workingset_update_node+0x1af/0x230 mm/workingset.c:634
> >  xas_update lib/xarray.c:355 [inline]
> >  update_node lib/xarray.c:758 [inline]
> >  xas_store+0xb8f/0x1890 lib/xarray.c:845
> >  page_cache_delete mm/filemap.c:149 [inline]
> >  __filemap_remove_folio+0x4e9/0x670 mm/filemap.c:232
> >  __remove_mapping+0x86f/0xad0 mm/vmscan.c:791
> >  shrink_folio_list+0x30a6/0x5ca0 mm/vmscan.c:1467
> >  evict_folios+0x3c86/0x5800 mm/vmscan.c:4593
> >  try_to_shrink_lruvec+0x9a6/0xc70 mm/vmscan.c:4789
> >  shrink_one+0x3b9/0x850 mm/vmscan.c:4834
> >  shrink_many mm/vmscan.c:4897 [inline]
> >  lru_gen_shrink_node mm/vmscan.c:4975 [inline]
> >  shrink_node+0x37c5/0x3e50 mm/vmscan.c:5956
> >  kswapd_shrink_node mm/vmscan.c:6785 [inline]
> >  balance_pgdat mm/vmscan.c:6977 [inline]
> >  kswapd+0x1ca9/0x36f0 mm/vmscan.c:7246
> >  kthread+0x2f0/0x390 kernel/kthread.c:389
> >  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
> >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> >  </TASK>
>
> This one seems to be related to "mm/list_lru: split the lock to
> per-cgroup scope".
>
> Kairui, can you please take a look? Thanks.

Thanks for the ping. Yes, that's a new sanity check I added.

It is supposed to mean that a list_lru is being reparented while the
memcg it belongs to isn't dying.

More concretely, a list_lru is marked dead by memcg_offline_kmem ->
memcg_reparent_list_lrus. If that function has been called for a memcg
but the memcg is not dying, this WARN triggers. I'm not sure what
causes this. One possibility: if alloc_shrinker_info() in
mem_cgroup_css_online failed, is memcg_offline_kmem then called early?
That doesn't seem to fit this case, though. Or maybe it is just a sync
issue with the memcg dying flag, so the user saw the list_lru dying
before seeing the memcg dying? The object might be leaked to the parent
cgroup, which doesn't seem too terrible.

I'm not sure how to reproduce this. I will keep looking.
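
For readers following along, the condition described above can be
sketched as a small userspace model. This is only an illustration under
stated assumptions, not the code in mm/list_lru.c: every name prefixed
with model_ is hypothetical, the locking and RCU parts are left out, and
the ordering "mark dying first, then reparent" is assumed from the
description in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
struct model_memcg {
	struct model_memcg *parent; /* NULL for the root cgroup */
	bool dying;                 /* models the memcg dying flag */
	bool lru_dead;              /* models a per-memcg list marked dead by reparenting */
	long nr_items;              /* items currently on this memcg's list */
};

static int model_warnings; /* counts how often the sanity check fired */

/*
 * Pick the list an object should go on. If the memcg's own list has
 * been reparented (marked dead), fall back to the nearest ancestor
 * whose list is still live. A dead list on a memcg that is not dying
 * is the inconsistent state the WARN is meant to report.
 */
static struct model_memcg *model_lock_lru_of_memcg(struct model_memcg *memcg)
{
	while (memcg->lru_dead) {
		if (!memcg->dying)
			model_warnings++; /* stands in for the kernel warning */
		assert(memcg->parent != NULL); /* assumed: the root list is never dead */
		memcg = memcg->parent;
	}
	return memcg;
}

/* Models offlining: set the dying flag, then hand the items to the parent. */
static void model_offline(struct model_memcg *memcg)
{
	memcg->dying = true;
	memcg->parent->nr_items += memcg->nr_items;
	memcg->nr_items = 0;
	memcg->lru_dead = true;
}

/* Models adding one object on behalf of the given memcg. */
static void model_lru_add(struct model_memcg *memcg)
{
	model_lock_lru_of_memcg(memcg)->nr_items++;
}
```

In this model, an add that races with a normal offline simply lands on
the parent's list without a warning. The warning counter only moves when
a list is dead while its memcg is not marked dying, which is the case
the two guesses above (early memcg_offline_kmem, or a stale view of the
dying flag) try to explain.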


* Re: [syzbot] [mm?] WARNING in lock_list_lru_of_memcg
  2024-12-14 19:43   ` Kairui Song
@ 2024-12-15 17:44     ` Kairui Song
  2024-12-16  2:45       ` Yu Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Kairui Song @ 2024-12-15 17:44 UTC (permalink / raw)
  To: syzkaller-bugs; +Cc: Yu Zhao, syzbot, akpm, linux-kernel, linux-mm

On Sun, Dec 15, 2024 at 3:43 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Sat, Dec 14, 2024 at 2:06 PM Yu Zhao <yuzhao@google.com> wrote:
> >
> > On Fri, Dec 13, 2024 at 8:56 PM syzbot
> > <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit:    7cb1b4663150 Merge tag 'locking_urgent_for_v6.13_rc3' of g..
> > > git tree:       upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=16e96b30580000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=fee25f93665c89ac
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > Downloadable assets:
> > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-7cb1b466.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/13e083329dab/vmlinux-7cb1b466.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/fe3847d08513/bzImage-7cb1b466.xz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> > >
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 0 PID: 80 at mm/list_lru.c:97 lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
> > > Modules linked in:
> > > CPU: 0 UID: 0 PID: 80 Comm: kswapd0 Not tainted 6.13.0-rc2-syzkaller-00018-g7cb1b4663150 #0
> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> > > RIP: 0010:lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
> > > Code: e9 22 fe ff ff e8 9b cc b6 ff 4c 8b 7c 24 10 45 84 f6 0f 84 40 ff ff ff e9 37 01 00 00 e8 83 cc b6 ff eb 05 e8 7c cc b6 ff 90 <0f> 0b 90 eb 97 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 7a fd ff ff 48
> > > RSP: 0018:ffffc9000105e798 EFLAGS: 00010093
> > > RAX: ffffffff81e891c4 RBX: 0000000000000000 RCX: ffff88801f53a440
> > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > > RBP: ffff888042e70054 R08: ffffffff81e89156 R09: 1ffffffff2032cae
> > > R10: dffffc0000000000 R11: fffffbfff2032caf R12: ffffffff81e88e5e
> > > R13: ffffffff9a3feb20 R14: 0000000000000000 R15: ffff888042e70000
> > > FS:  0000000000000000(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000020161000 CR3: 0000000032d12000 CR4: 0000000000352ef0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > Call Trace:
> > >  <TASK>
> > >  list_lru_add+0x59/0x270 mm/list_lru.c:164
> > >  list_lru_add_obj+0x17b/0x250 mm/list_lru.c:187
> > >  workingset_update_node+0x1af/0x230 mm/workingset.c:634
> > >  xas_update lib/xarray.c:355 [inline]
> > >  update_node lib/xarray.c:758 [inline]
> > >  xas_store+0xb8f/0x1890 lib/xarray.c:845
> > >  page_cache_delete mm/filemap.c:149 [inline]
> > >  __filemap_remove_folio+0x4e9/0x670 mm/filemap.c:232
> > >  __remove_mapping+0x86f/0xad0 mm/vmscan.c:791
> > >  shrink_folio_list+0x30a6/0x5ca0 mm/vmscan.c:1467
> > >  evict_folios+0x3c86/0x5800 mm/vmscan.c:4593
> > >  try_to_shrink_lruvec+0x9a6/0xc70 mm/vmscan.c:4789
> > >  shrink_one+0x3b9/0x850 mm/vmscan.c:4834
> > >  shrink_many mm/vmscan.c:4897 [inline]
> > >  lru_gen_shrink_node mm/vmscan.c:4975 [inline]
> > >  shrink_node+0x37c5/0x3e50 mm/vmscan.c:5956
> > >  kswapd_shrink_node mm/vmscan.c:6785 [inline]
> > >  balance_pgdat mm/vmscan.c:6977 [inline]
> > >  kswapd+0x1ca9/0x36f0 mm/vmscan.c:7246
> > >  kthread+0x2f0/0x390 kernel/kthread.c:389
> > >  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
> > >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> > >  </TASK>
> >
> > This one seems to be related to "mm/list_lru: split the lock to
> > per-cgroup scope".
> >
> > Kairui, can you please take a look? Thanks.
>
> Thanks for pinging, yes that's a new sanity check added by me.
>
> Which is supposed to mean, a list_lru is being reparented while the
> memcg it belongs to isn't dying.
>
> More concretely, list_lru is marked dead by memcg_offline_kmem ->
> memcg_reparent_list_lrus, if the function is called for one memcg, but
> now the memcg is not dying, this WARN triggers. I'm not sure how this
> is caused. One possibility is if alloc_shrinker_info() in
> mem_cgroup_css_online failed, then memcg_offline_kmem is called early?
> Doesn't seem to fit this case though.. Or maybe just sync issues with
> the memcg dying flag so the user saw the list_lru dying before seeing
> memcg dying? The object might be leaked to the parent cgroup, seems
> not too terrible though.
>
> I'm not sure how to reproduce this. I will keep looking.

I managed to boot the image using the kernel config provided by the
bot, but so far local tests haven't triggered any issue. Is there any
way I can reproduce what the bot actually did, or provide a patch for
the bot to test?


* Re: [syzbot] [mm?] WARNING in lock_list_lru_of_memcg
  2024-12-15 17:44     ` Kairui Song
@ 2024-12-16  2:45       ` Yu Zhao
  2024-12-16 18:39         ` Sasha Levin
  0 siblings, 1 reply; 20+ messages in thread
From: Yu Zhao @ 2024-12-16  2:45 UTC (permalink / raw)
  To: Kairui Song; +Cc: syzkaller-bugs, syzbot, akpm, linux-kernel, linux-mm

Hi Kairui,

On Sun, Dec 15, 2024 at 10:45 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Sun, Dec 15, 2024 at 3:43 AM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > On Sat, Dec 14, 2024 at 2:06 PM Yu Zhao <yuzhao@google.com> wrote:
> > >
> > > On Fri, Dec 13, 2024 at 8:56 PM syzbot
> > > <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit:    7cb1b4663150 Merge tag 'locking_urgent_for_v6.13_rc3' of g..
> > > > git tree:       upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=16e96b30580000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=fee25f93665c89ac
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> > > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > >
> > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > >
> > > > Downloadable assets:
> > > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-7cb1b466.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/13e083329dab/vmlinux-7cb1b466.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/fe3847d08513/bzImage-7cb1b466.xz
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> > > >
> > > > ------------[ cut here ]------------
> > > > WARNING: CPU: 0 PID: 80 at mm/list_lru.c:97 lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
> > > > Modules linked in:
> > > > CPU: 0 UID: 0 PID: 80 Comm: kswapd0 Not tainted 6.13.0-rc2-syzkaller-00018-g7cb1b4663150 #0
> > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> > > > RIP: 0010:lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
> > > > Code: e9 22 fe ff ff e8 9b cc b6 ff 4c 8b 7c 24 10 45 84 f6 0f 84 40 ff ff ff e9 37 01 00 00 e8 83 cc b6 ff eb 05 e8 7c cc b6 ff 90 <0f> 0b 90 eb 97 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 7a fd ff ff 48
> > > > RSP: 0018:ffffc9000105e798 EFLAGS: 00010093
> > > > RAX: ffffffff81e891c4 RBX: 0000000000000000 RCX: ffff88801f53a440
> > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > > > RBP: ffff888042e70054 R08: ffffffff81e89156 R09: 1ffffffff2032cae
> > > > R10: dffffc0000000000 R11: fffffbfff2032caf R12: ffffffff81e88e5e
> > > > R13: ffffffff9a3feb20 R14: 0000000000000000 R15: ffff888042e70000
> > > > FS:  0000000000000000(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > CR2: 0000000020161000 CR3: 0000000032d12000 CR4: 0000000000352ef0
> > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > Call Trace:
> > > >  <TASK>
> > > >  list_lru_add+0x59/0x270 mm/list_lru.c:164
> > > >  list_lru_add_obj+0x17b/0x250 mm/list_lru.c:187
> > > >  workingset_update_node+0x1af/0x230 mm/workingset.c:634
> > > >  xas_update lib/xarray.c:355 [inline]
> > > >  update_node lib/xarray.c:758 [inline]
> > > >  xas_store+0xb8f/0x1890 lib/xarray.c:845
> > > >  page_cache_delete mm/filemap.c:149 [inline]
> > > >  __filemap_remove_folio+0x4e9/0x670 mm/filemap.c:232
> > > >  __remove_mapping+0x86f/0xad0 mm/vmscan.c:791
> > > >  shrink_folio_list+0x30a6/0x5ca0 mm/vmscan.c:1467
> > > >  evict_folios+0x3c86/0x5800 mm/vmscan.c:4593
> > > >  try_to_shrink_lruvec+0x9a6/0xc70 mm/vmscan.c:4789
> > > >  shrink_one+0x3b9/0x850 mm/vmscan.c:4834
> > > >  shrink_many mm/vmscan.c:4897 [inline]
> > > >  lru_gen_shrink_node mm/vmscan.c:4975 [inline]
> > > >  shrink_node+0x37c5/0x3e50 mm/vmscan.c:5956
> > > >  kswapd_shrink_node mm/vmscan.c:6785 [inline]
> > > >  balance_pgdat mm/vmscan.c:6977 [inline]
> > > >  kswapd+0x1ca9/0x36f0 mm/vmscan.c:7246
> > > >  kthread+0x2f0/0x390 kernel/kthread.c:389
> > > >  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
> > > >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> > > >  </TASK>
> > >
> > > This one seems to be related to "mm/list_lru: split the lock to
> > > per-cgroup scope".
> > >
> > > Kairui, can you please take a look? Thanks.
> >
> > Thanks for pinging, yes that's a new sanity check added by me.
> >
> > Which is supposed to mean, a list_lru is being reparented while the
> > memcg it belongs to isn't dying.
> >
> > More concretely, list_lru is marked dead by memcg_offline_kmem ->
> > memcg_reparent_list_lrus, if the function is called for one memcg, but
> > now the memcg is not dying, this WARN triggers. I'm not sure how this
> > is caused. One possibility is if alloc_shrinker_info() in
> > mem_cgroup_css_online failed, then memcg_offline_kmem is called early?
> > Doesn't seem to fit this case though.. Or maybe just sync issues with
> > the memcg dying flag so the user saw the list_lru dying before seeing
> > memcg dying? The object might be leaked to the parent cgroup, seems
> > not too terrible though.
> >
> > I'm not sure how to reproduce this. I will keep looking.
>
> Managed to boot the image and using the kernel config provided by bot,
> so far local tests didn't trigger any issue. Is there any way I can
> reproduce what the bot actually did?

If syzbot doesn't have a repro, it might not be productive for you to
try to find one. Personally, I would analyze stacktraces and double
check the code, and move on if I can't find something obviously wrong.

> Or provide some patch for the bot
> to test?

syzbot can only try patches after it finds a repro. So in this case,
no, it can't try your patches.

Hope the above clarifies things for you.


* Re: [syzbot] [mm?] WARNING in lock_list_lru_of_memcg
  2024-12-16  2:45       ` Yu Zhao
@ 2024-12-16 18:39         ` Sasha Levin
  2024-12-17 18:19           ` Kairui Song
  0 siblings, 1 reply; 20+ messages in thread
From: Sasha Levin @ 2024-12-16 18:39 UTC (permalink / raw)
  To: Yu Zhao; +Cc: Kairui Song, syzkaller-bugs, syzbot, akpm, linux-kernel, linux-mm

On Sun, Dec 15, 2024 at 07:45:38PM -0700, Yu Zhao wrote:
>Hi Kairui,
>
>On Sun, Dec 15, 2024 at 10:45 AM Kairui Song <ryncsn@gmail.com> wrote:
>>
>> On Sun, Dec 15, 2024 at 3:43 AM Kairui Song <ryncsn@gmail.com> wrote:
>> >
>> > On Sat, Dec 14, 2024 at 2:06 PM Yu Zhao <yuzhao@google.com> wrote:
>> > >
>> > > On Fri, Dec 13, 2024 at 8:56 PM syzbot
>> > > <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
>> > > >
>> > > > Hello,
>> > > >
>> > > > syzbot found the following issue on:
>> > > >
>> > > > HEAD commit:    7cb1b4663150 Merge tag 'locking_urgent_for_v6.13_rc3' of g..
>> > > > git tree:       upstream
>> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=16e96b30580000
>> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=fee25f93665c89ac
>> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
>> > > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>> > > >
>> > > > Unfortunately, I don't have any reproducer for this issue yet.
>> > > >
>> > > > Downloadable assets:
>> > > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-7cb1b466.raw.xz
>> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/13e083329dab/vmlinux-7cb1b466.xz
>> > > > kernel image: https://storage.googleapis.com/syzbot-assets/fe3847d08513/bzImage-7cb1b466.xz
>> > > >
>> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> > > > Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
>> > > >
>> > > > ------------[ cut here ]------------
>> > > > WARNING: CPU: 0 PID: 80 at mm/list_lru.c:97 lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
>> > > > Modules linked in:
>> > > > CPU: 0 UID: 0 PID: 80 Comm: kswapd0 Not tainted 6.13.0-rc2-syzkaller-00018-g7cb1b4663150 #0
>> > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
>> > > > RIP: 0010:lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
>> > > > Code: e9 22 fe ff ff e8 9b cc b6 ff 4c 8b 7c 24 10 45 84 f6 0f 84 40 ff ff ff e9 37 01 00 00 e8 83 cc b6 ff eb 05 e8 7c cc b6 ff 90 <0f> 0b 90 eb 97 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 7a fd ff ff 48
>> > > > RSP: 0018:ffffc9000105e798 EFLAGS: 00010093
>> > > > RAX: ffffffff81e891c4 RBX: 0000000000000000 RCX: ffff88801f53a440
>> > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>> > > > RBP: ffff888042e70054 R08: ffffffff81e89156 R09: 1ffffffff2032cae
>> > > > R10: dffffc0000000000 R11: fffffbfff2032caf R12: ffffffff81e88e5e
>> > > > R13: ffffffff9a3feb20 R14: 0000000000000000 R15: ffff888042e70000
>> > > > FS:  0000000000000000(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
>> > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > > > CR2: 0000000020161000 CR3: 0000000032d12000 CR4: 0000000000352ef0
>> > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > > > Call Trace:
>> > > >  <TASK>
>> > > >  list_lru_add+0x59/0x270 mm/list_lru.c:164
>> > > >  list_lru_add_obj+0x17b/0x250 mm/list_lru.c:187
>> > > >  workingset_update_node+0x1af/0x230 mm/workingset.c:634
>> > > >  xas_update lib/xarray.c:355 [inline]
>> > > >  update_node lib/xarray.c:758 [inline]
>> > > >  xas_store+0xb8f/0x1890 lib/xarray.c:845
>> > > >  page_cache_delete mm/filemap.c:149 [inline]
>> > > >  __filemap_remove_folio+0x4e9/0x670 mm/filemap.c:232
>> > > >  __remove_mapping+0x86f/0xad0 mm/vmscan.c:791
>> > > >  shrink_folio_list+0x30a6/0x5ca0 mm/vmscan.c:1467
>> > > >  evict_folios+0x3c86/0x5800 mm/vmscan.c:4593
>> > > >  try_to_shrink_lruvec+0x9a6/0xc70 mm/vmscan.c:4789
>> > > >  shrink_one+0x3b9/0x850 mm/vmscan.c:4834
>> > > >  shrink_many mm/vmscan.c:4897 [inline]
>> > > >  lru_gen_shrink_node mm/vmscan.c:4975 [inline]
>> > > >  shrink_node+0x37c5/0x3e50 mm/vmscan.c:5956
>> > > >  kswapd_shrink_node mm/vmscan.c:6785 [inline]
>> > > >  balance_pgdat mm/vmscan.c:6977 [inline]
>> > > >  kswapd+0x1ca9/0x36f0 mm/vmscan.c:7246
>> > > >  kthread+0x2f0/0x390 kernel/kthread.c:389
>> > > >  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
>> > > >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>> > > >  </TASK>
>> > >
>> > > This one seems to be related to "mm/list_lru: split the lock to
>> > > per-cgroup scope".
>> > >
>> > > Kairui, can you please take a look? Thanks.
>> >
>> > Thanks for pinging, yes that's a new sanity check added by me.
>> >
>> > Which is supposed to mean, a list_lru is being reparented while the
>> > memcg it belongs to isn't dying.
>> >
>> > More concretely, list_lru is marked dead by memcg_offline_kmem ->
>> > memcg_reparent_list_lrus, if the function is called for one memcg, but
>> > now the memcg is not dying, this WARN triggers. I'm not sure how this
>> > is caused. One possibility is if alloc_shrinker_info() in
>> > mem_cgroup_css_online failed, then memcg_offline_kmem is called early?
>> > Doesn't seem to fit this case though.. Or maybe just sync issues with
>> > the memcg dying flag so the user saw the list_lru dying before seeing
>> > memcg dying? The object might be leaked to the parent cgroup, seems
>> > not too terrible though.
>> >
>> > I'm not sure how to reproduce this. I will keep looking.
>>
>> Managed to boot the image and using the kernel config provided by bot,
>> so far local tests didn't trigger any issue. Is there any way I can
>> reproduce what the bot actually did?
>
>If syzbot doesn't have a repro, it might not be productive for you to
>try to find one. Personally, I would analyze stacktraces and double
>check the code, and move on if I can't find something obviously wrong.
>
>> Or provide some patch for the bot
>> to test?
>
>syzbot only can try patches after it finds a repro. So in this case,
>no, it can't try your patches.
>
>Hope the above clarifies things for you.

Chiming in here as LKFT seems to be able to hit a nearby warning on
boot.

The link below contains the full log as well as additional information
on the run.

https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.13-rc2-232-g4800575d8c0b/testrun/26323524/suite/log-parser-test/test/exception-warning-cpu-pid-at-mmlist_lruc-list_lru_del/details/

-- 
Thanks,
Sasha


* Re: [syzbot] [mm?] WARNING in lock_list_lru_of_memcg
  2024-12-16 18:39         ` Sasha Levin
@ 2024-12-17 18:19           ` Kairui Song
  2024-12-18 19:08             ` Kairui Song
  0 siblings, 1 reply; 20+ messages in thread
From: Kairui Song @ 2024-12-17 18:19 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Yu Zhao, syzkaller-bugs, syzbot, akpm, linux-kernel, linux-mm

Thanks! Looking


Sasha Levin <sashal@kernel.org> wrote on Tue, Dec 17, 2024 at 02:39:
>
> On Sun, Dec 15, 2024 at 07:45:38PM -0700, Yu Zhao wrote:
> >Hi Kairui,
> >
> >On Sun, Dec 15, 2024 at 10:45 AM Kairui Song <ryncsn@gmail.com> wrote:
> >>
> >> On Sun, Dec 15, 2024 at 3:43 AM Kairui Song <ryncsn@gmail.com> wrote:
> >> >
> >> > On Sat, Dec 14, 2024 at 2:06 PM Yu Zhao <yuzhao@google.com> wrote:
> >> > >
> >> > > On Fri, Dec 13, 2024 at 8:56 PM syzbot
> >> > > <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
> >> > > >
> >> > > > Hello,
> >> > > >
> >> > > > syzbot found the following issue on:
> >> > > >
> >> > > > HEAD commit:    7cb1b4663150 Merge tag 'locking_urgent_for_v6.13_rc3' of g..
> >> > > > git tree:       upstream
> >> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=16e96b30580000
> >> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=fee25f93665c89ac
> >> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> >> > > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >> > > >
> >> > > > Unfortunately, I don't have any reproducer for this issue yet.
> >> > > >
> >> > > > Downloadable assets:
> >> > > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-7cb1b466.raw.xz
> >> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/13e083329dab/vmlinux-7cb1b466.xz
> >> > > > kernel image: https://storage.googleapis.com/syzbot-assets/fe3847d08513/bzImage-7cb1b466.xz
> >> > > >
> >> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >> > > > Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> >> > > >
> >> > > > ------------[ cut here ]------------
> >> > > > WARNING: CPU: 0 PID: 80 at mm/list_lru.c:97 lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
> >> > > > Modules linked in:
> >> > > > CPU: 0 UID: 0 PID: 80 Comm: kswapd0 Not tainted 6.13.0-rc2-syzkaller-00018-g7cb1b4663150 #0
> >> > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> >> > > > RIP: 0010:lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
> >> > > > Code: e9 22 fe ff ff e8 9b cc b6 ff 4c 8b 7c 24 10 45 84 f6 0f 84 40 ff ff ff e9 37 01 00 00 e8 83 cc b6 ff eb 05 e8 7c cc b6 ff 90 <0f> 0b 90 eb 97 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 7a fd ff ff 48
> >> > > > RSP: 0018:ffffc9000105e798 EFLAGS: 00010093
> >> > > > RAX: ffffffff81e891c4 RBX: 0000000000000000 RCX: ffff88801f53a440
> >> > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> >> > > > RBP: ffff888042e70054 R08: ffffffff81e89156 R09: 1ffffffff2032cae
> >> > > > R10: dffffc0000000000 R11: fffffbfff2032caf R12: ffffffff81e88e5e
> >> > > > R13: ffffffff9a3feb20 R14: 0000000000000000 R15: ffff888042e70000
> >> > > > FS:  0000000000000000(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> >> > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> > > > CR2: 0000000020161000 CR3: 0000000032d12000 CR4: 0000000000352ef0
> >> > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >> > > > Call Trace:
> >> > > >  <TASK>
> >> > > >  list_lru_add+0x59/0x270 mm/list_lru.c:164
> >> > > >  list_lru_add_obj+0x17b/0x250 mm/list_lru.c:187
> >> > > >  workingset_update_node+0x1af/0x230 mm/workingset.c:634
> >> > > >  xas_update lib/xarray.c:355 [inline]
> >> > > >  update_node lib/xarray.c:758 [inline]
> >> > > >  xas_store+0xb8f/0x1890 lib/xarray.c:845
> >> > > >  page_cache_delete mm/filemap.c:149 [inline]
> >> > > >  __filemap_remove_folio+0x4e9/0x670 mm/filemap.c:232
> >> > > >  __remove_mapping+0x86f/0xad0 mm/vmscan.c:791
> >> > > >  shrink_folio_list+0x30a6/0x5ca0 mm/vmscan.c:1467
> >> > > >  evict_folios+0x3c86/0x5800 mm/vmscan.c:4593
> >> > > >  try_to_shrink_lruvec+0x9a6/0xc70 mm/vmscan.c:4789
> >> > > >  shrink_one+0x3b9/0x850 mm/vmscan.c:4834
> >> > > >  shrink_many mm/vmscan.c:4897 [inline]
> >> > > >  lru_gen_shrink_node mm/vmscan.c:4975 [inline]
> >> > > >  shrink_node+0x37c5/0x3e50 mm/vmscan.c:5956
> >> > > >  kswapd_shrink_node mm/vmscan.c:6785 [inline]
> >> > > >  balance_pgdat mm/vmscan.c:6977 [inline]
> >> > > >  kswapd+0x1ca9/0x36f0 mm/vmscan.c:7246
> >> > > >  kthread+0x2f0/0x390 kernel/kthread.c:389
> >> > > >  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
> >> > > >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> >> > > >  </TASK>
> >> > >
> >> > > This one seems to be related to "mm/list_lru: split the lock to
> >> > > per-cgroup scope".
> >> > >
> >> > > Kairui, can you please take a look? Thanks.
> >> >
> >> > Thanks for pinging, yes that's a new sanity check added by me.
> >> >
> >> > Which is supposed to mean, a list_lru is being reparented while the
> >> > memcg it belongs to isn't dying.
> >> >
> >> > More concretely, list_lru is marked dead by memcg_offline_kmem ->
> >> > memcg_reparent_list_lrus, if the function is called for one memcg, but
> >> > now the memcg is not dying, this WARN triggers. I'm not sure how this
> >> > is caused. One possibility is if alloc_shrinker_info() in
> >> > mem_cgroup_css_online failed, then memcg_offline_kmem is called early?
> >> > Doesn't seem to fit this case though.. Or maybe just sync issues with
> >> > the memcg dying flag so the user saw the list_lru dying before seeing
> >> > memcg dying? The object might be leaked to the parent cgroup, seems
> >> > not too terrible though.
> >> >
> >> > I'm not sure how to reproduce this. I will keep looking.
> >>
> >> Managed to boot the image and using the kernel config provided by bot,
> >> so far local tests didn't trigger any issue. Is there any way I can
> >> reproduce what the bot actually did?
> >
> >If syzbot doesn't have a repro, it might not be productive for you to
> >try to find one. Personally, I would analyze stacktraces and double
> >check the code, and move on if I can't find something obviously wrong.
> >
> >> Or provide some patch for the bot
> >> to test?
> >
> >syzbot only can try patches after it finds a repro. So in this case,
> >no, it can't try your patches.
> >
> >Hope the above clarifies things for you.
>
> Chiming in here as LKFT seems to be able to hit a nearby warning on
> boot.
>
> The link below contains the full log as well as additional information
> on the run.
>
> https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.13-rc2-232-g4800575d8c0b/testrun/26323524/suite/log-parser-test/test/exception-warning-cpu-pid-at-mmlist_lruc-list_lru_del/details/
>

Thanks for the info, I'm trying to reproduce and checking the code.

Similar WARN_ONs existed some years ago and were removed by commit
2788cf0c401c, which allowed nr_items to become a wrong value; as that
commit message mentioned, that should not be a problem. I added them
back because the new lock_list_lru_of_memcg should ensure a stable
list_lru, so they might help catch wrong usage. There could be some
corner cases or synchronization issues that these sanity checks do not
account for; I'm looking into it. A bold fix is to just remove these
WARN_ONs, as such wrong values might not be harmful. I'll do more
checks and tests locally and report back.


> --
> Thanks,
> Sasha


* Re: [syzbot] [mm?] WARNING in lock_list_lru_of_memcg
  2024-12-17 18:19           ` Kairui Song
@ 2024-12-18 19:08             ` Kairui Song
  0 siblings, 0 replies; 20+ messages in thread
From: Kairui Song @ 2024-12-18 19:08 UTC (permalink / raw)
  To: syzkaller-bugs; +Cc: Sasha Levin, Yu Zhao, syzbot, akpm, linux-kernel, linux-mm

On Wed, Dec 18, 2024 at 2:19 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> Thanks! Looking
>
>
> Sasha Levin <sashal@kernel.org> wrote on Tue, Dec 17, 2024 at 02:39:
> >
> > On Sun, Dec 15, 2024 at 07:45:38PM -0700, Yu Zhao wrote:
> > >Hi Kairui,
> > >
> > >On Sun, Dec 15, 2024 at 10:45 AM Kairui Song <ryncsn@gmail.com> wrote:
> > >>
> > >> On Sun, Dec 15, 2024 at 3:43 AM Kairui Song <ryncsn@gmail.com> wrote:
> > >> >
> > >> > On Sat, Dec 14, 2024 at 2:06 PM Yu Zhao <yuzhao@google.com> wrote:
> > >> > >
> > >> > > On Fri, Dec 13, 2024 at 8:56 PM syzbot
> > >> > > <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
> > >> > > >
> > >> > > > Hello,
> > >> > > >
> > >> > > > syzbot found the following issue on:
> > >> > > >
> > >> > > > HEAD commit:    7cb1b4663150 Merge tag 'locking_urgent_for_v6.13_rc3' of g..
> > >> > > > git tree:       upstream
> > >> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=16e96b30580000
> > >> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=fee25f93665c89ac
> > >> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> > >> > > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > >> > > >
> > >> > > > Unfortunately, I don't have any reproducer for this issue yet.
> > >> > > >
> > >> > > > Downloadable assets:
> > >> > > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-7cb1b466.raw.xz
> > >> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/13e083329dab/vmlinux-7cb1b466.xz
> > >> > > > kernel image: https://storage.googleapis.com/syzbot-assets/fe3847d08513/bzImage-7cb1b466.xz
> > >> > > >
> > >> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > >> > > > Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> > >> > > >
> > >> > > > ------------[ cut here ]------------
> > >> > > > WARNING: CPU: 0 PID: 80 at mm/list_lru.c:97 lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
> > >> > > > Modules linked in:
> > >> > > > CPU: 0 UID: 0 PID: 80 Comm: kswapd0 Not tainted 6.13.0-rc2-syzkaller-00018-g7cb1b4663150 #0
> > >> > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> > >> > > > RIP: 0010:lock_list_lru_of_memcg+0x395/0x4e0 mm/list_lru.c:97
> > >> > > > Code: e9 22 fe ff ff e8 9b cc b6 ff 4c 8b 7c 24 10 45 84 f6 0f 84 40 ff ff ff e9 37 01 00 00 e8 83 cc b6 ff eb 05 e8 7c cc b6 ff 90 <0f> 0b 90 eb 97 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 7a fd ff ff 48
> > >> > > > RSP: 0018:ffffc9000105e798 EFLAGS: 00010093
> > >> > > > RAX: ffffffff81e891c4 RBX: 0000000000000000 RCX: ffff88801f53a440
> > >> > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > >> > > > RBP: ffff888042e70054 R08: ffffffff81e89156 R09: 1ffffffff2032cae
> > >> > > > R10: dffffc0000000000 R11: fffffbfff2032caf R12: ffffffff81e88e5e
> > >> > > > R13: ffffffff9a3feb20 R14: 0000000000000000 R15: ffff888042e70000
> > >> > > > FS:  0000000000000000(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> > >> > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> > > > CR2: 0000000020161000 CR3: 0000000032d12000 CR4: 0000000000352ef0
> > >> > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >> > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >> > > > Call Trace:
> > >> > > >  <TASK>
> > >> > > >  list_lru_add+0x59/0x270 mm/list_lru.c:164
> > >> > > >  list_lru_add_obj+0x17b/0x250 mm/list_lru.c:187
> > >> > > >  workingset_update_node+0x1af/0x230 mm/workingset.c:634
> > >> > > >  xas_update lib/xarray.c:355 [inline]
> > >> > > >  update_node lib/xarray.c:758 [inline]
> > >> > > >  xas_store+0xb8f/0x1890 lib/xarray.c:845
> > >> > > >  page_cache_delete mm/filemap.c:149 [inline]
> > >> > > >  __filemap_remove_folio+0x4e9/0x670 mm/filemap.c:232
> > >> > > >  __remove_mapping+0x86f/0xad0 mm/vmscan.c:791
> > >> > > >  shrink_folio_list+0x30a6/0x5ca0 mm/vmscan.c:1467
> > >> > > >  evict_folios+0x3c86/0x5800 mm/vmscan.c:4593
> > >> > > >  try_to_shrink_lruvec+0x9a6/0xc70 mm/vmscan.c:4789
> > >> > > >  shrink_one+0x3b9/0x850 mm/vmscan.c:4834
> > >> > > >  shrink_many mm/vmscan.c:4897 [inline]
> > >> > > >  lru_gen_shrink_node mm/vmscan.c:4975 [inline]
> > >> > > >  shrink_node+0x37c5/0x3e50 mm/vmscan.c:5956
> > >> > > >  kswapd_shrink_node mm/vmscan.c:6785 [inline]
> > >> > > >  balance_pgdat mm/vmscan.c:6977 [inline]
> > >> > > >  kswapd+0x1ca9/0x36f0 mm/vmscan.c:7246
> > >> > > >  kthread+0x2f0/0x390 kernel/kthread.c:389
> > >> > > >  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
> > >> > > >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> > >> > > >  </TASK>
> > >> > >
> > >> > > This one seems to be related to "mm/list_lru: split the lock to
> > >> > > per-cgroup scope".
> > >> > >
> > >> > > Kairui, can you please take a look? Thanks.
> > >> >
> > >> > Thanks for pinging, yes that's a new sanity check added by me.
> > >> >
> > >> > Which is supposed to mean, a list_lru is being reparented while the
> > >> > memcg it belongs to isn't dying.
> > >> >
> > >> > More concretely, list_lru is marked dead by memcg_offline_kmem ->
> > >> > memcg_reparent_list_lrus, if the function is called for one memcg, but
> > >> > now the memcg is not dying, this WARN triggers. I'm not sure how this
> > >> > is caused. One possibility is if alloc_shrinker_info() in
> > >> > mem_cgroup_css_online failed, then memcg_offline_kmem is called early?
> > >> > Doesn't seem to fit this case though.. Or maybe just sync issues with
> > >> > the memcg dying flag so the user saw the list_lru dying before seeing
> > >> > memcg dying? The object might be leaked to the parent cgroup, seems
> > >> > not too terrible though.
> > >> >
> > >> > I'm not sure how to reproduce this. I will keep looking.
> > >>
> > >> Managed to boot the image and using the kernel config provided by bot,
> > >> so far local tests didn't trigger any issue. Is there any way I can
> > >> reproduce what the bot actually did?
> > >
> > >If syzbot doesn't have a repro, it might not be productive for you to
> > >try to find one. Personally, I would analyze stacktraces and double
> > >check the code, and move on if I can't find something obviously wrong.
> > >
> > >> Or provide some patch for the bot
> > >> to test?
> > >
> > >syzbot only can try patches after it finds a repro. So in this case,
> > >no, it can't try your patches.
> > >
> > >Hope the above clarifies things for you.
> >
> > Chiming in here as LKFT seems to be able to hit a nearby warning on
> > boot.
> >
> > The link below contains the full log as well as additional information
> > on the run.
> >
> > https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.13-rc2-232-g4800575d8c0b/testrun/26323524/suite/log-parser-test/test/exception-warning-cpu-pid-at-mmlist_lruc-list_lru_del/details/
> >
>

After some investigation, this mm/list_lru.c:80 warning should be fixed by:

diff --git a/mm/list_lru.c b/mm/list_lru.c
index f93ada6a207b..7d69434c70e0 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -77,7 +77,6 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
                        spin_lock(&l->lock);
                nr_items = READ_ONCE(l->nr_items);
                if (likely(nr_items != LONG_MIN)) {
-                       WARN_ON(nr_items < 0);
                        rcu_read_unlock();
                        return l;
                }
@@ -450,6 +449,7 @@ static void memcg_reparent_list_lru_one(struct list_lru *lru, int nid,

        list_splice_init(&src->list, &dst->list);
        if (src->nr_items) {
+               WARN_ON(src->nr_items < 0);
                dst->nr_items += src->nr_items;
                set_shrinker_bit(dst_memcg, nid, lru_shrinker_id(lru));
        }

This should be caused by a short time window between `mlru =
xas_store(&xas, NULL);` and `memcg_reparent_list_lru_one` in
memcg_reparent_list_lrus. If any user deletes an item from the list_lru
during this window, the parent's nr_items may go negative and trigger
this warning, because the child list_lru still holds the actual item
but it is the parent's counter that gets updated. The counter will be
synced by the reparent, so it is not a problem.
We can keep this WARN_ON and just move it to the reparenting step;
this removes the false warning while still catching misuse by users.
I'm not 100% sure this is exactly the LKFT warning, but I will send
this out after double-checking, as it does need to be fixed.


And the mm/list_lru.c:97 warning seems to be a different problem. I suspect
memcg_list_lru_alloc wasn't called for shadow_nodes but kswapd started
early. If this is the case, it might not be a new issue, just one
exposed by this new sanity check. It can be bypassed with:

diff --git a/mm/list_lru.c b/mm/list_lru.c
index f93ada6a207b..5f124a661ee8 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -81,6 +81,7 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
                        rcu_read_unlock();
                        return l;
                }
+               VM_WARN_ON(!css_is_dying(&memcg->css));
                if (irq)
                        spin_unlock_irq(&l->lock);
                else
@@ -94,7 +95,6 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
                rcu_read_unlock();
                return NULL;
        }
-       VM_WARN_ON(!css_is_dying(&memcg->css));
        memcg = parent_mem_cgroup(memcg);
        goto again;
 }

But I'm not sure whether it indicates some potential (and previously
existing) list_lru leak; keeping this sanity check at its current place
could be helpful for catching a missing memcg_list_lru_alloc call. I will
try to send a proper fix after checking the root cause and reproducing
it locally.


* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
  2024-12-14  3:56 [syzbot] [mm?] WARNING in lock_list_lru_of_memcg syzbot
  2024-12-14  6:05 ` Yu Zhao
@ 2025-02-14 18:11 ` syzbot
  2025-02-14 23:23   ` Andrew Morton
  2025-02-18 17:09 ` [syzbot] " syzbot
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: syzbot @ 2025-02-14 18:11 UTC (permalink / raw)
  To: akpm, chengming.zhou, hannes, kasong, kent.overstreet,
	linux-bcachefs, linux-kernel, linux-mm, mhocko, muchun.song,
	roman.gushchin, ryncsn, sashal, shakeel.butt, syzkaller-bugs,
	willy, yuzhao, zhengqi.arch

syzbot has found a reproducer for the following issue on:

HEAD commit:    128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
Modules linked in:
CPU: 0 UID: 0 PID: 5459 Comm: syz-executor Not tainted 6.14.0-rc2-syzkaller-00185-g128c8f96eb86 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
Code: e9 19 fe ff ff e8 72 f2 b5 ff 4c 8b 7c 24 08 45 84 f6 0f 84 40 ff ff ff e9 22 01 00 00 e8 5a f2 b5 ff eb 05 e8 53 f2 b5 ff 90 <0f> 0b 90 eb 97 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 71 fd ff ff 48
RSP: 0018:ffffc9000d70f3a0 EFLAGS: 00010293
RAX: ffffffff820bc50d RBX: 0000000000000000 RCX: ffff8880382d4880
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff8880351ac054 R08: ffffffff820bc49f R09: 1ffffffff2079b8e
R10: dffffc0000000000 R11: fffffbfff2079b8f R12: ffffffff820bc19e
R13: ffff88801ee9a798 R14: 0000000000000000 R15: ffff8880351ac000
FS:  000055557d70b500(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff6826de40 CR3: 000000005680c000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 list_lru_del+0x58/0x1f0 mm/list_lru.c:202
 list_lru_del_obj+0x17b/0x250 mm/list_lru.c:223
 d_lru_del fs/dcache.c:481 [inline]
 to_shrink_list+0x136/0x340 fs/dcache.c:904
 select_collect+0xce/0x1b0 fs/dcache.c:1472
 d_walk+0x1f5/0x750 fs/dcache.c:1295
 shrink_dcache_parent+0x144/0x3b0 fs/dcache.c:1527
 d_invalidate+0x11c/0x2d0 fs/dcache.c:1632
 proc_invalidate_siblings_dcache+0x3fb/0x6e0 fs/proc/inode.c:142
 release_task+0x168e/0x1830 kernel/exit.c:279
 wait_task_zombie kernel/exit.c:1249 [inline]
 wait_consider_task+0x1a14/0x2e60 kernel/exit.c:1476
 do_wait_thread kernel/exit.c:1539 [inline]
 __do_wait+0x1b0/0x850 kernel/exit.c:1657
 do_wait+0x1e9/0x550 kernel/exit.c:1691
 kernel_wait4+0x2a7/0x3e0 kernel/exit.c:1850
 __do_sys_wait4 kernel/exit.c:1878 [inline]
 __se_sys_wait4 kernel/exit.c:1874 [inline]
 __x64_sys_wait4+0x134/0x1e0 kernel/exit.c:1874
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f93f3983057
Code: 89 7c 24 10 48 89 4c 24 18 e8 45 1b 03 00 4c 8b 54 24 18 8b 54 24 14 41 89 c0 48 8b 74 24 08 8b 7c 24 10 b8 3d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 44 24 10 e8 95 1b 03 00 8b 44
RSP: 002b:00007fff6826e9b0 EFLAGS: 00000293 ORIG_RAX: 000000000000003d
RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007f93f3983057
RDX: 0000000040000001 RSI: 00007fff6826ea1c RDI: 00000000ffffffff
RBP: 00007fff6826ea1c R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000001388
R13: 00000000000927c0 R14: 000000000002f011 R15: 00007fff6826ea70
 </TASK>


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.


* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
  2025-02-14 18:11 ` [syzbot] [mm?] [bcachefs?] " syzbot
@ 2025-02-14 23:23   ` Andrew Morton
  2025-02-16 16:13     ` Kairui Song
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2025-02-14 23:23 UTC (permalink / raw)
  To: syzbot
  Cc: chengming.zhou, hannes, kasong, kent.overstreet, linux-bcachefs,
	linux-kernel, linux-mm, mhocko, muchun.song, roman.gushchin,
	ryncsn, sashal, shakeel.butt, syzkaller-bugs, willy, yuzhao,
	zhengqi.arch

On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:

> syzbot has found a reproducer for the following issue on:

Thanks.  I doubt if bcachefs is implicated in this?

> HEAD commit:    128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
> dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
> 
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
> mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96

	VM_WARN_ON(!css_is_dying(&memcg->css));

>
> ...
>


* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
  2025-02-14 23:23   ` Andrew Morton
@ 2025-02-16 16:13     ` Kairui Song
  2025-02-17 17:12       ` Kairui Song
  0 siblings, 1 reply; 20+ messages in thread
From: Kairui Song @ 2025-02-16 16:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: syzbot, chengming.zhou, hannes, kent.overstreet, linux-bcachefs,
	linux-kernel, linux-mm, mhocko, muchun.song, roman.gushchin,
	sashal, shakeel.butt, syzkaller-bugs, willy, yuzhao, zhengqi.arch

On Sat, Feb 15, 2025 at 7:24 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
>
> > syzbot has found a reproducer for the following issue on:
>
> Thanks.  I doubt if bcachefs is implicated in this?
>
> > HEAD commit:    128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
> > dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
> >
> > Downloadable assets:
> > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
> > mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
>
>         VM_WARN_ON(!css_is_dying(&memcg->css));

I'm checking this. The last time this was triggered, it was caused by
a list_lru user that did not initialize the memcg list_lru properly before
list_lru reclaim started, and it was fixed by:
https://lore.kernel.org/all/20241222122936.67501-1-ryncsn@gmail.com/T/

This shouldn't be a big issue; maybe there are leaks that will be
fixed upon reparenting, and this newly added sanity check might be too
strict. I'm not 100% sure though.

Unfortunately I couldn't reproduce the issue locally with the
reproducer yet. I will keep the test running and see if it can hit this
WARN_ON.


* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
  2025-02-16 16:13     ` Kairui Song
@ 2025-02-17 17:12       ` Kairui Song
  2025-02-17 18:09         ` Alan Huang
  0 siblings, 1 reply; 20+ messages in thread
From: Kairui Song @ 2025-02-17 17:12 UTC (permalink / raw)
  To: Andrew Morton, kent.overstreet
  Cc: syzbot, chengming.zhou, hannes, linux-bcachefs, linux-kernel,
	linux-mm, mhocko, muchun.song, roman.gushchin, sashal,
	shakeel.butt, syzkaller-bugs, willy, yuzhao, zhengqi.arch

On Mon, Feb 17, 2025 at 12:13 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Sat, Feb 15, 2025 at 7:24 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
> >
> > > syzbot has found a reproducer for the following issue on:
> >
> > Thanks.  I doubt if bcachefs is implicated in this?
> >
> > > HEAD commit:    128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
> > > git tree:       upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
> > >
> > > Downloadable assets:
> > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
> > > mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> > >
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
> >
> >         VM_WARN_ON(!css_is_dying(&memcg->css));
>
> I'm checking this. The last time this was triggered, it was caused by
> a list_lru user that did not initialize the memcg list_lru properly before
> list_lru reclaim started, and it was fixed by:
> https://lore.kernel.org/all/20241222122936.67501-1-ryncsn@gmail.com/T/
>
> This shouldn't be a big issue; maybe there are leaks that will be
> fixed upon reparenting, and this newly added sanity check might be too
> strict. I'm not 100% sure though.
>
> Unfortunately I couldn't reproduce the issue locally with the
> reproducer yet. I will keep the test running and see if it can hit this
> WARN_ON.

So far I am still unable to trigger this VM_WARN_ON using the
reproducer, and I'm seeing many other random crashes.

But after I changed the .config a bit to add more debug options
(SLAB_FREELIST_HARDENED, DEBUG_PAGEALLOC), the following crash showed up,
and it is triggered immediately after I start the test:

[ T1242] BUG: unable to handle page fault for address: ffff888054c60000
[ T1242] #PF: supervisor read access in kernel mode
[ T1242] #PF: error_code(0x0000) - not-present page
[ T1242] PGD 19e01067 P4D 19e01067 PUD 19e04067 PMD 7fc5c067 PTE
800fffffab39f060
[ T1242] Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
[ T1242] CPU: 1 UID: 0 PID: 1242 Comm: kworker/1:1H Not tainted
6.14.0-rc2-00185-g128c8f96eb86 #2
[ T1242] Hardware name: Red Hat KVM/RHEL-AV, BIOS
1.16.0-4.module+el8.8.0+664+0a3d6c83 04/01/2014
[ T1242] Workqueue: bcachefs_btree_read_complete btree_node_read_work
[ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
[ T6058] bcachefs (loop2): empty btree root xattrs
[ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
[ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
[ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
[ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
[ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
[ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
[ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
[ T1242] FS:  0000000000000000(0000) GS:ffff88807ea00000(0000)
knlGS:0000000000000000
[ T1242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
[ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ T1242] Call Trace:
[ T1242]  <TASK>
[ T1242]  bch2_btree_node_read_done+0x1d20/0x53a0
[ T1242]  btree_node_read_work+0x54d/0xdc0
[ T1242]  process_scheduled_works+0xaf8/0x17f0
[ T1242]  worker_thread+0x89d/0xd60
[ T1242]  kthread+0x722/0x890
[ T1242]  ret_from_fork+0x4e/0x80
[ T1242]  ret_from_fork_asm+0x1a/0x30
[ T1242]  </TASK>
[ T1242] Modules linked in:
[ T1242] ---[ end trace 0000000000000000 ]---
[ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
[ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
[ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
[ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
[ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
[ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
[ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
[ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
[ T1242] FS:  0000000000000000(0000) GS:ffff88807ea00000(0000)
knlGS:0000000000000000
[ T1242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
[ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ T1242] Kernel panic - not syncing: Fatal exception
[ T1242] Kernel Offset: disabled
[ T1242] Rebooting in 86400 seconds..

It's caused by the memmove_u64s_down in validate_bset_keys of
fs/bcachefs/btree_io.c:
-> memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);

The bkey_p_next(k) is RSI: ffff888054c60000, and reading from it is an
out-of-bounds access.
(u64 *) vstruct_end(i) - (u64 *) k is RCX: 0000000000006c31; if added
to RDI, this should cause an out-of-bounds write as well.

Does this indicate an out-of-bounds memory modification? Maybe it
corrupted other subsystems. The slight change to .config changed the
memory layout, so it now faults; previously it may have just gone on
silently.
I don't know much about bcachefs, so I would be grateful if the bcachefs
people could take a look.

* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
  2025-02-17 17:12       ` Kairui Song
@ 2025-02-17 18:09         ` Alan Huang
  2025-02-18 11:40           ` Kairui Song
  0 siblings, 1 reply; 20+ messages in thread
From: Alan Huang @ 2025-02-17 18:09 UTC (permalink / raw)
  To: Kairui Song
  Cc: Andrew Morton, kent.overstreet, syzbot, chengming.zhou, hannes,
	linux-bcachefs, linux-kernel, linux-mm, mhocko, muchun.song,
	roman.gushchin, sashal, shakeel.butt, syzkaller-bugs, willy,
	yuzhao, zhengqi.arch

On Feb 18, 2025, at 01:12, Kairui Song <ryncsn@gmail.com> wrote:
> 
> On Mon, Feb 17, 2025 at 12:13 AM Kairui Song <ryncsn@gmail.com> wrote:
>> 
>> On Sat, Feb 15, 2025 at 7:24 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>>> 
>>> On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
>>> 
>>>> syzbot has found a reproducer for the following issue on:
>>> 
>>> Thanks.  I doubt if bcachefs is implicated in this?
>>> 
>>>> HEAD commit:    128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
>>>> git tree:       upstream
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
>>>> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
>>>> 
>>>> Downloadable assets:
>>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
>>>> kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
>>>> mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
>>>> 
>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>> Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
>>>> 
>>>> ------------[ cut here ]------------
>>>> WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
>>> 
>>>        VM_WARN_ON(!css_is_dying(&memcg->css));
>> 
>> I'm checking this, when last time this was triggered, it was caused by
>> a list_lru user did not initialize the memcg list_lru properly before
>> list_lru reclaim started, and fixed by:
>> https://lore.kernel.org/all/20241222122936.67501-1-ryncsn@gmail.com/T/
>> 
>> This shouldn't be a big issue, maybe there are leaks that will be
>> fixed upon reparenting, and this new added sanity check might be too
>> lenient, I'm not 100% sure though.
>> 
>> Unfortunately I couldn't reproduce the issue locally with the
>> reproducer yet. will keep the test running and see if it can hit this
>> WARN_ON.
> 
> So far I am still unable to trigger this VM_WARN_ON using the
> reproducer, and I'm seeing many other random crashes.
> 
> But after I changed the .config a bit adding more debug configs
> (SLAB_FREELIST_HARDENED, DEBUG_PAGEALLOC), following crash showed up
> and will be triggered immediately after I start the test:
> 
> [ T1242] BUG: unable to handle page fault for address: ffff888054c60000
> [ T1242] #PF: supervisor read access in kernel mode
> [ T1242] #PF: error_code(0x0000) - not-present page
> [ T1242] PGD 19e01067 P4D 19e01067 PUD 19e04067 PMD 7fc5c067 PTE
> 800fffffab39f060
> [ T1242] Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
> [ T1242] CPU: 1 UID: 0 PID: 1242 Comm: kworker/1:1H Not tainted
> 6.14.0-rc2-00185-g128c8f96eb86 #2
> [ T1242] Hardware name: Red Hat KVM/RHEL-AV, BIOS
> 1.16.0-4.module+el8.8.0+664+0a3d6c83 04/01/2014
> [ T1242] Workqueue: bcachefs_btree_read_complete btree_node_read_work
> [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
> [ T6058] bcachefs (loop2): empty btree root xattrs
> [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
> 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
> ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
> [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
> [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
> [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
> [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
> [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
> [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
> [ T1242] FS:  0000000000000000(0000) GS:ffff88807ea00000(0000)
> knlGS:0000000000000000
> [ T1242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
> [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ T1242] Call Trace:
> [ T1242]  <TASK>
> [ T1242]  bch2_btree_node_read_done+0x1d20/0x53a0
> [ T1242]  btree_node_read_work+0x54d/0xdc0
> [ T1242]  process_scheduled_works+0xaf8/0x17f0
> [ T1242]  worker_thread+0x89d/0xd60
> [ T1242]  kthread+0x722/0x890
> [ T1242]  ret_from_fork+0x4e/0x80
> [ T1242]  ret_from_fork_asm+0x1a/0x30
> [ T1242]  </TASK>
> [ T1242] Modules linked in:
> [ T1242] ---[ end trace 0000000000000000 ]---
> [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
> [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
> 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
> ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
> [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
> [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
> [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
> [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
> [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
> [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
> [ T1242] FS:  0000000000000000(0000) GS:ffff88807ea00000(0000)
> knlGS:0000000000000000
> [ T1242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
> [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ T1242] Kernel panic - not syncing: Fatal exception
> [ T1242] Kernel Offset: disabled
> [ T1242] Rebooting in 86400 seconds..
> 
> It's caused by the memmove_u64s_down in validate_bset_keys of
> fs/bcachefs/btree_io.c:
> -> memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);


Might need this.

diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
index e71b278672b6..fb53174cb735 100644
--- a/fs/bcachefs/btree_io.c
+++ b/fs/bcachefs/btree_io.c
@@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
                }
 got_good_key:
                le16_add_cpu(&i->u64s, -next_good_key);
-               memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
+               memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) bkey_p_next(k));
                set_btree_node_need_rewrite(b);
        }
 fsck_err:

> 
> The bkey_p_next(k) is RSI: ffff888054c60000 and it's causing an out of
> border access.
> (u64 *) vstruct_end(i) - (u64 *) k is RCX: 0000000000006c31, if added
> to RDI this should cause an out of border write as well.
> 
> This seems to indicate there is an out of border memory modification?
> And maybe it corrupted other subsystems? The slight change to .config
> changed the layout so it's causing a fault, maybe previously this just
> went on silently.
> I don't know much about bcachefs, will be grateful if bcachefs people
> could help have a look.
> 


* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
  2025-02-17 18:09         ` Alan Huang
@ 2025-02-18 11:40           ` Kairui Song
  2025-02-18 12:16             ` Alan Huang
  0 siblings, 1 reply; 20+ messages in thread
From: Kairui Song @ 2025-02-18 11:40 UTC (permalink / raw)
  To: Alan Huang
  Cc: Andrew Morton, kent.overstreet, syzbot, chengming.zhou, hannes,
	linux-bcachefs, linux-kernel, linux-mm, mhocko, muchun.song,
	roman.gushchin, sashal, shakeel.butt, syzkaller-bugs, willy,
	yuzhao, zhengqi.arch

On Tue, Feb 18, 2025 at 2:09 AM Alan Huang <mmpgouride@gmail.com> wrote:
>
> On Feb 18, 2025, at 01:12, Kairui Song <ryncsn@gmail.com> wrote:
> >
> > On Mon, Feb 17, 2025 at 12:13 AM Kairui Song <ryncsn@gmail.com> wrote:
> >>
> >> On Sat, Feb 15, 2025 at 7:24 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >>>
> >>> On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
> >>>
> >>>> syzbot has found a reproducer for the following issue on:
> >>>
> >>> Thanks.  I doubt if bcachefs is implicated in this?
> >>>
> >>>> HEAD commit:    128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
> >>>> git tree:       upstream
> >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
> >>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
> >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> >>>> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
> >>>>
> >>>> Downloadable assets:
> >>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
> >>>> vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
> >>>> kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
> >>>> mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
> >>>>
> >>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >>>> Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> >>>>
> >>>> ------------[ cut here ]------------
> >>>> WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
> >>>
> >>>        VM_WARN_ON(!css_is_dying(&memcg->css));
> >>
> >> I'm checking this, when last time this was triggered, it was caused by
> >> a list_lru user did not initialize the memcg list_lru properly before
> >> list_lru reclaim started, and fixed by:
> >> https://lore.kernel.org/all/20241222122936.67501-1-ryncsn@gmail.com/T/
> >>
> >> This shouldn't be a big issue, maybe there are leaks that will be
> >> fixed upon reparenting, and this new added sanity check might be too
> >> lenient, I'm not 100% sure though.
> >>
> >> Unfortunately I couldn't reproduce the issue locally with the
> >> reproducer yet. will keep the test running and see if it can hit this
> >> WARN_ON.
> >
> > So far I am still unable to trigger this VM_WARN_ON using the
> > reproducer, and I'm seeing many other random crashes.
> >
> > But after I changed the .config a bit adding more debug configs
> > (SLAB_FREELIST_HARDENED, DEBUG_PAGEALLOC), following crash showed up
> > and will be triggered immediately after I start the test:
> >
> > [ T1242] BUG: unable to handle page fault for address: ffff888054c60000
> > [ T1242] #PF: supervisor read access in kernel mode
> > [ T1242] #PF: error_code(0x0000) - not-present page
> > [ T1242] PGD 19e01067 P4D 19e01067 PUD 19e04067 PMD 7fc5c067 PTE
> > 800fffffab39f060
> > [ T1242] Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
> > [ T1242] CPU: 1 UID: 0 PID: 1242 Comm: kworker/1:1H Not tainted
> > 6.14.0-rc2-00185-g128c8f96eb86 #2
> > [ T1242] Hardware name: Red Hat KVM/RHEL-AV, BIOS
> > 1.16.0-4.module+el8.8.0+664+0a3d6c83 04/01/2014
> > [ T1242] Workqueue: bcachefs_btree_read_complete btree_node_read_work
> > [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
> > [ T6058] bcachefs (loop2): empty btree root xattrs
> > [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
> > 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
> > ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
> > [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
> > [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
> > [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
> > [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
> > [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
> > [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
> > [ T1242] FS:  0000000000000000(0000) GS:ffff88807ea00000(0000)
> > knlGS:0000000000000000
> > [ T1242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
> > [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ T1242] Call Trace:
> > [ T1242]  <TASK>
> > [ T1242]  bch2_btree_node_read_done+0x1d20/0x53a0
> > [ T1242]  btree_node_read_work+0x54d/0xdc0
> > [ T1242]  process_scheduled_works+0xaf8/0x17f0
> > [ T1242]  worker_thread+0x89d/0xd60
> > [ T1242]  kthread+0x722/0x890
> > [ T1242]  ret_from_fork+0x4e/0x80
> > [ T1242]  ret_from_fork_asm+0x1a/0x30
> > [ T1242]  </TASK>
> > [ T1242] Modules linked in:
> > [ T1242] ---[ end trace 0000000000000000 ]---
> > [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
> > [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
> > 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
> > ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
> > [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
> > [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
> > [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
> > [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
> > [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
> > [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
> > [ T1242] FS:  0000000000000000(0000) GS:ffff88807ea00000(0000)
> > knlGS:0000000000000000
> > [ T1242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
> > [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ T1242] Kernel panic - not syncing: Fatal exception
> > [ T1242] Kernel Offset: disabled
> > [ T1242] Rebooting in 86400 seconds..
> >
> > It's caused by the memmove_u64s_down in validate_bset_keys of
> > fs/bcachefs/btree_io.c:
> > -> memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
>
>
> Might need this.
>
> diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
> index e71b278672b6..fb53174cb735 100644
> --- a/fs/bcachefs/btree_io.c
> +++ b/fs/bcachefs/btree_io.c
> @@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
>                 }
>  got_good_key:
>                 le16_add_cpu(&i->u64s, -next_good_key);
> -               memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
> +               memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) bkey_p_next(k));
>                 set_btree_node_need_rewrite(b);
>         }
>  fsck_err:
>

Thanks, but this didn't fix everything. I think the problem is more
complex: syzbot seems to be trying to mount a damaged bcachefs image (on
purpose, I think), so vstruct_end(i) is already returning an address
that is out of bounds.

I retriggered it and printed some more debug info: i->_data is
ffff88806d5c00a0, i->u64s is 60928, and the faulting address is
ffff88806d600000.

* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
  2025-02-18 11:40           ` Kairui Song
@ 2025-02-18 12:16             ` Alan Huang
  2025-02-18 17:47               ` Kairui Song
  0 siblings, 1 reply; 20+ messages in thread
From: Alan Huang @ 2025-02-18 12:16 UTC (permalink / raw)
  To: Kairui Song
  Cc: Andrew Morton, kent.overstreet, syzbot, chengming.zhou, hannes,
	linux-bcachefs, linux-kernel, linux-mm, mhocko, muchun.song,
	roman.gushchin, sashal, shakeel.butt, syzkaller-bugs, willy,
	yuzhao, zhengqi.arch

On Feb 18, 2025, at 19:40, Kairui Song <ryncsn@gmail.com> wrote:
> 
> On Tue, Feb 18, 2025 at 2:09 AM Alan Huang <mmpgouride@gmail.com> wrote:
>> 
>> On Feb 18, 2025, at 01:12, Kairui Song <ryncsn@gmail.com> wrote:
>>> 
>>> On Mon, Feb 17, 2025 at 12:13 AM Kairui Song <ryncsn@gmail.com> wrote:
>>>> 
>>>> On Sat, Feb 15, 2025 at 7:24 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>>>>> 
>>>>> On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
>>>>> 
>>>>>> syzbot has found a reproducer for the following issue on:
>>>>> 
>>>>> Thanks.  I doubt if bcachefs is implicated in this?
>>>>> 
>>>>>> HEAD commit:    128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
>>>>>> git tree:       upstream
>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
>>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
>>>>>> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>>>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
>>>>>> 
>>>>>> Downloadable assets:
>>>>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
>>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
>>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
>>>>>> mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
>>>>>> 
>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>>> Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
>>>>>> 
>>>>>> ------------[ cut here ]------------
>>>>>> WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
>>>>> 
>>>>>       VM_WARN_ON(!css_is_dying(&memcg->css));
>>>> 
>>>> I'm checking this, when last time this was triggered, it was caused by
>>>> a list_lru user did not initialize the memcg list_lru properly before
>>>> list_lru reclaim started, and fixed by:
>>>> https://lore.kernel.org/all/20241222122936.67501-1-ryncsn@gmail.com/T/
>>>> 
>>>> This shouldn't be a big issue, maybe there are leaks that will be
>>>> fixed upon reparenting, and this new added sanity check might be too
>>>> lenient, I'm not 100% sure though.
>>>> 
>>>> Unfortunately I couldn't reproduce the issue locally with the
>>>> reproducer yet. will keep the test running and see if it can hit this
>>>> WARN_ON.
>>> 
>>> So far I am still unable to trigger this VM_WARN_ON using the
>>> reproducer, and I'm seeing many other random crashes.
>>> 
>>> But after I changed the .config a bit adding more debug configs
>>> (SLAB_FREELIST_HARDENED, DEBUG_PAGEALLOC), following crash showed up
>>> and will be triggered immediately after I start the test:
>>> 
>>> [ T1242] BUG: unable to handle page fault for address: ffff888054c60000
>>> [ T1242] #PF: supervisor read access in kernel mode
>>> [ T1242] #PF: error_code(0x0000) - not-present page
>>> [ T1242] PGD 19e01067 P4D 19e01067 PUD 19e04067 PMD 7fc5c067 PTE
>>> 800fffffab39f060
>>> [ T1242] Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
>>> [ T1242] CPU: 1 UID: 0 PID: 1242 Comm: kworker/1:1H Not tainted
>>> 6.14.0-rc2-00185-g128c8f96eb86 #2
>>> [ T1242] Hardware name: Red Hat KVM/RHEL-AV, BIOS
>>> 1.16.0-4.module+el8.8.0+664+0a3d6c83 04/01/2014
>>> [ T1242] Workqueue: bcachefs_btree_read_complete btree_node_read_work
>>> [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
>>> [ T6058] bcachefs (loop2): empty btree root xattrs
>>> [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
>>> 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
>>> ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
>>> [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
>>> [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
>>> [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
>>> [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
>>> [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
>>> [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
>>> [ T1242] FS:  0000000000000000(0000) GS:ffff88807ea00000(0000)
>>> knlGS:0000000000000000
>>> [ T1242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
>>> [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [ T1242] Call Trace:
>>> [ T1242]  <TASK>
>>> [ T1242]  bch2_btree_node_read_done+0x1d20/0x53a0
>>> [ T1242]  btree_node_read_work+0x54d/0xdc0
>>> [ T1242]  process_scheduled_works+0xaf8/0x17f0
>>> [ T1242]  worker_thread+0x89d/0xd60
>>> [ T1242]  kthread+0x722/0x890
>>> [ T1242]  ret_from_fork+0x4e/0x80
>>> [ T1242]  ret_from_fork_asm+0x1a/0x30
>>> [ T1242]  </TASK>
>>> [ T1242] Modules linked in:
>>> [ T1242] ---[ end trace 0000000000000000 ]---
>>> [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
>>> [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
>>> 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
>>> ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
>>> [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
>>> [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
>>> [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
>>> [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
>>> [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
>>> [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
>>> [ T1242] FS:  0000000000000000(0000) GS:ffff88807ea00000(0000)
>>> knlGS:0000000000000000
>>> [ T1242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
>>> [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [ T1242] Kernel panic - not syncing: Fatal exception
>>> [ T1242] Kernel Offset: disabled
>>> [ T1242] Rebooting in 86400 seconds..
>>> 
>>> It's caused by the memmove_u64s_down in validate_bset_keys of
>>> fs/bcachefs/btree_io.c:
>>> -> memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
>> 
>> 
>> Might need this.
>> 
>> diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
>> index e71b278672b6..fb53174cb735 100644
>> --- a/fs/bcachefs/btree_io.c
>> +++ b/fs/bcachefs/btree_io.c
>> @@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
>>                }
>> got_good_key:
>>                le16_add_cpu(&i->u64s, -next_good_key);
>> -               memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
>> +               memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) bkey_p_next(k));
>>                set_btree_node_need_rewrite(b);
>>        }
>> fsck_err:
>> 
> 
> Thanks, but this didn't fix everything. I think the problem is more
> complex, syzbot seems to be trying to mount damaged bcachefs (on
> purpose I think), so the vstruct_end(i) is already returning an offset
> that is out of border.

Could you try this (I need to go out now):

diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
index e71b278672b6..80a0094be356 100644
--- a/fs/bcachefs/btree_io.c
+++ b/fs/bcachefs/btree_io.c
@@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
                }
 got_good_key:
                le16_add_cpu(&i->u64s, -next_good_key);
-               memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
+               memmove_u64s_down(k, (u64 *) k + next_good_key, (u64 *) vstruct_end(i) - (u64 *) k);
                set_btree_node_need_rewrite(b);
        }
 fsck_err:

> 
> I retriggered it and print some more debug info: i->_data is
> ffff88806d5c00a0, i->u64s is 60928, and the faulting address is
> ffff88806d600000.



* Re: [syzbot] Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
  2024-12-14  3:56 [syzbot] [mm?] WARNING in lock_list_lru_of_memcg syzbot
  2024-12-14  6:05 ` Yu Zhao
  2025-02-14 18:11 ` [syzbot] [mm?] [bcachefs?] " syzbot
@ 2025-02-18 17:09 ` syzbot
  2025-02-18 17:16 ` syzbot
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: syzbot @ 2025-02-18 17:09 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
Author: mmpgouride@gmail.com

#syz test

diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
index e71b278672b6..80a0094be356 100644
--- a/fs/bcachefs/btree_io.c
+++ b/fs/bcachefs/btree_io.c
@@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
               }
got_good_key:
               le16_add_cpu(&i->u64s, -next_good_key);
-               memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
+               memmove_u64s_down(k, (u64 *) k + next_good_key, (u64 *) vstruct_end(i) - (u64 *) k);
               set_btree_node_need_rewrite(b);
       }
fsck_err:

> On Feb 15, 2025, at 02:11, syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
> 
> syzbot has found a reproducer for the following issue on:
> 
> HEAD commit:    128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
> dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
> 
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
> mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
> Modules linked in:
> CPU: 0 UID: 0 PID: 5459 Comm: syz-executor Not tainted 6.14.0-rc2-syzkaller-00185-g128c8f96eb86 #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> RIP: 0010:lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
> Code: e9 19 fe ff ff e8 72 f2 b5 ff 4c 8b 7c 24 08 45 84 f6 0f 84 40 ff ff ff e9 22 01 00 00 e8 5a f2 b5 ff eb 05 e8 53 f2 b5 ff 90 <0f> 0b 90 eb 97 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 71 fd ff ff 48
> RSP: 0018:ffffc9000d70f3a0 EFLAGS: 00010293
> RAX: ffffffff820bc50d RBX: 0000000000000000 RCX: ffff8880382d4880
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffff8880351ac054 R08: ffffffff820bc49f R09: 1ffffffff2079b8e
> R10: dffffc0000000000 R11: fffffbfff2079b8f R12: ffffffff820bc19e
> R13: ffff88801ee9a798 R14: 0000000000000000 R15: ffff8880351ac000
> FS:  000055557d70b500(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fff6826de40 CR3: 000000005680c000 CR4: 0000000000352ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> list_lru_del+0x58/0x1f0 mm/list_lru.c:202
> list_lru_del_obj+0x17b/0x250 mm/list_lru.c:223
> d_lru_del fs/dcache.c:481 [inline]
> to_shrink_list+0x136/0x340 fs/dcache.c:904
> select_collect+0xce/0x1b0 fs/dcache.c:1472
> d_walk+0x1f5/0x750 fs/dcache.c:1295
> shrink_dcache_parent+0x144/0x3b0 fs/dcache.c:1527
> d_invalidate+0x11c/0x2d0 fs/dcache.c:1632
> proc_invalidate_siblings_dcache+0x3fb/0x6e0 fs/proc/inode.c:142
> release_task+0x168e/0x1830 kernel/exit.c:279
> wait_task_zombie kernel/exit.c:1249 [inline]
> wait_consider_task+0x1a14/0x2e60 kernel/exit.c:1476
> do_wait_thread kernel/exit.c:1539 [inline]
> __do_wait+0x1b0/0x850 kernel/exit.c:1657
> do_wait+0x1e9/0x550 kernel/exit.c:1691
> kernel_wait4+0x2a7/0x3e0 kernel/exit.c:1850
> __do_sys_wait4 kernel/exit.c:1878 [inline]
> __se_sys_wait4 kernel/exit.c:1874 [inline]
> __x64_sys_wait4+0x134/0x1e0 kernel/exit.c:1874
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f93f3983057
> Code: 89 7c 24 10 48 89 4c 24 18 e8 45 1b 03 00 4c 8b 54 24 18 8b 54 24 14 41 89 c0 48 8b 74 24 08 8b 7c 24 10 b8 3d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 44 24 10 e8 95 1b 03 00 8b 44
> RSP: 002b:00007fff6826e9b0 EFLAGS: 00000293 ORIG_RAX: 000000000000003d
> RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007f93f3983057
> RDX: 0000000040000001 RSI: 00007fff6826ea1c RDI: 00000000ffffffff
> RBP: 00007fff6826ea1c R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000001388
> R13: 00000000000927c0 R14: 000000000002f011 R15: 00007fff6826ea70
> </TASK>
> 
> 
> ---
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
> 



* Re: [syzbot] Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
  2024-12-14  3:56 [syzbot] [mm?] WARNING in lock_list_lru_of_memcg syzbot
                   ` (2 preceding siblings ...)
  2025-02-18 17:09 ` [syzbot] " syzbot
@ 2025-02-18 17:16 ` syzbot
  2025-02-18 20:01 ` syzbot
  2025-02-19 16:12 ` syzbot
  5 siblings, 0 replies; 20+ messages in thread
From: syzbot @ 2025-02-18 17:16 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
Author: mmpgouride@gmail.com


#syz test

diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
index e71b278672b6..80a0094be356 100644
--- a/fs/bcachefs/btree_io.c
+++ b/fs/bcachefs/btree_io.c
@@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
                }
 got_good_key:
                le16_add_cpu(&i->u64s, -next_good_key);
-               memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
+               memmove_u64s_down(k, (u64 *) k + next_good_key, (u64 *) vstruct_end(i) - (u64 *) k);
                set_btree_node_need_rewrite(b);
        }
 fsck_err:
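
For illustration only (not part of the patch): the change above makes the
source pointer k + next_good_key, which is the same amount that the previous
line subtracts from i->u64s. A minimal userspace sketch of that invariant,
assuming a plain array of u64s; drop_u64s is a hypothetical helper, not a
bcachefs function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: remove `skip` u64s starting at index `pos` from an
 * array that currently holds `*len` u64s, shifting the tail down.
 * The tail that has to move starts at pos + skip, so its length is
 * old_len - (pos + skip), which equals new_len - pos.
 */
static void drop_u64s(uint64_t *buf, size_t *len, size_t pos, size_t skip)
{
	assert(pos + skip <= *len);

	size_t tail = *len - (pos + skip);

	/* memmove handles the overlapping source and destination ranges */
	memmove(buf + pos, buf + pos + skip, tail * sizeof(uint64_t));
	*len -= skip;
}
```

Dropping 3 words at index 2 from {10, 11, 12, 13, 14, 15, 16, 17} leaves
{10, 11, 15, 16, 17} with a length of 5. The sketch only holds when the
recorded length can be trusted in the first place.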



* Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
  2025-02-18 12:16             ` Alan Huang
@ 2025-02-18 17:47               ` Kairui Song
  0 siblings, 0 replies; 20+ messages in thread
From: Kairui Song @ 2025-02-18 17:47 UTC (permalink / raw)
  To: Alan Huang
  Cc: Andrew Morton, kent.overstreet, syzbot, linux-bcachefs,
	linux-kernel, linux-mm, syzkaller-bugs

On Tue, Feb 18, 2025 at 8:17 PM Alan Huang <mmpgouride@gmail.com> wrote:
>
> On Feb 18, 2025, at 19:40, Kairui Song <ryncsn@gmail.com> wrote:
> >
> > On Tue, Feb 18, 2025 at 2:09 AM Alan Huang <mmpgouride@gmail.com> wrote:
> >>
> >> On Feb 18, 2025, at 01:12, Kairui Song <ryncsn@gmail.com> wrote:
> >>>
> >>> On Mon, Feb 17, 2025 at 12:13 AM Kairui Song <ryncsn@gmail.com> wrote:
> >>>>
> >>>> On Sat, Feb 15, 2025 at 7:24 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >>>>>
> >>>>> On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com> wrote:
> >>>>>
> >>>>>> syzbot has found a reproducer for the following issue on:
> >>>>>
> >>>>> Thanks.  I doubt if bcachefs is implicated in this?
> >>>>>
> >>>>>> HEAD commit:    128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
> >>>>>> git tree:       upstream
> >>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
> >>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
> >>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> >>>>>> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >>>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
> >>>>>>
> >>>>>> Downloadable assets:
> >>>>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
> >>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
> >>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
> >>>>>> mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
> >>>>>>
> >>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >>>>>> Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
> >>>>>>
> >>>>>> ------------[ cut here ]------------
> >>>>>> WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
> >>>>>
> >>>>>       VM_WARN_ON(!css_is_dying(&memcg->css));
> >>>>
> >>>> I'm checking this, when last time this was triggered, it was caused by
> >>>> a list_lru user did not initialize the memcg list_lru properly before
> >>>> list_lru reclaim started, and fixed by:
> >>>> https://lore.kernel.org/all/20241222122936.67501-1-ryncsn@gmail.com/T/
> >>>>
> >>>> This shouldn't be a big issue, maybe there are leaks that will be
> >>>> fixed upon reparenting, and this new added sanity check might be too
> >>>> lenient, I'm not 100% sure though.
> >>>>
> >>>> Unfortunately I couldn't reproduce the issue locally with the
> >>>> reproducer yet. will keep the test running and see if it can hit this
> >>>> WARN_ON.
> >>>
> >>> So far I am still unable to trigger this VM_WARN_ON using the
> >>> reproducer, and I'm seeing many other random crashes.
> >>>
> >>> But after I changed the .config a bit adding more debug configs
> >>> (SLAB_FREELIST_HARDENED, DEBUG_PAGEALLOC), following crash showed up
> >>> and will be triggered immediately after I start the test:
> >>>
> >>> [ T1242] BUG: unable to handle page fault for address: ffff888054c60000
> >>> [ T1242] #PF: supervisor read access in kernel mode
> >>> [ T1242] #PF: error_code(0x0000) - not-present page
> >>> [ T1242] PGD 19e01067 P4D 19e01067 PUD 19e04067 PMD 7fc5c067 PTE
> >>> 800fffffab39f060
> >>> [ T1242] Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
> >>> [ T1242] CPU: 1 UID: 0 PID: 1242 Comm: kworker/1:1H Not tainted
> >>> 6.14.0-rc2-00185-g128c8f96eb86 #2
> >>> [ T1242] Hardware name: Red Hat KVM/RHEL-AV, BIOS
> >>> 1.16.0-4.module+el8.8.0+664+0a3d6c83 04/01/2014
> >>> [ T1242] Workqueue: bcachefs_btree_read_complete btree_node_read_work
> >>> [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
> >>> [ T6058] bcachefs (loop2): empty btree root xattrs
> >>> [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
> >>> 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
> >>> ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
> >>> [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
> >>> [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
> >>> [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
> >>> [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
> >>> [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
> >>> [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
> >>> [ T1242] FS:  0000000000000000(0000) GS:ffff88807ea00000(0000)
> >>> knlGS:0000000000000000
> >>> [ T1242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
> >>> [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> [ T1242] Call Trace:
> >>> [ T1242]  <TASK>
> >>> [ T1242]  bch2_btree_node_read_done+0x1d20/0x53a0
> >>> [ T1242]  btree_node_read_work+0x54d/0xdc0
> >>> [ T1242]  process_scheduled_works+0xaf8/0x17f0
> >>> [ T1242]  worker_thread+0x89d/0xd60
> >>> [ T1242]  kthread+0x722/0x890
> >>> [ T1242]  ret_from_fork+0x4e/0x80
> >>> [ T1242]  ret_from_fork_asm+0x1a/0x30
> >>> [ T1242]  </TASK>
> >>> [ T1242] Modules linked in:
> >>> [ T1242] ---[ end trace 0000000000000000 ]---
> >>> [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
> >>> [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
> >>> 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
> >>> ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
> >>> [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
> >>> [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
> >>> [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
> >>> [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
> >>> [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
> >>> [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
> >>> [ T1242] FS:  0000000000000000(0000) GS:ffff88807ea00000(0000)
> >>> knlGS:0000000000000000
> >>> [ T1242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
> >>> [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> [ T1242] Kernel panic - not syncing: Fatal exception
> >>> [ T1242] Kernel Offset: disabled
> >>> [ T1242] Rebooting in 86400 seconds..
> >>>
> >>> It's caused by the memmove_u64s_down in validate_bset_keys of
> >>> fs/bcachefs/btree_io.c:
> >>> -> memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
> >>
> >>
> >> Might need this.
> >>
> >> diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
> >> index e71b278672b6..fb53174cb735 100644
> >> --- a/fs/bcachefs/btree_io.c
> >> +++ b/fs/bcachefs/btree_io.c
> >> @@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
> >>                }
> >> got_good_key:
> >>                le16_add_cpu(&i->u64s, -next_good_key);
> >> -               memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
> >> +               memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) bkey_p_next(k));
> >>                set_btree_node_need_rewrite(b);
> >>        }
> >> fsck_err:
> >>
> >
> > Thanks, but this didn't fix everything. I think the problem is more
> > complex, syzbot seems to be trying to mount damaged bcachefs (on
> > purpose I think), so the vstruct_end(i) is already returning an offset
> > that is out of border.
>
> Could you try this (I need to go out now):
>
> diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
> index e71b278672b6..80a0094be356 100644
> --- a/fs/bcachefs/btree_io.c
> +++ b/fs/bcachefs/btree_io.c
> @@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
>                 }
>  got_good_key:
>                 le16_add_cpu(&i->u64s, -next_good_key);
> -               memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
> +               memmove_u64s_down(k, (u64 *) k + next_good_key, (u64 *) vstruct_end(i) - (u64 *) k);
>                 set_btree_node_need_rewrite(b);
>         }
>  fsck_err:
>
> >
> > I retriggered it and print some more debug info: i->_data is
> > ffff88806d5c00a0, i->u64s is 60928, and the faulting address is
> > ffff88806d600000.
>

Hi Alan

This didn't help either. Unless I'm badly mistaken, the problem is
that the contents of the `struct bset` are corrupted (I'm not exactly
sure how this happens, but it is probably related to the damaged
bcachefs image from syzbot), so any calculation based on its fields
won't help.

If I add some prints before the memmove_u64s_down, like this:
pr_err("DEBUG: k: 0x%lx - 0x%lx, len %ld",
       (unsigned long)k, (unsigned long)bkey_p_next(k), bkey_p_next(k) - k);
pr_err("DEBUG: i: 0x%lx - 0x%lx, len %ld",
       (unsigned long)i->start, (unsigned long)vstruct_end(i), i->u64s);
pr_err("DEBUG: next_good_key * 8: %ld, k + next_good_key: 0x%lx",
       next_good_key * sizeof(u64*), (u64 *) k + next_good_key);
le16_add_cpu(&i->u64s, -next_good_key);
pr_err("DEBUG: copying 0x%lx from 0x%lx, len %ld",
       k, (u64 *) k + next_good_key, (u64 *) vstruct_end(i) - (u64 *) k);
memmove_u64s_down(k, (u64 *) k + next_good_key,
                  (u64 *) vstruct_end(i) - (u64 *) k);

Then I got:
[   57.100623][ T1222] bcachefs: validate_bset_keys() DEBUG: k: 0xffff88806f2200a0 - 0xffff88806f220110, len 2
[   57.101323][ T1222] bcachefs: validate_bset_keys() DEBUG: i: 0xffff88806f2200a0 - 0xffff88806f2970a0, len 60928
[   57.101990][ T1222] bcachefs: validate_bset_keys() DEBUG: next_good_key * 8: 3976, k + next_good_key: 0xffff88806f221028
[   57.102712][ T1222] bcachefs: validate_bset_keys() DEBUG: copying 0xffff88806f2200a0 from 0xffff88806f221028, len 60431
[   57.103437][ T1222] BUG: unable to handle page fault for address: ffff88806f260000

`struct bset i` spans an invalid area.
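
The numbers in the debug output can be cross-checked with a little
arithmetic. The sketch below is plain userspace C with the values copied from
the log lines; the macro and function names are made up for illustration, and
treating the faulting address as the first byte that cannot be read is an
assumption:

```c
#include <stdint.h>

/* Values copied from the debug output above */
#define BSET_START 0xffff88806f2200a0ULL /* i->start, also k */
#define BSET_U64S  60928ULL              /* i->u64s before the adjustment */
#define NEXT_GOOD  (3976ULL / 8)         /* from "next_good_key * 8: 3976" */
#define FAULT_ADDR 0xffff88806f260000ULL /* page fault address */

/* Address one past the end of n u64s that start at `start` */
static uint64_t u64s_end(uint64_t start, uint64_t n)
{
	return start + n * 8;
}

/* Bytes by which [start, start + n u64s) extends beyond `limit`, else 0 */
static uint64_t bytes_past(uint64_t start, uint64_t n, uint64_t limit)
{
	uint64_t end = u64s_end(start, n);

	return end > limit ? end - limit : 0;
}
```

With these inputs, u64s_end(BSET_START, BSET_U64S) is 0xffff88806f2970a0,
matching the printed end of i, and BSET_U64S - NEXT_GOOD is 60431, matching
the printed copy length. bytes_past(BSET_START, BSET_U64S, FAULT_ADDR) is
225440, so the claimed bset runs roughly 220 KiB past the faulting address.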


* Re: [syzbot] Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
  2024-12-14  3:56 [syzbot] [mm?] WARNING in lock_list_lru_of_memcg syzbot
                   ` (3 preceding siblings ...)
  2025-02-18 17:16 ` syzbot
@ 2025-02-18 20:01 ` syzbot
  2025-02-19 16:12 ` syzbot
  5 siblings, 0 replies; 20+ messages in thread
From: syzbot @ 2025-02-18 20:01 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
Author: mmpgouride@gmail.com



#syz test

diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
index e71b278672b6..907868b53d1f 100644
--- a/fs/bcachefs/btree_io.c
+++ b/fs/bcachefs/btree_io.c
@@ -727,14 +727,14 @@ static int validate_bset(struct bch_fs *c, struct bch_dev *ca,
                     btree_node_unsupported_version,
                     "BSET_SEPARATE_WHITEOUTS no longer supported");
 
-       if (!write &&
-           btree_err_on(offset + sectors > (ptr_written ?: btree_sectors(c)),
-                        -BCH_ERR_btree_node_read_err_fixable,
-                        c, ca, b, i, NULL,
-                        bset_past_end_of_btree_node,
-                        "bset past end of btree node (offset %u len %u but written %zu)",
-                        offset, sectors, ptr_written ?: btree_sectors(c)))
+       if (!write && offset + sectors > (ptr_written ?: btree_sectors(c))) {
                i->u64s = 0;
+               btree_err(-BCH_ERR_btree_node_read_err_fixable,
+                         c, ca, b, i, NULL,
+                         bset_past_end_of_btree_node,
+                         "bset past end of btree node (offset %u len %u but written %zu)",
+                         offset, sectors, ptr_written ?: btree_sectors(c));
+       }
 
        btree_err_on(offset && !i->u64s,
                     -BCH_ERR_btree_node_read_err_fixable,
@@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
                }
 got_good_key:
                le16_add_cpu(&i->u64s, -next_good_key);
-               memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
+               memmove_u64s_down(k, (u64 *) k + next_good_key, (u64 *) vstruct_end(i) - (u64 *) k);
                set_btree_node_need_rewrite(b);
        }
 fsck_err:
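
For illustration only (not part of the patch): the first hunk above zeroes
i->u64s when the bset extends past the end of the btree node. A related
defensive check, sketched here in userspace C, is to compare a claimed u64s
count against the space actually left in the buffer before moving anything.
u64s_fit and its parameters are hypothetical and do not exist in bcachefs:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: does a payload of `u64s` 64-bit words that starts
 * `data_off` bytes into a buffer of `buf_bytes` bytes fit inside it?
 * Dividing the remaining space avoids overflow in u64s * 8.
 */
static bool u64s_fit(size_t buf_bytes, size_t data_off, uint64_t u64s)
{
	if (data_off > buf_bytes)
		return false;

	return u64s <= (buf_bytes - data_off) / sizeof(uint64_t);
}
```

Assuming a 256 KiB buffer (an assumption, not a value stated in this thread)
with data starting at offset 0xa0, u64s_fit(262144, 0xa0, 60928) is false,
since only 32748 u64s fit after the offset.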





* Re: [syzbot] Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
  2024-12-14  3:56 [syzbot] [mm?] WARNING in lock_list_lru_of_memcg syzbot
                   ` (4 preceding siblings ...)
  2025-02-18 20:01 ` syzbot
@ 2025-02-19 16:12 ` syzbot
  5 siblings, 0 replies; 20+ messages in thread
From: syzbot @ 2025-02-19 16:12 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg
Author: mmpgouride@gmail.com


#syz test

diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
index e71b278672b6..62ecab8306b5 100644
--- a/fs/bcachefs/btree_io.c
+++ b/fs/bcachefs/btree_io.c
@@ -997,7 +997,8 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
                }
 got_good_key:
                le16_add_cpu(&i->u64s, -next_good_key);
-               memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
+               pr_err("DEBUG: i->u64s: %u, btree node size: %u", i->u64s, c->opts.btree_node_size);
+               memmove_u64s_down(k, (u64 *) k + next_good_key, (u64 *) vstruct_end(i) - (u64 *) k);
                set_btree_node_need_rewrite(b);
        }
 fsck_err:


end of thread, other threads:[~2025-02-19 16:12 UTC | newest]

Thread overview: 20+ messages
2024-12-14  3:56 [syzbot] [mm?] WARNING in lock_list_lru_of_memcg syzbot
2024-12-14  6:05 ` Yu Zhao
2024-12-14 19:43   ` Kairui Song
2024-12-15 17:44     ` Kairui Song
2024-12-16  2:45       ` Yu Zhao
2024-12-16 18:39         ` Sasha Levin
2024-12-17 18:19           ` Kairui Song
2024-12-18 19:08             ` Kairui Song
2025-02-14 18:11 ` [syzbot] [mm?] [bcachefs?] " syzbot
2025-02-14 23:23   ` Andrew Morton
2025-02-16 16:13     ` Kairui Song
2025-02-17 17:12       ` Kairui Song
2025-02-17 18:09         ` Alan Huang
2025-02-18 11:40           ` Kairui Song
2025-02-18 12:16             ` Alan Huang
2025-02-18 17:47               ` Kairui Song
2025-02-18 17:09 ` [syzbot] " syzbot
2025-02-18 17:16 ` syzbot
2025-02-18 20:01 ` syzbot
2025-02-19 16:12 ` syzbot
