* [syzbot] Monthly bcachefs report (Jun 2025)
@ 2025-06-24 7:39 syzbot
2025-06-24 16:36 ` Kent Overstreet
0 siblings, 1 reply; 3+ messages in thread
From: syzbot @ 2025-06-24 7:39 UTC (permalink / raw)
To: kent.overstreet, linux-bcachefs, linux-kernel, syzkaller-bugs
Hello bcachefs maintainers/developers,
This is a 31-day syzbot report for the bcachefs subsystem.
All related reports/information can be found at:
https://syzkaller.appspot.com/upstream/s/bcachefs
During the period, 15 new issues were detected and 8 were fixed.
In total, 102 issues are still open and 190 have already been fixed.
Some of the still happening issues:
Ref Crashes Repro Title
<1> 30829 Yes INFO: task hung in bch2_copygc_stop
https://syzkaller.appspot.com/bug?extid=c6fd966ebbdea1e8ff08
<2> 7874 Yes INFO: task hung in __closure_sync
https://syzkaller.appspot.com/bug?extid=7bf808f7fe4a6549f36e
<3> 3875 Yes WARNING in bch2_trans_srcu_unlock
https://syzkaller.appspot.com/bug?extid=1e515cab343dbe5aa38a
<4> 1497 Yes INFO: task hung in __bch2_fs_stop
https://syzkaller.appspot.com/bug?extid=6d3e28b33490b3085412
<5> 1236 Yes INFO: task hung in bch2_journal_reclaim_thread (2)
https://syzkaller.appspot.com/bug?extid=820dc3b465c69f766a57
<6> 442 Yes WARNING in bch2_trans_put
https://syzkaller.appspot.com/bug?extid=291aef749c5cbb9ca2fd
<7> 361 Yes kernel panic: trans should be locked, unlocked by bch2_btree_update_start
https://syzkaller.appspot.com/bug?extid=d540192e763531d307ff
<8> 343 Yes UBSAN: shift-out-of-bounds in __bch2_btree_node_hash_insert
https://syzkaller.appspot.com/bug?extid=029d1989099aa5ae3e89
<9> 326 Yes WARNING in __bch2_truncate_folio
https://syzkaller.appspot.com/bug?extid=3d11e35eeafe176a6c5b
<10> 324 Yes INFO: task hung in bch2_readahead
https://syzkaller.appspot.com/bug?extid=a6060114362257e9798a
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
To disable reminders for individual bugs, reply with the following command:
#syz set <Ref> no-reminders
To change bug's subsystems, reply with:
#syz set <Ref> subsystems: new-subsystem
You may send multiple commands in a single email message.
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [syzbot] Monthly bcachefs report (Jun 2025) 2025-06-24 7:39 [syzbot] Monthly bcachefs report (Jun 2025) syzbot @ 2025-06-24 16:36 ` Kent Overstreet 2025-06-24 17:31 ` Alan Huang 0 siblings, 1 reply; 3+ messages in thread From: Kent Overstreet @ 2025-06-24 16:36 UTC (permalink / raw) To: syzbot; +Cc: linux-bcachefs, linux-kernel, syzkaller-bugs On Tue, Jun 24, 2025 at 12:39:27AM -0700, syzbot wrote: > Hello bcachefs maintainers/developers, > > This is a 31-day syzbot report for the bcachefs subsystem. > All related reports/information can be found at: > https://syzkaller.appspot.com/upstream/s/bcachefs > > During the period, 15 new issues were detected and 8 were fixed. > In total, 102 issues are still open and 190 have already been fixed. > > Some of the still happening issues: Most of these are not terribly concerning, but... > > Ref Crashes Repro Title > <1> 30829 Yes INFO: task hung in bch2_copygc_stop > https://syzkaller.appspot.com/bug?extid=c6fd966ebbdea1e8ff08 > <2> 7874 Yes INFO: task hung in __closure_sync > https://syzkaller.appspot.com/bug?extid=7bf808f7fe4a6549f36e > <3> 3875 Yes WARNING in bch2_trans_srcu_unlock > https://syzkaller.appspot.com/bug?extid=1e515cab343dbe5aa38a > <4> 1497 Yes INFO: task hung in __bch2_fs_stop > https://syzkaller.appspot.com/bug?extid=6d3e28b33490b3085412 > <5> 1236 Yes INFO: task hung in bch2_journal_reclaim_thread (2) > https://syzkaller.appspot.com/bug?extid=820dc3b465c69f766a57 > <6> 442 Yes WARNING in bch2_trans_put this one > https://syzkaller.appspot.com/bug?extid=291aef749c5cbb9ca2fd > <7> 361 Yes kernel panic: trans should be locked, unlocked by bch2_btree_update_start we want these locking bugs gone. ok this one is a bug users hit - I thought locking was fully sorted in 6.16, but apparently not. Previously, the locking bugs (especially in 6.14) were showing up in bch2_dev_metadata_drop(), which uses old style btree node iterators that do a DFS btree walk. Modern code just walks one level of the btree at a time, using bch2_btree_iter_peek() - there's much less special purpose code and strange corner cases this way. So it looks like this is another instance of that - bch2_move_btree() is using the old bch2_btree_iter_peek_node() code. Perhaps it's time to just delete all that stuff. bch2_move_btree() can probably be deleted and replaced with a call to bch2_move_data_btree() - that's part of the main data move path, and it recently gained the ability to walk keys in interior btree nodes. Looking at the stace, we have [ 91.068473][ T5330] Kernel panic - not syncing: trans should be locked, unlocked by bch2_btree_update_start+0x1700/0x1dc0 [ 91.074232][ T5330] CPU: 0 UID: 0 PID: 5330 Comm: syz.0.0 Not tainted 6.16.0-rc1-syzkaller-00203-g4774cfe3543a #0 PREEMPT(full) [ 91.079637][ T5330] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 [ 91.085068][ T5330] Call Trace: [ 91.086510][ T5330] <TASK> [ 91.087883][ T5330] dump_stack_lvl+0x99/0x250 [ 91.090102][ T5330] ? __asan_memcpy+0x40/0x70 [ 91.092501][ T5330] ? __pfx_dump_stack_lvl+0x10/0x10 [ 91.095374][ T5330] ? __pfx__printk+0x10/0x10 [ 91.097423][ T5330] panic+0x2db/0x790 [ 91.099118][ T5330] ? bch2_btree_update_start+0x1700/0x1dc0 [ 91.101917][ T5330] ? bch2_move_btree+0x44e/0xc60 [ 91.104349][ T5330] ? __pfx_panic+0x10/0x10 [ 91.106602][ T5330] ? __closure_wake_up+0xa2/0xb0 [ 91.109239][ T5330] ? bch2_btree_update_free+0x39e/0x3f0 [ 91.111867][ T5330] ? bch2_btree_update_start+0x1700/0x1dc0 [ 91.114541][ T5330] bch2_trans_unlocked_or_in_restart_error+0xb0/0x110 [ 91.117731][ T5330] bch2_path_get+0x108c/0x1540 [ 91.120398][ T5330] ? bch2_btree_node_rewrite+0x17e/0x1120 [ 91.123155][ T5330] ? __pfx_bch2_btree_update_start+0x10/0x10 [ 91.125844][ T5330] ? bch2_move_btree+0x44e/0xc60 [ 91.128330][ T5330] ? __pfx_bch2_path_get+0x10/0x10 [ 91.131060][ T5330] ? rcu_is_watching+0x15/0xb0 [ 91.135248][ T5330] ? bch2_trans_downgrade+0x228/0x360 [ 91.137691][ T5330] ? rcu_is_watching+0x15/0xb0 [ 91.139975][ T5330] ? __bch2_btree_path_downgrade+0x35a/0x5e0 [ 91.143245][ T5330] ? bch2_trans_node_iter_init+0x2d7/0x5e0 [ 91.146160][ T5330] bch2_trans_node_iter_init+0x348/0x5e0 So the real bug is that we're in a restart when we call bch2_trans_node_iter_init(), meaning we didn't call bch2_trans_begin() after we got a restart error. I do not see the bug on a cursory glance, but the code is trickier that it should be. We're aiming to get rid of this sort of open coded restart handling and bch2_trans_begin(0 calls; modern code generally uses lockrestart_do(), commit_do(), or a looping macro that does the restart handling itself - e.g. for_each_btree_key() for (btree = start.btree; btree <= min_t(unsigned, end.btree, btree_id_nr_alive(c) - 1); btree ++) { stats->pos = BBPOS(btree, POS_MIN); if (!bch2_btree_id_root(c, btree)->b) continue; bch2_trans_node_iter_init(trans, &iter, btree, POS_MIN, 0, 0, BTREE_ITER_prefetch); retry: ret = 0; while (bch2_trans_begin(trans), (b = bch2_btree_iter_peek_node(trans, &iter)) && !(ret = PTR_ERR_OR_ZERO(b))) { if (kthread && kthread_should_stop()) break; if ((cmp_int(btree, end.btree) ?: bpos_cmp(b->key.k.p, end.pos)) > 0) break; stats->pos = BBPOS(iter.btree_id, iter.pos); if (!pred(c, arg, b, &io_opts, &data_opts)) goto next; ret = bch2_btree_node_rewrite(trans, &iter, b, 0, 0) ?: ret; if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) continue; if (ret) break; next: bch2_btree_iter_next_node(trans, &iter); } if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) goto retry; bch2_trans_iter_exit(trans, &iter); if (kthread && kthread_should_stop()) break; } ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [syzbot] Monthly bcachefs report (Jun 2025) 2025-06-24 16:36 ` Kent Overstreet @ 2025-06-24 17:31 ` Alan Huang 0 siblings, 0 replies; 3+ messages in thread From: Alan Huang @ 2025-06-24 17:31 UTC (permalink / raw) To: Kent Overstreet; +Cc: syzbot, linux-bcachefs, linux-kernel, syzkaller-bugs On Jun 25, 2025, at 00:36, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > On Tue, Jun 24, 2025 at 12:39:27AM -0700, syzbot wrote: >> Hello bcachefs maintainers/developers, >> >> This is a 31-day syzbot report for the bcachefs subsystem. >> All related reports/information can be found at: >> https://syzkaller.appspot.com/upstream/s/bcachefs >> >> During the period, 15 new issues were detected and 8 were fixed. >> In total, 102 issues are still open and 190 have already been fixed. >> >> Some of the still happening issues: > > Most of these are not terribly concerning, but... > >> >> Ref Crashes Repro Title >> <1> 30829 Yes INFO: task hung in bch2_copygc_stop >> https://syzkaller.appspot.com/bug?extid=c6fd966ebbdea1e8ff08 >> <2> 7874 Yes INFO: task hung in __closure_sync >> https://syzkaller.appspot.com/bug?extid=7bf808f7fe4a6549f36e >> <3> 3875 Yes WARNING in bch2_trans_srcu_unlock >> https://syzkaller.appspot.com/bug?extid=1e515cab343dbe5aa38a >> <4> 1497 Yes INFO: task hung in __bch2_fs_stop >> https://syzkaller.appspot.com/bug?extid=6d3e28b33490b3085412 >> <5> 1236 Yes INFO: task hung in bch2_journal_reclaim_thread (2) >> https://syzkaller.appspot.com/bug?extid=820dc3b465c69f766a57 >> <6> 442 Yes WARNING in bch2_trans_put > > > this one > >> https://syzkaller.appspot.com/bug?extid=291aef749c5cbb9ca2fd >> <7> 361 Yes kernel panic: trans should be locked, unlocked by bch2_btree_update_start > > we want these locking bugs gone. > > ok this one is a bug users hit - I thought locking was fully sorted in > 6.16, but apparently not. > > Previously, the locking bugs (especially in 6.14) were showing up in > bch2_dev_metadata_drop(), which uses old style btree node iterators that > do a DFS btree walk. > > Modern code just walks one level of the btree at a time, using > bch2_btree_iter_peek() - there's much less special purpose code and > strange corner cases this way. > > So it looks like this is another instance of that - bch2_move_btree() is > using the old bch2_btree_iter_peek_node() code. > > Perhaps it's time to just delete all that stuff. > > bch2_move_btree() can probably be deleted and replaced with a call to > bch2_move_data_btree() - that's part of the main data move path, and it > recently gained the ability to walk keys in interior btree nodes. > > Looking at the stace, we have > [ 91.068473][ T5330] Kernel panic - not syncing: trans should be locked, unlocked by bch2_btree_update_start+0x1700/0x1dc0 > [ 91.074232][ T5330] CPU: 0 UID: 0 PID: 5330 Comm: syz.0.0 Not tainted 6.16.0-rc1-syzkaller-00203-g4774cfe3543a #0 PREEMPT(full) > [ 91.079637][ T5330] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 > [ 91.085068][ T5330] Call Trace: > [ 91.086510][ T5330] <TASK> > [ 91.087883][ T5330] dump_stack_lvl+0x99/0x250 > [ 91.090102][ T5330] ? __asan_memcpy+0x40/0x70 > [ 91.092501][ T5330] ? __pfx_dump_stack_lvl+0x10/0x10 > [ 91.095374][ T5330] ? __pfx__printk+0x10/0x10 > [ 91.097423][ T5330] panic+0x2db/0x790 > [ 91.099118][ T5330] ? bch2_btree_update_start+0x1700/0x1dc0 > [ 91.101917][ T5330] ? bch2_move_btree+0x44e/0xc60 > [ 91.104349][ T5330] ? __pfx_panic+0x10/0x10 > [ 91.106602][ T5330] ? __closure_wake_up+0xa2/0xb0 > [ 91.109239][ T5330] ? bch2_btree_update_free+0x39e/0x3f0 > [ 91.111867][ T5330] ? bch2_btree_update_start+0x1700/0x1dc0 > [ 91.114541][ T5330] bch2_trans_unlocked_or_in_restart_error+0xb0/0x110 > [ 91.117731][ T5330] bch2_path_get+0x108c/0x1540 > [ 91.120398][ T5330] ? bch2_btree_node_rewrite+0x17e/0x1120 > [ 91.123155][ T5330] ? __pfx_bch2_btree_update_start+0x10/0x10 > [ 91.125844][ T5330] ? bch2_move_btree+0x44e/0xc60 > [ 91.128330][ T5330] ? __pfx_bch2_path_get+0x10/0x10 > [ 91.131060][ T5330] ? rcu_is_watching+0x15/0xb0 > [ 91.135248][ T5330] ? bch2_trans_downgrade+0x228/0x360 > [ 91.137691][ T5330] ? rcu_is_watching+0x15/0xb0 > [ 91.139975][ T5330] ? __bch2_btree_path_downgrade+0x35a/0x5e0 > [ 91.143245][ T5330] ? bch2_trans_node_iter_init+0x2d7/0x5e0 > [ 91.146160][ T5330] bch2_trans_node_iter_init+0x348/0x5e0 > > So the real bug is that we're in a restart when we call > bch2_trans_node_iter_init(), meaning we didn't call bch2_trans_begin() > after we got a restart error. We call bch2_trans_unlock() even If bch2_btree_reserve_get() at line 1289 in bch2_btree_update_start returns error After that, we return the error. See below. > > I do not see the bug on a cursory glance, but the code is trickier that > it should be. > > We're aiming to get rid of this sort of open coded restart handling and > bch2_trans_begin(0 calls; modern code generally uses lockrestart_do(), > commit_do(), or a looping macro that does the restart handling itself - > e.g. for_each_btree_key() > > for (btree = start.btree; > btree <= min_t(unsigned, end.btree, btree_id_nr_alive(c) - 1); > btree ++) { > stats->pos = BBPOS(btree, POS_MIN); > > if (!bch2_btree_id_root(c, btree)->b) > continue; > > bch2_trans_node_iter_init(trans, &iter, btree, POS_MIN, 0, 0, > BTREE_ITER_prefetch); > retry: > ret = 0; > while (bch2_trans_begin(trans), > (b = bch2_btree_iter_peek_node(trans, &iter)) && > !(ret = PTR_ERR_OR_ZERO(b))) { > if (kthread && kthread_should_stop()) > break; > > if ((cmp_int(btree, end.btree) ?: > bpos_cmp(b->key.k.p, end.pos)) > 0) > break; > > stats->pos = BBPOS(iter.btree_id, iter.pos); > > if (!pred(c, arg, b, &io_opts, &data_opts)) > goto next; > > ret = bch2_btree_node_rewrite(trans, &iter, b, 0, 0) ?: ret; > if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) > continue; > if (ret) > break; Here, we break since it’s not a restart, so at this time, trans->locked is false. > > next: > bch2_btree_iter_next_node(trans, &iter); > } > if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) > goto retry; > > bch2_trans_iter_exit(trans, &iter); > > if (kthread && kthread_should_stop()) > break; > } ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-06-24 17:32 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-06-24 7:39 [syzbot] Monthly bcachefs report (Jun 2025) syzbot 2025-06-24 16:36 ` Kent Overstreet 2025-06-24 17:31 ` Alan Huang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox