* RE: [RFC PATCH] fs/hfs: fix ABBA deadlock in hfs_mdb_commit
  [not found] ` <aW7Vy_RpxseBC4UQ@ndev>
@ 2026-01-20 20:51 ` Viacheslav Dubeyko
  2026-01-21  2:44   ` Jinchao Wang
  0 siblings, 1 reply; 2+ messages in thread

From: Viacheslav Dubeyko @ 2026-01-20 20:51 UTC (permalink / raw)
To: wangjinchao600@gmail.com
Cc: glaubitz@physik.fu-berlin.de, frank.li@vivo.com, slava@dubeyko.com,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    syzbot+1e3ff4b07c16ca0f6fe2@syzkaller.appspotmail.com

On Tue, 2026-01-20 at 09:09 +0800, Jinchao Wang wrote:

> <skipped>
>
> > First of all, I've tried to check the syzbot report that you mention in
> > the patch. And I was confused because it was a report for FAT. So, I don't
> > see a way to reproduce the issue on my side.
> >
> > Secondly, I need to see the real call trace of the issue. This discussion
> > doesn't make sense without the reproduction path and the call trace(s) of
> > the issue.
> >
> > Thanks,
> > Slava.
>
> There are many crashes on the syzbot report page; please follow the
> specified time and version.
>
> Syzbot report: https://syzkaller.appspot.com/bug?extid=1e3ff4b07c16ca0f6fe2
>
> For this version:
>
> | time             | kernel     | Commit       | Syzkaller |
> | 2025/12/20 17:03 | linux-next | cc3aa43b44bd | d6526ea3  |
>
> The full call trace can be found in the crash log of "2025/12/20 17:03",
> whose URL is:
>
> Crash log: https://syzkaller.appspot.com/text?tag=CrashLog&x=12909b1a580000

This call trace is dedicated to flushing the inode's dirty pages in the page
cache, as far as I can see:

[ 504.401993][ T31] INFO: task kworker/u8:1:13 blocked for more than 143 seconds.
[ 504.434587][ T31] Not tainted syzkaller #0
[ 504.441437][ T31] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 504.451145][ T31] task:kworker/u8:1 state:D stack:22792 pid:13 tgid:13 ppid:2 task_flags:0x4208060 flags:0x00080000
[ 504.463591][ T31] Workqueue: writeback wb_workfn (flush-7:4)
[ 504.471997][ T31] Call Trace:
[ 504.475502][ T31]  <TASK>
[ 504.479684][ T31]  __schedule+0x150e/0x5070
[ 504.484307][ T31]  ? __pfx___schedule+0x10/0x10
[ 504.491526][ T31]  ? __blk_flush_plug+0x3fc/0x4b0
[ 504.496683][ T31]  ? schedule+0x91/0x360
[ 504.501085][ T31]  schedule+0x165/0x360
[ 504.505366][ T31]  io_schedule+0x80/0xd0
[ 504.510102][ T31]  folio_wait_bit_common+0x6b0/0xb80
[ 504.532721][ T31]  ? __pfx_folio_wait_bit_common+0x10/0x10
[ 504.538760][ T31]  ? __pfx_wake_page_function+0x10/0x10
[ 504.544344][ T31]  ? _raw_spin_unlock_irqrestore+0xad/0x110
[ 504.551446][ T31]  ? writeback_iter+0x853/0x1280
[ 504.556492][ T31]  writeback_iter+0x8d8/0x1280
[ 504.564484][ T31]  blkdev_writepages+0xb7/0x170
[ 504.569517][ T31]  ? __pfx_blkdev_writepages+0x10/0x10
[ 504.575043][ T31]  ? __pfx_blkdev_writepages+0x10/0x10
[ 504.580705][ T31]  do_writepages+0x32e/0x550
[ 504.585344][ T31]  ? reacquire_held_locks+0x121/0x1c0
[ 504.591296][ T31]  ? writeback_sb_inodes+0x3bd/0x1870
[ 504.596806][ T31]  __writeback_single_inode+0x133/0x1240
[ 504.603290][ T31]  ? do_raw_spin_unlock+0x122/0x240
[ 504.608620][ T31]  writeback_sb_inodes+0x93a/0x1870
[ 504.613878][ T31]  ? __pfx_writeback_sb_inodes+0x10/0x10
[ 504.637194][ T31]  ? __pfx_down_read_trylock+0x10/0x10
[ 504.642838][ T31]  ? __pfx_move_expired_inodes+0x10/0x10
[ 504.648717][ T31]  __writeback_inodes_wb+0x111/0x240
[ 504.654048][ T31]  wb_writeback+0x43f/0xaa0
[ 504.658709][ T31]  ? queue_io+0x281/0x450
[ 504.663179][ T31]  ? __pfx_wb_writeback+0x10/0x10
[ 504.668641][ T31]  wb_workfn+0x8ee/0xed0
[ 504.673021][ T31]  ? __pfx_wb_workfn+0x10/0x10
[ 504.677989][ T31]  ? _raw_spin_unlock_irqrestore+0xad/0x110
[ 504.683916][ T31]  ? preempt_schedule+0xae/0xc0
[ 504.688852][ T31]  ? preempt_schedule_common+0x83/0xd0
[ 504.694389][ T31]  ? process_one_work+0x868/0x15a0
[ 504.699698][ T31]  process_one_work+0x93a/0x15a0
[ 504.704752][ T31]  ? __pfx_process_one_work+0x10/0x10
[ 504.717115][ T31]  ? assign_work+0x3c7/0x5b0
[ 504.739767][ T31]  worker_thread+0x9b0/0xee0
[ 504.744502][ T31]  kthread+0x711/0x8a0
[ 504.748698][ T31]  ? __pfx_worker_thread+0x10/0x10
[ 504.753855][ T31]  ? __pfx_kthread+0x10/0x10
[ 504.758645][ T31]  ? _raw_spin_unlock_irq+0x23/0x50
[ 504.763888][ T31]  ? lockdep_hardirqs_on+0x98/0x140
[ 504.769331][ T31]  ? __pfx_kthread+0x10/0x10
[ 504.773958][ T31]  ret_from_fork+0x599/0xb30
[ 504.779253][ T31]  ? __pfx_ret_from_fork+0x10/0x10
[ 504.784718][ T31]  ? __switch_to_asm+0x39/0x70
[ 504.791355][ T31]  ? __switch_to_asm+0x33/0x70
[ 504.796167][ T31]  ? __pfx_kthread+0x10/0x10
[ 504.800882][ T31]  ret_from_fork_asm+0x1a/0x30
[ 504.805695][ T31]  </TASK>

And this call trace is dedicated to the superblock commit:

[ 505.186758][ T31] INFO: task kworker/1:4:5971 blocked for more than 144 seconds.
[ 505.194752][ T8014] Bluetooth: hci37: command tx timeout
[ 505.210267][ T31] Not tainted syzkaller #0
[ 505.215260][ T31] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 505.273687][ T31] task:kworker/1:4 state:D stack:24152 pid:5971 tgid:5971 ppid:2 task_flags:0x4208060 flags:0x00080000
[ 505.287569][ T31] Workqueue: events_long flush_mdb
[ 505.293762][ T31] Call Trace:
[ 505.297607][ T31]  <TASK>
[ 505.307307][ T31]  __schedule+0x150e/0x5070
[ 505.314414][ T31]  ? __pfx___schedule+0x10/0x10
[ 505.325453][ T31]  ? _raw_spin_unlock_irqrestore+0xad/0x110
[ 505.331535][ T31]  ? __pfx__raw_spin_unlock_irqrestore+0x10/0x10
[ 505.354296][ T31]  ? preempt_schedule+0xae/0xc0
[ 505.359482][ T31]  ? preempt_schedule+0xae/0xc0
[ 505.364399][ T31]  ? __pfx___schedule+0x10/0x10
[ 505.369493][ T31]  ? schedule+0x91/0x360
[ 505.373819][ T31]  schedule+0x165/0x360
[ 505.378340][ T31]  io_schedule+0x80/0xd0
[ 505.382626][ T31]  bit_wait_io+0x11/0xd0
[ 505.387219][ T31]  __wait_on_bit_lock+0xec/0x4f0
[ 505.392201][ T31]  ? __pfx_bit_wait_io+0x10/0x10
[ 505.397441][ T31]  ? __pfx_bit_wait_io+0x10/0x10
[ 505.402435][ T31]  out_of_line_wait_on_bit_lock+0x123/0x170
[ 505.408661][ T31]  ? __pfx___might_resched+0x10/0x10
[ 505.414026][ T31]  ? __pfx_out_of_line_wait_on_bit_lock+0x10/0x10
[ 505.420693][ T31]  ? __pfx_wake_bit_function+0x10/0x10
[ 505.426212][ T31]  ? __lock_buffer+0xe/0x80
[ 505.431646][ T31]  hfs_mdb_commit+0x115/0x12e0
[ 505.451949][ T31]  ? do_raw_spin_unlock+0x122/0x240
[ 505.457642][ T31]  ? _raw_spin_unlock+0x28/0x50
[ 505.462552][ T31]  ? process_one_work+0x868/0x15a0
[ 505.467897][ T31]  process_one_work+0x93a/0x15a0
[ 505.472917][ T31]  ? __pfx_process_one_work+0x10/0x10
[ 505.478463][ T31]  ? assign_work+0x3c7/0x5b0
[ 505.483113][ T31]  worker_thread+0x9b0/0xee0
[ 505.487894][ T31]  kthread+0x711/0x8a0
[ 505.492015][ T31]  ? __pfx_worker_thread+0x10/0x10
[ 505.497303][ T31]  ? __pfx_kthread+0x10/0x10
[ 505.502429][ T31]  ? _raw_spin_unlock_irq+0x23/0x50
[ 505.510913][ T31]  ? lockdep_hardirqs_on+0x98/0x140
[ 505.516183][ T31]  ? __pfx_kthread+0x10/0x10
[ 505.521290][ T31]  ret_from_fork+0x599/0xb30
[ 505.525991][ T31]  ? __pfx_ret_from_fork+0x10/0x10
[ 505.531301][ T31]  ? __switch_to_asm+0x39/0x70
[ 505.535600][ T8874] chnl_net:caif_netlink_parms(): no params data found
[ 505.536284][ T31]  ? __switch_to_asm+0x33/0x70
[ 505.560487][ T31]  ? __pfx_kthread+0x10/0x10
[ 505.565188][ T31]  ret_from_fork_asm+0x1a/0x30
[ 505.570372][ T31]  </TASK>

I don't see any relation between folios in the inode's page cache and
HFS_SB(sb)->mdb_bh, because they cannot share the same folio. I still don't
see from your explanation how the issue could happen. I don't see how
lock_buffer(HFS_SB(sb)->mdb_bh) can be responsible for the issue.
On the contrary, if we follow your logic, then we would never be able to mount
any HFS volume at all. But xfstests works for HFS file systems (of course,
multiple tests fail) and I cannot see the deadlock in the common case. So, you
need to explain which particular use case can reproduce the issue and what the
mechanism of the deadlock is.

Thanks,
Slava.

^ permalink raw reply	[flat|nested] 2+ messages in thread
* Re: [RFC PATCH] fs/hfs: fix ABBA deadlock in hfs_mdb_commit
  2026-01-20 20:51 ` [RFC PATCH] fs/hfs: fix ABBA deadlock in hfs_mdb_commit Viacheslav Dubeyko
@ 2026-01-21  2:44   ` Jinchao Wang
  0 siblings, 0 replies; 2+ messages in thread

From: Jinchao Wang @ 2026-01-21 2:44 UTC (permalink / raw)
To: Viacheslav Dubeyko
Cc: glaubitz@physik.fu-berlin.de, frank.li@vivo.com, slava@dubeyko.com,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    syzbot+1e3ff4b07c16ca0f6fe2@syzkaller.appspotmail.com

On Tue, Jan 20, 2026 at 08:51:06PM +0000, Viacheslav Dubeyko wrote:
> On Tue, 2026-01-20 at 09:09 +0800, Jinchao Wang wrote:
> >
> > <skipped>
> >
> > > First of all, I've tried to check the syzbot report that you mention in
> > > the patch. And I was confused because it was a report for FAT. So, I
> > > don't see a way to reproduce the issue on my side.
> > >
> > > Secondly, I need to see the real call trace of the issue. This
> > > discussion doesn't make sense without the reproduction path and the
> > > call trace(s) of the issue.
> > >
> > > Thanks,
> > > Slava.
> >
> > There are many crashes on the syzbot report page; please follow the
> > specified time and version.
> >
> > Syzbot report: https://syzkaller.appspot.com/bug?extid=1e3ff4b07c16ca0f6fe2
> >
> > For this version:
> >
> > | time             | kernel     | Commit       | Syzkaller |
> > | 2025/12/20 17:03 | linux-next | cc3aa43b44bd | d6526ea3  |
> >
> > The full call trace can be found in the crash log of "2025/12/20 17:03",
> > whose URL is:
> >
> > Crash log: https://syzkaller.appspot.com/text?tag=CrashLog&x=12909b1a580000
> This call trace is dedicated to flushing the inode's dirty pages in the page
> cache, as far as I can see:
>
> [ 504.401993][ T31] INFO: task kworker/u8:1:13 blocked for more than 143 seconds.
> [ 504.434587][ T31] Not tainted syzkaller #0
> [ 504.441437][ T31] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 504.451145][ T31] task:kworker/u8:1 state:D stack:22792 pid:13 tgid:13 ppid:2 task_flags:0x4208060 flags:0x00080000
> [ 504.463591][ T31] Workqueue: writeback wb_workfn (flush-7:4)
> [ 504.471997][ T31] Call Trace:
> [ 504.475502][ T31]  <TASK>
> ...
> [ 504.805695][ T31]  </TASK>
>
> And this call trace is dedicated to the superblock commit:
>
> [ 505.186758][ T31] INFO: task kworker/1:4:5971 blocked for more than 144 seconds.
> [ 505.194752][ T8014] Bluetooth: hci37: command tx timeout
> [ 505.210267][ T31] Not tainted syzkaller #0
> [ 505.215260][ T31] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 505.273687][ T31] task:kworker/1:4 state:D stack:24152 pid:5971 tgid:5971 ppid:2 task_flags:0x4208060 flags:0x00080000
> [ 505.287569][ T31] Workqueue: events_long flush_mdb
> [ 505.293762][ T31] Call Trace:
> [ 505.297607][ T31]  <TASK>
> ...
> [ 505.570372][ T31]  </TASK>
>
> I don't see any relation between folios in the inode's page cache and
> HFS_SB(sb)->mdb_bh, because they cannot share the same folio.

What you pasted are not the right tasks. Please see the following analysis,
which I sent before, and focus on task IDs 8009 and 8010.

Analysis
========

In the crash log, the lockdep information requires adjustment based on the
call stacks. After adjustment, a deadlock is identified:

** task syz.1.1902:8009 **
- holds &disk->open_mutex
- holds the folio lock
- waits on lock_buffer(bh)

Partial call trace:
  ->blkdev_writepages()
  ->writeback_iter()
  ->writeback_get_folio()
  ->folio_lock(folio)
  ->block_write_full_folio()
  ->__block_write_full_folio()
  ->lock_buffer(bh)

** task syz.0.1904:8010 **
- holds &type->s_umount_key#66 (down_read)
- holds lock_buffer(HFS_SB(sb)->mdb_bh)
- waits on the folio lock

Partial call trace:
  hfs_mdb_commit()
  ->lock_buffer(HFS_SB(sb)->mdb_bh)
  ->bh = sb_bread(sb, block)
  ...->folio_lock(folio)

Other hung tasks are secondary effects of this deadlock. The issue is
reproducible in my local environment using the syz reproducer.
> I still don't see from your explanation how the issue could happen. I don't
> see how lock_buffer(HFS_SB(sb)->mdb_bh) can be responsible for the issue.
> On the contrary, if we follow your logic, then we would never be able to
> mount any HFS volume at all. But xfstests works for HFS file systems (of
> course, multiple tests fail) and I cannot see the deadlock in the common
> case. So, you need to explain which particular use case can reproduce the
> issue and what the mechanism of the deadlock is.

Please follow what I sent and try to reproduce it. Have you tried the
specified time and version from the syzbot report page?

| time             | kernel     | Commit       | Syzkaller |
| 2025/12/20 17:03 | linux-next | cc3aa43b44bd | d6526ea3  |

--
Thanks,
Jinchao

^ permalink raw reply	[flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-01-21 2:44 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <68b0240f.a00a0220.1337b0.0006.GAE@google.com>
[not found] ` <20260113081952.2431735-1-wangjinchao600@gmail.com>
[not found] ` <a2b8144a25206fba69e59e805d93c05444080132.camel@ibm.com>
[not found] ` <aWcHhTiUrDppotRg@ndev>
[not found] ` <d382b5c97a71d769598fd32bc22cae9f960fea70.camel@ibm.com>
[not found] ` <aWhgNujuXujxSg3E@ndev>
[not found] ` <b718505beca70f2a3c1e0e20c74e43ae558b29d5.camel@ibm.com>
[not found] ` <aWnybRfDcsUAtsol@ndev>
[not found] ` <0349430786e4553845c30490e19b08451c8b999f.camel@ibm.com>
[not found] ` <aW7Vy_RpxseBC4UQ@ndev>
2026-01-20 20:51 ` [RFC PATCH] fs/hfs: fix ABBA deadlock in hfs_mdb_commit Viacheslav Dubeyko
2026-01-21 2:44 ` Jinchao Wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox