* fstests btrfs/057 trigger a deadlock on 5.4.187
@ 2022-06-10 9:22 Wang Yugui
2022-06-11 6:26 ` Wang Yugui
0 siblings, 1 reply; 5+ messages in thread
From: Wang Yugui @ 2022-06-10 9:22 UTC (permalink / raw)
To: linux-btrfs
Hi,
fstests btrfs/057 trigger a deadlock on linux 5.4.187
Frequency: yet not able to reproduce it after 50 loops.
but it happen on a server with ECC memory.
Maybe the Call Trace info help.
[ 2437.005526] sysrq: Show Blocked State
[ 2437.009825] task PC stack pid father
[ 2437.017595] btrfs-cleaner D 0 42617 2 0x80004000
[ 2437.023916] Call Trace:
[ 2437.026841] __schedule+0x2e9/0x730
[ 2437.030921] schedule+0x36/0xc0
[ 2437.034672] wait_current_trans+0xd4/0x110 [btrfs]
[ 2437.040224] ? finish_wait+0x80/0x80
[ 2437.044462] start_transaction+0x42a/0x500 [btrfs]
[ 2437.050051] btrfs_drop_snapshot+0x548/0x840 [btrfs]
[ 2437.055818] ? btrfs_run_defrag_inodes+0x79/0x380 [btrfs]
[ 2437.062054] ? down_write+0x21/0x50
[ 2437.066199] ? btrfs_delete_unused_bgs+0x35/0x610 [btrfs]
[ 2437.072466] ? cleaner_kthread+0xc8/0x150 [btrfs]
[ 2437.077950] btrfs_clean_one_deleted_snapshot+0xc1/0x110 [btrfs]
[ 2437.084879] cleaner_kthread+0x121/0x150 [btrfs]
[ 2437.090277] ? btree_submit_bio_start+0x10/0x10 [btrfs]
[ 2437.096339] kthread+0x12b/0x150
[ 2437.100190] ? __kthread_cancel_work+0x40/0x40
[ 2437.105407] ret_from_fork+0x1f/0x40
[ 2437.109652] btrfs-transacti D 0 42618 2 0x80004000
[ 2437.116048] Call Trace:
[ 2437.119052] __schedule+0x2e9/0x730
[ 2437.123224] schedule+0x36/0xc0
[ 2437.127046] wait_current_trans+0xd4/0x110 [btrfs]
[ 2437.132671] ? finish_wait+0x80/0x80
[ 2437.136972] start_transaction+0x315/0x500 [btrfs]
[ 2437.142655] transaction_kthread+0xa9/0x190 [btrfs]
[ 2437.148424] ? btrfs_cleanup_transaction+0x610/0x610 [btrfs]
[ 2437.155048] kthread+0x12b/0x150
[ 2437.158979] ? __kthread_cancel_work+0x40/0x40
[ 2437.164279] ret_from_fork+0x1f/0x40
[ 2437.168610] umount D 0 42653 42649 0x00004000
[ 2437.175085] Call Trace:
[ 2437.178179] __schedule+0x2e9/0x730
[ 2437.182444] ? wait_for_completion+0x116/0x1c0
[ 2437.187768] schedule+0x36/0xc0
[ 2437.191640] schedule_timeout+0x222/0x350
[ 2437.196520] ? wait_for_completion+0x116/0x1c0
[ 2437.201861] wait_for_completion+0x14e/0x1c0
[ 2437.207017] ? wake_up_q+0x70/0x70
[ 2437.211237] __synchronize_srcu.part.20+0x94/0xc0
[ 2437.216912] ? __bpf_trace_rcu_utilization+0x10/0x10
[ 2437.222847] btrfs_drop_and_free_fs_root+0x8e/0xf0 [btrfs]
[ 2437.229393] switch_commit_roots+0x18d/0x1d0 [btrfs]
[ 2437.235361] btrfs_commit_transaction+0x615/0xa80 [btrfs]
[ 2437.241794] ? finish_wait+0x80/0x80
[ 2437.246207] sync_filesystem+0x71/0x90
[ 2437.250837] generic_shutdown_super+0x22/0x110
[ 2437.256209] kill_anon_super+0x14/0x30
[ 2437.260846] btrfs_kill_super+0x12/0xa0 [btrfs]
[ 2437.266304] deactivate_locked_super+0x34/0x70
[ 2437.271685] cleanup_mnt+0xb8/0x140
[ 2437.276012] task_work_run+0xa3/0xe0
[ 2437.280352] exit_to_usermode_loop+0x103/0x120
[ 2437.285728] do_syscall_64+0x198/0x1a0
[ 2437.290320] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2437.296376] RIP: 0033:0x7f6d5b088ba7
[ 2437.300782] Code: Bad RIP value.
[ 2437.304780] RSP: 002b:00007ffdc69e0b08 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 2437.313649] RAX: 0000000000000000 RBX: 000055d46fc65060 RCX: 00007f6d5b088ba7
[ 2437.322070] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055d46fc6bb10
[ 2437.330478] RBP: 000055d46fc6bb10 R08: 0000000000000000 R09: 000000000000000d
[ 2437.338864] R10: 00007ffdc69e0560 R11: 0000000000000246 R12: 00007f6d5be32ee4
[ 2437.347244] R13: 0000000000000000 R14: 000055d46fc652d0 R15: 00000000ffffffff
Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/06/10
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: fstests btrfs/057 trigger a deadlock on 5.4.187
2022-06-10 9:22 fstests btrfs/057 trigger a deadlock on 5.4.187 Wang Yugui
@ 2022-06-11 6:26 ` Wang Yugui
2022-06-11 6:35 ` Wang Yugui
0 siblings, 1 reply; 5+ messages in thread
From: Wang Yugui @ 2022-06-11 6:26 UTC (permalink / raw)
To: linux-btrfs
Hi,
this dead-lock is triggered by fstests btrfs/057 again on the lastest 5.4(5.4.197).
Frequency: about 1/100.
The Call Trace on 5.4.197
[ 704.619697] sysrq: Show Blocked State
[ 704.624312] task PC stack pid father
[ 704.632499] btrfs-cleaner D 0 33205 2 0x80004000
[ 704.639055] Call Trace:
[ 704.642239] __schedule+0x284/0x6d0
[ 704.646595] schedule+0x2f/0xa0
[ 704.650669] wait_current_trans+0xb3/0xf0 [btrfs]
[ 704.656407] ? finish_wait+0x80/0x80
[ 704.660916] start_transaction+0x41c/0x4f0 [btrfs]
[ 704.666767] btrfs_drop_snapshot+0x548/0x840 [btrfs]
[ 704.672778] ? btrfs_run_defrag_inodes+0x79/0x380 [btrfs]
[ 704.679247] ? down_write+0xe/0x40
[ 704.683524] ? btrfs_delete_unused_bgs+0x35/0x5b0 [btrfs]
[ 704.690065] ? btree_submit_bio_start+0x10/0x10 [btrfs]
[ 704.696366] btrfs_clean_one_deleted_snapshot+0xba/0x110 [btrfs]
[ 704.703513] cleaner_kthread+0xfa/0x120 [btrfs]
[ 704.709054] kthread+0x112/0x130
[ 704.713104] ? __kthread_cancel_work+0x40/0x40
[ 704.718511] ret_from_fork+0x1f/0x40
[ 704.722978] btrfs-transacti D 0 33206 2 0x80004000
[ 704.729585] Call Trace:
[ 704.732809] __schedule+0x284/0x6d0
[ 704.737156] schedule+0x2f/0xa0
[ 704.741123] wait_current_trans+0xb3/0xf0 [btrfs]
[ 704.746823] ? finish_wait+0x80/0x80
[ 704.751270] start_transaction+0x315/0x4f0 [btrfs]
[ 704.757099] transaction_kthread+0xa2/0x180 [btrfs]
[ 704.763001] ? btrfs_cleanup_transaction+0x590/0x590 [btrfs]
[ 704.769776] kthread+0x112/0x130
[ 704.773811] ? __kthread_cancel_work+0x40/0x40
[ 704.779218] ret_from_fork+0x1f/0x40
[ 704.783650] umount D 0 33243 33239 0x00004000
[ 704.790253] Call Trace:
[ 704.793423] __schedule+0x284/0x6d0
[ 704.797788] schedule+0x2f/0xa0
[ 704.801733] schedule_timeout+0x20d/0x340
[ 704.806634] wait_for_completion+0x11f/0x190
[ 704.811866] ? wake_up_q+0x70/0x70
[ 704.816099] __synchronize_srcu.part.20+0x81/0xb0
[ 704.821784] ? __bpf_trace_rcu_utilization+0x10/0x10
[ 704.827787] btrfs_drop_and_free_fs_root+0x87/0xe0 [btrfs]
[ 704.834350] switch_commit_roots+0x186/0x1c0 [btrfs]
[ 704.840373] btrfs_commit_transaction+0x5da/0xa20 [btrfs]
[ 704.846842] ? finish_wait+0x80/0x80
[ 704.851265] sync_filesystem+0x71/0x90
[ 704.855907] generic_shutdown_super+0x22/0x100
[ 704.861342] kill_anon_super+0x14/0x30
[ 704.866001] btrfs_kill_super+0x12/0xa0 [btrfs]
[ 704.871487] deactivate_locked_super+0x34/0x70
[ 704.876890] cleanup_mnt+0xb8/0x140
[ 704.881190] task_work_run+0x8a/0xb0
[ 704.885586] exit_to_usermode_loop+0x103/0x120
[ 704.890981] do_syscall_64+0x198/0x1a0
[ 704.895604] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 704.901698] RIP: 0033:0x7fc630348ba7
[ 704.906098] Code: Bad RIP value.
[ 704.910097] RSP: 002b:00007ffd612e8a18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 704.918967] RAX: 0000000000000000 RBX: 0000555b85889060 RCX: 00007fc630348ba7
[ 704.927386] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000555b8588fb10
[ 704.935800] RBP: 0000555b8588fb10 R08: 0000000000000000 R09: 000000000000000d
[ 704.944185] R10: 00007ffd612e84a0 R11: 0000000000000246 R12: 00007fc6310f2ee4
[ 704.952559] R13: 0000000000000000 R14: 0000555b858892d0 R15: 00000000ffffffff
Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/06/11
> Hi,
>
> fstests btrfs/057 trigger a deadlock on linux 5.4.187
>
> Frequency: yet not able to reproduce it after 50 loops.
> but it happen on a server with ECC memory.
>
> Maybe the Call Trace info help.
>
> [ 2437.005526] sysrq: Show Blocked State
> [ 2437.009825] task PC stack pid father
> [ 2437.017595] btrfs-cleaner D 0 42617 2 0x80004000
> [ 2437.023916] Call Trace:
> [ 2437.026841] __schedule+0x2e9/0x730
> [ 2437.030921] schedule+0x36/0xc0
> [ 2437.034672] wait_current_trans+0xd4/0x110 [btrfs]
> [ 2437.040224] ? finish_wait+0x80/0x80
> [ 2437.044462] start_transaction+0x42a/0x500 [btrfs]
> [ 2437.050051] btrfs_drop_snapshot+0x548/0x840 [btrfs]
> [ 2437.055818] ? btrfs_run_defrag_inodes+0x79/0x380 [btrfs]
> [ 2437.062054] ? down_write+0x21/0x50
> [ 2437.066199] ? btrfs_delete_unused_bgs+0x35/0x610 [btrfs]
> [ 2437.072466] ? cleaner_kthread+0xc8/0x150 [btrfs]
> [ 2437.077950] btrfs_clean_one_deleted_snapshot+0xc1/0x110 [btrfs]
> [ 2437.084879] cleaner_kthread+0x121/0x150 [btrfs]
> [ 2437.090277] ? btree_submit_bio_start+0x10/0x10 [btrfs]
> [ 2437.096339] kthread+0x12b/0x150
> [ 2437.100190] ? __kthread_cancel_work+0x40/0x40
> [ 2437.105407] ret_from_fork+0x1f/0x40
>
> [ 2437.109652] btrfs-transacti D 0 42618 2 0x80004000
> [ 2437.116048] Call Trace:
> [ 2437.119052] __schedule+0x2e9/0x730
> [ 2437.123224] schedule+0x36/0xc0
> [ 2437.127046] wait_current_trans+0xd4/0x110 [btrfs]
> [ 2437.132671] ? finish_wait+0x80/0x80
> [ 2437.136972] start_transaction+0x315/0x500 [btrfs]
> [ 2437.142655] transaction_kthread+0xa9/0x190 [btrfs]
> [ 2437.148424] ? btrfs_cleanup_transaction+0x610/0x610 [btrfs]
> [ 2437.155048] kthread+0x12b/0x150
> [ 2437.158979] ? __kthread_cancel_work+0x40/0x40
> [ 2437.164279] ret_from_fork+0x1f/0x40
> [ 2437.168610] umount D 0 42653 42649 0x00004000
> [ 2437.175085] Call Trace:
> [ 2437.178179] __schedule+0x2e9/0x730
> [ 2437.182444] ? wait_for_completion+0x116/0x1c0
> [ 2437.187768] schedule+0x36/0xc0
> [ 2437.191640] schedule_timeout+0x222/0x350
> [ 2437.196520] ? wait_for_completion+0x116/0x1c0
> [ 2437.201861] wait_for_completion+0x14e/0x1c0
> [ 2437.207017] ? wake_up_q+0x70/0x70
> [ 2437.211237] __synchronize_srcu.part.20+0x94/0xc0
> [ 2437.216912] ? __bpf_trace_rcu_utilization+0x10/0x10
> [ 2437.222847] btrfs_drop_and_free_fs_root+0x8e/0xf0 [btrfs]
> [ 2437.229393] switch_commit_roots+0x18d/0x1d0 [btrfs]
> [ 2437.235361] btrfs_commit_transaction+0x615/0xa80 [btrfs]
> [ 2437.241794] ? finish_wait+0x80/0x80
> [ 2437.246207] sync_filesystem+0x71/0x90
> [ 2437.250837] generic_shutdown_super+0x22/0x110
> [ 2437.256209] kill_anon_super+0x14/0x30
> [ 2437.260846] btrfs_kill_super+0x12/0xa0 [btrfs]
> [ 2437.266304] deactivate_locked_super+0x34/0x70
> [ 2437.271685] cleanup_mnt+0xb8/0x140
> [ 2437.276012] task_work_run+0xa3/0xe0
> [ 2437.280352] exit_to_usermode_loop+0x103/0x120
> [ 2437.285728] do_syscall_64+0x198/0x1a0
> [ 2437.290320] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 2437.296376] RIP: 0033:0x7f6d5b088ba7
> [ 2437.300782] Code: Bad RIP value.
> [ 2437.304780] RSP: 002b:00007ffdc69e0b08 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> [ 2437.313649] RAX: 0000000000000000 RBX: 000055d46fc65060 RCX: 00007f6d5b088ba7
> [ 2437.322070] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055d46fc6bb10
> [ 2437.330478] RBP: 000055d46fc6bb10 R08: 0000000000000000 R09: 000000000000000d
> [ 2437.338864] R10: 00007ffdc69e0560 R11: 0000000000000246 R12: 00007f6d5be32ee4
> [ 2437.347244] R13: 0000000000000000 R14: 000055d46fc652d0 R15: 00000000ffffffff
>
>
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2022/06/10
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: fstests btrfs/057 trigger a deadlock on 5.4.187
2022-06-11 6:26 ` Wang Yugui
@ 2022-06-11 6:35 ` Wang Yugui
2022-07-18 0:03 ` Wang Yugui
0 siblings, 1 reply; 5+ messages in thread
From: Wang Yugui @ 2022-06-11 6:35 UTC (permalink / raw)
To: linux-btrfs
Hi,
> this dead-lock is triggered by fstests btrfs/057 again on the lastest 5.4(5.4.197).
>
> Frequency: about 1/100.
>
> The Call Trace on 5.4.197
>
> [ 704.619697] sysrq: Show Blocked State
> [ 704.624312] task PC stack pid father
> [ 704.632499] btrfs-cleaner D 0 33205 2 0x80004000
> [ 704.639055] Call Trace:
> [ 704.642239] __schedule+0x284/0x6d0
> [ 704.646595] schedule+0x2f/0xa0
> [ 704.650669] wait_current_trans+0xb3/0xf0 [btrfs]
> [ 704.656407] ? finish_wait+0x80/0x80
> [ 704.660916] start_transaction+0x41c/0x4f0 [btrfs]
> [ 704.666767] btrfs_drop_snapshot+0x548/0x840 [btrfs]
> [ 704.672778] ? btrfs_run_defrag_inodes+0x79/0x380 [btrfs]
> [ 704.679247] ? down_write+0xe/0x40
> [ 704.683524] ? btrfs_delete_unused_bgs+0x35/0x5b0 [btrfs]
> [ 704.690065] ? btree_submit_bio_start+0x10/0x10 [btrfs]
> [ 704.696366] btrfs_clean_one_deleted_snapshot+0xba/0x110 [btrfs]
> [ 704.703513] cleaner_kthread+0xfa/0x120 [btrfs]
> [ 704.709054] kthread+0x112/0x130
> [ 704.713104] ? __kthread_cancel_work+0x40/0x40
> [ 704.718511] ret_from_fork+0x1f/0x40
>
> [ 704.722978] btrfs-transacti D 0 33206 2 0x80004000
> [ 704.729585] Call Trace:
> [ 704.732809] __schedule+0x284/0x6d0
> [ 704.737156] schedule+0x2f/0xa0
> [ 704.741123] wait_current_trans+0xb3/0xf0 [btrfs]
> [ 704.746823] ? finish_wait+0x80/0x80
> [ 704.751270] start_transaction+0x315/0x4f0 [btrfs]
> [ 704.757099] transaction_kthread+0xa2/0x180 [btrfs]
> [ 704.763001] ? btrfs_cleanup_transaction+0x590/0x590 [btrfs]
> [ 704.769776] kthread+0x112/0x130
> [ 704.773811] ? __kthread_cancel_work+0x40/0x40
> [ 704.779218] ret_from_fork+0x1f/0x40
>
> [ 704.783650] umount D 0 33243 33239 0x00004000
> [ 704.790253] Call Trace:
> [ 704.793423] __schedule+0x284/0x6d0
> [ 704.797788] schedule+0x2f/0xa0
> [ 704.801733] schedule_timeout+0x20d/0x340
> [ 704.806634] wait_for_completion+0x11f/0x190
> [ 704.811866] ? wake_up_q+0x70/0x70
> [ 704.816099] __synchronize_srcu.part.20+0x81/0xb0
> [ 704.821784] ? __bpf_trace_rcu_utilization+0x10/0x10
> [ 704.827787] btrfs_drop_and_free_fs_root+0x87/0xe0 [btrfs]
> [ 704.834350] switch_commit_roots+0x186/0x1c0 [btrfs]
> [ 704.840373] btrfs_commit_transaction+0x5da/0xa20 [btrfs]
> [ 704.846842] ? finish_wait+0x80/0x80
> [ 704.851265] sync_filesystem+0x71/0x90
> [ 704.855907] generic_shutdown_super+0x22/0x100
> [ 704.861342] kill_anon_super+0x14/0x30
> [ 704.866001] btrfs_kill_super+0x12/0xa0 [btrfs]
> [ 704.871487] deactivate_locked_super+0x34/0x70
> [ 704.876890] cleanup_mnt+0xb8/0x140
> [ 704.881190] task_work_run+0x8a/0xb0
> [ 704.885586] exit_to_usermode_loop+0x103/0x120
> [ 704.890981] do_syscall_64+0x198/0x1a0
> [ 704.895604] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 704.901698] RIP: 0033:0x7fc630348ba7
> [ 704.906098] Code: Bad RIP value.
> [ 704.910097] RSP: 002b:00007ffd612e8a18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> [ 704.918967] RAX: 0000000000000000 RBX: 0000555b85889060 RCX: 00007fc630348ba7
> [ 704.927386] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000555b8588fb10
> [ 704.935800] RBP: 0000555b8588fb10 R08: 0000000000000000 R09: 000000000000000d
> [ 704.944185] R10: 00007ffd612e84a0 R11: 0000000000000246 R12: 00007fc6310f2ee4
> [ 704.952559] R13: 0000000000000000 R14: 0000555b858892d0 R15: 00000000ffffffff
In addition to the 3 state D theads,
another thread(PID: 1560) is 100% busy
this is the call trace reported by 'sysrq l'
[ 1269.337878] CPU: 31 PID: 1560 Comm: kworker/31:2 Not tainted 5.4.197-2.el7.x86_64 #1
[ 1269.337878] Hardware name: Dell Inc. PowerEdge T630/0W9WXC, BIOS 2.14.0 01/25/2022
[ 1269.337879] Workqueue: rcu_gp process_srcu
[ 1269.337879] RIP: 0010:delay_tsc+0x27/0x60
[ 1269.337880] Code: 00 00 00 0f 1f 44 00 00 65 44 8b 0d a3 c3 64 54 0f 01 f9 66 90 48 c1 e2 20 48 09 c2 49 89 d0 eb 11 f3 90 65 8b 35 89 c3 64 54 <41> 39 f1 75 1b 41 89 f1 0f 01 f9 66 90 48 c1 e2 20 48 09 d0 48 89
[ 1269.337880] RSP: 0018:ffffc3e30e1bbde0 EFLAGS: 00000283
[ 1269.337880] RAX: 000007443487c36f RBX: 00000000000042bc RCX: 000000000000101f
[ 1269.337881] RDX: 00000000000032a8 RSI: 000000000000001f RDI: 00000000000032c5
[ 1269.337881] RBP: 0000000000000000 R08: 00000744348790c7 R09: 000000000000001f
[ 1269.337881] R10: 8080808080808080 R11: ffffffffaca52ff8 R12: 0000000000000000
[ 1269.337882] R13: 00000000000042bf R14: 00000000ffffffff R15: ffffa01a770b0498
[ 1269.337882] FS: 0000000000000000(0000) GS:ffffa05aff7c0000(0000) knlGS:0000000000000000
[ 1269.337882] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1269.337882] CR2: 0000555b8589e838 CR3: 0000007f7006e002 CR4: 00000000003606e0
[ 1269.337883] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1269.337883] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1269.337883] Call Trace:
[ 1269.337883] try_check_zero+0xc2/0xf0
[ 1269.337883] process_srcu+0x15e/0x4a0
[ 1269.337884] ? __queue_work+0x13d/0x3e0
[ 1269.337884] process_one_work+0x1a7/0x360
[ 1269.337884] worker_thread+0x30/0x390
[ 1269.337884] ? create_worker+0x1a0/0x1a0
[ 1269.337884] kthread+0x112/0x130
[ 1269.337885] ? __kthread_cancel_work+0x40/0x40
[ 1269.337885] ret_from_fork+0x1f/0x40
Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/06/11
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: fstests btrfs/057 trigger a deadlock on 5.4.187
2022-06-11 6:35 ` Wang Yugui
@ 2022-07-18 0:03 ` Wang Yugui
2024-02-12 12:17 ` David Sterba
0 siblings, 1 reply; 5+ messages in thread
From: Wang Yugui @ 2022-07-18 0:03 UTC (permalink / raw)
To: linux-btrfs
Hi,
Because process_srcu() is 100% busy, so this problem is srcu related.
the only srcu in btrfs is subvol_srcu, but it is dropped in 5.17
c75e839414d3: Josef Bacik: btrfs: kill the subvol_srcu
and this patch is difficult to backport to 5.4.x.
so this problem maybe a little difficult to fix. If we hit this problem
in more frequency, we may upgrade to 5.10.x
Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/07/18
> Hi,
>
>
> > this dead-lock is triggered by fstests btrfs/057 again on the lastest 5.4(5.4.197).
> >
> > Frequency: about 1/100.
> >
> > The Call Trace on 5.4.197
> >
> > [ 704.619697] sysrq: Show Blocked State
> > [ 704.624312] task PC stack pid father
> > [ 704.632499] btrfs-cleaner D 0 33205 2 0x80004000
> > [ 704.639055] Call Trace:
> > [ 704.642239] __schedule+0x284/0x6d0
> > [ 704.646595] schedule+0x2f/0xa0
> > [ 704.650669] wait_current_trans+0xb3/0xf0 [btrfs]
> > [ 704.656407] ? finish_wait+0x80/0x80
> > [ 704.660916] start_transaction+0x41c/0x4f0 [btrfs]
> > [ 704.666767] btrfs_drop_snapshot+0x548/0x840 [btrfs]
> > [ 704.672778] ? btrfs_run_defrag_inodes+0x79/0x380 [btrfs]
> > [ 704.679247] ? down_write+0xe/0x40
> > [ 704.683524] ? btrfs_delete_unused_bgs+0x35/0x5b0 [btrfs]
> > [ 704.690065] ? btree_submit_bio_start+0x10/0x10 [btrfs]
> > [ 704.696366] btrfs_clean_one_deleted_snapshot+0xba/0x110 [btrfs]
> > [ 704.703513] cleaner_kthread+0xfa/0x120 [btrfs]
> > [ 704.709054] kthread+0x112/0x130
> > [ 704.713104] ? __kthread_cancel_work+0x40/0x40
> > [ 704.718511] ret_from_fork+0x1f/0x40
> >
> > [ 704.722978] btrfs-transacti D 0 33206 2 0x80004000
> > [ 704.729585] Call Trace:
> > [ 704.732809] __schedule+0x284/0x6d0
> > [ 704.737156] schedule+0x2f/0xa0
> > [ 704.741123] wait_current_trans+0xb3/0xf0 [btrfs]
> > [ 704.746823] ? finish_wait+0x80/0x80
> > [ 704.751270] start_transaction+0x315/0x4f0 [btrfs]
> > [ 704.757099] transaction_kthread+0xa2/0x180 [btrfs]
> > [ 704.763001] ? btrfs_cleanup_transaction+0x590/0x590 [btrfs]
> > [ 704.769776] kthread+0x112/0x130
> > [ 704.773811] ? __kthread_cancel_work+0x40/0x40
> > [ 704.779218] ret_from_fork+0x1f/0x40
> >
> > [ 704.783650] umount D 0 33243 33239 0x00004000
> > [ 704.790253] Call Trace:
> > [ 704.793423] __schedule+0x284/0x6d0
> > [ 704.797788] schedule+0x2f/0xa0
> > [ 704.801733] schedule_timeout+0x20d/0x340
> > [ 704.806634] wait_for_completion+0x11f/0x190
> > [ 704.811866] ? wake_up_q+0x70/0x70
> > [ 704.816099] __synchronize_srcu.part.20+0x81/0xb0
> > [ 704.821784] ? __bpf_trace_rcu_utilization+0x10/0x10
> > [ 704.827787] btrfs_drop_and_free_fs_root+0x87/0xe0 [btrfs]
> > [ 704.834350] switch_commit_roots+0x186/0x1c0 [btrfs]
> > [ 704.840373] btrfs_commit_transaction+0x5da/0xa20 [btrfs]
> > [ 704.846842] ? finish_wait+0x80/0x80
> > [ 704.851265] sync_filesystem+0x71/0x90
> > [ 704.855907] generic_shutdown_super+0x22/0x100
> > [ 704.861342] kill_anon_super+0x14/0x30
> > [ 704.866001] btrfs_kill_super+0x12/0xa0 [btrfs]
> > [ 704.871487] deactivate_locked_super+0x34/0x70
> > [ 704.876890] cleanup_mnt+0xb8/0x140
> > [ 704.881190] task_work_run+0x8a/0xb0
> > [ 704.885586] exit_to_usermode_loop+0x103/0x120
> > [ 704.890981] do_syscall_64+0x198/0x1a0
> > [ 704.895604] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [ 704.901698] RIP: 0033:0x7fc630348ba7
> > [ 704.906098] Code: Bad RIP value.
> > [ 704.910097] RSP: 002b:00007ffd612e8a18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> > [ 704.918967] RAX: 0000000000000000 RBX: 0000555b85889060 RCX: 00007fc630348ba7
> > [ 704.927386] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000555b8588fb10
> > [ 704.935800] RBP: 0000555b8588fb10 R08: 0000000000000000 R09: 000000000000000d
> > [ 704.944185] R10: 00007ffd612e84a0 R11: 0000000000000246 R12: 00007fc6310f2ee4
> > [ 704.952559] R13: 0000000000000000 R14: 0000555b858892d0 R15: 00000000ffffffff
>
> In addition to the 3 state D theads,
> another thread(PID: 1560) is 100% busy
>
> this is the call trace reported by 'sysrq l'
>
> [ 1269.337878] CPU: 31 PID: 1560 Comm: kworker/31:2 Not tainted 5.4.197-2.el7.x86_64 #1
> [ 1269.337878] Hardware name: Dell Inc. PowerEdge T630/0W9WXC, BIOS 2.14.0 01/25/2022
> [ 1269.337879] Workqueue: rcu_gp process_srcu
> [ 1269.337879] RIP: 0010:delay_tsc+0x27/0x60
> [ 1269.337880] Code: 00 00 00 0f 1f 44 00 00 65 44 8b 0d a3 c3 64 54 0f 01 f9 66 90 48 c1 e2 20 48 09 c2 49 89 d0 eb 11 f3 90 65 8b 35 89 c3 64 54 <41> 39 f1 75 1b 41 89 f1 0f 01 f9 66 90 48 c1 e2 20 48 09 d0 48 89
> [ 1269.337880] RSP: 0018:ffffc3e30e1bbde0 EFLAGS: 00000283
> [ 1269.337880] RAX: 000007443487c36f RBX: 00000000000042bc RCX: 000000000000101f
> [ 1269.337881] RDX: 00000000000032a8 RSI: 000000000000001f RDI: 00000000000032c5
> [ 1269.337881] RBP: 0000000000000000 R08: 00000744348790c7 R09: 000000000000001f
> [ 1269.337881] R10: 8080808080808080 R11: ffffffffaca52ff8 R12: 0000000000000000
> [ 1269.337882] R13: 00000000000042bf R14: 00000000ffffffff R15: ffffa01a770b0498
> [ 1269.337882] FS: 0000000000000000(0000) GS:ffffa05aff7c0000(0000) knlGS:0000000000000000
> [ 1269.337882] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1269.337882] CR2: 0000555b8589e838 CR3: 0000007f7006e002 CR4: 00000000003606e0
> [ 1269.337883] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1269.337883] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1269.337883] Call Trace:
> [ 1269.337883] try_check_zero+0xc2/0xf0
> [ 1269.337883] process_srcu+0x15e/0x4a0
> [ 1269.337884] ? __queue_work+0x13d/0x3e0
> [ 1269.337884] process_one_work+0x1a7/0x360
> [ 1269.337884] worker_thread+0x30/0x390
> [ 1269.337884] ? create_worker+0x1a0/0x1a0
> [ 1269.337884] kthread+0x112/0x130
> [ 1269.337885] ? __kthread_cancel_work+0x40/0x40
> [ 1269.337885] ret_from_fork+0x1f/0x40
>
>
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2022/06/11
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: fstests btrfs/057 trigger a deadlock on 5.4.187
2022-07-18 0:03 ` Wang Yugui
@ 2024-02-12 12:17 ` David Sterba
0 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2024-02-12 12:17 UTC (permalink / raw)
To: Wang Yugui; +Cc: linux-btrfs
On Mon, Jul 18, 2022 at 08:03:34AM +0800, Wang Yugui wrote:
> Hi,
>
> Because process_srcu() is 100% busy, so this problem is srcu related.
>
> the only srcu in btrfs is subvol_srcu, but it is dropped in 5.17
> c75e839414d3: Josef Bacik: btrfs: kill the subvol_srcu
> and this patch is difficult to backport to 5.4.x.
The whole SRCU use was replaced by root reference counting and spanning
many patches, backport would be really hard.
> so this problem maybe a little difficult to fix. If we hit this problem
> in more frequency, we may upgrade to 5.10.x
This could still be affected though.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-02-12 12:17 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-06-10 9:22 fstests btrfs/057 trigger a deadlock on 5.4.187 Wang Yugui
2022-06-11 6:26 ` Wang Yugui
2022-06-11 6:35 ` Wang Yugui
2022-07-18 0:03 ` Wang Yugui
2024-02-12 12:17 ` David Sterba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox