Linux Btrfs filesystem development
 help / color / mirror / Atom feed
* fstests btrfs/057 trigger a deadlock on 5.4.187
@ 2022-06-10  9:22 Wang Yugui
  2022-06-11  6:26 ` Wang Yugui
  0 siblings, 1 reply; 5+ messages in thread
From: Wang Yugui @ 2022-06-10  9:22 UTC (permalink / raw)
  To: linux-btrfs

Hi,

fstests btrfs/057 trigger a deadlock on linux 5.4.187

Frequency: yet not able to reproduce it after 50 loops.
but it happen on a server with ECC memory.

Maybe the Call Trace info help.

[ 2437.005526] sysrq: Show Blocked State
[ 2437.009825]   task                        PC stack   pid father
[ 2437.017595] btrfs-cleaner   D    0 42617      2 0x80004000
[ 2437.023916] Call Trace:
[ 2437.026841]  __schedule+0x2e9/0x730
[ 2437.030921]  schedule+0x36/0xc0
[ 2437.034672]  wait_current_trans+0xd4/0x110 [btrfs]
[ 2437.040224]  ? finish_wait+0x80/0x80
[ 2437.044462]  start_transaction+0x42a/0x500 [btrfs]
[ 2437.050051]  btrfs_drop_snapshot+0x548/0x840 [btrfs]
[ 2437.055818]  ? btrfs_run_defrag_inodes+0x79/0x380 [btrfs]
[ 2437.062054]  ? down_write+0x21/0x50
[ 2437.066199]  ? btrfs_delete_unused_bgs+0x35/0x610 [btrfs]
[ 2437.072466]  ? cleaner_kthread+0xc8/0x150 [btrfs]
[ 2437.077950]  btrfs_clean_one_deleted_snapshot+0xc1/0x110 [btrfs]
[ 2437.084879]  cleaner_kthread+0x121/0x150 [btrfs]
[ 2437.090277]  ? btree_submit_bio_start+0x10/0x10 [btrfs]
[ 2437.096339]  kthread+0x12b/0x150
[ 2437.100190]  ? __kthread_cancel_work+0x40/0x40
[ 2437.105407]  ret_from_fork+0x1f/0x40

[ 2437.109652] btrfs-transacti D    0 42618      2 0x80004000
[ 2437.116048] Call Trace:
[ 2437.119052]  __schedule+0x2e9/0x730
[ 2437.123224]  schedule+0x36/0xc0
[ 2437.127046]  wait_current_trans+0xd4/0x110 [btrfs]
[ 2437.132671]  ? finish_wait+0x80/0x80
[ 2437.136972]  start_transaction+0x315/0x500 [btrfs]
[ 2437.142655]  transaction_kthread+0xa9/0x190 [btrfs]
[ 2437.148424]  ? btrfs_cleanup_transaction+0x610/0x610 [btrfs]
[ 2437.155048]  kthread+0x12b/0x150
[ 2437.158979]  ? __kthread_cancel_work+0x40/0x40
[ 2437.164279]  ret_from_fork+0x1f/0x40
[ 2437.168610] umount          D    0 42653  42649 0x00004000
[ 2437.175085] Call Trace:
[ 2437.178179]  __schedule+0x2e9/0x730
[ 2437.182444]  ? wait_for_completion+0x116/0x1c0
[ 2437.187768]  schedule+0x36/0xc0
[ 2437.191640]  schedule_timeout+0x222/0x350
[ 2437.196520]  ? wait_for_completion+0x116/0x1c0
[ 2437.201861]  wait_for_completion+0x14e/0x1c0
[ 2437.207017]  ? wake_up_q+0x70/0x70
[ 2437.211237]  __synchronize_srcu.part.20+0x94/0xc0
[ 2437.216912]  ? __bpf_trace_rcu_utilization+0x10/0x10
[ 2437.222847]  btrfs_drop_and_free_fs_root+0x8e/0xf0 [btrfs]
[ 2437.229393]  switch_commit_roots+0x18d/0x1d0 [btrfs]
[ 2437.235361]  btrfs_commit_transaction+0x615/0xa80 [btrfs]
[ 2437.241794]  ? finish_wait+0x80/0x80
[ 2437.246207]  sync_filesystem+0x71/0x90
[ 2437.250837]  generic_shutdown_super+0x22/0x110
[ 2437.256209]  kill_anon_super+0x14/0x30
[ 2437.260846]  btrfs_kill_super+0x12/0xa0 [btrfs]
[ 2437.266304]  deactivate_locked_super+0x34/0x70
[ 2437.271685]  cleanup_mnt+0xb8/0x140
[ 2437.276012]  task_work_run+0xa3/0xe0
[ 2437.280352]  exit_to_usermode_loop+0x103/0x120
[ 2437.285728]  do_syscall_64+0x198/0x1a0
[ 2437.290320]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2437.296376] RIP: 0033:0x7f6d5b088ba7
[ 2437.300782] Code: Bad RIP value.
[ 2437.304780] RSP: 002b:00007ffdc69e0b08 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 2437.313649] RAX: 0000000000000000 RBX: 000055d46fc65060 RCX: 00007f6d5b088ba7
[ 2437.322070] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055d46fc6bb10
[ 2437.330478] RBP: 000055d46fc6bb10 R08: 0000000000000000 R09: 000000000000000d
[ 2437.338864] R10: 00007ffdc69e0560 R11: 0000000000000246 R12: 00007f6d5be32ee4
[ 2437.347244] R13: 0000000000000000 R14: 000055d46fc652d0 R15: 00000000ffffffff


Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/06/10



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: fstests btrfs/057 trigger a deadlock on 5.4.187
  2022-06-10  9:22 fstests btrfs/057 trigger a deadlock on 5.4.187 Wang Yugui
@ 2022-06-11  6:26 ` Wang Yugui
  2022-06-11  6:35   ` Wang Yugui
  0 siblings, 1 reply; 5+ messages in thread
From: Wang Yugui @ 2022-06-11  6:26 UTC (permalink / raw)
  To: linux-btrfs

Hi,

this dead-lock is triggered by fstests btrfs/057 again on the lastest 5.4(5.4.197).

Frequency: about 1/100.

The Call Trace on 5.4.197

[  704.619697] sysrq: Show Blocked State
[  704.624312]   task                        PC stack   pid father
[  704.632499] btrfs-cleaner   D    0 33205      2 0x80004000
[  704.639055] Call Trace:
[  704.642239]  __schedule+0x284/0x6d0
[  704.646595]  schedule+0x2f/0xa0
[  704.650669]  wait_current_trans+0xb3/0xf0 [btrfs]
[  704.656407]  ? finish_wait+0x80/0x80
[  704.660916]  start_transaction+0x41c/0x4f0 [btrfs]
[  704.666767]  btrfs_drop_snapshot+0x548/0x840 [btrfs]
[  704.672778]  ? btrfs_run_defrag_inodes+0x79/0x380 [btrfs]
[  704.679247]  ? down_write+0xe/0x40
[  704.683524]  ? btrfs_delete_unused_bgs+0x35/0x5b0 [btrfs]
[  704.690065]  ? btree_submit_bio_start+0x10/0x10 [btrfs]
[  704.696366]  btrfs_clean_one_deleted_snapshot+0xba/0x110 [btrfs]
[  704.703513]  cleaner_kthread+0xfa/0x120 [btrfs]
[  704.709054]  kthread+0x112/0x130
[  704.713104]  ? __kthread_cancel_work+0x40/0x40
[  704.718511]  ret_from_fork+0x1f/0x40

[  704.722978] btrfs-transacti D    0 33206      2 0x80004000
[  704.729585] Call Trace:
[  704.732809]  __schedule+0x284/0x6d0
[  704.737156]  schedule+0x2f/0xa0
[  704.741123]  wait_current_trans+0xb3/0xf0 [btrfs]
[  704.746823]  ? finish_wait+0x80/0x80
[  704.751270]  start_transaction+0x315/0x4f0 [btrfs]
[  704.757099]  transaction_kthread+0xa2/0x180 [btrfs]
[  704.763001]  ? btrfs_cleanup_transaction+0x590/0x590 [btrfs]
[  704.769776]  kthread+0x112/0x130
[  704.773811]  ? __kthread_cancel_work+0x40/0x40
[  704.779218]  ret_from_fork+0x1f/0x40

[  704.783650] umount          D    0 33243  33239 0x00004000
[  704.790253] Call Trace:
[  704.793423]  __schedule+0x284/0x6d0
[  704.797788]  schedule+0x2f/0xa0
[  704.801733]  schedule_timeout+0x20d/0x340
[  704.806634]  wait_for_completion+0x11f/0x190
[  704.811866]  ? wake_up_q+0x70/0x70
[  704.816099]  __synchronize_srcu.part.20+0x81/0xb0
[  704.821784]  ? __bpf_trace_rcu_utilization+0x10/0x10
[  704.827787]  btrfs_drop_and_free_fs_root+0x87/0xe0 [btrfs]
[  704.834350]  switch_commit_roots+0x186/0x1c0 [btrfs]
[  704.840373]  btrfs_commit_transaction+0x5da/0xa20 [btrfs]
[  704.846842]  ? finish_wait+0x80/0x80
[  704.851265]  sync_filesystem+0x71/0x90
[  704.855907]  generic_shutdown_super+0x22/0x100
[  704.861342]  kill_anon_super+0x14/0x30
[  704.866001]  btrfs_kill_super+0x12/0xa0 [btrfs]
[  704.871487]  deactivate_locked_super+0x34/0x70
[  704.876890]  cleanup_mnt+0xb8/0x140
[  704.881190]  task_work_run+0x8a/0xb0
[  704.885586]  exit_to_usermode_loop+0x103/0x120
[  704.890981]  do_syscall_64+0x198/0x1a0
[  704.895604]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  704.901698] RIP: 0033:0x7fc630348ba7
[  704.906098] Code: Bad RIP value.
[  704.910097] RSP: 002b:00007ffd612e8a18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[  704.918967] RAX: 0000000000000000 RBX: 0000555b85889060 RCX: 00007fc630348ba7
[  704.927386] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000555b8588fb10
[  704.935800] RBP: 0000555b8588fb10 R08: 0000000000000000 R09: 000000000000000d
[  704.944185] R10: 00007ffd612e84a0 R11: 0000000000000246 R12: 00007fc6310f2ee4
[  704.952559] R13: 0000000000000000 R14: 0000555b858892d0 R15: 00000000ffffffff


Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/06/11

> Hi,
> 
> fstests btrfs/057 trigger a deadlock on linux 5.4.187
> 
> Frequency: yet not able to reproduce it after 50 loops.
> but it happen on a server with ECC memory.
> 
> Maybe the Call Trace info help.
> 
> [ 2437.005526] sysrq: Show Blocked State
> [ 2437.009825]   task                        PC stack   pid father
> [ 2437.017595] btrfs-cleaner   D    0 42617      2 0x80004000
> [ 2437.023916] Call Trace:
> [ 2437.026841]  __schedule+0x2e9/0x730
> [ 2437.030921]  schedule+0x36/0xc0
> [ 2437.034672]  wait_current_trans+0xd4/0x110 [btrfs]
> [ 2437.040224]  ? finish_wait+0x80/0x80
> [ 2437.044462]  start_transaction+0x42a/0x500 [btrfs]
> [ 2437.050051]  btrfs_drop_snapshot+0x548/0x840 [btrfs]
> [ 2437.055818]  ? btrfs_run_defrag_inodes+0x79/0x380 [btrfs]
> [ 2437.062054]  ? down_write+0x21/0x50
> [ 2437.066199]  ? btrfs_delete_unused_bgs+0x35/0x610 [btrfs]
> [ 2437.072466]  ? cleaner_kthread+0xc8/0x150 [btrfs]
> [ 2437.077950]  btrfs_clean_one_deleted_snapshot+0xc1/0x110 [btrfs]
> [ 2437.084879]  cleaner_kthread+0x121/0x150 [btrfs]
> [ 2437.090277]  ? btree_submit_bio_start+0x10/0x10 [btrfs]
> [ 2437.096339]  kthread+0x12b/0x150
> [ 2437.100190]  ? __kthread_cancel_work+0x40/0x40
> [ 2437.105407]  ret_from_fork+0x1f/0x40
> 
> [ 2437.109652] btrfs-transacti D    0 42618      2 0x80004000
> [ 2437.116048] Call Trace:
> [ 2437.119052]  __schedule+0x2e9/0x730
> [ 2437.123224]  schedule+0x36/0xc0
> [ 2437.127046]  wait_current_trans+0xd4/0x110 [btrfs]
> [ 2437.132671]  ? finish_wait+0x80/0x80
> [ 2437.136972]  start_transaction+0x315/0x500 [btrfs]
> [ 2437.142655]  transaction_kthread+0xa9/0x190 [btrfs]
> [ 2437.148424]  ? btrfs_cleanup_transaction+0x610/0x610 [btrfs]
> [ 2437.155048]  kthread+0x12b/0x150
> [ 2437.158979]  ? __kthread_cancel_work+0x40/0x40
> [ 2437.164279]  ret_from_fork+0x1f/0x40
> [ 2437.168610] umount          D    0 42653  42649 0x00004000
> [ 2437.175085] Call Trace:
> [ 2437.178179]  __schedule+0x2e9/0x730
> [ 2437.182444]  ? wait_for_completion+0x116/0x1c0
> [ 2437.187768]  schedule+0x36/0xc0
> [ 2437.191640]  schedule_timeout+0x222/0x350
> [ 2437.196520]  ? wait_for_completion+0x116/0x1c0
> [ 2437.201861]  wait_for_completion+0x14e/0x1c0
> [ 2437.207017]  ? wake_up_q+0x70/0x70
> [ 2437.211237]  __synchronize_srcu.part.20+0x94/0xc0
> [ 2437.216912]  ? __bpf_trace_rcu_utilization+0x10/0x10
> [ 2437.222847]  btrfs_drop_and_free_fs_root+0x8e/0xf0 [btrfs]
> [ 2437.229393]  switch_commit_roots+0x18d/0x1d0 [btrfs]
> [ 2437.235361]  btrfs_commit_transaction+0x615/0xa80 [btrfs]
> [ 2437.241794]  ? finish_wait+0x80/0x80
> [ 2437.246207]  sync_filesystem+0x71/0x90
> [ 2437.250837]  generic_shutdown_super+0x22/0x110
> [ 2437.256209]  kill_anon_super+0x14/0x30
> [ 2437.260846]  btrfs_kill_super+0x12/0xa0 [btrfs]
> [ 2437.266304]  deactivate_locked_super+0x34/0x70
> [ 2437.271685]  cleanup_mnt+0xb8/0x140
> [ 2437.276012]  task_work_run+0xa3/0xe0
> [ 2437.280352]  exit_to_usermode_loop+0x103/0x120
> [ 2437.285728]  do_syscall_64+0x198/0x1a0
> [ 2437.290320]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 2437.296376] RIP: 0033:0x7f6d5b088ba7
> [ 2437.300782] Code: Bad RIP value.
> [ 2437.304780] RSP: 002b:00007ffdc69e0b08 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> [ 2437.313649] RAX: 0000000000000000 RBX: 000055d46fc65060 RCX: 00007f6d5b088ba7
> [ 2437.322070] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055d46fc6bb10
> [ 2437.330478] RBP: 000055d46fc6bb10 R08: 0000000000000000 R09: 000000000000000d
> [ 2437.338864] R10: 00007ffdc69e0560 R11: 0000000000000246 R12: 00007f6d5be32ee4
> [ 2437.347244] R13: 0000000000000000 R14: 000055d46fc652d0 R15: 00000000ffffffff
> 
> 
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2022/06/10
> 



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: fstests btrfs/057 trigger a deadlock on 5.4.187
  2022-06-11  6:26 ` Wang Yugui
@ 2022-06-11  6:35   ` Wang Yugui
  2022-07-18  0:03     ` Wang Yugui
  0 siblings, 1 reply; 5+ messages in thread
From: Wang Yugui @ 2022-06-11  6:35 UTC (permalink / raw)
  To: linux-btrfs

Hi,


> this dead-lock is triggered by fstests btrfs/057 again on the lastest 5.4(5.4.197).
> 
> Frequency: about 1/100.
> 
> The Call Trace on 5.4.197
> 
> [  704.619697] sysrq: Show Blocked State
> [  704.624312]   task                        PC stack   pid father
> [  704.632499] btrfs-cleaner   D    0 33205      2 0x80004000
> [  704.639055] Call Trace:
> [  704.642239]  __schedule+0x284/0x6d0
> [  704.646595]  schedule+0x2f/0xa0
> [  704.650669]  wait_current_trans+0xb3/0xf0 [btrfs]
> [  704.656407]  ? finish_wait+0x80/0x80
> [  704.660916]  start_transaction+0x41c/0x4f0 [btrfs]
> [  704.666767]  btrfs_drop_snapshot+0x548/0x840 [btrfs]
> [  704.672778]  ? btrfs_run_defrag_inodes+0x79/0x380 [btrfs]
> [  704.679247]  ? down_write+0xe/0x40
> [  704.683524]  ? btrfs_delete_unused_bgs+0x35/0x5b0 [btrfs]
> [  704.690065]  ? btree_submit_bio_start+0x10/0x10 [btrfs]
> [  704.696366]  btrfs_clean_one_deleted_snapshot+0xba/0x110 [btrfs]
> [  704.703513]  cleaner_kthread+0xfa/0x120 [btrfs]
> [  704.709054]  kthread+0x112/0x130
> [  704.713104]  ? __kthread_cancel_work+0x40/0x40
> [  704.718511]  ret_from_fork+0x1f/0x40
> 
> [  704.722978] btrfs-transacti D    0 33206      2 0x80004000
> [  704.729585] Call Trace:
> [  704.732809]  __schedule+0x284/0x6d0
> [  704.737156]  schedule+0x2f/0xa0
> [  704.741123]  wait_current_trans+0xb3/0xf0 [btrfs]
> [  704.746823]  ? finish_wait+0x80/0x80
> [  704.751270]  start_transaction+0x315/0x4f0 [btrfs]
> [  704.757099]  transaction_kthread+0xa2/0x180 [btrfs]
> [  704.763001]  ? btrfs_cleanup_transaction+0x590/0x590 [btrfs]
> [  704.769776]  kthread+0x112/0x130
> [  704.773811]  ? __kthread_cancel_work+0x40/0x40
> [  704.779218]  ret_from_fork+0x1f/0x40
> 
> [  704.783650] umount          D    0 33243  33239 0x00004000
> [  704.790253] Call Trace:
> [  704.793423]  __schedule+0x284/0x6d0
> [  704.797788]  schedule+0x2f/0xa0
> [  704.801733]  schedule_timeout+0x20d/0x340
> [  704.806634]  wait_for_completion+0x11f/0x190
> [  704.811866]  ? wake_up_q+0x70/0x70
> [  704.816099]  __synchronize_srcu.part.20+0x81/0xb0
> [  704.821784]  ? __bpf_trace_rcu_utilization+0x10/0x10
> [  704.827787]  btrfs_drop_and_free_fs_root+0x87/0xe0 [btrfs]
> [  704.834350]  switch_commit_roots+0x186/0x1c0 [btrfs]
> [  704.840373]  btrfs_commit_transaction+0x5da/0xa20 [btrfs]
> [  704.846842]  ? finish_wait+0x80/0x80
> [  704.851265]  sync_filesystem+0x71/0x90
> [  704.855907]  generic_shutdown_super+0x22/0x100
> [  704.861342]  kill_anon_super+0x14/0x30
> [  704.866001]  btrfs_kill_super+0x12/0xa0 [btrfs]
> [  704.871487]  deactivate_locked_super+0x34/0x70
> [  704.876890]  cleanup_mnt+0xb8/0x140
> [  704.881190]  task_work_run+0x8a/0xb0
> [  704.885586]  exit_to_usermode_loop+0x103/0x120
> [  704.890981]  do_syscall_64+0x198/0x1a0
> [  704.895604]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  704.901698] RIP: 0033:0x7fc630348ba7
> [  704.906098] Code: Bad RIP value.
> [  704.910097] RSP: 002b:00007ffd612e8a18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> [  704.918967] RAX: 0000000000000000 RBX: 0000555b85889060 RCX: 00007fc630348ba7
> [  704.927386] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000555b8588fb10
> [  704.935800] RBP: 0000555b8588fb10 R08: 0000000000000000 R09: 000000000000000d
> [  704.944185] R10: 00007ffd612e84a0 R11: 0000000000000246 R12: 00007fc6310f2ee4
> [  704.952559] R13: 0000000000000000 R14: 0000555b858892d0 R15: 00000000ffffffff

In addition to the 3 state D theads,
another thread(PID: 1560) is 100% busy

this is the call trace reported by 'sysrq l'

[ 1269.337878] CPU: 31 PID: 1560 Comm: kworker/31:2 Not tainted 5.4.197-2.el7.x86_64 #1
[ 1269.337878] Hardware name: Dell Inc. PowerEdge T630/0W9WXC, BIOS 2.14.0 01/25/2022
[ 1269.337879] Workqueue: rcu_gp process_srcu
[ 1269.337879] RIP: 0010:delay_tsc+0x27/0x60
[ 1269.337880] Code: 00 00 00 0f 1f 44 00 00 65 44 8b 0d a3 c3 64 54 0f 01 f9 66 90 48 c1 e2 20 48 09 c2 49 89 d0 eb 11 f3 90 65 8b 35 89 c3 64 54 <41> 39 f1 75 1b 41 89 f1 0f 01 f9 66 90 48 c1 e2 20 48 09 d0 48 89
[ 1269.337880] RSP: 0018:ffffc3e30e1bbde0 EFLAGS: 00000283
[ 1269.337880] RAX: 000007443487c36f RBX: 00000000000042bc RCX: 000000000000101f
[ 1269.337881] RDX: 00000000000032a8 RSI: 000000000000001f RDI: 00000000000032c5
[ 1269.337881] RBP: 0000000000000000 R08: 00000744348790c7 R09: 000000000000001f
[ 1269.337881] R10: 8080808080808080 R11: ffffffffaca52ff8 R12: 0000000000000000
[ 1269.337882] R13: 00000000000042bf R14: 00000000ffffffff R15: ffffa01a770b0498
[ 1269.337882] FS:  0000000000000000(0000) GS:ffffa05aff7c0000(0000) knlGS:0000000000000000
[ 1269.337882] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1269.337882] CR2: 0000555b8589e838 CR3: 0000007f7006e002 CR4: 00000000003606e0
[ 1269.337883] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1269.337883] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1269.337883] Call Trace:
[ 1269.337883]  try_check_zero+0xc2/0xf0
[ 1269.337883]  process_srcu+0x15e/0x4a0
[ 1269.337884]  ? __queue_work+0x13d/0x3e0
[ 1269.337884]  process_one_work+0x1a7/0x360
[ 1269.337884]  worker_thread+0x30/0x390
[ 1269.337884]  ? create_worker+0x1a0/0x1a0
[ 1269.337884]  kthread+0x112/0x130
[ 1269.337885]  ? __kthread_cancel_work+0x40/0x40
[ 1269.337885]  ret_from_fork+0x1f/0x40


Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/06/11



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: fstests btrfs/057 trigger a deadlock on 5.4.187
  2022-06-11  6:35   ` Wang Yugui
@ 2022-07-18  0:03     ` Wang Yugui
  2024-02-12 12:17       ` David Sterba
  0 siblings, 1 reply; 5+ messages in thread
From: Wang Yugui @ 2022-07-18  0:03 UTC (permalink / raw)
  To: linux-btrfs

Hi,

Because process_srcu() is 100% busy, so this problem is srcu related.

the only srcu in btrfs is subvol_srcu, but it is dropped in 5.17
	c75e839414d3: Josef Bacik: btrfs: kill the subvol_srcu
and this patch is difficult to backport to 5.4.x.

so this problem maybe a little difficult to fix.  If we hit this problem
in more frequency, we may upgrade to 5.10.x

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/07/18

> Hi,
> 
> 
> > this dead-lock is triggered by fstests btrfs/057 again on the lastest 5.4(5.4.197).
> > 
> > Frequency: about 1/100.
> > 
> > The Call Trace on 5.4.197
> > 
> > [  704.619697] sysrq: Show Blocked State
> > [  704.624312]   task                        PC stack   pid father
> > [  704.632499] btrfs-cleaner   D    0 33205      2 0x80004000
> > [  704.639055] Call Trace:
> > [  704.642239]  __schedule+0x284/0x6d0
> > [  704.646595]  schedule+0x2f/0xa0
> > [  704.650669]  wait_current_trans+0xb3/0xf0 [btrfs]
> > [  704.656407]  ? finish_wait+0x80/0x80
> > [  704.660916]  start_transaction+0x41c/0x4f0 [btrfs]
> > [  704.666767]  btrfs_drop_snapshot+0x548/0x840 [btrfs]
> > [  704.672778]  ? btrfs_run_defrag_inodes+0x79/0x380 [btrfs]
> > [  704.679247]  ? down_write+0xe/0x40
> > [  704.683524]  ? btrfs_delete_unused_bgs+0x35/0x5b0 [btrfs]
> > [  704.690065]  ? btree_submit_bio_start+0x10/0x10 [btrfs]
> > [  704.696366]  btrfs_clean_one_deleted_snapshot+0xba/0x110 [btrfs]
> > [  704.703513]  cleaner_kthread+0xfa/0x120 [btrfs]
> > [  704.709054]  kthread+0x112/0x130
> > [  704.713104]  ? __kthread_cancel_work+0x40/0x40
> > [  704.718511]  ret_from_fork+0x1f/0x40
> > 
> > [  704.722978] btrfs-transacti D    0 33206      2 0x80004000
> > [  704.729585] Call Trace:
> > [  704.732809]  __schedule+0x284/0x6d0
> > [  704.737156]  schedule+0x2f/0xa0
> > [  704.741123]  wait_current_trans+0xb3/0xf0 [btrfs]
> > [  704.746823]  ? finish_wait+0x80/0x80
> > [  704.751270]  start_transaction+0x315/0x4f0 [btrfs]
> > [  704.757099]  transaction_kthread+0xa2/0x180 [btrfs]
> > [  704.763001]  ? btrfs_cleanup_transaction+0x590/0x590 [btrfs]
> > [  704.769776]  kthread+0x112/0x130
> > [  704.773811]  ? __kthread_cancel_work+0x40/0x40
> > [  704.779218]  ret_from_fork+0x1f/0x40
> > 
> > [  704.783650] umount          D    0 33243  33239 0x00004000
> > [  704.790253] Call Trace:
> > [  704.793423]  __schedule+0x284/0x6d0
> > [  704.797788]  schedule+0x2f/0xa0
> > [  704.801733]  schedule_timeout+0x20d/0x340
> > [  704.806634]  wait_for_completion+0x11f/0x190
> > [  704.811866]  ? wake_up_q+0x70/0x70
> > [  704.816099]  __synchronize_srcu.part.20+0x81/0xb0
> > [  704.821784]  ? __bpf_trace_rcu_utilization+0x10/0x10
> > [  704.827787]  btrfs_drop_and_free_fs_root+0x87/0xe0 [btrfs]
> > [  704.834350]  switch_commit_roots+0x186/0x1c0 [btrfs]
> > [  704.840373]  btrfs_commit_transaction+0x5da/0xa20 [btrfs]
> > [  704.846842]  ? finish_wait+0x80/0x80
> > [  704.851265]  sync_filesystem+0x71/0x90
> > [  704.855907]  generic_shutdown_super+0x22/0x100
> > [  704.861342]  kill_anon_super+0x14/0x30
> > [  704.866001]  btrfs_kill_super+0x12/0xa0 [btrfs]
> > [  704.871487]  deactivate_locked_super+0x34/0x70
> > [  704.876890]  cleanup_mnt+0xb8/0x140
> > [  704.881190]  task_work_run+0x8a/0xb0
> > [  704.885586]  exit_to_usermode_loop+0x103/0x120
> > [  704.890981]  do_syscall_64+0x198/0x1a0
> > [  704.895604]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [  704.901698] RIP: 0033:0x7fc630348ba7
> > [  704.906098] Code: Bad RIP value.
> > [  704.910097] RSP: 002b:00007ffd612e8a18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> > [  704.918967] RAX: 0000000000000000 RBX: 0000555b85889060 RCX: 00007fc630348ba7
> > [  704.927386] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000555b8588fb10
> > [  704.935800] RBP: 0000555b8588fb10 R08: 0000000000000000 R09: 000000000000000d
> > [  704.944185] R10: 00007ffd612e84a0 R11: 0000000000000246 R12: 00007fc6310f2ee4
> > [  704.952559] R13: 0000000000000000 R14: 0000555b858892d0 R15: 00000000ffffffff
> 
> In addition to the 3 state D theads,
> another thread(PID: 1560) is 100% busy
> 
> this is the call trace reported by 'sysrq l'
> 
> [ 1269.337878] CPU: 31 PID: 1560 Comm: kworker/31:2 Not tainted 5.4.197-2.el7.x86_64 #1
> [ 1269.337878] Hardware name: Dell Inc. PowerEdge T630/0W9WXC, BIOS 2.14.0 01/25/2022
> [ 1269.337879] Workqueue: rcu_gp process_srcu
> [ 1269.337879] RIP: 0010:delay_tsc+0x27/0x60
> [ 1269.337880] Code: 00 00 00 0f 1f 44 00 00 65 44 8b 0d a3 c3 64 54 0f 01 f9 66 90 48 c1 e2 20 48 09 c2 49 89 d0 eb 11 f3 90 65 8b 35 89 c3 64 54 <41> 39 f1 75 1b 41 89 f1 0f 01 f9 66 90 48 c1 e2 20 48 09 d0 48 89
> [ 1269.337880] RSP: 0018:ffffc3e30e1bbde0 EFLAGS: 00000283
> [ 1269.337880] RAX: 000007443487c36f RBX: 00000000000042bc RCX: 000000000000101f
> [ 1269.337881] RDX: 00000000000032a8 RSI: 000000000000001f RDI: 00000000000032c5
> [ 1269.337881] RBP: 0000000000000000 R08: 00000744348790c7 R09: 000000000000001f
> [ 1269.337881] R10: 8080808080808080 R11: ffffffffaca52ff8 R12: 0000000000000000
> [ 1269.337882] R13: 00000000000042bf R14: 00000000ffffffff R15: ffffa01a770b0498
> [ 1269.337882] FS:  0000000000000000(0000) GS:ffffa05aff7c0000(0000) knlGS:0000000000000000
> [ 1269.337882] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1269.337882] CR2: 0000555b8589e838 CR3: 0000007f7006e002 CR4: 00000000003606e0
> [ 1269.337883] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1269.337883] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1269.337883] Call Trace:
> [ 1269.337883]  try_check_zero+0xc2/0xf0
> [ 1269.337883]  process_srcu+0x15e/0x4a0
> [ 1269.337884]  ? __queue_work+0x13d/0x3e0
> [ 1269.337884]  process_one_work+0x1a7/0x360
> [ 1269.337884]  worker_thread+0x30/0x390
> [ 1269.337884]  ? create_worker+0x1a0/0x1a0
> [ 1269.337884]  kthread+0x112/0x130
> [ 1269.337885]  ? __kthread_cancel_work+0x40/0x40
> [ 1269.337885]  ret_from_fork+0x1f/0x40
> 
> 
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2022/06/11
> 



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: fstests btrfs/057 trigger a deadlock on 5.4.187
  2022-07-18  0:03     ` Wang Yugui
@ 2024-02-12 12:17       ` David Sterba
  0 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2024-02-12 12:17 UTC (permalink / raw)
  To: Wang Yugui; +Cc: linux-btrfs

On Mon, Jul 18, 2022 at 08:03:34AM +0800, Wang Yugui wrote:
> Hi,
> 
> Because process_srcu() is 100% busy, so this problem is srcu related.
> 
> the only srcu in btrfs is subvol_srcu, but it is dropped in 5.17
> 	c75e839414d3: Josef Bacik: btrfs: kill the subvol_srcu
> and this patch is difficult to backport to 5.4.x.

The whole SRCU use was replaced by root reference counting and spanning
many patches, backport would be really hard.

> so this problem maybe a little difficult to fix.  If we hit this problem
> in more frequency, we may upgrade to 5.10.x

This could still be affected though.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-02-12 12:17 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-06-10  9:22 fstests btrfs/057 trigger a deadlock on 5.4.187 Wang Yugui
2022-06-11  6:26 ` Wang Yugui
2022-06-11  6:35   ` Wang Yugui
2022-07-18  0:03     ` Wang Yugui
2024-02-12 12:17       ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox