hangs after merge window changes

All of lore.kernel.org
 help / color / mirror / Atom feed

* hangs after merge window changes
@ 2024-07-17 17:24 Jakub Kicinski
  2024-07-17 18:01 ` Eric Van Hensbergen
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-07-17 17:24 UTC (permalink / raw)
  To: Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet,
	Christian Schoenebeck
  Cc: v9fs

Hi!

Apologies for possible duplicate, I tried to search lore and haven't
found anything, but also I can't find the exact 9p archive there :S

After forwarding to Linus's tree yesterday, we see hangs in networking
CI (which uses 9p over virtio to mount host FS), they seem sporadic, 
and are not very reproducible:

https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/686601/10-bridge-mdb-max-sh/stderr
https://netdev-3.bots.linux.dev/vmksft-netdevsim-dbg/results/687140/5-ethtool-coalesce-sh/stderr
https://netdev-3.bots.linux.dev/vmksft-netdevsim-dbg/results/687140/6-devlink-sh/stderr
https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/686781/3-test-vxlan-mdb-sh/stderr
https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/686601/61-router-bridge-sh/stderr

We only got those 5 over 24h, meaning so roughly 1 in 500 runs hits
this.

For the benefits of the ML here's one of the five:

[  247.084355][   T43] INFO: task ip:419 blocked for more than 122 seconds.
[  247.084693][   T43]       Not tainted 6.10.0-virtme #1
[  247.084816][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.085041][   T43] task:ip              state:D stack:26288 pid:419   tgid:419   ppid:233    flags:0x00000000
[  247.085333][   T43] Call Trace:
[  247.085477][   T43]  <TASK>
[  247.085548][   T43]  __schedule+0x6e0/0x17e0
[  247.085681][   T43]  ? __pfx___schedule+0x10/0x10
[  247.085805][   T43]  ? schedule+0x1a5/0x210
[  247.085925][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  247.086049][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  247.086178][   T43]  ? schedule+0x1a5/0x210
[  247.086531][   T43]  schedule+0xdf/0x210
[  247.086643][   T43]  d_alloc_parallel+0xaef/0xed0
[  247.086788][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
[  247.086914][   T43]  ? __pfx_default_wake_function+0x10/0x10
[  247.087085][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
[  247.087214][   T43]  __lookup_slow+0x17f/0x3c0
[  247.087357][   T43]  ? __pfx___lookup_slow+0x10/0x10
[  247.087486][   T43]  ? walk_component+0x29e/0x4f0
[  247.087626][   T43]  walk_component+0x2ab/0x4f0
[  247.087753][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
[  247.087921][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
[  247.088092][   T43]  path_openat+0x1be/0x440
[  247.088240][   T43]  ? __pfx_path_openat+0x10/0x10
[  247.088366][   T43]  ? __lock_acquire+0xaf0/0x1570
[  247.088492][   T43]  do_filp_open+0x1b3/0x3e0
[  247.088616][   T43]  ? __pfx_do_filp_open+0x10/0x10
[  247.088736][   T43]  ? find_held_lock+0x2c/0x110
[  247.088871][   T43]  ? do_raw_spin_lock+0x131/0x270
[  247.088995][   T43]  ? __pfx_do_raw_spin_lock+0x10/0x10
[  247.089118][   T43]  ? alloc_fd+0x1f5/0x650
[  247.089214][   T43]  ? do_raw_spin_unlock+0x58/0x220
[  247.089367][   T43]  ? _raw_spin_unlock+0x23/0x40
[  247.089493][   T43]  ? alloc_fd+0x1f5/0x650
[  247.089624][   T43]  do_sys_openat2+0x122/0x160
[  247.089777][   T43]  ? __pfx_do_sys_openat2+0x10/0x10
[  247.089901][   T43]  ? __pfx___debug_check_no_obj_freed+0x10/0x10
[  247.090060][   T43]  ? __virt_addr_valid+0x230/0x430
[  247.090192][   T43]  __x64_sys_openat+0x123/0x1e0
[  247.090329][   T43]  ? __pfx___x64_sys_openat+0x10/0x10
[  247.090484][   T43]  do_syscall_64+0xc1/0x1d0
[  247.090607][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  247.090758][   T43] RIP: 0033:0x7f4b57a460e8
[  247.090901][   T43] RSP: 002b:00007ffd58925788 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
[  247.091085][   T43] RAX: ffffffffffffffda RBX: 00007ffd58925a0f RCX: 00007f4b57a460e8
[  247.091303][   T43] RDX: 0000000000080000 RSI: 00007ffd58925800 RDI: 00000000ffffff9c
[  247.091500][   T43] RBP: 00007ffd589257f0 R08: 0000000000080000 R09: 00007ffd58925800
[  247.091682][   T43] R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffd58925807
[  247.091873][   T43] R13: 00007ffd58925a20 R14: 00007ffd58925800 R15: 00007f4b57a17000
[  247.092070][   T43]  </TASK>
[  247.092167][   T43] INFO: task rename_device:421 blocked for more than 122 seconds.
[  247.092346][   T43]       Not tainted 6.10.0-virtme #1
[  247.092467][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.092690][   T43] task:rename_device   state:D stack:26464 pid:421   tgid:421   ppid:1      flags:0x00004004
[  247.092926][   T43] Call Trace:
[  247.093016][   T43]  <TASK>
[  247.093110][   T43]  __schedule+0x6e0/0x17e0
[  247.093265][   T43]  ? __pfx___schedule+0x10/0x10
[  247.093392][   T43]  ? schedule+0x1a5/0x210
[  247.093497][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  247.093636][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  247.093763][   T43]  ? schedule+0x1a5/0x210
[  247.093861][   T43]  schedule+0xdf/0x210
[  247.093963][   T43]  __wait_on_freeing_inode+0x115/0x280
[  247.094084][   T43]  ? __pfx___wait_on_freeing_inode+0x10/0x10
[  247.094249][   T43]  ? __pfx_wake_bit_function+0x10/0x10
[  247.094395][   T43]  ? lock_acquire+0x32/0xc0
[  247.094525][   T43]  ? find_inode_fast+0x158/0x450
[  247.094654][   T43]  find_inode_fast+0x18d/0x450
[  247.094779][   T43]  iget_locked+0x7d/0x390
[  247.094895][   T43]  ? hlock_class+0x4e/0x130
[  247.095029][   T43]  v9fs_fid_iget_dotl+0x78/0x2d0
[  247.095156][   T43]  v9fs_vfs_lookup.part.0+0x1ed/0x390
[  247.095304][   T43]  ? __pfx_v9fs_vfs_lookup.part.0+0x10/0x10
[  247.095455][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
[  247.095598][   T43]  __lookup_slow+0x209/0x3c0
[  247.095721][   T43]  ? __pfx___lookup_slow+0x10/0x10
[  247.095865][   T43]  ? walk_component+0x29e/0x4f0
[  247.096013][   T43]  walk_component+0x2ab/0x4f0
[  247.096137][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
[  247.096335][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
[  247.096492][   T43]  path_openat+0x1be/0x440
[  247.096633][   T43]  ? __pfx_path_openat+0x10/0x10
[  247.096770][   T43]  ? __lock_acquire+0xaf0/0x1570
[  247.096895][   T43]  do_filp_open+0x1b3/0x3e0
[  247.097025][   T43]  ? __pfx_do_filp_open+0x10/0x10
[  247.097144][   T43]  ? find_held_lock+0x2c/0x110
[  247.097310][   T43]  ? do_raw_spin_lock+0x131/0x270
[  247.097433][   T43]  ? __pfx_do_raw_spin_lock+0x10/0x10
[  247.097567][   T43]  ? alloc_fd+0x1f5/0x650
[  247.097674][   T43]  ? do_raw_spin_unlock+0x58/0x220
[  247.097797][   T43]  ? _raw_spin_unlock+0x23/0x40
[  247.097923][   T43]  ? alloc_fd+0x1f5/0x650
[  247.098022][   T43]  do_sys_openat2+0x122/0x160
[  247.098144][   T43]  ? __pfx_do_sys_openat2+0x10/0x10
[  247.098276][   T43]  ? __up_read+0x1a7/0x780
[  247.098402][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  247.098523][   T43]  ? __pfx___up_read+0x10/0x10
[  247.098650][   T43]  __x64_sys_openat+0x123/0x1e0
[  247.098770][   T43]  ? __pfx___x64_sys_openat+0x10/0x10
[  247.098907][   T43]  do_syscall_64+0xc1/0x1d0
[  247.099029][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  247.099178][   T43] RIP: 0033:0x7fb4a75850e8
[  247.099311][   T43] RSP: 002b:00007fffb3f40268 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
[  247.099490][   T43] RAX: ffffffffffffffda RBX: 00007fffb3f4033f RCX: 00007fb4a75850e8
[  247.099666][   T43] RDX: 0000000000080000 RSI: 00007fb4a75565f0 RDI: 00000000ffffff9c
[  247.099844][   T43] RBP: 00007fffb3f402d0 R08: 0000000000080000 R09: 00007fb4a75565f0
[  247.100022][   T43] R10: 0000000000000000 R11: 0000000000000287 R12: 0000000000000000
[  247.100203][   T43] R13: 00007fffb3f40350 R14: 00007fb4a75565f0 R15: 00007fffb3f40350
[  247.100402][   T43]  </TASK>
[  247.100496][   T43] INFO: task (spawn):422 blocked for more than 122 seconds.
[  247.100660][   T43]       Not tainted 6.10.0-virtme #1
[  247.100777][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.100969][   T43] task:(spawn)         state:D stack:26464 pid:422   tgid:422   ppid:1      flags:0x00004006
[  247.101202][   T43] Call Trace:
[  247.101300][   T43]  <TASK>
[  247.101367][   T43]  __schedule+0x6e0/0x17e0
[  247.101494][   T43]  ? __pfx___schedule+0x10/0x10
[  247.101614][   T43]  ? schedule+0x1a5/0x210
[  247.101710][   T43]  ? __pfx___lock_release+0x10/0x10
[  247.101828][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  247.101950][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  247.102080][   T43]  ? schedule+0x1a5/0x210
[  247.102176][   T43]  schedule+0xdf/0x210
[  247.102280][   T43]  d_alloc_parallel+0xaef/0xed0
[  247.102411][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
[  247.102536][   T43]  ? __pfx_default_wake_function+0x10/0x10
[  247.102690][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
[  247.102817][   T43]  __lookup_slow+0x17f/0x3c0
[  247.102963][   T43]  ? __pfx___lookup_slow+0x10/0x10
[  247.103092][   T43]  ? walk_component+0x29e/0x4f0
[  247.103251][   T43]  walk_component+0x2ab/0x4f0
[  247.103375][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
[  247.103549][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
[  247.103712][   T43]  path_openat+0x1be/0x440
[  247.103838][   T43]  ? __pfx_path_openat+0x10/0x10
[  247.103997][   T43]  ? hlock_class+0x4e/0x130
[  247.104117][   T43]  ? mark_lock+0x38/0x3e0
[  247.104244][   T43]  do_filp_open+0x1b3/0x3e0
[  247.104375][   T43]  ? __pfx_do_filp_open+0x10/0x10
[  247.104507][   T43]  ? __lock_release+0x103/0x460
[  247.104625][   T43]  ? __create_object+0x5e/0xb0
[  247.104746][   T43]  ? __pfx___lock_release+0x10/0x10
[  247.104868][   T43]  ? hlock_class+0x4e/0x130
[  247.104999][   T43]  ? mark_held_locks+0x9e/0xe0
[  247.105120][   T43]  ? lockdep_hardirqs_on_prepare+0x275/0x410
[  247.105286][   T43]  do_open_execat+0xb0/0x280
[  247.105421][   T43]  ? __pfx_do_open_execat+0x10/0x10
[  247.105544][   T43]  ? getname_kernel+0x14c/0x2d0
[  247.105689][   T43]  ? __kmalloc_noprof+0x1d7/0x3d0
[  247.105814][   T43]  open_exec+0x2b/0x50
[  247.105909][   T43]  load_elf_binary+0x504/0x2880
[  247.106069][   T43]  ? find_held_lock+0x2c/0x110
[  247.106195][   T43]  ? __lock_release+0x103/0x460
[  247.106345][   T43]  ? search_binary_handler+0x134/0x550
[  247.106480][   T43]  ? __pfx___lock_release+0x10/0x10
[  247.106599][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  247.106728][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  247.106847][   T43]  ? __pfx_load_elf_binary+0x10/0x10
[  247.106991][   T43]  ? search_binary_handler+0x134/0x550
[  247.107126][   T43]  search_binary_handler+0x152/0x550
[  247.107258][   T43]  ? __pfx_search_binary_handler+0x10/0x10
[  247.107438][   T43]  ? __task_pid_nr_ns+0x11a/0x410
[  247.107564][   T43]  ? exec_binprm+0x100/0x3c0
[  247.107707][   T43]  exec_binprm+0x119/0x3c0
[  247.107833][   T43]  bprm_execve+0xe9/0x4f0
[  247.107927][   T43]  do_execveat_common.isra.0+0x3eb/0x530
[  247.108064][   T43]  __x64_sys_execve+0x8c/0xb0
[  247.108193][   T43]  do_syscall_64+0xc1/0x1d0
[  247.108344][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  247.108494][   T43] RIP: 0033:0x7fa4f34f740b
[  247.108650][   T43] RSP: 002b:00007ffd893fd6d8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[  247.108832][   T43] RAX: ffffffffffffffda RBX: 0000560ac3d5daf0 RCX: 00007fa4f34f740b
[  247.109026][   T43] RDX: 0000560ac3d62360 RSI: 0000560ac3d76bc0 RDI: 0000560ac3d8d8f0
[  247.109213][   T43] RBP: 00007ffd893fd800 R08: 0000000000000000 R09: 0000000000000000
[  247.109403][   T43] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd89401890
[  247.109604][   T43] R13: 00007fa4f2c3a9d8 R14: 00007ffd893fd720 R15: 0000000000000000
[  247.109792][   T43]  </TASK>
[  247.109885][   T43] INFO: task (spawn):423 blocked for more than 122 seconds.
[  247.110074][   T43]       Not tainted 6.10.0-virtme #1
[  247.110188][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.110412][   T43] task:(spawn)         state:D stack:26608 pid:423   tgid:423   ppid:1      flags:0x00004006
[  247.110654][   T43] Call Trace:
[  247.110744][   T43]  <TASK>
[  247.110820][   T43]  __schedule+0x6e0/0x17e0
[  247.110961][   T43]  ? __pfx___schedule+0x10/0x10
[  247.111080][   T43]  ? schedule+0x1a5/0x210
[  247.111196][   T43]  ? __pfx___lock_release+0x10/0x10
[  247.111336][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  247.111458][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  247.111599][   T43]  ? schedule+0x1a5/0x210
[  247.111693][   T43]  schedule+0xdf/0x210
[  247.111786][   T43]  d_alloc_parallel+0xaef/0xed0
[  247.111928][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
[  247.112051][   T43]  ? __pfx_default_wake_function+0x10/0x10
[  247.112239][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
[  247.112367][   T43]  __lookup_slow+0x17f/0x3c0
[  247.112490][   T43]  ? __pfx___lookup_slow+0x10/0x10
[  247.112618][   T43]  ? walk_component+0x29e/0x4f0
[  247.112753][   T43]  walk_component+0x2ab/0x4f0
[  247.112877][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
[  247.113028][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
[  247.113186][   T43]  path_openat+0x1be/0x440
[  247.113362][   T43]  ? __pfx_path_openat+0x10/0x10
[  247.113483][   T43]  ? hlock_class+0x4e/0x130
[  247.113612][   T43]  ? mark_lock+0x38/0x3e0
[  247.113729][   T43]  do_filp_open+0x1b3/0x3e0
[  247.113850][   T43]  ? __pfx_do_filp_open+0x10/0x10
[  247.113989][   T43]  ? __lock_release+0x103/0x460
[  247.114107][   T43]  ? __create_object+0x5e/0xb0
[  247.114255][   T43]  ? __pfx___lock_release+0x10/0x10
[  247.114388][   T43]  ? hlock_class+0x4e/0x130
[  247.114514][   T43]  ? mark_held_locks+0x9e/0xe0
[  247.114653][   T43]  ? lockdep_hardirqs_on_prepare+0x275/0x410
[  247.114814][   T43]  do_open_execat+0xb0/0x280
[  247.114947][   T43]  ? __pfx_do_open_execat+0x10/0x10
[  247.115089][   T43]  ? getname_kernel+0x14c/0x2d0
[  247.115206][   T43]  ? __kmalloc_noprof+0x1d7/0x3d0
[  247.115347][   T43]  open_exec+0x2b/0x50
[  247.115443][   T43]  load_elf_binary+0x504/0x2880
[  247.115591][   T43]  ? find_held_lock+0x2c/0x110
[  247.115713][   T43]  ? __lock_release+0x103/0x460
[  247.115838][   T43]  ? search_binary_handler+0x134/0x550
[  247.115984][   T43]  ? __pfx___lock_release+0x10/0x10
[  247.116103][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  247.116272][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  247.116395][   T43]  ? __pfx_load_elf_binary+0x10/0x10
[  247.116526][   T43]  ? search_binary_handler+0x134/0x550
[  247.116658][   T43]  search_binary_handler+0x152/0x550
[  247.116801][   T43]  ? __pfx_search_binary_handler+0x10/0x10
[  247.116950][   T43]  ? __task_pid_nr_ns+0x11a/0x410
[  247.117081][   T43]  ? exec_binprm+0x100/0x3c0
[  247.117248][   T43]  exec_binprm+0x119/0x3c0
[  247.117378][   T43]  bprm_execve+0xe9/0x4f0
[  247.117472][   T43]  do_execveat_common.isra.0+0x3eb/0x530
[  247.117596][   T43]  __x64_sys_execve+0x8c/0xb0
[  247.117725][   T43]  do_syscall_64+0xc1/0x1d0
[  247.117847][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  247.117994][   T43] RIP: 0033:0x7fa4f34f740b
[  247.118116][   T43] RSP: 002b:00007ffd89405918 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[  247.118309][   T43] RAX: ffffffffffffffda RBX: 0000560ac3d94430 RCX: 00007fa4f34f740b
[  247.118483][   T43] RDX: 0000560ac3d5ff20 RSI: 0000560ac3d76cd0 RDI: 0000560ac3d5e240
[  247.118660][   T43] RBP: 00007ffd89405a40 R08: 0000000000000008 R09: 0000000000000000
[  247.118838][   T43] R10: 0000000000000000 R11: 0000000000000246 R12: 0000560ac3d767d0
[  247.119013][   T43] R13: 00007fa4f2c3a9d8 R14: 00007ffd89405960 R15: 0000000000000000
[  247.119201][   T43]  </TASK>
[  247.119300][   T43] INFO: task (spawn):424 blocked for more than 122 seconds.
[  247.119463][   T43]       Not tainted 6.10.0-virtme #1
[  247.119583][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.119772][   T43] task:(spawn)         state:D stack:26464 pid:424   tgid:424   ppid:1      flags:0x00004006
[  247.120005][   T43] Call Trace:
[  247.120096][   T43]  <TASK>
[  247.120161][   T43]  __schedule+0x6e0/0x17e0
[  247.120298][   T43]  ? __pfx___schedule+0x10/0x10
[  247.120417][   T43]  ? schedule+0x1a5/0x210
[  247.120519][   T43]  ? __pfx___lock_release+0x10/0x10
[  247.120636][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  247.120756][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  247.120881][   T43]  ? schedule+0x1a5/0x210
[  247.120976][   T43]  schedule+0xdf/0x210
[  247.121069][   T43]  d_alloc_parallel+0xaef/0xed0
[  247.121200][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
[  247.121443][   T43]  ? __pfx_default_wake_function+0x10/0x10
[  247.121596][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
[  247.121728][   T43]  __lookup_slow+0x17f/0x3c0
[  247.121849][   T43]  ? __pfx___lookup_slow+0x10/0x10
[  247.122028][   T43]  ? walk_component+0x29e/0x4f0
[  247.122162][   T43]  walk_component+0x2ab/0x4f0
[  247.122295][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
[  247.122447][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
[  247.122713][   T43]  path_openat+0x1be/0x440
[  247.122839][   T43]  ? __pfx_path_openat+0x10/0x10
[  247.122959][   T43]  ? hlock_class+0x4e/0x130
[  247.123078][   T43]  ? mark_lock+0x38/0x3e0
[  247.123175][   T43]  do_filp_open+0x1b3/0x3e0
[  247.123379][   T43]  ? __pfx_do_filp_open+0x10/0x10
[  247.123511][   T43]  ? __lock_release+0x103/0x460
[  247.123628][   T43]  ? __create_object+0x5e/0xb0
[  247.123775][   T43]  ? __pfx___lock_release+0x10/0x10
[  247.123990][   T43]  ? hlock_class+0x4e/0x130
[  247.124128][   T43]  ? mark_held_locks+0x9e/0xe0
[  247.124263][   T43]  ? lockdep_hardirqs_on_prepare+0x275/0x410
[  247.124444][   T43]  do_open_execat+0xb0/0x280
[  247.124622][   T43]  ? __pfx_do_open_execat+0x10/0x10
[  247.124747][   T43]  ? getname_kernel+0x14c/0x2d0
[  247.124896][   T43]  ? __kmalloc_noprof+0x1d7/0x3d0
[  247.125019][   T43]  open_exec+0x2b/0x50
[  247.125194][   T43]  load_elf_binary+0x504/0x2880
[  247.125335][   T43]  ? find_held_lock+0x2c/0x110
[  247.125466][   T43]  ? __lock_release+0x103/0x460
[  247.125589][   T43]  ? search_binary_handler+0x134/0x550
[  247.125723][   T43]  ? __pfx___lock_release+0x10/0x10
[  247.125916][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  247.126036][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  247.126184][   T43]  ? __pfx_load_elf_binary+0x10/0x10
[  247.126330][   T43]  ? search_binary_handler+0x134/0x550
[  247.126536][   T43]  search_binary_handler+0x152/0x550
[  247.126659][   T43]  ? __pfx_search_binary_handler+0x10/0x10
[  247.126819][   T43]  ? __task_pid_nr_ns+0x11a/0x410
[  247.126943][   T43]  ? exec_binprm+0x100/0x3c0
[  247.127144][   T43]  exec_binprm+0x119/0x3c0
[  247.127281][   T43]  bprm_execve+0xe9/0x4f0
[  247.127403][   T43]  do_execveat_common.isra.0+0x3eb/0x530
[  247.127529][   T43]  __x64_sys_execve+0x8c/0xb0
[  247.127732][   T43]  do_syscall_64+0xc1/0x1d0
[  247.127854][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  247.128016][   T43] RIP: 0033:0x7fa4f34f740b
[  247.128140][   T43] RSP: 002b:00007ffd893fd6d8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[  247.128413][   T43] RAX: ffffffffffffffda RBX: 0000560ac3d2df30 RCX: 00007fa4f34f740b
[  247.128601][   T43] RDX: 0000560ac3d7ff10 RSI: 0000560ac3d8d940 RDI: 0000560ac3d33950
[  247.128798][   T43] RBP: 00007ffd893fd800 R08: 0000000000000000 R09: 0000000000000000
[  247.129060][   T43] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd89401890
[  247.129249][   T43] R13: 00007fa4f2c3a9d8 R14: 00007ffd893fd720 R15: 0000000000000000
[  247.129440][   T43]  </TASK>
[  247.129587][   T43] 
[  247.129587][   T43] Showing all locks held in the system:
[  247.129764][   T43] 1 lock held by khungtaskd/43:
[  247.129901][   T43]  #0: ffffffff97b54b80 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x70/0x3a0
[  247.130210][   T43] 1 lock held by ip/419:
[  247.130311][   T43]  #0: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
[  247.130587][   T43] 1 lock held by rename_device/421:
[  247.130713][   T43]  #0: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
[  247.131030][   T43] 2 locks held by (spawn)/422:
[  247.131146][   T43]  #0: ffff888005bc4238 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x51/0x4f0
[  247.131431][   T43]  #1: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
[  247.131698][   T43] 2 locks held by (spawn)/423:
[  247.131813][   T43]  #0: ffff888005bc0578 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x51/0x4f0
[  247.132112][   T43]  #1: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
[  247.132379][   T43] 2 locks held by (spawn)/424:
[  247.132502][   T43]  #0: ffff888005bc0c38 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x51/0x4f0
[  247.132852][   T43]  #1: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
[  247.133095][   T43] 
[  247.133169][   T43] =============================================
[  247.133169][   T43] 
[  369.964331][   T43] INFO: task ip:419 blocked for more than 245 seconds.
[  369.964619][   T43]       Not tainted 6.10.0-virtme #1
[  369.964734][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.964953][   T43] task:ip              state:D stack:26288 pid:419   tgid:419   ppid:233    flags:0x00000000
[  369.965192][   T43] Call Trace:
[  369.965327][   T43]  <TASK>
[  369.965400][   T43]  __schedule+0x6e0/0x17e0
[  369.965536][   T43]  ? __pfx___schedule+0x10/0x10
[  369.965659][   T43]  ? schedule+0x1a5/0x210
[  369.965757][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  369.965881][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  369.966011][   T43]  ? schedule+0x1a5/0x210
[  369.966113][   T43]  schedule+0xdf/0x210
[  369.966209][   T43]  d_alloc_parallel+0xaef/0xed0
[  369.966364][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
[  369.966502][   T43]  ? __pfx_default_wake_function+0x10/0x10
[  369.966657][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
[  369.966785][   T43]  __lookup_slow+0x17f/0x3c0
[  369.966907][   T43]  ? __pfx___lookup_slow+0x10/0x10
[  369.967033][   T43]  ? walk_component+0x29e/0x4f0
[  369.967172][   T43]  walk_component+0x2ab/0x4f0
[  369.967305][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
[  369.967462][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
[  369.967617][   T43]  path_openat+0x1be/0x440
[  369.967743][   T43]  ? __pfx_path_openat+0x10/0x10
[  369.967867][   T43]  ? __lock_acquire+0xaf0/0x1570
[  369.967993][   T43]  do_filp_open+0x1b3/0x3e0
[  369.968115][   T43]  ? __pfx_do_filp_open+0x10/0x10
[  369.968243][   T43]  ? find_held_lock+0x2c/0x110
[  369.968374][   T43]  ? do_raw_spin_lock+0x131/0x270
[  369.968499][   T43]  ? __pfx_do_raw_spin_lock+0x10/0x10
[  369.968620][   T43]  ? alloc_fd+0x1f5/0x650
[  369.968714][   T43]  ? do_raw_spin_unlock+0x58/0x220
[  369.968837][   T43]  ? _raw_spin_unlock+0x23/0x40
[  369.968960][   T43]  ? alloc_fd+0x1f5/0x650
[  369.969059][   T43]  do_sys_openat2+0x122/0x160
[  369.969181][   T43]  ? __pfx_do_sys_openat2+0x10/0x10
[  369.969312][   T43]  ? __pfx___debug_check_no_obj_freed+0x10/0x10
[  369.969465][   T43]  ? __virt_addr_valid+0x230/0x430
[  369.969591][   T43]  __x64_sys_openat+0x123/0x1e0
[  369.969713][   T43]  ? __pfx___x64_sys_openat+0x10/0x10
[  369.969869][   T43]  do_syscall_64+0xc1/0x1d0
[  369.969996][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  369.970156][   T43] RIP: 0033:0x7f4b57a460e8
[  369.970330][   T43] RSP: 002b:00007ffd58925788 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
[  369.970663][   T43] RAX: ffffffffffffffda RBX: 00007ffd58925a0f RCX: 00007f4b57a460e8
[  369.970849][   T43] RDX: 0000000000080000 RSI: 00007ffd58925800 RDI: 00000000ffffff9c
[  369.971027][   T43] RBP: 00007ffd589257f0 R08: 0000000000080000 R09: 00007ffd58925800
[  369.971241][   T43] R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffd58925807
[  369.971439][   T43] R13: 00007ffd58925a20 R14: 00007ffd58925800 R15: 00007f4b57a17000
[  369.971675][   T43]  </TASK>
[  369.971769][   T43] INFO: task rename_device:421 blocked for more than 245 seconds.
[  369.971933][   T43]       Not tainted 6.10.0-virtme #1
[  369.972051][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.972296][   T43] task:rename_device   state:D stack:26464 pid:421   tgid:421   ppid:1      flags:0x00004004
[  369.972551][   T43] Call Trace:
[  369.972648][   T43]  <TASK>
[  369.972715][   T43]  __schedule+0x6e0/0x17e0
[  369.972860][   T43]  ? __pfx___schedule+0x10/0x10
[  369.972980][   T43]  ? schedule+0x1a5/0x210
[  369.973082][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  369.973237][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  369.973364][   T43]  ? schedule+0x1a5/0x210
[  369.973463][   T43]  schedule+0xdf/0x210
[  369.973555][   T43]  __wait_on_freeing_inode+0x115/0x280
[  369.973687][   T43]  ? __pfx___wait_on_freeing_inode+0x10/0x10
[  369.973840][   T43]  ? __pfx_wake_bit_function+0x10/0x10
[  369.973966][   T43]  ? lock_acquire+0x32/0xc0
[  369.974093][   T43]  ? find_inode_fast+0x158/0x450
[  369.974257][   T43]  find_inode_fast+0x18d/0x450
[  369.974383][   T43]  iget_locked+0x7d/0x390
[  369.974501][   T43]  ? hlock_class+0x4e/0x130
[  369.974629][   T43]  v9fs_fid_iget_dotl+0x78/0x2d0
[  369.974762][   T43]  v9fs_vfs_lookup.part.0+0x1ed/0x390
[  369.974902][   T43]  ? __pfx_v9fs_vfs_lookup.part.0+0x10/0x10
[  369.975059][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
[  369.975189][   T43]  __lookup_slow+0x209/0x3c0
[  369.975347][   T43]  ? __pfx___lookup_slow+0x10/0x10
[  369.975483][   T43]  ? walk_component+0x29e/0x4f0
[  369.975644][   T43]  walk_component+0x2ab/0x4f0
[  369.975770][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
[  369.975964][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
[  369.976120][   T43]  path_openat+0x1be/0x440
[  369.976269][   T43]  ? __pfx_path_openat+0x10/0x10
[  369.976394][   T43]  ? __lock_acquire+0xaf0/0x1570
[  369.976534][   T43]  do_filp_open+0x1b3/0x3e0
[  369.976655][   T43]  ? __pfx_do_filp_open+0x10/0x10
[  369.976796][   T43]  ? find_held_lock+0x2c/0x110
[  369.976935][   T43]  ? do_raw_spin_lock+0x131/0x270
[  369.977054][   T43]  ? __pfx_do_raw_spin_lock+0x10/0x10
[  369.977181][   T43]  ? alloc_fd+0x1f5/0x650
[  369.977283][   T43]  ? do_raw_spin_unlock+0x58/0x220
[  369.977407][   T43]  ? _raw_spin_unlock+0x23/0x40
[  369.977547][   T43]  ? alloc_fd+0x1f5/0x650
[  369.977648][   T43]  do_sys_openat2+0x122/0x160
[  369.977769][   T43]  ? __pfx_do_sys_openat2+0x10/0x10
[  369.977908][   T43]  ? __up_read+0x1a7/0x780
[  369.978036][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  369.978162][   T43]  ? __pfx___up_read+0x10/0x10
[  369.978323][   T43]  __x64_sys_openat+0x123/0x1e0
[  369.978446][   T43]  ? __pfx___x64_sys_openat+0x10/0x10
[  369.978585][   T43]  do_syscall_64+0xc1/0x1d0
[  369.978708][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  369.978877][   T43] RIP: 0033:0x7fb4a75850e8
[  369.979008][   T43] RSP: 002b:00007fffb3f40268 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
[  369.979204][   T43] RAX: ffffffffffffffda RBX: 00007fffb3f4033f RCX: 00007fb4a75850e8
[  369.979427][   T43] RDX: 0000000000080000 RSI: 00007fb4a75565f0 RDI: 00000000ffffff9c
[  369.979602][   T43] RBP: 00007fffb3f402d0 R08: 0000000000080000 R09: 00007fb4a75565f0
[  369.979780][   T43] R10: 0000000000000000 R11: 0000000000000287 R12: 0000000000000000
[  369.979957][   T43] R13: 00007fffb3f40350 R14: 00007fb4a75565f0 R15: 00007fffb3f40350
[  369.980151][   T43]  </TASK>
[  369.980252][   T43] INFO: task (spawn):422 blocked for more than 245 seconds.
[  369.980427][   T43]       Not tainted 6.10.0-virtme #1
[  369.980548][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.980759][   T43] task:(spawn)         state:D stack:26464 pid:422   tgid:422   ppid:1      flags:0x00004006
[  369.980997][   T43] Call Trace:
[  369.981110][   T43]  <TASK>
[  369.981181][   T43]  __schedule+0x6e0/0x17e0
[  369.981321][   T43]  ? __pfx___schedule+0x10/0x10
[  369.981441][   T43]  ? schedule+0x1a5/0x210
[  369.981533][   T43]  ? __pfx___lock_release+0x10/0x10
[  369.981652][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  369.981775][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  369.981926][   T43]  ? schedule+0x1a5/0x210
[  369.982024][   T43]  schedule+0xdf/0x210
[  369.982122][   T43]  d_alloc_parallel+0xaef/0xed0
[  369.982297][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
[  369.982422][   T43]  ? __pfx_default_wake_function+0x10/0x10
[  369.982584][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
[  369.982738][   T43]  __lookup_slow+0x17f/0x3c0
[  369.982859][   T43]  ? __pfx___lookup_slow+0x10/0x10
[  369.982994][   T43]  ? walk_component+0x29e/0x4f0
[  369.983152][   T43]  walk_component+0x2ab/0x4f0
[  369.983293][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
[  369.983483][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
[  369.983641][   T43]  path_openat+0x1be/0x440
[  369.983781][   T43]  ? __pfx_path_openat+0x10/0x10
[  369.983900][   T43]  ? hlock_class+0x4e/0x130
[  369.984034][   T43]  ? mark_lock+0x38/0x3e0
[  369.984135][   T43]  do_filp_open+0x1b3/0x3e0
[  369.984273][   T43]  ? __pfx_do_filp_open+0x10/0x10
[  369.984430][   T43]  ? __lock_release+0x103/0x460
[  369.984552][   T43]  ? __create_object+0x5e/0xb0
[  369.984689][   T43]  ? __pfx___lock_release+0x10/0x10
[  369.984811][   T43]  ? hlock_class+0x4e/0x130
[  369.984957][   T43]  ? mark_held_locks+0x9e/0xe0
[  369.985081][   T43]  ? lockdep_hardirqs_on_prepare+0x275/0x410
[  369.985256][   T43]  do_open_execat+0xb0/0x280
[  369.985395][   T43]  ? __pfx_do_open_execat+0x10/0x10
[  369.985521][   T43]  ? getname_kernel+0x14c/0x2d0
[  369.985647][   T43]  ? __kmalloc_noprof+0x1d7/0x3d0
[  369.985772][   T43]  open_exec+0x2b/0x50
[  369.985867][   T43]  load_elf_binary+0x504/0x2880
[  369.985995][   T43]  ? find_held_lock+0x2c/0x110
[  369.986118][   T43]  ? __lock_release+0x103/0x460
[  369.986255][   T43]  ? search_binary_handler+0x134/0x550
[  369.986399][   T43]  ? __pfx___lock_release+0x10/0x10
[  369.986519][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  369.986661][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  369.986801][   T43]  ? __pfx_load_elf_binary+0x10/0x10
[  369.986926][   T43]  ? search_binary_handler+0x134/0x550
[  369.987070][   T43]  search_binary_handler+0x152/0x550
[  369.987193][   T43]  ? __pfx_search_binary_handler+0x10/0x10
[  369.987351][   T43]  ? __task_pid_nr_ns+0x11a/0x410
[  369.987473][   T43]  ? exec_binprm+0x100/0x3c0
[  369.987598][   T43]  exec_binprm+0x119/0x3c0
[  369.987728][   T43]  bprm_execve+0xe9/0x4f0
[  369.987821][   T43]  do_execveat_common.isra.0+0x3eb/0x530
[  369.987947][   T43]  __x64_sys_execve+0x8c/0xb0
[  369.988070][   T43]  do_syscall_64+0xc1/0x1d0
[  369.988192][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  369.988347][   T43] RIP: 0033:0x7fa4f34f740b
[  369.988472][   T43] RSP: 002b:00007ffd893fd6d8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[  369.988655][   T43] RAX: ffffffffffffffda RBX: 0000560ac3d5daf0 RCX: 00007fa4f34f740b
[  369.988835][   T43] RDX: 0000560ac3d62360 RSI: 0000560ac3d76bc0 RDI: 0000560ac3d8d8f0
[  369.989010][   T43] RBP: 00007ffd893fd800 R08: 0000000000000000 R09: 0000000000000000
[  369.989188][   T43] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd89401890
[  369.989372][   T43] R13: 00007fa4f2c3a9d8 R14: 00007ffd893fd720 R15: 0000000000000000
[  369.989566][   T43]  </TASK>
[  369.989659][   T43] INFO: task (spawn):423 blocked for more than 245 seconds.
[  369.989828][   T43]       Not tainted 6.10.0-virtme #1
[  369.989939][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.990129][   T43] task:(spawn)         state:D stack:26608 pid:423   tgid:423   ppid:1      flags:0x00004006
[  369.990382][   T43] Call Trace:
[  369.990476][   T43]  <TASK>
[  369.990547][   T43]  __schedule+0x6e0/0x17e0
[  369.990674][   T43]  ? __pfx___schedule+0x10/0x10
[  369.990795][   T43]  ? schedule+0x1a5/0x210
[  369.990890][   T43]  ? __pfx___lock_release+0x10/0x10
[  369.991008][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  369.991128][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  369.991321][   T43]  ? schedule+0x1a5/0x210
[  369.991418][   T43]  schedule+0xdf/0x210
[  369.991516][   T43]  d_alloc_parallel+0xaef/0xed0
[  369.991647][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
[  369.991824][   T43]  ? __pfx_default_wake_function+0x10/0x10
[  369.991973][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
[  369.992102][   T43]  __lookup_slow+0x17f/0x3c0
[  369.992234][   T43]  ? __pfx___lookup_slow+0x10/0x10
[  369.992416][   T43]  ? walk_component+0x29e/0x4f0
[  369.992552][   T43]  walk_component+0x2ab/0x4f0
[  369.992694][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
[  369.992846][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
[  369.993134][   T43]  path_openat+0x1be/0x440
[  369.993291][   T43]  ? __pfx_path_openat+0x10/0x10
[  369.993412][   T43]  ? hlock_class+0x4e/0x130
[  369.993572][   T43]  ? mark_lock+0x38/0x3e0
[  369.993671][   T43]  do_filp_open+0x1b3/0x3e0
[  369.993858][   T43]  ? __pfx_do_filp_open+0x10/0x10
[  369.994001][   T43]  ? __lock_release+0x103/0x460
[  369.994124][   T43]  ? __create_object+0x5e/0xb0
[  369.994282][   T43]  ? __pfx___lock_release+0x10/0x10
[  369.994475][   T43]  ? hlock_class+0x4e/0x130
[  369.994611][   T43]  ? mark_held_locks+0x9e/0xe0
[  369.994733][   T43]  ? lockdep_hardirqs_on_prepare+0x275/0x410
[  369.994885][   T43]  do_open_execat+0xb0/0x280
[  369.995060][   T43]  ? __pfx_do_open_execat+0x10/0x10
[  369.995200][   T43]  ? getname_kernel+0x14c/0x2d0
[  369.995349][   T43]  ? __kmalloc_noprof+0x1d7/0x3d0
[  369.995472][   T43]  open_exec+0x2b/0x50
[  369.995619][   T43]  load_elf_binary+0x504/0x2880
[  369.995745][   T43]  ? find_held_lock+0x2c/0x110
[  369.995867][   T43]  ? __lock_release+0x103/0x460
[  369.995986][   T43]  ? search_binary_handler+0x134/0x550
[  369.996105][   T43]  ? __pfx___lock_release+0x10/0x10
[  369.996289][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  369.996414][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  369.996536][   T43]  ? __pfx_load_elf_binary+0x10/0x10
[  369.996670][   T43]  ? search_binary_handler+0x134/0x550
[  369.996863][   T43]  search_binary_handler+0x152/0x550
[  369.997001][   T43]  ? __pfx_search_binary_handler+0x10/0x10
[  369.997154][   T43]  ? __task_pid_nr_ns+0x11a/0x410
[  369.997296][   T43]  ? exec_binprm+0x100/0x3c0
[  369.997475][   T43]  exec_binprm+0x119/0x3c0
[  369.997628][   T43]  bprm_execve+0xe9/0x4f0
[  369.997730][   T43]  do_execveat_common.isra.0+0x3eb/0x530
[  369.997874][   T43]  __x64_sys_execve+0x8c/0xb0
[  369.998058][   T43]  do_syscall_64+0xc1/0x1d0
[  369.998212][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  369.998382][   T43] RIP: 0033:0x7fa4f34f740b
[  369.998530][   T43] RSP: 002b:00007ffd89405918 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[  369.998781][   T43] RAX: ffffffffffffffda RBX: 0000560ac3d94430 RCX: 00007fa4f34f740b
[  369.998962][   T43] RDX: 0000560ac3d5ff20 RSI: 0000560ac3d76cd0 RDI: 0000560ac3d5e240
[  369.999164][   T43] RBP: 00007ffd89405a40 R08: 0000000000000008 R09: 0000000000000000
[  369.999409][   T43] R10: 0000000000000000 R11: 0000000000000246 R12: 0000560ac3d767d0
[  369.999584][   T43] R13: 00007fa4f2c3a9d8 R14: 00007ffd89405960 R15: 0000000000000000
[  369.999775][   T43]  </TASK>
[  369.999921][   T43] INFO: task (spawn):424 blocked for more than 245 seconds.
[  370.000087][   T43]       Not tainted 6.10.0-virtme #1
[  370.000198][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  370.000445][   T43] task:(spawn)         state:D stack:26464 pid:424   tgid:424   ppid:1      flags:0x00004006
[  370.000676][   T43] Call Trace:
[  370.000765][   T43]  <TASK>
[  370.000833][   T43]  __schedule+0x6e0/0x17e0
[  370.001017][   T43]  ? __pfx___schedule+0x10/0x10
[  370.001136][   T43]  ? schedule+0x1a5/0x210
[  370.001239][   T43]  ? __pfx___lock_release+0x10/0x10
[  370.001372][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  370.001513][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  370.001692][   T43]  ? schedule+0x1a5/0x210
[  370.001788][   T43]  schedule+0xdf/0x210
[  370.001885][   T43]  d_alloc_parallel+0xaef/0xed0
[  370.002017][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
[  370.002195][   T43]  ? __pfx_default_wake_function+0x10/0x10
[  370.002364][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
[  370.002497][   T43]  __lookup_slow+0x17f/0x3c0
[  370.002617][   T43]  ? __pfx___lookup_slow+0x10/0x10
[  370.002884][   T43]  ? walk_component+0x29e/0x4f0
[  370.003020][   T43]  walk_component+0x2ab/0x4f0
[  370.003161][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
[  370.003326][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
[  370.003537][   T43]  path_openat+0x1be/0x440
[  370.003660][   T43]  ? __pfx_path_openat+0x10/0x10
[  370.003780][   T43]  ? hlock_class+0x4e/0x130
[  370.003899][   T43]  ? mark_lock+0x38/0x3e0
[  370.003997][   T43]  do_filp_open+0x1b3/0x3e0
[  370.004178][   T43]  ? __pfx_do_filp_open+0x10/0x10
[  370.004318][   T43]  ? __lock_release+0x103/0x460
[  370.004442][   T43]  ? __create_object+0x5e/0xb0
[  370.004562][   T43]  ? __pfx___lock_release+0x10/0x10
[  370.004737][   T43]  ? hlock_class+0x4e/0x130
[  370.004882][   T43]  ? mark_held_locks+0x9e/0xe0
[  370.005005][   T43]  ? lockdep_hardirqs_on_prepare+0x275/0x410
[  370.005160][   T43]  do_open_execat+0xb0/0x280
[  370.005376][   T43]  ? __pfx_do_open_execat+0x10/0x10
[  370.005503][   T43]  ? getname_kernel+0x14c/0x2d0
[  370.005625][   T43]  ? __kmalloc_noprof+0x1d7/0x3d0
[  370.005747][   T43]  open_exec+0x2b/0x50
[  370.005896][   T43]  load_elf_binary+0x504/0x2880
[  370.006022][   T43]  ? find_held_lock+0x2c/0x110
[  370.006144][   T43]  ? __lock_release+0x103/0x460
[  370.006278][   T43]  ? search_binary_handler+0x134/0x550
[  370.006399][   T43]  ? __pfx___lock_release+0x10/0x10
[  370.006572][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  370.006698][   T43]  ? trace_lock_acquire+0x14d/0x1f0
[  370.006819][   T43]  ? __pfx_load_elf_binary+0x10/0x10
[  370.006943][   T43]  ? search_binary_handler+0x134/0x550
[  370.007121][   T43]  search_binary_handler+0x152/0x550
[  370.007252][   T43]  ? __pfx_search_binary_handler+0x10/0x10
[  370.007401][   T43]  ? __task_pid_nr_ns+0x11a/0x410
[  370.007522][   T43]  ? exec_binprm+0x100/0x3c0
[  370.007703][   T43]  exec_binprm+0x119/0x3c0
[  370.007830][   T43]  bprm_execve+0xe9/0x4f0
[  370.007923][   T43]  do_execveat_common.isra.0+0x3eb/0x530
[  370.008047][   T43]  __x64_sys_execve+0x8c/0xb0
[  370.008234][   T43]  do_syscall_64+0xc1/0x1d0
[  370.008359][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  370.008506][   T43] RIP: 0033:0x7fa4f34f740b
[  370.008628][   T43] RSP: 002b:00007ffd893fd6d8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[  370.008863][   T43] RAX: ffffffffffffffda RBX: 0000560ac3d2df30 RCX: 00007fa4f34f740b
[  370.009043][   T43] RDX: 0000560ac3d7ff10 RSI: 0000560ac3d8d940 RDI: 0000560ac3d33950
[  370.009216][   T43] RBP: 00007ffd893fd800 R08: 0000000000000000 R09: 0000000000000000
[  370.009461][   T43] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd89401890
[  370.009635][   T43] R13: 00007fa4f2c3a9d8 R14: 00007ffd893fd720 R15: 0000000000000000
[  370.009830][   T43]  </TASK>
[  370.009975][   T43] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
[  370.010169][   T43] 
[  370.010169][   T43] Showing all locks held in the system:
[  370.010359][   T43] 1 lock held by khungtaskd/43:
[  370.010531][   T43]  #0: ffffffff97b54b80 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x70/0x3a0
[  370.010753][   T43] 1 lock held by ip/419:
[  370.010840][   T43]  #0: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
[  370.011146][   T43] 1 lock held by rename_device/421:
[  370.011269][   T43]  #0: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
[  370.011511][   T43] 2 locks held by (spawn)/422:
[  370.011678][   T43]  #0: ffff888005bc4238 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x51/0x4f0
[  370.011888][   T43]  #1: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
[  370.012133][   T43] 2 locks held by (spawn)/423:
[  370.012365][   T43]  #0: ffff888005bc0578 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x51/0x4f0
[  370.012579][   T43]  #1: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
[  370.012874][   T43] 2 locks held by (spawn)/424:
[  370.012988][   T43]  #0: ffff888005bc0c38 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x51/0x4f0
[  370.013202][   T43]  #1: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
[  370.013512][   T43] 
[  370.013571][   T43] =============================================
[  370.013571][   T43] 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hangs after merge window changes
  2024-07-17 17:24 hangs after merge window changes Jakub Kicinski
@ 2024-07-17 18:01 ` Eric Van Hensbergen
  2024-07-17 20:45   ` Dominique Martinet
  2024-07-17 21:42   ` Jakub Kicinski
  0 siblings, 2 replies; 14+ messages in thread
From: Eric Van Hensbergen @ 2024-07-17 18:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck, v9fs

I held way back on adding anything new over the past cycle, so this
sounds likely some external change maybe perturbing an older issue.
What guest side mount options, qemu options and qemu version are you
using?

Dominique, you aware of anything coming in that might have messed with
this?  Maybe the new common code refactoring underpinning
readahead/etc.?

      -eric

On Wed, Jul 17, 2024 at 12:24 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Hi!
>
> Apologies for possible duplicate, I tried to search lore and haven't
> found anything, but also I can't find the exact 9p archive there :S
>
> After forwarding to Linus's tree yesterday, we see hangs in networking
> CI (which uses 9p over virtio to mount host FS), they seem sporadic,
> and are not very reproducible:
>
> https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/686601/10-bridge-mdb-max-sh/stderr
> https://netdev-3.bots.linux.dev/vmksft-netdevsim-dbg/results/687140/5-ethtool-coalesce-sh/stderr
> https://netdev-3.bots.linux.dev/vmksft-netdevsim-dbg/results/687140/6-devlink-sh/stderr
> https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/686781/3-test-vxlan-mdb-sh/stderr
> https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/686601/61-router-bridge-sh/stderr
>
> We only got those 5 over 24h, meaning so roughly 1 in 500 runs hits
> this.
>
> For the benefits of the ML here's one of the five:
>
> [  247.084355][   T43] INFO: task ip:419 blocked for more than 122 seconds.
> [  247.084693][   T43]       Not tainted 6.10.0-virtme #1
> [  247.084816][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  247.085041][   T43] task:ip              state:D stack:26288 pid:419   tgid:419   ppid:233    flags:0x00000000
> [  247.085333][   T43] Call Trace:
> [  247.085477][   T43]  <TASK>
> [  247.085548][   T43]  __schedule+0x6e0/0x17e0
> [  247.085681][   T43]  ? __pfx___schedule+0x10/0x10
> [  247.085805][   T43]  ? schedule+0x1a5/0x210
> [  247.085925][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  247.086049][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  247.086178][   T43]  ? schedule+0x1a5/0x210
> [  247.086531][   T43]  schedule+0xdf/0x210
> [  247.086643][   T43]  d_alloc_parallel+0xaef/0xed0
> [  247.086788][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
> [  247.086914][   T43]  ? __pfx_default_wake_function+0x10/0x10
> [  247.087085][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
> [  247.087214][   T43]  __lookup_slow+0x17f/0x3c0
> [  247.087357][   T43]  ? __pfx___lookup_slow+0x10/0x10
> [  247.087486][   T43]  ? walk_component+0x29e/0x4f0
> [  247.087626][   T43]  walk_component+0x2ab/0x4f0
> [  247.087753][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
> [  247.087921][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
> [  247.088092][   T43]  path_openat+0x1be/0x440
> [  247.088240][   T43]  ? __pfx_path_openat+0x10/0x10
> [  247.088366][   T43]  ? __lock_acquire+0xaf0/0x1570
> [  247.088492][   T43]  do_filp_open+0x1b3/0x3e0
> [  247.088616][   T43]  ? __pfx_do_filp_open+0x10/0x10
> [  247.088736][   T43]  ? find_held_lock+0x2c/0x110
> [  247.088871][   T43]  ? do_raw_spin_lock+0x131/0x270
> [  247.088995][   T43]  ? __pfx_do_raw_spin_lock+0x10/0x10
> [  247.089118][   T43]  ? alloc_fd+0x1f5/0x650
> [  247.089214][   T43]  ? do_raw_spin_unlock+0x58/0x220
> [  247.089367][   T43]  ? _raw_spin_unlock+0x23/0x40
> [  247.089493][   T43]  ? alloc_fd+0x1f5/0x650
> [  247.089624][   T43]  do_sys_openat2+0x122/0x160
> [  247.089777][   T43]  ? __pfx_do_sys_openat2+0x10/0x10
> [  247.089901][   T43]  ? __pfx___debug_check_no_obj_freed+0x10/0x10
> [  247.090060][   T43]  ? __virt_addr_valid+0x230/0x430
> [  247.090192][   T43]  __x64_sys_openat+0x123/0x1e0
> [  247.090329][   T43]  ? __pfx___x64_sys_openat+0x10/0x10
> [  247.090484][   T43]  do_syscall_64+0xc1/0x1d0
> [  247.090607][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [  247.090758][   T43] RIP: 0033:0x7f4b57a460e8
> [  247.090901][   T43] RSP: 002b:00007ffd58925788 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
> [  247.091085][   T43] RAX: ffffffffffffffda RBX: 00007ffd58925a0f RCX: 00007f4b57a460e8
> [  247.091303][   T43] RDX: 0000000000080000 RSI: 00007ffd58925800 RDI: 00000000ffffff9c
> [  247.091500][   T43] RBP: 00007ffd589257f0 R08: 0000000000080000 R09: 00007ffd58925800
> [  247.091682][   T43] R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffd58925807
> [  247.091873][   T43] R13: 00007ffd58925a20 R14: 00007ffd58925800 R15: 00007f4b57a17000
> [  247.092070][   T43]  </TASK>
> [  247.092167][   T43] INFO: task rename_device:421 blocked for more than 122 seconds.
> [  247.092346][   T43]       Not tainted 6.10.0-virtme #1
> [  247.092467][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  247.092690][   T43] task:rename_device   state:D stack:26464 pid:421   tgid:421   ppid:1      flags:0x00004004
> [  247.092926][   T43] Call Trace:
> [  247.093016][   T43]  <TASK>
> [  247.093110][   T43]  __schedule+0x6e0/0x17e0
> [  247.093265][   T43]  ? __pfx___schedule+0x10/0x10
> [  247.093392][   T43]  ? schedule+0x1a5/0x210
> [  247.093497][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  247.093636][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  247.093763][   T43]  ? schedule+0x1a5/0x210
> [  247.093861][   T43]  schedule+0xdf/0x210
> [  247.093963][   T43]  __wait_on_freeing_inode+0x115/0x280
> [  247.094084][   T43]  ? __pfx___wait_on_freeing_inode+0x10/0x10
> [  247.094249][   T43]  ? __pfx_wake_bit_function+0x10/0x10
> [  247.094395][   T43]  ? lock_acquire+0x32/0xc0
> [  247.094525][   T43]  ? find_inode_fast+0x158/0x450
> [  247.094654][   T43]  find_inode_fast+0x18d/0x450
> [  247.094779][   T43]  iget_locked+0x7d/0x390
> [  247.094895][   T43]  ? hlock_class+0x4e/0x130
> [  247.095029][   T43]  v9fs_fid_iget_dotl+0x78/0x2d0
> [  247.095156][   T43]  v9fs_vfs_lookup.part.0+0x1ed/0x390
> [  247.095304][   T43]  ? __pfx_v9fs_vfs_lookup.part.0+0x10/0x10
> [  247.095455][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
> [  247.095598][   T43]  __lookup_slow+0x209/0x3c0
> [  247.095721][   T43]  ? __pfx___lookup_slow+0x10/0x10
> [  247.095865][   T43]  ? walk_component+0x29e/0x4f0
> [  247.096013][   T43]  walk_component+0x2ab/0x4f0
> [  247.096137][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
> [  247.096335][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
> [  247.096492][   T43]  path_openat+0x1be/0x440
> [  247.096633][   T43]  ? __pfx_path_openat+0x10/0x10
> [  247.096770][   T43]  ? __lock_acquire+0xaf0/0x1570
> [  247.096895][   T43]  do_filp_open+0x1b3/0x3e0
> [  247.097025][   T43]  ? __pfx_do_filp_open+0x10/0x10
> [  247.097144][   T43]  ? find_held_lock+0x2c/0x110
> [  247.097310][   T43]  ? do_raw_spin_lock+0x131/0x270
> [  247.097433][   T43]  ? __pfx_do_raw_spin_lock+0x10/0x10
> [  247.097567][   T43]  ? alloc_fd+0x1f5/0x650
> [  247.097674][   T43]  ? do_raw_spin_unlock+0x58/0x220
> [  247.097797][   T43]  ? _raw_spin_unlock+0x23/0x40
> [  247.097923][   T43]  ? alloc_fd+0x1f5/0x650
> [  247.098022][   T43]  do_sys_openat2+0x122/0x160
> [  247.098144][   T43]  ? __pfx_do_sys_openat2+0x10/0x10
> [  247.098276][   T43]  ? __up_read+0x1a7/0x780
> [  247.098402][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  247.098523][   T43]  ? __pfx___up_read+0x10/0x10
> [  247.098650][   T43]  __x64_sys_openat+0x123/0x1e0
> [  247.098770][   T43]  ? __pfx___x64_sys_openat+0x10/0x10
> [  247.098907][   T43]  do_syscall_64+0xc1/0x1d0
> [  247.099029][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [  247.099178][   T43] RIP: 0033:0x7fb4a75850e8
> [  247.099311][   T43] RSP: 002b:00007fffb3f40268 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
> [  247.099490][   T43] RAX: ffffffffffffffda RBX: 00007fffb3f4033f RCX: 00007fb4a75850e8
> [  247.099666][   T43] RDX: 0000000000080000 RSI: 00007fb4a75565f0 RDI: 00000000ffffff9c
> [  247.099844][   T43] RBP: 00007fffb3f402d0 R08: 0000000000080000 R09: 00007fb4a75565f0
> [  247.100022][   T43] R10: 0000000000000000 R11: 0000000000000287 R12: 0000000000000000
> [  247.100203][   T43] R13: 00007fffb3f40350 R14: 00007fb4a75565f0 R15: 00007fffb3f40350
> [  247.100402][   T43]  </TASK>
> [  247.100496][   T43] INFO: task (spawn):422 blocked for more than 122 seconds.
> [  247.100660][   T43]       Not tainted 6.10.0-virtme #1
> [  247.100777][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  247.100969][   T43] task:(spawn)         state:D stack:26464 pid:422   tgid:422   ppid:1      flags:0x00004006
> [  247.101202][   T43] Call Trace:
> [  247.101300][   T43]  <TASK>
> [  247.101367][   T43]  __schedule+0x6e0/0x17e0
> [  247.101494][   T43]  ? __pfx___schedule+0x10/0x10
> [  247.101614][   T43]  ? schedule+0x1a5/0x210
> [  247.101710][   T43]  ? __pfx___lock_release+0x10/0x10
> [  247.101828][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  247.101950][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  247.102080][   T43]  ? schedule+0x1a5/0x210
> [  247.102176][   T43]  schedule+0xdf/0x210
> [  247.102280][   T43]  d_alloc_parallel+0xaef/0xed0
> [  247.102411][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
> [  247.102536][   T43]  ? __pfx_default_wake_function+0x10/0x10
> [  247.102690][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
> [  247.102817][   T43]  __lookup_slow+0x17f/0x3c0
> [  247.102963][   T43]  ? __pfx___lookup_slow+0x10/0x10
> [  247.103092][   T43]  ? walk_component+0x29e/0x4f0
> [  247.103251][   T43]  walk_component+0x2ab/0x4f0
> [  247.103375][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
> [  247.103549][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
> [  247.103712][   T43]  path_openat+0x1be/0x440
> [  247.103838][   T43]  ? __pfx_path_openat+0x10/0x10
> [  247.103997][   T43]  ? hlock_class+0x4e/0x130
> [  247.104117][   T43]  ? mark_lock+0x38/0x3e0
> [  247.104244][   T43]  do_filp_open+0x1b3/0x3e0
> [  247.104375][   T43]  ? __pfx_do_filp_open+0x10/0x10
> [  247.104507][   T43]  ? __lock_release+0x103/0x460
> [  247.104625][   T43]  ? __create_object+0x5e/0xb0
> [  247.104746][   T43]  ? __pfx___lock_release+0x10/0x10
> [  247.104868][   T43]  ? hlock_class+0x4e/0x130
> [  247.104999][   T43]  ? mark_held_locks+0x9e/0xe0
> [  247.105120][   T43]  ? lockdep_hardirqs_on_prepare+0x275/0x410
> [  247.105286][   T43]  do_open_execat+0xb0/0x280
> [  247.105421][   T43]  ? __pfx_do_open_execat+0x10/0x10
> [  247.105544][   T43]  ? getname_kernel+0x14c/0x2d0
> [  247.105689][   T43]  ? __kmalloc_noprof+0x1d7/0x3d0
> [  247.105814][   T43]  open_exec+0x2b/0x50
> [  247.105909][   T43]  load_elf_binary+0x504/0x2880
> [  247.106069][   T43]  ? find_held_lock+0x2c/0x110
> [  247.106195][   T43]  ? __lock_release+0x103/0x460
> [  247.106345][   T43]  ? search_binary_handler+0x134/0x550
> [  247.106480][   T43]  ? __pfx___lock_release+0x10/0x10
> [  247.106599][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  247.106728][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  247.106847][   T43]  ? __pfx_load_elf_binary+0x10/0x10
> [  247.106991][   T43]  ? search_binary_handler+0x134/0x550
> [  247.107126][   T43]  search_binary_handler+0x152/0x550
> [  247.107258][   T43]  ? __pfx_search_binary_handler+0x10/0x10
> [  247.107438][   T43]  ? __task_pid_nr_ns+0x11a/0x410
> [  247.107564][   T43]  ? exec_binprm+0x100/0x3c0
> [  247.107707][   T43]  exec_binprm+0x119/0x3c0
> [  247.107833][   T43]  bprm_execve+0xe9/0x4f0
> [  247.107927][   T43]  do_execveat_common.isra.0+0x3eb/0x530
> [  247.108064][   T43]  __x64_sys_execve+0x8c/0xb0
> [  247.108193][   T43]  do_syscall_64+0xc1/0x1d0
> [  247.108344][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [  247.108494][   T43] RIP: 0033:0x7fa4f34f740b
> [  247.108650][   T43] RSP: 002b:00007ffd893fd6d8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
> [  247.108832][   T43] RAX: ffffffffffffffda RBX: 0000560ac3d5daf0 RCX: 00007fa4f34f740b
> [  247.109026][   T43] RDX: 0000560ac3d62360 RSI: 0000560ac3d76bc0 RDI: 0000560ac3d8d8f0
> [  247.109213][   T43] RBP: 00007ffd893fd800 R08: 0000000000000000 R09: 0000000000000000
> [  247.109403][   T43] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd89401890
> [  247.109604][   T43] R13: 00007fa4f2c3a9d8 R14: 00007ffd893fd720 R15: 0000000000000000
> [  247.109792][   T43]  </TASK>
> [  247.109885][   T43] INFO: task (spawn):423 blocked for more than 122 seconds.
> [  247.110074][   T43]       Not tainted 6.10.0-virtme #1
> [  247.110188][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  247.110412][   T43] task:(spawn)         state:D stack:26608 pid:423   tgid:423   ppid:1      flags:0x00004006
> [  247.110654][   T43] Call Trace:
> [  247.110744][   T43]  <TASK>
> [  247.110820][   T43]  __schedule+0x6e0/0x17e0
> [  247.110961][   T43]  ? __pfx___schedule+0x10/0x10
> [  247.111080][   T43]  ? schedule+0x1a5/0x210
> [  247.111196][   T43]  ? __pfx___lock_release+0x10/0x10
> [  247.111336][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  247.111458][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  247.111599][   T43]  ? schedule+0x1a5/0x210
> [  247.111693][   T43]  schedule+0xdf/0x210
> [  247.111786][   T43]  d_alloc_parallel+0xaef/0xed0
> [  247.111928][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
> [  247.112051][   T43]  ? __pfx_default_wake_function+0x10/0x10
> [  247.112239][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
> [  247.112367][   T43]  __lookup_slow+0x17f/0x3c0
> [  247.112490][   T43]  ? __pfx___lookup_slow+0x10/0x10
> [  247.112618][   T43]  ? walk_component+0x29e/0x4f0
> [  247.112753][   T43]  walk_component+0x2ab/0x4f0
> [  247.112877][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
> [  247.113028][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
> [  247.113186][   T43]  path_openat+0x1be/0x440
> [  247.113362][   T43]  ? __pfx_path_openat+0x10/0x10
> [  247.113483][   T43]  ? hlock_class+0x4e/0x130
> [  247.113612][   T43]  ? mark_lock+0x38/0x3e0
> [  247.113729][   T43]  do_filp_open+0x1b3/0x3e0
> [  247.113850][   T43]  ? __pfx_do_filp_open+0x10/0x10
> [  247.113989][   T43]  ? __lock_release+0x103/0x460
> [  247.114107][   T43]  ? __create_object+0x5e/0xb0
> [  247.114255][   T43]  ? __pfx___lock_release+0x10/0x10
> [  247.114388][   T43]  ? hlock_class+0x4e/0x130
> [  247.114514][   T43]  ? mark_held_locks+0x9e/0xe0
> [  247.114653][   T43]  ? lockdep_hardirqs_on_prepare+0x275/0x410
> [  247.114814][   T43]  do_open_execat+0xb0/0x280
> [  247.114947][   T43]  ? __pfx_do_open_execat+0x10/0x10
> [  247.115089][   T43]  ? getname_kernel+0x14c/0x2d0
> [  247.115206][   T43]  ? __kmalloc_noprof+0x1d7/0x3d0
> [  247.115347][   T43]  open_exec+0x2b/0x50
> [  247.115443][   T43]  load_elf_binary+0x504/0x2880
> [  247.115591][   T43]  ? find_held_lock+0x2c/0x110
> [  247.115713][   T43]  ? __lock_release+0x103/0x460
> [  247.115838][   T43]  ? search_binary_handler+0x134/0x550
> [  247.115984][   T43]  ? __pfx___lock_release+0x10/0x10
> [  247.116103][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  247.116272][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  247.116395][   T43]  ? __pfx_load_elf_binary+0x10/0x10
> [  247.116526][   T43]  ? search_binary_handler+0x134/0x550
> [  247.116658][   T43]  search_binary_handler+0x152/0x550
> [  247.116801][   T43]  ? __pfx_search_binary_handler+0x10/0x10
> [  247.116950][   T43]  ? __task_pid_nr_ns+0x11a/0x410
> [  247.117081][   T43]  ? exec_binprm+0x100/0x3c0
> [  247.117248][   T43]  exec_binprm+0x119/0x3c0
> [  247.117378][   T43]  bprm_execve+0xe9/0x4f0
> [  247.117472][   T43]  do_execveat_common.isra.0+0x3eb/0x530
> [  247.117596][   T43]  __x64_sys_execve+0x8c/0xb0
> [  247.117725][   T43]  do_syscall_64+0xc1/0x1d0
> [  247.117847][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [  247.117994][   T43] RIP: 0033:0x7fa4f34f740b
> [  247.118116][   T43] RSP: 002b:00007ffd89405918 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
> [  247.118309][   T43] RAX: ffffffffffffffda RBX: 0000560ac3d94430 RCX: 00007fa4f34f740b
> [  247.118483][   T43] RDX: 0000560ac3d5ff20 RSI: 0000560ac3d76cd0 RDI: 0000560ac3d5e240
> [  247.118660][   T43] RBP: 00007ffd89405a40 R08: 0000000000000008 R09: 0000000000000000
> [  247.118838][   T43] R10: 0000000000000000 R11: 0000000000000246 R12: 0000560ac3d767d0
> [  247.119013][   T43] R13: 00007fa4f2c3a9d8 R14: 00007ffd89405960 R15: 0000000000000000
> [  247.119201][   T43]  </TASK>
> [  247.119300][   T43] INFO: task (spawn):424 blocked for more than 122 seconds.
> [  247.119463][   T43]       Not tainted 6.10.0-virtme #1
> [  247.119583][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  247.119772][   T43] task:(spawn)         state:D stack:26464 pid:424   tgid:424   ppid:1      flags:0x00004006
> [  247.120005][   T43] Call Trace:
> [  247.120096][   T43]  <TASK>
> [  247.120161][   T43]  __schedule+0x6e0/0x17e0
> [  247.120298][   T43]  ? __pfx___schedule+0x10/0x10
> [  247.120417][   T43]  ? schedule+0x1a5/0x210
> [  247.120519][   T43]  ? __pfx___lock_release+0x10/0x10
> [  247.120636][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  247.120756][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  247.120881][   T43]  ? schedule+0x1a5/0x210
> [  247.120976][   T43]  schedule+0xdf/0x210
> [  247.121069][   T43]  d_alloc_parallel+0xaef/0xed0
> [  247.121200][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
> [  247.121443][   T43]  ? __pfx_default_wake_function+0x10/0x10
> [  247.121596][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
> [  247.121728][   T43]  __lookup_slow+0x17f/0x3c0
> [  247.121849][   T43]  ? __pfx___lookup_slow+0x10/0x10
> [  247.122028][   T43]  ? walk_component+0x29e/0x4f0
> [  247.122162][   T43]  walk_component+0x2ab/0x4f0
> [  247.122295][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
> [  247.122447][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
> [  247.122713][   T43]  path_openat+0x1be/0x440
> [  247.122839][   T43]  ? __pfx_path_openat+0x10/0x10
> [  247.122959][   T43]  ? hlock_class+0x4e/0x130
> [  247.123078][   T43]  ? mark_lock+0x38/0x3e0
> [  247.123175][   T43]  do_filp_open+0x1b3/0x3e0
> [  247.123379][   T43]  ? __pfx_do_filp_open+0x10/0x10
> [  247.123511][   T43]  ? __lock_release+0x103/0x460
> [  247.123628][   T43]  ? __create_object+0x5e/0xb0
> [  247.123775][   T43]  ? __pfx___lock_release+0x10/0x10
> [  247.123990][   T43]  ? hlock_class+0x4e/0x130
> [  247.124128][   T43]  ? mark_held_locks+0x9e/0xe0
> [  247.124263][   T43]  ? lockdep_hardirqs_on_prepare+0x275/0x410
> [  247.124444][   T43]  do_open_execat+0xb0/0x280
> [  247.124622][   T43]  ? __pfx_do_open_execat+0x10/0x10
> [  247.124747][   T43]  ? getname_kernel+0x14c/0x2d0
> [  247.124896][   T43]  ? __kmalloc_noprof+0x1d7/0x3d0
> [  247.125019][   T43]  open_exec+0x2b/0x50
> [  247.125194][   T43]  load_elf_binary+0x504/0x2880
> [  247.125335][   T43]  ? find_held_lock+0x2c/0x110
> [  247.125466][   T43]  ? __lock_release+0x103/0x460
> [  247.125589][   T43]  ? search_binary_handler+0x134/0x550
> [  247.125723][   T43]  ? __pfx___lock_release+0x10/0x10
> [  247.125916][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  247.126036][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  247.126184][   T43]  ? __pfx_load_elf_binary+0x10/0x10
> [  247.126330][   T43]  ? search_binary_handler+0x134/0x550
> [  247.126536][   T43]  search_binary_handler+0x152/0x550
> [  247.126659][   T43]  ? __pfx_search_binary_handler+0x10/0x10
> [  247.126819][   T43]  ? __task_pid_nr_ns+0x11a/0x410
> [  247.126943][   T43]  ? exec_binprm+0x100/0x3c0
> [  247.127144][   T43]  exec_binprm+0x119/0x3c0
> [  247.127281][   T43]  bprm_execve+0xe9/0x4f0
> [  247.127403][   T43]  do_execveat_common.isra.0+0x3eb/0x530
> [  247.127529][   T43]  __x64_sys_execve+0x8c/0xb0
> [  247.127732][   T43]  do_syscall_64+0xc1/0x1d0
> [  247.127854][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [  247.128016][   T43] RIP: 0033:0x7fa4f34f740b
> [  247.128140][   T43] RSP: 002b:00007ffd893fd6d8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
> [  247.128413][   T43] RAX: ffffffffffffffda RBX: 0000560ac3d2df30 RCX: 00007fa4f34f740b
> [  247.128601][   T43] RDX: 0000560ac3d7ff10 RSI: 0000560ac3d8d940 RDI: 0000560ac3d33950
> [  247.128798][   T43] RBP: 00007ffd893fd800 R08: 0000000000000000 R09: 0000000000000000
> [  247.129060][   T43] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd89401890
> [  247.129249][   T43] R13: 00007fa4f2c3a9d8 R14: 00007ffd893fd720 R15: 0000000000000000
> [  247.129440][   T43]  </TASK>
> [  247.129587][   T43]
> [  247.129587][   T43] Showing all locks held in the system:
> [  247.129764][   T43] 1 lock held by khungtaskd/43:
> [  247.129901][   T43]  #0: ffffffff97b54b80 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x70/0x3a0
> [  247.130210][   T43] 1 lock held by ip/419:
> [  247.130311][   T43]  #0: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
> [  247.130587][   T43] 1 lock held by rename_device/421:
> [  247.130713][   T43]  #0: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
> [  247.131030][   T43] 2 locks held by (spawn)/422:
> [  247.131146][   T43]  #0: ffff888005bc4238 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x51/0x4f0
> [  247.131431][   T43]  #1: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
> [  247.131698][   T43] 2 locks held by (spawn)/423:
> [  247.131813][   T43]  #0: ffff888005bc0578 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x51/0x4f0
> [  247.132112][   T43]  #1: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
> [  247.132379][   T43] 2 locks held by (spawn)/424:
> [  247.132502][   T43]  #0: ffff888005bc0c38 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x51/0x4f0
> [  247.132852][   T43]  #1: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
> [  247.133095][   T43]
> [  247.133169][   T43] =============================================
> [  247.133169][   T43]
> [  369.964331][   T43] INFO: task ip:419 blocked for more than 245 seconds.
> [  369.964619][   T43]       Not tainted 6.10.0-virtme #1
> [  369.964734][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  369.964953][   T43] task:ip              state:D stack:26288 pid:419   tgid:419   ppid:233    flags:0x00000000
> [  369.965192][   T43] Call Trace:
> [  369.965327][   T43]  <TASK>
> [  369.965400][   T43]  __schedule+0x6e0/0x17e0
> [  369.965536][   T43]  ? __pfx___schedule+0x10/0x10
> [  369.965659][   T43]  ? schedule+0x1a5/0x210
> [  369.965757][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  369.965881][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  369.966011][   T43]  ? schedule+0x1a5/0x210
> [  369.966113][   T43]  schedule+0xdf/0x210
> [  369.966209][   T43]  d_alloc_parallel+0xaef/0xed0
> [  369.966364][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
> [  369.966502][   T43]  ? __pfx_default_wake_function+0x10/0x10
> [  369.966657][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
> [  369.966785][   T43]  __lookup_slow+0x17f/0x3c0
> [  369.966907][   T43]  ? __pfx___lookup_slow+0x10/0x10
> [  369.967033][   T43]  ? walk_component+0x29e/0x4f0
> [  369.967172][   T43]  walk_component+0x2ab/0x4f0
> [  369.967305][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
> [  369.967462][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
> [  369.967617][   T43]  path_openat+0x1be/0x440
> [  369.967743][   T43]  ? __pfx_path_openat+0x10/0x10
> [  369.967867][   T43]  ? __lock_acquire+0xaf0/0x1570
> [  369.967993][   T43]  do_filp_open+0x1b3/0x3e0
> [  369.968115][   T43]  ? __pfx_do_filp_open+0x10/0x10
> [  369.968243][   T43]  ? find_held_lock+0x2c/0x110
> [  369.968374][   T43]  ? do_raw_spin_lock+0x131/0x270
> [  369.968499][   T43]  ? __pfx_do_raw_spin_lock+0x10/0x10
> [  369.968620][   T43]  ? alloc_fd+0x1f5/0x650
> [  369.968714][   T43]  ? do_raw_spin_unlock+0x58/0x220
> [  369.968837][   T43]  ? _raw_spin_unlock+0x23/0x40
> [  369.968960][   T43]  ? alloc_fd+0x1f5/0x650
> [  369.969059][   T43]  do_sys_openat2+0x122/0x160
> [  369.969181][   T43]  ? __pfx_do_sys_openat2+0x10/0x10
> [  369.969312][   T43]  ? __pfx___debug_check_no_obj_freed+0x10/0x10
> [  369.969465][   T43]  ? __virt_addr_valid+0x230/0x430
> [  369.969591][   T43]  __x64_sys_openat+0x123/0x1e0
> [  369.969713][   T43]  ? __pfx___x64_sys_openat+0x10/0x10
> [  369.969869][   T43]  do_syscall_64+0xc1/0x1d0
> [  369.969996][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [  369.970156][   T43] RIP: 0033:0x7f4b57a460e8
> [  369.970330][   T43] RSP: 002b:00007ffd58925788 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
> [  369.970663][   T43] RAX: ffffffffffffffda RBX: 00007ffd58925a0f RCX: 00007f4b57a460e8
> [  369.970849][   T43] RDX: 0000000000080000 RSI: 00007ffd58925800 RDI: 00000000ffffff9c
> [  369.971027][   T43] RBP: 00007ffd589257f0 R08: 0000000000080000 R09: 00007ffd58925800
> [  369.971241][   T43] R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffd58925807
> [  369.971439][   T43] R13: 00007ffd58925a20 R14: 00007ffd58925800 R15: 00007f4b57a17000
> [  369.971675][   T43]  </TASK>
> [  369.971769][   T43] INFO: task rename_device:421 blocked for more than 245 seconds.
> [  369.971933][   T43]       Not tainted 6.10.0-virtme #1
> [  369.972051][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  369.972296][   T43] task:rename_device   state:D stack:26464 pid:421   tgid:421   ppid:1      flags:0x00004004
> [  369.972551][   T43] Call Trace:
> [  369.972648][   T43]  <TASK>
> [  369.972715][   T43]  __schedule+0x6e0/0x17e0
> [  369.972860][   T43]  ? __pfx___schedule+0x10/0x10
> [  369.972980][   T43]  ? schedule+0x1a5/0x210
> [  369.973082][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  369.973237][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  369.973364][   T43]  ? schedule+0x1a5/0x210
> [  369.973463][   T43]  schedule+0xdf/0x210
> [  369.973555][   T43]  __wait_on_freeing_inode+0x115/0x280
> [  369.973687][   T43]  ? __pfx___wait_on_freeing_inode+0x10/0x10
> [  369.973840][   T43]  ? __pfx_wake_bit_function+0x10/0x10
> [  369.973966][   T43]  ? lock_acquire+0x32/0xc0
> [  369.974093][   T43]  ? find_inode_fast+0x158/0x450
> [  369.974257][   T43]  find_inode_fast+0x18d/0x450
> [  369.974383][   T43]  iget_locked+0x7d/0x390
> [  369.974501][   T43]  ? hlock_class+0x4e/0x130
> [  369.974629][   T43]  v9fs_fid_iget_dotl+0x78/0x2d0
> [  369.974762][   T43]  v9fs_vfs_lookup.part.0+0x1ed/0x390
> [  369.974902][   T43]  ? __pfx_v9fs_vfs_lookup.part.0+0x10/0x10
> [  369.975059][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
> [  369.975189][   T43]  __lookup_slow+0x209/0x3c0
> [  369.975347][   T43]  ? __pfx___lookup_slow+0x10/0x10
> [  369.975483][   T43]  ? walk_component+0x29e/0x4f0
> [  369.975644][   T43]  walk_component+0x2ab/0x4f0
> [  369.975770][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
> [  369.975964][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
> [  369.976120][   T43]  path_openat+0x1be/0x440
> [  369.976269][   T43]  ? __pfx_path_openat+0x10/0x10
> [  369.976394][   T43]  ? __lock_acquire+0xaf0/0x1570
> [  369.976534][   T43]  do_filp_open+0x1b3/0x3e0
> [  369.976655][   T43]  ? __pfx_do_filp_open+0x10/0x10
> [  369.976796][   T43]  ? find_held_lock+0x2c/0x110
> [  369.976935][   T43]  ? do_raw_spin_lock+0x131/0x270
> [  369.977054][   T43]  ? __pfx_do_raw_spin_lock+0x10/0x10
> [  369.977181][   T43]  ? alloc_fd+0x1f5/0x650
> [  369.977283][   T43]  ? do_raw_spin_unlock+0x58/0x220
> [  369.977407][   T43]  ? _raw_spin_unlock+0x23/0x40
> [  369.977547][   T43]  ? alloc_fd+0x1f5/0x650
> [  369.977648][   T43]  do_sys_openat2+0x122/0x160
> [  369.977769][   T43]  ? __pfx_do_sys_openat2+0x10/0x10
> [  369.977908][   T43]  ? __up_read+0x1a7/0x780
> [  369.978036][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  369.978162][   T43]  ? __pfx___up_read+0x10/0x10
> [  369.978323][   T43]  __x64_sys_openat+0x123/0x1e0
> [  369.978446][   T43]  ? __pfx___x64_sys_openat+0x10/0x10
> [  369.978585][   T43]  do_syscall_64+0xc1/0x1d0
> [  369.978708][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [  369.978877][   T43] RIP: 0033:0x7fb4a75850e8
> [  369.979008][   T43] RSP: 002b:00007fffb3f40268 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
> [  369.979204][   T43] RAX: ffffffffffffffda RBX: 00007fffb3f4033f RCX: 00007fb4a75850e8
> [  369.979427][   T43] RDX: 0000000000080000 RSI: 00007fb4a75565f0 RDI: 00000000ffffff9c
> [  369.979602][   T43] RBP: 00007fffb3f402d0 R08: 0000000000080000 R09: 00007fb4a75565f0
> [  369.979780][   T43] R10: 0000000000000000 R11: 0000000000000287 R12: 0000000000000000
> [  369.979957][   T43] R13: 00007fffb3f40350 R14: 00007fb4a75565f0 R15: 00007fffb3f40350
> [  369.980151][   T43]  </TASK>
> [  369.980252][   T43] INFO: task (spawn):422 blocked for more than 245 seconds.
> [  369.980427][   T43]       Not tainted 6.10.0-virtme #1
> [  369.980548][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  369.980759][   T43] task:(spawn)         state:D stack:26464 pid:422   tgid:422   ppid:1      flags:0x00004006
> [  369.980997][   T43] Call Trace:
> [  369.981110][   T43]  <TASK>
> [  369.981181][   T43]  __schedule+0x6e0/0x17e0
> [  369.981321][   T43]  ? __pfx___schedule+0x10/0x10
> [  369.981441][   T43]  ? schedule+0x1a5/0x210
> [  369.981533][   T43]  ? __pfx___lock_release+0x10/0x10
> [  369.981652][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  369.981775][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  369.981926][   T43]  ? schedule+0x1a5/0x210
> [  369.982024][   T43]  schedule+0xdf/0x210
> [  369.982122][   T43]  d_alloc_parallel+0xaef/0xed0
> [  369.982297][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
> [  369.982422][   T43]  ? __pfx_default_wake_function+0x10/0x10
> [  369.982584][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
> [  369.982738][   T43]  __lookup_slow+0x17f/0x3c0
> [  369.982859][   T43]  ? __pfx___lookup_slow+0x10/0x10
> [  369.982994][   T43]  ? walk_component+0x29e/0x4f0
> [  369.983152][   T43]  walk_component+0x2ab/0x4f0
> [  369.983293][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
> [  369.983483][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
> [  369.983641][   T43]  path_openat+0x1be/0x440
> [  369.983781][   T43]  ? __pfx_path_openat+0x10/0x10
> [  369.983900][   T43]  ? hlock_class+0x4e/0x130
> [  369.984034][   T43]  ? mark_lock+0x38/0x3e0
> [  369.984135][   T43]  do_filp_open+0x1b3/0x3e0
> [  369.984273][   T43]  ? __pfx_do_filp_open+0x10/0x10
> [  369.984430][   T43]  ? __lock_release+0x103/0x460
> [  369.984552][   T43]  ? __create_object+0x5e/0xb0
> [  369.984689][   T43]  ? __pfx___lock_release+0x10/0x10
> [  369.984811][   T43]  ? hlock_class+0x4e/0x130
> [  369.984957][   T43]  ? mark_held_locks+0x9e/0xe0
> [  369.985081][   T43]  ? lockdep_hardirqs_on_prepare+0x275/0x410
> [  369.985256][   T43]  do_open_execat+0xb0/0x280
> [  369.985395][   T43]  ? __pfx_do_open_execat+0x10/0x10
> [  369.985521][   T43]  ? getname_kernel+0x14c/0x2d0
> [  369.985647][   T43]  ? __kmalloc_noprof+0x1d7/0x3d0
> [  369.985772][   T43]  open_exec+0x2b/0x50
> [  369.985867][   T43]  load_elf_binary+0x504/0x2880
> [  369.985995][   T43]  ? find_held_lock+0x2c/0x110
> [  369.986118][   T43]  ? __lock_release+0x103/0x460
> [  369.986255][   T43]  ? search_binary_handler+0x134/0x550
> [  369.986399][   T43]  ? __pfx___lock_release+0x10/0x10
> [  369.986519][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  369.986661][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  369.986801][   T43]  ? __pfx_load_elf_binary+0x10/0x10
> [  369.986926][   T43]  ? search_binary_handler+0x134/0x550
> [  369.987070][   T43]  search_binary_handler+0x152/0x550
> [  369.987193][   T43]  ? __pfx_search_binary_handler+0x10/0x10
> [  369.987351][   T43]  ? __task_pid_nr_ns+0x11a/0x410
> [  369.987473][   T43]  ? exec_binprm+0x100/0x3c0
> [  369.987598][   T43]  exec_binprm+0x119/0x3c0
> [  369.987728][   T43]  bprm_execve+0xe9/0x4f0
> [  369.987821][   T43]  do_execveat_common.isra.0+0x3eb/0x530
> [  369.987947][   T43]  __x64_sys_execve+0x8c/0xb0
> [  369.988070][   T43]  do_syscall_64+0xc1/0x1d0
> [  369.988192][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [  369.988347][   T43] RIP: 0033:0x7fa4f34f740b
> [  369.988472][   T43] RSP: 002b:00007ffd893fd6d8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
> [  369.988655][   T43] RAX: ffffffffffffffda RBX: 0000560ac3d5daf0 RCX: 00007fa4f34f740b
> [  369.988835][   T43] RDX: 0000560ac3d62360 RSI: 0000560ac3d76bc0 RDI: 0000560ac3d8d8f0
> [  369.989010][   T43] RBP: 00007ffd893fd800 R08: 0000000000000000 R09: 0000000000000000
> [  369.989188][   T43] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd89401890
> [  369.989372][   T43] R13: 00007fa4f2c3a9d8 R14: 00007ffd893fd720 R15: 0000000000000000
> [  369.989566][   T43]  </TASK>
> [  369.989659][   T43] INFO: task (spawn):423 blocked for more than 245 seconds.
> [  369.989828][   T43]       Not tainted 6.10.0-virtme #1
> [  369.989939][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  369.990129][   T43] task:(spawn)         state:D stack:26608 pid:423   tgid:423   ppid:1      flags:0x00004006
> [  369.990382][   T43] Call Trace:
> [  369.990476][   T43]  <TASK>
> [  369.990547][   T43]  __schedule+0x6e0/0x17e0
> [  369.990674][   T43]  ? __pfx___schedule+0x10/0x10
> [  369.990795][   T43]  ? schedule+0x1a5/0x210
> [  369.990890][   T43]  ? __pfx___lock_release+0x10/0x10
> [  369.991008][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  369.991128][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  369.991321][   T43]  ? schedule+0x1a5/0x210
> [  369.991418][   T43]  schedule+0xdf/0x210
> [  369.991516][   T43]  d_alloc_parallel+0xaef/0xed0
> [  369.991647][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
> [  369.991824][   T43]  ? __pfx_default_wake_function+0x10/0x10
> [  369.991973][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
> [  369.992102][   T43]  __lookup_slow+0x17f/0x3c0
> [  369.992234][   T43]  ? __pfx___lookup_slow+0x10/0x10
> [  369.992416][   T43]  ? walk_component+0x29e/0x4f0
> [  369.992552][   T43]  walk_component+0x2ab/0x4f0
> [  369.992694][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
> [  369.992846][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
> [  369.993134][   T43]  path_openat+0x1be/0x440
> [  369.993291][   T43]  ? __pfx_path_openat+0x10/0x10
> [  369.993412][   T43]  ? hlock_class+0x4e/0x130
> [  369.993572][   T43]  ? mark_lock+0x38/0x3e0
> [  369.993671][   T43]  do_filp_open+0x1b3/0x3e0
> [  369.993858][   T43]  ? __pfx_do_filp_open+0x10/0x10
> [  369.994001][   T43]  ? __lock_release+0x103/0x460
> [  369.994124][   T43]  ? __create_object+0x5e/0xb0
> [  369.994282][   T43]  ? __pfx___lock_release+0x10/0x10
> [  369.994475][   T43]  ? hlock_class+0x4e/0x130
> [  369.994611][   T43]  ? mark_held_locks+0x9e/0xe0
> [  369.994733][   T43]  ? lockdep_hardirqs_on_prepare+0x275/0x410
> [  369.994885][   T43]  do_open_execat+0xb0/0x280
> [  369.995060][   T43]  ? __pfx_do_open_execat+0x10/0x10
> [  369.995200][   T43]  ? getname_kernel+0x14c/0x2d0
> [  369.995349][   T43]  ? __kmalloc_noprof+0x1d7/0x3d0
> [  369.995472][   T43]  open_exec+0x2b/0x50
> [  369.995619][   T43]  load_elf_binary+0x504/0x2880
> [  369.995745][   T43]  ? find_held_lock+0x2c/0x110
> [  369.995867][   T43]  ? __lock_release+0x103/0x460
> [  369.995986][   T43]  ? search_binary_handler+0x134/0x550
> [  369.996105][   T43]  ? __pfx___lock_release+0x10/0x10
> [  369.996289][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  369.996414][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  369.996536][   T43]  ? __pfx_load_elf_binary+0x10/0x10
> [  369.996670][   T43]  ? search_binary_handler+0x134/0x550
> [  369.996863][   T43]  search_binary_handler+0x152/0x550
> [  369.997001][   T43]  ? __pfx_search_binary_handler+0x10/0x10
> [  369.997154][   T43]  ? __task_pid_nr_ns+0x11a/0x410
> [  369.997296][   T43]  ? exec_binprm+0x100/0x3c0
> [  369.997475][   T43]  exec_binprm+0x119/0x3c0
> [  369.997628][   T43]  bprm_execve+0xe9/0x4f0
> [  369.997730][   T43]  do_execveat_common.isra.0+0x3eb/0x530
> [  369.997874][   T43]  __x64_sys_execve+0x8c/0xb0
> [  369.998058][   T43]  do_syscall_64+0xc1/0x1d0
> [  369.998212][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [  369.998382][   T43] RIP: 0033:0x7fa4f34f740b
> [  369.998530][   T43] RSP: 002b:00007ffd89405918 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
> [  369.998781][   T43] RAX: ffffffffffffffda RBX: 0000560ac3d94430 RCX: 00007fa4f34f740b
> [  369.998962][   T43] RDX: 0000560ac3d5ff20 RSI: 0000560ac3d76cd0 RDI: 0000560ac3d5e240
> [  369.999164][   T43] RBP: 00007ffd89405a40 R08: 0000000000000008 R09: 0000000000000000
> [  369.999409][   T43] R10: 0000000000000000 R11: 0000000000000246 R12: 0000560ac3d767d0
> [  369.999584][   T43] R13: 00007fa4f2c3a9d8 R14: 00007ffd89405960 R15: 0000000000000000
> [  369.999775][   T43]  </TASK>
> [  369.999921][   T43] INFO: task (spawn):424 blocked for more than 245 seconds.
> [  370.000087][   T43]       Not tainted 6.10.0-virtme #1
> [  370.000198][   T43] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  370.000445][   T43] task:(spawn)         state:D stack:26464 pid:424   tgid:424   ppid:1      flags:0x00004006
> [  370.000676][   T43] Call Trace:
> [  370.000765][   T43]  <TASK>
> [  370.000833][   T43]  __schedule+0x6e0/0x17e0
> [  370.001017][   T43]  ? __pfx___schedule+0x10/0x10
> [  370.001136][   T43]  ? schedule+0x1a5/0x210
> [  370.001239][   T43]  ? __pfx___lock_release+0x10/0x10
> [  370.001372][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  370.001513][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  370.001692][   T43]  ? schedule+0x1a5/0x210
> [  370.001788][   T43]  schedule+0xdf/0x210
> [  370.001885][   T43]  d_alloc_parallel+0xaef/0xed0
> [  370.002017][   T43]  ? __pfx_d_alloc_parallel+0x10/0x10
> [  370.002195][   T43]  ? __pfx_default_wake_function+0x10/0x10
> [  370.002364][   T43]  ? lockdep_init_map_type+0x2cb/0x7c0
> [  370.002497][   T43]  __lookup_slow+0x17f/0x3c0
> [  370.002617][   T43]  ? __pfx___lookup_slow+0x10/0x10
> [  370.002884][   T43]  ? walk_component+0x29e/0x4f0
> [  370.003020][   T43]  walk_component+0x2ab/0x4f0
> [  370.003161][   T43]  link_path_walk.part.0.constprop.0+0x416/0x940
> [  370.003326][   T43]  ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
> [  370.003537][   T43]  path_openat+0x1be/0x440
> [  370.003660][   T43]  ? __pfx_path_openat+0x10/0x10
> [  370.003780][   T43]  ? hlock_class+0x4e/0x130
> [  370.003899][   T43]  ? mark_lock+0x38/0x3e0
> [  370.003997][   T43]  do_filp_open+0x1b3/0x3e0
> [  370.004178][   T43]  ? __pfx_do_filp_open+0x10/0x10
> [  370.004318][   T43]  ? __lock_release+0x103/0x460
> [  370.004442][   T43]  ? __create_object+0x5e/0xb0
> [  370.004562][   T43]  ? __pfx___lock_release+0x10/0x10
> [  370.004737][   T43]  ? hlock_class+0x4e/0x130
> [  370.004882][   T43]  ? mark_held_locks+0x9e/0xe0
> [  370.005005][   T43]  ? lockdep_hardirqs_on_prepare+0x275/0x410
> [  370.005160][   T43]  do_open_execat+0xb0/0x280
> [  370.005376][   T43]  ? __pfx_do_open_execat+0x10/0x10
> [  370.005503][   T43]  ? getname_kernel+0x14c/0x2d0
> [  370.005625][   T43]  ? __kmalloc_noprof+0x1d7/0x3d0
> [  370.005747][   T43]  open_exec+0x2b/0x50
> [  370.005896][   T43]  load_elf_binary+0x504/0x2880
> [  370.006022][   T43]  ? find_held_lock+0x2c/0x110
> [  370.006144][   T43]  ? __lock_release+0x103/0x460
> [  370.006278][   T43]  ? search_binary_handler+0x134/0x550
> [  370.006399][   T43]  ? __pfx___lock_release+0x10/0x10
> [  370.006572][   T43]  ? __pfx_lock_acquire.part.0+0x10/0x10
> [  370.006698][   T43]  ? trace_lock_acquire+0x14d/0x1f0
> [  370.006819][   T43]  ? __pfx_load_elf_binary+0x10/0x10
> [  370.006943][   T43]  ? search_binary_handler+0x134/0x550
> [  370.007121][   T43]  search_binary_handler+0x152/0x550
> [  370.007252][   T43]  ? __pfx_search_binary_handler+0x10/0x10
> [  370.007401][   T43]  ? __task_pid_nr_ns+0x11a/0x410
> [  370.007522][   T43]  ? exec_binprm+0x100/0x3c0
> [  370.007703][   T43]  exec_binprm+0x119/0x3c0
> [  370.007830][   T43]  bprm_execve+0xe9/0x4f0
> [  370.007923][   T43]  do_execveat_common.isra.0+0x3eb/0x530
> [  370.008047][   T43]  __x64_sys_execve+0x8c/0xb0
> [  370.008234][   T43]  do_syscall_64+0xc1/0x1d0
> [  370.008359][   T43]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [  370.008506][   T43] RIP: 0033:0x7fa4f34f740b
> [  370.008628][   T43] RSP: 002b:00007ffd893fd6d8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
> [  370.008863][   T43] RAX: ffffffffffffffda RBX: 0000560ac3d2df30 RCX: 00007fa4f34f740b
> [  370.009043][   T43] RDX: 0000560ac3d7ff10 RSI: 0000560ac3d8d940 RDI: 0000560ac3d33950
> [  370.009216][   T43] RBP: 00007ffd893fd800 R08: 0000000000000000 R09: 0000000000000000
> [  370.009461][   T43] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd89401890
> [  370.009635][   T43] R13: 00007fa4f2c3a9d8 R14: 00007ffd893fd720 R15: 0000000000000000
> [  370.009830][   T43]  </TASK>
> [  370.009975][   T43] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
> [  370.010169][   T43]
> [  370.010169][   T43] Showing all locks held in the system:
> [  370.010359][   T43] 1 lock held by khungtaskd/43:
> [  370.010531][   T43]  #0: ffffffff97b54b80 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x70/0x3a0
> [  370.010753][   T43] 1 lock held by ip/419:
> [  370.010840][   T43]  #0: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
> [  370.011146][   T43] 1 lock held by rename_device/421:
> [  370.011269][   T43]  #0: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
> [  370.011511][   T43] 2 locks held by (spawn)/422:
> [  370.011678][   T43]  #0: ffff888005bc4238 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x51/0x4f0
> [  370.011888][   T43]  #1: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
> [  370.012133][   T43] 2 locks held by (spawn)/423:
> [  370.012365][   T43]  #0: ffff888005bc0578 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x51/0x4f0
> [  370.012579][   T43]  #1: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
> [  370.012874][   T43] 2 locks held by (spawn)/424:
> [  370.012988][   T43]  #0: ffff888005bc0c38 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x51/0x4f0
> [  370.013202][   T43]  #1: ffff888001470148 (&type->i_mutex_dir_key#3){++++}-{3:3}, at: walk_component+0x29e/0x4f0
> [  370.013512][   T43]
> [  370.013571][   T43] =============================================
> [  370.013571][   T43]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hangs after merge window changes
  2024-07-17 18:01 ` Eric Van Hensbergen
@ 2024-07-17 20:45   ` Dominique Martinet
  2024-07-17 23:30     ` Jakub Kicinski
  2024-07-17 21:42   ` Jakub Kicinski
  1 sibling, 1 reply; 14+ messages in thread
From: Dominique Martinet @ 2024-07-17 20:45 UTC (permalink / raw)
  To: Jakub Kicinski, Eric Van Hensbergen
  Cc: Latchesar Ionkov, Christian Schoenebeck, v9fs

Eric Van Hensbergen wrote on Wed, Jul 17, 2024 at 01:01:26PM -0500:
> I held way back on adding anything new over the past cycle, so this
> sounds likely some external change maybe perturbing an older issue.
> What guest side mount options, qemu options and qemu version are you
> using?

Mount option if any would be much appreciated, yes.
Also, what version you upgraded from to limit the limit the search a
bit; the previous 6.9 tree?

Also couldn't find much infos about the CI in general, given the traces
I don't think the workload matters all that much but it can give
something to start with e.g. running from initrd and mounting 9p or
using 9p as rootfs; that shouldn't make a difference but might as well
try to reproduce from something close to the actual netdev CI given the
low reproducibility.

> Dominique, you aware of anything coming in that might have messed with
> this?  Maybe the new common code refactoring underpinning
> readahead/etc.?

Given the trace it's probably stuck in __wait_on_freeing_inode (the only
schedule() call in find_inode_fast), so we could be looking at
fallbacks from the iget changes but that made it in 6.9 already so
that's unlikely...

I have a feeling that adding cache will make the error go away if you
just want to fix tests for now, so would be great if you could try
adding -o cache=loose to the 9p mount options; hopefully the inodes
won't be freed as often and that'll make the race harder to hit.

I had a quick look at other globals fs changes and nothing stood out,
it's going to be fun to track down.

> On Wed, Jul 17, 2024 at 12:24 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > Apologies for possible duplicate, I tried to search lore and haven't
> > found anything, but also I can't find the exact 9p archive there :S

The new list is here: https://lore.kernel.org/v9fs/
You probably didn't miss anything :)

-- 
Dominique

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hangs after merge window changes
  2024-07-17 20:45   ` Dominique Martinet
@ 2024-07-17 23:30     ` Jakub Kicinski
  2024-07-18  1:14       ` Dominique Martinet
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-07-17 23:30 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck,
	v9fs

On Thu, 18 Jul 2024 05:45:18 +0900 Dominique Martinet wrote:
> Eric Van Hensbergen wrote on Wed, Jul 17, 2024 at 01:01:26PM -0500:
> > I held way back on adding anything new over the past cycle, so this
> > sounds likely some external change maybe perturbing an older issue.
> > What guest side mount options, qemu options and qemu version are you
> > using?  
> 
> Mount option if any would be much appreciated, yes.
> Also, what version you upgraded from to limit the limit the search a
> bit; the previous 6.9 tree?

Between networking pull requests. Basically
528dd46d0fc35c0176257a13a27d41e44fcc6cb3 to 
51835949dda3783d4639cfa74ce13a3c9829de00

> Also couldn't find much infos about the CI in general, given the traces
> I don't think the workload matters all that much but it can give
> something to start with e.g. running from initrd and mounting 9p or
> using 9p as rootfs; that shouldn't make a difference but might as well
> try to reproduce from something close to the actual netdev CI given the
> low reproducibility.

What CI does is described here:
https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
it's fairly simple, virtme-ng just runs kselftests inside a VM with 
the host FS mounted via 9p-virtio.

I'd offer running experiments on the exact systems, but this week 
I'm partially AFK (at a conference).

> > Dominique, you aware of anything coming in that might have messed with
> > this?  Maybe the new common code refactoring underpinning
> > readahead/etc.?  
> 
> Given the trace it's probably stuck in __wait_on_freeing_inode (the only
> schedule() call in find_inode_fast), so we could be looking at
> fallbacks from the iget changes but that made it in 6.9 already so
> that's unlikely...
> 
> I have a feeling that adding cache will make the error go away if you
> just want to fix tests for now, so would be great if you could try
> adding -o cache=loose to the 9p mount options; hopefully the inodes
> won't be freed as often and that'll make the race harder to hit.

Poked that into where I think virtme-ng does the mounting, let's see.

> I had a quick look at other globals fs changes and nothing stood out,
> it's going to be fun to track down.
> 
> > On Wed, Jul 17, 2024 at 12:24 PM Jakub Kicinski <kuba@kernel.org> wrote:  
> > > Apologies for possible duplicate, I tried to search lore and haven't
> > > found anything, but also I can't find the exact 9p archive there :S  
> 
> The new list is here: https://lore.kernel.org/v9fs/
> You probably didn't miss anything :)
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hangs after merge window changes
  2024-07-17 23:30     ` Jakub Kicinski
@ 2024-07-18  1:14       ` Dominique Martinet
  2024-07-18  3:54         ` Mateusz Guzik
  0 siblings, 1 reply; 14+ messages in thread
From: Dominique Martinet @ 2024-07-18  1:14 UTC (permalink / raw)
  To: Jakub Kicinski, Mateusz Guzik
  Cc: Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck,
	Jan Kara, Christian Brauner, v9fs

Jakub Kicinski wrote on Wed, Jul 17, 2024 at 04:30:51PM -0700:
> On Thu, 18 Jul 2024 05:45:18 +0900 Dominique Martinet wrote:
> > Eric Van Hensbergen wrote on Wed, Jul 17, 2024 at 01:01:26PM -0500:
> > > I held way back on adding anything new over the past cycle, so this
> > > sounds likely some external change maybe perturbing an older issue.
> > > What guest side mount options, qemu options and qemu version are you
> > > using?  
> > 
> > Mount option if any would be much appreciated, yes.
> > Also, what version you upgraded from to limit the limit the search a
> > bit; the previous 6.9 tree?
> 
> Between networking pull requests. Basically
> 528dd46d0fc35c0176257a13a27d41e44fcc6cb3 to 
> 51835949dda3783d4639cfa74ce13a3c9829de00

Thanks, this includes this commit:
7180f8d91fcb ("vfs: add rcu-based find_inode variants for iget ops")

I think that's our most likely candidate at this point, adding Mateusz
and others involved in Cc until I find more time to reproduce.

@Mateusz, Jan, Christian - sorry if it wasn't the culprit.
We're seeing new hangs in find_inode_fast between the above two commits,
see the extact hang traces top of  the original mail:
https://lore.kernel.org/v9fs/CAFkjPTnTDWrfCSVzvtSszjoerOeWBn8Z6DC4qFMH70Aq7Pqr1Q@mail.gmail.com/T/#t


> > Also couldn't find much infos about the CI in general, given the traces
> > I don't think the workload matters all that much but it can give
> > something to start with e.g. running from initrd and mounting 9p or
> > using 9p as rootfs; that shouldn't make a difference but might as well
> > try to reproduce from something close to the actual netdev CI given the
> > low reproducibility.
> 
> What CI does is described here:
> https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
> it's fairly simple, virtme-ng just runs kselftests inside a VM with 
> the host FS mounted via 9p-virtio.
> 
> I'd offer running experiments on the exact systems, but this week 
> I'm partially AFK (at a conference).

Thanks.
You're not doing anything specific, I'll hopefully be able to reproduce
it running similar workloads at which point I'll try to improve the
reproducer and get a closer look if Mateusz or others didn't beat me to
it.

-- 
Dominique Martinet | Asmadeus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hangs after merge window changes
  2024-07-18  1:14       ` Dominique Martinet
@ 2024-07-18  3:54         ` Mateusz Guzik
  2024-07-18  4:08           ` Mateusz Guzik
  0 siblings, 1 reply; 14+ messages in thread
From: Mateusz Guzik @ 2024-07-18  3:54 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Jakub Kicinski, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, Jan Kara, Christian Brauner, v9fs

On Thu, Jul 18, 2024 at 3:15 AM Dominique Martinet
<asmadeus@codewreck.org> wrote:
>
> Jakub Kicinski wrote on Wed, Jul 17, 2024 at 04:30:51PM -0700:
> > On Thu, 18 Jul 2024 05:45:18 +0900 Dominique Martinet wrote:
> > > Eric Van Hensbergen wrote on Wed, Jul 17, 2024 at 01:01:26PM -0500:
> > > > I held way back on adding anything new over the past cycle, so this
> > > > sounds likely some external change maybe perturbing an older issue.
> > > > What guest side mount options, qemu options and qemu version are you
> > > > using?
> > >
> > > Mount option if any would be much appreciated, yes.
> > > Also, what version you upgraded from to limit the limit the search a
> > > bit; the previous 6.9 tree?
> >
> > Between networking pull requests. Basically
> > 528dd46d0fc35c0176257a13a27d41e44fcc6cb3 to
> > 51835949dda3783d4639cfa74ce13a3c9829de00
>
> Thanks, this includes this commit:
> 7180f8d91fcb ("vfs: add rcu-based find_inode variants for iget ops")
>
> I think that's our most likely candidate at this point, adding Mateusz
> and others involved in Cc until I find more time to reproduce.
>
> @Mateusz, Jan, Christian - sorry if it wasn't the culprit.
> We're seeing new hangs in find_inode_fast between the above two commits,
> see the extact hang traces top of  the original mail:
> https://lore.kernel.org/v9fs/CAFkjPTnTDWrfCSVzvtSszjoerOeWBn8Z6DC4qFMH70Aq7Pqr1Q@mail.gmail.com/T/#t
>
>
> > > Also couldn't find much infos about the CI in general, given the traces
> > > I don't think the workload matters all that much but it can give
> > > something to start with e.g. running from initrd and mounting 9p or
> > > using 9p as rootfs; that shouldn't make a difference but might as well
> > > try to reproduce from something close to the actual netdev CI given the
> > > low reproducibility.
> >
> > What CI does is described here:
> > https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
> > it's fairly simple, virtme-ng just runs kselftests inside a VM with
> > the host FS mounted via 9p-virtio.
> >
> > I'd offer running experiments on the exact systems, but this week
> > I'm partially AFK (at a conference).
>
> Thanks.
> You're not doing anything specific, I'll hopefully be able to reproduce
> it running similar workloads at which point I'll try to improve the
> reproducer and get a closer look if Mateusz or others didn't beat me to
> it.
>

I think I see the bug -- find_inode_fast can race if evict calling
remove_inode_hash and issuing the wakeup.

Specifically it can find the inode and only get the lock *after* both
removal and wakeup are done. At that point one of the set flags is
I_FREEING, which is a sufficient condition for it to block.

Previously it was interlocked with the inode_hash_lock.

Easiest & safest way out is for the lockless lookup to never block and
instead fallback to locked operation if it can't immediately get the
inode. Another options is to take advantage of the invariant on
i_state:
         BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));

and not fail to find the inode considering this value.

That said let me chew on it.
-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hangs after merge window changes
  2024-07-18  3:54         ` Mateusz Guzik
@ 2024-07-18  4:08           ` Mateusz Guzik
  2024-07-18 10:42             ` Jan Kara
  0 siblings, 1 reply; 14+ messages in thread
From: Mateusz Guzik @ 2024-07-18  4:08 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Jakub Kicinski, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, Jan Kara, Christian Brauner, v9fs

On Thu, Jul 18, 2024 at 5:54 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Thu, Jul 18, 2024 at 3:15 AM Dominique Martinet
> <asmadeus@codewreck.org> wrote:
> >
> > Jakub Kicinski wrote on Wed, Jul 17, 2024 at 04:30:51PM -0700:
> > > On Thu, 18 Jul 2024 05:45:18 +0900 Dominique Martinet wrote:
> > > > Eric Van Hensbergen wrote on Wed, Jul 17, 2024 at 01:01:26PM -0500:
> > > > > I held way back on adding anything new over the past cycle, so this
> > > > > sounds likely some external change maybe perturbing an older issue.
> > > > > What guest side mount options, qemu options and qemu version are you
> > > > > using?
> > > >
> > > > Mount option if any would be much appreciated, yes.
> > > > Also, what version you upgraded from to limit the limit the search a
> > > > bit; the previous 6.9 tree?
> > >
> > > Between networking pull requests. Basically
> > > 528dd46d0fc35c0176257a13a27d41e44fcc6cb3 to
> > > 51835949dda3783d4639cfa74ce13a3c9829de00
> >
> > Thanks, this includes this commit:
> > 7180f8d91fcb ("vfs: add rcu-based find_inode variants for iget ops")
> >
> > I think that's our most likely candidate at this point, adding Mateusz
> > and others involved in Cc until I find more time to reproduce.
> >
> > @Mateusz, Jan, Christian - sorry if it wasn't the culprit.
> > We're seeing new hangs in find_inode_fast between the above two commits,
> > see the extact hang traces top of  the original mail:
> > https://lore.kernel.org/v9fs/CAFkjPTnTDWrfCSVzvtSszjoerOeWBn8Z6DC4qFMH70Aq7Pqr1Q@mail.gmail.com/T/#t
> >
> >
> > > > Also couldn't find much infos about the CI in general, given the traces
> > > > I don't think the workload matters all that much but it can give
> > > > something to start with e.g. running from initrd and mounting 9p or
> > > > using 9p as rootfs; that shouldn't make a difference but might as well
> > > > try to reproduce from something close to the actual netdev CI given the
> > > > low reproducibility.
> > >
> > > What CI does is described here:
> > > https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
> > > it's fairly simple, virtme-ng just runs kselftests inside a VM with
> > > the host FS mounted via 9p-virtio.
> > >
> > > I'd offer running experiments on the exact systems, but this week
> > > I'm partially AFK (at a conference).
> >
> > Thanks.
> > You're not doing anything specific, I'll hopefully be able to reproduce
> > it running similar workloads at which point I'll try to improve the
> > reproducer and get a closer look if Mateusz or others didn't beat me to
> > it.
> >
>
> I think I see the bug -- find_inode_fast can race if evict calling
> remove_inode_hash and issuing the wakeup.
>
> Specifically it can find the inode and only get the lock *after* both
> removal and wakeup are done. At that point one of the set flags is
> I_FREEING, which is a sufficient condition for it to block.
>
> Previously it was interlocked with the inode_hash_lock.
>
> Easiest & safest way out is for the lockless lookup to never block and
> instead fallback to locked operation if it can't immediately get the
> inode. Another options is to take advantage of the invariant on
> i_state:
>          BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
>
> and not fail to find the inode considering this value.
>
> That said let me chew on it.

I'm going to have to do dayjob stuff, will look into reproducing it
later (if i'm correct about the bug it should be viable to add some
artificial delays to make it triggerable much easier).

But should you repro in the meantime, the code below should do it (i
don't know if that's specifically what I'm going to submit though):
diff --git a/fs/inode.c b/fs/inode.c
index f356fe2ec2b6..6faeac55c3ec 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -912,6 +912,11 @@ static struct inode *find_inode(struct super_block *sb,
                if (!test(inode, data))
                        continue;
                spin_lock(&inode->i_lock);
+               if (unlikely(inode_unhashed(inode))) {
+                       BUG_ON(locked);
+                       spin_unlock(&inode->i_lock);
+                       goto repeat;
+               }
                if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
                        __wait_on_freeing_inode(inode, locked);
                        goto repeat;
@@ -953,6 +958,11 @@ static struct inode *find_inode_fast(struct
super_block *sb,
                if (inode->i_sb != sb)
                        continue;
                spin_lock(&inode->i_lock);
+               if (unlikely(inode_unhashed(inode))) {
+                       BUG_ON(locked);
+                       spin_unlock(&inode->i_lock);
+                       goto repeat;
+               }
                if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
                        __wait_on_freeing_inode(inode, locked);
                        goto repeat;


-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: hangs after merge window changes
  2024-07-18  4:08           ` Mateusz Guzik
@ 2024-07-18 10:42             ` Jan Kara
  2024-07-18 10:57               ` Mateusz Guzik
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2024-07-18 10:42 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Dominique Martinet, Jakub Kicinski, Eric Van Hensbergen,
	Latchesar Ionkov, Christian Schoenebeck, Jan Kara,
	Christian Brauner, v9fs

On Thu 18-07-24 06:08:45, Mateusz Guzik wrote:
> On Thu, Jul 18, 2024 at 5:54 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > On Thu, Jul 18, 2024 at 3:15 AM Dominique Martinet
> > <asmadeus@codewreck.org> wrote:
> > >
> > > Jakub Kicinski wrote on Wed, Jul 17, 2024 at 04:30:51PM -0700:
> > > > On Thu, 18 Jul 2024 05:45:18 +0900 Dominique Martinet wrote:
> > > > > Eric Van Hensbergen wrote on Wed, Jul 17, 2024 at 01:01:26PM -0500:
> > > > > > I held way back on adding anything new over the past cycle, so this
> > > > > > sounds likely some external change maybe perturbing an older issue.
> > > > > > What guest side mount options, qemu options and qemu version are you
> > > > > > using?
> > > > >
> > > > > Mount option if any would be much appreciated, yes.
> > > > > Also, what version you upgraded from to limit the limit the search a
> > > > > bit; the previous 6.9 tree?
> > > >
> > > > Between networking pull requests. Basically
> > > > 528dd46d0fc35c0176257a13a27d41e44fcc6cb3 to
> > > > 51835949dda3783d4639cfa74ce13a3c9829de00
> > >
> > > Thanks, this includes this commit:
> > > 7180f8d91fcb ("vfs: add rcu-based find_inode variants for iget ops")
> > >
> > > I think that's our most likely candidate at this point, adding Mateusz
> > > and others involved in Cc until I find more time to reproduce.
> > >
> > > @Mateusz, Jan, Christian - sorry if it wasn't the culprit.
> > > We're seeing new hangs in find_inode_fast between the above two commits,
> > > see the extact hang traces top of  the original mail:
> > > https://lore.kernel.org/v9fs/CAFkjPTnTDWrfCSVzvtSszjoerOeWBn8Z6DC4qFMH70Aq7Pqr1Q@mail.gmail.com/T/#t
> > >
> > >
> > > > > Also couldn't find much infos about the CI in general, given the traces
> > > > > I don't think the workload matters all that much but it can give
> > > > > something to start with e.g. running from initrd and mounting 9p or
> > > > > using 9p as rootfs; that shouldn't make a difference but might as well
> > > > > try to reproduce from something close to the actual netdev CI given the
> > > > > low reproducibility.
> > > >
> > > > What CI does is described here:
> > > > https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
> > > > it's fairly simple, virtme-ng just runs kselftests inside a VM with
> > > > the host FS mounted via 9p-virtio.
> > > >
> > > > I'd offer running experiments on the exact systems, but this week
> > > > I'm partially AFK (at a conference).
> > >
> > > Thanks.
> > > You're not doing anything specific, I'll hopefully be able to reproduce
> > > it running similar workloads at which point I'll try to improve the
> > > reproducer and get a closer look if Mateusz or others didn't beat me to
> > > it.
> > >
> >
> > I think I see the bug -- find_inode_fast can race if evict calling
> > remove_inode_hash and issuing the wakeup.
> >
> > Specifically it can find the inode and only get the lock *after* both
> > removal and wakeup are done. At that point one of the set flags is
> > I_FREEING, which is a sufficient condition for it to block.
> >
> > Previously it was interlocked with the inode_hash_lock.

I agree.

> > Easiest & safest way out is for the lockless lookup to never block and
> > instead fallback to locked operation if it can't immediately get the
> > inode. Another options is to take advantage of the invariant on
> > i_state:
> >          BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
> >
> > and not fail to find the inode considering this value.
> >
> > That said let me chew on it.
> 
> I'm going to have to do dayjob stuff, will look into reproducing it
> later (if i'm correct about the bug it should be viable to add some
> artificial delays to make it triggerable much easier).
> 
> But should you repro in the meantime, the code below should do it (i
> don't know if that's specifically what I'm going to submit though):

Maybe it would be more elegant as:

diff --git a/fs/inode.c b/fs/inode.c
index f356fe2ec2b6..d0c39e71e403 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2291,6 +2291,16 @@ static void __wait_on_freeing_inode(struct inode *inode, bool locked)
 {
 	wait_queue_head_t *wq;
 	DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW);
+
+	/*
+	 * If the inode is already unhashed, evict() may be past the __I_NEW
+	 * wakeup. Return immediately to avoid sleeping indefinitely.
+	 */
+	if (unlikely(inode_unhashed(inode))) {
+		BUG_ON(locked);
+		return;
+	}
+
 	wq = bit_waitqueue(&inode->i_state, __I_NEW);
 	prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
 	spin_unlock(&inode->i_lock);
-- 
2.35.3

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: hangs after merge window changes
  2024-07-18 10:42             ` Jan Kara
@ 2024-07-18 10:57               ` Mateusz Guzik
  2024-07-18 11:40                 ` Mateusz Guzik
  0 siblings, 1 reply; 14+ messages in thread
From: Mateusz Guzik @ 2024-07-18 10:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: Dominique Martinet, Jakub Kicinski, Eric Van Hensbergen,
	Latchesar Ionkov, Christian Schoenebeck, Christian Brauner, v9fs

On Thu, Jul 18, 2024 at 12:42 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 18-07-24 06:08:45, Mateusz Guzik wrote:
> > > Previously it was interlocked with the inode_hash_lock.
>
> I agree.
>
> > > Easiest & safest way out is for the lockless lookup to never block and
> > > instead fallback to locked operation if it can't immediately get the
> > > inode. Another options is to take advantage of the invariant on
> > > i_state:
> > >          BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
> > >
> > > and not fail to find the inode considering this value.
> > >
> > > That said let me chew on it.
> >
> > I'm going to have to do dayjob stuff, will look into reproducing it
> > later (if i'm correct about the bug it should be viable to add some
> > artificial delays to make it triggerable much easier).
> >
> > But should you repro in the meantime, the code below should do it (i
> > don't know if that's specifically what I'm going to submit though):
>
> Maybe it would be more elegant as:
>
> diff --git a/fs/inode.c b/fs/inode.c
> index f356fe2ec2b6..d0c39e71e403 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2291,6 +2291,16 @@ static void __wait_on_freeing_inode(struct inode *inode, bool locked)
>  {
>         wait_queue_head_t *wq;
>         DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW);
> +
> +       /*
> +        * If the inode is already unhashed, evict() may be past the __I_NEW
> +        * wakeup. Return immediately to avoid sleeping indefinitely.
> +        */
> +       if (unlikely(inode_unhashed(inode))) {
> +               BUG_ON(locked);
> +               return;

this is missing spin_unlock(&inode->i_lock);

> +       }
> +
>         wq = bit_waitqueue(&inode->i_state, __I_NEW);
>         prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
>         spin_unlock(&inode->i_lock);
> --
> 2.35.3
>

Putting the check here was my first variant, I don't have a strong
opinion on this front. Whatever immediate fixup can get massaged. For
something to actually commit I was thinking a little nicer, see below.

While the bug being here is on me, I do think the current state
avoidably facilitated it by implicitly depending on the hash state.

In evict() you can find:
        spin_lock(&inode->i_lock);
        wake_up_bit(&inode->i_state, __I_NEW);
        BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
        spin_unlock(&inode->i_lock);

This should probably move to a dedicated routine to pair up with
__wait_on_freeing_inode (and yes, I know there are other spots which
wake up based i_state) -- this much could be done in the fixup. It
would also asserts on the unhashed condition so it would be apparent
this is the invariant for (dis)allowing sleep.

Longer term imo it should have something which provides the definite
"no wakeups past this point" indicator not dependent on the above,
maybe a dedicated i_state flag combination.

This brings me to lack of VFS_ debug macros which would allow for
generous sprinkling of asserts without affecting production kernels --
someone(tm) should look into it.

I'll try to find some time in the evening to repro and validate that
this is indeed the issue though.
-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hangs after merge window changes
  2024-07-18 10:57               ` Mateusz Guzik
@ 2024-07-18 11:40                 ` Mateusz Guzik
  2024-07-18 12:16                   ` Jan Kara
  0 siblings, 1 reply; 14+ messages in thread
From: Mateusz Guzik @ 2024-07-18 11:40 UTC (permalink / raw)
  To: Jan Kara
  Cc: Dominique Martinet, Jakub Kicinski, Eric Van Hensbergen,
	Latchesar Ionkov, Christian Schoenebeck, Christian Brauner, v9fs

On Thu, Jul 18, 2024 at 12:57 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Thu, Jul 18, 2024 at 12:42 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Thu 18-07-24 06:08:45, Mateusz Guzik wrote:
> > > > Previously it was interlocked with the inode_hash_lock.
> >
> > I agree.
> >
> > > > Easiest & safest way out is for the lockless lookup to never block and
> > > > instead fallback to locked operation if it can't immediately get the
> > > > inode. Another options is to take advantage of the invariant on
> > > > i_state:
> > > >          BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
> > > >
> > > > and not fail to find the inode considering this value.
> > > >
> > > > That said let me chew on it.
> > >
> > > I'm going to have to do dayjob stuff, will look into reproducing it
> > > later (if i'm correct about the bug it should be viable to add some
> > > artificial delays to make it triggerable much easier).
> > >
> > > But should you repro in the meantime, the code below should do it (i
> > > don't know if that's specifically what I'm going to submit though):
> >
> > Maybe it would be more elegant as:
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index f356fe2ec2b6..d0c39e71e403 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -2291,6 +2291,16 @@ static void __wait_on_freeing_inode(struct inode *inode, bool locked)
> >  {
> >         wait_queue_head_t *wq;
> >         DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW);
> > +
> > +       /*
> > +        * If the inode is already unhashed, evict() may be past the __I_NEW
> > +        * wakeup. Return immediately to avoid sleeping indefinitely.
> > +        */
> > +       if (unlikely(inode_unhashed(inode))) {
> > +               BUG_ON(locked);
> > +               return;
>
> this is missing spin_unlock(&inode->i_lock);
>
> > +       }
> > +
> >         wq = bit_waitqueue(&inode->i_state, __I_NEW);
> >         prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
> >         spin_unlock(&inode->i_lock);
> > --
> > 2.35.3
> >
>
> Putting the check here was my first variant, I don't have a strong
> opinion on this front. Whatever immediate fixup can get massaged. For
> something to actually commit I was thinking a little nicer, see below.
>
> While the bug being here is on me, I do think the current state
> avoidably facilitated it by implicitly depending on the hash state.
>
> In evict() you can find:
>         spin_lock(&inode->i_lock);
>         wake_up_bit(&inode->i_state, __I_NEW);
>         BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
>         spin_unlock(&inode->i_lock);
>
> This should probably move to a dedicated routine to pair up with
> __wait_on_freeing_inode (and yes, I know there are other spots which
> wake up based i_state) -- this much could be done in the fixup. It
> would also asserts on the unhashed condition so it would be apparent
> this is the invariant for (dis)allowing sleep.
>
> Longer term imo it should have something which provides the definite
> "no wakeups past this point" indicator not dependent on the above,
> maybe a dedicated i_state flag combination.
>
> This brings me to lack of VFS_ debug macros which would allow for
> generous sprinkling of asserts without affecting production kernels --
> someone(tm) should look into it.
>
> I'll try to find some time in the evening to repro and validate that
> this is indeed the issue though.
> --

Now that I had some caffeine I see we both messed up here -- bailing
on the unhashed status without I_FREEING is not necessarily equivalent
to doing this after, so the check indeed should land in
__wait_on_freeing_inode.

That said, assuming testing passes later, I'm going to submit this
(with maybe some word-smithing):

diff --git a/fs/inode.c b/fs/inode.c
index f356fe2ec2b6..e1082d61c093 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -676,7 +676,18 @@ static void evict(struct inode *inode)

        remove_inode_hash(inode);

+       /*
+        * Wake up waiters in __wait_on_freeing_inode().
+        *
+        * Lockless hash lookup may end up finding the inode before we removed
+        * it above, but only lock it *after* we are done with the wakeup below.
+        * Should this happen there will be noone to wake the waiter.
+        *
+        * Use inode_unhashed() status as an indicator whether blocking there is
+        * safe.
+        */
        spin_lock(&inode->i_lock);
+       BUG_ON(!inode_unhashed(inode));
        wake_up_bit(&inode->i_state, __I_NEW);
        BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
        spin_unlock(&inode->i_lock);
@@ -2291,6 +2302,16 @@ static void __wait_on_freeing_inode(struct
inode *inode, bool locked)
 {
        wait_queue_head_t *wq;
        DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW);
+
+       /*
+        * Handle racing against evict().
+        */
+       if (unlikely(inode_unhashed(inode))) {
+               BUG_ON(locked);
+               spin_unlock(&inode->i_lock);
+               return;
+       }
+
        wq = bit_waitqueue(&inode->i_state, __I_NEW);
        prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
        spin_unlock(&inode->i_lock);

-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: hangs after merge window changes
  2024-07-18 11:40                 ` Mateusz Guzik
@ 2024-07-18 12:16                   ` Jan Kara
  2024-07-18 12:29                     ` Mateusz Guzik
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2024-07-18 12:16 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Jan Kara, Dominique Martinet, Jakub Kicinski, Eric Van Hensbergen,
	Latchesar Ionkov, Christian Schoenebeck, Christian Brauner, v9fs

On Thu 18-07-24 13:40:14, Mateusz Guzik wrote:
> On Thu, Jul 18, 2024 at 12:57 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > On Thu, Jul 18, 2024 at 12:42 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Thu 18-07-24 06:08:45, Mateusz Guzik wrote:
> > > > > Previously it was interlocked with the inode_hash_lock.
> > >
> > > I agree.
> > >
> > > > > Easiest & safest way out is for the lockless lookup to never block and
> > > > > instead fallback to locked operation if it can't immediately get the
> > > > > inode. Another options is to take advantage of the invariant on
> > > > > i_state:
> > > > >          BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
> > > > >
> > > > > and not fail to find the inode considering this value.
> > > > >
> > > > > That said let me chew on it.
> > > >
> > > > I'm going to have to do dayjob stuff, will look into reproducing it
> > > > later (if i'm correct about the bug it should be viable to add some
> > > > artificial delays to make it triggerable much easier).
> > > >
> > > > But should you repro in the meantime, the code below should do it (i
> > > > don't know if that's specifically what I'm going to submit though):
> > >
> > > Maybe it would be more elegant as:
> > >
> > > diff --git a/fs/inode.c b/fs/inode.c
> > > index f356fe2ec2b6..d0c39e71e403 100644
> > > --- a/fs/inode.c
> > > +++ b/fs/inode.c
> > > @@ -2291,6 +2291,16 @@ static void __wait_on_freeing_inode(struct inode *inode, bool locked)
> > >  {
> > >         wait_queue_head_t *wq;
> > >         DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW);
> > > +
> > > +       /*
> > > +        * If the inode is already unhashed, evict() may be past the __I_NEW
> > > +        * wakeup. Return immediately to avoid sleeping indefinitely.
> > > +        */
> > > +       if (unlikely(inode_unhashed(inode))) {
> > > +               BUG_ON(locked);
> > > +               return;
> >
> > this is missing spin_unlock(&inode->i_lock);
> >
> > > +       }
> > > +
> > >         wq = bit_waitqueue(&inode->i_state, __I_NEW);
> > >         prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
> > >         spin_unlock(&inode->i_lock);
> > > --
> > > 2.35.3
> > >
> >
> > Putting the check here was my first variant, I don't have a strong
> > opinion on this front. Whatever immediate fixup can get massaged. For
> > something to actually commit I was thinking a little nicer, see below.
> >
> > While the bug being here is on me, I do think the current state
> > avoidably facilitated it by implicitly depending on the hash state.
> >
> > In evict() you can find:
> >         spin_lock(&inode->i_lock);
> >         wake_up_bit(&inode->i_state, __I_NEW);
> >         BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
> >         spin_unlock(&inode->i_lock);
> >
> > This should probably move to a dedicated routine to pair up with
> > __wait_on_freeing_inode (and yes, I know there are other spots which
> > wake up based i_state) -- this much could be done in the fixup. It
> > would also asserts on the unhashed condition so it would be apparent
> > this is the invariant for (dis)allowing sleep.
> >
> > Longer term imo it should have something which provides the definite
> > "no wakeups past this point" indicator not dependent on the above,
> > maybe a dedicated i_state flag combination.
> >
> > This brings me to lack of VFS_ debug macros which would allow for
> > generous sprinkling of asserts without affecting production kernels --
> > someone(tm) should look into it.
> >
> > I'll try to find some time in the evening to repro and validate that
> > this is indeed the issue though.
> > --
> 
> Now that I had some caffeine I see we both messed up here -- bailing
> on the unhashed status without I_FREEING is not necessarily equivalent
> to doing this after, so the check indeed should land in
> __wait_on_freeing_inode.

Yeah.

> That said, assuming testing passes later, I'm going to submit this
> (with maybe some word-smithing):
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index f356fe2ec2b6..e1082d61c093 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -676,7 +676,18 @@ static void evict(struct inode *inode)
> 
>         remove_inode_hash(inode);
> 
> +       /*
> +        * Wake up waiters in __wait_on_freeing_inode().
> +        *
> +        * Lockless hash lookup may end up finding the inode before we removed
> +        * it above, but only lock it *after* we are done with the wakeup below.
> +        * Should this happen there will be noone to wake the waiter.
> +        *
> +        * Use inode_unhashed() status as an indicator whether blocking there is
> +        * safe.
> +        */
>         spin_lock(&inode->i_lock);
> +       BUG_ON(!inode_unhashed(inode));

So this is going to trip on XFS I'm afraid. XFS is not using VFS to hash
inodes and sets inode->i_hash through hlist_add_fake() and
remove_inode_hash() does nothing for it. So XFS inodes return
!inode_unhashed(inode) all the way until they are destroyed. Now this is
fine because these inodes have no way to get to __wait_on_freeing_inode().
Also the !inode_unhashed() check looks a bit weird just a few lines below
remove_inode_hash() so I'd just drop this check.

Otherwise the fix looks good to me!

								Honza


>         wake_up_bit(&inode->i_state, __I_NEW);
>         BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
>         spin_unlock(&inode->i_lock);
> @@ -2291,6 +2302,16 @@ static void __wait_on_freeing_inode(struct
> inode *inode, bool locked)
>  {
>         wait_queue_head_t *wq;
>         DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW);
> +
> +       /*
> +        * Handle racing against evict().
> +        */
> +       if (unlikely(inode_unhashed(inode))) {
> +               BUG_ON(locked);
> +               spin_unlock(&inode->i_lock);
> +               return;
> +       }
> +
>         wq = bit_waitqueue(&inode->i_state, __I_NEW);
>         prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
>         spin_unlock(&inode->i_lock);
> 
> -- 
> Mateusz Guzik <mjguzik gmail.com>
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hangs after merge window changes
  2024-07-18 12:16                   ` Jan Kara
@ 2024-07-18 12:29                     ` Mateusz Guzik
  0 siblings, 0 replies; 14+ messages in thread
From: Mateusz Guzik @ 2024-07-18 12:29 UTC (permalink / raw)
  To: Jan Kara
  Cc: Dominique Martinet, Jakub Kicinski, Eric Van Hensbergen,
	Latchesar Ionkov, Christian Schoenebeck, Christian Brauner, v9fs

On Thu, Jul 18, 2024 at 2:16 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 18-07-24 13:40:14, Mateusz Guzik wrote:
> > On Thu, Jul 18, 2024 at 12:57 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> > >
> > > On Thu, Jul 18, 2024 at 12:42 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Thu 18-07-24 06:08:45, Mateusz Guzik wrote:
> > > > > > Previously it was interlocked with the inode_hash_lock.
> > > >
> > > > I agree.
> > > >
> > > > > > Easiest & safest way out is for the lockless lookup to never block and
> > > > > > instead fallback to locked operation if it can't immediately get the
> > > > > > inode. Another options is to take advantage of the invariant on
> > > > > > i_state:
> > > > > >          BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
> > > > > >
> > > > > > and not fail to find the inode considering this value.
> > > > > >
> > > > > > That said let me chew on it.
> > > > >
> > > > > I'm going to have to do dayjob stuff, will look into reproducing it
> > > > > later (if i'm correct about the bug it should be viable to add some
> > > > > artificial delays to make it triggerable much easier).
> > > > >
> > > > > But should you repro in the meantime, the code below should do it (i
> > > > > don't know if that's specifically what I'm going to submit though):
> > > >
> > > > Maybe it would be more elegant as:
> > > >
> > > > diff --git a/fs/inode.c b/fs/inode.c
> > > > index f356fe2ec2b6..d0c39e71e403 100644
> > > > --- a/fs/inode.c
> > > > +++ b/fs/inode.c
> > > > @@ -2291,6 +2291,16 @@ static void __wait_on_freeing_inode(struct inode *inode, bool locked)
> > > >  {
> > > >         wait_queue_head_t *wq;
> > > >         DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW);
> > > > +
> > > > +       /*
> > > > +        * If the inode is already unhashed, evict() may be past the __I_NEW
> > > > +        * wakeup. Return immediately to avoid sleeping indefinitely.
> > > > +        */
> > > > +       if (unlikely(inode_unhashed(inode))) {
> > > > +               BUG_ON(locked);
> > > > +               return;
> > >
> > > this is missing spin_unlock(&inode->i_lock);
> > >
> > > > +       }
> > > > +
> > > >         wq = bit_waitqueue(&inode->i_state, __I_NEW);
> > > >         prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
> > > >         spin_unlock(&inode->i_lock);
> > > > --
> > > > 2.35.3
> > > >
> > >
> > > Putting the check here was my first variant, I don't have a strong
> > > opinion on this front. Whatever immediate fixup can get massaged. For
> > > something to actually commit I was thinking a little nicer, see below.
> > >
> > > While the bug being here is on me, I do think the current state
> > > avoidably facilitated it by implicitly depending on the hash state.
> > >
> > > In evict() you can find:
> > >         spin_lock(&inode->i_lock);
> > >         wake_up_bit(&inode->i_state, __I_NEW);
> > >         BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
> > >         spin_unlock(&inode->i_lock);
> > >
> > > This should probably move to a dedicated routine to pair up with
> > > __wait_on_freeing_inode (and yes, I know there are other spots which
> > > wake up based i_state) -- this much could be done in the fixup. It
> > > would also asserts on the unhashed condition so it would be apparent
> > > this is the invariant for (dis)allowing sleep.
> > >
> > > Longer term imo it should have something which provides the definite
> > > "no wakeups past this point" indicator not dependent on the above,
> > > maybe a dedicated i_state flag combination.
> > >
> > > This brings me to lack of VFS_ debug macros which would allow for
> > > generous sprinkling of asserts without affecting production kernels --
> > > someone(tm) should look into it.
> > >
> > > I'll try to find some time in the evening to repro and validate that
> > > this is indeed the issue though.
> > > --
> >
> > Now that I had some caffeine I see we both messed up here -- bailing
> > on the unhashed status without I_FREEING is not necessarily equivalent
> > to doing this after, so the check indeed should land in
> > __wait_on_freeing_inode.
>
> Yeah.
>
> > That said, assuming testing passes later, I'm going to submit this
> > (with maybe some word-smithing):
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index f356fe2ec2b6..e1082d61c093 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -676,7 +676,18 @@ static void evict(struct inode *inode)
> >
> >         remove_inode_hash(inode);
> >
> > +       /*
> > +        * Wake up waiters in __wait_on_freeing_inode().
> > +        *
> > +        * Lockless hash lookup may end up finding the inode before we removed
> > +        * it above, but only lock it *after* we are done with the wakeup below.
> > +        * Should this happen there will be noone to wake the waiter.
> > +        *
> > +        * Use inode_unhashed() status as an indicator whether blocking there is
> > +        * safe.
> > +        */
> >         spin_lock(&inode->i_lock);
> > +       BUG_ON(!inode_unhashed(inode));
>
> So this is going to trip on XFS I'm afraid. XFS is not using VFS to hash
> inodes and sets inode->i_hash through hlist_add_fake() and
> remove_inode_hash() does nothing for it. So XFS inodes return
> !inode_unhashed(inode) all the way until they are destroyed. Now this is
> fine because these inodes have no way to get to __wait_on_freeing_inode().
> Also the !inode_unhashed() check looks a bit weird just a few lines below
> remove_inode_hash() so I'd just drop this check.
>
> Otherwise the fix looks good to me!
>

I'll poke at xfs, worst case I'll drop the assert.

Fixing the bug aside, perhaps I should elaborate that I'm bothered by
the fact that the initial condition indicating whether one can go to
sleep *still holds* after there are no legal means to do the wakeup.
The added assert merely tries to document the side condition which has
to be checked separately, but is indeed optional and it may happen to
get messed up by filesystems which don't use the inode hash.

Doing something about the entire ordeal landed somewhere on my TODO
list after the bug is sorted out.

Thanks for feedback.

>                                                                 Honza
>
>
> >         wake_up_bit(&inode->i_state, __I_NEW);
> >         BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
> >         spin_unlock(&inode->i_lock);
> > @@ -2291,6 +2302,16 @@ static void __wait_on_freeing_inode(struct
> > inode *inode, bool locked)
> >  {
> >         wait_queue_head_t *wq;
> >         DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW);
> > +
> > +       /*
> > +        * Handle racing against evict().
> > +        */
> > +       if (unlikely(inode_unhashed(inode))) {
> > +               BUG_ON(locked);
> > +               spin_unlock(&inode->i_lock);
> > +               return;
> > +       }
> > +
> >         wq = bit_waitqueue(&inode->i_state, __I_NEW);
> >         prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
> >         spin_unlock(&inode->i_lock);
> >
> > --
> > Mateusz Guzik <mjguzik gmail.com>
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR



-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hangs after merge window changes
  2024-07-17 18:01 ` Eric Van Hensbergen
  2024-07-17 20:45   ` Dominique Martinet
@ 2024-07-17 21:42   ` Jakub Kicinski
  2024-07-17 22:00     ` Eric Van Hensbergen
  1 sibling, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-07-17 21:42 UTC (permalink / raw)
  To: Eric Van Hensbergen
  Cc: Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck, v9fs

On Wed, 17 Jul 2024 13:01:26 -0500 Eric Van Hensbergen wrote:
> I held way back on adding anything new over the past cycle, so this
> sounds likely some external change maybe perturbing an older issue.

Ah :(

> What guest side mount options, qemu options and qemu version are you
> using?

I use virtme-ng, according to ps qemu gets called with:

/usr/local/bin/qemu-system-x86_64 -name virtme-ng -m 1G -fsdev local,id=virtfs5,path=/,security_model=none,readonly=on,multidevs=remap -device virtio-9p-pci,fsdev=virtfs5,mount_tag=/dev/root -machine accel=kvm:tcg -device i6300esb,id=watchdog0 -cpu host -parallel none -net none -echr 1 -chardev file,path=/proc/self/fd/2,id=dmesg -device virtio-serial-pci -device virtconsole,chardev=dmesg -chardev stdio,id=console,signal=off,mux=on 

-serial chardev:console -mon chardev=console -vga none -display none -smp 4 -kernel ./arch/x86/boot/bzImage -append virtme_hostname=virtme-ng nr_open=1073741816 virtme_link_mods=/home/virtme/testing-1/.virtme_mods/lib/modules/0.0.0 virtme_rw_overlay0=/etc virtme_rw_overlay1=/lib virtme_rw_overlay2=/home virtme_rw_overlay3=/opt virtme_rw_overlay4=/srv virtme_rw_overlay5=/usr virtme_rw_overlay6=/var virtme_rw_overlay7=/tmp console=hvc0 earlyprintk=serial,ttyS0,115200 virtme_console=ttyS0 psmouse.proto=exps virtme_chdir=home/virtme/testing-1 rootfstype=9p rootflags=version=9p2000.L,trans=virtio,access=any raid=noautodetect ro init=/home/virtme/virtme-ng/virtme/guest/virtme-init

QEMU version is 8.2.1 (built from source)

> Dominique, you aware of anything coming in that might have messed with
> this?  Maybe the new common code refactoring underpinning
> readahead/etc.?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hangs after merge window changes
  2024-07-17 21:42   ` Jakub Kicinski
@ 2024-07-17 22:00     ` Eric Van Hensbergen
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Van Hensbergen @ 2024-07-17 22:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck, v9fs

Thanks - I don't see anything suspicious in the mount options.  I'll
try and get some time this week to stare at the inode management code
and see if I can work out where we may have a corner case or race
condition messing with us.  I agree with Dominique that turning on
some levels of caching that might keep inodes around for longer
probably will work around the issue for now, but I guess we really
need a stress test of that path to see if can get the problem
reproduced more reliably.

Before turning on a caching mode, you could enable P9_DEBUG and turn
on error messages from 9p to see if something off is happening that is
hidden behind debug ifdefs (debug=1 in mount flags should be
sufficient).

On Wed, Jul 17, 2024 at 4:42 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 17 Jul 2024 13:01:26 -0500 Eric Van Hensbergen wrote:
> > I held way back on adding anything new over the past cycle, so this
> > sounds likely some external change maybe perturbing an older issue.
>
> Ah :(
>
> > What guest side mount options, qemu options and qemu version are you
> > using?
>
> I use virtme-ng, according to ps qemu gets called with:
>
> /usr/local/bin/qemu-system-x86_64 -name virtme-ng -m 1G -fsdev local,id=virtfs5,path=/,security_model=none,readonly=on,multidevs=remap -device virtio-9p-pci,fsdev=virtfs5,mount_tag=/dev/root -machine accel=kvm:tcg -device i6300esb,id=watchdog0 -cpu host -parallel none -net none -echr 1 -chardev file,path=/proc/self/fd/2,id=dmesg -device virtio-serial-pci -device virtconsole,chardev=dmesg -chardev stdio,id=console,signal=off,mux=on
>
> -serial chardev:console -mon chardev=console -vga none -display none -smp 4 -kernel ./arch/x86/boot/bzImage -append virtme_hostname=virtme-ng nr_open=1073741816 virtme_link_mods=/home/virtme/testing-1/.virtme_mods/lib/modules/0.0.0 virtme_rw_overlay0=/etc virtme_rw_overlay1=/lib virtme_rw_overlay2=/home virtme_rw_overlay3=/opt virtme_rw_overlay4=/srv virtme_rw_overlay5=/usr virtme_rw_overlay6=/var virtme_rw_overlay7=/tmp console=hvc0 earlyprintk=serial,ttyS0,115200 virtme_console=ttyS0 psmouse.proto=exps virtme_chdir=home/virtme/testing-1 rootfstype=9p rootflags=version=9p2000.L,trans=virtio,access=any raid=noautodetect ro init=/home/virtme/virtme-ng/virtme/guest/virtme-init
>
> QEMU version is 8.2.1 (built from source)
>
> > Dominique, you aware of anything coming in that might have messed with
> > this?  Maybe the new common code refactoring underpinning
> > readahead/etc.?
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-07-18 12:29 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-17 17:24 hangs after merge window changes Jakub Kicinski
2024-07-17 18:01 ` Eric Van Hensbergen
2024-07-17 20:45   ` Dominique Martinet
2024-07-17 23:30     ` Jakub Kicinski
2024-07-18  1:14       ` Dominique Martinet
2024-07-18  3:54         ` Mateusz Guzik
2024-07-18  4:08           ` Mateusz Guzik
2024-07-18 10:42             ` Jan Kara
2024-07-18 10:57               ` Mateusz Guzik
2024-07-18 11:40                 ` Mateusz Guzik
2024-07-18 12:16                   ` Jan Kara
2024-07-18 12:29                     ` Mateusz Guzik
2024-07-17 21:42   ` Jakub Kicinski
2024-07-17 22:00     ` Eric Van Hensbergen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.