Linux XFS filesystem development
* [bug report] fstests generic/774 hang
@ 2025-10-30  8:45 Shinichiro Kawasaki
  2025-11-05  0:33 ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2025-10-30  8:45 UTC (permalink / raw)
  To: linux-xfs@vger.kernel.org

I observe that the fstests test case generic/774 hangs when I run it for xfs on an 8GiB
TCMU fileio device. The hang was observed with the v6.17 and v6.18-rcX kernel versions.
FYI, I attach below the kernel message log taken with the v6.18-rc3 kernel [1].
In my environment, the hang is reproduced reliably by repeating the test case a few
times.

Any action toward a fix would be appreciated. If I can help in any way, please let me know.
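As an aside for anyone triaging similar reports: the hung-task messages in the log
below all name the same likely rw-semaphore owner, and that pattern can be pulled out
mechanically. The following is a small illustrative helper (not part of the report;
the `summarize` function name and sample text are my own) that groups the blocked
tasks by the owner the hung-task detector points at:

```python
import re

# Group hung-task "blocked on an rw-semaphore" reports by the task the
# detector names as the likely lock owner.
def summarize(log_text):
    blocked = re.findall(
        r"INFO: task (\S+) <writer> blocked on an rw-semaphore "
        r"likely owned by task (\S+) <writer>", log_text)
    owners = {}
    for task, owner in blocked:
        owners.setdefault(owner, []).append(task)
    return owners

sample = (
    "kernel: INFO: task kworker/0:0:9 <writer> blocked on an "
    "rw-semaphore likely owned by task kworker/0:7:2826 <writer>\n"
    "kernel: INFO: task kworker/1:0:45 <writer> blocked on an "
    "rw-semaphore likely owned by task kworker/0:7:2826 <writer>\n"
)
print(summarize(sample))
# → {'kworker/0:7:2826': ['kworker/0:0:9', 'kworker/1:0:45']}
```

Applied to the full log below, every blocked dio completion worker maps to the single
owner kworker/0:7:2826, which suggests one stuck holder rather than many independent
stalls.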


[1]

Oct 30 15:11:25 redsun117q unknown: run fstests generic/774 at 2025-10-30 15:11:25
Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
Oct 30 15:11:27 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
Oct 30 15:11:28 redsun117q kernel: XFS (sdh): Mounting V5 Filesystem f93350d1-9b73-448c-bca2-b5b69343922f
Oct 30 15:11:28 redsun117q kernel: XFS (sdh): Ending clean mount
Oct 30 15:11:28 redsun117q kernel: XFS (sdh): Unmounting Filesystem f93350d1-9b73-448c-bca2-b5b69343922f
Oct 30 15:11:29 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
Oct 30 15:11:29 redsun117q kernel: XFS (sdh): Mounting V5 Filesystem 55534b79-27e6-4ded-82e3-5c249c68cb4a
Oct 30 15:11:29 redsun117q kernel: XFS (sdh): Ending clean mount
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/0:0:9 blocked for more than 122 seconds.
Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/0:0     state:D stack:0     pid:9     tgid:9     ppid:2      task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel:  <TASK>
Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? rwsem_optimistic_spin+0x1d1/0x430
Oct 30 15:33:37 redsun117q kernel:  ? do_raw_spin_lock+0x128/0x270
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_do_raw_spin_lock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entity+0x33e/0x1df0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __lock_release.isra.0+0x59/0x170
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel:  </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/0:0:9 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:0:45 blocked for more than 122 seconds.
Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/1:0     state:D stack:0     pid:45    tgid:45    ppid:2      task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel:  <TASK>
Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? rwsem_optimistic_spin+0x1d1/0x430
Oct 30 15:33:37 redsun117q kernel:  ? do_raw_spin_lock+0x128/0x270
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_do_raw_spin_lock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entity+0x33e/0x1df0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entities+0x24b/0x1530
Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? schedule+0x1cc/0x250
Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __lock_release.isra.0+0x59/0x170
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel:  </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:0:45 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:0:105 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/13:0    state:D stack:0     pid:105   tgid:105   ppid:2      task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel:  <TASK>
Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __lock_release.isra.0+0x59/0x170
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel:  </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:0:105 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:1:189 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/1:1     state:D stack:0     pid:189   tgid:189   ppid:2      task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel:  <TASK>
Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? preempt_schedule_notrace+0x53/0x90
Oct 30 15:33:37 redsun117q kernel:  ? schedule+0xfe/0x250
Oct 30 15:33:37 redsun117q kernel:  ? rcu_is_watching+0x67/0x80
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_do_raw_spin_lock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entity+0x482/0x1df0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __try_to_del_timer_sync+0xd7/0x130
Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel:  </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:1:189 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:1:204 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/13:1    state:D stack:0     pid:204   tgid:204   ppid:2      task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel:  <TASK>
Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? rwsem_optimistic_spin+0x1d1/0x430
Oct 30 15:33:37 redsun117q kernel:  ? do_raw_spin_lock+0x128/0x270
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_do_raw_spin_lock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entity+0x33e/0x1df0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entities+0x24b/0x1530
Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __kthread_parkme+0xb3/0x1f0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel:  </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:1:204 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:1:261 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/2:1     state:D stack:0     pid:261   tgid:261   ppid:2      task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel:  <TASK>
Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel:  ? __kasan_slab_alloc+0x7e/0x90
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entity+0x482/0x1df0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __kthread_parkme+0xb3/0x1f0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel:  </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:1:261 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/12:4:352 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/12:4    state:D stack:0     pid:352   tgid:352   ppid:2      task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel:  <TASK>
Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? kick_pool+0x1a5/0x860
Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel:  </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/12:4:352 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/3:2:545 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/3:2     state:D stack:0     pid:545   tgid:545   ppid:2      task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel:  <TASK>
Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? rwsem_optimistic_spin+0x1d1/0x430
Oct 30 15:33:37 redsun117q kernel:  ? do_raw_spin_lock+0x128/0x270
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_do_raw_spin_lock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel:  </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/3:2:545 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:2:549 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/2:2     state:D stack:0     pid:549   tgid:549   ppid:2      task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel:  <TASK>
Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? rwsem_optimistic_spin+0x1d1/0x430
Oct 30 15:33:37 redsun117q kernel:  ? do_raw_spin_lock+0x128/0x270
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_do_raw_spin_lock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entity+0x33e/0x1df0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel:  </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:2:549 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/6:2:557 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/6:2     state:D stack:0     pid:557   tgid:557   ppid:2      task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel:  <TASK>
Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entity+0x33e/0x1df0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel:  </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/6:2:557 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
Oct 30 15:33:37 redsun117q kernel: INFO: lockdep is turned off.


* Re: [bug report] fstests generic/774 hang
  2025-10-30  8:45 [bug report] fstests generic/774 hang Shinichiro Kawasaki
@ 2025-11-05  0:33 ` Darrick J. Wong
  2025-11-05  2:19   ` Shinichiro Kawasaki
  0 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-05  0:33 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: linux-xfs@vger.kernel.org, John Garry, ojaswin

[add jogarry/ojaswin since this is a new atomic writes test]

On Thu, Oct 30, 2025 at 08:45:05AM +0000, Shinichiro Kawasaki wrote:
> I observe the fstests test case generic/774 hangs, when I run it for xfs on 8GiB
> TCMU fileio devices. It was observed with v6.17 and v6.18-rcX kernel versions.
> FYI, here I attach the kernel message log that was taken with v6.18-rc3 kernel
> [1]. The hang is recreated in stable manner by repeating the test case a few
> times in my environment.
> 
> Actions for fix will be appreciated. If I can do any help, please let me know.

I wonder: does your disk support atomic writes, or are we just using the
software fallback in xfs?
> 
> [1]
> 
> Oct 30 15:11:25 redsun117q unknown: run fstests generic/774 at 2025-10-30 15:11:25
> Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> Oct 30 15:11:27 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05

My guess is the disk doesn't support atomic writes?
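To rule that out quickly, you could read the block layer's advertised atomic
write limits from sysfs. A minimal sketch (assumes a kernel new enough to
expose the atomic_write_* queue attributes, and the `sdh` device name from the
log above; the helper name is made up for illustration). If
atomic_write_max_bytes reads 0, or the files are absent, the device offers no
hardware atomic writes and XFS would be taking the COW-based fallback path:

```shell
#!/bin/bash
# Print the block layer's advertised atomic write limits for a device.
# atomic_write_max_bytes == 0 means no hardware atomic write support.
show_atomic_limits() {
    local dev="$1" f
    for f in atomic_write_max_bytes atomic_write_unit_min_bytes atomic_write_unit_max_bytes; do
        if [ -r "/sys/block/$dev/queue/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "/sys/block/$dev/queue/$f")"
        else
            # Attribute missing: kernel predates atomic write queue limits,
            # or the device does not exist.
            printf '%s: not present\n' "$f"
        fi
    done
}

show_atomic_limits sdh
```

(Alternatively, statx(2) with STATX_WRITE_ATOMIC reports the per-file limits
the filesystem actually grants, which is what generic/774 exercises.)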

--D

> Oct 30 15:11:28 redsun117q kernel: XFS (sdh): Mounting V5 Filesystem f93350d1-9b73-448c-bca2-b5b69343922f
> Oct 30 15:11:28 redsun117q kernel: XFS (sdh): Ending clean mount
> Oct 30 15:11:28 redsun117q kernel: XFS (sdh): Unmounting Filesystem f93350d1-9b73-448c-bca2-b5b69343922f
> Oct 30 15:11:29 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> Oct 30 15:11:29 redsun117q kernel: XFS (sdh): Mounting V5 Filesystem 55534b79-27e6-4ded-82e3-5c249c68cb4a
> Oct 30 15:11:29 redsun117q kernel: XFS (sdh): Ending clean mount
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/0:0:9 blocked for more than 122 seconds.
> Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/0:0     state:D stack:0     pid:9     tgid:9     ppid:2      task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel:  <TASK>
> Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? rwsem_optimistic_spin+0x1d1/0x430
> Oct 30 15:33:37 redsun117q kernel:  ? do_raw_spin_lock+0x128/0x270
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_do_raw_spin_lock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entity+0x33e/0x1df0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __lock_release.isra.0+0x59/0x170
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel:  </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/0:0:9 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:0:45 blocked for more than 122 seconds.
> Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/1:0     state:D stack:0     pid:45    tgid:45    ppid:2      task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel:  <TASK>
> Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? rwsem_optimistic_spin+0x1d1/0x430
> Oct 30 15:33:37 redsun117q kernel:  ? do_raw_spin_lock+0x128/0x270
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_do_raw_spin_lock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entity+0x33e/0x1df0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entities+0x24b/0x1530
> Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? schedule+0x1cc/0x250
> Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __lock_release.isra.0+0x59/0x170
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel:  </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:0:45 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:0:105 blocked for more than 123 seconds.
> Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/13:0    state:D stack:0     pid:105   tgid:105   ppid:2      task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel:  <TASK>
> Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __lock_release.isra.0+0x59/0x170
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel:  </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:0:105 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:1:189 blocked for more than 123 seconds.
> Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/1:1     state:D stack:0     pid:189   tgid:189   ppid:2      task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel:  <TASK>
> Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? preempt_schedule_notrace+0x53/0x90
> Oct 30 15:33:37 redsun117q kernel:  ? schedule+0xfe/0x250
> Oct 30 15:33:37 redsun117q kernel:  ? rcu_is_watching+0x67/0x80
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_do_raw_spin_lock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entity+0x482/0x1df0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __try_to_del_timer_sync+0xd7/0x130
> Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel:  </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:1:189 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:1:204 blocked for more than 123 seconds.
> Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/13:1    state:D stack:0     pid:204   tgid:204   ppid:2      task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel:  <TASK>
> Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? rwsem_optimistic_spin+0x1d1/0x430
> Oct 30 15:33:37 redsun117q kernel:  ? do_raw_spin_lock+0x128/0x270
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_do_raw_spin_lock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entity+0x33e/0x1df0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entities+0x24b/0x1530
> Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __kthread_parkme+0xb3/0x1f0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel:  </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:1:204 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:1:261 blocked for more than 123 seconds.
> Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/2:1     state:D stack:0     pid:261   tgid:261   ppid:2      task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel:  <TASK>
> Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel:  ? __kasan_slab_alloc+0x7e/0x90
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entity+0x482/0x1df0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __kthread_parkme+0xb3/0x1f0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel:  </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:1:261 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/12:4:352 blocked for more than 123 seconds.
> Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/12:4    state:D stack:0     pid:352   tgid:352   ppid:2      task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel:  <TASK>
> Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? kick_pool+0x1a5/0x860
> Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel:  </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/12:4:352 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/3:2:545 blocked for more than 123 seconds.
> Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/3:2     state:D stack:0     pid:545   tgid:545   ppid:2      task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel:  <TASK>
> Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? rwsem_optimistic_spin+0x1d1/0x430
> Oct 30 15:33:37 redsun117q kernel:  ? do_raw_spin_lock+0x128/0x270
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_do_raw_spin_lock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel:  </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/3:2:545 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:2:549 blocked for more than 123 seconds.
> Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/2:2     state:D stack:0     pid:549   tgid:549   ppid:2      task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel:  <TASK>
> Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? rwsem_optimistic_spin+0x1d1/0x430
> Oct 30 15:33:37 redsun117q kernel:  ? do_raw_spin_lock+0x128/0x270
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_do_raw_spin_lock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entity+0x33e/0x1df0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel:  </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:2:549 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/6:2:557 blocked for more than 123 seconds.
> Oct 30 15:33:37 redsun117q kernel:       Tainted: G        W           6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/6:2     state:D stack:0     pid:557   tgid:557   ppid:2      task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel:  <TASK>
> Oct 30 15:33:37 redsun117q kernel:  __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel:  schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel:  rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel:  ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? dequeue_entity+0x33e/0x1df0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel:  ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel:  ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel:  xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel:  ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel:  process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> Oct 30 15:33:37 redsun117q kernel:  ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel:  worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel:  ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel:  </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/6:2:557 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
> Oct 30 15:33:37 redsun117q kernel: INFO: lockdep is turned off.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-05  0:33 ` Darrick J. Wong
@ 2025-11-05  2:19   ` Shinichiro Kawasaki
  2025-11-05  8:52     ` John Garry
  0 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2025-11-05  2:19 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-xfs@vger.kernel.org, John Garry, ojaswin@linux.ibm.com

On Nov 04, 2025 / 16:33, Darrick J. Wong wrote:
> [add jogarry/ojaswin since this is a new atomic writes test]
> 
> On Thu, Oct 30, 2025 at 08:45:05AM +0000, Shinichiro Kawasaki wrote:
> > I observe the fstests test case generic/774 hangs, when I run it for xfs on 8GiB
> > TCMU fileio devices. It was observed with v6.17 and v6.18-rcX kernel versions.
> > FYI, here I attach the kernel message log that was taken with v6.18-rc3 kernel
> > [1]. The hang is recreated in stable manner by repeating the test case a few
> > times in my environment.
> > 
> > Actions for fix will be appreciated. If I can do any help, please let me know.
> 
> I wonder, does your disk support atomic writes or are we just using the
> software fallback in xfs?

I don't think the disk supports atomic writes. It is just a regular TCMU device,
and its atomic write related sysfs attributes have value 0:

  $ grep -rne . /sys/block/sdh/queue/ | grep atomic
  /sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0
  /sys/block/sdh/queue/atomic_write_boundary_bytes:1:0
  /sys/block/sdh/queue/atomic_write_max_bytes:1:0
  /sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0

FYI, I attach all the sysfs queue attribute values of the device [2].

> > 
> > [1]
> > 
> > Oct 30 15:11:25 redsun117q unknown: run fstests generic/774 at 2025-10-30 15:11:25
> > Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> > Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> > Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> > Oct 30 15:11:27 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> 
> My guess is the disk doesn't support atomic writes?

The "MODE SENSE: unimplemented page/subpage" messages are reported during all
other test cases as well, like this:

  [495623.282810][T29013] run fstests generic/001 at 2025-11-05 11:10:33
  [495623.377143][T27961] MODE SENSE: unimplemented page/subpage: 0x0a/0x05
  [495623.650270][T28145] MODE SENSE: unimplemented page/subpage: 0x0a/0x05
  [495623.683842][T28157] MODE SENSE: unimplemented page/subpage: 0x0a/0x05
  [495660.733929][T32362] TARGET_CORE[loopback]: Expected Transfer Length: 0 does not match SCSI CDB Length: 512 for SAM Opcode: 0x8f
  [495662.073182][T32548] XFS (sdg): Unmounting Filesystem 16ee26f7-5a36-4e84-a6b9-04d076522519
  [495662.170053][T28145] MODE SENSE: unimplemented page/subpage: 0x0a/0x05
  [495662.439897][T32792] XFS (sdg): Mounting V5 Filesystem 16ee26f7-5a36-4e84-a6b9-04d076522519
  [495662.459341][T32792] XFS (sdg): Ending clean mount
  [495662.886657][T32833] XFS (sdg): Unmounting Filesystem 16ee26f7-5a36-4e84-a6b9-04d076522519

So I think the messages are probably irrelevant.


[2] test target device sysfs queue attributes

$ grep -rne . /sys/block/sdh/queue/
/sys/block/sdh/queue/io_poll_delay:1:-1
/sys/block/sdh/queue/max_integrity_segments:1:65535
/sys/block/sdh/queue/zoned:1:none
/sys/block/sdh/queue/scheduler:1:none mq-deadline kyber [bfq]
/sys/block/sdh/queue/io_poll:1:0
/sys/block/sdh/queue/discard_zeroes_data:1:0
/sys/block/sdh/queue/minimum_io_size:1:512
/sys/block/sdh/queue/nr_zones:1:0
/sys/block/sdh/queue/write_same_max_bytes:1:0
/sys/block/sdh/queue/max_segments:1:256
/sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0
/sys/block/sdh/queue/atomic_write_boundary_bytes:1:0
/sys/block/sdh/queue/dax:1:0
/sys/block/sdh/queue/dma_alignment:1:3
/sys/block/sdh/queue/physical_block_size:1:512
/sys/block/sdh/queue/logical_block_size:1:512
/sys/block/sdh/queue/virt_boundary_mask:1:0
/sys/block/sdh/queue/zone_append_max_bytes:1:0
/sys/block/sdh/queue/io_timeout:1:30000
/sys/block/sdh/queue/nr_requests:1:256
/sys/block/sdh/queue/write_stream_granularity:1:0
/sys/block/sdh/queue/iostats_passthrough:1:0
/sys/block/sdh/queue/write_cache:1:write back
/sys/block/sdh/queue/stable_writes:1:0
/sys/block/sdh/queue/max_segment_size:1:65536
/sys/block/sdh/queue/max_write_streams:1:0
/sys/block/sdh/queue/write_zeroes_unmap_max_bytes:1:0
/sys/block/sdh/queue/rotational:1:1
/sys/block/sdh/queue/discard_max_bytes:1:0
/sys/block/sdh/queue/write_zeroes_unmap_max_hw_bytes:1:0
/sys/block/sdh/queue/atomic_write_max_bytes:1:0
/sys/block/sdh/queue/add_random:1:1
/sys/block/sdh/queue/discard_max_hw_bytes:1:0
/sys/block/sdh/queue/optimal_io_size:1:8388608
/sys/block/sdh/queue/chunk_sectors:1:0
/sys/block/sdh/queue/iosched/fifo_expire_async:1:250
/sys/block/sdh/queue/iosched/back_seek_penalty:1:2
/sys/block/sdh/queue/iosched/timeout_sync:1:125
/sys/block/sdh/queue/iosched/back_seek_max:1:16384
/sys/block/sdh/queue/iosched/low_latency:1:1
/sys/block/sdh/queue/iosched/strict_guarantees:1:0
/sys/block/sdh/queue/iosched/slice_idle_us:1:8000
/sys/block/sdh/queue/iosched/fifo_expire_sync:1:125
/sys/block/sdh/queue/iosched/slice_idle:1:8
/sys/block/sdh/queue/iosched/max_budget:1:0
/sys/block/sdh/queue/read_ahead_kb:1:16384
/sys/block/sdh/queue/max_discard_segments:1:1
/sys/block/sdh/queue/write_zeroes_max_bytes:1:0
/sys/block/sdh/queue/nomerges:1:0
/sys/block/sdh/queue/zone_write_granularity:1:0
/sys/block/sdh/queue/wbt_lat_usec:1:0
/sys/block/sdh/queue/fua:1:1
/sys/block/sdh/queue/discard_granularity:1:0
/sys/block/sdh/queue/rq_affinity:1:1
/sys/block/sdh/queue/max_sectors_kb:1:8192
/sys/block/sdh/queue/hw_sector_size:1:512
/sys/block/sdh/queue/max_hw_sectors_kb:1:32767
/sys/block/sdh/queue/iostats:1:1
/sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-05  2:19   ` Shinichiro Kawasaki
@ 2025-11-05  8:52     ` John Garry
  2025-11-05 10:39       ` John Garry
  0 siblings, 1 reply; 23+ messages in thread
From: John Garry @ 2025-11-05  8:52 UTC (permalink / raw)
  To: Shinichiro Kawasaki, Darrick J. Wong
  Cc: linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

On 05/11/2025 02:19, Shinichiro Kawasaki wrote:
> On Nov 04, 2025 / 16:33, Darrick J. Wong wrote:
>> [add jogarry/ojaswin since this is a new atomic writes test]
>>
>> On Thu, Oct 30, 2025 at 08:45:05AM +0000, Shinichiro Kawasaki wrote:
>>> I observe the fstests test case generic/774 hangs, when I run it for xfs on 8GiB
>>> TCMU fileio devices. It was observed with v6.17 and v6.18-rcX kernel versions.
>>> FYI, here I attach the kernel message log that was taken with v6.18-rc3 kernel
>>> [1]. The hang is recreated in stable manner by repeating the test case a few
>>> times in my environment.
>>>
>>> Actions for fix will be appreciated. If I can do any help, please let me know.
>>
>> I wonder, does your disk support atomic writes or are we just using the
>> software fallback in xfs?
> 
> I don't think the disk supports atomic writes. It is just a regular TCMU device,
> and its atomic write related sysfs attributes have value 0:
> 
>    $ grep -rne . /sys/block/sdh/queue/ | grep atomic
>    /sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0
>    /sys/block/sdh/queue/atomic_write_boundary_bytes:1:0
>    /sys/block/sdh/queue/atomic_write_max_bytes:1:0
>    /sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0
> 
> FYI, I attach all the sysfs queue attribute values of the device [2].

Yes, this would only be using software-based atomic writes.

Shinichiro, do the other atomic writes tests, like 775 and 767, run ok? You 
can check group "atomicwrites" to know which tests they are.

774 is the fio test.

Some things to try:
- use a physical disk for the TEST_DEV
- Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to 
reduce $threads to a low value, say, 2
- try turning on XFS_DEBUG config

BTW, Darrick has posted some xfs atomics fixes @ 
https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t. 
I doubt that they will help this, but worth trying.

I will try to recreate.

Thanks,
John

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-05  8:52     ` John Garry
@ 2025-11-05 10:39       ` John Garry
  2025-11-05 11:29         ` John Garry
                           ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: John Garry @ 2025-11-05 10:39 UTC (permalink / raw)
  To: Shinichiro Kawasaki, Darrick J. Wong
  Cc: linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

On 05/11/2025 08:52, John Garry wrote:
>> I don't think the disk supports atomic writes. It is just a regular 
>> TCMU device,
>> and its atomic write related sysfs attributes have value 0:
>>
>>    $ grep -rne . /sys/block/sdh/queue/ | grep atomic
>>    /sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0
>>    /sys/block/sdh/queue/atomic_write_boundary_bytes:1:0
>>    /sys/block/sdh/queue/atomic_write_max_bytes:1:0
>>    /sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0
>>
>> FYI, I attach all the sysfs queue attribute values of the device [2].
> 
> Yes, this would only be using software-based atomic writes.
> 
> Shinichiro, do the other atomic writes tests run ok, like 775, 767? You 
> can check group "atomicwrites" to know which tests they are.
> 
> 774 is the fio test.
> 
> Some things to try:
> - use a physical disk for the TEST_DEV
> - Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to 
> reduce $threads to a low value, say, 2
> - try turning on XFS_DEBUG config
> 
> BTW, Darrick has posted some xfs atomics fixes @ 
> https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t. 
> I doubt that they will help this, but worth trying.
> 
> I will try to recreate.

I tested this, and the file size which we try to write is huge, like 3.3G 
in my case. That seems excessive.

The calc comes from the following in 774:

filesize=$((aw_bsize * threads * 100))

aw_bsize for me is 1M, and threads is 32

aw_bsize is large because XFS supports software-based atomic writes, whose 
limit is generally going to be huge compared to anything which HW can support.

When I tried to run this test, it was not completing in a sane amount of 
time - it was taking many minutes before I gave up.

@shinichiro, please try this:

--- a/tests/generic/774
+++ b/tests/generic/774
@@ -29,7 +29,7 @@ aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
  fsbsize=$(_get_block_size $SCRATCH_MNT)

  threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
-filesize=$((aw_bsize * threads * 100))
+filesize=$((aw_bsize * threads))
  depth=$threads
  aw_io_size=$((filesize / threads))
  aw_io_inc=$aw_io_size


Note, I ran with this change and the test now completes, but I get this:

+fio: failed initializing LFSR
     +fio: failed initializing LFSR
     +fio: failed initializing LFSR
     +fio: failed initializing LFSR
     +verify: bad magic header 0, wanted acca at file 
/home/ubuntu/mnt/scratch/test-file offset 0, length 1048576 (requested 
block: offset=0, length=1048576)
     +verify: bad magic header e3d6, wanted acca at file 
/home/ubuntu/mnt/scratch/test-file offset 8388608, length 1048576 
(requested block: offset=8388608, length=1048576)

I need to check that fio complaint.

Thanks,
John

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-05 10:39       ` John Garry
@ 2025-11-05 11:29         ` John Garry
  2025-11-05 12:37         ` Shinichiro Kawasaki
  2025-11-09 11:58         ` Ojaswin Mujoo
  2 siblings, 0 replies; 23+ messages in thread
From: John Garry @ 2025-11-05 11:29 UTC (permalink / raw)
  To: Shinichiro Kawasaki, Darrick J. Wong
  Cc: linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

On 05/11/2025 10:39, John Garry wrote:
> 
> +fio: failed initializing LFSR
>      +fio: failed initializing LFSR
>      +fio: failed initializing LFSR
>      +fio: failed initializing LFSR
>      +verify: bad magic header 0, wanted acca at file /home/ubuntu/mnt/ 
> scratch/test-file offset 0, length 1048576 (requested block: offset=0, 
> length=1048576)
>      +verify: bad magic header e3d6, wanted acca at file /home/ubuntu/ 
> mnt/scratch/test-file offset 8388608, length 1048576 (requested block: 
> offset=8388608, length=1048576)
> 
> I need to check that fio complaint.

This issue goes away when I stop using lfsr, i.e. the test passes.

The problem is that the LFSR init in fio does not have enough "blocks", and 
this comes from the fio bs being the same as the increment aw_io_inc, both 
1M in my case. I think that aw_io_inc needs to be much larger than bs.

BTW, I think that the fio random number generator param is only relevant in 
fio write mode. It seems to be set in 774 even for the verify read.
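
To make the blocks-per-job point concrete (a sketch of the arithmetic only - 
the exact minimum block count fio's LFSR generator needs may differ, so the 
"too few" comment is an assumption):

```shell
# With bs equal to the per-thread increment aw_io_inc, each fio job spans
# exactly one block, leaving the LFSR generator nothing to permute over.
bs=$((1024 * 1024))          # fio block size (1M in this report)
aw_io_inc=$((1024 * 1024))   # per-thread increment, currently the same

nr_blocks=$((aw_io_inc / bs))
echo "blocks per job: $nr_blocks"   # 1 - assumed too few for LFSR init
```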

Thanks,
John

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-05 10:39       ` John Garry
  2025-11-05 11:29         ` John Garry
@ 2025-11-05 12:37         ` Shinichiro Kawasaki
  2025-11-06  8:19           ` Shinichiro Kawasaki
  2025-11-09 11:58         ` Ojaswin Mujoo
  2 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2025-11-05 12:37 UTC (permalink / raw)
  To: John Garry
  Cc: Darrick J. Wong, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

On Nov 05, 2025 / 10:39, John Garry wrote:
> On 05/11/2025 08:52, John Garry wrote:
> > > I don't think the disk supports atomic writes. It is just a regular
> > > TCMU device,
> > > and its atomic write related sysfs attributes have value 0:
> > > 
> > >    $ grep -rne . /sys/block/sdh/queue/ | grep atomic
> > >    /sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0
> > >    /sys/block/sdh/queue/atomic_write_boundary_bytes:1:0
> > >    /sys/block/sdh/queue/atomic_write_max_bytes:1:0
> > >    /sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0
> > > 
> > > FYI, I attach all the sysfs queue attribute values of the device [2].
> > 
> > Yes, this would only be using software-based atomic writes.
> > 
> > Shinichiro, do the other atomic writes tests run ok, like 775, 767? You
> > can check group "atomicwrites" to know which tests they are.
> > 
> > 774 is the fio test.
> > 
> > Some things to try:
> > - use a physical disk for the TEST_DEV
> > - Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to
> > reduce $threads to a low value, say, 2
> > - try turning on XFS_DEBUG config
> > 
> > BTW, Darrick has posted some xfs atomics fixes @
> > https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t.
> > I doubt that they will help this, but worth trying.

John, thank you for looking into this. Tomorrow, I will do some trials based on
your comments above.

Today, I have just done a quick trial with the change you suggested below.

> > 
> > I will try to recreate.
> 
> I tested this and the filesize which we try to write is huge, like 3.3G in
> my case. That seems excessive.
> 
> The calc comes from the following in 774:
> 
> filesize=$((aw_bsize * threads * 100))
> 
> aw_bsize for  me is 1M, and threads is 32
> 
> aw_bsize is large as XFS supports software-based atomics, which is generally
> going to be huge compared to anything which HW can support.
> 
> When I tried to run this test, it was not completing in a sane amount of
> time - it was taking many minutes before I gave up.
> 
> @shinichiro, please try this:
> 
> --- a/tests/generic/774
> +++ b/tests/generic/774
> @@ -29,7 +29,7 @@ aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
>  fsbsize=$(_get_block_size $SCRATCH_MNT)
> 
>  threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
> -filesize=$((aw_bsize * threads * 100))
> +filesize=$((aw_bsize * threads))
>  depth=$threads
>  aw_io_size=$((filesize / threads))
>  aw_io_inc=$aw_io_size

With the change above, the test case g774 completed in less than a minute on
my test node. No kernel INFO/WARN/BUG.

> 
> 
> Note, I ran with this change and the test now completes, but I get this:
> 
> +fio: failed initializing LFSR
>     +fio: failed initializing LFSR
>     +fio: failed initializing LFSR
>     +fio: failed initializing LFSR
>     +verify: bad magic header 0, wanted acca at file
> /home/ubuntu/mnt/scratch/test-file offset 0, length 1048576 (requested
> block: offset=0, length=1048576)
>     +verify: bad magic header e3d6, wanted acca at file
> /home/ubuntu/mnt/scratch/test-file offset 8388608, length 1048576 (requested
> block: offset=8388608, length=1048576)
> 
> I need to check that fio complaint.

I also saw the fio error messages.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-05 12:37         ` Shinichiro Kawasaki
@ 2025-11-06  8:19           ` Shinichiro Kawasaki
  2025-11-06  8:53             ` John Garry
  2025-11-09 12:02             ` Ojaswin Mujoo
  0 siblings, 2 replies; 23+ messages in thread
From: Shinichiro Kawasaki @ 2025-11-06  8:19 UTC (permalink / raw)
  To: John Garry
  Cc: Darrick J. Wong, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

On Nov 05, 2025 / 21:37, Shin'ichiro Kawasaki wrote:
> On Nov 05, 2025 / 10:39, John Garry wrote:
> > On 05/11/2025 08:52, John Garry wrote:
> > > > I don't think the disk supports atomic writes. It is just a regular
> > > > TCMU device,
> > > > and its atomic write related sysfs attributes have value 0:
> > > > 
> > > >    $ grep -rne . /sys/block/sdh/queue/ | grep atomic
> > > >    /sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0
> > > >    /sys/block/sdh/queue/atomic_write_boundary_bytes:1:0
> > > >    /sys/block/sdh/queue/atomic_write_max_bytes:1:0
> > > >    /sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0
> > > > 
> > > > FYI, I attach the all sysfs queue attribute values of the device [2].
> > > 
> > > Yes, this would only be using software-based atomic writes.
> > > 
> > > Shinichiro, do the other atomic writes tests run ok, like 775, 767? You
> > > can check group "atomicwrites" to know which tests they are.
> > > 
> > > 774 is the fio test.

I tried the other "atomicwrites" tests. I found that g778 took a very long time,
which implies that g778 may have a similar problem to g774.

  g765: [not run] write atomic not supported by this block device
  g767: 11s
  g768: 13s
  g769: 13s
  g770: 35s
  g773: [not run] write atomic not supported by this block device
  g774: did not complete after a 3-hour run (and the kernel reported the INFO messages)
  g775: 48s
  g776: [not run] write atomic not supported by this block device
  g778: did not complete after a 50-minute run
  x838: [not run] External volumes not in use, skipped this test
  x839: [not run] XFS error injection requires CONFIG_XFS_DEBUG
  x840: [not run] write atomic not supported by this block device

> > > 
> > > Some things to try:
> > > - use a physical disk for the TEST_DEV

I tried using a real HDD for TEST_DEV, but still observed the hang and INFO
messages at g774.

> > > - Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to
> > > reduce $threads to a low value, say, 2

I do not set LOAD_FACTOR. I changed g775 script to set threads=2, then the
test case completed quickly, within a few minutes. I'm suspecting that this
short test time might hide the hang/INFO problem.
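For reference, the thread-count line being bodged works out as follows on a
24-CPU node like mine. This is a sketch: _min is an fstests helper,
reimplemented here for illustration, and LOAD_FACTOR is assumed to default
to 1 when unset.

```shell
# Sketch of the generic/774 thread-count calculation. _min is an fstests
# helper, reimplemented here for illustration only.
_min() { if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi; }

LOAD_FACTOR=1   # assumed default when the tester does not set it
nproc=24        # the 24-CPU test node discussed in this thread

threads=$(_min "$((nproc * 2 * LOAD_FACTOR))" "100")
echo "threads=$threads"   # 48 here; the bodge forces this down to 2
```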

> > > - try turning on the XFS_DEBUG config

I turned on XFS_DEBUG, and still observed the hang and the INFO messages.

> > > 
> > > BTW, Darrick has posted some xfs atomics fixes @
> > > https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t
> > > I doubt that they will help this, but worth trying.

I have not yet tried this. Will try it tomorrow.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-06  8:19           ` Shinichiro Kawasaki
@ 2025-11-06  8:53             ` John Garry
  2025-11-07  2:27               ` Shinichiro Kawasaki
  2025-11-09 12:02             ` Ojaswin Mujoo
  1 sibling, 1 reply; 23+ messages in thread
From: John Garry @ 2025-11-06  8:53 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: Darrick J. Wong, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

>>>>
>>>> Shinichiro, do the other atomic writes tests run ok, like 775, 767? You
>>>> can check group "atomicwrites" to know which tests they are.
>>>>
>>>> 774 is the fio test.
> 
> I tried the other "atomicwrites" tests. I found that g778 took a very long time,
> which implies that g778 may have a similar problem to g774.
> 
>    g765: [not run] write atomic not supported by this block device
>    g767: 11s
>    g768: 13s
>    g769: 13s
>    g770: 35s
>    g773: [not run] write atomic not supported by this block device
>    g774: did not complete after a 3-hour run (and the kernel reported the INFO messages)
>    g775: 48s
>    g776: [not run] write atomic not supported by this block device
>    g778: did not complete after a 50-minute run
>    x838: [not run] External volumes not in use, skipped this test
>    x839: [not run] XFS error injection requires CONFIG_XFS_DEBUG
>    x840: [not run] write atomic not supported by this block device

This is testing software-based atomic writes, and they are just slow. 
Very slow, relative to HW-based atomic writes. And having bs=1M will 
make things worse, as we are locking out other threads for longer (when 
doing the write). So I think that we should limit the file size which we 
try to write.

> 
>>>>
>>>> Some things to try:
>>>> - use a physical disk for the TEST_DEV
> 
> I tried using a real HDD for TEST_DEV, but still observed the hang and INFO
> messages at g774.
> 
>>>> - Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to
>>>> reduce $threads to a low value, say, 2
> 
> I do not set LOAD_FACTOR. I changed g775 script to set threads=2, then the
> test case completed quickly, within a few minutes. I'm suspecting that this
> short test time might hide the hang/INFO problem.
> 
>>>> - try turning on the XFS_DEBUG config
> 
> I turned on XFS_DEBUG, and still observed the hang and the INFO messages.
> 

I don't think that this will help.

>>>>
>>>> BTW, Darrick has posted some xfs atomics fixes @
>>>> https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t
>>>> I doubt that they will help this, but worth trying.
> 
> I have not yet tried this. Will try it tomorrow.

Nor this.

Even for the conditions set, the test should not produce a hang. I can
check whether we can improve the software-based atomic writes in xfs to
avoid this.

Thanks,
John


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-06  8:53             ` John Garry
@ 2025-11-07  2:27               ` Shinichiro Kawasaki
  2025-11-07  4:28                 ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2025-11-07  2:27 UTC (permalink / raw)
  To: John Garry
  Cc: Darrick J. Wong, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

On Nov 06, 2025 / 08:53, John Garry wrote:
> > > > > 
> > > > > Shinichiro, do the other atomic writes tests run ok, like 775, 767? You
> > > > > can check group "atomicwrites" to know which tests they are.
> > > > > 
> > > > > 774 is the fio test.
> > 
> > I tried the other "atomicwrites" tests. I found that g778 took a very long time,
> > which implies that g778 may have a similar problem to g774.
> > 
> >    g765: [not run] write atomic not supported by this block device
> >    g767: 11s
> >    g768: 13s
> >    g769: 13s
> >    g770: 35s
> >    g773: [not run] write atomic not supported by this block device
> >    g774: did not complete after a 3-hour run (and the kernel reported the INFO messages)
> >    g775: 48s
> >    g776: [not run] write atomic not supported by this block device
> >    g778: did not complete after a 50-minute run
> >    x838: [not run] External volumes not in use, skipped this test
> >    x839: [not run] XFS error injection requires CONFIG_XFS_DEBUG
> >    x840: [not run] write atomic not supported by this block device
> 
> This is testing software-based atomic writes, and they are just slow. Very
> slow, relative to HW-based atomic writes. And having bs=1M will make things
> worse, as we are locking out other threads for longer (when doing the
> write).

I see, thanks for the explanation.

> So I think that we should limit the file size which we try to write.

This sounds reasonable, and it will make maintaining fstests runs easier.

> 
> > 
> > > > > 
> > > > > Some things to try:
> > > > > - use a physical disk for the TEST_DEV
> > 
> > I tried using a real HDD for TEST_DEV, but still observed the hang and INFO
> > messages at g774.
> > 
> > > > > - Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to
> > > > > reduce $threads to a low value, say, 2
> > 
> > I do not set LOAD_FACTOR. I changed g775 script to set threads=2, then the
> > test case completed quickly, within a few minutes. I'm suspecting that this
> > short test time might hide the hang/INFO problem.
> > 
> > > > > - try turning on the XFS_DEBUG config
> > 
> > I turned on XFS_DEBUG, and still observed the hang and the INFO messages.
> > 
> 
> I don't think that this will help.
> 
> > > > > 
> > > > > BTW, Darrick has posted some xfs atomics fixes @
> > > > > https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t
> > > > > I doubt that they will help this, but worth trying.
> > 
> > I have not yet tried this. Will try it tomorrow.
> 
> Nor this.

I confirmed it. I applied the patches to the v6.18-rc4 kernel. With this kernel,
the hang and the INFO messages were still recreated.

> 
> Even for the conditions set, the test should not produce a hang. I can
> check whether we can improve the software-based atomic writes in xfs to
> avoid this.

Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging
test node and share.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-07  2:27               ` Shinichiro Kawasaki
@ 2025-11-07  4:28                 ` Darrick J. Wong
  2025-11-07  5:53                   ` Shinichiro Kawasaki
  0 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-07  4:28 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: John Garry, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote:
> On Nov 06, 2025 / 08:53, John Garry wrote:
> > > > > > 
> > > > > > Shinichiro, do the other atomic writes tests run ok, like 775, 767? You
> > > > > > can check group "atomicwrites" to know which tests they are.
> > > > > > 
> > > > > > 774 is the fio test.
> > > 
> > > I tried the other "atomicwrites" tests. I found that g778 took a very long time,
> > > which implies that g778 may have a similar problem to g774.
> > > 
> > >    g765: [not run] write atomic not supported by this block device
> > >    g767: 11s
> > >    g768: 13s
> > >    g769: 13s
> > >    g770: 35s
> > >    g773: [not run] write atomic not supported by this block device
> > >    g774: did not complete after a 3-hour run (and the kernel reported the INFO messages)
> > >    g775: 48s
> > >    g776: [not run] write atomic not supported by this block device
> > >    g778: did not complete after a 50-minute run
> > >    x838: [not run] External volumes not in use, skipped this test
> > >    x839: [not run] XFS error injection requires CONFIG_XFS_DEBUG
> > >    x840: [not run] write atomic not supported by this block device
> > 
> > This is testing software-based atomic writes, and they are just slow. Very
> > slow, relative to HW-based atomic writes. And having bs=1M will make things
> > worse, as we are locking out other threads for longer (when doing the
> > write).
> 
> I see, thanks for the explanation.
> 
> > So I think that we should limit the file size which we try to write.
> 
> This sounds reasonable, and it will make maintaining fstests runs easier.
> 
> > 
> > > 
> > > > > > 
> > > > > > Some things to try:
> > > > > > - use a physical disk for the TEST_DEV
> > > 
> > > I tried using a real HDD for TEST_DEV, but still observed the hang and INFO
> > > messages at g774.
> > > 
> > > > > > - Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to
> > > > > > reduce $threads to a low value, say, 2
> > > 
> > > I do not set LOAD_FACTOR. I changed g775 script to set threads=2, then the
> > > test case completed quickly, within a few minutes. I'm suspecting that this
> > > short test time might hide the hang/INFO problem.
> > > 
> > > > > > - try turning on the XFS_DEBUG config
> > > 
> > > I turned on XFS_DEBUG, and still observed the hang and the INFO messages.
> > > 
> > 
> > I don't think that this will help.
> > 
> > > > > > 
> > > > > > BTW, Darrick has posted some xfs atomics fixes @
> > > > > > https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t
> > > > > > I doubt that they will help this, but worth trying.
> > > 
> > > I have not yet tried this. Will try it tomorrow.
> > 
> > Nor this.
> 
> I confirmed it. I applied the patches to the v6.18-rc4 kernel. With this kernel,
> the hang and the INFO messages were still recreated.
> 
> > 
> > Having a hang - even for the conditions set - should not produce a hang. I
> > can check on whether we can improve the software-based atomic writes in xfs
> > to avoid this.
> 
> Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging
> test node and share.

Yes, anything you can share would be helpful.  FWIW the test runs in 51
seconds here, but I only have 4 CPUs in the VM and fast storage so its
filesize is "only" 800MB.

--D

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-07  4:28                 ` Darrick J. Wong
@ 2025-11-07  5:53                   ` Shinichiro Kawasaki
  2025-11-07 12:48                     ` John Garry
  0 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2025-11-07  5:53 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: John Garry, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

[-- Attachment #1: Type: text/plain, Size: 1149 bytes --]

On Nov 06, 2025 / 20:28, Darrick J. Wong wrote:
> On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote:
> > On Nov 06, 2025 / 08:53, John Garry wrote:
...
> > > Even for the conditions set, the test should not produce a hang. I can
> > > check whether we can improve the software-based atomic writes in xfs to
> > > avoid this.
> > 
> > Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging
> > test node and share.
> 
> Yes, anything you can share would be helpful.

Okay, I attached dmesg log file (dmesg.gz), which contains the INFO messages and
the sysrq-t output. It was taken with v6.18-rc4 kernel with the fix patches by
Darrick. I also attached the kernel config (_config.gz) which I used to build
the test target kernel.

> FWIW the test runs in 51
> seconds here, but I only have 4 CPUs in the VM and fast storage so its
> filesize is "only" 800MB.

FYI, my test node has 24 CPUs. The hang is sporadic and I needed to repeat the
test case a few times to recreate it with the 8GiB TCMU devices. When it does
not hang, the test case takes about an hour to complete.

[-- Attachment #2: dmesg.gz --]
[-- Type: application/gzip, Size: 219128 bytes --]

[-- Attachment #3: _config.gz --]
[-- Type: application/gzip, Size: 42647 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-07  5:53                   ` Shinichiro Kawasaki
@ 2025-11-07 12:48                     ` John Garry
  2025-11-07 17:50                       ` Darrick J. Wong
  2025-11-10  2:41                       ` Shinichiro Kawasaki
  0 siblings, 2 replies; 23+ messages in thread
From: John Garry @ 2025-11-07 12:48 UTC (permalink / raw)
  To: Shinichiro Kawasaki, Darrick J. Wong
  Cc: linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

On 07/11/2025 05:53, Shinichiro Kawasaki wrote:
> On Nov 06, 2025 / 20:28, Darrick J. Wong wrote:
>> On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote:
>>> On Nov 06, 2025 / 08:53, John Garry wrote:
> ...
>>>> Even for the conditions set, the test should not produce a hang. I can
>>>> check whether we can improve the software-based atomic writes in xfs to
>>>> avoid this.
>>>
>>> Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging
>>> test node and share.
>>
>> Yes, anything you can share would be helpful.
> 
> Okay, I attached dmesg log file (dmesg.gz), which contains the INFO messages and
> the sysrq-t output. It was taken with v6.18-rc4 kernel with the fix patches by
> Darrick. I also attached the kernel config (_config.gz) which I used to build
> the test target kernel.
> 
>> FWIW the test runs in 51
>> seconds here, but I only have 4 CPUs in the VM and fast storage so its
>> filesize is "only" 800MB.
> 
> FYI, my test node has 24 CPUs. The hang is sporadic and I needed to repeat the
> test case a few times to recreate it with the 8GiB TCMU devices. When it does
> not hang, the test case takes about an hour to complete.

Hi Shinichiro,

You can still stop the test with Ctrl-C, right?

@Darrick, I worry that there is too much ip lock contention in 
xfs_atomic_write_cow_iomap_begin(), especially since we may drop and 
re-acquire the lock (in xfs_trans_alloc_inode()). Maybe we should force 
serialization in xfs_file_dio_write_atomic(). After all, this was not 
intended to provide good performance. Or look at other ways to optimise 
this (if we do want good performance).

Thanks,
John

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-07 12:48                     ` John Garry
@ 2025-11-07 17:50                       ` Darrick J. Wong
  2025-11-07 23:18                         ` Darrick J. Wong
  2025-11-10  2:41                       ` Shinichiro Kawasaki
  1 sibling, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-07 17:50 UTC (permalink / raw)
  To: John Garry
  Cc: Shinichiro Kawasaki, linux-xfs@vger.kernel.org,
	ojaswin@linux.ibm.com

On Fri, Nov 07, 2025 at 12:48:38PM +0000, John Garry wrote:
> On 07/11/2025 05:53, Shinichiro Kawasaki wrote:
> > On Nov 06, 2025 / 20:28, Darrick J. Wong wrote:
> > > On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote:
> > > > On Nov 06, 2025 / 08:53, John Garry wrote:
> > ...
> > > > > Even for the conditions set, the test should not produce a hang. I can
> > > > > check whether we can improve the software-based atomic writes in xfs to
> > > > > avoid this.
> > > > 
> > > > Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging
> > > > test node and share.
> > > 
> > > Yes, anything you can share would be helpful.
> > 
> > Okay, I attached dmesg log file (dmesg.gz), which contains the INFO messages and
> > the sysrq-t output. It was taken with v6.18-rc4 kernel with the fix patches by
> > Darrick. I also attached the kernel config (_config.gz) which I used to build
> > the test target kernel.
> > 
> > > FWIW the test runs in 51
> > > seconds here, but I only have 4 CPUs in the VM and fast storage so its
> > > filesize is "only" 800MB.
> > 
> > FYI, my test node has 24 CPUs. The hang is sporadic and I needed to repeat the
> > test case a few times to recreate it with the 8GiB TCMU devices. When it does
> > not hang, the test case takes about an hour to complete.
> 
> Hi Shinichiro,
> 
> You can still stop the test with Ctrl-C, right?
> 
> @Darrick, I worry that there is too much ip lock contention in
> xfs_atomic_write_cow_iomap_begin(), especially since we may drop and
> re-acquire the lock (in xfs_trans_alloc_inode()). Maybe we should force
> serialization in xfs_file_dio_write_atomic(). After all, this was not
> intended to provide good performance. Or look at other ways to optimise this
> (if we do want good performance).

I don't see how that helps.  All that does is shift the lock contention
from xfs_inode::i_lock to inode::i_rwsem.  At the end of the day, this
test is starting up 2*nr_cpus threads to issue large atomic directio
writes that take a long time to complete.  Stall warnings when there are
a large number of threads all trying to directio write to a file whose
blocks require a metadata update upon IO completion are a long known
problem.

I altered my test VM to have 24 cores and enough RAM to avoid OOMing the
machine.  Setting up the mixed mappings file took 27 seconds, and the
aio writes themselves took 3:15.  Validating the contents took 4
seconds.

Maaaybe we should back off on the file size.  I don't see why it needs
to create a 5GB file for testing.  The verify runs at 2100MB/s whereas
the atomic writes plod along at 25MB/s.  That's why this test takes a
loooong time to run.
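Those throughput figures roughly account for the runtime split. A
back-of-the-envelope sketch, using only the ~5GB file size and the 25MB/s
and 2100MB/s rates quoted above:

```shell
# Back-of-the-envelope runtime split for generic/774, using the rates
# quoted above: atomic writes at ~25MB/s, verify reads at ~2100MB/s.
filesize_mb=$((5 * 1024))   # ~5GB file
write_rate_mb=25
verify_rate_mb=2100

write_secs=$((filesize_mb / write_rate_mb))
verify_secs=$((filesize_mb / verify_rate_mb))

echo "write:  ~$((write_secs / 60))m$((write_secs % 60))s"  # ~3m24s, near the observed 3:15
echo "verify: ~${verify_secs}s"                             # a few seconds, as observed
```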

(I don't see the lfsr complaints, but I'm running fio 3.41 from git)

--D

> Thanks,
> John
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-07 17:50                       ` Darrick J. Wong
@ 2025-11-07 23:18                         ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-07 23:18 UTC (permalink / raw)
  To: John Garry
  Cc: Shinichiro Kawasaki, linux-xfs@vger.kernel.org,
	ojaswin@linux.ibm.com

On Fri, Nov 07, 2025 at 09:50:04AM -0800, Darrick J. Wong wrote:
> On Fri, Nov 07, 2025 at 12:48:38PM +0000, John Garry wrote:
> > On 07/11/2025 05:53, Shinichiro Kawasaki wrote:
> > > On Nov 06, 2025 / 20:28, Darrick J. Wong wrote:
> > > > On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote:
> > > > > On Nov 06, 2025 / 08:53, John Garry wrote:
> > > ...
> > > > > > Even for the conditions set, the test should not produce a hang. I can
> > > > > > check whether we can improve the software-based atomic writes in xfs to
> > > > > > avoid this.
> > > > > 
> > > > > Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging
> > > > > test node and share.
> > > > 
> > > > Yes, anything you can share would be helpful.
> > > 
> > > Okay, I attached dmesg log file (dmesg.gz), which contains the INFO messages and
> > > the sysrq-t output. It was taken with v6.18-rc4 kernel with the fix patches by
> > > Darrick. I also attached the kernel config (_config.gz) which I used to build
> > > the test target kernel.
> > > 
> > > > FWIW the test runs in 51
> > > > seconds here, but I only have 4 CPUs in the VM and fast storage so its
> > > > filesize is "only" 800MB.
> > > 
> > > FYI, my test node has 24 CPUs. The hang is sporadic and I needed to repeat the
> > > test case a few times to recreate it with the 8GiB TCMU devices. When it does
> > > not hang, the test case takes about an hour to complete.
> > 
> > Hi Shinichiro,
> > 
> > You can still stop the test with Ctrl-C, right?
> > 
> > @Darrick, I worry that there is too much ip lock contention in
> > xfs_atomic_write_cow_iomap_begin(), especially since we may drop and
> > re-acquire the lock (in xfs_trans_alloc_inode()). Maybe we should force
> > serialization in xfs_file_dio_write_atomic(). After all, this was not
> > intended to provide good performance. Or look at other ways to optimise this
> > (if we do want good performance).
> 
> I don't see how that helps.  All that does is shift the lock contention
> from xfs_inode::i_lock to inode::i_rwsem.  At the end of the day, this
> test is starting up 2*nr_cpus threads to issue large atomic directio
> writes that take a long time to complete.  Stall warnings when there are
> a large number of threads all trying to directio write to a file whose
> blocks require a metadata update upon IO completion are a long known
> problem.
> 
> I altered my test VM to have 24 cores and enough RAM to avoid OOMing the
> machine.  Setting up the mixed mappings file took 27 seconds, and the
> aio writes themselves took 3:15.  Validating the contents took 4
> seconds.
> 
> Maaaybe we should back off on the file size.  I don't see why it needs
> to create a 5GB file for testing.  The verify runs at 2100MB/s whereas
> the atomic writes plod along at 25MB/s.  That's why this test takes a
> loooong time to run.
> 
> (I don't see the lfsr complaints, but I'm running fio 3.41 from git)

Spoke too soon, now I'm seeing it all over the test fleet.

--D

> --D
> 
> > Thanks,
> > John
> > 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-05 10:39       ` John Garry
  2025-11-05 11:29         ` John Garry
  2025-11-05 12:37         ` Shinichiro Kawasaki
@ 2025-11-09 11:58         ` Ojaswin Mujoo
  2025-11-10  8:58           ` John Garry
  2025-11-10 12:39           ` Shinichiro Kawasaki
  2 siblings, 2 replies; 23+ messages in thread
From: Ojaswin Mujoo @ 2025-11-09 11:58 UTC (permalink / raw)
  To: John Garry
  Cc: Shinichiro Kawasaki, Darrick J. Wong, linux-xfs@vger.kernel.org

On Wed, Nov 05, 2025 at 10:39:43AM +0000, John Garry wrote:
> On 05/11/2025 08:52, John Garry wrote:
> > > I don't think the disk supports atomic writes. It is just a regular
> > > TCMU device,
> > > and its atomic write related sysfs attributes have value 0:
> > > 
> > >    $ grep -rne . /sys/block/sdh/queue/ | grep atomic
> > >    /sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0
> > >    /sys/block/sdh/queue/atomic_write_boundary_bytes:1:0
> > >    /sys/block/sdh/queue/atomic_write_max_bytes:1:0
> > >    /sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0
> > > 
> > > FYI, I attach the all sysfs queue attribute values of the device [2].
> > 
> > Yes, this would only be using software-based atomic writes.
> > 
> > Shinichiro, do the other atomic writes tests run ok, like 775, 767? You
> > can check group "atomicwrites" to know which tests they are.
> > 
> > 774 is the fio test.
> > 
> > Some things to try:
> > - use a physical disk for the TEST_DEV
> > - Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to
> > reduce $threads to a low value, say, 2
> > - try turning on the XFS_DEBUG config
> > 
> > BTW, Darrick has posted some xfs atomics fixes @
> > https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t
> > I doubt that they will help this, but worth trying.
> > 
> > I will try to recreate.
> 
> I tested this and the filesize which we try to write is huge, like 3.3G in
> my case. That seems excessive.
> 
> The calc comes from the following in 774:
> 
> filesize=$((aw_bsize * threads * 100))
> 
> aw_bsize for me is 1M, and threads is 32
> 
> aw_bsize is large as XFS supports software-based atomics, which is generally
> going to be huge compared to anything which HW can support.
> 
> When I tried to run this test, it was not completing in a sane amount of
> time - it was taking many minutes before I gave up.

Hi John, Shinichiro, Darrick.

Thanks for looking into this. Sorry, I'm on vacation so a bit slow in
responding.

Anyway, the logic behind the filesize calculation is that we want each
thread to do 100 atomic writes in its own isolated range of the file.
But it seems to be especially slow when we have many CPUs.

In that sense, I think it is better to limit the thread count itself
rather than the filesize. Since it is a stress test, we don't want it to
be too small. Maybe:

diff --git a/tests/generic/774 b/tests/generic/774
index 7a4d7016..c68fb4b7 100755
--- a/tests/generic/774
+++ b/tests/generic/774
@@ -28,7 +28,7 @@ awu_max_write=$(_get_atomic_write_unit_max "$SCRATCH_MNT/f1")
 aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
 fsbsize=$(_get_block_size $SCRATCH_MNT)

-threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
+threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "16")
 filesize=$((aw_bsize * threads * 100))
 depth=$threads
 aw_io_size=$((filesize / threads))

Can you check if this helps? 
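For comparison, on a 24-CPU node like Shinichiro's, the two caps work out as
follows. This is a sketch: _min is an fstests helper reimplemented here for
illustration, and aw_bsize=1M and LOAD_FACTOR=1 are assumed values.

```shell
# Compare the original thread cap (100) with the proposed cap (16) on a
# 24-CPU machine, assuming aw_bsize=1M and LOAD_FACTOR=1. _min is an
# fstests helper, reimplemented here for illustration only.
_min() { if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi; }
aw_mb=1; nproc=24; LOAD_FACTOR=1

old_threads=$(_min "$((nproc * 2 * LOAD_FACTOR))" "100")
new_threads=$(_min "$((nproc * 2 * LOAD_FACTOR))" "16")

echo "old cap: ${old_threads} threads, $((aw_mb * old_threads * 100)) MiB file"
echo "new cap: ${new_threads} threads, $((aw_mb * new_threads * 100)) MiB file"
```

So the proposed cap cuts the file from 4800 MiB to 1600 MiB on that machine
while keeping 100 writes per thread.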

Regards,
ojaswin

> 
> @shinichiro, please try this:
> 
> --- a/tests/generic/774
> +++ b/tests/generic/774
> @@ -29,7 +29,7 @@ aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
>  fsbsize=$(_get_block_size $SCRATCH_MNT)
> 
>  threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
> -filesize=$((aw_bsize * threads * 100))
> +filesize=$((aw_bsize * threads))
>  depth=$threads
>  aw_io_size=$((filesize / threads))
>  aw_io_inc=$aw_io_size
> 
> 
> Note, I ran with this change and the test now completes, but I get this:
> 
> +fio: failed initializing LFSR
>     +fio: failed initializing LFSR
>     +fio: failed initializing LFSR
>     +fio: failed initializing LFSR
>     +verify: bad magic header 0, wanted acca at file
> /home/ubuntu/mnt/scratch/test-file offset 0, length 1048576 (requested
> block: offset=0, length=1048576)
>     +verify: bad magic header e3d6, wanted acca at file
> /home/ubuntu/mnt/scratch/test-file offset 8388608, length 1048576 (requested
> block: offset=8388608, length=1048576)
> 
> I need to check that fio complaint.
> 
> Thanks,
> John

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-06  8:19           ` Shinichiro Kawasaki
  2025-11-06  8:53             ` John Garry
@ 2025-11-09 12:02             ` Ojaswin Mujoo
  2025-11-10 12:46               ` Shinichiro Kawasaki
  1 sibling, 1 reply; 23+ messages in thread
From: Ojaswin Mujoo @ 2025-11-09 12:02 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: John Garry, Darrick J. Wong, linux-xfs@vger.kernel.org

On Thu, Nov 06, 2025 at 08:19:12AM +0000, Shinichiro Kawasaki wrote:
> On Nov 05, 2025 / 21:37, Shin'ichiro Kawasaki wrote:
> > On Nov 05, 2025 / 10:39, John Garry wrote:
> > > On 05/11/2025 08:52, John Garry wrote:
> > > > > I don't think the disk supports atomic writes. It is just a regular
> > > > > TCMU device,
> > > > > and its atomic write related sysfs attributes have value 0:
> > > > > 
> > > > >    $ grep -rne . /sys/block/sdh/queue/ | grep atomic
> > > > >    /sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0
> > > > >    /sys/block/sdh/queue/atomic_write_boundary_bytes:1:0
> > > > >    /sys/block/sdh/queue/atomic_write_max_bytes:1:0
> > > > >    /sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0
> > > > > 
> > > > > FYI, I attach the all sysfs queue attribute values of the device [2].
> > > > 
> > > > Yes, this would only be using software-based atomic writes.
> > > > 
> > > > Shinichiro, do the other atomic writes tests run ok, like 775, 767? You
> > > > can check group "atomicwrites" to know which tests they are.
> > > > 
> > > > 774 is the fio test.
> 
> I tried the other "atomicwrites" tests. I found that g778 took a very long time,
> which implies that g778 may have a similar problem to g774.
> 
>   g765: [not run] write atomic not supported by this block device
>   g767: 11s
>   g768: 13s
>   g769: 13s
>   g770: 35s
>   g773: [not run] write atomic not supported by this block device
>   g774: did not complete after a 3 hour run (and the kernel reported the INFO messages)
>   g775: 48s
>   g776: [not run] write atomic not supported by this block device
>   g778: did not complete after a 50 minute run

Hi Shinichiro,

Hmm, that's strange; ideally g/778 should tune itself to the speed of the
device. Would you be able to share the results/generic/778.full file?
That might give some hints.

>   x838: [not run] External volumes not in use, skipped this test
>   x839: [not run] XFS error injection requires CONFIG_XFS_DEBUG
>   x840: [not run] write atomic not supported by this block device
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-07 12:48                     ` John Garry
  2025-11-07 17:50                       ` Darrick J. Wong
@ 2025-11-10  2:41                       ` Shinichiro Kawasaki
  1 sibling, 0 replies; 23+ messages in thread
From: Shinichiro Kawasaki @ 2025-11-10  2:41 UTC (permalink / raw)
  To: John Garry
  Cc: Darrick J. Wong, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

On Nov 07, 2025 / 12:48, John Garry wrote:
> On 07/11/2025 05:53, Shinichiro Kawasaki wrote:
> > On Nov 06, 2025 / 20:28, Darrick J. Wong wrote:
> > > On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote:
> > > > On Nov 06, 2025 / 08:53, John Garry wrote:
> > ...
> > > > > A hang - even for the conditions set - should not happen. I can check
> > > > > on whether we can improve the software-based atomic writes in xfs to
> > > > > avoid this.
> > > > 
> > > > Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging
> > > > test node and share.
> > > 
> > > Yes, anything you can share would be helpful.
> > 
> > Okay, I attached the dmesg log file (dmesg.gz), which contains the INFO
> > messages and the sysrq-t output. It was taken with a v6.18-rc4 kernel with
> > the fix patches by Darrick. I also attached the kernel config (_config.gz)
> > which I used to build the test target kernel.
> > 
> > > FWIW the test runs in 51
> > > seconds here, but I only have 4 CPUs in the VM and fast storage so its
> > > filesize is "only" 800MB.
> > 
> > FYI, my test node has 24 CPUs. The hang is sporadic and I needed to repeat the
> > test case a few times to recreate it with the 8GiB TCMU devices. When it does
> > not hang, the test case takes about an hour to complete.
> 
> Hi Shinichiro,
> 
> You can still stop the test with Ctrl-C, right?

No, I can't. Even when I type Ctrl-C after the hang, the fstests check process
does not stop. I can still log in to the system and open new terminals. To
clean up the system, I just do sysrq-b to reboot it.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-09 11:58         ` Ojaswin Mujoo
@ 2025-11-10  8:58           ` John Garry
  2025-11-10 12:39           ` Shinichiro Kawasaki
  1 sibling, 0 replies; 23+ messages in thread
From: John Garry @ 2025-11-10  8:58 UTC (permalink / raw)
  To: Ojaswin Mujoo
  Cc: Shinichiro Kawasaki, Darrick J. Wong, linux-xfs@vger.kernel.org

On 09/11/2025 11:58, Ojaswin Mujoo wrote:
>> aw_bsize for me is 1M, and threads is 32.
>>
>> aw_bsize is large because XFS supports software-based atomics, whose limit
>> is generally huge compared to anything the HW can support.
>>
>> When I tried to run this test, it was not completing in a sane amount of
>> time - it was taking many minutes before I gave up.
> Hi John, Shinichiro, Darrick.
> 
> Thanks for looking into this. Sorry, I'm on vacation so a bit slow in
> responding.
> 
> Anyway, the logic behind the filesize calculation is that we want each
> thread to do 100 atomic writes in its own isolated range of the file.
> But it seems to be especially slow when we have many CPUs.

It's not just the number of CPUs that is the problem. The test does 
awu max size writes - for XFS, this size can be many MBs, unlike the 
typically < 100 KB for any FS which relies only on HW-based atomic 
writes, e.g. ext4. Please also consider limiting the awu max size.
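For reference, a quick way to see what the block layer itself advertises (a
sketch; the device name "sdh" is just the one from this thread, and all four
attributes reading 0 - as on the TCMU fileio device here - means no HW atomic
write support, so XFS takes the software-based path):

```shell
#!/bin/sh
# Print the block-layer atomic write limits for a device.
# All four attributes reading 0 means the hardware offers no atomic
# write support, so an atomic write of awu_max size on XFS has to go
# through the software-based path.
atomic_limits() {
    dev=$1
    for attr in atomic_write_unit_min_bytes atomic_write_unit_max_bytes \
                atomic_write_max_bytes atomic_write_boundary_bytes; do
        f=/sys/block/$dev/queue/$attr
        if [ -r "$f" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$f")"
        fi
    done
    return 0    # missing device/attrs are not an error for this probe
}
atomic_limits sdh
```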

> 
> I think in that sense, it'll be better to limit the thread count itself
> rather than the filesize. Since it's a stress test we don't want it to be
> too low. Maybe:
> 
> diff --git a/tests/generic/774 b/tests/generic/774
> index 7a4d7016..c68fb4b7 100755
> --- a/tests/generic/774
> +++ b/tests/generic/774
> @@ -28,7 +28,7 @@ awu_max_write=$(_get_atomic_write_unit_max "$SCRATCH_MNT/f1")
>   aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
>   fsbsize=$(_get_block_size $SCRATCH_MNT)
> 
> -threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
> +threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "16")
>   filesize=$((aw_bsize * threads * 100))
>   depth=$threads
>   aw_io_size=$((filesize / threads))
> 
> Can you check if this helps?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [bug report] fstests generic/774 hang
  2025-11-09 11:58         ` Ojaswin Mujoo
  2025-11-10  8:58           ` John Garry
@ 2025-11-10 12:39           ` Shinichiro Kawasaki
  1 sibling, 0 replies; 23+ messages in thread
From: Shinichiro Kawasaki @ 2025-11-10 12:39 UTC (permalink / raw)
  To: Ojaswin Mujoo; +Cc: John Garry, Darrick J. Wong, linux-xfs@vger.kernel.org

On Nov 09, 2025 / 17:28, Ojaswin Mujoo wrote:
[...]
> Anyway, the logic behind the filesize calculation is that we want each
> thread to do 100 atomic writes in its own isolated range of the file.
> But it seems to be especially slow when we have many CPUs.
> 
> I think in that sense, it'll be better to limit the thread count itself
> rather than the filesize. Since it's a stress test we don't want it to be
> too low. Maybe:
> 
> diff --git a/tests/generic/774 b/tests/generic/774
> index 7a4d7016..c68fb4b7 100755
> --- a/tests/generic/774
> +++ b/tests/generic/774
> @@ -28,7 +28,7 @@ awu_max_write=$(_get_atomic_write_unit_max "$SCRATCH_MNT/f1")
>  aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
>  fsbsize=$(_get_block_size $SCRATCH_MNT)
> 
> -threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
> +threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "16")
>  filesize=$((aw_bsize * threads * 100))
>  depth=$threads
>  aw_io_size=$((filesize / threads))
> 
> Can you check if this helps? 

As John pointed out, the 1MB atomic block size sounds too large, so it might
need attention as well. Anyway, I applied the change above and observed that
the test case runtime was shortened from ~50m to ~8m, so this change improves
the unexpectedly long runtime. When I repeated the test case g774 20 times,
the hang was not observed.
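To put rough numbers on the cap (a sketch only; it assumes 24 CPUs,
LOAD_FACTOR=1 and the 1 MiB aw_bsize discussed in this thread - real values
depend on the device and filesystem):

```shell
#!/bin/sh
# Sketch of the generic/774 filesize arithmetic before and after the
# proposed thread cap. Inputs are assumptions from this thread, not
# measured values.
nproc=24
LOAD_FACTOR=1
aw_bsize=$((1024 * 1024))   # 1 MiB

min() { if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi; }

threads_old=$(min $(( nproc * 2 * LOAD_FACTOR )) 100)   # old cap of 100
threads_new=$(min $(( nproc * 2 * LOAD_FACTOR )) 16)    # proposed cap of 16

filesize_old=$(( aw_bsize * threads_old * 100 ))
filesize_new=$(( aw_bsize * threads_new * 100 ))

echo "old: threads=$threads_old filesize=$(( filesize_old >> 20 )) MiB"
echo "new: threads=$threads_new filesize=$(( filesize_new >> 20 )) MiB"
```

On these assumed inputs the cap shrinks the file from 4800 MiB to 1600 MiB,
consistent with the shorter runtime seen above.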

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [WARNING: UNSCANNABLE EXTRACTION FAILED]Re: [bug report] fstests generic/774 hang
  2025-11-09 12:02             ` Ojaswin Mujoo
@ 2025-11-10 12:46               ` Shinichiro Kawasaki
  2025-11-10 21:12                 ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2025-11-10 12:46 UTC (permalink / raw)
  To: Ojaswin Mujoo; +Cc: John Garry, Darrick J. Wong, linux-xfs@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1225 bytes --]

On Nov 09, 2025 / 17:32, Ojaswin Mujoo wrote:
> On Thu, Nov 06, 2025 at 08:19:12AM +0000, Shinichiro Kawasaki wrote:
[...]
> > I tried the other "atomicwrites" tests. I found that g778 took a very long
> > time. I think this implies that g778 may have a similar problem to g774.
> > 
> >   g765: [not run] write atomic not supported by this block device
> >   g767: 11s
> >   g768: 13s
> >   g769: 13s
> >   g770: 35s
> >   g773: [not run] write atomic not supported by this block device
> >   g774: did not complete after a 3 hour run (and the kernel reported the INFO messages)
> >   g775: 48s
> >   g776: [not run] write atomic not supported by this block device
> >   g778: did not complete after a 50 minute run
> 
> Hi Shinichiro
> 
> Hmm, that's strange; ideally g/778 should tune itself to the speed of the
> device. Would you be able to share the results/generic/778.full file?
> That might give some hints.

Please find the attached 778.full.gz, which I copied about 50 minutes after
the test case started. The test case was still running at that time. Near the
end of the full file, I find "Iteration 13". It looks like the test case is
not hanging, but just taking a long time to complete the 20 iterations.

[-- Attachment #2: 778.full.gz --]
[-- Type: application/gzip, Size: 1985 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [WARNING: UNSCANNABLE EXTRACTION FAILED]Re: [bug report] fstests generic/774 hang
  2025-11-10 12:46               ` [WARNING: UNSCANNABLE EXTRACTION FAILED]Re: " Shinichiro Kawasaki
@ 2025-11-10 21:12                 ` Darrick J. Wong
  2025-11-11 11:43                   ` Shinichiro Kawasaki
  0 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-10 21:12 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: Ojaswin Mujoo, John Garry, linux-xfs@vger.kernel.org

On Mon, Nov 10, 2025 at 12:46:19PM +0000, Shinichiro Kawasaki wrote:
> On Nov 09, 2025 / 17:32, Ojaswin Mujoo wrote:
> > On Thu, Nov 06, 2025 at 08:19:12AM +0000, Shinichiro Kawasaki wrote:
> [...]
> > > I tried the other "atomicwrites" tests. I found that g778 took a very long
> > > time. I think this implies that g778 may have a similar problem to g774.
> > > 
> > >   g765: [not run] write atomic not supported by this block device
> > >   g767: 11s
> > >   g768: 13s
> > >   g769: 13s
> > >   g770: 35s
> > >   g773: [not run] write atomic not supported by this block device
> > >   g774: did not complete after a 3 hour run (and the kernel reported the INFO messages)
> > >   g775: 48s
> > >   g776: [not run] write atomic not supported by this block device
> > >   g778: did not complete after a 50 minute run
> > 
> > Hi Shinichiro
> > 
> > Hmm, that's strange; ideally g/778 should tune itself to the speed of the
> > device. Would you be able to share the results/generic/778.full file?
> > That might give some hints.
> 
> > Please find the attached 778.full.gz, which I copied about 50 minutes after
> > the test case started. The test case was still running at that time. Near the
> > end of the full file, I find "Iteration 13". It looks like the test case is
> > not hanging, but just taking a long time to complete the 20 iterations.

<nod> 778 invokes xfs_io and fallocate a few tens of thousands of times,
which makes the test runtime really slow if fork/exec() aren't fast.  I
try to fix that here:

https://lore.kernel.org/fstests/176279908967.605950.2192923313361120314.stgit@frogsfrogsfrogs/T/#t

As well as reducing the test file size for 774, per everyone's comments.
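The per-invocation cost is easy to demonstrate in isolation (a hypothetical
sketch; /bin/true stands in for xfs_io, and the timing is entirely
machine-dependent):

```shell
#!/bin/sh
# Demonstrate fork/exec overhead: spawning a separate process per
# operation, as a test calling xfs_io tens of thousands of times does,
# pays this cost on every single call. /bin/true is a stand-in that
# does no real work, so whatever time elapses is pure process-spawn
# overhead.
N=1000
start=$(date +%s%N)
i=0
while [ "$i" -lt "$N" ]; do
    /bin/true            # one fork+exec per loop iteration
    i=$((i + 1))
done
end=$(date +%s%N)
echo "spawned $N processes in $(( (end - start) / 1000000 )) ms"
```

Batching many operations into a single invocation (e.g. one xfs_io run with
multiple -c commands) pays that cost once instead of N times.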

--D

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [WARNING: UNSCANNABLE EXTRACTION FAILED]Re: [bug report] fstests generic/774 hang
  2025-11-10 21:12                 ` Darrick J. Wong
@ 2025-11-11 11:43                   ` Shinichiro Kawasaki
  0 siblings, 0 replies; 23+ messages in thread
From: Shinichiro Kawasaki @ 2025-11-11 11:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Ojaswin Mujoo, John Garry, linux-xfs@vger.kernel.org

On Nov 10, 2025 / 13:12, Darrick J. Wong wrote:
> On Mon, Nov 10, 2025 at 12:46:19PM +0000, Shinichiro Kawasaki wrote:
> > On Nov 09, 2025 / 17:32, Ojaswin Mujoo wrote:
> > > On Thu, Nov 06, 2025 at 08:19:12AM +0000, Shinichiro Kawasaki wrote:
> > [...]
> > > > I tried the other "atomicwrites" tests. I found that g778 took a very long
> > > > time. I think this implies that g778 may have a similar problem to g774.
> > > > 
> > > >   g765: [not run] write atomic not supported by this block device
> > > >   g767: 11s
> > > >   g768: 13s
> > > >   g769: 13s
> > > >   g770: 35s
> > > >   g773: [not run] write atomic not supported by this block device
> > > >   g774: did not complete after a 3 hour run (and the kernel reported the INFO messages)
> > > >   g775: 48s
> > > >   g776: [not run] write atomic not supported by this block device
> > > >   g778: did not complete after a 50 minute run
> > > 
> > > Hi Shinichiro
> > > 
> > > Hmm, that's strange; ideally g/778 should tune itself to the speed of the
> > > device. Would you be able to share the results/generic/778.full file?
> > > That might give some hints.
> > 
> > Please find the attached 778.full.gz, which I copied about 50 minutes after
> > the test case started. The test case was still running at that time. Near the
> > end of the full file, I find "Iteration 13". It looks like the test case is
> > not hanging, but just taking a long time to complete the 20 iterations.
> 
> <nod> 778 invokes xfs_io and fallocate a few tens of thousands of times,
> which makes the test runtime really slow if fork/exec() aren't fast.  I
> try to fix that here:
> 
> https://lore.kernel.org/fstests/176279908967.605950.2192923313361120314.stgit@frogsfrogsfrogs/T/#t
> 
> As well as reducing the test file size for 774, per everyone's comments.

Thanks! With the series, g774 completes within four minutes, and g778
completes within a minute in my environment.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2025-11-11 11:43 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-30  8:45 [bug report] fstests generic/774 hang Shinichiro Kawasaki
2025-11-05  0:33 ` Darrick J. Wong
2025-11-05  2:19   ` Shinichiro Kawasaki
2025-11-05  8:52     ` John Garry
2025-11-05 10:39       ` John Garry
2025-11-05 11:29         ` John Garry
2025-11-05 12:37         ` Shinichiro Kawasaki
2025-11-06  8:19           ` Shinichiro Kawasaki
2025-11-06  8:53             ` John Garry
2025-11-07  2:27               ` Shinichiro Kawasaki
2025-11-07  4:28                 ` Darrick J. Wong
2025-11-07  5:53                   ` Shinichiro Kawasaki
2025-11-07 12:48                     ` John Garry
2025-11-07 17:50                       ` Darrick J. Wong
2025-11-07 23:18                         ` Darrick J. Wong
2025-11-10  2:41                       ` Shinichiro Kawasaki
2025-11-09 12:02             ` Ojaswin Mujoo
2025-11-10 12:46               ` [WARNING: UNSCANNABLE EXTRACTION FAILED]Re: " Shinichiro Kawasaki
2025-11-10 21:12                 ` Darrick J. Wong
2025-11-11 11:43                   ` Shinichiro Kawasaki
2025-11-09 11:58         ` Ojaswin Mujoo
2025-11-10  8:58           ` John Garry
2025-11-10 12:39           ` Shinichiro Kawasaki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox