[bug report] fstests generic/774 hang

From: Shinichiro Kawasaki
Date: 2025-10-30 8:45 UTC
To: linux-xfs@vger.kernel.org

I observe that the fstests test case generic/774 hangs when I run it for xfs on an 8 GiB TCMU fileio device. The hang was observed with the v6.17 and v6.18-rcX kernel versions. FYI, I attach below the kernel message log taken with the v6.18-rc3 kernel [1].

In my environment, the hang is reproduced reliably by repeating the test case a few times. A fix would be appreciated; if I can help in any way, please let me know.

[1]

Oct 30 15:11:25 redsun117q unknown: run fstests generic/774 at 2025-10-30 15:11:25
Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
Oct 30 15:11:27 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
Oct 30 15:11:28 redsun117q kernel: XFS (sdh): Mounting V5 Filesystem f93350d1-9b73-448c-bca2-b5b69343922f
Oct 30 15:11:28 redsun117q kernel: XFS (sdh): Ending clean mount
Oct 30 15:11:28 redsun117q kernel: XFS (sdh): Unmounting Filesystem f93350d1-9b73-448c-bca2-b5b69343922f
Oct 30 15:11:29 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
Oct 30 15:11:29 redsun117q kernel: XFS (sdh): Mounting V5 Filesystem 55534b79-27e6-4ded-82e3-5c249c68cb4a
Oct 30 15:11:29 redsun117q kernel: XFS (sdh): Ending clean mount
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/0:0:9 blocked for more than 122 seconds.
Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/0:0 state:D stack:0 pid:9 tgid:9 ppid:2 task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel: <TASK>
Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? rwsem_optimistic_spin+0x1d1/0x430
Oct 30 15:33:37 redsun117q kernel: ? do_raw_spin_lock+0x128/0x270
Oct 30 15:33:37 redsun117q kernel: ? __pfx_do_raw_spin_lock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? dequeue_entity+0x33e/0x1df0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __lock_release.isra.0+0x59/0x170
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel: </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/0:0:9 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:0:45 blocked for more than 122 seconds.
Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/1:0 state:D stack:0 pid:45 tgid:45 ppid:2 task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel: <TASK>
Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? rwsem_optimistic_spin+0x1d1/0x430
Oct 30 15:33:37 redsun117q kernel: ? do_raw_spin_lock+0x128/0x270
Oct 30 15:33:37 redsun117q kernel: ? __pfx_do_raw_spin_lock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? dequeue_entity+0x33e/0x1df0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel: ? dequeue_entities+0x24b/0x1530
Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? schedule+0x1cc/0x250
Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __lock_release.isra.0+0x59/0x170
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel: </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:0:45 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:0:105 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/13:0 state:D stack:0 pid:105 tgid:105 ppid:2 task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel: <TASK>
Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __lock_release.isra.0+0x59/0x170
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel: </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:0:105 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:1:189 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/1:1 state:D stack:0 pid:189 tgid:189 ppid:2 task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel: <TASK>
Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? preempt_schedule_notrace+0x53/0x90
Oct 30 15:33:37 redsun117q kernel: ? schedule+0xfe/0x250
Oct 30 15:33:37 redsun117q kernel: ? rcu_is_watching+0x67/0x80
Oct 30 15:33:37 redsun117q kernel: ? __pfx_do_raw_spin_lock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? dequeue_entity+0x482/0x1df0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __try_to_del_timer_sync+0xd7/0x130
Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel: </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:1:189 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:1:204 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/13:1 state:D stack:0 pid:204 tgid:204 ppid:2 task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel: <TASK>
Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? rwsem_optimistic_spin+0x1d1/0x430
Oct 30 15:33:37 redsun117q kernel: ? do_raw_spin_lock+0x128/0x270
Oct 30 15:33:37 redsun117q kernel: ? __pfx_do_raw_spin_lock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? dequeue_entity+0x33e/0x1df0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel: ? dequeue_entities+0x24b/0x1530
Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __kthread_parkme+0xb3/0x1f0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel: </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:1:204 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:1:261 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/2:1 state:D stack:0 pid:261 tgid:261 ppid:2 task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel: <TASK>
Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel: ? __kasan_slab_alloc+0x7e/0x90
Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? dequeue_entity+0x482/0x1df0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __kthread_parkme+0xb3/0x1f0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel: </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:1:261 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/12:4:352 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/12:4 state:D stack:0 pid:352 tgid:352 ppid:2 task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel: <TASK>
Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? kick_pool+0x1a5/0x860
Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel: </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/12:4:352 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/3:2:545 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/3:2 state:D stack:0 pid:545 tgid:545 ppid:2 task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel: <TASK>
Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? rwsem_optimistic_spin+0x1d1/0x430
Oct 30 15:33:37 redsun117q kernel: ? do_raw_spin_lock+0x128/0x270
Oct 30 15:33:37 redsun117q kernel: ? __pfx_do_raw_spin_lock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel: </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/3:2:545 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:2:549 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/2:2 state:D stack:0 pid:549 tgid:549 ppid:2 task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel: <TASK>
Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? rwsem_optimistic_spin+0x1d1/0x430
Oct 30 15:33:37 redsun117q kernel: ? do_raw_spin_lock+0x128/0x270
Oct 30 15:33:37 redsun117q kernel: ? __pfx_do_raw_spin_lock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? dequeue_entity+0x33e/0x1df0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
Oct 30 15:33:37 redsun117q kernel: </TASK>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:2:549 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/6:2:557 blocked for more than 123 seconds.
Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 15:33:37 redsun117q kernel: task:kworker/6:2 state:D stack:0 pid:557 tgid:557 ppid:2 task_flags:0x4248060 flags:0x00080000
Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
Oct 30 15:33:37 redsun117q kernel: Call Trace:
Oct 30 15:33:37 redsun117q kernel: <TASK>
Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
Oct 30 15:33:37 redsun117q kernel: ? dequeue_entity+0x33e/0x1df0
Oct 30 15:33:37 redsun117q kernel: ?
__pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs] Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200 Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20 Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30 Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0 Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs] Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs] Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0 Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150 Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10 Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90 Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0 Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10 Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390 Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0 Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10 Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760 Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140 Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0 Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30 Oct 30 15:33:37 redsun117q kernel: </TASK> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/6:2:557 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer> Oct 30 15:33:37 redsun117q kernel: Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings Oct 30 15:33:37 redsun117q kernel: INFO: lockdep is turned off. 
* Re: [bug report] fstests generic/774 hang
  2025-10-30  8:45 [bug report] fstests generic/774 hang Shinichiro Kawasaki
@ 2025-11-05  0:33 ` Darrick J. Wong
  2025-11-05  2:19   ` Shinichiro Kawasaki
  0 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-05 0:33 UTC (permalink / raw)
To: Shinichiro Kawasaki; +Cc: linux-xfs@vger.kernel.org, John Garry, ojaswin

[add jogarry/ojaswin since this is a new atomic writes test]

On Thu, Oct 30, 2025 at 08:45:05AM +0000, Shinichiro Kawasaki wrote:
> I observe the fstests test case generic/774 hangs, when I run it for xfs on 8GiB
> TCMU fileio devices. It was observed with v6.17 and v6.18-rcX kernel versions.
> FYI, here I attach the kernel message log that was taken with v6.18-rc3 kernel
> [1]. The hang is recreated in stable manner by repeating the test case a few
> times in my environment.
>
> Actions for fix will be appreciated. If I can do any help, please let me know.

I wonder, does your disk support atomic writes or are we just using the
software fallback in xfs?

>
> [1]
>
> Oct 30 15:11:25 redsun117q unknown: run fstests generic/774 at 2025-10-30 15:11:25
> Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> Oct 30 15:11:27 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05

My guess is the disk doesn't support atomic writes?
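For reference, one quick way to answer that from userspace is to read the block
layer's advertised atomic write limits out of sysfs. This is only a sketch
(the `atomic_write_*` queue attributes exist on recent kernels; a value of 0,
or a missing file, means no hardware atomic writes, so XFS would be exercising
the software COW fallback):

```shell
# Sketch: report a block device's advertised atomic write limits.
# The sysfs root is overridable so the helper can be exercised anywhere.
check_atomic_writes() {
    dev="$1"                        # e.g. sdh
    sysroot="${2:-/sys/block}"      # default: real sysfs
    q="$sysroot/$dev/queue"
    for attr in atomic_write_max_bytes \
                atomic_write_unit_min_bytes \
                atomic_write_unit_max_bytes; do
        if [ -r "$q/$attr" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$q/$attr")"
        else
            printf '%s: not present\n' "$attr"
        fi
    done
}
```

e.g. `check_atomic_writes sdh` on the test node would show whether the TCMU
fileio device reports any hardware atomic write capability at all.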
--D

> Oct 30 15:11:28 redsun117q kernel: XFS (sdh): Mounting V5 Filesystem f93350d1-9b73-448c-bca2-b5b69343922f
> Oct 30 15:11:28 redsun117q kernel: XFS (sdh): Ending clean mount
> Oct 30 15:11:28 redsun117q kernel: XFS (sdh): Unmounting Filesystem f93350d1-9b73-448c-bca2-b5b69343922f
> Oct 30 15:11:29 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> Oct 30 15:11:29 redsun117q kernel: XFS (sdh): Mounting V5 Filesystem 55534b79-27e6-4ded-82e3-5c249c68cb4a
> Oct 30 15:11:29 redsun117q kernel: XFS (sdh): Ending clean mount
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/0:0:9 blocked for more than 122 seconds.
> Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/0:0 state:D stack:0 pid:9 tgid:9 ppid:2 task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel: <TASK>
> Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? rwsem_optimistic_spin+0x1d1/0x430
> Oct 30 15:33:37 redsun117q kernel: ? do_raw_spin_lock+0x128/0x270
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_do_raw_spin_lock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? dequeue_entity+0x33e/0x1df0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __lock_release.isra.0+0x59/0x170
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel: </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/0:0:9 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:0:45 blocked for more than 122 seconds.
> Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/1:0 state:D stack:0 pid:45 tgid:45 ppid:2 task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel: <TASK>
> Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? rwsem_optimistic_spin+0x1d1/0x430
> Oct 30 15:33:37 redsun117q kernel: ? do_raw_spin_lock+0x128/0x270
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_do_raw_spin_lock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? dequeue_entity+0x33e/0x1df0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel: ? dequeue_entities+0x24b/0x1530
> Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? schedule+0x1cc/0x250
> Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __lock_release.isra.0+0x59/0x170
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel: </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:0:45 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:0:105 blocked for more than 123 seconds.
> Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/13:0 state:D stack:0 pid:105 tgid:105 ppid:2 task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel: <TASK>
> Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __lock_release.isra.0+0x59/0x170
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel: </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:0:105 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:1:189 blocked for more than 123 seconds.
> Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/1:1 state:D stack:0 pid:189 tgid:189 ppid:2 task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel: <TASK>
> Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? preempt_schedule_notrace+0x53/0x90
> Oct 30 15:33:37 redsun117q kernel: ? schedule+0xfe/0x250
> Oct 30 15:33:37 redsun117q kernel: ? rcu_is_watching+0x67/0x80
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_do_raw_spin_lock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? dequeue_entity+0x482/0x1df0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __try_to_del_timer_sync+0xd7/0x130
> Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel: </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/1:1:189 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:1:204 blocked for more than 123 seconds.
> Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/13:1 state:D stack:0 pid:204 tgid:204 ppid:2 task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel: <TASK>
> Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? rwsem_optimistic_spin+0x1d1/0x430
> Oct 30 15:33:37 redsun117q kernel: ? do_raw_spin_lock+0x128/0x270
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_do_raw_spin_lock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? dequeue_entity+0x33e/0x1df0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel: ? dequeue_entities+0x24b/0x1530
> Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __kthread_parkme+0xb3/0x1f0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel: </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/13:1:204 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:1:261 blocked for more than 123 seconds.
> Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/2:1 state:D stack:0 pid:261 tgid:261 ppid:2 task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel: <TASK>
> Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel: ? __kasan_slab_alloc+0x7e/0x90
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? dequeue_entity+0x482/0x1df0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __kthread_parkme+0xb3/0x1f0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel: </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:1:261 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/12:4:352 blocked for more than 123 seconds.
> Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/12:4 state:D stack:0 pid:352 tgid:352 ppid:2 task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel: <TASK>
> Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? kick_pool+0x1a5/0x860
> Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30
> Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0
> Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0
> Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90
> Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390
> Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30
> Oct 30 15:33:37 redsun117q kernel: </TASK>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/12:4:352 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer>
> Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/3:2:545 blocked for more than 123 seconds.
> Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3
> Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 15:33:37 redsun117q kernel: task:kworker/3:2 state:D stack:0 pid:545 tgid:545 ppid:2 task_flags:0x4248060 flags:0x00080000
> Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work
> Oct 30 15:33:37 redsun117q kernel: Call Trace:
> Oct 30 15:33:37 redsun117q kernel: <TASK>
> Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? rwsem_optimistic_spin+0x1d1/0x430
> Oct 30 15:33:37 redsun117q kernel: ? do_raw_spin_lock+0x128/0x270
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_do_raw_spin_lock+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250
> Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30
> Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320
> Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220
> Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140
> Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10
> Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs]
> Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200
> Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20
> Oct 30 15:33:37 redsun117q kernel: ?
sched_clock+0x10/0x30 > Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0 > Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs] > Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs] > Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0 > Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90 > Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 > Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390 > Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30 > Oct 30 15:33:37 redsun117q kernel: </TASK> > Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/3:2:545 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer> > Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:2:549 blocked for more than 123 seconds. > Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3 > Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> Oct 30 15:33:37 redsun117q kernel: task:kworker/2:2 state:D stack:0 pid:549 tgid:549 ppid:2 task_flags:0x4248060 flags:0x00080000 > Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work > Oct 30 15:33:37 redsun117q kernel: Call Trace: > Oct 30 15:33:37 redsun117q kernel: <TASK> > Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? rwsem_optimistic_spin+0x1d1/0x430 > Oct 30 15:33:37 redsun117q kernel: ? do_raw_spin_lock+0x128/0x270 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_do_raw_spin_lock+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140 > Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250 > Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30 > Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320 > Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220 > Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140 > Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs] > Oct 30 15:33:37 redsun117q kernel: ? dequeue_entity+0x33e/0x1df0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs] > Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200 > Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20 > Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30 > Oct 30 15:33:37 redsun117q kernel: ? 
sched_clock_cpu+0x69/0x5a0 > Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs] > Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_dio_write_end_io+0x10/0x10 [xfs] > Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0 > Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90 > Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 > Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390 > Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30 > Oct 30 15:33:37 redsun117q kernel: </TASK> > Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/2:2:549 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer> > Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/6:2:557 blocked for more than 123 seconds. > Oct 30 15:33:37 redsun117q kernel: Tainted: G W 6.18.0-rc3-kts #3 > Oct 30 15:33:37 redsun117q kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> Oct 30 15:33:37 redsun117q kernel: task:kworker/6:2 state:D stack:0 pid:557 tgid:557 ppid:2 task_flags:0x4248060 flags:0x00080000 > Oct 30 15:33:37 redsun117q kernel: Workqueue: dio/sdh iomap_dio_complete_work > Oct 30 15:33:37 redsun117q kernel: Call Trace: > Oct 30 15:33:37 redsun117q kernel: <TASK> > Oct 30 15:33:37 redsun117q kernel: __schedule+0x8bb/0x1ab0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_osq_unlock+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? __pfx___schedule+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140 > Oct 30 15:33:37 redsun117q kernel: schedule+0xd1/0x250 > Oct 30 15:33:37 redsun117q kernel: schedule_preempt_disabled+0x15/0x30 > Oct 30 15:33:37 redsun117q kernel: rwsem_down_write_slowpath+0x4c6/0x1320 > Oct 30 15:33:37 redsun117q kernel: ? lock_release+0xcb/0x110 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? percpu_counter_add_batch+0x80/0x220 > Oct 30 15:33:37 redsun117q kernel: ? __pfx___might_resched+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140 > Oct 30 15:33:37 redsun117q kernel: down_write_nested+0x1c4/0x1f0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_down_write_nested+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: xfs_reflink_end_atomic_cow+0x2b9/0x500 [xfs] > Oct 30 15:33:37 redsun117q kernel: ? dequeue_entity+0x33e/0x1df0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_xfs_reflink_end_atomic_cow+0x10/0x10 [xfs] > Oct 30 15:33:37 redsun117q kernel: ? update_load_avg+0x226/0x2200 > Oct 30 15:33:37 redsun117q kernel: ? kvm_sched_clock_read+0x11/0x20 > Oct 30 15:33:37 redsun117q kernel: ? sched_clock+0x10/0x30 > Oct 30 15:33:37 redsun117q kernel: ? sched_clock_cpu+0x69/0x5a0 > Oct 30 15:33:37 redsun117q kernel: xfs_dio_write_end_io+0x555/0x7c0 [xfs] > Oct 30 15:33:37 redsun117q kernel: ? 
__pfx_xfs_dio_write_end_io+0x10/0x10 [xfs] > Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete+0x13e/0x8d0 > Oct 30 15:33:37 redsun117q kernel: ? trace_hardirqs_on+0x18/0x150 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_aio_complete_rw+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: iomap_dio_complete_work+0x58/0x90 > Oct 30 15:33:37 redsun117q kernel: process_one_work+0x86b/0x14c0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_process_one_work+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 > Oct 30 15:33:37 redsun117q kernel: ? assign_work+0x156/0x390 > Oct 30 15:33:37 redsun117q kernel: worker_thread+0x5f2/0xfd0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_worker_thread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: kthread+0x3a4/0x760 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? lock_acquire+0xf6/0x140 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ret_from_fork+0x2d6/0x3e0 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ? __pfx_kthread+0x10/0x10 > Oct 30 15:33:37 redsun117q kernel: ret_from_fork_asm+0x1a/0x30 > Oct 30 15:33:37 redsun117q kernel: </TASK> > Oct 30 15:33:37 redsun117q kernel: INFO: task kworker/6:2:557 <writer> blocked on an rw-semaphore likely owned by task kworker/0:7:2826 <writer> > Oct 30 15:33:37 redsun117q kernel: Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings > Oct 30 15:33:37 redsun117q kernel: INFO: lockdep is turned off. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [bug report] fstests generic/774 hang
  2025-11-05  0:33 ` Darrick J. Wong
@ 2025-11-05  2:19   ` Shinichiro Kawasaki
  2025-11-05  8:52     ` John Garry
  0 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2025-11-05 2:19 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-xfs@vger.kernel.org, John Garry, ojaswin@linux.ibm.com

On Nov 04, 2025 / 16:33, Darrick J. Wong wrote:
> [add jogarry/ojaswin since this is a new atomic writes test]
>
> On Thu, Oct 30, 2025 at 08:45:05AM +0000, Shinichiro Kawasaki wrote:
> > I observe the fstests test case generic/774 hangs, when I run it for xfs on 8GiB
> > TCMU fileio devices. It was observed with v6.17 and v6.18-rcX kernel versions.
> > FYI, here I attach the kernel message log that was taken with v6.18-rc3 kernel
> > [1]. The hang is recreated in stable manner by repeating the test case a few
> > times in my environment.
> >
> > Actions for fix will be appreciated. If I can do any help, please let me know.
>
> I wonder, does your disk support atomic writes or are we just using the
> software fallback in xfs?

I don't think the disk supports atomic writes. It is just a regular TCMU device,
and its atomic write related sysfs attributes have value 0:

$ grep -rne . /sys/block/sdh/queue/ | grep atomic
/sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0
/sys/block/sdh/queue/atomic_write_boundary_bytes:1:0
/sys/block/sdh/queue/atomic_write_max_bytes:1:0
/sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0

FYI, I attach all the sysfs queue attribute values of the device [2].
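As a side note, the "all limits report 0, so only the software fallback is
available" observation above can be sketched as a trivial shell check. The
variable below merely stands in for the sysfs file shown in the grep output
(it reads 0 on this TCMU fileio device); the check itself is an illustration,
not part of any test script:

```shell
# Stand-in for /sys/block/sdh/queue/atomic_write_unit_max_bytes, which
# is 0 on this device per the grep output above.
atomic_write_unit_max_bytes=0

# A zero hardware limit means any RWF_ATOMIC write on xfs can only be
# served by the filesystem's software (COW-based) fallback.
if [ "$atomic_write_unit_max_bytes" -eq 0 ]; then
    mode="software fallback"
else
    mode="hardware atomic writes"
fi
echo "$mode"
```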
> >
> > [1]
> >
> > Oct 30 15:11:25 redsun117q unknown: run fstests generic/774 at 2025-10-30 15:11:25
> > Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> > Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> > Oct 30 15:11:25 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
> > Oct 30 15:11:27 redsun117q kernel: MODE SENSE: unimplemented page/subpage: 0x0a/0x05
>
> My guess is the disk doesn't support atomic writes?

The "MODE SENSE: unimplemented page/subpage" messages are reported for all
other test cases too, like this:

[495623.282810][T29013] run fstests generic/001 at 2025-11-05 11:10:33
[495623.377143][T27961] MODE SENSE: unimplemented page/subpage: 0x0a/0x05
[495623.650270][T28145] MODE SENSE: unimplemented page/subpage: 0x0a/0x05
[495623.683842][T28157] MODE SENSE: unimplemented page/subpage: 0x0a/0x05
[495660.733929][T32362] TARGET_CORE[loopback]: Expected Transfer Length: 0 does not match SCSI CDB Length: 512 for SAM Opcode: 0x8f
[495662.073182][T32548] XFS (sdg): Unmounting Filesystem 16ee26f7-5a36-4e84-a6b9-04d076522519
[495662.170053][T28145] MODE SENSE: unimplemented page/subpage: 0x0a/0x05
[495662.439897][T32792] XFS (sdg): Mounting V5 Filesystem 16ee26f7-5a36-4e84-a6b9-04d076522519
[495662.459341][T32792] XFS (sdg): Ending clean mount
[495662.886657][T32833] XFS (sdg): Unmounting Filesystem 16ee26f7-5a36-4e84-a6b9-04d076522519

So I think the messages are irrelevant, probably.

[2] test target device sysfs queue attributes

$ grep -rne . /sys/block/sdh/queue/
/sys/block/sdh/queue/io_poll_delay:1:-1
/sys/block/sdh/queue/max_integrity_segments:1:65535
/sys/block/sdh/queue/zoned:1:none
/sys/block/sdh/queue/scheduler:1:none mq-deadline kyber [bfq]
/sys/block/sdh/queue/io_poll:1:0
/sys/block/sdh/queue/discard_zeroes_data:1:0
/sys/block/sdh/queue/minimum_io_size:1:512
/sys/block/sdh/queue/nr_zones:1:0
/sys/block/sdh/queue/write_same_max_bytes:1:0
/sys/block/sdh/queue/max_segments:1:256
/sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0
/sys/block/sdh/queue/atomic_write_boundary_bytes:1:0
/sys/block/sdh/queue/dax:1:0
/sys/block/sdh/queue/dma_alignment:1:3
/sys/block/sdh/queue/physical_block_size:1:512
/sys/block/sdh/queue/logical_block_size:1:512
/sys/block/sdh/queue/virt_boundary_mask:1:0
/sys/block/sdh/queue/zone_append_max_bytes:1:0
/sys/block/sdh/queue/io_timeout:1:30000
/sys/block/sdh/queue/nr_requests:1:256
/sys/block/sdh/queue/write_stream_granularity:1:0
/sys/block/sdh/queue/iostats_passthrough:1:0
/sys/block/sdh/queue/write_cache:1:write back
/sys/block/sdh/queue/stable_writes:1:0
/sys/block/sdh/queue/max_segment_size:1:65536
/sys/block/sdh/queue/max_write_streams:1:0
/sys/block/sdh/queue/write_zeroes_unmap_max_bytes:1:0
/sys/block/sdh/queue/rotational:1:1
/sys/block/sdh/queue/discard_max_bytes:1:0
/sys/block/sdh/queue/write_zeroes_unmap_max_hw_bytes:1:0
/sys/block/sdh/queue/atomic_write_max_bytes:1:0
/sys/block/sdh/queue/add_random:1:1
/sys/block/sdh/queue/discard_max_hw_bytes:1:0
/sys/block/sdh/queue/optimal_io_size:1:8388608
/sys/block/sdh/queue/chunk_sectors:1:0
/sys/block/sdh/queue/iosched/fifo_expire_async:1:250
/sys/block/sdh/queue/iosched/back_seek_penalty:1:2
/sys/block/sdh/queue/iosched/timeout_sync:1:125
/sys/block/sdh/queue/iosched/back_seek_max:1:16384
/sys/block/sdh/queue/iosched/low_latency:1:1
/sys/block/sdh/queue/iosched/strict_guarantees:1:0
/sys/block/sdh/queue/iosched/slice_idle_us:1:8000
/sys/block/sdh/queue/iosched/fifo_expire_sync:1:125
/sys/block/sdh/queue/iosched/slice_idle:1:8
/sys/block/sdh/queue/iosched/max_budget:1:0
/sys/block/sdh/queue/read_ahead_kb:1:16384
/sys/block/sdh/queue/max_discard_segments:1:1
/sys/block/sdh/queue/write_zeroes_max_bytes:1:0
/sys/block/sdh/queue/nomerges:1:0
/sys/block/sdh/queue/zone_write_granularity:1:0
/sys/block/sdh/queue/wbt_lat_usec:1:0
/sys/block/sdh/queue/fua:1:1
/sys/block/sdh/queue/discard_granularity:1:0
/sys/block/sdh/queue/rq_affinity:1:1
/sys/block/sdh/queue/max_sectors_kb:1:8192
/sys/block/sdh/queue/hw_sector_size:1:512
/sys/block/sdh/queue/max_hw_sectors_kb:1:32767
/sys/block/sdh/queue/iostats:1:1
/sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0
* Re: [bug report] fstests generic/774 hang
  2025-11-05  2:19 ` Shinichiro Kawasaki
@ 2025-11-05  8:52   ` John Garry
  2025-11-05 10:39     ` John Garry
  0 siblings, 1 reply; 23+ messages in thread
From: John Garry @ 2025-11-05 8:52 UTC (permalink / raw)
  To: Shinichiro Kawasaki, Darrick J. Wong
  Cc: linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

On 05/11/2025 02:19, Shinichiro Kawasaki wrote:
> On Nov 04, 2025 / 16:33, Darrick J. Wong wrote:
>> [add jogarry/ojaswin since this is a new atomic writes test]
>>
>> On Thu, Oct 30, 2025 at 08:45:05AM +0000, Shinichiro Kawasaki wrote:
>>> I observe the fstests test case generic/774 hangs, when I run it for xfs on 8GiB
>>> TCMU fileio devices. It was observed with v6.17 and v6.18-rcX kernel versions.
>>> FYI, here I attach the kernel message log that was taken with v6.18-rc3 kernel
>>> [1]. The hang is recreated in stable manner by repeating the test case a few
>>> times in my environment.
>>>
>>> Actions for fix will be appreciated. If I can do any help, please let me know.
>>
>> I wonder, does your disk support atomic writes or are we just using the
>> software fallback in xfs?
>
> I don't think the disk supports atomic writes. It is just a regular TCMU device,
> and its atomic write related sysfs attributes have value 0:
>
> $ grep -rne . /sys/block/sdh/queue/ | grep atomic
> /sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0
> /sys/block/sdh/queue/atomic_write_boundary_bytes:1:0
> /sys/block/sdh/queue/atomic_write_max_bytes:1:0
> /sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0
>
> FYI, I attach all the sysfs queue attribute values of the device [2].

Yes, this would only be using software-based atomic writes.

Shinichiro, do the other atomic writes tests run ok, like 775, 767? You
can check group "atomicwrites" to know which tests they are.

774 is the fio test.

Some things to try:
- use a physical disk for the TEST_DEV
- Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to
  reduce $threads to a low value, say, 2
- try turning on the XFS_DEBUG config

BTW, Darrick has posted some xfs atomics fixes @
https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t.
I doubt that they will help this, but worth trying.

I will try to recreate.

Thanks,
John
* Re: [bug report] fstests generic/774 hang
  2025-11-05  8:52 ` John Garry
@ 2025-11-05 10:39   ` John Garry
  2025-11-05 11:29     ` John Garry
                       ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: John Garry @ 2025-11-05 10:39 UTC (permalink / raw)
  To: Shinichiro Kawasaki, Darrick J. Wong
  Cc: linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

On 05/11/2025 08:52, John Garry wrote:
>> I don't think the disk supports atomic writes. It is just a regular
>> TCMU device,
>> and its atomic write related sysfs attributes have value 0:
>>
>> $ grep -rne . /sys/block/sdh/queue/ | grep atomic
>> /sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0
>> /sys/block/sdh/queue/atomic_write_boundary_bytes:1:0
>> /sys/block/sdh/queue/atomic_write_max_bytes:1:0
>> /sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0
>>
>> FYI, I attach all the sysfs queue attribute values of the device [2].
>
> Yes, this would only be using software-based atomic writes.
>
> Shinichiro, do the other atomic writes tests run ok, like 775, 767? You
> can check group "atomicwrites" to know which tests they are.
>
> 774 is the fio test.
>
> Some things to try:
> - use a physical disk for the TEST_DEV
> - Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to
>   reduce $threads to a low value, say, 2
> - try turning on the XFS_DEBUG config
>
> BTW, Darrick has posted some xfs atomics fixes @
> https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t.
> I doubt that they will help this, but worth trying.
>
> I will try to recreate.

I tested this and the filesize which we try to write is huge, like 3.3G
in my case. That seems excessive.

The calc comes from the following in 774:

filesize=$((aw_bsize * threads * 100))

aw_bsize for me is 1M, and threads is 32.

aw_bsize is large as XFS supports software-based atomics, which is generally
going to be huge compared to anything which HW can support.

When I tried to run this test, it was not completing in a sane amount of
time - it was taking many minutes before I gave up.

@shinichiro, please try this:

--- a/tests/generic/774
+++ b/tests/generic/774
@@ -29,7 +29,7 @@ aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
 fsbsize=$(_get_block_size $SCRATCH_MNT)

 threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
-filesize=$((aw_bsize * threads * 100))
+filesize=$((aw_bsize * threads))
 depth=$threads
 aw_io_size=$((filesize / threads))
 aw_io_inc=$aw_io_size
ubuntu@jgarry-instance-20240626-1657-xfs-ubuntu:~/xfstests-dev$

Note, I ran with this change and the test now completes, but I get this:

+fio: failed initializing LFSR
+fio: failed initializing LFSR
+fio: failed initializing LFSR
+fio: failed initializing LFSR
+verify: bad magic header 0, wanted acca at file /home/ubuntu/mnt/scratch/test-file offset 0, length 1048576 (requested block: offset=0, length=1048576)
+verify: bad magic header e3d6, wanted acca at file /home/ubuntu/mnt/scratch/test-file offset 8388608, length 1048576 (requested block: offset=8388608, length=1048576)

I need to check that fio complaint.

Thanks,
John
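The sizing John quotes can be reproduced with plain shell arithmetic. The
1 MiB aw_bsize and 32 threads below are the values he reports for his run,
not general constants, and the variable names simply mirror the test script:

```shell
aw_bsize=$((1024 * 1024))   # 1 MiB, John's reported aw_bsize
threads=32                  # John's reported thread count

# Original generic/774 sizing: 1 MiB * 32 * 100 = 3200 MiB (~3.1 GiB),
# which matches the "like 3.3G" ballpark above.
filesize=$((aw_bsize * threads * 100))
echo "old: $((filesize / 1024 / 1024)) MiB"   # prints "old: 3200 MiB"

# Proposed sizing drops the factor of 100: 1 MiB * 32 = 32 MiB.
filesize=$((aw_bsize * threads))
echo "new: $((filesize / 1024 / 1024)) MiB"   # prints "new: 32 MiB"
```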
* Re: [bug report] fstests generic/774 hang
  2025-11-05 10:39 ` John Garry
@ 2025-11-05 11:29   ` John Garry
  2025-11-05 12:37   ` Shinichiro Kawasaki
  2025-11-09 11:58   ` Ojaswin Mujoo
  0 siblings, 0 replies; 23+ messages in thread
From: John Garry @ 2025-11-05 11:29 UTC (permalink / raw)
  To: Shinichiro Kawasaki, Darrick J. Wong
  Cc: linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

On 05/11/2025 10:39, John Garry wrote:
>
> +fio: failed initializing LFSR
> +fio: failed initializing LFSR
> +fio: failed initializing LFSR
> +fio: failed initializing LFSR
> +verify: bad magic header 0, wanted acca at file /home/ubuntu/mnt/
> scratch/test-file offset 0, length 1048576 (requested block: offset=0,
> length=1048576)
> +verify: bad magic header e3d6, wanted acca at file /home/ubuntu/
> mnt/scratch/test-file offset 8388608, length 1048576 (requested block:
> offset=8388608, length=1048576)
>
> I need to check that fio complaint.

This issue goes away when I stop using lfsr, i.e. the test passes.

The problem is that the lfsr init in fio does not have enough "blocks", and
this comes from how the fio bs is the same as the increment aw_io_inc, both
1M in my case. I think that aw_io_inc needs to be much larger than bs.

BTW, I think that the random number gen fio param is only relevant in fio
write mode. It seems to even be set in 774 for the verify read.

Thanks,
John
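The block-count problem John describes can be illustrated with the numbers
from his run (1 MiB for both the fio bs and the per-thread region once
filesize is reduced to aw_bsize * threads; these values are specific to that
run, and the fio-internals description is only a paraphrase of his diagnosis):

```shell
bs=$((1024 * 1024))           # fio block size in John's run
aw_io_size=$((1024 * 1024))   # per-thread region: filesize / threads

# fio's LFSR random generator walks block numbers within the region, so
# it needs the region to span more than one block; here it spans exactly
# one, which matches the "failed initializing LFSR" errors seen above.
nr_blocks=$((aw_io_size / bs))
echo "blocks per region: $nr_blocks"   # prints "blocks per region: 1"
```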
* Re: [bug report] fstests generic/774 hang
  2025-11-05 10:39 ` John Garry
  2025-11-05 11:29 ` John Garry
@ 2025-11-05 12:37   ` Shinichiro Kawasaki
  2025-11-06  8:19     ` Shinichiro Kawasaki
  2025-11-09 11:58   ` Ojaswin Mujoo
  2 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2025-11-05 12:37 UTC (permalink / raw)
  To: John Garry
  Cc: Darrick J. Wong, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

On Nov 05, 2025 / 10:39, John Garry wrote:
> On 05/11/2025 08:52, John Garry wrote:
> > > I don't think the disk supports atomic writes. It is just a regular
> > > TCMU device,
> > > and its atomic write related sysfs attributes have value 0:
> > >
> > > $ grep -rne . /sys/block/sdh/queue/ | grep atomic
> > > /sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0
> > > /sys/block/sdh/queue/atomic_write_boundary_bytes:1:0
> > > /sys/block/sdh/queue/atomic_write_max_bytes:1:0
> > > /sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0
> > >
> > > FYI, I attach all the sysfs queue attribute values of the device [2].
> >
> > Yes, this would only be using software-based atomic writes.
> >
> > Shinichiro, do the other atomic writes tests run ok, like 775, 767? You
> > can check group "atomicwrites" to know which tests they are.
> >
> > 774 is the fio test.
> >
> > Some things to try:
> > - use a physical disk for the TEST_DEV
> > - Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to
> >   reduce $threads to a low value, say, 2
> > - try turning on the XFS_DEBUG config
> >
> > BTW, Darrick has posted some xfs atomics fixes @
> > https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t.
> > I doubt that they will help this, but worth trying.

John, thank you for looking into this. Tomorrow, I will do some trials based
on your comments above. Today, I have just done a quick try with the change
below you suggested.

> >
> > I will try to recreate.
>
> I tested this and the filesize which we try to write is huge, like 3.3G in
> my case. That seems excessive.
>
> The calc comes from the following in 774:
>
> filesize=$((aw_bsize * threads * 100))
>
> aw_bsize for me is 1M, and threads is 32
>
> aw_bsize is large as XFS supports software-based atomics, which is generally
> going to be huge compared to anything which HW can support.
>
> When I tried to run this test, it was not completing in a sane amount of
> time - it was taking many minutes before I gave up.
>
> @shinichiro, please try this:
>
> --- a/tests/generic/774
> +++ b/tests/generic/774
> @@ -29,7 +29,7 @@ aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
>  fsbsize=$(_get_block_size $SCRATCH_MNT)
>
>  threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
> -filesize=$((aw_bsize * threads * 100))
> +filesize=$((aw_bsize * threads))
>  depth=$threads
>  aw_io_size=$((filesize / threads))
>  aw_io_inc=$aw_io_size
> ubuntu@jgarry-instance-20240626-1657-xfs-ubuntu:~/xfstests-dev$

With the change above, the test case g774 completed in less than a minute on
my test node. No kernel INFO/WARN/BUG.

>
> Note, I ran with this change and the test now completes, but I get this:
>
> +fio: failed initializing LFSR
> +fio: failed initializing LFSR
> +fio: failed initializing LFSR
> +fio: failed initializing LFSR
> +verify: bad magic header 0, wanted acca at file
> /home/ubuntu/mnt/scratch/test-file offset 0, length 1048576 (requested
> block: offset=0, length=1048576)
> +verify: bad magic header e3d6, wanted acca at file
> /home/ubuntu/mnt/scratch/test-file offset 8388608, length 1048576 (requested
> block: offset=8388608, length=1048576)
>
> I need to check that fio complaint.

I also saw the fio error messages.
* Re: [bug report] fstests generic/774 hang
  2025-11-05 12:37 ` Shinichiro Kawasaki
@ 2025-11-06  8:19   ` Shinichiro Kawasaki
  2025-11-06  8:53     ` John Garry
  2025-11-09 12:02     ` Ojaswin Mujoo
  0 siblings, 2 replies; 23+ messages in thread
From: Shinichiro Kawasaki @ 2025-11-06 8:19 UTC (permalink / raw)
  To: John Garry
  Cc: Darrick J. Wong, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com

On Nov 05, 2025 / 21:37, Shin'ichiro Kawasaki wrote:
> On Nov 05, 2025 / 10:39, John Garry wrote:
> > On 05/11/2025 08:52, John Garry wrote:
> > > > I don't think the disk supports atomic writes. It is just a regular
> > > > TCMU device,
> > > > and its atomic write related sysfs attributes have value 0:
> > > >
> > > > $ grep -rne . /sys/block/sdh/queue/ | grep atomic
> > > > /sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0
> > > > /sys/block/sdh/queue/atomic_write_boundary_bytes:1:0
> > > > /sys/block/sdh/queue/atomic_write_max_bytes:1:0
> > > > /sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0
> > > >
> > > > FYI, I attach all the sysfs queue attribute values of the device [2].
> > >
> > > Yes, this would only be using software-based atomic writes.
> > >
> > > Shinichiro, do the other atomic writes tests run ok, like 775, 767? You
> > > can check group "atomicwrites" to know which tests they are.
> > >
> > > 774 is the fio test.

I tried the other "atomicwrites" tests. I found g778 took a very long time.
I think it implies that g778 may have a similar problem as g774.

g765: [not run] write atomic not supported by this block device
g767: 11s
g768: 13s
g769: 13s
g770: 35s
g773: [not run] write atomic not supported by this block device
g774: did not complete after a 3 hour run (and the kernel reported the INFO messages)
g775: 48s
g776: [not run] write atomic not supported by this block device
g778: did not complete after a 50 minute run
x838: [not run] External volumes not in use, skipped this test
x839: [not run] XFS error injection requires CONFIG_XFS_DEBUG
x840: [not run] write atomic not supported by this block device

> > >
> > > Some things to try:
> > > - use a physical disk for the TEST_DEV

I tried using a real HDD for TEST_DEV, but still observed the hang and INFO
messages at g774.

> > > - Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to
> > >   reduce $threads to a low value, say, 2

I do not set LOAD_FACTOR. I changed g775 script to set threads=2, then the
test case completed quickly, within a few minutes. I'm suspecting that this
short test time might hide the hang/INFO problem.

> > > - try turning on the XFS_DEBUG config

I turned on XFS_DEBUG, and still observed the hang and the INFO messages.

> > >
> > > BTW, Darrick has posted some xfs atomics fixes @
> > > https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t.
> > > I doubt that they will help this, but worth trying.

I have not yet tried this. Will try it tomorrow.
* Re: [bug report] fstests generic/774 hang 2025-11-06 8:19 ` Shinichiro Kawasaki @ 2025-11-06 8:53 ` John Garry 2025-11-07 2:27 ` Shinichiro Kawasaki 2025-11-09 12:02 ` Ojaswin Mujoo 1 sibling, 1 reply; 23+ messages in thread From: John Garry @ 2025-11-06 8:53 UTC (permalink / raw) To: Shinichiro Kawasaki Cc: Darrick J. Wong, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com >>>> >>>> Shinichiro, do the other atomic writes tests run ok, like 775, 767? You >>>> can check group "atomicwrites" to know which tests they are. >>>> >>>> 774 is the fio test. > > I tried the other "atomicwrites" test. I found g778 took very long time. > I think it implies that g778 may have similar problem as g774. > > g765: [not run] write atomic not supported by this block device > g767: 11s > g768: 13s > g769: 13s > g770: 35s > g773: [not run] write atomic not supported by this block device > g774: did not completed after 3 hours run (and kernel reported the INFO messages) > g775: 48s > g776: [not run] write atomic not supported by this block device > g778: did not completed after 50 minutes run > x838: [not run] External volumes not in use, skipped this test > x839: [not run] XFS error injection requires CONFIG_XFS_DEBUG > x840: [not run] write atomic not supported by this block device This is testing software-based atomic writes, and they are just slow. Very slow, relative to HW-based atomic writes. And having bs=1M will make things worse, as we are locking out other threads for longer (when doing the write). So I think that we should limit the file size which we try to write. > >>>> >>>> Some things to try: >>>> - use a physical disk for the TEST_DEV > > I tried using a real HDD for TEST_DEV, but still observed the hang and INFO > messages at g774. > >>>> - Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to >>>> reduce $threads to a low value, say, 2 > > I do not set LOAD_FACTOR. 
I changed g775 script to set threads=2, then the > test case completed quickly, within a few minutes. I'm suspecting that this > short test time might hide the hang/INFO problem. > >>>> - trying turning on XFS_DEBUG config > > I turned on XFS_DEBUG, and still observed the hang and the INFO messages. > I don't think that this will help. >>>> >>>> BTW, Darrick has posted some xfs atomics fixes @ https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t >>>> . I doubt that they will help this, but worth trying. > > I have not yet tried this. Will try it tomorrow. Nor this. Even for the conditions set, the test should not produce a hang. I can check on whether we can improve the software-based atomic writes in xfs to avoid this. Thanks, John ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [bug report] fstests generic/774 hang 2025-11-06 8:53 ` John Garry @ 2025-11-07 2:27 ` Shinichiro Kawasaki 2025-11-07 4:28 ` Darrick J. Wong 0 siblings, 1 reply; 23+ messages in thread From: Shinichiro Kawasaki @ 2025-11-07 2:27 UTC (permalink / raw) To: John Garry Cc: Darrick J. Wong, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com On Nov 06, 2025 / 08:53, John Garry wrote: > > > > > > > > > > Shinichiro, do the other atomic writes tests run ok, like 775, 767? You > > > > > can check group "atomicwrites" to know which tests they are. > > > > > > > > > > 774 is the fio test. > > > > I tried the other "atomicwrites" test. I found g778 took very long time. > > I think it implies that g778 may have similar problem as g774. > > > > g765: [not run] write atomic not supported by this block device > > g767: 11s > > g768: 13s > > g769: 13s > > g770: 35s > > g773: [not run] write atomic not supported by this block device > > g774: did not completed after 3 hours run (and kernel reported the INFO messages) > > g775: 48s > > g776: [not run] write atomic not supported by this block device > > g778: did not completed after 50 minutes run > > x838: [not run] External volumes not in use, skipped this test > > x839: [not run] XFS error injection requires CONFIG_XFS_DEBUG > > x840: [not run] write atomic not supported by this block device > > This is testing software-based atomic writes, and they are just slow. Very > slow, relative to HW-based atomic writes. And having bs=1M will make things > worse, as we are locking out other threads for longer (when doing the > write). I see, thanks for the explanation. > So I think that we should limit the file size which we try to write. This sounds reasonable, and it will make maintaining fstests runs easier. > > > > > > > > > > > > > Some things to try: > > > > > - use a physical disk for the TEST_DEV > > > > I tried using a real HDD for TEST_DEV, but still observed the hang and INFO > > messages at g774.
> > > > > > > - Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to > > > > > reduce $threads to a low value, say, 2 > > > > I do not set LOAD_FACTOR. I changed g775 script to set threads=2, then the > > test case completed quickly, within a few minutes. I'm suspecting that this > > short test time might hide the hang/INFO problem. > > > > > > > - trying turning on XFS_DEBUG config > > > > I turned on XFS_DEBUG, and still observed the hang and the INFO messages. > > > > > I don't think that this will help. > > > > > > > > > > > BTW, Darrick has posted some xfs atomics fixes @ https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t > > > > > . I doubt that they will help this, but worth trying. > > > > I have not yet tried this. Will try it tomorrow. > > Nor this. I confirmed it. I applied the patches to the v6.18-rc4 kernel. With this kernel, the hang and the INFO messages are recreated. > > Having a hang - even for the conditions set - should not produce a hang. I > can check on whether we can improve the software-based atomic writes in xfs > to avoid this. Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging test node and share. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [bug report] fstests generic/774 hang 2025-11-07 2:27 ` Shinichiro Kawasaki @ 2025-11-07 4:28 ` Darrick J. Wong 2025-11-07 5:53 ` Shinichiro Kawasaki 0 siblings, 1 reply; 23+ messages in thread From: Darrick J. Wong @ 2025-11-07 4:28 UTC (permalink / raw) To: Shinichiro Kawasaki Cc: John Garry, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote: > On Nov 06, 2025 / 08:53, John Garry wrote: > > > > > > > > > > > > Shinichiro, do the other atomic writes tests run ok, like 775, 767? You > > > > > > can check group "atomicwrites" to know which tests they are. > > > > > > > > > > > > 774 is the fio test. > > > > > > I tried the other "atomicwrites" test. I found g778 took very long time. > > > I think it implies that g778 may have similar problem as g774. > > > > > > g765: [not run] write atomic not supported by this block device > > > g767: 11s > > > g768: 13s > > > g769: 13s > > > g770: 35s > > > g773: [not run] write atomic not supported by this block device > > > g774: did not completed after 3 hours run (and kernel reported the INFO messages) > > > g775: 48s > > > g776: [not run] write atomic not supported by this block device > > > g778: did not completed after 50 minutes run > > > x838: [not run] External volumes not in use, skipped this test > > > x839: [not run] XFS error injection requires CONFIG_XFS_DEBUG > > > x840: [not run] write atomic not supported by this block device > > > > This is testing software-based atomic writes, and they are just slow. Very > > slow, relative to HW-based atomic writes. And having bs=1M will make things > > worse, as we are locking out other threads for longer (when doing the > > write). > > I see, thanks for the explanation. > > > So I think that we should limit the file size which we try to write. > > This sounds reasonable, and it will make fstests run maintenance work easier. 
> > > > > > > > > > > > > > > > > > Some things to try: > > > > > > - use a physical disk for the TEST_DEV > > > > > > I tried using a real HDD for TEST_DEV, but still observed the hang and INFO > > > messages at g774. > > > > > > > > > - Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to > > > > > > reduce $threads to a low value, say, 2 > > > > > > I do not set LOAD_FACTOR. I changed g775 script to set threads=2, then the > > > test case completed quickly, within a few minutes. I'm suspecting that this > > > short test time might hide the hang/INFO problem. > > > > > > > > > - trying turning on XFS_DEBUG config > > > > > > I turned on XFS_DEBUG, and still observed the hang and the INFO messages. > > > > > > > I don't think that this will help. > > > > > > > > > > > > > > BTW, Darrick has posted some xfs atomics fixes @ https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t > > > > > > . I doubt that they will help this, but worth trying. > > > > > > I have not yet tried this. Will try it tomorrow. > > > > Nor this. > > I confirmed it. I applied the patches to v6.18-rc4 kernel. With this kernel, the > hang and the INFO messages are recreated. > > > > > Having a hang - even for the conditions set - should not produce a hang. I > > can check on whether we can improve the software-based atomic writes in xfs > > to avoid this. > > Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging > test node and share. Yes, anything you can share would be helpful. FWIW the test runs in 51 seconds here, but I only have 4 CPUs in the VM and fast storage so its filesize is "only" 800MB. 
--D ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [bug report] fstests generic/774 hang 2025-11-07 4:28 ` Darrick J. Wong @ 2025-11-07 5:53 ` Shinichiro Kawasaki 2025-11-07 12:48 ` John Garry 0 siblings, 1 reply; 23+ messages in thread From: Shinichiro Kawasaki @ 2025-11-07 5:53 UTC (permalink / raw) To: Darrick J. Wong Cc: John Garry, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com [-- Attachment #1: Type: text/plain, Size: 1149 bytes --] On Nov 06, 2025 / 20:28, Darrick J. Wong wrote: > On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote: > > On Nov 06, 2025 / 08:53, John Garry wrote: ... > > > Having a hang - even for the conditions set - should not produce a hang. I > > > can check on whether we can improve the software-based atomic writes in xfs > > > to avoid this. > > > > Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging > > test node and share. > > Yes, anything you can share would be helpful. Okay, I attached the dmesg log file (dmesg.gz), which contains the INFO messages and the sysrq-t output. It was taken with the v6.18-rc4 kernel with the fix patches by Darrick. I also attached the kernel config (_config.gz) which I used to build the test target kernel. > FWIW the test runs in 51 > seconds here, but I only have 4 CPUs in the VM and fast storage so its > filesize is "only" 800MB. FYI, my test node has 24 CPUs. The hang is sporadic and I needed to repeat the test case a few times to recreate it with the 8GiB TCMU devices. When it does not hang, the test case takes about an hour to complete. [-- Attachment #2: dmesg.gz --] [-- Type: application/gzip, Size: 219128 bytes --] [-- Attachment #3: _config.gz --] [-- Type: application/gzip, Size: 42647 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [bug report] fstests generic/774 hang 2025-11-07 5:53 ` Shinichiro Kawasaki @ 2025-11-07 12:48 ` John Garry 2025-11-07 17:50 ` Darrick J. Wong 2025-11-10 2:41 ` Shinichiro Kawasaki 0 siblings, 2 replies; 23+ messages in thread From: John Garry @ 2025-11-07 12:48 UTC (permalink / raw) To: Shinichiro Kawasaki, Darrick J. Wong Cc: linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com On 07/11/2025 05:53, Shinichiro Kawasaki wrote: > On Nov 06, 2025 / 20:28, Darrick J. Wong wrote: >> On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote: >>> On Nov 06, 2025 / 08:53, John Garry wrote: > ... >>>> Having a hang - even for the conditions set - should not produce a hang. I >>>> can check on whether we can improve the software-based atomic writes in xfs >>>> to avoid this. >>> >>> Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging >>> test node and share. >> >> Yes, anything you can share would be helpful. > > Okay, I attached dmesg log file (dmesg.gz), which contains the INFO messages and > the sysrq-t output. It was taken with v6.18-rc4 kernel with the fix patches by > Darrick. I also attached the kernel config (_config.gz) which I used to build > the test target kernel. > >> FWIW the test runs in 51 >> seconds here, but I only have 4 CPUs in the VM and fast storage so its >> filesize is "only" 800MB. > > FYI, my test node has 24 CPUs. The hang is sporadic and I needed to repeat the > test case a few times to recreate it with the 8GiB TCMU devices. When it does > not hang, the test case takes about an hour to complete. Hi Shinichiro, You can still stop the test with ctrl^C, right? @Darrick, I worry that there is too much ip lock contention in xfs_atomic_write_cow_iomap_begin(), especially since we may drop and re-acquire the lock (in xfs_trans_alloc_inode()). Maybe we should force serialization in xfs_file_dio_write_atomic(). After all, this was not intended to provide good performance. 
Or look at other ways to optimise this (if we do want good performance). Thanks, John ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [bug report] fstests generic/774 hang 2025-11-07 12:48 ` John Garry @ 2025-11-07 17:50 ` Darrick J. Wong 2025-11-07 23:18 ` Darrick J. Wong 2025-11-10 2:41 ` Shinichiro Kawasaki 1 sibling, 1 reply; 23+ messages in thread From: Darrick J. Wong @ 2025-11-07 17:50 UTC (permalink / raw) To: John Garry Cc: Shinichiro Kawasaki, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com On Fri, Nov 07, 2025 at 12:48:38PM +0000, John Garry wrote: > On 07/11/2025 05:53, Shinichiro Kawasaki wrote: > > On Nov 06, 2025 / 20:28, Darrick J. Wong wrote: > > > On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote: > > > > On Nov 06, 2025 / 08:53, John Garry wrote: > > ... > > > > > Having a hang - even for the conditions set - should not produce a hang. I > > > > > can check on whether we can improve the software-based atomic writes in xfs > > > > > to avoid this. > > > > > > > > Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging > > > > test node and share. > > > > > > Yes, anything you can share would be helpful. > > > > Okay, I attached dmesg log file (dmesg.gz), which contains the INFO messages and > > the sysrq-t output. It was taken with v6.18-rc4 kernel with the fix patches by > > Darrick. I also attached the kernel config (_config.gz) which I used to build > > the test target kernel. > > > > > FWIW the test runs in 51 > > > seconds here, but I only have 4 CPUs in the VM and fast storage so its > > > filesize is "only" 800MB. > > > > FYI, my test node has 24 CPUs. The hang is sporadic and I needed to repeat the > > test case a few times to recreate it with the 8GiB TCMU devices. When it does > > not hang, the test case takes about an hour to complete. > > Hi Shinichiro, > > Can you still stop the test with ctrl^C, right? > > @Darrick, I worry that there is too much ip lock contention in > xfs_atomic_write_cow_iomap_begin(), especially since we may drop and > re-acquire the lock (in xfs_trans_alloc_inode()). 
Maybe we should force > serialization in xfs_file_dio_write_atomic(). After all, this was not > intended to provide good performance. Or look at other ways to optimise this > (if we do want good performance). I don't see how that helps. All that does is shift the lock contention from xfs_inode::i_lock to inode::i_rwsem. At the end of the day, this test is starting up 2*nr_cpus threads to issue large atomic directio writes that take a long time to complete. Stall warnings when there are a large number of threads all trying to directio write to a file whose blocks require a metadata update upon IO completion are a long known problem. I altered my test VM to have 24 cores and enough RAM to avoid OOMing the machine. Setting up the mixed mappings file took 27 seconds, and the aio writes themselves took 3:15. Validating the contents took 4 seconds. Maaaybe we should back off on the file size. I don't see why it needs to create a 5GB file for testing. The verify runs at 2100MB/s whereas the atomic writes plod along at 25MB/s. That's why this test takes a loooong time to run. (I don't see the lfsr complaints, but I'm running fio 3.41 from git) --D > Thanks, > John > ^ permalink raw reply [flat|nested] 23+ messages in thread
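Darrick's runtime arithmetic above can be sanity-checked with a quick back-of-the-envelope sketch. The 25 MB/s and 2100 MB/s figures are the ones quoted in his message, and the 24-CPU, 2-threads-per-CPU geometry matches this thread; the script itself is only illustrative and is not part of fstests:

```shell
#!/bin/sh
# Back-of-the-envelope runtime estimate for generic/774.
# Throughput figures are the ones quoted above; the rest mirrors the
# test's filesize formula with LOAD_FACTOR=1.
nproc=24
threads=$((nproc * 2))                        # test uses 2 * nr_cpus threads
aw_bsize_mb=1                                 # software atomic write unit (1 MiB)
filesize_mb=$((aw_bsize_mb * threads * 100))  # generic/774: aw_bsize * threads * 100

write_mbps=25     # software atomic write throughput (quoted above)
verify_mbps=2100  # sequential verify throughput (quoted above)

write_secs=$((filesize_mb / write_mbps))
verify_secs=$((filesize_mb / verify_mbps))
echo "filesize: ${filesize_mb} MiB"
echo "write phase: ~${write_secs}s, verify phase: ~${verify_secs}s"
```

With these numbers the write phase alone comes to roughly 192 seconds, close to the 3:15 Darrick measured, which supports reading the symptom as "very slow" rather than a true deadlock on fast storage.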
* Re: [bug report] fstests generic/774 hang 2025-11-07 17:50 ` Darrick J. Wong @ 2025-11-07 23:18 ` Darrick J. Wong 0 siblings, 0 replies; 23+ messages in thread From: Darrick J. Wong @ 2025-11-07 23:18 UTC (permalink / raw) To: John Garry Cc: Shinichiro Kawasaki, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com On Fri, Nov 07, 2025 at 09:50:04AM -0800, Darrick J. Wong wrote: > On Fri, Nov 07, 2025 at 12:48:38PM +0000, John Garry wrote: > > On 07/11/2025 05:53, Shinichiro Kawasaki wrote: > > > On Nov 06, 2025 / 20:28, Darrick J. Wong wrote: > > > > On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote: > > > > > On Nov 06, 2025 / 08:53, John Garry wrote: > > > ... > > > > > > Having a hang - even for the conditions set - should not produce a hang. I > > > > > > can check on whether we can improve the software-based atomic writes in xfs > > > > > > to avoid this. > > > > > > > > > > Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging > > > > > test node and share. > > > > > > > > Yes, anything you can share would be helpful. > > > > > > Okay, I attached dmesg log file (dmesg.gz), which contains the INFO messages and > > > the sysrq-t output. It was taken with v6.18-rc4 kernel with the fix patches by > > > Darrick. I also attached the kernel config (_config.gz) which I used to build > > > the test target kernel. > > > > > > > FWIW the test runs in 51 > > > > seconds here, but I only have 4 CPUs in the VM and fast storage so its > > > > filesize is "only" 800MB. > > > > > > FYI, my test node has 24 CPUs. The hang is sporadic and I needed to repeat the > > > test case a few times to recreate it with the 8GiB TCMU devices. When it does > > > not hang, the test case takes about an hour to complete. > > > > Hi Shinichiro, > > > > Can you still stop the test with ctrl^C, right? 
> > > > @Darrick, I worry that there is too much ip lock contention in > > xfs_atomic_write_cow_iomap_begin(), especially since we may drop and > > re-acquire the lock (in xfs_trans_alloc_inode()). Maybe we should force > > serialization in xfs_file_dio_write_atomic(). After all, this was not > > intended to provide good performance. Or look at other ways to optimise this > > (if we do want good performance). > > I don't see how that helps. All that does is shift the lock contention > from xfs_inode::i_lock to inode::i_rwsem. At the end of the day, this > test is starting up 2*nr_cpus threads to issue large atomic directio > writes that take a long time to complete. Stall warnings when there are > a large number of threads all trying to directio write to a file whose > blocks require a metadata update upon IO completion are a long known > problem. > > I altered my test VM to have 24 cores and enough RAM to avoid OOMing the > machine. Setting up the mixed mappings file took 27 seconds, and the > aio writes themselves took 3:15. Validating the contents took 4 > seconds. > > Maaaybe we should back off on the file size. I don't see why it needs > to create a 5GB file for testing. The verify runs at 2100MB/s whereas > the atomic writes plod along at 25MB/s. That's why this test takes a > loooong time to run. > > (I don't see the lfsr complaints, but I'm running fio 3.41 from git) Spoke too soon, now I'm seeing it all over the test fleet. --D > --D > > > Thanks, > > John > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [bug report] fstests generic/774 hang 2025-11-07 12:48 ` John Garry 2025-11-07 17:50 ` Darrick J. Wong @ 2025-11-10 2:41 ` Shinichiro Kawasaki 1 sibling, 0 replies; 23+ messages in thread From: Shinichiro Kawasaki @ 2025-11-10 2:41 UTC (permalink / raw) To: John Garry Cc: Darrick J. Wong, linux-xfs@vger.kernel.org, ojaswin@linux.ibm.com On Nov 07, 2025 / 12:48, John Garry wrote: > On 07/11/2025 05:53, Shinichiro Kawasaki wrote: > > On Nov 06, 2025 / 20:28, Darrick J. Wong wrote: > > > On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote: > > > > On Nov 06, 2025 / 08:53, John Garry wrote: > > ... > > > > > Having a hang - even for the conditions set - should not produce a hang. I > > > > > can check on whether we can improve the software-based atomic writes in xfs > > > > > to avoid this. > > > > > > > > Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging > > > > test node and share. > > > > > > Yes, anything you can share would be helpful. > > > > Okay, I attached dmesg log file (dmesg.gz), which contains the INFO messages and > > the sysrq-t output. It was taken with v6.18-rc4 kernel with the fix patches by > > Darrick. I also attached the kernel config (_config.gz) which I used to build > > the test target kernel. > > > > > FWIW the test runs in 51 > > > seconds here, but I only have 4 CPUs in the VM and fast storage so its > > > filesize is "only" 800MB. > > > > FYI, my test node has 24 CPUs. The hang is sporadic and I needed to repeat the > > test case a few times to recreate it with the 8GiB TCMU devices. When it does > > not hang, the test case takes about an hour to complete. > > Hi Shinichiro, > > Can you still stop the test with ctrl^C, right? No, I can't. Even when I type Ctrl-C after the hang, the fstests check process does not stop. I can still log in to the system and create new terminals. To clean up the system, I just do sysrq-b to reboot the system. 
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [bug report] fstests generic/774 hang 2025-11-06 8:19 ` Shinichiro Kawasaki 2025-11-06 8:53 ` John Garry @ 2025-11-09 12:02 ` Ojaswin Mujoo 2025-11-10 12:46 ` [WARNING: UNSCANNABLE EXTRACTION FAILED]Re: " Shinichiro Kawasaki 1 sibling, 1 reply; 23+ messages in thread From: Ojaswin Mujoo @ 2025-11-09 12:02 UTC (permalink / raw) To: Shinichiro Kawasaki Cc: John Garry, Darrick J. Wong, linux-xfs@vger.kernel.org On Thu, Nov 06, 2025 at 08:19:12AM +0000, Shinichiro Kawasaki wrote: > On Nov 05, 2025 / 21:37, Shin'ichiro Kawasaki wrote: > > On Nov 05, 2025 / 10:39, John Garry wrote: > > > On 05/11/2025 08:52, John Garry wrote: > > > > > I don't think the disk supports atomic writes. It is just a regular > > > > > TCMU device, > > > > > and its atomic write related sysfs attributes have value 0: > > > > > > > > > > $ grep -rne . /sys/block/sdh/queue/ | grep atomic > > > > > /sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0 > > > > > /sys/block/sdh/queue/atomic_write_boundary_bytes:1:0 > > > > > /sys/block/sdh/queue/atomic_write_max_bytes:1:0 > > > > > /sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0 > > > > > > > > > > FYI, I attach the all sysfs queue attribute values of the device [2]. > > > > > > > > Yes, this would only be using software-based atomic writes. > > > > > > > > Shinichiro, do the other atomic writes tests run ok, like 775, 767? You > > > > can check group "atomicwrites" to know which tests they are. > > > > > > > > 774 is the fio test. > > I tried the other "atomicwrites" test. I found g778 took very long time. > I think it implies that g778 may have similar problem as g774. 
> > g765: [not run] write atomic not supported by this block device > g767: 11s > g768: 13s > g769: 13s > g770: 35s > g773: [not run] write atomic not supported by this block device > g774: did not completed after 3 hours run (and kernel reported the INFO messages) > g775: 48s > g776: [not run] write atomic not supported by this block device > g778: did not completed after 50 minutes run Hi Shinichiro, Hmm, that's strange, g/778 should ideally tune itself to the speed of the device. Will you be able to share the results/generic/778.full file? That might give some hints. > x838: [not run] External volumes not in use, skipped this test > x839: [not run] XFS error injection requires CONFIG_XFS_DEBUG > x840: [not run] write atomic not supported by this block device > ^ permalink raw reply [flat|nested] 23+ messages in thread
* [WARNING: UNSCANNABLE EXTRACTION FAILED]Re: [bug report] fstests generic/774 hang 2025-11-09 12:02 ` Ojaswin Mujoo @ 2025-11-10 12:46 ` Shinichiro Kawasaki 2025-11-10 21:12 ` Darrick J. Wong 0 siblings, 1 reply; 23+ messages in thread From: Shinichiro Kawasaki @ 2025-11-10 12:46 UTC (permalink / raw) To: Ojaswin Mujoo; +Cc: John Garry, Darrick J. Wong, linux-xfs@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1225 bytes --] On Nov 09, 2025 / 17:32, Ojaswin Mujoo wrote: > On Thu, Nov 06, 2025 at 08:19:12AM +0000, Shinichiro Kawasaki wrote: [...] > > I tried the other "atomicwrites" test. I found g778 took very long time. > > I think it implies that g778 may have similar problem as g774. > > > > g765: [not run] write atomic not supported by this block device > > g767: 11s > > g768: 13s > > g769: 13s > > g770: 35s > > g773: [not run] write atomic not supported by this block device > > g774: did not completed after 3 hours run (and kernel reported the INFO messages) > > g775: 48s > > g776: [not run] write atomic not supported by this block device > > g778: did not completed after 50 minutes run > > Hi Shinichiro > > Hmm that's strange, g/778 should tune itself to the speed of the device > ideally. Will you be able to share the results/generic/778.full file. > That might give some hints. Please find the attached 778.full.gz, which I copied about 50 minutes after the test case start. The test case was still running at that time. Near the end of the full file, I find "Iteration 13". It looks like the test case is not hanging, but just taking a long time to complete the 20 iterations. [-- Attachment #2: 778.full.gz --] [-- Type: application/gzip, Size: 1985 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [WARNING: UNSCANNABLE EXTRACTION FAILED]Re: [bug report] fstests generic/774 hang 2025-11-10 12:46 ` [WARNING: UNSCANNABLE EXTRACTION FAILED]Re: " Shinichiro Kawasaki @ 2025-11-10 21:12 ` Darrick J. Wong 2025-11-11 11:43 ` Shinichiro Kawasaki 0 siblings, 1 reply; 23+ messages in thread From: Darrick J. Wong @ 2025-11-10 21:12 UTC (permalink / raw) To: Shinichiro Kawasaki; +Cc: Ojaswin Mujoo, John Garry, linux-xfs@vger.kernel.org On Mon, Nov 10, 2025 at 12:46:19PM +0000, Shinichiro Kawasaki wrote: > On Nov 09, 2025 / 17:32, Ojaswin Mujoo wrote: > > On Thu, Nov 06, 2025 at 08:19:12AM +0000, Shinichiro Kawasaki wrote: > [...] > > > I tried the other "atomicwrites" test. I found g778 took very long time. > > > I think it implies that g778 may have similar problem as g774. > > > > > > g765: [not run] write atomic not supported by this block device > > > g767: 11s > > > g768: 13s > > > g769: 13s > > > g770: 35s > > > g773: [not run] write atomic not supported by this block device > > > g774: did not completed after 3 hours run (and kernel reported the INFO messages) > > > g775: 48s > > > g776: [not run] write atomic not supported by this block device > > > g778: did not completed after 50 minutes run > > > > Hi Shinichiro > > > > Hmm that's strange, g/778 should tune itself to the speed of the device > > ideally. Will you be able to share the results/generic/778.full file. > > That might give some hints. > > Please find the attached 778.full.gz, which I copied about 50 minutes after > the test case start. The test case was still running at that time. Near the end > of the full file, I find "Iteration 13". It looks like the test case is not > hanging, but just taking long time to complete the 20 iterations. <nod> 778 invokes xfs_io and fallocate a few tens of thousands of times, which makes the test runtime really slow if fork/exec() aren't fast. 
I try to fix that here: https://lore.kernel.org/fstests/176279908967.605950.2192923313361120314.stgit@frogsfrogsfrogs/T/#t As well as reducing the test file size for 774, per everyone's comments. --D ^ permalink raw reply [flat|nested] 23+ messages in thread
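Darrick's point that generic/778 is dominated by fork/exec cost can be checked with a rough sketch like the following. The loop count and the use of /bin/true as a stand-in are illustrative assumptions; the real test spawns xfs_io and fallocate:

```shell
#!/bin/sh
# Rough per-invocation process-spawn cost, to estimate how tens of
# thousands of xfs_io/fallocate invocations add up.
# Assumes GNU date (%N nanoseconds) and /bin/true, as on typical Linux.
n=200
start=$(date +%s%N)
i=0
while [ "$i" -lt "$n" ]; do
    /bin/true   # external command: forces a real fork/exec, unlike the builtin
    i=$((i + 1))
done
end=$(date +%s%N)
per_call_us=$(( (end - start) / n / 1000 ))
echo "~${per_call_us} us per process spawn"
# At, say, 2 ms per spawn, 30000 invocations alone would cost ~60 seconds,
# before the test does any actual I/O.
```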
* Re: [WARNING: UNSCANNABLE EXTRACTION FAILED]Re: [bug report] fstests generic/774 hang 2025-11-10 21:12 ` Darrick J. Wong @ 2025-11-11 11:43 ` Shinichiro Kawasaki 0 siblings, 0 replies; 23+ messages in thread From: Shinichiro Kawasaki @ 2025-11-11 11:43 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Ojaswin Mujoo, John Garry, linux-xfs@vger.kernel.org On Nov 10, 2025 / 13:12, Darrick J. Wong wrote: > On Mon, Nov 10, 2025 at 12:46:19PM +0000, Shinichiro Kawasaki wrote: > > On Nov 09, 2025 / 17:32, Ojaswin Mujoo wrote: > > > On Thu, Nov 06, 2025 at 08:19:12AM +0000, Shinichiro Kawasaki wrote: > > [...] > > > > I tried the other "atomicwrites" test. I found g778 took very long time. > > > > I think it implies that g778 may have similar problem as g774. > > > > > > > > g765: [not run] write atomic not supported by this block device > > > > g767: 11s > > > > g768: 13s > > > > g769: 13s > > > > g770: 35s > > > > g773: [not run] write atomic not supported by this block device > > > > g774: did not completed after 3 hours run (and kernel reported the INFO messages) > > > > g775: 48s > > > > g776: [not run] write atomic not supported by this block device > > > > g778: did not completed after 50 minutes run > > > > > > Hi Shinichiro > > > > > > Hmm that's strange, g/778 should tune itself to the speed of the device > > > ideally. Will you be able to share the results/generic/778.full file. > > > That might give some hints. > > > > Please find the attached 778.full.gz, which I copied about 50 minutes after > > the test case start. The test case was still running at that time. Near the end > > of the full file, I find "Iteration 13". It looks like the test case is not > > hanging, but just taking long time to complete the 20 iterations. > > <nod> 778 invokes xfs_io and fallocate a few tens of thousands of times, > which makes the test runtime really slow if fork/exec() aren't fast. 
I > try to fix that here: > > https://lore.kernel.org/fstests/176279908967.605950.2192923313361120314.stgit@frogsfrogsfrogs/T/#t > > As well as reducing the test file size for 774, per everyone's comments. Thanks! With the series, g774 completes within four minutes, and g778 completes within a minute in my environment. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [bug report] fstests generic/774 hang 2025-11-05 10:39 ` John Garry 2025-11-05 11:29 ` John Garry 2025-11-05 12:37 ` Shinichiro Kawasaki @ 2025-11-09 11:58 ` Ojaswin Mujoo 2025-11-10 8:58 ` John Garry 2025-11-10 12:39 ` Shinichiro Kawasaki 2 siblings, 2 replies; 23+ messages in thread From: Ojaswin Mujoo @ 2025-11-09 11:58 UTC (permalink / raw) To: John Garry Cc: Shinichiro Kawasaki, Darrick J. Wong, linux-xfs@vger.kernel.org On Wed, Nov 05, 2025 at 10:39:43AM +0000, John Garry wrote: > On 05/11/2025 08:52, John Garry wrote: > > > I don't think the disk supports atomic writes. It is just a regular > > > TCMU device, > > > and its atomic write related sysfs attributes have value 0: > > > > > > $ grep -rne . /sys/block/sdh/queue/ | grep atomic > > > /sys/block/sdh/queue/atomic_write_unit_max_bytes:1:0 > > > /sys/block/sdh/queue/atomic_write_boundary_bytes:1:0 > > > /sys/block/sdh/queue/atomic_write_max_bytes:1:0 > > > /sys/block/sdh/queue/atomic_write_unit_min_bytes:1:0 > > > > > > FYI, I attach the all sysfs queue attribute values of the device [2]. > > > > Yes, this would only be using software-based atomic writes. > > > > Shinichiro, do the other atomic writes tests run ok, like 775, 767? You > > can check group "atomicwrites" to know which tests they are. > > > > 774 is the fio test. > > > > Some things to try: > > - use a physical disk for the TEST_DEV > > - Don't set LOAD_FACTOR (if you were setting it). If not, bodge 774 to > > reduce $threads to a low value, say, 2 > > - trying turning on XFS_DEBUG config > > > > BTW, Darrick has posted some xfs atomics fixes @ https://lore.kernel.org/linux-xfs/20251105001200.GV196370@frogsfrogsfrogs/T/#t > > . I doubt that they will help this, but worth trying. > > > > I will try to recreate. 
> I tested this and the filesize which we try to write is huge, like 3.3G in
> my case. That seems excessive.
>
> The calc comes from the following in 774:
>
> filesize=$((aw_bsize * threads * 100))
>
> aw_bsize for me is 1M, and threads is 32.
>
> aw_bsize is large as XFS supports software-based atomics, which is
> generally going to be huge compared to anything which HW can support.
>
> When I tried to run this test, it was not completing in a sane amount of
> time - it was taking many minutes before I gave up.

Hi John, Shinichiro, Darrick.

Thanks for looking into this. Sorry, I'm on vacation so I'm a bit slow in
responding.

Anyway, the logic behind the filesize calculation is that we want each
thread to do 100 atomic writes in its own isolated range in the file.
But it seems to be especially slow when we have many CPUs.

In that sense, I think it'll be better to limit the thread count itself
rather than the filesize. Since it's a stress test, we don't want it to
be too small. Maybe:

diff --git a/tests/generic/774 b/tests/generic/774
index 7a4d7016..c68fb4b7 100755
--- a/tests/generic/774
+++ b/tests/generic/774
@@ -28,7 +28,7 @@ awu_max_write=$(_get_atomic_write_unit_max "$SCRATCH_MNT/f1")
 aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
 fsbsize=$(_get_block_size $SCRATCH_MNT)
 
-threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
+threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "16")
 filesize=$((aw_bsize * threads * 100))
 depth=$threads
 aw_io_size=$((filesize / threads))

Can you check if this helps?
Regards,
ojaswin

> @shinichiro, please try this:
>
> --- a/tests/generic/774
> +++ b/tests/generic/774
> @@ -29,7 +29,7 @@ aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
>  fsbsize=$(_get_block_size $SCRATCH_MNT)
>
>  threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
> -filesize=$((aw_bsize * threads * 100))
> +filesize=$((aw_bsize * threads))
>  depth=$threads
>  aw_io_size=$((filesize / threads))
>  aw_io_inc=$aw_io_size
> ubuntu@jgarry-instance-20240626-1657-xfs-ubuntu:~/xfstests-dev$
>
> Note, I ran with this change and the test now completes, but I get this:
>
> +fio: failed initializing LFSR
> +fio: failed initializing LFSR
> +fio: failed initializing LFSR
> +fio: failed initializing LFSR
> +verify: bad magic header 0, wanted acca at file
> /home/ubuntu/mnt/scratch/test-file offset 0, length 1048576 (requested
> block: offset=0, length=1048576)
> +verify: bad magic header e3d6, wanted acca at file
> /home/ubuntu/mnt/scratch/test-file offset 8388608, length 1048576 (requested
> block: offset=8388608, length=1048576)
>
> I need to check that fio complaint.
>
> Thanks,
> John
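[Editorial aside] The sizing arithmetic discussed above can be sketched in plain shell. This is only an illustration, not the real test: `min` below stands in for the fstests `_min` helper, and the 1 MiB aw_bsize, 16-CPU count, and LOAD_FACTOR=1 are assumed example values consistent with John's report.

```shell
#!/bin/sh
# Sketch of the generic/774 sizing arithmetic. min() re-implements the
# fstests _min helper; aw_bsize/nproc/LOAD_FACTOR are assumed examples.
min() { if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi; }

aw_bsize=$((1024 * 1024))   # 1 MiB software-atomic block size (assumed)
nproc=16                    # assumed CPU count
LOAD_FACTOR=1

# Current test: threads capped at 100, so 16 CPUs -> 32 threads.
threads=$(min "$((nproc * 2 * LOAD_FACTOR))" 100)
filesize=$((aw_bsize * threads * 100))          # 100 writes per thread

# Proposed change: cap threads at 16 instead.
capped_threads=$(min "$((nproc * 2 * LOAD_FACTOR))" 16)
capped_filesize=$((aw_bsize * capped_threads * 100))

echo "threads=$threads filesize=$((filesize / (1024 * 1024))) MiB"
echo "capped threads=$capped_threads filesize=$((capped_filesize / (1024 * 1024))) MiB"
```

With these assumed inputs the uncapped calculation yields 32 threads and a 3200 MiB file, in the same ballpark as the ~3.3G file John reports; capping threads at 16 halves both numbers.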
* Re: [bug report] fstests generic/774 hang
  From: John Garry @ 2025-11-10 8:58 UTC (permalink / raw)
  To: Ojaswin Mujoo
  Cc: Shinichiro Kawasaki, Darrick J. Wong, linux-xfs@vger.kernel.org

On 09/11/2025 11:58, Ojaswin Mujoo wrote:
>> aw_bsize for me is 1M, and threads is 32
>>
>> aw_bsize is large as XFS supports software-based atomics, which is generally
>> going to be huge compared to anything which HW can support.
>>
>> When I tried to run this test, it was not completing in a sane amount of
>> time - it was taking many minutes before I gave up.
>
> Hi John, Shinichiro, Darrick.
>
> Thanks for looking into this. Sorry, I'm on vacation so a bit slow in
> responding.
>
> Anyways, the logic behind the filesize calculation is that we want each
> thread to do 100 atomic writes in their own isolated ranges in the file.
> But seems like it is being especially slow when we have high CPUs.

It's not just the number of CPUs which is the problem. The test does
awu-max-sized writes - for XFS, this size can be many MBs, unlike the
typically < 100 KB for any FS which relies only on HW-based atomic
writes, i.e. ext4.

Please also consider limiting the awu max size.

> I think in that sense, it'll be better to limit the threads itself
> rather than filesize. Since its a stress test we dont want it to be too
> less.
> Maybe:
>
> diff --git a/tests/generic/774 b/tests/generic/774
> index 7a4d7016..c68fb4b7 100755
> --- a/tests/generic/774
> +++ b/tests/generic/774
> @@ -28,7 +28,7 @@ awu_max_write=$(_get_atomic_write_unit_max "$SCRATCH_MNT/f1")
>  aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
>  fsbsize=$(_get_block_size $SCRATCH_MNT)
> 
> -threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
> +threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "16")
>  filesize=$((aw_bsize * threads * 100))
>  depth=$threads
>  aw_io_size=$((filesize / threads))
>
> Can you check if this helps?
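[Editorial aside] John's point that software atomics drive much larger write units can be illustrated with the aw_bsize formula quoted above, `aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")`. In the sketch below, `max` re-implements the fstests `_max` helper, and the unit sizes (a 4 MiB software awu_max versus a 64 KiB hardware limit) are assumed values chosen purely for illustration, not measured from any device.

```shell
#!/bin/sh
# Illustration of why aw_bsize ends up much larger with software atomics.
# max() re-implements the fstests _max helper; all awu_* values below are
# assumed examples, not measurements.
max() { if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi; }

# XFS software atomics (assumed): awu_max of 4 MiB -> 1 MiB blocks.
sw_min=4096
sw_max=$((4 * 1024 * 1024))
sw_bsize=$(max "$sw_min" "$((sw_max / 4))")

# HW-only atomics, e.g. ext4 on a device with a 64 KiB limit (assumed).
hw_min=4096
hw_max=$((64 * 1024))
hw_bsize=$(max "$hw_min" "$((hw_max / 4))")

echo "software aw_bsize: $((sw_bsize / 1024)) KiB"
echo "hardware aw_bsize: $((hw_bsize / 1024)) KiB"
```

Under these assumptions the software path writes 1 MiB blocks against 16 KiB for the hardware path, a 64x difference in per-write cost before thread count even enters the picture.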
* Re: [bug report] fstests generic/774 hang
  From: Shinichiro Kawasaki @ 2025-11-10 12:39 UTC (permalink / raw)
  To: Ojaswin Mujoo
  Cc: John Garry, Darrick J. Wong, linux-xfs@vger.kernel.org

On Nov 09, 2025 / 17:28, Ojaswin Mujoo wrote:
[...]
> Anyways, the logic behind the filesize calculation is that we want each
> thread to do 100 atomic writes in their own isolated ranges in the file.
> But seems like it is being especially slow when we have high CPUs.
>
> I think in that sense, it'll be better to limit the threads itself
> rather than filesize. Since its a stress test we dont want it to be too
> less. Maybe:
>
> diff --git a/tests/generic/774 b/tests/generic/774
> index 7a4d7016..c68fb4b7 100755
> --- a/tests/generic/774
> +++ b/tests/generic/774
> @@ -28,7 +28,7 @@ awu_max_write=$(_get_atomic_write_unit_max "$SCRATCH_MNT/f1")
>  aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
>  fsbsize=$(_get_block_size $SCRATCH_MNT)
> 
> -threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
> +threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "16")
>  filesize=$((aw_bsize * threads * 100))
>  depth=$threads
>  aw_io_size=$((filesize / threads))
>
> Can you check if this helps?

As John pointed out, the 1MB atomic block size sounds too large and may
need attention as well. Anyway, I applied the change above and observed
that the test case runtime was shortened from ~50m to ~8m, so this change
shows some improvement for the unexpectedly long runtime.

When I repeated the test case g774 20 times, the hang was not observed.
end of thread, other threads: [~2025-11-11 11:43 UTC | newest]

Thread overview: 23+ messages

2025-10-30  8:45 [bug report] fstests generic/774 hang Shinichiro Kawasaki
2025-11-05  0:33 ` Darrick J. Wong
2025-11-05  2:19 ` Shinichiro Kawasaki
2025-11-05  8:52 ` John Garry
2025-11-05 10:39 ` John Garry
2025-11-05 11:29 ` John Garry
2025-11-05 12:37 ` Shinichiro Kawasaki
2025-11-06  8:19 ` Shinichiro Kawasaki
2025-11-06  8:53 ` John Garry
2025-11-07  2:27 ` Shinichiro Kawasaki
2025-11-07  4:28 ` Darrick J. Wong
2025-11-07  5:53 ` Shinichiro Kawasaki
2025-11-07 12:48 ` John Garry
2025-11-07 17:50 ` Darrick J. Wong
2025-11-07 23:18 ` Darrick J. Wong
2025-11-10  2:41 ` Shinichiro Kawasaki
2025-11-09 12:02 ` Ojaswin Mujoo
2025-11-10 12:46 ` Shinichiro Kawasaki
2025-11-10 21:12 ` Darrick J. Wong
2025-11-11 11:43 ` Shinichiro Kawasaki
2025-11-09 11:58 ` Ojaswin Mujoo
2025-11-10  8:58 ` John Garry
2025-11-10 12:39 ` Shinichiro Kawasaki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox