From: bugzilla-daemon@bugzilla.kernel.org
To: linux-xfs@vger.kernel.org
Subject: [Bug 214767] New: xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
Date: Wed, 20 Oct 2021 07:36:07 +0000
Message-ID: <bug-214767-201763@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=214767
Bug ID: 214767
Summary: xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
Product: File System
Version: 2.5
Kernel Version: 5.10.70
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: XFS
Assignee: filesystem_xfs@kernel-bugs.kernel.org
Reporter: ct@flyingcircus.io
Regression: No
We frequently see blocked tasks (around one per day across about 500 VMs),
with tracebacks always similar to this:
First we get:
[656898.010322] INFO: task kworker/u2:1:458736 blocked for more than 122 seconds.
[656898.011988] Not tainted 5.10.70 #1-NixOS
[656898.012981] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[656898.014679] task:kworker/u2:1 state:D stack: 0 pid:458736 ppid: 2 flags:0x00004000
[656898.016530] Workqueue: writeback wb_workfn (flush-253:0)
[656898.017698] Call Trace:
[656898.018310] __schedule+0x271/0x860
[656898.019208] schedule+0x46/0xb0
[656898.019944] rwsem_down_read_slowpath+0x169/0x490
[656898.021008] ? mempool_alloc+0x62/0x170
[656898.022032] xfs_map_blocks+0xb9/0x400 [xfs]
[656898.022998] iomap_do_writepage+0x163/0x850
[656898.023587] ? __mod_memcg_lruvec_state+0x21/0xe0
[656898.024044] write_cache_pages+0x186/0x3d0
[656898.024463] ? iomap_migrate_page+0xc0/0xc0
[656898.024882] ? submit_bio_noacct+0x3a9/0x420
[656898.025303] iomap_writepages+0x1c/0x40
[656898.025712] xfs_vm_writepages+0x64/0x90 [xfs]
[656898.026146] do_writepages+0x34/0xc0
[656898.026513] __writeback_single_inode+0x39/0x2a0
[656898.026969] writeback_sb_inodes+0x200/0x470
[656898.027388] __writeback_inodes_wb+0x4c/0xe0
[656898.027805] wb_writeback+0x1d8/0x290
[656898.028161] wb_workfn+0x29b/0x4d0
[656898.028526] ? __switch_to_asm+0x42/0x70
[656898.028915] ? __switch_to+0x7b/0x3e0
[656898.029285] process_one_work+0x1df/0x370
[656898.029696] worker_thread+0x50/0x400
[656898.030054] ? process_one_work+0x370/0x370
[656898.030475] kthread+0x11b/0x140
[656898.030798] ? __kthread_bind_mask+0x60/0x60
[656898.031214] ret_from_fork+0x22/0x30
[656898.031592] INFO: task nix-daemon:459204 blocked for more than 122 seconds.
[656898.032259] Not tainted 5.10.70 #1-NixOS
[656898.032695] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[656898.033451] task:nix-daemon state:D stack: 0 pid:459204 ppid: 1067 flags:0x00000000
[656898.034261] Call Trace:
[656898.034538] __schedule+0x271/0x860
[656898.034881] schedule+0x46/0xb0
[656898.035226] xfs_log_commit_cil+0x6a4/0x800 [xfs]
[656898.035709] ? wake_up_q+0xa0/0xa0
[656898.036073] __xfs_trans_commit+0x9d/0x310 [xfs]
[656898.036561] xfs_setattr_nonsize+0x342/0x520 [xfs]
[656898.037027] notify_change+0x348/0x4c0
[656898.037407] ? chmod_common+0xa1/0x150
[656898.037779] chmod_common+0xa1/0x150
[656898.038139] do_fchmodat+0x5a/0xb0
[656898.038491] __x64_sys_chmod+0x17/0x20
[656898.038860] do_syscall_64+0x33/0x40
[656898.039221] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[656898.039722] RIP: 0033:0x7f3d9b3a3b77
[656898.040072] RSP: 002b:00007ffc388e2b18 EFLAGS: 00000206 ORIG_RAX: 000000000000005a
[656898.040797] RAX: ffffffffffffffda RBX: 00007ffc388e2e10 RCX: 00007f3d9b3a3b77
[656898.041542] RDX: 0000000000008049 RSI: 0000000000008124 RDI: 00000000016a2400
[656898.042222] RBP: 00007ffc388e32a0 R08: 00007ffc388e4628 R09: 0000000000000000
[656898.042913] R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffc388e4620
[656898.043604] R13: 0000000000000000 R14: 00007ffc388e2df0 R15: 00007ffc388e2e10
and then a while later we see this:
[657020.886633] INFO: task qemu-ga:750 blocked for more than 122 seconds.
[657020.888346] Not tainted 5.10.70 #1-NixOS
[657020.889482] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[657020.891072] task:qemu-ga state:D stack: 0 pid: 750 ppid: 1 flags:0x00000080
[657020.891906] Call Trace:
[657020.892188] __schedule+0x271/0x860
[657020.892547] ? kvm_sched_clock_read+0xd/0x20
[657020.892983] schedule+0x46/0xb0
[657020.893319] rwsem_down_write_slowpath+0x218/0x480
[657020.893802] thaw_super+0x12/0x20
[657020.894130] __x64_sys_ioctl+0x62/0xb0
[657020.894491] do_syscall_64+0x33/0x40
[657020.894860] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[657020.895396] RIP: 0033:0x7f69a55f2b17
[657020.895762] RSP: 002b:00007ffe5687b548 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[657020.896470] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f69a55f2b17
[657020.897144] RDX: 0000000000000000 RSI: 00000000c0045878 RDI: 0000000000000006
[657020.897815] RBP: 0000000000000000 R08: 00007f69a5477c08 R09: 0000000000000000
[657020.898479] R10: 0000000000000000 R11: 0000000000000246 R12: 000055ef74a15190
[657020.899155] R13: 00007ffe5687b5c0 R14: 00007ffe5687b560 R15: 0000000000000006
[657020.899850] INFO: task kworker/u2:1:458736 blocked for more than 245 seconds.
[657020.900513] Not tainted 5.10.70 #1-NixOS
[657020.900942] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[657020.901672] task:kworker/u2:1 state:D stack: 0 pid:458736 ppid: 2 flags:0x00004000
[657020.902469] Workqueue: writeback wb_workfn (flush-253:0)
[657020.902976] Call Trace:
[657020.903220] __schedule+0x271/0x860
[657020.903556] schedule+0x46/0xb0
[657020.903870] rwsem_down_read_slowpath+0x169/0x490
[657020.904321] ? mempool_alloc+0x62/0x170
[657020.904996] xfs_map_blocks+0xb9/0x400 [xfs]
[657020.905412] iomap_do_writepage+0x163/0x850
[657020.905823] ? __mod_memcg_lruvec_state+0x21/0xe0
[657020.906269] write_cache_pages+0x186/0x3d0
[657020.906664] ? iomap_migrate_page+0xc0/0xc0
[657020.907079] ? submit_bio_noacct+0x3a9/0x420
[657020.907487] iomap_writepages+0x1c/0x40
[657020.907942] xfs_vm_writepages+0x64/0x90 [xfs]
[657020.908376] do_writepages+0x34/0xc0
[657020.908760] __writeback_single_inode+0x39/0x2a0
[657020.909222] writeback_sb_inodes+0x200/0x470
[657020.909635] __writeback_inodes_wb+0x4c/0xe0
[657020.910058] wb_writeback+0x1d8/0x290
[657020.910410] wb_workfn+0x29b/0x4d0
[657020.910756] ? __switch_to_asm+0x42/0x70
[657020.911135] ? __switch_to+0x7b/0x3e0
[657020.911489] process_one_work+0x1df/0x370
[657020.911912] worker_thread+0x50/0x400
[657020.912264] ? process_one_work+0x370/0x370
[657020.912667] kthread+0x11b/0x140
[657020.913008] ? __kthread_bind_mask+0x60/0x60
[657020.913413] ret_from_fork+0x22/0x30
[657020.913791] INFO: task nix-daemon:459204 blocked for more than 245 seconds.
[657020.914440] Not tainted 5.10.70 #1-NixOS
[657020.914879] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[657020.915612] task:nix-daemon state:D stack: 0 pid:459204 ppid: 1067 flags:0x00000000
[657020.916399] Call Trace:
[657020.916654] __schedule+0x271/0x860
[657020.917006] schedule+0x46/0xb0
[657020.917343] xfs_log_commit_cil+0x6a4/0x800 [xfs]
[657020.917806] ? wake_up_q+0xa0/0xa0
[657020.918162] __xfs_trans_commit+0x9d/0x310 [xfs]
[657020.918633] xfs_setattr_nonsize+0x342/0x520 [xfs]
[657020.919101] notify_change+0x348/0x4c0
[657020.919461] ? chmod_common+0xa1/0x150
[657020.919831] chmod_common+0xa1/0x150
[657020.920180] do_fchmodat+0x5a/0xb0
[657020.920509] __x64_sys_chmod+0x17/0x20
[657020.920878] do_syscall_64+0x33/0x40
[657020.921222] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[657020.921703] RIP: 0033:0x7f3d9b3a3b77
[657020.922055] RSP: 002b:00007ffc388e2b18 EFLAGS: 00000206 ORIG_RAX: 000000000000005a
[657020.922772] RAX: ffffffffffffffda RBX: 00007ffc388e2e10 RCX: 00007f3d9b3a3b77
[657020.923435] RDX: 0000000000008049 RSI: 0000000000008124 RDI: 00000000016a2400
[657020.924115] RBP: 00007ffc388e32a0 R08: 00007ffc388e4628 R09: 0000000000000000
[657020.924787] R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffc388e4620
[657020.925450] R13: 0000000000000000 R14: 00007ffc388e2df0 R15: 00007ffc388e2e10
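To see at a glance which tasks the hung-task detector reported and how their
blocked time grows, the "INFO: task ... blocked for more than N seconds" lines
in the log above can be tallied. The following is only a minimal sketch, not a
tool used for this report; the function name summarize_hung_tasks and the
regular expression are illustrative assumptions based solely on the message
format shown here.

```python
import re
from collections import defaultdict

# Matches the hung-task report line. "\s+" before "seconds" tolerates the
# line wrap that mail clients insert between the number and the word.
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+)\s+seconds"
)


def summarize_hung_tasks(log_text):
    """Return {(task name, pid): [(timestamp, blocked seconds), ...]}."""
    reports = defaultdict(list)
    for match in HUNG_RE.finditer(log_text):
        # Task names such as "kworker/u2:1" contain a colon themselves, so
        # the pid is taken from the last ":<digits>" before " blocked".
        key = (match.group("name"), int(match.group("pid")))
        reports[key].append((float(match.group("ts")), int(match.group("secs"))))
    return dict(reports)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python hung_tasks.py < dmesg.txt
    for (name, pid), hits in summarize_hung_tasks(sys.stdin.read()).items():
        print(name, pid, hits)
```

Applied to the log above, this would group the two kworker/u2:1 reports (122 s,
then 245 s) under one key, which makes it easy to confirm that the same tasks
stay blocked rather than new ones appearing.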
IO is subsequently stalled on ALL disks (we have vda/vdb/vdc, two of them
running XFS and one used as swap).
I am pretty sure that this is not primarily an issue in the hypervisor (QEMU
4.1) or the storage (Ceph Jewel), because I can warm-reset the virtual machine
using "system-reset" and the machine then boots properly.
This could be related to https://bugzilla.kernel.org/show_bug.cgi?id=207053,
but the symptoms do not quite match.
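For reference, the registers in the qemu-ga trace can be decoded by hand:
ORIG_RAX 0x10 is syscall 16 (ioctl) on x86-64, and RSI holds the ioctl request
0xc0045878. The sketch below assumes the generic Linux ioctl encoding (8 bits
nr, 8 bits type, 14 bits size, 2 bits direction) and shows that this value is
FITHAW, i.e. _IOWR('X', 120, int), next to FIFREEZE at nr 119. The helper names
decode_ioctl and encode_iowr are made up for illustration.

```python
def decode_ioctl(request):
    """Split an ioctl request number into its generic Linux fields."""
    return {
        "nr": request & 0xFF,                # command number within the type
        "type": chr((request >> 8) & 0xFF),  # "magic" character
        "size": (request >> 16) & 0x3FFF,    # argument size in bytes
        "dir": (request >> 30) & 0x3,        # 1 = write, 2 = read, 3 = both
    }


def encode_iowr(type_char, nr, size):
    """Build a request number the way the _IOWR() macro does."""
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


# Definitions from linux/fs.h, with sizeof(int) == 4.
FIFREEZE = encode_iowr("X", 119, 4)  # _IOWR('X', 119, int)
FITHAW = encode_iowr("X", 120, 4)    # _IOWR('X', 120, int)

if __name__ == "__main__":
    rsi = 0xC0045878  # RSI from the qemu-ga register dump
    print(decode_ioctl(rsi))
    print("FITHAW" if rsi == FITHAW else "FIFREEZE" if rsi == FIFREEZE else "other")
```

This matches the thaw_super frame in the qemu-ga call trace: the guest agent is
blocked while issuing a thaw, not a freeze.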