* [Bug 200981] hypervisor fs hangs at heavy write activity on VM (kvm, qcow2 image) having a reflink disk copy
2018-08-30 14:32 [Bug 200981] New: hypervisor fs hangs at heavy write activity on VM (kvm, qcow2 image) having a reflink disk copy bugzilla-daemon
@ 2018-08-30 14:34 ` bugzilla-daemon
2018-08-31 1:26 ` bugzilla-daemon
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2018-08-30 14:34 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=200981
--- Comment #1 from Alexander Y. Fomichev (git.user@gmail.com) ---
Created attachment 278205
--> https://bugzilla.kernel.org/attachment.cgi?id=278205&action=edit
config
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug 200981] hypervisor fs hangs at heavy write activity on VM (kvm, qcow2 image) having a reflink disk copy
From: bugzilla-daemon @ 2018-08-31 1:26 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=200981
--- Comment #2 from Dave Chinner (david@fromorbit.com) ---
On Thu, Aug 30, 2018 at 02:32:35PM +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=200981
>
> kernel: vanilla 4.18.5
> gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
> Copyright (C) 2017 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> More or less reproducible for me using the following sequence:
>
> - on host:
> create LV of appropriate size (20g in my case)
> mkfs.xfs -m reflink=1 /dev/data/LV
> mount /dev/data/LV /mnt/
> run kvm VM with qcow2 image (/mnt/disk)
>
> - inside vm:
> sysbench --test=fileio --file-total-size=9G prepare
>
> - on host:
> cp --reflink=always disk disk.b
>
> - inside vm:
> sysbench --test=fileio --file-total-size=9G --file-test-mode=seqwr
> --max-time=6000 --max-requests=0 --threads=16 run
>
> Some time later, I/O on /dev/data/LV falls to zero and the fs becomes
> completely unavailable, and then I see a bunch of records:
The first error is this:
[ 2212.046108] ================================================
[ 2212.051809] WARNING: lock held when returning to user space!
[ 2212.057511] 4.18.5 #1 Not tainted
[ 2212.060864] ------------------------------------------------
[ 2212.066564] worker/6123 is leaving the kernel with locks still held!
[ 2212.072961] 1 lock held by worker/6123:
[ 2212.076835] #0: 000000009eab4f1b (sb_internal#2){.+.+}, at: xfs_trans_alloc+0x17c/0x220
This happens 5 minutes before the hung processes start being
reported. It looks like something has gone wrong and an error path has
leaked a transaction.
Can you see if commit dcbd44f79986 ("xfs: fix transaction leak on
remote attr set/remove failure") addresses the problem you are
seeing?
-Dave.
* [Bug 200981] hypervisor fs hangs at heavy write activity on VM (kvm, qcow2 image) having a reflink disk copy
From: bugzilla-daemon @ 2018-08-31 7:39 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=200981
--- Comment #3 from Alexander Y. Fomichev (git.user@gmail.com) ---
> Can you see if commit dcbd44f79986 ("xfs: fix transaction leak on
> remote attr set/remove failure") addresses the problem you are
> seeing?
I've just tried it but see no difference.
The warning is still there and the VM becomes blocked some time later.
(It is worth noting that I have CONFIG_LOCKDEP turned on.)
* [Bug 200981] hypervisor fs hangs at heavy write activity on VM (kvm, qcow2 image) having a reflink disk copy
From: bugzilla-daemon @ 2018-09-01 5:01 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=200981
Dave Chinner (david@fromorbit.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david@fromorbit.com
--- Comment #4 from Dave Chinner (david@fromorbit.com) ---
Ok, so the obvious fix doesn't help.
Any idea why lockdep is not giving you the full stack trace of where the lock
that was leaked to userspace was first accessed? Without that, we have no idea
which of the 70-odd calls to xfs_trans_alloc() in the XFS code the lockdep
warning refers to. At minimum, we need to know what syscall is triggering the
warning so we know what code to look at.
Also, can you reproduce this on older kernels? If not, could you bisect it?
-Dave.
* [Bug 200981] hypervisor fs hangs at heavy write activity on VM (kvm, qcow2 image) having a reflink disk copy
From: bugzilla-daemon @ 2018-09-06 5:36 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=200981
--- Comment #5 from Alexander Y. Fomichev (git.user@gmail.com) ---
Created attachment 278345
--> https://bugzilla.kernel.org/attachment.cgi?id=278345&action=edit
dmesg showing all locks held in the system
* [Bug 200981] hypervisor fs hangs at heavy write activity on VM (kvm, qcow2 image) having a reflink disk copy
From: bugzilla-daemon @ 2018-09-06 5:37 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=200981
--- Comment #6 from Alexander Y. Fomichev (git.user@gmail.com) ---
Created attachment 278347
--> https://bugzilla.kernel.org/attachment.cgi?id=278347&action=edit
missing sb_end_intwrite: proof of concept
* [Bug 200981] hypervisor fs hangs at heavy write activity on VM (kvm, qcow2 image) having a reflink disk copy
From: bugzilla-daemon @ 2018-09-06 5:39 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=200981
--- Comment #7 from Alexander Y. Fomichev (git.user@gmail.com) ---
Ok, it looks like the warning is the result of a missing sb_end_intwrite in
xfs_reflink_allocate_cow, namely in the sequence xfs_trans_alloc -> goto retry ->
goto convert. With the proof-of-concept patch (attached) the warning is gone but
the deadlock is not. However, I now have a list of held locks (dmesg attached).
Did you mean that a tainted kernel will not show the locks held?
* [Bug 200981] hypervisor fs hangs at heavy write activity on VM (kvm, qcow2 image) having a reflink disk copy
From: bugzilla-daemon @ 2018-09-06 14:25 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=200981
--- Comment #8 from Dave Chinner (david@fromorbit.com) ---
(In reply to Alexander Y. Fomichev from comment #7)
> Ok, it looks like a warning is a result of missing sb_end_intwrite in
> xfs_reflink_allocate_cow, namely in a sequence xfs_trans_alloc -> goto retry
> -> goto convert.
Yup, I see the issue. Nice work finding this!
> With proof-of-concept patch (attached) warning is gone but
> deadlock is not.
The code is not missing an sb_end_intwrite() call - it's missing an
xfs_trans_cancel() call. That's why it still deadlocks even though you shut up
the warning. I'll have a closer look in the morning to see if there are any other
paths that can leak the transaction and write a patch to fix it.
Thanks again for tracking it down, Alexander :)
Cheers,
Dave.
* [Bug 200981] hypervisor fs hangs at heavy write activity on VM (kvm, qcow2 image) having a reflink disk copy
From: bugzilla-daemon @ 2018-09-06 16:52 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=200981
--- Comment #9 from Alexander Y. Fomichev (git.user@gmail.com) ---
I confirm. With xfs_trans_cancel() the hang is gone and it works like a charm. Thanks.