* soft lockup in xfs/170 on a file system formatted with -m crc=0
@ 2021-11-05 17:25 Theodore Ts'o
2021-11-06 1:58 ` xfs/076 takes a long long time testing with a realtime volume Theodore Ts'o
2021-11-06 2:10 ` soft lockup in xfs/170 on a file system formatted with -m crc=0 Darrick J. Wong
0 siblings, 2 replies; 6+ messages in thread
From: Theodore Ts'o @ 2021-11-05 17:25 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs, leah.rumancik
[-- Attachment #1: Type: text/plain, Size: 5582 bytes --]
Is this a known failure? I can reliably reproduce this soft lockup
running xfs/170 using "gce-xfstests -c xfs/v4 xfs/170" using
v5.15-rc4. The xfs/v4 test config formats the file system using -m
crc=0 with no special mount options.
I've attached the kernel config that I used; it's the standard one
obtained via "gce-xfstests install-kconfig"[1].
[1] https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/util/install-kconfig
Thanks,
- Ted
commit adac31869b098d5f85d0930874dcf6a524d128d3
Author: Theodore Ts'o <tytso@mit.edu>
Date: Fri Nov 5 13:10:35 2021 -0400
test-appliance: add xfs/170 to xfs/v4's exclude file
The xfs/170 test is reliably causing a soft lockup when run on a file
system formatted with mkfs.xfs -m crc=0.
CONFIG: xfs/v4
ZONE: us-east1-b
VM STATUS: timeout on one test (xfs/170)
SINCE LAST UPDATE: 3h6m10s
TEST STATUS: hang
run fstests xfs/170 at 2021-11-05 02:27:05
...
[11024.269799] XFS (dm-1): Unmounting Filesystem
[11024.695394] XFS (dm-1): Mounting V4 Filesystem
[11024.731406] XFS (dm-1): Ending clean mount
[11024.731552] xfs filesystem being mounted at /xt-vdc supports timestamps until 2038 (0x7fffffff)
watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [dd:9671]
irq event stamp: 22146924
hardirqs last enabled at (22146923): [<ffffffff903cebad>] _raw_spin_unlock_irqrestore+0x2d/0x40
hardirqs last disabled at (22146924): [<ffffffff903bd4fb>] sysvec_apic_timer_interrupt+0xb/0x90
softirqs last enabled at (16886422): [<ffffffff906002ce>] __do_softirq+0x2ce/0x3fd
softirqs last disabled at (16886399): [<ffffffff8f6bc1e8>] __irq_exit_rcu+0x88/0xb0
CPU: 1 PID: 9671 Comm: dd Not tainted 5.15.0-rc4-xfstests-00018-g124e7c61deb2 #382
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
Code: c7 18 53 48 89 f3 48 8b 74 24 10 e8 e5 3b 35 ff 48 89 ef e8 dd 66 35 ff 80 e7 02 74 06 e8 f3 77 3e ff fb 65 ff 0d 4b 82 c4 6f <5b> 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 65 ff 05 32
RSP: 0018:ffffa055c2913598 EFLAGS: 00000246
RAX: 000000000151ef6b RBX: 0000000000000282 RCX: 0000000000000040
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff903cebad
RBP: ffff945d3aff4be0 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000080000 R12: 0000000000000000
R13: ffffa055c2913640 R14: ffff945d13892a10 R15: ffff945d13892a58
FS: 00007f62407d8580(0000) GS:ffff945dd9400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6240613000 CR3: 000000014bc04004 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
down_trylock+0x25/0x30
xfs_buf_trylock+0x17/0x190
xfs_buf_find+0x1b3/0x4a0
xfs_buf_get_map+0x44/0x3a0
xfs_buf_read_map+0x52/0x2e0
? xfs_read_agf+0xa3/0x180
xfs_trans_read_buf_map+0x144/0x430
? xfs_read_agf+0xa3/0x180
? __lock_acquire+0x3a7/0x6c0
xfs_read_agf+0xa3/0x180
xfs_alloc_read_agf+0x4c/0x110
xfs_alloc_pagf_init+0x27/0x60
xfs_filestream_pick_ag+0x280/0x530
xfs_filestream_new_ag+0x87/0x100
xfs_bmap_btalloc_filestreams.constprop.0+0xe0/0x120
xfs_bmap_btalloc+0x3e6/0x700
xfs_bmapi_allocate+0xe4/0x310
xfs_bmapi_write+0x42a/0x580
xfs_iomap_write_direct+0x18d/0x210
xfs_direct_write_iomap_begin+0x3ab/0x690
? lock_is_held_type+0x98/0x100
iomap_iter+0x12b/0x240
__iomap_dio_rw+0x1ff/0x630
iomap_dio_rw+0xa/0x30
xfs_file_dio_write_aligned+0xae/0x1c0
xfs_file_write_iter+0xd8/0x130
new_sync_write+0x122/0x1b0
? mod_objcg_state+0x180/0x2a0
vfs_write+0x25d/0x370
ksys_write+0x68/0xe0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f6240700504
Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
RSP: 002b:00007ffdac48f148 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6240700504
RDX: 0000000000100000 RSI: 00007f6240514000 RDI: 0000000000000001
RBP: 0000000000100000 R08: 00000000ffffffff R09: 0000000000000000
R10: ffffffffffffff3b R11: 0000000000000246 R12: 00007f6240514000
R13: 0000000000000000 R14: 0000000000000000 R15: 00007f6240514000
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
diff --git a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude
index a9acba9c..83ccfd79 100644
--- a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude
+++ b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude
@@ -1,2 +1,16 @@
# Normal configurations don't support dax
-g dax
+
+# On a 5.15-rc4 kernel, xfs/170 reliably causes a soft lockup in
+# xfs_alloc_read_agf()+0x41/0x110. Call stack:
+#
+# xfs_alloc_pagf_init+0x27/0x60
+# xfs_filestream_pick_ag+0x280/0x530
+# xfs_filestream_new_ag+0x87/0x100
+# xfs_bmap_btalloc_filestreams.constprop.0+0xe0/0x120
+# xfs_bmap_btalloc+0x3e6/0x700
+# xfs_bmapi_allocate+0xe4/0x310
+# xfs_bmapi_convert_delalloc+0x26c/0x480
+# xfs_map_blocks+0x1b5/0x510
+# ...
+xfs/170
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 18534 bytes --]
^ permalink raw reply related [flat|nested] 6+ messages in thread
* xfs/076 takes a long long time testing with a realtime volume
2021-11-05 17:25 soft lockup in xfs/170 on a file system formatted with -m crc=0 Theodore Ts'o
@ 2021-11-06 1:58 ` Theodore Ts'o
2021-11-06 2:08 ` Darrick J. Wong
2021-11-06 2:10 ` soft lockup in xfs/170 on a file system formatted with -m crc=0 Darrick J. Wong
1 sibling, 1 reply; 6+ messages in thread
From: Theodore Ts'o @ 2021-11-06 1:58 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: fstests, linux-xfs, leah.rumancik
After committing some exclusions into my test runner framework (see
below), I tested a potential fix to xfs/076 which disables the
real-time volume when creating the scratch volume. Should I send it
as a formal patch to fstests?
diff --git a/tests/xfs/076 b/tests/xfs/076
index eac7410e..5628c08f 100755
--- a/tests/xfs/076
+++ b/tests/xfs/076
@@ -60,6 +60,7 @@ _require_xfs_io_command "falloc"
_require_xfs_io_command "fpunch"
_require_xfs_sparse_inodes
+unset SCRATCH_RTDEV
_scratch_mkfs "-d size=50m -m crc=1 -i sparse" |
_filter_mkfs > /dev/null 2> $tmp.mkfs
. $tmp.mkfs # for isize
- Ted
For why this is needed, see the commit description below:
commit c41ae1cc0b21eafd2858541c0bc195f951c0726c
Author: Theodore Ts'o <tytso@mit.edu>
Date: Fri Nov 5 20:46:19 2021 -0400
test-appliance: exclude xfs/076 from the realtime configs
The xfs/076 test takes two minutes on a normal xfs file system (e.g.,
a normal 4k block size file system). However, when there is a
real-time volume attached, this test takes over 80 minutes. The
reason for this seems to be because the test is spending a lot more
time failing to create files due to missing directories. Compare:
root@xfstests-2:~# ls -sh /results/xfs/results-4k/xfs/076.full
48K /results/xfs/results-4k/xfs/076.full
root@xfstests-2:~# ls -sh /tmp/realtime-076.full
25M /tmp/realtime-076.full
and:
root@xfstests-2:~# grep "cannot touch" /results/xfs/results-4k/xfs/076.full | wc -l
656
root@xfstests-2:~# grep "cannot touch" /tmp/realtime-076.full | wc -l
327664
The failures from 076.full look like this:
touch: cannot touch '/xt-vdc/offset.21473722368/25659': No space left on device
touch: cannot touch '/xt-vdc/offset.21473656832/0': No such file or directory
touch: cannot touch '/xt-vdc/offset.21473591296/0': No such file or directory
...
touch: cannot touch '/xt-vdc/offset.196608/0': No such file or directory
touch: cannot touch '/xt-vdc/offset.131072/0': No such file or directory
touch: cannot touch '/xt-vdc/offset.65536/0': No such file or directory
touch: cannot touch '/xt-vdc/offset.0/0': No such file or directory
What seems to be going on is that xfs/076 tries to create a small
scratch file system --- but when we attach a real-time volume this
balloons the available size of the file system. Of course, that space
can't be used for normal files. As a result, xfs/076 is incorrectly
estimating how many files it needs to create to fill the file system.
I'm not sure what's the best way to fix this in the test; perhaps the
test should forcibly unset SCRATCH_RTDEV environment variable before
running _scratch_mkfs? Anyway, for now, we'll just skip running
xfs/076 for the xfs/realtime* configs.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
diff --git a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime.exclude b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime.exclude
index a9acba9c..bafce552 100644
--- a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime.exclude
+++ b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime.exclude
@@ -1,2 +1,7 @@
# Normal configurations don't support dax
-g dax
+
+# The xfs/076 test takes well over an hour (80 minutes using 100GB GCE
+# PD/SSD) when run with an external realtime device, which triggers
+# the ltm "test is stalled" failsafe which aborts the VM.
+xfs/076
diff --git a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_28k_logdev.exclude b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_28k_logdev.exclude
index a9acba9c..bafce552 100644
--- a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_28k_logdev.exclude
+++ b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_28k_logdev.exclude
@@ -1,2 +1,7 @@
# Normal configurations don't support dax
-g dax
+
+# The xfs/076 test takes well over an hour (80 minutes using 100GB GCE
+# PD/SSD) when run with an external realtime device, which triggers
+# the ltm "test is stalled" failsafe which aborts the VM.
+xfs/076
diff --git a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_logdev.exclude b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_logdev.exclude
index a9acba9c..bafce552 100644
--- a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_logdev.exclude
+++ b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_logdev.exclude
@@ -1,2 +1,7 @@
# Normal configurations don't support dax
-g dax
+
+# The xfs/076 test takes well over an hour (80 minutes using 100GB GCE
+# PD/SSD) when run with an external realtime device, which triggers
+# the ltm "test is stalled" failsafe which aborts the VM.
+xfs/076
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: xfs/076 takes a long long time testing with a realtime volume
2021-11-06 1:58 ` xfs/076 takes a long long time testing with a realtime volume Theodore Ts'o
@ 2021-11-06 2:08 ` Darrick J. Wong
2021-11-06 16:43 ` Theodore Ts'o
0 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2021-11-06 2:08 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: fstests, linux-xfs, leah.rumancik
On Fri, Nov 05, 2021 at 09:58:14PM -0400, Theodore Ts'o wrote:
> After committing some exclusions into my test runner framework (see
> below), I tested a potential fix to xfs/076 which disables the
> real-time volume when creating the scratch volume. Should I send it
> as a formal patch to fstests?
Does adding:
_xfs_force_bdev data $SCRATCH_MNT
right after _scratch_mount make the performance problem go away? Sparse
inodes and realtime are a supported configuration.
--D
> diff --git a/tests/xfs/076 b/tests/xfs/076
> index eac7410e..5628c08f 100755
> --- a/tests/xfs/076
> +++ b/tests/xfs/076
> @@ -60,6 +60,7 @@ _require_xfs_io_command "falloc"
> _require_xfs_io_command "fpunch"
> _require_xfs_sparse_inodes
>
> +unset SCRATCH_RTDEV
> _scratch_mkfs "-d size=50m -m crc=1 -i sparse" |
> _filter_mkfs > /dev/null 2> $tmp.mkfs
> . $tmp.mkfs # for isize
>
> - Ted
>
> For why this is needed, see the commit description below:
>
> commit c41ae1cc0b21eafd2858541c0bc195f951c0726c
> Author: Theodore Ts'o <tytso@mit.edu>
> Date: Fri Nov 5 20:46:19 2021 -0400
>
> test-appliance: exclude xfs/076 from the realtime configs
>
> The xfs/076 test takes two minutes on a normal xfs file system (e.g.,
> a normal 4k block size file system). However, when there is a
> real-time volume attached, this test takes over 80 minutes. The
> reason for this seems to be because the test is spending a lot more
> time failing to create files due to missing directories. Compare:
>
> root@xfstests-2:~# ls -sh /results/xfs/results-4k/xfs/076.full
> 48K /results/xfs/results-4k/xfs/076.full
> root@xfstests-2:~# ls -sh /tmp/realtime-076.full
> 25M /tmp/realtime-076.full
>
> and:
>
> root@xfstests-2:~# grep "cannot touch" /results/xfs/results-4k/xfs/076.full | wc -l
> 656
> root@xfstests-2:~# grep "cannot touch" /tmp/realtime-076.full | wc -l
> 327664
>
> The failures from 076.full look like this:
>
> touch: cannot touch '/xt-vdc/offset.21473722368/25659': No space left on device
> touch: cannot touch '/xt-vdc/offset.21473656832/0': No such file or directory
> touch: cannot touch '/xt-vdc/offset.21473591296/0': No such file or directory
> ...
> touch: cannot touch '/xt-vdc/offset.196608/0': No such file or directory
> touch: cannot touch '/xt-vdc/offset.131072/0': No such file or directory
> touch: cannot touch '/xt-vdc/offset.65536/0': No such file or directory
> touch: cannot touch '/xt-vdc/offset.0/0': No such file or directory
>
> What seems to be going on is that xfs/076 tries to create a small
> scratch file system --- but when we attach a real-time volume this
> balloons the available size of the file system. Of course, that space
> can't be used for normal files. As a result, xfs/076 is incorrectly
> estimating how many files it needs to create to fill the file system.
>
> I'm not sure what's the best way to fix this in the test; perhaps the
> test should forcibly unset SCRATCH_RTDEV environment variable before
> running _scratch_mkfs? Anyway, for now, we'll just skip running
> xfs/076 for the xfs/realtime* configs.
>
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
>
> diff --git a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime.exclude b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime.exclude
> index a9acba9c..bafce552 100644
> --- a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime.exclude
> +++ b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime.exclude
> @@ -1,2 +1,7 @@
> # Normal configurations don't support dax
> -g dax
> +
> +# The xfs/076 test takes well over an hour (80 minutes using 100GB GCE
> +# PD/SSD) when run with an external realtime device, which triggers
> +# the ltm "test is stalled" failsafe which aborts the VM.
> +xfs/076
> diff --git a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_28k_logdev.exclude b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_28k_logdev.exclude
> index a9acba9c..bafce552 100644
> --- a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_28k_logdev.exclude
> +++ b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_28k_logdev.exclude
> @@ -1,2 +1,7 @@
> # Normal configurations don't support dax
> -g dax
> +
> +# The xfs/076 test takes well over an hour (80 minutes using 100GB GCE
> +# PD/SSD) when run with an external realtime device, which triggers
> +# the ltm "test is stalled" failsafe which aborts the VM.
> +xfs/076
> diff --git a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_logdev.exclude b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_logdev.exclude
> index a9acba9c..bafce552 100644
> --- a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_logdev.exclude
> +++ b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/realtime_logdev.exclude
> @@ -1,2 +1,7 @@
> # Normal configurations don't support dax
> -g dax
> +
> +# The xfs/076 test takes well over an hour (80 minutes using 100GB GCE
> +# PD/SSD) when run with an external realtime device, which triggers
> +# the ltm "test is stalled" failsafe which aborts the VM.
> +xfs/076
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: soft lockup in xfs/170 on a file system formatted with -m crc=0
2021-11-05 17:25 soft lockup in xfs/170 on a file system formatted with -m crc=0 Theodore Ts'o
2021-11-06 1:58 ` xfs/076 takes a long long time testing with a realtime volume Theodore Ts'o
@ 2021-11-06 2:10 ` Darrick J. Wong
2021-11-06 14:52 ` Theodore Ts'o
1 sibling, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2021-11-06 2:10 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-xfs, leah.rumancik
On Fri, Nov 05, 2021 at 01:25:05PM -0400, Theodore Ts'o wrote:
> Is this a known failure? I can reliably reproduce this soft lockup
> running xfs/170 using "gce-xfstests -c xfs/v4 xfs/170" using
> v5.15-rc4. The xfs/v4 test config formats the file system using -m
> crc=0 with no special mount options.
>
> I've attached the kernel config that I used; it's the standard one
> obtained via "gce-xfstests install-kconfig"[1].
Is the system totally idle, or is it still pounding the storage? The
softlockup looks like we're stuck trying to lock an AGF buffer, which
could just be the result of a long(ish) stall due to other threads or
the log or something else.
I guess "3h6m10s" implies it just died for 3+ hours.
Also, uh... 5.15 didn't prove to be a stable testing base (at least not
without a bunch of other patches to kvm, the memory manager, and the
block layer) until 5.15-rc7.
--D
>
>
> [1] https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/util/install-kconfig
>
> Thanks,
>
> - Ted
>
> commit adac31869b098d5f85d0930874dcf6a524d128d3
> Author: Theodore Ts'o <tytso@mit.edu>
> Date: Fri Nov 5 13:10:35 2021 -0400
>
> test-appliance: add xfs/170 to xfs/v4's exclude file
>
> The xfs/170 test is reliably causing a soft lockup when run on a file
> system formatted with mkfs.xfs -m crc=0.
>
> CONFIG: xfs/v4
> ZONE: us-east1-b
> VM STATUS: timeout on one test (xfs/170)
> SINCE LAST UPDATE: 3h6m10s
> TEST STATUS: hang
>
> run fstests xfs/170 at 2021-11-05 02:27:05
> ...
> [11024.269799] XFS (dm-1): Unmounting Filesystem
> [11024.695394] XFS (dm-1): Mounting V4 Filesystem
> [11024.731406] XFS (dm-1): Ending clean mount
> [11024.731552] xfs filesystem being mounted at /xt-vdc supports timestamps until 2038 (0x7fffffff)
> watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [dd:9671]
> irq event stamp: 22146924
> hardirqs last enabled at (22146923): [<ffffffff903cebad>] _raw_spin_unlock_irqrestore+0x2d/0x40
> hardirqs last disabled at (22146924): [<ffffffff903bd4fb>] sysvec_apic_timer_interrupt+0xb/0x90
> softirqs last enabled at (16886422): [<ffffffff906002ce>] __do_softirq+0x2ce/0x3fd
> softirqs last disabled at (16886399): [<ffffffff8f6bc1e8>] __irq_exit_rcu+0x88/0xb0
> CPU: 1 PID: 9671 Comm: dd Not tainted 5.15.0-rc4-xfstests-00018-g124e7c61deb2 #382
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
> Code: c7 18 53 48 89 f3 48 8b 74 24 10 e8 e5 3b 35 ff 48 89 ef e8 dd 66 35 ff 80 e7 02 74 06 e8 f3 77 3e ff fb 65 ff 0d 4b 82 c4 6f <5b> 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 65 ff 05 32
> RSP: 0018:ffffa055c2913598 EFLAGS: 00000246
> RAX: 000000000151ef6b RBX: 0000000000000282 RCX: 0000000000000040
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff903cebad
> RBP: ffff945d3aff4be0 R08: 0000000000000001 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000080000 R12: 0000000000000000
> R13: ffffa055c2913640 R14: ffff945d13892a10 R15: ffff945d13892a58
> FS: 00007f62407d8580(0000) GS:ffff945dd9400000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f6240613000 CR3: 000000014bc04004 CR4: 00000000003706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> down_trylock+0x25/0x30
> xfs_buf_trylock+0x17/0x190
> xfs_buf_find+0x1b3/0x4a0
> xfs_buf_get_map+0x44/0x3a0
> xfs_buf_read_map+0x52/0x2e0
> ? xfs_read_agf+0xa3/0x180
> xfs_trans_read_buf_map+0x144/0x430
> ? xfs_read_agf+0xa3/0x180
> ? __lock_acquire+0x3a7/0x6c0
> xfs_read_agf+0xa3/0x180
> xfs_alloc_read_agf+0x4c/0x110
> xfs_alloc_pagf_init+0x27/0x60
> xfs_filestream_pick_ag+0x280/0x530
> xfs_filestream_new_ag+0x87/0x100
> xfs_bmap_btalloc_filestreams.constprop.0+0xe0/0x120
> xfs_bmap_btalloc+0x3e6/0x700
> xfs_bmapi_allocate+0xe4/0x310
> xfs_bmapi_write+0x42a/0x580
> xfs_iomap_write_direct+0x18d/0x210
> xfs_direct_write_iomap_begin+0x3ab/0x690
> ? lock_is_held_type+0x98/0x100
> iomap_iter+0x12b/0x240
> __iomap_dio_rw+0x1ff/0x630
> iomap_dio_rw+0xa/0x30
> xfs_file_dio_write_aligned+0xae/0x1c0
> xfs_file_write_iter+0xd8/0x130
> new_sync_write+0x122/0x1b0
> ? mod_objcg_state+0x180/0x2a0
> vfs_write+0x25d/0x370
> ksys_write+0x68/0xe0
> do_syscall_64+0x3b/0x90
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f6240700504
> Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
> RSP: 002b:00007ffdac48f148 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6240700504
> RDX: 0000000000100000 RSI: 00007f6240514000 RDI: 0000000000000001
> RBP: 0000000000100000 R08: 00000000ffffffff R09: 0000000000000000
> R10: ffffffffffffff3b R11: 0000000000000246 R12: 00007f6240514000
> R13: 0000000000000000 R14: 0000000000000000 R15: 00007f6240514000
>
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
>
> diff --git a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude
> index a9acba9c..83ccfd79 100644
> --- a/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude
> +++ b/kvm-xfstests/test-appliance/files/root/fs/xfs/cfg/v4.exclude
> @@ -1,2 +1,16 @@
> # Normal configurations don't support dax
> -g dax
> +
> +# On a 5.15-rc4 kernel, xfs/170 reliably causes a soft lockup in
> +# xfs_alloc_read_agf()+0x41/0x110. Call stack:
> +#
> +# xfs_alloc_pagf_init+0x27/0x60
> +# xfs_filestream_pick_ag+0x280/0x530
> +# xfs_filestream_new_ag+0x87/0x100
> +# xfs_bmap_btalloc_filestreams.constprop.0+0xe0/0x120
> +# xfs_bmap_btalloc+0x3e6/0x700
> +# xfs_bmapi_allocate+0xe4/0x310
> +# xfs_bmapi_convert_delalloc+0x26c/0x480
> +# xfs_map_blocks+0x1b5/0x510
> +# ...
> +xfs/170
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: soft lockup in xfs/170 on a file system formatted with -m crc=0
2021-11-06 2:10 ` soft lockup in xfs/170 on a file system formatted with -m crc=0 Darrick J. Wong
@ 2021-11-06 14:52 ` Theodore Ts'o
0 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2021-11-06 14:52 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs, leah.rumancik
On Fri, Nov 05, 2021 at 07:10:31PM -0700, Darrick J. Wong wrote:
> On Fri, Nov 05, 2021 at 01:25:05PM -0400, Theodore Ts'o wrote:
> > Is this a known failure? I can reliably reproduce this soft lockup
> > running xfs/170 using "gce-xfstests -c xfs/v4 xfs/170" using
> > v5.15-rc4. The xfs/v4 test config formats the file system using -m
> > crc=0 with no special mount options.
> >
> > I've attached the kernel config that I used; it's the standard one
> > obtained via "gce-xfstests install-kconfig"[1].
>
> Also, uh... 5.15 didn't prove to be a stable testing base (at least not
> without a bunch of other patches to kvm, the memory manager, and the
> block layer) until 5.15-rc7.
Thanks, I haven't had a problem using 5.15-rc4 as a testing base for
ext4, so I wasn't aware of that. I just tested xfs/170 on 5.15 final,
and it is no longer triggering a soft lockup.
Thanks again,
- Ted
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: xfs/076 takes a long long time testing with a realtime volume
2021-11-06 2:08 ` Darrick J. Wong
@ 2021-11-06 16:43 ` Theodore Ts'o
0 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2021-11-06 16:43 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: fstests, linux-xfs, leah.rumancik
On Fri, Nov 05, 2021 at 07:08:04PM -0700, Darrick J. Wong wrote:
> On Fri, Nov 05, 2021 at 09:58:14PM -0400, Theodore Ts'o wrote:
> > After committing some exclusions into my test runner framework (see
> > below), I tested a potential fix to xfs/076 which disables the
> > real-time volume when creating the scratch volume. Should I send it
> > as a formal patch to fstests?
>
> Does adding:
>
> _xfs_force_bdev data $SCRATCH_MNT
>
> right after _scratch_mount make the performance problem go away? Sparse
> inodes and realtime are a supported configuration.
The test fails with an "fpunch failed" in 076.out.bad, and nothing
enlightening in 076.full. But it does complete in roughly two
minutes.
- Ted
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-11-06 16:43 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-11-05 17:25 soft lockup in xfs/170 on a file system formatted with -m crc=0 Theodore Ts'o
2021-11-06 1:58 ` xfs/076 takes a long long time testing with a realtime volume Theodore Ts'o
2021-11-06 2:08 ` Darrick J. Wong
2021-11-06 16:43 ` Theodore Ts'o
2021-11-06 2:10 ` soft lockup in xfs/170 on a file system formatted with -m crc=0 Darrick J. Wong
2021-11-06 14:52 ` Theodore Ts'o
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox