* [PATCH] xfs: test log recovery for extent frees right after growfs
@ 2024-09-10 4:31 Christoph Hellwig
2024-09-10 8:57 ` Zorro Lang
2024-09-10 14:19 ` Brian Foster
0 siblings, 2 replies; 13+ messages in thread
From: Christoph Hellwig @ 2024-09-10 4:31 UTC (permalink / raw)
To: zlang; +Cc: djwong, fstests, linux-xfs
Reproduce a bug where log recovery fails when an unfinished extent free
intent is in the same log as the growfs transaction that added the AG.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
tests/xfs/1323 | 61 ++++++++++++++++++++++++++++++++++++++++++++++
tests/xfs/1323.out | 14 +++++++++++
2 files changed, 75 insertions(+)
create mode 100755 tests/xfs/1323
create mode 100644 tests/xfs/1323.out
diff --git a/tests/xfs/1323 b/tests/xfs/1323
new file mode 100755
index 000000000..a436510b0
--- /dev/null
+++ b/tests/xfs/1323
@@ -0,0 +1,61 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024, Christoph Hellwig
+#
+# FS QA Test No. 1323
+#
+# Test that recovering an extfree item residing on a freshly grown AG works.
+#
+. ./common/preamble
+_begin_fstest auto quick growfs
+
+. ./common/filter
+. ./common/inject
+
+_require_xfs_io_error_injection "free_extent"
+
+_xfs_force_bdev data $SCRATCH_MNT
+
+_cleanup()
+{
+ cd /
+ _scratch_unmount > /dev/null 2>&1
+ rm -rf $tmp.*
+}
+
+echo "Format filesystem"
+_scratch_mkfs_sized $((128 * 1024 * 1024)) >> $seqres.full
+_scratch_mount >> $seqres.full
+
+echo "Fill file system"
+dd if=/dev/zero of=$SCRATCH_MNT/filler1 bs=64k oflag=direct &>/dev/null
+sync
+dd if=/dev/zero of=$SCRATCH_MNT/filler2 bs=64k oflag=direct &>/dev/null
+sync
+
+echo "Grow file system"
+$XFS_GROWFS_PROG $SCRATCH_MNT >>$seqres.full
+
+echo "Create test files"
+dd if=/dev/zero of=$SCRATCH_MNT/test1 bs=8M count=4 oflag=direct | \
+ _filter_dd
+dd if=/dev/zero of=$SCRATCH_MNT/test2 bs=8M count=4 oflag=direct | \
+ _filter_dd
+
+echo "Inject error"
+_scratch_inject_error "free_extent"
+
+echo "Remove test file"
+rm $SCRATCH_MNT/test2
+
+echo "FS should be shut down, touch will fail"
+touch $SCRATCH_MNT/test1 2>&1 | _filter_scratch
+
+echo "Remount to replay log"
+_scratch_remount_dump_log >> $seqres.full
+
+echo "Done"
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/1323.out b/tests/xfs/1323.out
new file mode 100644
index 000000000..1740f9a1f
--- /dev/null
+++ b/tests/xfs/1323.out
@@ -0,0 +1,14 @@
+QA output created by 1323
+Format filesystem
+Fill file system
+Grow file system
+Create test files
+4+0 records in
+4+0 records out
+4+0 records in
+4+0 records out
+Inject error
+Remove test file
+FS should be shut down, touch will fail
+Remount to replay log
+Done
--
2.45.2
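The `free_extent` injection the test arms via `_scratch_inject_error` boils down to an XFS errortag knob in sysfs. A rough sketch of that mechanism follows; the device name is illustrative, the knob only exists on kernels built with XFS error injection, and the exact frequency semantics of the written value are an assumption worth checking against the kernel source:

```shell
# Approximate equivalent of _scratch_inject_error "free_extent":
# XFS exposes per-device error-injection tags under sysfs; arming
# free_extent makes extent frees fail, which is what leaves the
# unfinished EFI in the log when the fs shuts down.
dev=sdb1                                    # illustrative device name
knob=/sys/fs/xfs/$dev/errortag/free_extent
if [ -w "$knob" ]; then
    echo 1 > "$knob"                        # arm the tag
    cat "$knob"
else
    echo "free_extent errortag not available on this kernel/device"
fi
```

The written value appears to act as a frequency (fail roughly one call in N), so larger values make the failure intermittent rather than immediate; treat that as an assumption rather than documented behavior.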
* Re: [PATCH] xfs: test log recovery for extent frees right after growfs
2024-09-10 4:31 [PATCH] xfs: test log recovery for extent frees right after growfs Christoph Hellwig
@ 2024-09-10 8:57 ` Zorro Lang
2024-09-10 11:34 ` Christoph Hellwig
2024-09-10 14:19 ` Brian Foster
1 sibling, 1 reply; 13+ messages in thread
From: Zorro Lang @ 2024-09-10 8:57 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: zlang, djwong, fstests, linux-xfs
On Tue, Sep 10, 2024 at 07:31:17AM +0300, Christoph Hellwig wrote:
> Reproduce a bug where log recovery fails when an unfinished extent free
> intent is in the same log as the growfs transaction that added the AG.
Which bug? If it's a regression test, can we have a _fixed_by_kernel_commit
to mark the known issue?
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> tests/xfs/1323 | 61 ++++++++++++++++++++++++++++++++++++++++++++++
> tests/xfs/1323.out | 14 +++++++++++
> 2 files changed, 75 insertions(+)
> create mode 100755 tests/xfs/1323
> create mode 100644 tests/xfs/1323.out
>
> diff --git a/tests/xfs/1323 b/tests/xfs/1323
> new file mode 100755
> index 000000000..a436510b0
> --- /dev/null
> +++ b/tests/xfs/1323
> @@ -0,0 +1,61 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024, Christoph Hellwig
> +#
> +# FS QA Test No. 1323
> +#
> +# Test that recovering an extfree item residing on a freshly grown AG works.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick growfs
> +
> +. ./common/filter
> +. ./common/inject
> +
_require_scratch
> +_require_xfs_io_error_injection "free_extent"
> +
> +_xfs_force_bdev data $SCRATCH_MNT
Don't you need to do this after below _scratch_mount ?
> +
> +_cleanup()
> +{
> + cd /
> + _scratch_unmount > /dev/null 2>&1
SCRATCH_DEV will be unmounted at the end of each test, so this might not be needed.
If so, this whole _cleanup is not necessary.
> + rm -rf $tmp.*
> +}
> +
> +echo "Format filesystem"
> +_scratch_mkfs_sized $((128 * 1024 * 1024)) >> $seqres.full
> +_scratch_mount >> $seqres.full
> +
> +echo "Fill file system"
> +dd if=/dev/zero of=$SCRATCH_MNT/filler1 bs=64k oflag=direct &>/dev/null
> +sync
> +dd if=/dev/zero of=$SCRATCH_MNT/filler2 bs=64k oflag=direct &>/dev/null
> +sync
There's a helper named _fill_fs() in common/populate, I'm not sure if
your above steps are necessary or can be replaced, just to confirm with
you.
> +
> +echo "Grow file system"
> +$XFS_GROWFS_PROG $SCRATCH_MNT >>$seqres.full
_require_command "$XFS_GROWFS_PROG" xfs_growfs
> +
> +echo "Create test files"
> +dd if=/dev/zero of=$SCRATCH_MNT/test1 bs=8M count=4 oflag=direct | \
> + _filter_dd
> +dd if=/dev/zero of=$SCRATCH_MNT/test2 bs=8M count=4 oflag=direct | \
> + _filter_dd
> +
> +echo "Inject error"
> +_scratch_inject_error "free_extent"
> +
> +echo "Remove test file"
> +rm $SCRATCH_MNT/test2
Is -f needed ?
Thanks,
Zorro
> +
> +echo "FS should be shut down, touch will fail"
> +touch $SCRATCH_MNT/test1 2>&1 | _filter_scratch
> +
> +echo "Remount to replay log"
> +_scratch_remount_dump_log >> $seqres.full
> +
> +echo "Done"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/xfs/1323.out b/tests/xfs/1323.out
> new file mode 100644
> index 000000000..1740f9a1f
> --- /dev/null
> +++ b/tests/xfs/1323.out
> @@ -0,0 +1,14 @@
> +QA output created by 1323
> +Format filesystem
> +Fill file system
> +Grow file system
> +Create test files
> +4+0 records in
> +4+0 records out
> +4+0 records in
> +4+0 records out
> +Inject error
> +Remove test file
> +FS should be shut down, touch will fail
> +Remount to replay log
> +Done
> --
> 2.45.2
>
>
* Re: [PATCH] xfs: test log recovery for extent frees right after growfs
2024-09-10 8:57 ` Zorro Lang
@ 2024-09-10 11:34 ` Christoph Hellwig
0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2024-09-10 11:34 UTC (permalink / raw)
To: Zorro Lang; +Cc: Christoph Hellwig, zlang, djwong, fstests, linux-xfs
On Tue, Sep 10, 2024 at 04:57:48PM +0800, Zorro Lang wrote:
> On Tue, Sep 10, 2024 at 07:31:17AM +0300, Christoph Hellwig wrote:
> > Reproduce a bug where log recovery fails when an unfinished extent free
> > intent is in the same log as the growfs transaction that added the AG.
>
> Which bug? If it's a regression test, can we have a _fixed_by_kernel_commit
> to mark the known issue?
I just sent the kernel patches for it. It's been there basically
forever as far as I can tell.
* Re: [PATCH] xfs: test log recovery for extent frees right after growfs
2024-09-10 4:31 [PATCH] xfs: test log recovery for extent frees right after growfs Christoph Hellwig
2024-09-10 8:57 ` Zorro Lang
@ 2024-09-10 14:19 ` Brian Foster
2024-09-10 15:10 ` Christoph Hellwig
1 sibling, 1 reply; 13+ messages in thread
From: Brian Foster @ 2024-09-10 14:19 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: zlang, djwong, fstests, linux-xfs
On Tue, Sep 10, 2024 at 07:31:17AM +0300, Christoph Hellwig wrote:
> Reproduce a bug where log recovery fails when an unfinished extent free
> intent is in the same log as the growfs transaction that added the AG.
>
No real issue with the test, but I wonder if we could do something more
generic. Various XFS shutdown and log recovery issues went undetected
for a while until we started adding more of the generic stress tests
currently categorized in the recoveryloop group.
So for example, I'm wondering if you took something like generic/388 or
475 and modified it to start with a smallish fs, grew it in 1GB or
whatever increments on each loop iteration, and then ran the same
generic stress/timeout/shutdown/recovery sequence, would that eventually
reproduce the issue you've fixed? I don't think reproducibility would
need to be 100% for the test to be useful, fwiw.
Note that I'm assuming we don't have something like that already. I see
growfs and shutdown tests in tests/xfs/group.list, but nothing in both
groups and I haven't looked through the individual tests. Just a
thought.
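A bare-bones skeleton of the loop described above, with the fstests helpers only sketched in comments and the sizes purely illustrative, might look like:

```shell
# Grow/stress/shutdown/recovery loop: start with a smallish fs, grow it
# by a fixed increment each iteration, and run a shutdown + recovery
# cycle per iteration. Only the size arithmetic is real here; the
# commented lines stand in for the fstests helpers.
size=$((128 * 1024 * 1024))      # small initial fs
endsize=$((1024 * 1024 * 1024))  # stop once we have grown this big
incsize=$((256 * 1024 * 1024))   # grow in fixed increments

iter=0
while [ "$size" -le "$endsize" ]; do
    # fsstress -d $SCRATCH_MNT -w -p 4 -n 1000 &   (background stress)
    size=$((size + incsize))
    # xfs_growfs -D $((size / 4096)) $SCRATCH_MNT  (grow by one increment)
    # _scratch_shutdown                            (shut down mid-stress)
    # kill %1; wait                                (reap fsstress)
    # _scratch_cycle_mount                         (remount: log recovery)
    iter=$((iter + 1))
done
echo "iterations: $iter"
```

With these illustrative numbers the loop runs four grow/shutdown/recover cycles; reproducibility would not need to be 100% for such a test to earn a place in the recoveryloop group.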
Brian
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> tests/xfs/1323 | 61 ++++++++++++++++++++++++++++++++++++++++++++++
> tests/xfs/1323.out | 14 +++++++++++
> 2 files changed, 75 insertions(+)
> create mode 100755 tests/xfs/1323
> create mode 100644 tests/xfs/1323.out
>
> diff --git a/tests/xfs/1323 b/tests/xfs/1323
> new file mode 100755
> index 000000000..a436510b0
> --- /dev/null
> +++ b/tests/xfs/1323
> @@ -0,0 +1,61 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024, Christoph Hellwig
> +#
> +# FS QA Test No. 1323
> +#
> +# Test that recovering an extfree item residing on a freshly grown AG works.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick growfs
> +
> +. ./common/filter
> +. ./common/inject
> +
> +_require_xfs_io_error_injection "free_extent"
> +
> +_xfs_force_bdev data $SCRATCH_MNT
> +
> +_cleanup()
> +{
> + cd /
> + _scratch_unmount > /dev/null 2>&1
> + rm -rf $tmp.*
> +}
> +
> +echo "Format filesystem"
> +_scratch_mkfs_sized $((128 * 1024 * 1024)) >> $seqres.full
> +_scratch_mount >> $seqres.full
> +
> +echo "Fill file system"
> +dd if=/dev/zero of=$SCRATCH_MNT/filler1 bs=64k oflag=direct &>/dev/null
> +sync
> +dd if=/dev/zero of=$SCRATCH_MNT/filler2 bs=64k oflag=direct &>/dev/null
> +sync
> +
> +echo "Grow file system"
> +$XFS_GROWFS_PROG $SCRATCH_MNT >>$seqres.full
> +
> +echo "Create test files"
> +dd if=/dev/zero of=$SCRATCH_MNT/test1 bs=8M count=4 oflag=direct | \
> + _filter_dd
> +dd if=/dev/zero of=$SCRATCH_MNT/test2 bs=8M count=4 oflag=direct | \
> + _filter_dd
> +
> +echo "Inject error"
> +_scratch_inject_error "free_extent"
> +
> +echo "Remove test file"
> +rm $SCRATCH_MNT/test2
> +
> +echo "FS should be shut down, touch will fail"
> +touch $SCRATCH_MNT/test1 2>&1 | _filter_scratch
> +
> +echo "Remount to replay log"
> +_scratch_remount_dump_log >> $seqres.full
> +
> +echo "Done"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/xfs/1323.out b/tests/xfs/1323.out
> new file mode 100644
> index 000000000..1740f9a1f
> --- /dev/null
> +++ b/tests/xfs/1323.out
> @@ -0,0 +1,14 @@
> +QA output created by 1323
> +Format filesystem
> +Fill file system
> +Grow file system
> +Create test files
> +4+0 records in
> +4+0 records out
> +4+0 records in
> +4+0 records out
> +Inject error
> +Remove test file
> +FS should be shut down, touch will fail
> +Remount to replay log
> +Done
> --
> 2.45.2
>
>
* Re: [PATCH] xfs: test log recovery for extent frees right after growfs
2024-09-10 14:19 ` Brian Foster
@ 2024-09-10 15:10 ` Christoph Hellwig
2024-09-10 16:13 ` Brian Foster
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2024-09-10 15:10 UTC (permalink / raw)
To: Brian Foster; +Cc: Christoph Hellwig, zlang, djwong, fstests, linux-xfs
On Tue, Sep 10, 2024 at 10:19:50AM -0400, Brian Foster wrote:
> No real issue with the test, but I wonder if we could do something more
> generic. Various XFS shutdown and log recovery issues went undetected
> for a while until we started adding more of the generic stress tests
> currently categorized in the recoveryloop group.
>
> So for example, I'm wondering if you took something like generic/388 or
> 475 and modified it to start with a smallish fs, grew it in 1GB or
> whatever increments on each loop iteration, and then ran the same
> generic stress/timeout/shutdown/recovery sequence, would that eventually
> reproduce the issue you've fixed? I don't think reproducibility would
> need to be 100% for the test to be useful, fwiw.
>
> Note that I'm assuming we don't have something like that already. I see
> growfs and shutdown tests in tests/xfs/group.list, but nothing in both
> groups and I haven't looked through the individual tests. Just a
> thought.
It turns out reproducing this bug was surprisingly complicated.
After a growfs we can now dip into reserves that made the test1
file start filling up the existing AGs first for a while, and thus
the error injection would hit on that and never even reach a new
AG.
So while I agree with your sentiment and like the high-level idea, I
suspect it will need a fair amount of work to actually be useful.
Right now I'm too busy with various projects to look into it
unfortunately.
* Re: [PATCH] xfs: test log recovery for extent frees right after growfs
2024-09-10 15:10 ` Christoph Hellwig
@ 2024-09-10 16:13 ` Brian Foster
2024-10-08 16:28 ` Brian Foster
0 siblings, 1 reply; 13+ messages in thread
From: Brian Foster @ 2024-09-10 16:13 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: zlang, djwong, fstests, linux-xfs
On Tue, Sep 10, 2024 at 05:10:53PM +0200, Christoph Hellwig wrote:
> On Tue, Sep 10, 2024 at 10:19:50AM -0400, Brian Foster wrote:
> > No real issue with the test, but I wonder if we could do something more
> > generic. Various XFS shutdown and log recovery issues went undetected
> > for a while until we started adding more of the generic stress tests
> > currently categorized in the recoveryloop group.
> >
> > So for example, I'm wondering if you took something like generic/388 or
> > 475 and modified it to start with a smallish fs, grew it in 1GB or
> > whatever increments on each loop iteration, and then ran the same
> > generic stress/timeout/shutdown/recovery sequence, would that eventually
> > reproduce the issue you've fixed? I don't think reproducibility would
> > need to be 100% for the test to be useful, fwiw.
> >
> > Note that I'm assuming we don't have something like that already. I see
> > growfs and shutdown tests in tests/xfs/group.list, but nothing in both
> > groups and I haven't looked through the individual tests. Just a
> > thought.
>
> It turns out reproducing this bug was surprisingly complicated.
> After a growfs we can now dip into reserves that made the test1
> file start filling up the existing AGs first for a while, and thus
> the error injection would hit on that and never even reach a new
> AG.
>
> So while I agree with your sentiment and like the high-level idea, I
> suspect it will need a fair amount of work to actually be useful.
> Right now I'm too busy with various projects to look into it
> unfortunately.
>
Fair enough, maybe I'll play with it a bit when I have some more time.
Brian
* Re: [PATCH] xfs: test log recovery for extent frees right after growfs
2024-09-10 16:13 ` Brian Foster
@ 2024-10-08 16:28 ` Brian Foster
2024-10-09 8:04 ` Christoph Hellwig
0 siblings, 1 reply; 13+ messages in thread
From: Brian Foster @ 2024-10-08 16:28 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: zlang, djwong, fstests, linux-xfs
On Tue, Sep 10, 2024 at 12:13:29PM -0400, Brian Foster wrote:
> On Tue, Sep 10, 2024 at 05:10:53PM +0200, Christoph Hellwig wrote:
> > On Tue, Sep 10, 2024 at 10:19:50AM -0400, Brian Foster wrote:
> > > No real issue with the test, but I wonder if we could do something more
> > > generic. Various XFS shutdown and log recovery issues went undetected
> > > for a while until we started adding more of the generic stress tests
> > > currently categorized in the recoveryloop group.
> > >
> > > So for example, I'm wondering if you took something like generic/388 or
> > > 475 and modified it to start with a smallish fs, grew it in 1GB or
> > > whatever increments on each loop iteration, and then ran the same
> > > generic stress/timeout/shutdown/recovery sequence, would that eventually
> > > reproduce the issue you've fixed? I don't think reproducibility would
> > > need to be 100% for the test to be useful, fwiw.
> > >
> > > Note that I'm assuming we don't have something like that already. I see
> > > growfs and shutdown tests in tests/xfs/group.list, but nothing in both
> > > groups and I haven't looked through the individual tests. Just a
> > > thought.
> >
> > It turns out reproducing this bug was surprisingly complicated.
> > After a growfs we can now dip into reserves that made the test1
> > file start filling up the existing AGs first for a while, and thus
> > the error injection would hit on that and never even reach a new
> > AG.
> >
> > So while I agree with your sentiment and like the high-level idea, I
> > suspect it will need a fair amount of work to actually be useful.
> > Right now I'm too busy with various projects to look into it
> > unfortunately.
> >
>
> Fair enough, maybe I'll play with it a bit when I have some more time.
>
> Brian
>
>
FWIW, here's a quick hack at such a test. This is essentially a copy of
xfs/104, tweaked to remove some of the output noise and whatnot, and
hacked in some bits from generic/388 to do a shutdown and mount cycle
per iteration.
I'm not sure if this reproduces your original problem, but this blows up
pretty quickly on 6.12.0-rc2. I see a stream of warnings that start like
this (buffer readahead path via log recovery):
[ 2807.764283] XFS (vdb2): xfs_buf_map_verify: daddr 0x3e803 out of range, EOFS 0x3e800
[ 2807.768094] ------------[ cut here ]------------
[ 2807.770629] WARNING: CPU: 0 PID: 28386 at fs/xfs/xfs_buf.c:553 xfs_buf_get_map+0x184e/0x2670 [xfs]
... and then end up with an unrecoverable/unmountable fs. From the title
it sounds like this may be a different issue though.. hm?
Brian
--- 8< ---
diff --git a/tests/xfs/609 b/tests/xfs/609
new file mode 100755
index 00000000..b9c23869
--- /dev/null
+++ b/tests/xfs/609
@@ -0,0 +1,100 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2000-2004 Silicon Graphics, Inc. All Rights Reserved.
+#
+# FS QA Test No. 609
+#
+# XFS online growfs-while-allocating tests (data subvol variant)
+#
+. ./common/preamble
+_begin_fstest growfs ioctl prealloc auto stress
+
+# Import common functions.
+. ./common/filter
+
+_create_scratch()
+{
+ _scratch_mkfs_xfs $@ >> $seqres.full
+
+ if ! _try_scratch_mount 2>/dev/null
+ then
+ echo "failed to mount $SCRATCH_DEV"
+ exit 1
+ fi
+
+ # fix the reserve block pool to a known size so that the enospc
+ # calculations work out correctly.
+ _scratch_resvblks 1024 > /dev/null 2>&1
+}
+
+_fill_scratch()
+{
+ $XFS_IO_PROG -f -c "resvsp 0 ${1}" $SCRATCH_MNT/resvfile
+}
+
+_stress_scratch()
+{
+ procs=3
+ nops=1000
+ # -w ensures that the only ops are ones which cause write I/O
+ FSSTRESS_ARGS=`_scale_fsstress_args -d $SCRATCH_MNT -w -p $procs \
+ -n $nops $FSSTRESS_AVOID`
+ $FSSTRESS_PROG $FSSTRESS_ARGS >> $seqres.full 2>&1 &
+}
+
+_require_scratch
+_require_xfs_io_command "falloc"
+
+_scratch_mkfs_xfs | tee -a $seqres.full | _filter_mkfs 2>$tmp.mkfs
+. $tmp.mkfs # extract blocksize and data size for scratch device
+
+endsize=`expr 550 \* 1048576` # stop after growing this big
+incsize=`expr 42 \* 1048576` # grow in chunks of this size
+modsize=`expr 4 \* $incsize` # pause after this many increments
+
+[ `expr $endsize / $dbsize` -lt $dblocks ] || _notrun "Scratch device too small"
+
+nags=4
+size=`expr 125 \* 1048576` # 120 megabytes initially
+sizeb=`expr $size / $dbsize` # in data blocks
+logblks=$(_scratch_find_xfs_min_logblocks -dsize=${size} -dagcount=${nags})
+_create_scratch -lsize=${logblks}b -dsize=${size} -dagcount=${nags}
+
+for i in `seq 125 -1 90`; do
+ fillsize=`expr $i \* 1048576`
+ out="$(_fill_scratch $fillsize 2>&1)"
+ echo "$out" | grep -q 'No space left on device' && continue
+ test -n "${out}" && echo "$out"
+ break
+done
+
+#
+# Grow the filesystem while actively stressing it...
+# Kick off more stress threads on each iteration, grow; repeat.
+#
+while [ $size -le $endsize ]; do
+ echo "*** stressing a ${sizeb} block filesystem" >> $seqres.full
+ _stress_scratch
+ size=`expr $size + $incsize`
+ sizeb=`expr $size / $dbsize` # in data blocks
+ echo "*** growing to a ${sizeb} block filesystem" >> $seqres.full
+ xfs_growfs -D ${sizeb} $SCRATCH_MNT >> $seqres.full
+ echo AGCOUNT=$agcount >> $seqres.full
+ echo >> $seqres.full
+
+ sleep $((RANDOM % 3))
+ _scratch_shutdown
+ ps -e | grep fsstress > /dev/null 2>&1
+ while [ $? -eq 0 ]; do
+ killall -9 fsstress > /dev/null 2>&1
+ wait > /dev/null 2>&1
+ ps -e | grep fsstress > /dev/null 2>&1
+ done
+ _scratch_cycle_mount || _fail "cycle mount failed"
+done > /dev/null 2>&1
+wait # stop for any remaining stress processes
+
+_scratch_unmount
+
+status=0
+exit
diff --git a/tests/xfs/609.out b/tests/xfs/609.out
new file mode 100644
index 00000000..1853cc65
--- /dev/null
+++ b/tests/xfs/609.out
@@ -0,0 +1,7 @@
+QA output created by 609
+meta-data=DDEV isize=XXX agcount=N, agsize=XXX blks
+data = bsize=XXX blocks=XXX, imaxpct=PCT
+ = sunit=XXX swidth=XXX, unwritten=X
+naming =VERN bsize=XXX
+log =LDEV bsize=XXX blocks=XXX
+realtime =RDEV extsz=XXX blocks=XXX, rtextents=XXX
* Re: [PATCH] xfs: test log recovery for extent frees right after growfs
2024-10-08 16:28 ` Brian Foster
@ 2024-10-09 8:04 ` Christoph Hellwig
2024-10-09 12:35 ` Brian Foster
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2024-10-09 8:04 UTC (permalink / raw)
To: Brian Foster; +Cc: Christoph Hellwig, zlang, djwong, fstests, linux-xfs
On Tue, Oct 08, 2024 at 12:28:37PM -0400, Brian Foster wrote:
> FWIW, here's a quick hack at such a test. This is essentially a copy of
> xfs/104, tweaked to remove some of the output noise and whatnot, and
> hacked in some bits from generic/388 to do a shutdown and mount cycle
> per iteration.
>
> I'm not sure if this reproduces your original problem, but this blows up
> pretty quickly on 6.12.0-rc2. I see a stream of warnings that start like
> this (buffer readahead path via log recovery):
>
> [ 2807.764283] XFS (vdb2): xfs_buf_map_verify: daddr 0x3e803 out of range, EOFS 0x3e800
> [ 2807.768094] ------------[ cut here ]------------
> [ 2807.770629] WARNING: CPU: 0 PID: 28386 at fs/xfs/xfs_buf.c:553 xfs_buf_get_map+0x184e/0x2670 [xfs]
>
> ... and then end up with an unrecoverable/unmountable fs. From the title
> it sounds like this may be a different issue though.. hm?
That's at least the same initial message I hit.
* Re: [PATCH] xfs: test log recovery for extent frees right after growfs
2024-10-09 8:04 ` Christoph Hellwig
@ 2024-10-09 12:35 ` Brian Foster
2024-10-09 12:43 ` Christoph Hellwig
0 siblings, 1 reply; 13+ messages in thread
From: Brian Foster @ 2024-10-09 12:35 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: zlang, djwong, fstests, linux-xfs
On Wed, Oct 09, 2024 at 10:04:51AM +0200, Christoph Hellwig wrote:
> On Tue, Oct 08, 2024 at 12:28:37PM -0400, Brian Foster wrote:
> > FWIW, here's a quick hack at such a test. This is essentially a copy of
> > xfs/104, tweaked to remove some of the output noise and whatnot, and
> > hacked in some bits from generic/388 to do a shutdown and mount cycle
> > per iteration.
> >
> > I'm not sure if this reproduces your original problem, but this blows up
> > pretty quickly on 6.12.0-rc2. I see a stream of warnings that start like
> > this (buffer readahead path via log recovery):
> >
> > [ 2807.764283] XFS (vdb2): xfs_buf_map_verify: daddr 0x3e803 out of range, EOFS 0x3e800
> > [ 2807.768094] ------------[ cut here ]------------
> > [ 2807.770629] WARNING: CPU: 0 PID: 28386 at fs/xfs/xfs_buf.c:553 xfs_buf_get_map+0x184e/0x2670 [xfs]
> >
> > ... and then end up with an unrecoverable/unmountable fs. From the title
> > it sounds like this may be a different issue though.. hm?
>
> That's at least the same initial message I hit.
>
>
Ok, so then what happened? :) Are there outstanding patches somewhere to
fix this problem? If so, I can give it a test with this.
I'm also trying to figure out if the stress level of this particular
test should be turned up a notch or three, but I can't really dig into
that until this initial variant is passing reliably.
Brian
* Re: [PATCH] xfs: test log recovery for extent frees right after growfs
2024-10-09 12:35 ` Brian Foster
@ 2024-10-09 12:43 ` Christoph Hellwig
2024-10-09 15:14 ` Brian Foster
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2024-10-09 12:43 UTC (permalink / raw)
To: Brian Foster; +Cc: Christoph Hellwig, zlang, djwong, fstests, linux-xfs
On Wed, Oct 09, 2024 at 08:35:46AM -0400, Brian Foster wrote:
> Ok, so then what happened? :) Are there outstanding patches somewhere to
> fix this problem? If so, I can give it a test with this.
Yes, "fix recovery of allocator ops after a growfs" from Sep 30.
* Re: [PATCH] xfs: test log recovery for extent frees right after growfs
2024-10-09 12:43 ` Christoph Hellwig
@ 2024-10-09 15:14 ` Brian Foster
2024-10-10 6:51 ` Christoph Hellwig
2024-10-14 6:00 ` Christoph Hellwig
0 siblings, 2 replies; 13+ messages in thread
From: Brian Foster @ 2024-10-09 15:14 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: zlang, djwong, fstests, linux-xfs
On Wed, Oct 09, 2024 at 02:43:16PM +0200, Christoph Hellwig wrote:
> On Wed, Oct 09, 2024 at 08:35:46AM -0400, Brian Foster wrote:
> > Ok, so then what happened? :) Are there outstanding patches somewhere to
> > fix this problem? If so, I can give it a test with this.
>
> Yes, "fix recovery of allocator ops after a growfs" from Sep 30.
>
Thanks. This seems to fix the unmountable fs problem, so I'd guess it's
reproducing something related.
The test still fails occasionally with a trans abort and I see some
bnobt/cntbt corruption messages like the one appended below, but I'll
leave to you to decide whether this is a regression or preexisting
problem.
I probably won't get through it today, but I'll try to take a closer
look at the patches soon..
Brian
...
XFS (vdb2): cntbt record corruption in AG 8 detected at xfs_alloc_check_irec+0xfa/0x160 [xfs]!
XFS (vdb2): start block 0xa block count 0x1f36
XFS (vdb2): Internal error xfs_trans_cancel at line 872 of file fs/xfs/xfs_trans.c. Caller xfs_symlink+0x5a6/0xbd0 [xfs]
CPU: 5 UID: 0 PID: 8625 Comm: fsstress Tainted: G E 6.12.0-rc2+ #251
Tainted: [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x8d/0xb0
xfs_trans_cancel+0x3ca/0x530 [xfs]
xfs_symlink+0x5a6/0xbd0 [xfs]
? __pfx_xfs_symlink+0x10/0x10 [xfs]
? avc_has_perm+0x77/0x110
? lock_is_held_type+0xcd/0x120
? __pfx_avc_has_perm+0x10/0x10
? avc_has_perm_noaudit+0x3a/0x280
? may_create+0x26a/0x2e0
xfs_vn_symlink+0x144/0x390 [xfs]
? __pfx_selinux_inode_permission+0x10/0x10
? __pfx_xfs_vn_symlink+0x10/0x10 [xfs]
vfs_symlink+0x33e/0x580
do_symlinkat+0x1cf/0x250
? __pfx_do_symlinkat+0x10/0x10
? getname_flags.part.0+0xae/0x490
__x64_sys_symlink+0x71/0x90
do_syscall_64+0x93/0x180
? do_syscall_64+0x9f/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fcb692378eb
Code: 8b 0d 49 f5 0c 00 f7 d8 64 89 01 b9 ff ff ff ff eb d3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa b8 58 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 11 f5 0c 00 f7 d8
RSP: 002b:00007ffc547e52e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000058
RAX: ffffffffffffffda RBX: 000000003804a200 RCX: 00007fcb692378eb
RDX: 0000000000000000 RSI: 0000000038049200 RDI: 000000003804a200
RBP: 0000000038049200 R08: 000000003804a440 R09: 00007fcb69307b20
R10: 0000000000000270 R11: 0000000000000206 R12: 000000003804a200
R13: 00007ffc547e5450 R14: 0000000078ba5238 R15: 00007fcb6912c6c8
* Re: [PATCH] xfs: test log recovery for extent frees right after growfs
2024-10-09 15:14 ` Brian Foster
@ 2024-10-10 6:51 ` Christoph Hellwig
2024-10-14 6:00 ` Christoph Hellwig
1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2024-10-10 6:51 UTC (permalink / raw)
To: Brian Foster; +Cc: Christoph Hellwig, zlang, djwong, fstests, linux-xfs
On Wed, Oct 09, 2024 at 11:14:49AM -0400, Brian Foster wrote:
> Thanks. This seems to fix the unmountable fs problem, so I'd guess it's
> reproducing something related.
Heh.
>
> The test still fails occasionally with a trans abort and I see some
> bnobt/cntbt corruption messages like the one appended below, but I'll
> leave to you to decide whether this is a regression or preexisting
> problem.
>
> I probably won't get through it today, but I'll try to take a closer
> look at the patches soon..
My bet is on pre-existing, but either way we should use the chance
to fix this properly. I'm a little busy right now, but I'll try to
get back to this soon and play with your test.
* Re: [PATCH] xfs: test log recovery for extent frees right after growfs
2024-10-09 15:14 ` Brian Foster
2024-10-10 6:51 ` Christoph Hellwig
@ 2024-10-14 6:00 ` Christoph Hellwig
1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2024-10-14 6:00 UTC (permalink / raw)
To: Brian Foster; +Cc: Christoph Hellwig, zlang, djwong, fstests, linux-xfs
On Wed, Oct 09, 2024 at 11:14:49AM -0400, Brian Foster wrote:
> Thanks. This seems to fix the unmountable fs problem, so I'd guess it's
> reproducing something related.
>
> The test still fails occasionally with a trans abort and I see some
> bnobt/cntbt corruption messages like the one appended below, but I'll
> leave to you to decide whether this is a regression or preexisting
> problem.
That's because log recovery completely fails to update the in-core
state for the last existing AG. I've added a fix for that.