public inbox for linux-xfs@vger.kernel.org
* [PATCHSET 0/3] fstests: random fixes for v2023.10.08
@ 2023-10-09 18:18 Darrick J. Wong
  2023-10-09 18:18 ` [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Darrick J. Wong @ 2023-10-09 18:18 UTC (permalink / raw)
  To: djwong, zlang; +Cc: tytso, jack, linux-xfs, fstests, guan

Hi all,

Here's the usual odd fixes for fstests.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=random-fixes

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=random-fixes

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=random-fixes
---
 .../aio-dio-append-write-read-race.c               |    2 +-
 tests/generic/269                                  |    2 +-
 tests/xfs/051                                      |    2 +-
 tests/xfs/178                                      |    9 ++++++++-
 tests/xfs/178.out                                  |    2 --
 5 files changed, 11 insertions(+), 6 deletions(-)


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks
  2023-10-09 18:18 [PATCHSET 0/3] fstests: random fixes for v2023.10.08 Darrick J. Wong
@ 2023-10-09 18:18 ` Darrick J. Wong
  2023-10-10  7:01   ` Christoph Hellwig
                     ` (2 more replies)
  2023-10-09 18:18 ` [PATCH 2/3] generic/465: only complain about stale disk contents when racing directio Darrick J. Wong
  2023-10-09 18:18 ` [PATCH 3/3] generic/269,xfs/051: don't drop fsstress failures to stdout Darrick J. Wong
  2 siblings, 3 replies; 13+ messages in thread
From: Darrick J. Wong @ 2023-10-09 18:18 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

When I added an fstests config for "RAID" striping (aka MKFS_OPTIONS='-d
su=128k,sw=4'), I suddenly started seeing this test fail sporadically
with:

--- /tmp/fstests/tests/xfs/178.out	2023-07-11 12:18:21.714970364 -0700
+++ /var/tmp/fstests/xfs/178.out.bad	2023-07-25 22:05:39.756000000 -0700
@@ -10,6 +10,20 @@ bad primary superblock - bad magic numbe

 attempting to find secondary superblock...
 found candidate secondary superblock...
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
 verified secondary superblock...
 writing modified primary superblock
 sb root inode INO inconsistent with calculated value INO

Eventually I tracked this down to a mis-interaction between the test,
xfs_repair, and the storage device.

The storage advertises SCSI UNMAP support, but it is of the variety
where the UNMAP command returns immediately but takes its time to unmap
in the background.  Subsequent rereads are allowed to return stale
contents, per DISCARD semantics.

When the fstests cloud is not busy, the old contents disappear in a few
seconds.  However, at peak utilization, there are ~75 VMs running, and
the storage backend can take several minutes to commit these background
requests.

When we zero the primary super and start xfs_repair on SCRATCH_DEV, it
will walk the device looking for secondary supers.  Most of the time it
finds the actual AG 1 secondary super, but sometimes it finds ghosts
from previous formats.  When that happens, xfs_repair will talk quite a
bit about those failed secondaries, even if it eventually finds an
acceptable secondary sb and completes the repair.

Filter out the messages about secondary supers.
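
The new filter can be sanity-checked outside the harness with the same sed
expressions (a minimal standalone sketch: `filter_repair_demo` is a
hypothetical name, since `_filter_repair` from common/ is not available here;
the sample lines are taken from the failure output above):

```shell
# Drop the noisy secondary-superblock chatter, keep everything else.
filter_repair_demo() {
	sed \
		-e '/unable to verify superblock, continuing/d' \
		-e '/found candidate secondary superblock/d' \
		-e '/error reading superblock.*-- seek to offset/d'
}

out=$(printf '%s\n' \
	'attempting to find secondary superblock...' \
	'found candidate secondary superblock...' \
	'error reading superblock 1 -- seek to offset 584115421184 failed' \
	'unable to verify superblock, continuing...' \
	'verified secondary superblock...' | filter_repair_demo)
echo "$out"
```

Only the "attempting to find" and "verified secondary superblock" lines
survive, so the golden output no longer depends on how many ghosts
xfs_repair trips over.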

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/178     |    9 ++++++++-
 tests/xfs/178.out |    2 --
 2 files changed, 8 insertions(+), 3 deletions(-)


diff --git a/tests/xfs/178 b/tests/xfs/178
index a65197cde3..fee1e92bf3 100755
--- a/tests/xfs/178
+++ b/tests/xfs/178
@@ -10,13 +10,20 @@
 . ./common/preamble
 _begin_fstest mkfs other auto
 
+filter_repair() {
+	_filter_repair | sed \
+		-e '/unable to verify superblock, continuing/d' \
+		-e '/found candidate secondary superblock/d' \
+		-e '/error reading superblock.*-- seek to offset/d'
+}
+
 # dd the 1st sector then repair
 _dd_repair_check()
 {
 	#dd first sector
 	dd if=/dev/zero of=$1 bs=$2 count=1 2>&1 | _filter_dd
 	#xfs_repair
-	_scratch_xfs_repair 2>&1 | _filter_repair
+	_scratch_xfs_repair 2>&1 | filter_repair
 	#check repair
 	if _check_scratch_fs; then
         	echo "repair passed"
diff --git a/tests/xfs/178.out b/tests/xfs/178.out
index 0bebe553eb..711e90cc26 100644
--- a/tests/xfs/178.out
+++ b/tests/xfs/178.out
@@ -9,7 +9,6 @@ Phase 1 - find and verify superblock...
 bad primary superblock - bad magic number !!!
 
 attempting to find secondary superblock...
-found candidate secondary superblock...
 verified secondary superblock...
 writing modified primary superblock
 sb root inode INO inconsistent with calculated value INO
@@ -45,7 +44,6 @@ Phase 1 - find and verify superblock...
 bad primary superblock - bad magic number !!!
 
 attempting to find secondary superblock...
-found candidate secondary superblock...
 verified secondary superblock...
 writing modified primary superblock
 sb root inode INO inconsistent with calculated value INO



* [PATCH 2/3] generic/465: only complain about stale disk contents when racing directio
  2023-10-09 18:18 [PATCHSET 0/3] fstests: random fixes for v2023.10.08 Darrick J. Wong
  2023-10-09 18:18 ` [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks Darrick J. Wong
@ 2023-10-09 18:18 ` Darrick J. Wong
  2023-10-10  7:03   ` Christoph Hellwig
  2023-10-09 18:18 ` [PATCH 3/3] generic/269,xfs/051: don't drop fsstress failures to stdout Darrick J. Wong
  2 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2023-10-09 18:18 UTC (permalink / raw)
  To: djwong, zlang; +Cc: tytso, jack, linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

This test does a strange thing with directio -- it races a reader thread
with an appending aio writer thread and checks that the reader thread
only ever sees a (probably short) buffer containing the same contents
that are being read.

However, this has never worked correctly on XFS, which supports
concurrent readers and writers for directio.  Say you already have a
file with a single written mapping A:

AAAAAAAAAA
0        EOF

Then one thread initiates an aligned appending write:

AAAAAAAAAA---------
0        EOF      new_EOF

However, the free space is fragmented, so the file range maps to
multiple extents b and c (lowercase means unwritten here):

AAAAAAAAAAbbbbccccc
0        EOF      new_EOF

This implies separate bios for b and c.  Both bios are issued, but c
completes first.  The ioend for c will extend i_size all the way to
new_EOF.  Extent b is still marked unwritten because it hasn't completed
yet.

Next, the test reader slips in and tries to read the range between the
old EOF and the new EOF.  The file looks like this now:

AAAAAAAAAAbbbbCCCCC
0        EOF      new_EOF

So the reader sees "bbbbCCCCC" in the mapping, and the buffer returned
contains a range of zeroes followed by whatever was written to C.

For pagecache IO I would say that i_size should not be extended until
the extending write is fully complete, but the pagecache also
coordinates access so that reads and writes cannot conflict.

However, this is directio.  Reads and writes to the storage device can
be issued and acknowledged in any order.  I asked Ted and Jan about this
point, and they echoed that for directio, application software is
expected to coordinate access itself.

In other words, the only thing that the reader can check here is that
the filesystem is not returning stale disk contents.  Amend the test so
that null bytes in the reader buffer are acceptable.

Cc: tytso@mit.edu, jack@suse.cz
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 .../aio-dio-append-write-read-race.c               |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/src/aio-dio-regress/aio-dio-append-write-read-race.c b/src/aio-dio-regress/aio-dio-append-write-read-race.c
index 911f27230b..d9f8982f00 100644
--- a/src/aio-dio-regress/aio-dio-append-write-read-race.c
+++ b/src/aio-dio-regress/aio-dio-append-write-read-race.c
@@ -191,7 +191,7 @@ int main(int argc, char *argv[])
 		}
 
 		for (j = 0; j < rdata.read_sz; j++) {
-			if (rdata.buf[j] != 'a') {
+			if (rdata.buf[j] != 'a' && rdata.buf[j] != 0) {
 				fail("encounter an error: "
 					"block %d offset %d, content %x\n",
 					i, j, rbuf[j]);



* [PATCH 3/3] generic/269,xfs/051: don't drop fsstress failures to stdout
  2023-10-09 18:18 [PATCHSET 0/3] fstests: random fixes for v2023.10.08 Darrick J. Wong
  2023-10-09 18:18 ` [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks Darrick J. Wong
  2023-10-09 18:18 ` [PATCH 2/3] generic/465: only complain about stale disk contents when racing directio Darrick J. Wong
@ 2023-10-09 18:18 ` Darrick J. Wong
  2023-10-10  7:03   ` Christoph Hellwig
  2 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2023-10-09 18:18 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Prior to commit f55e46d629, these two tests would run fsstress until it
hit a failure -- ENOSPC in the case of generic/269, and EIO in the case
of xfs/051.  These errors are expected, which was why stderr was also
redirected to /dev/null.  Commit f55e46d629 removed the stderr
redirection, which has resulted in a 100% failure rate.

Fix this regression by redirecting the stderr stream to $seqres.full as well.
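
The difference between the two redirections can be reproduced outside the
harness (a minimal sketch; `demo` and `log` are hypothetical names, and
'&>>' is a bashism):

```shell
log=$(mktemp)
demo() { echo out; echo err >&2; }

# '>>' appends stdout only; stderr escapes (silenced here for the demo).
demo >> "$log" 2>/dev/null
# '&>>' appends both streams, so expected EIO/ENOSPC chatter lands in
# the log instead of the test's golden stdout.
demo &>> "$log"

count=$(grep -c . "$log")	# 3 lines: out, out, err
echo "lines captured: $count"
rm -f "$log"
```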

Fixes: f55e46d629 ("fstests: redirect fsstress' stdout to $seqres.full instead of /dev/null")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/generic/269 |    2 +-
 tests/xfs/051     |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)


diff --git a/tests/generic/269 b/tests/generic/269
index b852f6bf7e..b7cdecd94f 100755
--- a/tests/generic/269
+++ b/tests/generic/269
@@ -23,7 +23,7 @@ _workout()
 	out=$SCRATCH_MNT/fsstress.$$
 	args=`_scale_fsstress_args -p128 -n999999999 -f setattr=1 $FSSTRESS_AVOID -d $out`
 	echo "fsstress $args" >> $seqres.full
-	$FSSTRESS_PROG $args >> $seqres.full &
+	$FSSTRESS_PROG $args &>> $seqres.full &
 	pid=$!
 	echo "Run dd writers in parallel"
 	for ((i=0; i < num_iterations; i++))
diff --git a/tests/xfs/051 b/tests/xfs/051
index 1c6709648d..aca867c940 100755
--- a/tests/xfs/051
+++ b/tests/xfs/051
@@ -38,7 +38,7 @@ _scratch_mount
 
 # Start a workload and shutdown the fs. The subsequent mount will require log
 # recovery.
-$FSSTRESS_PROG -n 9999 -p 2 -w -d $SCRATCH_MNT >> $seqres.full &
+$FSSTRESS_PROG -n 9999 -p 2 -w -d $SCRATCH_MNT &>> $seqres.full &
 sleep 5
 _scratch_shutdown -f
 $KILLALL_PROG -q $FSSTRESS_PROG



* Re: [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks
  2023-10-09 18:18 ` [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks Darrick J. Wong
@ 2023-10-10  7:01   ` Christoph Hellwig
  2023-10-11 19:10     ` Darrick J. Wong
  2023-10-12  8:30   ` Christoph Hellwig
  2023-10-12 15:09   ` [PATCH v2 " Darrick J. Wong
  2 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2023-10-10  7:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, linux-xfs, fstests, guan, linux-block, linux-scsi

On Mon, Oct 09, 2023 at 11:18:33AM -0700, Darrick J. Wong wrote:
> The storage advertises SCSI UNMAP support, but it is of the variety
> where the UNMAP command returns immediately but takes its time to unmap
> in the background.  Subsequent rereads are allowed to return stale
> contents, per DISCARD semantics.
> 
> When the fstests cloud is not busy, the old contents disappear in a few
> seconds.  However, at peak utilization, there are ~75 VMs running, and
> the storage backend can take several minutes to commit these background
> requests.

Umm, that is not valid behavior for SCSI UNMAP or any other command
that Linux discard maps to.  All of them can do one of two things
on a per-block basis:

 - return the unmap pattern (usually but not always 0) for any read
   following the unmap/trim/discard
 - always return the previous pattern until it is overwritten or
   discarded again

Changing the pattern some time after unmap is a grave bug, and we need
to blacklist the device.



* Re: [PATCH 2/3] generic/465: only complain about stale disk contents when racing directio
  2023-10-09 18:18 ` [PATCH 2/3] generic/465: only complain about stale disk contents when racing directio Darrick J. Wong
@ 2023-10-10  7:03   ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2023-10-10  7:03 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, tytso, jack, linux-xfs, fstests, guan

Yes, this looks sensible.  I've actually seen sporadic failures from
this test for a while and had it on my list to investigate.

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 3/3] generic/269,xfs/051: don't drop fsstress failures to stdout
  2023-10-09 18:18 ` [PATCH 3/3] generic/269,xfs/051: don't drop fsstress failures to stdout Darrick J. Wong
@ 2023-10-10  7:03   ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2023-10-10  7:03 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, linux-xfs, fstests, guan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks
  2023-10-10  7:01   ` Christoph Hellwig
@ 2023-10-11 19:10     ` Darrick J. Wong
  2023-10-12  4:26       ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2023-10-11 19:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: zlang, linux-xfs, fstests, guan, linux-block, linux-scsi

On Tue, Oct 10, 2023 at 12:01:33AM -0700, Christoph Hellwig wrote:
> On Mon, Oct 09, 2023 at 11:18:33AM -0700, Darrick J. Wong wrote:
> > The storage advertises SCSI UNMAP support, but it is of the variety
> > where the UNMAP command returns immediately but takes its time to unmap
> > in the background.  Subsequent rereads are allowed to return stale
> > contents, per DISCARD semantics.
> > 
> > When the fstests cloud is not busy, the old contents disappear in a few
> > seconds.  However, at peak utilization, there are ~75 VMs running, and
> > the storage backend can take several minutes to commit these background
> > requests.
> 
> Umm, that is not valid behavior for SCSI UNMAP or any other command
> that Linux discard maps to.  All of them can do one of two things
> on a per-block basis:
> 
>  - return the unmap pattern (usually but not always 0) for any read
>    following the unmap/trim/discard
>  - always return the previous pattern until it is overwritten or
>    discarded again
> 
> Changing the pattern some time after unmap is a grave bug, and we need
> to blacklist the device.

Ok, I'll go pester them about fixing that, if they haven't already.
Apparently discard support is somewhat new.

I'm pretty sure I've seen some NVME SSDs where you can issue devicewide
DISCARDs and slowly watch the namespace utilization go down over tens of
minutes; and reads will only eventually start returning zeroes.

(Note that *writes* during the slow-discard period are persisted
correctly.)

However, that's orthogonal to this patch -- if the device doesn't
support discard, _scratch_mkfs won't zero the entire disk to remove old
dead superblocks that might have been written by previous tests.  After
we shatter the primary super, the xfs_repair scanning code can still
trip over those old supers and break the golden output.

--D


* Re: [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks
  2023-10-11 19:10     ` Darrick J. Wong
@ 2023-10-12  4:26       ` Christoph Hellwig
  2023-10-12  5:03         ` Darrick J. Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2023-10-12  4:26 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, zlang, linux-xfs, fstests, guan, linux-block,
	linux-scsi

On Wed, Oct 11, 2023 at 12:10:25PM -0700, Darrick J. Wong wrote:
> I'm pretty sure I've seen some NVME SSDs where you can issue devicewide
> DISCARDs and slowly watch the namespace utilization go down over tens of
> minutes; and reads will only eventually start returning zeroes.

Well, the second part is broken.  The first part is fine, and I've briefly
consulted with a firmware team implementing such a feature.  It just needs
to make sure to return zeroes right after the return of the discard
even if the blocks aren't erased yet, including after a powerfail.
(Anyone who knows the XFS truncate / hole punch code will have a vague
idea of how that could work.)

> However, that's orthogonal to this patch -- if the device doesn't
> support discard, _scratch_mkfs won't zero the entire disk to remove old
> dead superblocks that might have been written by previous tests.  After
> we shatter the primary super, the xfs_repair scanning code can still
> trip over those old supers and break the golden output.

True.  I have to admit I stopped reading the patch after the unmap
description.  I'll take another look.



* Re: [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks
  2023-10-12  4:26       ` Christoph Hellwig
@ 2023-10-12  5:03         ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2023-10-12  5:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: zlang, linux-xfs, fstests, guan, linux-block, linux-scsi

On Wed, Oct 11, 2023 at 09:26:51PM -0700, Christoph Hellwig wrote:
> On Wed, Oct 11, 2023 at 12:10:25PM -0700, Darrick J. Wong wrote:
> > I'm pretty sure I've seen some NVME SSDs where you can issue devicewide
> > DISCARDs and slowly watch the namespace utilization go down over tens of
> > minutes; and reads will only eventually start returning zeroes.
> 
> Well, the second part is broken.  The first part is fine, and I've briefly
> consulted with a firmware team implementing such a feature.  It just needs
> to make sure to return zeroes right after the return of the discard
> even if the blocks aren't erased yet, including after a powerfail.
> (anyone who knows the XFS truncate / hole punch code will have a vague
> idea of how that could work).
> 
> > However, that's orthogonal to this patch -- if the device doesn't
> > support discard, _scratch_mkfs won't zero the entire disk to remove old
> > dead superblocks that might have been written by previous tests.  After
> > we shatter the primary super, the xfs_repair scanning code can still
> > trip over those old supers and break the golden output.
> 
> True.  I have to admit I stopped reading the patch after the unmap
> description.  I'll take another look.

<nod> I think I'll update the next version of this patch to substitute
the paragraph that I wrote above for all the misleading ramblings about
DISCARD.  Today's revisit of that clod block device doesn't show the
weird stale reads that disappear behavior, so it's entirely possible
that they've fixed it already.

--D


* Re: [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks
  2023-10-09 18:18 ` [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks Darrick J. Wong
  2023-10-10  7:01   ` Christoph Hellwig
@ 2023-10-12  8:30   ` Christoph Hellwig
  2023-10-12 15:09   ` [PATCH v2 " Darrick J. Wong
  2 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2023-10-12  8:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, linux-xfs, fstests, guan

Unlike the commit log, the changes look good to me.


* [PATCH v2 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks
  2023-10-09 18:18 ` [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks Darrick J. Wong
  2023-10-10  7:01   ` Christoph Hellwig
  2023-10-12  8:30   ` Christoph Hellwig
@ 2023-10-12 15:09   ` Darrick J. Wong
  2023-10-12 15:10     ` Christoph Hellwig
  2 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2023-10-12 15:09 UTC (permalink / raw)
  To: zlang, Christoph Hellwig; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

When I added an fstests config for "RAID" striping (aka MKFS_OPTIONS='-d
su=128k,sw=4'), I suddenly started seeing this test fail sporadically
with:

--- /tmp/fstests/tests/xfs/178.out	2023-07-11 12:18:21.714970364 -0700
+++ /var/tmp/fstests/xfs/178.out.bad	2023-07-25 22:05:39.756000000 -0700
@@ -10,6 +10,20 @@ bad primary superblock - bad magic numbe

 attempting to find secondary superblock...
 found candidate secondary superblock...
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
+error reading superblock 1 -- seek to offset 584115421184 failed
+unable to verify superblock, continuing...
+found candidate secondary superblock...
 verified secondary superblock...
 writing modified primary superblock
 sb root inode INO inconsistent with calculated value INO

Eventually I tracked this down to a mis-interaction between the test,
xfs_repair, and the storage device.

If the device doesn't support discard, _scratch_mkfs won't zero the
entire disk to remove old dead superblocks that might have been written
by previous tests.  After we shatter the primary super, the xfs_repair
scanning code can still trip over those old supers when it goes looking
for secondary supers.

Most of the time it finds the actual AG 1 secondary super, but sometimes
it finds ghosts from previous formats.  When that happens, xfs_repair
will talk quite a bit about those failed secondaries, even if it
eventually finds an acceptable secondary sb and completes the repair.

Filter out the messages about secondary supers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
v2: fix commit message to identify the problem in fstests, drop the
irrelevant mumbling about SCSI UNMAP
---
 tests/xfs/178     |    9 ++++++++-
 tests/xfs/178.out |    2 --
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/tests/xfs/178 b/tests/xfs/178
index a65197cde3..fee1e92bf3 100755
--- a/tests/xfs/178
+++ b/tests/xfs/178
@@ -10,13 +10,20 @@
 . ./common/preamble
 _begin_fstest mkfs other auto
 
+filter_repair() {
+	_filter_repair | sed \
+		-e '/unable to verify superblock, continuing/d' \
+		-e '/found candidate secondary superblock/d' \
+		-e '/error reading superblock.*-- seek to offset/d'
+}
+
 # dd the 1st sector then repair
 _dd_repair_check()
 {
 	#dd first sector
 	dd if=/dev/zero of=$1 bs=$2 count=1 2>&1 | _filter_dd
 	#xfs_repair
-	_scratch_xfs_repair 2>&1 | _filter_repair
+	_scratch_xfs_repair 2>&1 | filter_repair
 	#check repair
 	if _check_scratch_fs; then
         	echo "repair passed"
diff --git a/tests/xfs/178.out b/tests/xfs/178.out
index 0bebe553eb..711e90cc26 100644
--- a/tests/xfs/178.out
+++ b/tests/xfs/178.out
@@ -9,7 +9,6 @@ Phase 1 - find and verify superblock...
 bad primary superblock - bad magic number !!!
 
 attempting to find secondary superblock...
-found candidate secondary superblock...
 verified secondary superblock...
 writing modified primary superblock
 sb root inode INO inconsistent with calculated value INO
@@ -45,7 +44,6 @@ Phase 1 - find and verify superblock...
 bad primary superblock - bad magic number !!!
 
 attempting to find secondary superblock...
-found candidate secondary superblock...
 verified secondary superblock...
 writing modified primary superblock
 sb root inode INO inconsistent with calculated value INO


* Re: [PATCH v2 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks
  2023-10-12 15:09   ` [PATCH v2 " Darrick J. Wong
@ 2023-10-12 15:10     ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2023-10-12 15:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, Christoph Hellwig, linux-xfs, fstests, guan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


end of thread, other threads:[~2023-10-12 15:10 UTC | newest]
