* [PATCH kdevops] xfs: add xfs/242 as failing on xfs_reflink_2k
@ 2024-04-16 23:51 Luis Chamberlain
2024-04-17 0:37 ` Darrick J. Wong
2024-04-17 4:48 ` Dave Chinner
0 siblings, 2 replies; 5+ messages in thread
From: Luis Chamberlain @ 2024-04-16 23:51 UTC (permalink / raw)
To: kdevops; +Cc: linux-xfs, Luis Chamberlain
This test is rather simple, and somehow we managed to capture a
non-crash failure. The test was added to fstests via fstests commit
0c95fadc35c8e450 ("expand 252 with more corner case tests") which
essentially does this:
+ $XFS_IO_PROG $xfs_io_opt -f -c "truncate $block_size" \
+ -c "pwrite 0 $block_size" $sync_cmd \
+ -c "$zero_cmd 128 128" \
+ -c "$map_cmd -v" $testfile | $filter_cmd
The map_cmd in this case is: 'bmap -p'. So the test does:
a) truncates data to the block size
b) sync
c) zero-fills the the blocksize
The xfs_io bmap displays the block mapping for the current open file.
Since our failed delta is:
-0: [0..7]: data
+0: [0..7]: unwritten
So it would seem we somehow managed to race to write, but it never
went anywhere. I can't reproduce yet, but figured I'd put this out
there to at least acknowledge its seen at least once.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
Super rare to trigger this but figured I'd check to see if others have seen
this fail before. This was on vanilla v6.8-rc2. I'm wondering a race is
possible with a guest using sparse files on the host, and the host somehow
incorrectly informing the guest the write is done. btrfs sparse files
were used on the host for the drives used by this guest for scratch drives.
.../fstests/expunges/6.8.0-rc2/xfs/unassigned/xfs_reflink_2k.txt | 1 +
1 file changed, 1 insertion(+)
diff --git a/workflows/fstests/expunges/6.8.0-rc2/xfs/unassigned/xfs_reflink_2k.txt b/workflows/fstests/expunges/6.8.0-rc2/xfs/unassigned/xfs_reflink_2k.txt
index f6ea47b0479f..8b4161f3700e 100644
--- a/workflows/fstests/expunges/6.8.0-rc2/xfs/unassigned/xfs_reflink_2k.txt
+++ b/workflows/fstests/expunges/6.8.0-rc2/xfs/unassigned/xfs_reflink_2k.txt
@@ -19,6 +19,7 @@ xfs/075
xfs/155
xfs/168
xfs/188
+xfs/242 # F:1/2000 non-fatal failure cosmic ray? https://gist.github.com/mcgrof/6ef50311179a65221413a63c0cc8efd1
xfs/270
xfs/301
xfs/502 # F:1/8 korg#218226 xfs assert fs/xfs/xfs_message.c:102
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH kdevops] xfs: add xfs/242 as failing on xfs_reflink_2k
2024-04-16 23:51 [PATCH kdevops] xfs: add xfs/242 as failing on xfs_reflink_2k Luis Chamberlain
@ 2024-04-17 0:37 ` Darrick J. Wong
2024-04-17 1:51 ` Luis Chamberlain
2024-04-17 4:48 ` Dave Chinner
1 sibling, 1 reply; 5+ messages in thread
From: Darrick J. Wong @ 2024-04-17 0:37 UTC (permalink / raw)
To: Luis Chamberlain; +Cc: kdevops, linux-xfs
On Tue, Apr 16, 2024 at 04:51:08PM -0700, Luis Chamberlain wrote:
> This test is rather simple, and somehow we managed to capture a
> non-crash failure. The test was added to fstests via fstests commit
> 0c95fadc35c8e450 ("expand 252 with more corner case tests") which
> essentially does this:
>
> + $XFS_IO_PROG $xfs_io_opt -f -c "truncate $block_size" \
> + -c "pwrite 0 $block_size" $sync_cmd \
> + -c "$zero_cmd 128 128" \
> + -c "$map_cmd -v" $testfile | $filter_cmd
>
> The map_cmd in this case is: 'bmap -p'. So the test does:
>
> a) truncates data to the block size
> b) sync
> c) zero-fills the the blocksize
Which subtest is this?
I've seen periodic failures in xfs/242 that I can't reproduce either:
--- /tmp/fstests/tests/xfs/242.out 2024-02-28 16:20:24.448887914 -0800
+++ /var/tmp/fstests/xfs/242.out.bad 2024-04-15 17:36:46.736000000 -0700
@@ -6,8 +6,7 @@ QA output created by 242
1aca77e2188f52a62674fe8a873bdaba
2. into allocated space
0: [0..127]: data
-1: [128..383]: unwritten
-2: [384..639]: data
+1: [128..639]: unwritten
2f7a72b9ca9923b610514a11a45a80c9
3. into unwritten space
0: [0..639]: unwritten
It's very strange to me that the block map changes but the md5 doesn't?
The pwrite should have written 0xcd into the file and then the space
between 64k and 192K got replaced with an unwritten extent. But
everything between 192K and 320K should still be written data.
--D
> The xfs_io bmap displays the block mapping for the current open file.
> Since our failed delta is:
>
> -0: [0..7]: data
> +0: [0..7]: unwritten
>
> So it would seem we somehow managed to race to write, but it never
> went anywhere. I can't reproduce yet, but figured I'd put this out
> there to at least acknowledge its seen at least once.
>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>
> Super rare to trigger this but figured I'd check to see if others have seen
> this fail before. This was on vanilla v6.8-rc2. I'm wondering a race is
> possible with a guest using sparse files on the host, and the host somehow
> incorrectly informing the guest the write is done. btrfs sparse files
> were used on the host for the drives used by this guest for scratch drives.
>
> .../fstests/expunges/6.8.0-rc2/xfs/unassigned/xfs_reflink_2k.txt | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/workflows/fstests/expunges/6.8.0-rc2/xfs/unassigned/xfs_reflink_2k.txt b/workflows/fstests/expunges/6.8.0-rc2/xfs/unassigned/xfs_reflink_2k.txt
> index f6ea47b0479f..8b4161f3700e 100644
> --- a/workflows/fstests/expunges/6.8.0-rc2/xfs/unassigned/xfs_reflink_2k.txt
> +++ b/workflows/fstests/expunges/6.8.0-rc2/xfs/unassigned/xfs_reflink_2k.txt
> @@ -19,6 +19,7 @@ xfs/075
> xfs/155
> xfs/168
> xfs/188
> +xfs/242 # F:1/2000 non-fatal failure cosmic ray? https://gist.github.com/mcgrof/6ef50311179a65221413a63c0cc8efd1
> xfs/270
> xfs/301
> xfs/502 # F:1/8 korg#218226 xfs assert fs/xfs/xfs_message.c:102
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH kdevops] xfs: add xfs/242 as failing on xfs_reflink_2k
2024-04-17 0:37 ` Darrick J. Wong
@ 2024-04-17 1:51 ` Luis Chamberlain
0 siblings, 0 replies; 5+ messages in thread
From: Luis Chamberlain @ 2024-04-17 1:51 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: kdevops, linux-xfs
On Tue, Apr 16, 2024 at 05:37:55PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 16, 2024 at 04:51:08PM -0700, Luis Chamberlain wrote:
> > This test is rather simple, and somehow we managed to capture a
> > non-crash failure. The test was added to fstests via fstests commit
> > 0c95fadc35c8e450 ("expand 252 with more corner case tests") which
> > essentially does this:
> >
> > + $XFS_IO_PROG $xfs_io_opt -f -c "truncate $block_size" \
> > + -c "pwrite 0 $block_size" $sync_cmd \
> > + -c "$zero_cmd 128 128" \
> > + -c "$map_cmd -v" $testfile | $filter_cmd
> >
> > The map_cmd in this case is: 'bmap -p'. So the test does:
> >
> > a) truncates data to the block size
> > b) sync
> > c) zero-fills the the blocksize
>
> Which subtest is this?
It's the:
17. data -> hole in single block file
> I've seen periodic failures in xfs/242 that I can't reproduce either:
Oh good to hear.
> --- /tmp/fstests/tests/xfs/242.out 2024-02-28 16:20:24.448887914 -0800
> +++ /var/tmp/fstests/xfs/242.out.bad 2024-04-15 17:36:46.736000000 -0700
> @@ -6,8 +6,7 @@ QA output created by 242
> 1aca77e2188f52a62674fe8a873bdaba
> 2. into allocated space
> 0: [0..127]: data
> -1: [128..383]: unwritten
> -2: [384..639]: data
> +1: [128..639]: unwritten
> 2f7a72b9ca9923b610514a11a45a80c9
> 3. into unwritten space
> 0: [0..639]: unwritten
Oh curious, you hit #2 while I saw #17.
VM or bare metal? If VM, real drive or sparse files? Mine guest files
are virtio drives on files placed on the host on an XFS partition, the
guest uses btrfs truncated files for the sparse files and loopback
devices.
ie: TEST_DEV=/dev/loop16
kdevops@base-xfs-reflink-2k ~ $ sudo losetup -a| grep loop16
/dev/loop16: [0038]:268 (/media/sparsefiles/sparse-disk16)
kdevops@base-xfs-reflink-2k ~ $ df -h /media/sparsefiles/sparse-disk16
Filesystem Size Used Avail Use% Mounted on
/dev/vdc 100G 408M 96G 1% /media/sparsefiles
kdevops@base-xfs-reflink-2k ~ $ du -hs /media/sparsefiles/sparse-disk16
70M /media/sparsefiles/sparse-disk16
kdevops@base-xfs-reflink-2k ~ $ mount | grep -E "sparse|loop16"
/dev/vdc on /media/sparsefiles type btrfs (rw,relatime,discard=async,noacl,space_cache=v2,subvolid=5,subvol=/)
/dev/loop16 on /media/test type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
> It's very strange to me that the block map changes but the md5 doesn't?
Great point! How is that possible?
> The pwrite should have written 0xcd into the file and then the space
> between 64k and 192K got replaced with an unwritten extent. But
> everything between 192K and 320K should still be written data.
Luis
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH kdevops] xfs: add xfs/242 as failing on xfs_reflink_2k
2024-04-16 23:51 [PATCH kdevops] xfs: add xfs/242 as failing on xfs_reflink_2k Luis Chamberlain
2024-04-17 0:37 ` Darrick J. Wong
@ 2024-04-17 4:48 ` Dave Chinner
2024-07-11 11:52 ` Long Li
1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2024-04-17 4:48 UTC (permalink / raw)
To: Luis Chamberlain; +Cc: kdevops, linux-xfs
On Tue, Apr 16, 2024 at 04:51:08PM -0700, Luis Chamberlain wrote:
> This test is rather simple, and somehow we managed to capture a
> non-crash failure. The test was added to fstests via fstests commit
> 0c95fadc35c8e450 ("expand 252 with more corner case tests") which
> essentially does this:
>
> + $XFS_IO_PROG $xfs_io_opt -f -c "truncate $block_size" \
> + -c "pwrite 0 $block_size" $sync_cmd \
> + -c "$zero_cmd 128 128" \
> + -c "$map_cmd -v" $testfile | $filter_cmd
>
> The map_cmd in this case is: 'bmap -p'. So the test does:
>
> a) truncates data to the block size
> b) sync
> c) zero-fills the the blocksize
>
> The xfs_io bmap displays the block mapping for the current open file.
> Since our failed delta is:
>
> -0: [0..7]: data
> +0: [0..7]: unwritten
That's most likely a _filter_bmap() issue, not a kernel code bug.
i.e. 'bmap -vp' output looks like:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..231]: 2076367680..2076367911 18 (6251328..6251559) 232 000000
and _filter_bmap has two separate regex matches against different
fields that both trigger "unwritten" output. The first check is
against field 5 which is actually the AG-OFFSET in this output, not
field 7 which is the FLAGS field.
Hence if the ag offset matches '/0[01][01][01][01]/' the filter will
emit 'unwritten' regardless of what the flags say it actually is.
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH kdevops] xfs: add xfs/242 as failing on xfs_reflink_2k
2024-04-17 4:48 ` Dave Chinner
@ 2024-07-11 11:52 ` Long Li
0 siblings, 0 replies; 5+ messages in thread
From: Long Li @ 2024-07-11 11:52 UTC (permalink / raw)
To: Dave Chinner, Luis Chamberlain; +Cc: kdevops, linux-xfs, fstests, zlang
On Wed, Apr 17, 2024 at 02:48:16PM +1000, Dave Chinner wrote:
> On Tue, Apr 16, 2024 at 04:51:08PM -0700, Luis Chamberlain wrote:
> > This test is rather simple, and somehow we managed to capture a
> > non-crash failure. The test was added to fstests via fstests commit
> > 0c95fadc35c8e450 ("expand 252 with more corner case tests") which
> > essentially does this:
> >
> > + $XFS_IO_PROG $xfs_io_opt -f -c "truncate $block_size" \
> > + -c "pwrite 0 $block_size" $sync_cmd \
> > + -c "$zero_cmd 128 128" \
> > + -c "$map_cmd -v" $testfile | $filter_cmd
> >
> > The map_cmd in this case is: 'bmap -p'. So the test does:
> >
> > a) truncates data to the block size
> > b) sync
> > c) zero-fills the the blocksize
> >
> > The xfs_io bmap displays the block mapping for the current open file.
> > Since our failed delta is:
> >
> > -0: [0..7]: data
> > +0: [0..7]: unwritten
>
> That's most likely a _filter_bmap() issue, not a kernel code bug.
>
> i.e. 'bmap -vp' output looks like:
>
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> 0: [0..231]: 2076367680..2076367911 18 (6251328..6251559) 232 000000
>
> and _filter_bmap has two separate regex matches against different
> fields that both trigger "unwritten" output. The first check is
> against field 5 which is actually the AG-OFFSET in this output, not
> field 7 which is the FLAGS field.
>
> Hence if the ag offset matches '/0[01][01][01][01]/' the filter will
> emit 'unwritten' regardless of what the flags say it actually is.
>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
I got a similar failure in xfs/242 as follows. It can be easily reproduced
when I run xfs/242 as a cyclic test. The root cause, as Dave pointed out,
is that _filter_bmap first matches the data in column 5, which may be
incorrect. On the other hand, _filter_bmap seems to be missing "next" to
jump out when it matches "data" in the 5th column, otherwise it might
print the result twice. The issue was introduced by commit 7d5d3f77154e
("xfs/242: fix _filter_bmap for xfs_io bmap that does rt file properly").
The failure disappeared when I retest xfs/242 by reverted commit 7d5d3f77154e.
The problem is not fixed yet. How about matching the 7th column first
and then the 5th column in _filter_bmap? because the rtdev file only has
5 columns in the `bmap -vp` output.
Best Regards,
Long Li
----------------------------------------------------------------------------
FSTYP -- xfs (debug)
PLATFORM -- Linux/aarch64 localhost 6.6.0+ #84 SMP Tue Jul 9 23:35:56 CST 2024
VMIP -- 192.168.250.78
MKFS_OPTIONS -- -f -m crc=1,rmapbt=0,reflink=0 /dev/sdb
MOUNT_OPTIONS -- /dev/sdb /tmp/scratch
xfs/242 1s ... - output mismatch (see /root/xfstests-dev/results//xfs/242.out.bad)
--- tests/xfs/242.out 2024-06-06 19:08:46.677638130 +0800
+++ /root/xfstests-dev/results//xfs/242.out.bad 2024-07-11 02:24:35.337554580 +0800
@@ -57,8 +57,7 @@
1aca77e2188f52a62674fe8a873bdaba
13. data -> unwritten -> data
0: [0..127]: data
-1: [128..511]: unwritten
-2: [512..639]: data
+1: [128..639]: unwritten
0bcfc7652751f8fe46381240ccadd9d7
...
(Run 'diff -u /root/xfstests-dev/tests/xfs/242.out /root/xfstests-dev/results//xfs/242.out.bad' to see the entire diff)
Ran: xfs/242
Failures: xfs/242
Failed 1 of 1 tests
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-07-11 11:41 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-16 23:51 [PATCH kdevops] xfs: add xfs/242 as failing on xfs_reflink_2k Luis Chamberlain
2024-04-17 0:37 ` Darrick J. Wong
2024-04-17 1:51 ` Luis Chamberlain
2024-04-17 4:48 ` Dave Chinner
2024-07-11 11:52 ` Long Li
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox