linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: fstests@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH v2] tests/xfs: test COW writeback failure when overlapping non-shared blocks
Date: Fri, 17 Dec 2021 10:19:27 -0500	[thread overview]
Message-ID: <Ybyqf6OPsFt/X73C@bfoster> (raw)
In-Reply-To: <20211117180105.GP24282@magnolia>

On Wed, Nov 17, 2021 at 10:01:05AM -0800, Darrick J. Wong wrote:
> On Mon, Oct 25, 2021 at 09:00:53AM -0400, Brian Foster wrote:
> > Test that COW writeback that overlaps non-shared delalloc blocks
> > does not leave around stale delalloc blocks on I/O failure. This
> > triggers assert failures and free space accounting corruption on
> > XFS.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > v2:
> > - Explicitly set COW extent size hint.
> > - Move to tests/xfs.
> > - Various minor cleanups.
> > v1: https://lore.kernel.org/fstests/20211021163959.1887011-1-bfoster@redhat.com/
> > 
> >  tests/xfs/999     | 62 +++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/xfs/999.out |  2 ++
> >  2 files changed, 64 insertions(+)
> >  create mode 100755 tests/xfs/999
> >  create mode 100644 tests/xfs/999.out
> > 
> > diff --git a/tests/xfs/999 b/tests/xfs/999
> > new file mode 100755
> > index 00000000..f27972bc
> > --- /dev/null
> > +++ b/tests/xfs/999
> > @@ -0,0 +1,62 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2021 Red Hat, Inc.  All Rights Reserved.
> > +#
> > +# FS QA Test 999
> > +#
> > +# Test that COW writeback that overlaps non-shared delalloc blocks does not
> > +# leave around stale delalloc blocks on I/O failure. This triggers assert
> > +# failures and free space accounting corruption on XFS.
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto quick clone
> > +
> > +_cleanup()
> > +{
> > +	_cleanup_flakey
> > +	cd /
> > +	rm -r -f $tmp.*
> > +}
> > +
> > +# Import common functions.
> > +. ./common/reflink
> > +. ./common/dmflakey
> > +
> > +# real QA test starts here
> > +_supported_fs xfs
> > +_require_scratch_reflink
> > +_require_cp_reflink
> > +_require_xfs_io_command "cowextsize"
> > +_require_flakey_with_error_writes
> > +
> > +_scratch_mkfs >> $seqres.full
> > +_init_flakey
> > +_mount_flakey
> > +
> > +blksz=$(_get_file_block_size $SCRATCH_MNT)
> > +
> > +# Set the COW extent size hint to guarantee COW fork preallocation occurs over a
> > +# bordering block offset.
> > +$XFS_IO_PROG -c "cowextsize $((blksz * 2))" $SCRATCH_MNT >> $seqres.full
> > +
> > +# create two files that share a single block
> > +$XFS_IO_PROG -fc "pwrite $blksz $blksz" $SCRATCH_MNT/file1 >> $seqres.full
> > +_cp_reflink $SCRATCH_MNT/file1 $SCRATCH_MNT/file2
> > +
> > +# Perform a buffered write across the shared and non-shared blocks. On XFS, this
> > +# creates a COW fork extent that covers the shared block as well as the just
> > +# created non-shared delalloc block. Fail the writeback to verify that all
> > +# delayed allocation is cleaned up properly.
> > +_load_flakey_table $FLAKEY_ERROR_WRITES
> > +$XFS_IO_PROG -c "pwrite 0 $((blksz * 2))" \
> > +	-c fsync $SCRATCH_MNT/file2 >> $seqres.full
> > +_load_flakey_table $FLAKEY_ALLOW_WRITES
> 
> Hmm.  So I've been running this test in my djwong-dev tree and hit this
> last night:
> 
> --- xfs/999.out
> +++ xfs/999.out.bad
> @@ -1,2 +1,3 @@
>  QA output created by 999
> -fsync: Input/output error
> +stat: Input/output error
> +cp: failed to access '/opt/file3': Input/output error
> 
> Digging into the kernel log, I see this happen:
> 
> [10240.821719] XFS (dm-0): Mounting V5 Filesystem
> [10240.855461] XFS (dm-0): Ending clean mount
> [10240.857030] XFS (dm-0): Quotacheck needed: Please wait.
> [10240.860095] XFS (dm-0): Quotacheck: Done.
> [10240.977055] XFS (dm-0): log I/O error -5
> [10240.977459] XFS (dm-0): Log I/O Error (0x2) detected at xlog_ioend_work+0x5f/0xb0 [xfs] (fs/xfs/xfs_log.c:1377).  Shutting down filesystem.
> [10240.978682] XFS (dm-0): Please unmount the filesystem and rectify the problem(s)
> [10241.044886] XFS (dm-0): Unmounting Filesystem
> 
> I guess the log tried to checkpoint for the brief window where the
> flakey table was enabled, and shut down the whole fs?  I don't have any
> good ideas for how to solve this, though.
> 
> Hm.  What if you did something like:
> 
> $XFS_IO_PROG -c 'pwrite...' $SCRATCH_MNT/file2
> _load_flakey_table $FLAKEY_ERROR_WRITES
> $XFS_IO_PROG -c 'sync_range -wa' $SCRATCH_MNT/file2
> +_load_flakey_table $FLAKEY_ALLOW_WRITES
> 

I notice that sync_range doesn't combine options like that even though
the underlying flags can be combined for the syscall. That aside, I
suspect a combination of an fsync on the earlier reflink command and a
couple sync_range calls here (to achieve the intended behavior above) is
probably the best option to try and avoid this. That seems to preserve
the intended behavior of the test and I don't see any spurious failures
in a couple hundred or so iterations.. (granted I don't know how
reproducible that was in the first place).

Brian

> to constrain the window in which disk write will fail?  Seeing as s_f_r
> doesn't actually tell the fs to flush its own metadata or anything.
> 
> (Yikes, did I finally find a use for sync_file_range??)
> 
> --D
> 
> > +
> > +# Try a post-fail reflink and then unmount. Both of these are known to produce
> > +# errors and/or assert failures on XFS if we trip over a stale delalloc block.
> > +_cp_reflink $SCRATCH_MNT/file2 $SCRATCH_MNT/file3
> > +_unmount_flakey
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/xfs/999.out b/tests/xfs/999.out
> > new file mode 100644
> > index 00000000..88b69c4c
> > --- /dev/null
> > +++ b/tests/xfs/999.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 999
> > +fsync: Input/output error
> > -- 
> > 2.31.1
> > 
> 


      reply	other threads:[~2021-12-17 15:19 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-25 13:00 [PATCH v2] tests/xfs: test COW writeback failure when overlapping non-shared blocks Brian Foster
2021-10-25 16:10 ` Darrick J. Wong
2021-10-31 15:20   ` Eryu Guan
2021-11-03 16:10     ` Darrick J. Wong
2021-11-17 18:01 ` Darrick J. Wong
2021-12-17 15:19   ` Brian Foster [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ybyqf6OPsFt/X73C@bfoster \
    --to=bfoster@redhat.com \
    --cc=djwong@kernel.org \
    --cc=fstests@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).