From: Brian Foster <bfoster@redhat.com>
To: Zorro Lang <zlang@redhat.com>
Cc: fstests@vger.kernel.org
Subject: Re: [PATCH] generic: test eofblocks race with file extending aio dio writes
Date: Thu, 20 Apr 2017 14:22:30 -0400 [thread overview]
Message-ID: <20170420182228.GA47198@bfoster.bfoster> (raw)
In-Reply-To: <1492701663-9209-1-git-send-email-zlang@redhat.com>
On Thu, Apr 20, 2017 at 11:21:03PM +0800, Zorro Lang wrote:
> It's possible for post-eof blocks to end up being used for direct I/O
> writes. dio write performs an upfront unwritten extent allocation, sends
> the dio and then updates the inode size (if necessary) on write
> completion. If a file release occurs while a file extending dio write is
> in flight, it is possible to mistake the post-eof blocks for speculative
> preallocation and incorrectly truncate them from the inode. This means
> that the resulting dio write completion can discover a hole and allocate
> new blocks rather than perform unwritten extent conversion.
>
> A kernel warning can be reproduced by generic/299 on XFS:
> XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, \
> file: fs/xfs//xfs_trans.c, line: 309
>
It might be important to point out that this test also exposes file
corruption and/or stale data exposure. E.g., the verification of written
data is an important distinction between this test and generic/299.
> The root cause is that xfs_free_eofblocks() uses i_size to truncate
> post-eof blocks from the inode, but async, file extending direct writes
> do not update i_size until write completion, long after inode locks are
> dropped. Therefore, xfs_free_eofblocks() effectively truncates the inode
> to the incorrect size.
>
> For cover this filesystem corruption testing, write this new case to
> check data integrality manually, not only depend on a kernel warning.
>
> Signed-off-by: Zorro Lang <zlang@redhat.com>
> ---
>
> Hi,
>
> This case is similar with generic/114 which use same aio-dio-eof-race.c
> program. I was planning to write an aio writer, but I found there's
> already a good program written by Eric, so I turn to use it directly :)
>
Well, that is convenient. ;) A few quick notes...
> This case is different with g/114, this case runs aio-dio-eof-race
> with an "open & close" process together. This multi-processes
> maybe trigger a free_eofblocks race with file extending dio write.
>
> Thanks,
> Zorro
>
>
> tests/generic/426 | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++
> tests/generic/426.out | 2 ++
> tests/generic/group | 1 +
> 3 files changed, 90 insertions(+)
> create mode 100755 tests/generic/426
> create mode 100644 tests/generic/426.out
>
> diff --git a/tests/generic/426 b/tests/generic/426
> new file mode 100755
> index 0000000..ff022ad
> --- /dev/null
> +++ b/tests/generic/426
> @@ -0,0 +1,87 @@
> +#! /bin/bash
> +# FS QA Test 426
> +#
> +# Try to trigger a race of free eofblocks and file extending dio writes.
> +# A known bug of XFS has been fixed by "e4229d6 xfs: fix eofblocks race
> +# with file extending async dio writes"
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2017 Red Hat Inc. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_aiodio aio-dio-eof-race
> +
It looks like this tool doesn't do a ton of writing... up to 8MB. Has
that proven to be a reliable reproducer in your testing so far? I'm kind
of wondering if we should update that tool with a flag to accept a
configurable file size, but that's not a big deal if the reproducer is
reliable.
> +# limit the filesystem size, to save the time of filling filesystem
> +_scratch_mkfs_sized $((1024 * 1024 * 1024)) >>$seqres.full 2>&1
> +_scratch_mount
> +
That's 1G... I wonder if we can get away with smaller (a couple hundred
MB perhaps?), since we have to fill the whole thing first.
> +# try to write more bytes than filesystem size to fill the filesystem
> +$XFS_IO_PROG -f -c "pwrite -S 0x55 0 $((1024 * 1024 * 1024 * 2))" \
> + $SCRATCH_MNT/fillfs-$seq 2>/dev/null
> +rm -f $SCRATCH_MNT/fillfs-$seq
> +
> +# start a background aio writer, which does several extending loops
> +# internally and check data integrality
> +$AIO_TEST $SCRATCH_MNT/tst-aio-dio-eof-race.$seq &
> +aio_pid=$!
> +
> +# open & close the file frequently, to trigger xfs_free_eofblocks
> +while true; do
> + cat $SCRATCH_MNT/tst-aio-dio-eof-race.$seq >/dev/null 2>&1
> +done &
> +open_close_pid=$!
> +
We probably want to start the open/close loop before the writer to
reduce the chances of the writer doing much work before the open/close
loop actually starts.
I also wonder if something like the following would be better than using
cat:
while true; do
exec 3<>$SCRATCH_MNT/tst-aio-dio-eof-race.$seq
exec 3<>-
done
... because we can skip useless reads of the file. Something like
'$XFS_IO -c open <file>' may be another option as well.
Brian
> +wait $aio_pid
> +status=$?
> +kill $open_close_pid
> +wait $open_close_pid
> +if [ $status -ne 0 ]; then
> + od -t x1 $SCRATCH_MNT/tst-aio-dio-eof-race.$seq >> $seqres.full
> + exit
> +fi
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/426.out b/tests/generic/426.out
> new file mode 100644
> index 0000000..ad7a01a
> --- /dev/null
> +++ b/tests/generic/426.out
> @@ -0,0 +1,2 @@
> +QA output created by 426
> +Success, all done.
> diff --git a/tests/generic/group b/tests/generic/group
> index 6d6e4f6..70c36f9 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -428,3 +428,4 @@
> 423 auto quick
> 424 auto quick
> 425 auto quick attr
> +426 auto aio rw
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2017-04-20 18:22 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-20 15:21 [PATCH] generic: test eofblocks race with file extending aio dio writes Zorro Lang
2017-04-20 18:22 ` Brian Foster [this message]
2017-04-24 5:32 ` Zorro Lang
2017-04-24 12:16 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170420182228.GA47198@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=fstests@vger.kernel.org \
--cc=zlang@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.