From: Brian Foster <bfoster@redhat.com>
To: kaixuxia <xiakaixu1987@gmail.com>
Cc: fstests@vger.kernel.org, linux-xfs@vger.kernel.org,
Eryu Guan <guaneryu@gmail.com>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
newtongao@tencent.com, jasperwang@tencent.com
Subject: Re: [PATCH v3 2/2] xfs: test the deadlock between the AGI and AGF with RENAME_WHITEOUT
Date: Fri, 20 Sep 2019 08:20:59 -0400 [thread overview]
Message-ID: <20190920122059.GA40150@bfoster> (raw)
In-Reply-To: <fed287ab-c083-64d5-d934-da7d11e24917@gmail.com>
On Fri, Sep 20, 2019 at 05:18:00PM +0800, kaixuxia wrote:
>
> On 2019/9/19 20:26, Brian Foster wrote:
> > On Thu, Sep 19, 2019 at 08:14:10PM +0800, kaixuxia wrote:
> >>
> >>
> >> On 2019/9/19 18:47, Brian Foster wrote:
> >>> On Thu, Sep 19, 2019 at 05:08:04PM +0800, kaixuxia wrote:
> >>>>
> >>>>
> >>>> On 2019/9/18 21:59, Brian Foster wrote:
> >>>>> On Wed, Sep 18, 2019 at 07:49:22PM +0800, kaixuxia wrote:
> >>>>>> There is ABBA deadlock bug between the AGI and AGF when performing
> >>>>>> rename() with RENAME_WHITEOUT flag, and add this testcase to make
> >>>>>> sure the rename() call works well.
> >>>>>>
> >>>>>> Signed-off-by: kaixuxia <kaixuxia@tencent.com>
> >>>>>> ---
> >>>>>
> >>>>> FYI, for some reason your patch series isn't threaded on the mailing
> >>>>> list. I thought git send-email did this by default. Assuming you're not
> >>>>> explicitly using --no-thread, you might have to use the --thread option
> >>>>> so this gets posted as a proper series.
> >>>>>
> >>>> Yeah, thanks!
> >>>>>> tests/xfs/512 | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>> tests/xfs/512.out | 2 ++
> >>>>>> tests/xfs/group | 1 +
> >>>>>> 3 files changed, 99 insertions(+)
> >>>>>> create mode 100755 tests/xfs/512
> >>>>>> create mode 100644 tests/xfs/512.out
> >>>>>>
> >>>>>> diff --git a/tests/xfs/512 b/tests/xfs/512
> >>>>>> new file mode 100755
> >>>>>> index 0000000..a2089f0
> >>>>>> --- /dev/null
> >>>>>> +++ b/tests/xfs/512
> >>>>>> @@ -0,0 +1,96 @@
> >>>>>> +#! /bin/bash
> >>>>>> +# SPDX-License-Identifier: GPL-2.0
> >>>>>> +# Copyright (c) 2019 Tencent. All Rights Reserved.
> >>>>>> +#
> >>>>>> +# FS QA Test 512
> >>>>>> +#
> >>>>>> +# Test the ABBA deadlock case between the AGI and AGF When performing
> >>>>>> +# rename operation with RENAME_WHITEOUT flag.
> >>>>>> +#
> >>>>>> +seq=`basename $0`
> >>>>>> +seqres=$RESULT_DIR/$seq
> >>>>>> +echo "QA output created by $seq"
> >>>>>> +
> >>>>>> +here=`pwd`
> >>>>>> +tmp=/tmp/$$
> >>>>>> +status=1 # failure is the default!
> >>>>>> +trap "_cleanup; exit \$status" 0 1 2 3 15
> >>>>>> +
> >>>>>> +_cleanup()
> >>>>>> +{
> >>>>>> + cd /
> >>>>>> + rm -f $tmp.*
> >>>>>> +}
> >>>>>> +
> >>>>>> +# get standard environment, filters and checks
> >>>>>> +. ./common/rc
> >>>>>> +. ./common/filter
> >>>>>> +. ./common/renameat2
> >>>>>> +
> >>>>>> +rm -f $seqres.full
> >>>>>> +
> >>>>>> +# real QA test starts here
> >>>>>> +_supported_fs xfs
> >>>>>> +_supported_os Linux
> >>>>>> +# single AG will cause default xfs_repair to fail. This test need a
> >>>>>> +# single AG fs, so ignore the check.
> >>>>>> +_require_scratch_nocheck
> >>>>>> +_requires_renameat2 whiteout
> >>>>>> +
> >>>>>> +filter_enospc() {
> >>>>>> + sed -e '/^.*No space left on device.*/d'
> >>>>>> +}
> >>>>>> +
> >>>>>> +create_file()
> >>>>>> +{
> >>>>>> + local target_dir=$1
> >>>>>> + local files_count=$2
> >>>>>> +
> >>>>>> + for i in $(seq 1 $files_count); do
> >>>>>> + echo > $target_dir/f$i >/dev/null 2>&1 | filter_enospc
> >>>>>> + done
> >>>>>> +}
> >>>>>> +
> >>>>>> +rename_whiteout()
> >>>>>> +{
> >>>>>> + local target_dir=$1
> >>>>>> + local files_count=$2
> >>>>>> +
> >>>>>> + # a long filename could increase the possibility that target_dp
> >>>>>> + # allocate new blocks(acquire the AGF lock) to store the filename
> >>>>>> + longnamepre=`$PERL_PROG -e 'print "a"x200;'`
> >>>>>> +
> >>>>>> + # now try to do rename with RENAME_WHITEOUT flag
> >>>>>> + for i in $(seq 1 $files_count); do
> >>>>>> + src/renameat2 -w $SCRATCH_MNT/f$i $target_dir/$longnamepre$i >/dev/null 2>&1
> >>>>>> + done
> >>>>>> +}
> >>>>>> +
> >>>>>> +_scratch_mkfs_xfs -d agcount=1 >> $seqres.full 2>&1 ||
> >>>>>> + _fail "mkfs failed"
> >>>>>
> >>>>> This appears to be the only XFS specific bit. Could it be
> >>>>> conditionalized using FSTYP such that this test could go under
> >>>>> tests/generic?
> >>>>>
> >>>> OK, I'll move this test to tests/generic by using FSTYP.
> >>>>
> >>>>>> +_scratch_mount
> >>>>>> +
> >>>>>> +# set the rename and create file counts
> >>>>>> +file_count=50000
> >>>>>> +
> >>>>>> +# create the necessary directory for create and rename operations
> >>>>>> +createdir=$SCRATCH_MNT/createdir
> >>>>>> +mkdir $createdir
> >>>>>> +renamedir=$SCRATCH_MNT/renamedir
> >>>>>> +mkdir $renamedir
> >>>>>> +
> >>>>>> +# create many small files for the rename with RENAME_WHITEOUT
> >>>>>> +create_file $SCRATCH_MNT $file_count
> >>>>>> +
> >>>>>> +# try to create files at the same time to hit the deadlock
> >>>>>> +rename_whiteout $renamedir $file_count &
> >>>>>> +create_file $createdir $file_count &
> >>>>>> +
> >>>>>
> >>>>> When I ran this test I noticed that the rename_whiteout task completed
> >>>>> renaming the 50k files before the create_file task created even 30k of
> >>>>> the 50k files. There's no risk of deadlock once one of these tasks
> >>>>> completes, right? If so, that seems like something that could be fixed
> >>>>> up.
> >>>>>
> >>>>> Beyond that though, the test itself ran for almost 19 minutes on a vm
> >>>>> with the deadlock fix. That seems like overkill to me for a test that's
> >>>>> so narrowly focused on a particular bug that it's unlikely to fail in
> >>>>> the future. If we can't find a way to get this down to a reasonable time
> >>>>> while still reproducing the deadlock, I'm kind of wondering if there's a
> >>>>> better approach to get more rename coverage from existing tests. For
> >>>>> example, could we add this support to fsstress and see if any of the
> >>>>> existing stress tests might trigger the original problem? Even if we
> >>>>> needed to add a new rename/create focused fsstress test, that might at
> >>>>> least be general enough to provide broader coverage.
> >>>>>
> >>>> Yeah, rename_whiteout task run faster than create_file task, so maybe
> >>>> we can set two different files counts for them to reduce the test run
> >>>> time. This test ran for 380s on my vm with the fixed kernel, but we
> >>>> still need to find a way to reduce the run time, like the 19 minutes
> >>>> case. Actually, in most cases, the deadlock happened when the
> >>>> rename_whiteout task completed renaming hundreds of files. 50000
> >>>> is set just because this test take 380s on my vm which is acceptable
> >>>> and the reproduce possibility is near 100%. So maybe we can choose a
> >>>> proper files count to make the test runs faster. Of course, I'll
> >>>> also try to use fsstresss and the TIME_FACTOR if they can help to
> >>>> reduce the run time.
> >>>>
> >>>
> >>> I think using different file counts as such is too unpredictable across
> >>> different test environments. If we end up with something like the
> >>> current test, I'd rather see explicit logic in the test to terminate the
> >>> workload thread when the rename thread completes. This probably would
> >>> have knocked 2-3 minutes off the slow runtime I reported above.
> >>>
> >>> That aside, I think the fsstress approach is preferable because there is
> >>> at least potential to avoid the need for a new test. The relevant
> >>> questions to me are:
> >>>
> >>> 1.) If you add renameat2 support to fsstress, do any of the existing
> >>> fsstress tests reproduce the original problem?
> >>
> >> Not sure about this, need to do research whether there are existing
> >> fsstress tests can reproduce the problem.
> >
> > Right, but this will require some work to fsstress to add renameat2
> > support. To Eryu's earlier point, however, that is probably a useful
> > patch regardless of what approach we take below.
>
> Yeah, if taking the fsstress approach to reproduce the deadlock we need
> to add renameat2 support for fsstress. Given that the deadlock is a corner
> case and need some special settings for higher reproduce possibility, such
> as the smaller bsize and agcount, the longer rename target filename, and all
> the rename call need have the same target_dp to acquire the AGF lock
> (xfs_dir_createname()), so we may need a separate test with using customized
> parameters(limited to rename(whiteout) and creates). If not, the possibility
> would be lower and also need longer run time.
>
> >
> >>>
> >>> 2.) If not, can fsstress reproduce the problem using customized
> >>> parameters (i.e., limited to rename and creates)? If so, we may still
> >>> need a separate test, but it would be trivial in that it just invokes
> >>> fsstress with particular flags for a period of time.
> >>>
> >>> 3.) If not, then we need to find a way for this test to run quicker. At
> >>> this point, I'm curious how long it takes for this test to reproduce the
> >>> problem (on a broken kernel) on average once the setup portion
> >>> completes. More than a minute or two, for example, or tens of minutes..
> >>> etc.?
> >>>
> >> About five minutes with 50000 files count on a broken kernel to reproduce
> >> the deadlock on my vm, and the most time is preparing 50000 empty files for
> >> the rename operation.
> >>
> >
> > Ok, so how much time is that outside of the file creation part? You
> > could add some timestamps to the test to figure that out if necessary.
> > How many files were renamed before the deadlock occurred?
> >
> About one or two minutes that outside of the file creation on my vm.
> The number of renamed files before the deadlock occurred is not fixed,
> the most common range is from 500 to 10000 files. Of course, this range
> maybe change on different test environments...
>
Ok. So FWIW a 10k file count and immediate exit after the renameat task
completes is a less noticeable 3m45s on my low end vm. I'd still prefer
to see an fsstress test if possible, but that might be a reasonable
fallback option.
Brian
> > Brian
> >
> >> A example for deadlock happened when renaming 2729 files.
> >> call trace:
> >> root 31829 ... D+ ... /renamedir/aaaaaaaaaaaaaaaaaaa...aaaaaaaaaaaaaaa2729
> >> # cat /proc/31829/stack
> >> [<0>] xfs_buf_lock+0x34/0xf0 [xfs]
> >> [<0>] xfs_buf_find+0x215/0x6c0 [xfs]
> >> [<0>] xfs_buf_get_map+0x37/0x230 [xfs]
> >> [<0>] xfs_buf_read_map+0x29/0x190 [xfs]
> >> [<0>] xfs_trans_read_buf_map+0x13d/0x520 [xfs]
> >> [<0>] xfs_read_agi+0xa8/0x160 [xfs]
> >> [<0>] xfs_iunlink_remove+0x6f/0x2a0 [xfs]
> >> [<0>] xfs_rename+0x57a/0xae0 [xfs]
> >> [<0>] xfs_vn_rename+0xe4/0x150 [xfs]
> >> [<0>] vfs_rename+0x1f4/0x7b0
> >> [<0>] do_renameat2+0x431/0x4c0
> >> [<0>] __x64_sys_renameat2+0x20/0x30
> >> [<0>] do_syscall_64+0x49/0x120
> >> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>
> >>> Brian
> >>>
> >>>>> Alternatively, what if this test ran a create/rename workload (on a
> >>>>> smaller fileset) for a fixed time of a minute or two and then exited? I
> >>>>> think it would be a reasonable compromise if the test still reproduced
> >>>>> on some smaller frequency, it's just not clear to me how effective such
> >>>>> a test would be without actually trying it. Maybe Eryu has additional
> >>>>> thoughts..
> >>>>>
> >>>>> Brian
> >>>>>
> >>>>>> +wait
> >>>>>> +echo Silence is golden
> >>>>>> +
> >>>>>> +# Failure comes in the form of a deadlock.
> >>>>>> +
> >>>>>> +# success, all done
> >>>>>> +status=0
> >>>>>> +exit
> >>>>>> diff --git a/tests/xfs/512.out b/tests/xfs/512.out
> >>>>>> new file mode 100644
> >>>>>> index 0000000..0aabdef
> >>>>>> --- /dev/null
> >>>>>> +++ b/tests/xfs/512.out
> >>>>>> @@ -0,0 +1,2 @@
> >>>>>> +QA output created by 512
> >>>>>> +Silence is golden
> >>>>>> diff --git a/tests/xfs/group b/tests/xfs/group
> >>>>>> index a7ad300..ed250d6 100644
> >>>>>> --- a/tests/xfs/group
> >>>>>> +++ b/tests/xfs/group
> >>>>>> @@ -509,3 +509,4 @@
> >>>>>> 509 auto ioctl
> >>>>>> 510 auto ioctl quick
> >>>>>> 511 auto quick quota
> >>>>>> +512 auto rename
> >>>>>> --
> >>>>>> 1.8.3.1
> >>>>>>
> >>>>>> --
> >>>>>> kaixuxia
> >>>>
> >>>> --
> >>>> kaixuxia
> >>
> >> --
> >> kaixuxia
>
> --
> kaixuxia
prev parent reply other threads:[~2019-09-20 12:21 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-18 11:49 [PATCH v3 2/2] xfs: test the deadlock between the AGI and AGF with RENAME_WHITEOUT kaixuxia
2019-09-18 13:59 ` Brian Foster
2019-09-19 2:37 ` Eryu Guan
2019-09-19 9:08 ` kaixuxia
2019-09-19 10:47 ` Brian Foster
2019-09-19 12:14 ` kaixuxia
2019-09-19 12:26 ` Brian Foster
2019-09-20 9:18 ` kaixuxia
2019-09-20 12:20 ` Brian Foster [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190920122059.GA40150@bfoster \
--to=bfoster@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=fstests@vger.kernel.org \
--cc=guaneryu@gmail.com \
--cc=jasperwang@tencent.com \
--cc=linux-xfs@vger.kernel.org \
--cc=newtongao@tencent.com \
--cc=xiakaixu1987@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox