From: "Darrick J. Wong" <djwong@kernel.org>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Linux Filesystem Development List <linux-fsdevel@vger.kernel.org>,
fstests@vger.kernel.org
Subject: Re: Flaky test: generic:269 (EBUSY on umount)
Date: Wed, 12 Jun 2024 12:41:36 -0700
Message-ID: <20240612194136.GA2764780@frogsfrogsfrogs>
In-Reply-To: <20240612162948.GA2093190@mit.edu>

On Wed, Jun 12, 2024 at 05:29:48PM +0100, Theodore Ts'o wrote:
> I've been trying to clear various failing or flaky tests, and in that
> context I've been finding that generic/269 is failing with a
> probability of ~5% on a wide variety of test scenarios on ext4, xfs,
> btrfs, and f2fs on 6.10-rc2 and on fs-next. (See below for the
> details; the failure probability ranges from 1% to 10% depending on
> the test config.)
>
> What generic/269 does is to run fsstress and ENOSPC hitters in
> parallel, and checks to make sure the file system is consistent at the
> end of the tests. Failure is caused by the umount of the file system
> failing with EBUSY. I've tried adding a sync and a "sync -f
> $SCRATCH_MNT" before the attempted _scratch_umount, and that doesn't
> seem to change the failure.
>
> However, on a failure, if you sleep for 10 seconds, and then retry the
> unmount, this seems to make the problem go away. This is despite the
> fact that we do wait for the fsstress process to exit --- I vaguely
> recall that there is some kind of RCU failure which means that the
> umount will not reliably succeed under some circumstances. Do we
> think this is the right fix?
>
> (Note: when I tried shortening the sleep 10 to sleep 1, the problem
> came back; so this seems like a real hack. Thoughts?)

I don't see this problem; if you apply this to fstests to turn off
io_uring:

https://lore.kernel.org/fstests/169335095953.3534600.16325849760213190849.stgit@frogsfrogsfrogs/#r

do the problems go away?

--D

> Thanks,
>
> - Ted
>
> diff --git a/tests/generic/269 b/tests/generic/269
> index 29f453735..dad02abf3
> --- a/tests/generic/269
> +++ b/tests/generic/269
> @@ -51,9 +51,12 @@ if ! _workout; then
>  fi
>  
>  if ! _scratch_unmount; then
> +	sleep 10
> +	if ! _scratch_unmount ; then
>  	echo "failed to umount"
>  	status=1
>  	exit
> +	fi
>  fi
>  status=0
>  exit
>
>
> ext4/4k: 50 tests, 2 failures, 1339 seconds
> Flaky: generic/269: 4% (2/50)
> ext4/1k: 50 tests, 5 failures, 1224 seconds
> Flaky: generic/269: 10% (5/50)
> ext4/ext3: 50 tests, 1477 seconds
> ext4/encrypt: 50 tests, 2 failures, 1253 seconds
> Flaky: generic/269: 4% (2/50)
> ext4/nojournal: 50 tests, 1 failures, 1503 seconds
> Flaky: generic/269: 2% (1/50)
> ext4/ext3conv: 50 tests, 4 failures, 1294 seconds
> Flaky: generic/269: 8% (4/50)
> ext4/adv: 50 tests, 2 failures, 1263 seconds
> Flaky: generic/269: 4% (2/50)
> ext4/dioread_nolock: 50 tests, 3 failures, 1327 seconds
> Flaky: generic/269: 6% (3/50)
> ext4/data_journal: 50 tests, 1 failures, 1317 seconds
> Flaky: generic/269: 2% (1/50)
> ext4/bigalloc_4k: 50 tests, 2 failures, 1193 seconds
> Flaky: generic/269: 4% (2/50)
> ext4/bigalloc_1k: 50 tests, 1259 seconds
> ext4/dax: 50 tests, 5 failures, 1136 seconds
> Flaky: generic/269: 10% (5/50)
> xfs/4k: 50 tests, 3 failures, 1211 seconds
> Flaky: generic/269: 6% (3/50)
> xfs/1k: 50 tests, 1219 seconds
> xfs/v4: 50 tests, 4 failures, 1206 seconds
> Flaky: generic/269: 8% (4/50)
> xfs/adv: 50 tests, 1 failures, 1206 seconds
> Flaky: generic/269: 2% (1/50)
> xfs/quota: 50 tests, 2 failures, 1460 seconds
> Flaky: generic/269: 4% (2/50)
> xfs/quota_1k: 50 tests, 1449 seconds
> xfs/dirblock_8k: 50 tests, 1 failures, 1351 seconds
> Flaky: generic/269: 2% (1/50)
> xfs/realtime: 50 tests, 1286 seconds
> xfs/realtime_28k_logdev: 50 tests, 1234 seconds
> xfs/realtime_logdev: 50 tests, 1259 seconds
> xfs/logdev: 50 tests, 3 failures, 1390 seconds
> Flaky: generic/269: 6% (3/50)
> xfs/dax: 50 tests, 1125 seconds
> btrfs/default: 50 tests, 1573 seconds
> f2fs/default: 50 tests, 1471 seconds
> f2fs/encrypt: 50 tests, 1 failures, 1424 seconds
> Flaky: generic/269: 2% (1/50)
> Totals: 1350 tests, 0 skipped, 42 failures, 0 errors, 35449s
>
>
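
For reference, the sleep-and-retry workaround in the quoted diff can be generalized into a bounded retry loop. The following is only a sketch: `retry_cmd` is a hypothetical helper and not part of fstests, and the attempt count and delay are arbitrary. Like the original `sleep 10`, it only papers over the EBUSY and does nothing about whatever is still holding the mount.

```shell
#!/bin/bash
# Hypothetical helper (not an fstests function): run a command until it
# succeeds, giving up after a fixed number of attempts.
# Usage: retry_cmd <max_attempts> <delay_seconds> <command> [args...]
retry_cmd() {
	local max_attempts=$1 delay=$2
	shift 2
	local attempt=1

	# The command runs in the current shell, so shell functions work too.
	while ! "$@"; do
		if [ "$attempt" -ge "$max_attempts" ]; then
			return 1	# every attempt failed
		fi
		attempt=$((attempt + 1))
		sleep "$delay"
	done
	return 0
}
```

In the test this might be used as `retry_cmd 15 1 _scratch_unmount || { echo "failed to umount"; status=1; exit; }`. Since a single one-second sleep was reported as insufficient above, the total wait (attempts times delay) would need to stay on the order of the ten seconds that worked.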
Thread overview: 10+ messages
2024-06-12 16:29 Flaky test: generic:269 (EBUSY on umount) Theodore Ts'o
2024-06-12 19:41 ` Darrick J. Wong [this message]
2024-06-13 21:56 ` Theodore Ts'o
2024-06-13 22:18 ` [PATCH 1/2] generic/269, generic/475: disable io_uring to prevent umount EBUSY flakes Theodore Ts'o
2024-06-13 22:18 ` [PATCH 2/2] generic: new test which tests for an io_uring bug that causes umounts to fail Theodore Ts'o
2024-06-14 4:16 ` Flaky test: generic:269 (EBUSY on umount) Darrick J. Wong
2024-06-14 18:27 ` Theodore Ts'o
2024-06-14 20:44 ` Darrick J. Wong
2024-07-12 2:30 ` Theodore Ts'o
2024-08-23 1:16 ` Darrick J. Wong