FS/XFS testing framework
From: "Theodore Ts'o" <tytso@mit.edu>
To: Linux Filesystem Development List <linux-fsdevel@vger.kernel.org>,
	fstests@vger.kernel.org
Subject: Flaky test: generic:269 (EBUSY on umount)
Date: Wed, 12 Jun 2024 17:29:48 +0100
Message-ID: <20240612162948.GA2093190@mit.edu>

I've been trying to clear various failing or flaky tests, and in that
context I've been finding that generic/269 is failing with a
probability of ~5% on a wide variety of test scenarios on ext4, xfs,
btrfs, and f2fs on 6.10-rc2 and on fs-next.  (See below for the
details; the failure probability ranges from 1% to 10% depending on
the test config.)

What generic/269 does is run fsstress and ENOSPC hitters in parallel,
and then check that the file system is consistent at the end of the
test.  The failure is caused by the umount of the file system failing
with EBUSY.  I've tried adding a sync and a "sync -f $SCRATCH_MNT"
before the attempted _scratch_unmount, and that doesn't seem to change
the failure rate.

However, on a failure, if you sleep for 10 seconds and then retry the
unmount, the problem seems to go away.  This is despite the fact that
we do wait for the fsstress process to exit.  I vaguely recall that
there is some kind of RCU failure which means that the umount will not
reliably succeed under some circumstances.  Do we think the
sleep-and-retry in the patch below is the right fix?

(Note: when I tried shortening the "sleep 10" to "sleep 1", the problem
came back, so this seems like a real hack.  Thoughts?)
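
One possible alternative to a single fixed sleep is a bounded retry
loop around the unmount.  The following is only a sketch: retry_cmd is
a hypothetical helper, not an existing fstests function, and nothing
above establishes that polling would be more reliable than the fixed
10-second sleep.

```shell
# Hypothetical helper (not part of fstests): run a command up to
# $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry_cmd()
{
	local tries=$1 delay=$2
	shift 2

	local i=1
	while ! "$@"; do
		# Give up once the attempt budget is used up.
		[ "$i" -ge "$tries" ] && return 1
		sleep "$delay"
		i=$((i + 1))
	done
	return 0
}

# Assumed usage in tests/generic/269, in place of the fixed sleep:
#	retry_cmd 10 1 _scratch_unmount
```

With 10 attempts and a 1-second delay the worst-case wait is about the
same as the fixed sleep, but a run where the unmount succeeds early
would not pay the full 10 seconds.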

Thanks,

     	      	   	      	     - Ted

diff --git a/tests/generic/269 b/tests/generic/269
index 29f453735..dad02abf3
--- a/tests/generic/269
+++ b/tests/generic/269
@@ -51,9 +51,12 @@ if ! _workout; then
 fi
 
 if ! _scratch_unmount; then
+    sleep 10
+    if ! _scratch_unmount ; then
 	echo "failed to umount"
 	status=1
 	exit
+    fi
 fi
 status=0
 exit


ext4/4k: 50 tests, 2 failures, 1339 seconds
  Flaky: generic/269:  4% (2/50)
ext4/1k: 50 tests, 5 failures, 1224 seconds
  Flaky: generic/269: 10% (5/50)
ext4/ext3: 50 tests, 1477 seconds
ext4/encrypt: 50 tests, 2 failures, 1253 seconds
  Flaky: generic/269:  4% (2/50)
ext4/nojournal: 50 tests, 1 failures, 1503 seconds
  Flaky: generic/269:  2% (1/50)
ext4/ext3conv: 50 tests, 4 failures, 1294 seconds
  Flaky: generic/269:  8% (4/50)
ext4/adv: 50 tests, 2 failures, 1263 seconds
  Flaky: generic/269:  4% (2/50)
ext4/dioread_nolock: 50 tests, 3 failures, 1327 seconds
  Flaky: generic/269:  6% (3/50)
ext4/data_journal: 50 tests, 1 failures, 1317 seconds
  Flaky: generic/269:  2% (1/50)
ext4/bigalloc_4k: 50 tests, 2 failures, 1193 seconds
  Flaky: generic/269:  4% (2/50)
ext4/bigalloc_1k: 50 tests, 1259 seconds
ext4/dax: 50 tests, 5 failures, 1136 seconds
  Flaky: generic/269: 10% (5/50)
xfs/4k: 50 tests, 3 failures, 1211 seconds
  Flaky: generic/269:  6% (3/50)
xfs/1k: 50 tests, 1219 seconds
xfs/v4: 50 tests, 4 failures, 1206 seconds
  Flaky: generic/269:  8% (4/50)
xfs/adv: 50 tests, 1 failures, 1206 seconds
  Flaky: generic/269:  2% (1/50)
xfs/quota: 50 tests, 2 failures, 1460 seconds
  Flaky: generic/269:  4% (2/50)
xfs/quota_1k: 50 tests, 1449 seconds
xfs/dirblock_8k: 50 tests, 1 failures, 1351 seconds
  Flaky: generic/269:  2% (1/50)
xfs/realtime: 50 tests, 1286 seconds
xfs/realtime_28k_logdev: 50 tests, 1234 seconds
xfs/realtime_logdev: 50 tests, 1259 seconds
xfs/logdev: 50 tests, 3 failures, 1390 seconds
  Flaky: generic/269:  6% (3/50)
xfs/dax: 50 tests, 1125 seconds
btrfs/default: 50 tests, 1573 seconds
f2fs/default: 50 tests, 1471 seconds
f2fs/encrypt: 50 tests, 1 failures, 1424 seconds
  Flaky: generic/269:  2% (1/50)
Totals: 1350 tests, 0 skipped, 42 failures, 0 errors, 35449s
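
The Totals line can be cross-checked against the per-config lines.  A
minimal sketch, assuming the summary above has been saved as plain text
and is fed on stdin; tally_summary is a hypothetical helper, not part
of any test appliance:

```shell
# Hypothetical helper: sum the per-config test and failure counts from
# summary lines such as "ext4/4k: 50 tests, 2 failures, 1339 seconds".
tally_summary()
{
	awk '
		/^Totals:/ { next }	# skip the existing totals line
		/^[^ ]+: [0-9]+ tests/ {
			tests += $2
			# "N failures" appears only for configs that failed
			for (i = 3; i < NF; i++)
				if ($(i + 1) ~ /^failures/)
					fails += $i
		}
		END { printf "%d tests, %d failures\n", tests, fails }
	'
}
```

Run over the per-config lines above, this should agree with the
reported totals of 1350 tests and 42 failures.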

