From: Dave Chinner <david@fromorbit.com>
To: fstests@vger.kernel.org
Subject: [PATCH 10/40] fstests: fix DM device creation/removal vs udev races
Date: Wed, 27 Nov 2024 15:51:40 +1100
Message-ID: <20241127045403.3665299-11-david@fromorbit.com>
In-Reply-To: <20241127045403.3665299-1-david@fromorbit.com>
From: Dave Chinner <dchinner@redhat.com>
When the system is under load, device nodes for newly created DM
devices are not set up consistently. A new device is supposed to
appear as /dev/dm-X, and a udev rule then creates the symlink from
/dev/mapper/<dev name> to /dev/dm-X.
Unfortunately, the devices created dynamically by many tests
(dmerror, dmflakey) do not end up with this device node structure.
As a result we resolve the wrong short device name for the block
device and hence can't find the sysfs attribute directory for the
filesystem on that block device.
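To illustrate how the short device name depends on the node layout,
here is a minimal sketch. The helper names short_dev_name and
xfs_sysfs_dir are hypothetical and are not the fstests helpers. The
sketch assumes GNU readlink and that the sysfs directory is named
after the short device name, as the error output in this message
suggests.

```shell
# Hypothetical helper, not the fstests implementation: resolve a device
# path to its short name by following any symlinks to the real node.
short_dev_name()
{
	basename "$(readlink -f "$1")"
}

# Hypothetical helper: the XFS sysfs directory we would expect for the
# filesystem on the given block device.
xfs_sysfs_dir()
{
	echo "/sys/fs/xfs/$(short_dev_name "$1")"
}
```

With the udev layout, a /dev/mapper/<dev name> symlink resolves to
dm-X. When the path is a bare device node, it resolves to <dev name>
itself, which has no matching directory under /sys/fs/xfs.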
For example, with added debug to check what device name was being
passed around and resolved:
generic/489 - output mismatch (see /mnt/xfs/runner-10/results/xfs/generic/489.out.bad)
--- tests/generic/489.out 2022-12-21 15:53:25.503043574 +1100
+++ /mnt/xfs/runner-10/results/xfs/generic/489.out.bad 2024-10-24 10:27:29.767196340 +1100
@@ -1,4 +1,10 @@
QA output created by 489
+./common/rc: line 4955: /sys/fs/xfs/flakey-test.489/error/fail_at_unmount: No such file or directory
+dev: /dev/mapper/flakey-test.489
+resolved dev: /dev/mapper/flakey-test.489
+brw-rw----. 1 root disk 251, 5 Oct 24 10:27 /dev/mapper/flakey-test.489
+./common/rc: line 4955: /sys/fs/xfs/flakey-test.489/error/metadata/EIO/max_retries: No such file or directory
+./common/rc: line 4955: /sys/fs/xfs/flakey-test.489/error/metadata/EIO/retry_timeout_seconds: No such file or directory
...
(Run 'diff -u /home/dave/src/xfstests-dev/tests/generic/489.out /mnt/xfs/runner-10/results/xfs/generic/489.out.bad' to see the entire diff)
Here we see that the block device node is actually at
/dev/mapper/flakey-test.489, not a link to a /dev/dm-X device node.
This implies that the udev rule to create the /dev/dm-X node and
the symlink to it at /dev/mapper/flakey-test.489 has not run, and
something else created the device node.
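A quick way to tell the two layouts apart is to check whether the
/dev/mapper entry is a symlink or a device node. The function below is
only a sketch for debugging; dm_node_layout is a hypothetical name and
is not part of fstests.

```shell
# Hypothetical debugging helper: report how a /dev/mapper entry exists.
# A symlink is the layout udev creates; a bare block node is what we
# see when something other than the udev rule created the entry.
dm_node_layout()
{
	if [ -L "$1" ]; then
		echo "symlink -> $(readlink "$1")"
	elif [ -b "$1" ]; then
		echo "block node"
	elif [ -e "$1" ]; then
		echo "other"
	else
		echo "missing"
	fi
}
```

For the failure above, this would report "block node" for
/dev/mapper/flakey-test.489, matching the 'brw-rw----' mode in the
debug output.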
That looks like a bug in _dmsetup_create(). It creates the new DM
device, then runs 'dmsetup mknodes', then waits for udev to settle.
This means the mknodes command - which makes sure the dm device
nodes exist - is racing with udev to create the device nodes. They
don't use the same rules to create nodes, so we end up with this
broken situation.
'dmsetup mknodes' is considered legacy functionality, intended for
systems that have no udev capability. For systems that have udev
enabled (i.e. all modern distros), mknodes should not be run because
it creates a different device node structure to what udev creates
and can race with udev as we see here.
Fix it by removing the 'dmsetup mknodes' call, as it is not needed to
create the device node layout that the rest of the system expects
to see.
Additionally, _dmsetup_remove() calls 'dmsetup mknodes' and that can
also race with udev and cause issues. Hence we need to remove that
call from the remove operation as well.
Further, 'dmsetup remove' is also subject to races with udev which
result in device removal failing. This problem is documented in the
dmsetup man page, which suggests the use of the "--retry" option.
With this option dmsetup will retry several times over a few seconds
before failing the removal.
This reduces the remove failure rate substantially,
but it can still occasionally fail when the system is under heavy
load and udev processing is very slow. This is fixable, but requires
fstests udev infrastructure changes as it requires udevadm
functionality that is relatively new. Hence that will be done as
a separate fix.
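For illustration only, the retry-before-failing behaviour described
above looks roughly like the loop below. This is a generic sketch, not
how dmsetup implements "--retry"; retry_cmd and its arguments are
hypothetical names.

```shell
# Hypothetical helper: run a command up to <tries> times, sleeping
# <delay> seconds between attempts. Returns 0 on the first success,
# 1 if every attempt fails.
# Usage: retry_cmd <tries> <delay> <command> [args...]
retry_cmd()
{
	local tries=$1
	local delay=$2
	local attempt=1

	shift 2
	while [ "$attempt" -le "$tries" ]; do
		"$@" && return 0
		attempt=$((attempt + 1))
		# Only sleep if another attempt is going to be made.
		[ "$attempt" -le "$tries" ] && sleep "$delay"
	done
	return 1
}
```

Like "--retry", a bounded loop such as this only narrows the race
window. It can still fail if udev processing takes longer than the
total retry time.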
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
common/rc | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/common/rc b/common/rc
index 391370fd5..a601e2c80 100644
--- a/common/rc
+++ b/common/rc
@@ -5162,8 +5162,8 @@ _require_label_get_max()
 _dmsetup_remove()
 {
 	$UDEV_SETTLE_PROG >/dev/null 2>&1
-	$DMSETUP_PROG remove "$@" >>$seqres.full 2>&1
-	$DMSETUP_PROG mknodes >/dev/null 2>&1
+	$DMSETUP_PROG remove --retry "$@" >>$seqres.full 2>&1
+	$UDEV_SETTLE_PROG >/dev/null 2>&1
 }
 
 _dmsetup_create()
@@ -5174,7 +5174,6 @@ _dmsetup_create()
 	# device open won't also fail.
 	$UDEV_SETTLE_PROG >/dev/null 2>&1
 	$DMSETUP_PROG create "$@" >>$seqres.full 2>&1 || return 1
-	$DMSETUP_PROG mknodes >/dev/null 2>&1
 	$UDEV_SETTLE_PROG >/dev/null 2>&1
 }
 
--
2.45.2