From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: zlang@redhat.com, hch@lst.de, fstests@vger.kernel.org,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH 08/23] common: fix pkill by running test program in a separate session
Date: Tue, 28 Jan 2025 19:13:13 -0800 [thread overview]
Message-ID: <20250129031313.GV3557695@frogsfrogsfrogs> (raw)
In-Reply-To: <Z5lAek54UK8mdFs-@dread.disaster.area>
On Wed, Jan 29, 2025 at 07:39:22AM +1100, Dave Chinner wrote:
> On Mon, Jan 27, 2025 at 11:23:52PM -0800, Darrick J. Wong wrote:
> > On Tue, Jan 28, 2025 at 03:34:50PM +1100, Dave Chinner wrote:
> > > On Thu, Jan 23, 2025 at 12:16:50PM +1100, Dave Chinner wrote:
> > > 4. /tmp is still shared across all runner instances so all the
> > >
> > > concurrent runners dump all their tmp files in /tmp. However, the
> > > runners no longer have unique PIDs (i.e. check runs as PID 3 in
> > > all runner instaces). This means using /tmp/tmp.$$ as the
> > > check/test temp file definition results is instant tmp file name
> > > collisions and random things in check and tests fail. check and
> > > common/preamble have to be converted to use `mktemp` to provide
> > > unique tempfile name prefixes again.
> > >
> > > 5. Don't forget to revert the parent /proc mount back to shared
> > > after check has finished running (or was aborted).
> > >
> > > I think with this (current prototype patch below), we can use PID
> > > namespaces rather than process session IDs for check-parallel safe
> > > process management.
> > >
> > > Thoughts?
> >
> > Was about to go to bed, but can we also start a new mount namespace,
> > create a private (or at least non-global) /tmp to put files into, and
> > then each test instance is isolated from clobbering the /tmpfiles of
> > other ./check instances *and* the rest of the system?
>
> We probably can. I didn't want to go down that rat hole straight
> away, because then I'd have to make a decision about what to mount
> there. One thing at a time....
>
> I suspect that I can just use a tmpfs filesystem for it - there's
> heaps of memory available on my test machines and we don't use /tmp
> to hold large files, so that should work fine for me. However, I'm
> a little concerned about what will happen when testing under memory
> pressure situations if /tmp needs memory to operate correctly.
>
> I'll have a look at what is needed for private tmpfs /tmp instances
> to work - it should work just fine.
>
> However, if check-parallel has taught me anything, it is that trying
> to use "should work" features on a modern system tends to mean "this
> is a poorly documented rat-hole that with many dead-ends that will
> be explored before a working solution is found"...
<nod> I'm running an experiment overnight with the following patch to
get rid of the session id grossness. AFAICT it can also be used by
check-parallel to start its subprocesses in separate pid namespaces,
though I didn't investigate thoroughly.
I'm also not sure it's required for check-helper to unmount the /proc
that it creates; merely exiting seems to clean everything up? <shrug>
I also tried using systemd-nspawn to run fstests inside a "container"
but very quickly discovered that you can't pass block devices to the
container which makes fstests pretty useless for testing real scsi
devices. :/
--D
check: run tests in a private pid/mount namespace
Experiment with running tests in a private pid/mount namespace for
better isolation of the scheduler (check) vs the test (./$seq). This
also makes it cleaner to adjust the oom score and is a convenient place
to set up the environment before invoking the test.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
check | 30 +++++-------------------------
check-helper | 31 +++++++++++++++++++++++++++++++
common/rc | 8 +++-----
3 files changed, 39 insertions(+), 30 deletions(-)
create mode 100755 check-helper
diff --git a/check b/check
index 00ee5af2a31e34..9c70f6f1e10110 100755
--- a/check
+++ b/check
@@ -698,41 +698,21 @@ _adjust_oom_score -500
# systemd doesn't automatically remove transient scopes that fail to terminate
# when systemd tells them to terminate (e.g. programs stuck in D state when
# systemd sends SIGKILL), so we use reset-failed to tear down the scope.
-#
-# Use setsid to run the test program with a separate session id so that we
-# can pkill only the processes started by this test.
_run_seq() {
- local cmd=(bash -c "test -w ${OOM_SCORE_ADJ} && echo 250 > ${OOM_SCORE_ADJ}; exec setsid bash ./$seq")
+ local cmd=("./check-helper" "./$seq")
if [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then
local unit="$(systemd-escape "fs$seq").scope"
systemctl reset-failed "${unit}" &> /dev/null
- systemd-run --quiet --unit "${unit}" --scope "${cmd[@]}" &
- CHILDPID=$!
- wait
+ systemd-run --quiet --unit "${unit}" --scope "${cmd[@]}"
res=$?
- unset CHILDPID
systemctl stop "${unit}" &> /dev/null
return "${res}"
else
# bash won't run the SIGINT trap handler while there are
# foreground children in a separate session, so we must run
# the test in the background and wait for it.
- "${cmd[@]}" &
- CHILDPID=$!
- wait
- unset CHILDPID
- fi
-}
-
-_kill_seq() {
- if [ -n "$CHILDPID" ]; then
- # SIGPIPE will kill all the children (including fsstress)
- # without bash logging fatal signal termination messages to the
- # console
- pkill -PIPE --session "$CHILDPID"
- wait
- unset CHILDPID
+ "${cmd[@]}"
fi
}
@@ -741,9 +721,9 @@ _prepare_test_list
fstests_start_time="$(date +"%F %T")"
if $OPTIONS_HAVE_SECTIONS; then
- trap "_kill_seq; _summary; exit \$status" 0 1 2 3 15
+ trap "_summary; exit \$status" 0 1 2 3 15
else
- trap "_kill_seq; _wrapup; exit \$status" 0 1 2 3 15
+ trap "_wrapup; exit \$status" 0 1 2 3 15
fi
function run_section()
diff --git a/check-helper b/check-helper
new file mode 100755
index 00000000000000..2cc8dbe5cfc791
--- /dev/null
+++ b/check-helper
@@ -0,0 +1,31 @@
+#!/bin/bash
+
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2025 Oracle. All Rights Reserved.
+#
+# Try starting things in a private pid/mount namespace with a private /tmp
+# and /proc so that child process trees cannot interfere with each other.
+
+if [ -n "${IN_NSEXEC}" ]; then
+ for path in /proc /tmp; do
+ mount --make-private "$path"
+ done
+ unset IN_NSEXEC
+ mount -t proc proc /proc
+ mount -t tmpfs tmpfs /tmp
+
+ oom_knob="/proc/self/oom_score_adj"
+ test -w "${oom_knob}" && echo 250 > "${oom_knob}"
+
+ # Run the test, but don't let it be pid 1 because that will confuse
+ # the filter functions in common/dump.
+ "$@"
+ exit
+fi
+
+if [ -z "$1" ] || [ "$1" = "--help" ]; then
+ echo "Usage: $0 command [args...]"
+ exit 1
+fi
+
+IN_NSEXEC=1 exec "$(dirname "$0")/src/nsexec" -m -p $0 "$@"
diff --git a/common/rc b/common/rc
index 1c28a2d190f5a0..cc080ecaa9f801 100644
--- a/common/rc
+++ b/common/rc
@@ -33,13 +33,13 @@ _test_sync()
# Kill only the test processes started by this test
_pkill()
{
- pkill --session 0 "$@"
+ pkill "$@"
}
# Find only the test processes started by this test
_pgrep()
{
- pgrep --session 0 "$@"
+ pgrep "$@"
}
# Common execution handling for fsstress invocation.
@@ -2817,11 +2817,9 @@ _require_user_exists()
[ "$?" == "0" ] || _notrun "$user user not defined."
}
-# Run all non-root processes in the same session as the root. Believe it or
-# not, passing $SHELL in this manner works both for "su" and "su -c cmd".
_su()
{
- su --session-command $SHELL "$@"
+ su "$@"
}
# check if a user exists and is able to execute commands.
next prev parent reply other threads:[~2025-01-29 3:13 UTC|newest]
Thread overview: 118+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-16 23:21 [PATCHBOMB] fstests: fix check-parallel and get all the xfs 6.13 changes merged Darrick J. Wong
2025-01-16 23:23 ` [PATCHSET 1/7] fstests: random fixes for v2025.01.12 Darrick J. Wong
2025-01-16 23:25 ` [PATCH 01/23] generic/476: fix fsstress process management Darrick J. Wong
2025-01-21 3:03 ` Dave Chinner
2025-01-16 23:25 ` [PATCH 02/23] metadump: make non-local function variables more obvious Darrick J. Wong
2025-01-21 3:06 ` Dave Chinner
2025-01-16 23:25 ` [PATCH 03/23] metadump: fix cleanup for v1 metadump testing Darrick J. Wong
2025-01-21 3:07 ` Dave Chinner
2025-01-16 23:26 ` [PATCH 04/23] generic/482: _run_fsstress needs the test filesystem Darrick J. Wong
2025-01-21 3:12 ` Dave Chinner
2025-01-22 3:27 ` Darrick J. Wong
2025-01-16 23:26 ` [PATCH 05/23] generic/019: don't fail if fio crashes while shutting down Darrick J. Wong
2025-01-21 3:12 ` Dave Chinner
2025-01-16 23:26 ` [PATCH 06/23] fuzzy: do not set _FSSTRESS_PID when exercising fsx Darrick J. Wong
2025-01-21 3:13 ` Dave Chinner
2025-01-16 23:27 ` [PATCH 07/23] common/rc: create a wrapper for the su command Darrick J. Wong
2025-01-16 23:27 ` [PATCH 08/23] common: fix pkill by running test program in a separate session Darrick J. Wong
2025-01-21 3:28 ` Dave Chinner
2025-01-22 4:24 ` Darrick J. Wong
2025-01-22 6:08 ` Dave Chinner
2025-01-22 7:05 ` Darrick J. Wong
2025-01-22 9:42 ` Dave Chinner
2025-01-22 21:46 ` Darrick J. Wong
2025-01-23 1:16 ` Dave Chinner
2025-01-28 4:34 ` Dave Chinner
2025-01-28 7:23 ` Darrick J. Wong
2025-01-28 20:39 ` Dave Chinner
2025-01-29 3:13 ` Darrick J. Wong [this message]
2025-01-29 6:06 ` Dave Chinner
2025-01-29 7:33 ` Darrick J. Wong
2025-01-16 23:27 ` [PATCH 09/23] unmount: resume logging of stdout and stderr for filtering Darrick J. Wong
2025-01-21 3:52 ` Dave Chinner
2025-01-22 3:29 ` Darrick J. Wong
2025-01-16 23:27 ` [PATCH 10/23] mkfs: don't hardcode log size Darrick J. Wong
2025-01-21 3:58 ` Dave Chinner
2025-01-21 12:44 ` Theodore Ts'o
2025-01-21 22:05 ` Dave Chinner
2025-01-22 3:40 ` Darrick J. Wong
2025-01-22 3:36 ` Darrick J. Wong
2025-01-22 3:30 ` Darrick J. Wong
2025-01-16 23:28 ` [PATCH 11/23] common/xfs: find loop devices for non-blockdevs passed to _prepare_for_eio_shutdown Darrick J. Wong
2025-01-21 4:37 ` Dave Chinner
2025-01-22 4:05 ` Darrick J. Wong
2025-01-22 5:21 ` Dave Chinner
2025-01-16 23:28 ` [PATCH 12/23] preamble: fix missing _kill_fsstress Darrick J. Wong
2025-01-21 4:37 ` Dave Chinner
2025-01-16 23:28 ` [PATCH 13/23] generic/650: revert SOAK DURATION changes Darrick J. Wong
2025-01-21 4:57 ` Dave Chinner
2025-01-21 13:00 ` Theodore Ts'o
2025-01-21 22:15 ` Dave Chinner
2025-01-22 3:51 ` Darrick J. Wong
2025-01-22 4:08 ` Theodore Ts'o
2025-01-22 6:01 ` Dave Chinner
2025-01-22 7:02 ` Darrick J. Wong
2025-01-22 3:49 ` Darrick J. Wong
2025-01-22 4:12 ` Dave Chinner
2025-01-22 4:37 ` Darrick J. Wong
2025-01-16 23:28 ` [PATCH 14/23] generic/032: fix pinned mount failure Darrick J. Wong
2025-01-21 5:03 ` Dave Chinner
2025-01-22 4:08 ` Darrick J. Wong
2025-01-22 4:19 ` Dave Chinner
2025-01-16 23:29 ` [PATCH 15/23] fuzzy: stop __stress_scrub_fsx_loop if fsx fails Darrick J. Wong
2025-01-16 23:29 ` [PATCH 16/23] fuzzy: don't use readarray for xfsfind output Darrick J. Wong
2025-01-16 23:29 ` [PATCH 17/23] fuzzy: always stop the scrub fsstress loop on error Darrick J. Wong
2025-01-16 23:29 ` [PATCH 18/23] fuzzy: port fsx and fsstress loop to use --duration Darrick J. Wong
2025-01-16 23:30 ` [PATCH 19/23] common/rc: don't copy fsstress to $TEST_DIR Darrick J. Wong
2025-01-21 5:05 ` Dave Chinner
2025-01-22 3:52 ` Darrick J. Wong
2025-01-16 23:30 ` [PATCH 20/23] fix _require_scratch_duperemove ordering Darrick J. Wong
2025-01-16 23:30 ` [PATCH 21/23] fsstress: fix a memory leak Darrick J. Wong
2025-01-16 23:30 ` [PATCH 22/23] fsx: fix leaked log file pointer Darrick J. Wong
2025-01-16 23:31 ` [PATCH 23/23] build: initialize stack variables to zero by default Darrick J. Wong
2025-01-16 23:23 ` [PATCHSET 2/7] fstests: fix logwrites zeroing Darrick J. Wong
2025-01-16 23:31 ` [PATCH 1/3] logwrites: warn if we don't think read after discard returns zeroes Darrick J. Wong
2025-01-16 23:31 ` [PATCH 2/3] logwrites: use BLKZEROOUT if it's available Darrick J. Wong
2025-01-16 23:31 ` [PATCH 3/3] logwrites: only use BLKDISCARD if we know discard zeroes data Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 3/7] fstests: enable metadir Darrick J. Wong
2025-01-16 23:32 ` [PATCH 01/11] various: fix finding metadata inode numbers when metadir is enabled Darrick J. Wong
2025-01-16 23:32 ` [PATCH 02/11] xfs/{030,033,178}: forcibly disable metadata directory trees Darrick J. Wong
2025-01-16 23:32 ` [PATCH 03/11] common/repair: patch up repair sb inode value complaints Darrick J. Wong
2025-01-16 23:32 ` [PATCH 04/11] xfs/206: update for metadata directory support Darrick J. Wong
2025-01-16 23:33 ` [PATCH 05/11] xfs/{050,144,153,299,330}: update quota reports to handle metadir trees Darrick J. Wong
2025-01-16 23:33 ` [PATCH 06/11] xfs/509: adjust inumbers accounting for metadata directories Darrick J. Wong
2025-01-16 23:33 ` [PATCH 07/11] xfs: create fuzz tests " Darrick J. Wong
2025-01-16 23:34 ` [PATCH 08/11] xfs/163: bigger fs for metadir Darrick J. Wong
2025-01-16 23:34 ` [PATCH 09/11] xfs/122: disable this test for any codebase that knows about metadir Darrick J. Wong
2025-01-16 23:34 ` [PATCH 10/11] scrub: race metapath online fsck with fsstress Darrick J. Wong
2025-01-16 23:34 ` [PATCH 11/11] xfs: test metapath repairs Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 4/7] fstests: make protofiles less janky Darrick J. Wong
2025-01-16 23:35 ` [PATCH 1/1] fstests: test mkfs.xfs protofiles with xattr support Darrick J. Wong
2025-03-02 13:15 ` Zorro Lang
2025-03-04 17:42 ` Darrick J. Wong
2025-03-04 18:00 ` Zorro Lang
2025-03-04 18:21 ` Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 5/7] fstests: shard the realtime section Darrick J. Wong
2025-01-16 23:35 ` [PATCH 01/14] common/populate: refactor caching of metadumps to a helper Darrick J. Wong
2025-01-16 23:35 ` [PATCH 02/14] common/{fuzzy,populate}: use _scratch_xfs_mdrestore Darrick J. Wong
2025-01-16 23:35 ` [PATCH 03/14] fuzzy: stress data and rt sections of xfs filesystems equally Darrick J. Wong
2025-01-16 23:36 ` [PATCH 04/14] common/ext4: reformat external logs during mdrestore operations Darrick J. Wong
2025-01-16 23:36 ` [PATCH 05/14] common/populate: use metadump v2 format by default for fs metadata snapshots Darrick J. Wong
2025-01-16 23:36 ` [PATCH 06/14] punch-alternating: detect xfs realtime files with large allocation units Darrick J. Wong
2025-01-16 23:36 ` [PATCH 07/14] xfs/206: update mkfs filtering for rt groups feature Darrick J. Wong
2025-01-16 23:37 ` [PATCH 08/14] common: pass the realtime device to xfs_db when possible Darrick J. Wong
2025-01-16 23:37 ` [PATCH 09/14] xfs/185: update for rtgroups Darrick J. Wong
2025-01-16 23:37 ` [PATCH 10/14] xfs/449: update test to know about xfs_db -R Darrick J. Wong
2025-01-16 23:37 ` [PATCH 11/14] xfs/271,xfs/556: fix tests to deal with rtgroups output in bmap/fsmap commands Darrick J. Wong
2025-01-16 23:38 ` [PATCH 12/14] common/xfs: capture realtime devices during metadump/mdrestore Darrick J. Wong
2025-01-16 23:38 ` [PATCH 13/14] common/fuzzy: adapt the scrub stress tests to support rtgroups Darrick J. Wong
2025-01-16 23:38 ` [PATCH 14/14] xfs: fix fuzz tests of rtgroups bitmap and summary files Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 6/7] fstests: store quota files in the metadir Darrick J. Wong
2025-01-16 23:38 ` [PATCH 1/4] xfs: update tests for " Darrick J. Wong
2025-01-16 23:39 ` [PATCH 2/4] xfs: test persistent quota flags Darrick J. Wong
2025-01-16 23:39 ` [PATCH 3/4] xfs: fix quota detection in fuzz tests Darrick J. Wong
2025-01-16 23:39 ` [PATCH 4/4] xfs: fix tests for persistent qflags Darrick J. Wong
2025-01-16 23:25 ` [PATCHSET v6.2 7/7] fstests: enable quota for realtime volumes Darrick J. Wong
2025-01-16 23:40 ` [PATCH 1/3] common: enable testing of realtime quota when supported Darrick J. Wong
2025-01-16 23:40 ` [PATCH 2/3] xfs: fix quota tests to adapt to realtime quota Darrick J. Wong
2025-01-16 23:40 ` [PATCH 3/3] xfs: regression testing of quota on the realtime device Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250129031313.GV3557695@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=fstests@vger.kernel.org \
--cc=hch@lst.de \
--cc=linux-xfs@vger.kernel.org \
--cc=zlang@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox