From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: zlang@redhat.com, hch@lst.de, fstests@vger.kernel.org,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH 08/23] common: fix pkill by running test program in a separate session
Date: Mon, 27 Jan 2025 23:23:52 -0800 [thread overview]
Message-ID: <20250128072352.GP3557553@frogsfrogsfrogs> (raw)
In-Reply-To: <Z5heaj-ZsL_rBF--@dread.disaster.area>
On Tue, Jan 28, 2025 at 03:34:50PM +1100, Dave Chinner wrote:
> On Thu, Jan 23, 2025 at 12:16:50PM +1100, Dave Chinner wrote:
> > On Wed, Jan 22, 2025 at 01:46:09PM -0800, Darrick J. Wong wrote:
> > > > > Agreed, though at some point after these bugfixes are merged I'll see if
> > > > > I can build on the existing "if you have systemd then ___ else here's
> > > > > your shabby opencoded version" logic in fstests to isolate the ./checks
> > > > > from each other a little better. It'd be kinda nice if we could
> > > > > actually just put them in something resembling a modernish container,
> > > > > albeit with the same underlying fs.
> > > >
> > > > Agreed, but I don't think we need to depend on systemd for that,
> > > > either.
> > > >
> > > > > <shrug> Anyone else interested in that?
> > > >
> > > > check-parallel has already started down that road with the
> > > > mount namespace isolation it uses for the runner tasks via
> > > > src/nsexec.c.
> .....
> > > > Hmmm - looks like src/nsexec.c can create new PID namespaces via
> > > > the "-p" option. I haven't used that before - I wonder if that's a
> > > > better solution that using per-test session IDs to solve the pkill
> > > > --parent problem? Something to look into in the morning....
>
> .....
>
> > Note, however, that ps will show all processes in both the parent
> > and the child namespace the shell is running on because the contents
> > of /proc are the same for both.
> >
> > However, because we are also using private mount namespaces for the
> > check process, pid_namespaces(7) tells us:
> >
> > /proc and PID namespaces
> >
> > A /proc filesystem shows (in the /proc/pid directories) only
> > processes visible in the PID namespace of the process that
> > performed the mount, even if the /proc filesystem is viewed
> > from processes in other namespaces.
> >
> > After creating a new PID namespace, it is useful for the
> > child to change its root directory and mount a new procfs
> > instance at /proc so that tools such as ps(1) work
> > >>> correctly. If a new mount namespace is simultaneously
> > >>> created by including CLONE_NEWNS in the flags argument of
> > >>> clone(2) or unshare(2), then it isn't necessary to change the
> > >>> root directory: a new procfs instance can be mounted directly
> > >>> over /proc.
> >
> > From a shell, the command to mount /proc is:
> >
> > $ mount -t proc proc /proc
> >
> > Calling readlink(2) on the path /proc/self yields the process
> > ID of the caller in the PID namespace of the procfs mount
> > (i.e., the PID name‐ space of the process that mounted the
> > procfs). This can be useful for introspection purposes, when
> > a process wants to discover its PID in other namespaces.
> >
> > This appears to give us an environment that only shows the processes
> > within the current PID namespace:
> >
> > $ sudo src/nsexec -p -m bash
> > # mount -t proc proc /proc
> > # ps waux
> > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> > root 1 0.0 0.0 7384 3744 pts/1 S 11:55 0:00 bash
> > root 72 0.0 0.0 8300 3736 pts/1 R+ 12:04 0:00 ps waux
> > # pstree -N pid
> > [4026538173]
> > bash───pstree
> > #
> >
> > Yup, there we go - we have full PID isolation for this shell.
>
> Ok, it took me some time to get this to work reliably - the way was
> full of landmines with little documentation to guide around them.
>
> 1. If multiple commands are needed to be run from nsexec, a helper
> script is needed (I called it check-helper).
>
> 2. You have to `mount --make-private /proc` before doing anything to
> make the mounts of /proc inside the pid namespace work correctly.
> If you don't do this, every other mount namespace will also see
> only the newest mount, and that means every runner but the last
> one to start with the wrong mount and PID namespace view in /proc.
>
> 3. if you get the system into the state described by 1), unmounting
> /proc in each runner then races to remove the top /proc mount.
> Many threads get -EBUSY failures from unmount, leaving many stale
> mounts of /proc behind.
>
> 4. the top level shell where check-parallel was invoked is left with
> the view where /proc has been unmounted from a PID/mount
> namespace that has gone away. Hence /proc now has no processes or
> mounts being reported and nothing works until you mount a new
> instance /proc in that namespace.
>
> 5. After mounting proc again there are lots of mounts of stale /proc
> mounts still reported. They cannot be unmounted as the mount
> namespaces that created them have gone away and unmounting /proc
> in the current shell simply removes the last mounted one and we
> goto 4).
>
> 4. /tmp is still shared across all runner instances so all the
>
> concurrent runners dump all their tmp files in /tmp. However, the
> runners no longer have unique PIDs (i.e. check runs as PID 3 in
> all runner instaces). This means using /tmp/tmp.$$ as the
> check/test temp file definition results is instant tmp file name
> collisions and random things in check and tests fail. check and
> common/preamble have to be converted to use `mktemp` to provide
> unique tempfile name prefixes again.
>
> 5. Don't forget to revert the parent /proc mount back to shared
> after check has finished running (or was aborted).
>
> I think with this (current prototype patch below), we can use PID
> namespaces rather than process session IDs for check-parallel safe
> process management.
>
> Thoughts?
Was about to go to bed, but can we also start a new mount namespace,
create a private (or at least non-global) /tmp to put files into, and
then each test instance is isolated from clobbering the /tmpfiles of
other ./check instances *and* the rest of the system?
--D
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
>
> check-parallel: use PID namespaces for runner process isolation
>
> From: Dave Chinner <dchinner@redhat.com>
>
> This provides isolation between individual runners so that they
> cannot see the processes that other test runners have created.
> This means tools like pkill will only find processes run by the test
> that is calling it, hence there is no danger taht it might kill
> processes owned by a different test in a different runner context.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> check | 6 +++++-
> check-helper | 23 +++++++++++++++++++++++
> check-parallel | 25 +++++++++++++++++++------
> common/preamble | 5 ++++-
> 4 files changed, 51 insertions(+), 8 deletions(-)
>
> diff --git a/check b/check
> index 4dc266dcf..314436667 100755
> --- a/check
> +++ b/check
> @@ -4,7 +4,11 @@
> #
> # Control script for QA
> #
> -tmp=/tmp/$$
> +
> +# When run from a pid namespace, /tmp/tmp.$$ is not a unique identifier.
> +# Use mktemp instead.
> +tmp=`mktemp`
> +
> status=0
> needwrap=true
> needsum=true
> diff --git a/check-helper b/check-helper
> new file mode 100755
> index 000000000..47a92de8b
> --- /dev/null
> +++ b/check-helper
> @@ -0,0 +1,23 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2025 Red Hat, Inc. All Rights Reserved.
> +#
> +
> +# When check is run from check-parallel, it is run in private mount and PID
> +# namespaces. We need to set up /proc to only show processes in this PID
> +# namespace, so we mount a new instance of /proc over the top of the existing
> +# version. Because we are in a private mount namespace, the does not propagate
> +# outside this context and hence does not conflict with the other test runners
> +# that are performing the same setup actions.
> +
> +args="$@"
> +
> +#echo $args
> +mount -t proc proc /proc
> +ret=$?
> +if [ $ret -eq 0 ]; then
> + ./check $args
> + umount -l /proc
> +else
> + echo failed to mount /proc, ret $ret!
> +fi
> diff --git a/check-parallel b/check-parallel
> index 2fbf0fdbe..6082baf5e 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -259,14 +259,14 @@ runner_go()
> rm -f $RESULT_BASE/check.*
>
> # Only supports default mkfs parameters right now
> - wipefs -a $TEST_DEV > /dev/null 2>&1
> - yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV > /dev/null 2>&1
> + wipefs -a $TEST_DEV > $me/log 2>&1
> + yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV >> $me/log 2>&1
>
> # export DUMP_CORRUPT_FS=1
>
> - # Run the tests in it's own mount namespace, as per the comment below
> - # that precedes making the basedir a private mount.
> - ./src/nsexec -m ./check $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} > $me/log 2>&1
> + # Run the tests in it's own mount and PID namespace, as per the comment
> + # below that precedes making the basedir a private mount.
> + ./src/nsexec -m -p ./check-helper $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} >> $me/log 2>&1
>
> wait
> sleep 1
> @@ -291,6 +291,8 @@ cleanup()
> umount -R $basedir/*/test 2> /dev/null
> umount -R $basedir/*/scratch 2> /dev/null
> losetup --detach-all
> + mount --make-shared /proc
> + mount --make-shared $basedir
> }
>
> trap "cleanup; exit" HUP INT QUIT TERM
> @@ -311,10 +313,17 @@ fi
> # controls the mount to succeed without actually unmounting the filesytsem
> # because a mount namespace still holds a reference to it. This causes other
> # operations on the block device to fail as it is still busy (e.g. fsck, mkfs,
> -# etc). Hence we make the basedir private here and then run each check instance
> +# etc).
> +#
> +# Hence we make the basedir private here and then run each check instance
> # in it's own mount namespace so that they cannot see mounts that other tests
> # are performing.
> +#
> +# We also need to make /proc private so that the runners can be run cleanly in
> +# a PID namespace. This requires an new mount of /proc inside the PID namespace,
> +# and this requires a private mount namespace to work correctly.
> mount --make-private $basedir
> +mount --make-private /proc
>
> now=`date +%Y-%m-%d-%H:%M:%S`
> for ((i = 0; i < $runners; i++)); do
> @@ -328,6 +337,10 @@ for ((i = 0; i < $runners; i++)); do
> done;
> wait
>
> +# Restore the parents back to shared mount namespace behaviour.
> +mount --make-shared /proc
> +mount --make-shared $basedir
> +
> if [ -n "$show_test_list" ]; then
> exit 0
> fi
> diff --git a/common/preamble b/common/preamble
> index 78e45d522..8f47b172a 100644
> --- a/common/preamble
> +++ b/common/preamble
> @@ -43,8 +43,11 @@ _begin_fstest()
> seqres=$RESULT_DIR/$seq
> echo "QA output created by $seq"
>
> + # When run from a pid namespace, /tmp/tmp.$$ is not a unique identifier.
> + # Use mktemp instead.
> + tmp=`mktemp`
> +
> here=`pwd`
> - tmp=/tmp/$$
> status=1 # failure is the default!
>
> _register_cleanup _cleanup
next prev parent reply other threads:[~2025-01-28 7:23 UTC|newest]
Thread overview: 118+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-16 23:21 [PATCHBOMB] fstests: fix check-parallel and get all the xfs 6.13 changes merged Darrick J. Wong
2025-01-16 23:23 ` [PATCHSET 1/7] fstests: random fixes for v2025.01.12 Darrick J. Wong
2025-01-16 23:25 ` [PATCH 01/23] generic/476: fix fsstress process management Darrick J. Wong
2025-01-21 3:03 ` Dave Chinner
2025-01-16 23:25 ` [PATCH 02/23] metadump: make non-local function variables more obvious Darrick J. Wong
2025-01-21 3:06 ` Dave Chinner
2025-01-16 23:25 ` [PATCH 03/23] metadump: fix cleanup for v1 metadump testing Darrick J. Wong
2025-01-21 3:07 ` Dave Chinner
2025-01-16 23:26 ` [PATCH 04/23] generic/482: _run_fsstress needs the test filesystem Darrick J. Wong
2025-01-21 3:12 ` Dave Chinner
2025-01-22 3:27 ` Darrick J. Wong
2025-01-16 23:26 ` [PATCH 05/23] generic/019: don't fail if fio crashes while shutting down Darrick J. Wong
2025-01-21 3:12 ` Dave Chinner
2025-01-16 23:26 ` [PATCH 06/23] fuzzy: do not set _FSSTRESS_PID when exercising fsx Darrick J. Wong
2025-01-21 3:13 ` Dave Chinner
2025-01-16 23:27 ` [PATCH 07/23] common/rc: create a wrapper for the su command Darrick J. Wong
2025-01-16 23:27 ` [PATCH 08/23] common: fix pkill by running test program in a separate session Darrick J. Wong
2025-01-21 3:28 ` Dave Chinner
2025-01-22 4:24 ` Darrick J. Wong
2025-01-22 6:08 ` Dave Chinner
2025-01-22 7:05 ` Darrick J. Wong
2025-01-22 9:42 ` Dave Chinner
2025-01-22 21:46 ` Darrick J. Wong
2025-01-23 1:16 ` Dave Chinner
2025-01-28 4:34 ` Dave Chinner
2025-01-28 7:23 ` Darrick J. Wong [this message]
2025-01-28 20:39 ` Dave Chinner
2025-01-29 3:13 ` Darrick J. Wong
2025-01-29 6:06 ` Dave Chinner
2025-01-29 7:33 ` Darrick J. Wong
2025-01-16 23:27 ` [PATCH 09/23] unmount: resume logging of stdout and stderr for filtering Darrick J. Wong
2025-01-21 3:52 ` Dave Chinner
2025-01-22 3:29 ` Darrick J. Wong
2025-01-16 23:27 ` [PATCH 10/23] mkfs: don't hardcode log size Darrick J. Wong
2025-01-21 3:58 ` Dave Chinner
2025-01-21 12:44 ` Theodore Ts'o
2025-01-21 22:05 ` Dave Chinner
2025-01-22 3:40 ` Darrick J. Wong
2025-01-22 3:36 ` Darrick J. Wong
2025-01-22 3:30 ` Darrick J. Wong
2025-01-16 23:28 ` [PATCH 11/23] common/xfs: find loop devices for non-blockdevs passed to _prepare_for_eio_shutdown Darrick J. Wong
2025-01-21 4:37 ` Dave Chinner
2025-01-22 4:05 ` Darrick J. Wong
2025-01-22 5:21 ` Dave Chinner
2025-01-16 23:28 ` [PATCH 12/23] preamble: fix missing _kill_fsstress Darrick J. Wong
2025-01-21 4:37 ` Dave Chinner
2025-01-16 23:28 ` [PATCH 13/23] generic/650: revert SOAK DURATION changes Darrick J. Wong
2025-01-21 4:57 ` Dave Chinner
2025-01-21 13:00 ` Theodore Ts'o
2025-01-21 22:15 ` Dave Chinner
2025-01-22 3:51 ` Darrick J. Wong
2025-01-22 4:08 ` Theodore Ts'o
2025-01-22 6:01 ` Dave Chinner
2025-01-22 7:02 ` Darrick J. Wong
2025-01-22 3:49 ` Darrick J. Wong
2025-01-22 4:12 ` Dave Chinner
2025-01-22 4:37 ` Darrick J. Wong
2025-01-16 23:28 ` [PATCH 14/23] generic/032: fix pinned mount failure Darrick J. Wong
2025-01-21 5:03 ` Dave Chinner
2025-01-22 4:08 ` Darrick J. Wong
2025-01-22 4:19 ` Dave Chinner
2025-01-16 23:29 ` [PATCH 15/23] fuzzy: stop __stress_scrub_fsx_loop if fsx fails Darrick J. Wong
2025-01-16 23:29 ` [PATCH 16/23] fuzzy: don't use readarray for xfsfind output Darrick J. Wong
2025-01-16 23:29 ` [PATCH 17/23] fuzzy: always stop the scrub fsstress loop on error Darrick J. Wong
2025-01-16 23:29 ` [PATCH 18/23] fuzzy: port fsx and fsstress loop to use --duration Darrick J. Wong
2025-01-16 23:30 ` [PATCH 19/23] common/rc: don't copy fsstress to $TEST_DIR Darrick J. Wong
2025-01-21 5:05 ` Dave Chinner
2025-01-22 3:52 ` Darrick J. Wong
2025-01-16 23:30 ` [PATCH 20/23] fix _require_scratch_duperemove ordering Darrick J. Wong
2025-01-16 23:30 ` [PATCH 21/23] fsstress: fix a memory leak Darrick J. Wong
2025-01-16 23:30 ` [PATCH 22/23] fsx: fix leaked log file pointer Darrick J. Wong
2025-01-16 23:31 ` [PATCH 23/23] build: initialize stack variables to zero by default Darrick J. Wong
2025-01-16 23:23 ` [PATCHSET 2/7] fstests: fix logwrites zeroing Darrick J. Wong
2025-01-16 23:31 ` [PATCH 1/3] logwrites: warn if we don't think read after discard returns zeroes Darrick J. Wong
2025-01-16 23:31 ` [PATCH 2/3] logwrites: use BLKZEROOUT if it's available Darrick J. Wong
2025-01-16 23:31 ` [PATCH 3/3] logwrites: only use BLKDISCARD if we know discard zeroes data Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 3/7] fstests: enable metadir Darrick J. Wong
2025-01-16 23:32 ` [PATCH 01/11] various: fix finding metadata inode numbers when metadir is enabled Darrick J. Wong
2025-01-16 23:32 ` [PATCH 02/11] xfs/{030,033,178}: forcibly disable metadata directory trees Darrick J. Wong
2025-01-16 23:32 ` [PATCH 03/11] common/repair: patch up repair sb inode value complaints Darrick J. Wong
2025-01-16 23:32 ` [PATCH 04/11] xfs/206: update for metadata directory support Darrick J. Wong
2025-01-16 23:33 ` [PATCH 05/11] xfs/{050,144,153,299,330}: update quota reports to handle metadir trees Darrick J. Wong
2025-01-16 23:33 ` [PATCH 06/11] xfs/509: adjust inumbers accounting for metadata directories Darrick J. Wong
2025-01-16 23:33 ` [PATCH 07/11] xfs: create fuzz tests " Darrick J. Wong
2025-01-16 23:34 ` [PATCH 08/11] xfs/163: bigger fs for metadir Darrick J. Wong
2025-01-16 23:34 ` [PATCH 09/11] xfs/122: disable this test for any codebase that knows about metadir Darrick J. Wong
2025-01-16 23:34 ` [PATCH 10/11] scrub: race metapath online fsck with fsstress Darrick J. Wong
2025-01-16 23:34 ` [PATCH 11/11] xfs: test metapath repairs Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 4/7] fstests: make protofiles less janky Darrick J. Wong
2025-01-16 23:35 ` [PATCH 1/1] fstests: test mkfs.xfs protofiles with xattr support Darrick J. Wong
2025-03-02 13:15 ` Zorro Lang
2025-03-04 17:42 ` Darrick J. Wong
2025-03-04 18:00 ` Zorro Lang
2025-03-04 18:21 ` Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 5/7] fstests: shard the realtime section Darrick J. Wong
2025-01-16 23:35 ` [PATCH 01/14] common/populate: refactor caching of metadumps to a helper Darrick J. Wong
2025-01-16 23:35 ` [PATCH 02/14] common/{fuzzy,populate}: use _scratch_xfs_mdrestore Darrick J. Wong
2025-01-16 23:35 ` [PATCH 03/14] fuzzy: stress data and rt sections of xfs filesystems equally Darrick J. Wong
2025-01-16 23:36 ` [PATCH 04/14] common/ext4: reformat external logs during mdrestore operations Darrick J. Wong
2025-01-16 23:36 ` [PATCH 05/14] common/populate: use metadump v2 format by default for fs metadata snapshots Darrick J. Wong
2025-01-16 23:36 ` [PATCH 06/14] punch-alternating: detect xfs realtime files with large allocation units Darrick J. Wong
2025-01-16 23:36 ` [PATCH 07/14] xfs/206: update mkfs filtering for rt groups feature Darrick J. Wong
2025-01-16 23:37 ` [PATCH 08/14] common: pass the realtime device to xfs_db when possible Darrick J. Wong
2025-01-16 23:37 ` [PATCH 09/14] xfs/185: update for rtgroups Darrick J. Wong
2025-01-16 23:37 ` [PATCH 10/14] xfs/449: update test to know about xfs_db -R Darrick J. Wong
2025-01-16 23:37 ` [PATCH 11/14] xfs/271,xfs/556: fix tests to deal with rtgroups output in bmap/fsmap commands Darrick J. Wong
2025-01-16 23:38 ` [PATCH 12/14] common/xfs: capture realtime devices during metadump/mdrestore Darrick J. Wong
2025-01-16 23:38 ` [PATCH 13/14] common/fuzzy: adapt the scrub stress tests to support rtgroups Darrick J. Wong
2025-01-16 23:38 ` [PATCH 14/14] xfs: fix fuzz tests of rtgroups bitmap and summary files Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 6/7] fstests: store quota files in the metadir Darrick J. Wong
2025-01-16 23:38 ` [PATCH 1/4] xfs: update tests for " Darrick J. Wong
2025-01-16 23:39 ` [PATCH 2/4] xfs: test persistent quota flags Darrick J. Wong
2025-01-16 23:39 ` [PATCH 3/4] xfs: fix quota detection in fuzz tests Darrick J. Wong
2025-01-16 23:39 ` [PATCH 4/4] xfs: fix tests for persistent qflags Darrick J. Wong
2025-01-16 23:25 ` [PATCHSET v6.2 7/7] fstests: enable quota for realtime volumes Darrick J. Wong
2025-01-16 23:40 ` [PATCH 1/3] common: enable testing of realtime quota when supported Darrick J. Wong
2025-01-16 23:40 ` [PATCH 2/3] xfs: fix quota tests to adapt to realtime quota Darrick J. Wong
2025-01-16 23:40 ` [PATCH 3/3] xfs: regression testing of quota on the realtime device Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250128072352.GP3557553@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=fstests@vger.kernel.org \
--cc=hch@lst.de \
--cc=linux-xfs@vger.kernel.org \
--cc=zlang@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox