public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: zlang@redhat.com, hch@lst.de, fstests@vger.kernel.org,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH 08/23] common: fix pkill by running test program in a separate session
Date: Mon, 27 Jan 2025 23:23:52 -0800	[thread overview]
Message-ID: <20250128072352.GP3557553@frogsfrogsfrogs> (raw)
In-Reply-To: <Z5heaj-ZsL_rBF--@dread.disaster.area>

On Tue, Jan 28, 2025 at 03:34:50PM +1100, Dave Chinner wrote:
> On Thu, Jan 23, 2025 at 12:16:50PM +1100, Dave Chinner wrote:
> > On Wed, Jan 22, 2025 at 01:46:09PM -0800, Darrick J. Wong wrote:
> > > > > Agreed, though at some point after these bugfixes are merged I'll see if
> > > > > I can build on the existing "if you have systemd then ___ else here's
> > > > > your shabby opencoded version" logic in fstests to isolate the ./checks
> > > > > from each other a little better.  It'd be kinda nice if we could
> > > > > actually just put them in something resembling a modernish container,
> > > > > albeit with the same underlying fs.
> > > > 
> > > > Agreed, but I don't think we need to depend on systemd for that,
> > > > either.
> > > > 
> > > > > <shrug> Anyone else interested in that?
> > > > 
> > > > check-parallel has already started down that road with the
> > > > mount namespace isolation it uses for the runner tasks via
> > > > src/nsexec.c.
> .....
> > > > Hmmm - looks like src/nsexec.c can create new PID namespaces via
> > > > the "-p" option. I haven't used that before - I wonder if that's a
> > > > better solution that using per-test session IDs to solve the pkill
> > > > --parent problem? Something to look into in the morning....
> 
> .....
> 
> > Note, however, that ps will show all processes in both the parent
> > and the child namespace the shell is running on because the contents
> > of /proc are the same for both.
> > 
> > However, because we are also using private mount namespaces for the
> > check process, pid_namespaces(7) tells us:
> > 
> > /proc and PID namespaces
> > 
> >        A /proc filesystem shows (in the /proc/pid directories) only
> >        processes visible in the PID namespace of the process that
> >        performed the mount, even if the /proc filesystem is viewed
> >        from processes in other namespaces.
> > 
> >        After creating a new PID namespace, it is useful for the
> >        child to change its root directory and mount a new procfs
> >        instance at /proc so that tools  such  as  ps(1) work
> > >>>    correctly.  If a new mount namespace is simultaneously
> > >>>    created by including CLONE_NEWNS in the flags argument of
> > >>>    clone(2) or unshare(2), then it isn't necessary to change the
> > >>>    root directory: a new procfs instance can be mounted directly
> > >>>    over /proc.
> > 
> >        From a shell, the command to mount /proc is:
> > 
> >            $ mount -t proc proc /proc
> > 
> >        Calling readlink(2) on the path /proc/self yields the process
> >        ID of the caller in the PID namespace of the procfs mount
> >        (i.e., the PID name‐ space of the process that mounted the
> >        procfs).  This can be useful for introspection purposes, when
> >        a process wants to discover its  PID  in other namespaces.
> > 
> > This appears to give us an environment that only shows the processes
> > within the current PID namespace:
> > 
> > $ sudo src/nsexec -p -m bash
> > # mount -t proc proc /proc
> > # ps waux
> > USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> > root           1  0.0  0.0   7384  3744 pts/1    S    11:55   0:00 bash
> > root          72  0.0  0.0   8300  3736 pts/1    R+   12:04   0:00 ps waux
> > # pstree -N pid
> > [4026538173]
> > bash───pstree
> > #
> > 
> > Yup, there we go - we have full PID isolation for this shell.
> 
> Ok, it took me some time to get this to work reliably - the way was
> full of landmines with little documentation to guide around them.
> 
> 1. If multiple commands are needed to be run from nsexec, a helper
>    script is needed (I called it check-helper).
> 
> 2. You have to `mount --make-private /proc` before doing anything to
>    make the mounts of /proc inside the pid namespace work correctly.
>    If you don't do this, every other mount namespace will also see
>    only the newest mount, and that means every runner but the last
>    one to start with the wrong mount and PID namespace view in /proc.
> 
> 3. if you get the system into the state described by 1), unmounting
>    /proc in each runner then races to remove the top /proc mount.
>    Many threads get -EBUSY failures from unmount, leaving many stale
>    mounts of /proc behind.
> 
> 4. the top level shell where check-parallel was invoked is left with
>    the view where /proc has been unmounted from a PID/mount
>    namespace that has gone away. Hence /proc now has no processes or
>    mounts being reported and nothing works until you mount a new
>    instance /proc in that namespace.
> 
> 5. After mounting proc again there are lots of mounts of stale /proc
>    mounts still reported. They cannot be unmounted as the mount
>    namespaces that created them have gone away and unmounting /proc
>    in the current shell simply removes the last mounted one and we
>    goto 4).
> 
> 4. /tmp is still shared across all runner instances so all the
> 
>    concurrent runners dump all their tmp files in /tmp. However, the
>    runners no longer have unique PIDs (i.e. check runs as PID 3 in
>    all runner instaces). This means using /tmp/tmp.$$ as the
>    check/test temp file definition results is instant tmp file name
>    collisions and random things in check and tests fail.  check and
>    common/preamble have to be converted to use `mktemp` to provide
>    unique tempfile name prefixes again.
> 
> 5. Don't forget to revert the parent /proc mount back to shared
>    after check has finished running (or was aborted).
> 
> I think with this (current prototype patch below), we can use PID
> namespaces rather than process session IDs for check-parallel safe
> process management.
> 
> Thoughts?

Was about to go to bed, but can we also start a new mount namespace,
create a private (or at least non-global) /tmp to put files into, and
then each test instance is isolated from clobbering the /tmpfiles of
other ./check instances *and* the rest of the system?

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> 
> check-parallel: use PID namespaces for runner process isolation
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> This provides isolation between individual runners so that they
> cannot see the processes that other test runners have created.
> This means tools like pkill will only find processes run by the test
> that is calling it, hence there is no danger taht it might kill
> processes owned by a different test in a different runner context.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check           |  6 +++++-
>  check-helper    | 23 +++++++++++++++++++++++
>  check-parallel  | 25 +++++++++++++++++++------
>  common/preamble |  5 ++++-
>  4 files changed, 51 insertions(+), 8 deletions(-)
> 
> diff --git a/check b/check
> index 4dc266dcf..314436667 100755
> --- a/check
> +++ b/check
> @@ -4,7 +4,11 @@
>  #
>  # Control script for QA
>  #
> -tmp=/tmp/$$
> +
> +# When run from a pid namespace, /tmp/tmp.$$ is not a unique identifier.
> +# Use mktemp instead.
> +tmp=`mktemp`
> +
>  status=0
>  needwrap=true
>  needsum=true
> diff --git a/check-helper b/check-helper
> new file mode 100755
> index 000000000..47a92de8b
> --- /dev/null
> +++ b/check-helper
> @@ -0,0 +1,23 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2025 Red Hat, Inc.  All Rights Reserved.
> +#
> +
> +# When check is run from check-parallel, it is run in private mount and PID
> +# namespaces. We need to set up /proc to only show processes in this PID
> +# namespace, so we mount a new instance of /proc over the top of the existing
> +# version. Because we are in a private mount namespace, the does not propagate
> +# outside this context and hence does not conflict with the other test runners
> +# that are performing the same setup actions.
> +
> +args="$@"
> +
> +#echo $args
> +mount -t proc proc /proc
> +ret=$?
> +if [ $ret -eq 0 ]; then
> +	./check $args
> +	umount -l /proc
> +else
> +	echo failed to mount /proc, ret $ret!
> +fi
> diff --git a/check-parallel b/check-parallel
> index 2fbf0fdbe..6082baf5e 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -259,14 +259,14 @@ runner_go()
>  	rm -f $RESULT_BASE/check.*
>  
>  	# Only supports default mkfs parameters right now
> -	wipefs -a $TEST_DEV > /dev/null 2>&1
> -	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV > /dev/null 2>&1
> +	wipefs -a $TEST_DEV > $me/log 2>&1
> +	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV >> $me/log 2>&1
>  
>  #	export DUMP_CORRUPT_FS=1
>  
> -	# Run the tests in it's own mount namespace, as per the comment below
> -	# that precedes making the basedir a private mount.
> -	./src/nsexec -m ./check $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} > $me/log 2>&1
> +	# Run the tests in it's own mount and PID namespace, as per the comment
> +	# below that precedes making the basedir a private mount.
> +	./src/nsexec -m -p ./check-helper $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} >> $me/log 2>&1
>  
>  	wait
>  	sleep 1
> @@ -291,6 +291,8 @@ cleanup()
>  	umount -R $basedir/*/test 2> /dev/null
>  	umount -R $basedir/*/scratch 2> /dev/null
>  	losetup --detach-all
> +	mount --make-shared /proc
> +	mount --make-shared $basedir
>  }
>  
>  trap "cleanup; exit" HUP INT QUIT TERM
> @@ -311,10 +313,17 @@ fi
>  # controls the mount to succeed without actually unmounting the filesytsem
>  # because a mount namespace still holds a reference to it. This causes other
>  # operations on the block device to fail as it is still busy (e.g. fsck, mkfs,
> -# etc). Hence we make the basedir private here and then run each check instance
> +# etc).
> +#
> +# Hence we make the basedir private here and then run each check instance
>  # in it's own mount namespace so that they cannot see mounts that other tests
>  # are performing.
> +#
> +# We also need to make /proc private so that the runners can be run cleanly in
> +# a PID namespace. This requires an new mount of /proc inside the PID namespace,
> +# and this requires a private mount namespace to work correctly.
>  mount --make-private $basedir
> +mount --make-private /proc
>  
>  now=`date +%Y-%m-%d-%H:%M:%S`
>  for ((i = 0; i < $runners; i++)); do
> @@ -328,6 +337,10 @@ for ((i = 0; i < $runners; i++)); do
>  done;
>  wait
>  
> +# Restore the parents back to shared mount namespace behaviour.
> +mount --make-shared /proc
> +mount --make-shared $basedir
> +
>  if [ -n "$show_test_list" ]; then
>  	exit 0
>  fi
> diff --git a/common/preamble b/common/preamble
> index 78e45d522..8f47b172a 100644
> --- a/common/preamble
> +++ b/common/preamble
> @@ -43,8 +43,11 @@ _begin_fstest()
>  	seqres=$RESULT_DIR/$seq
>  	echo "QA output created by $seq"
>  
> +	# When run from a pid namespace, /tmp/tmp.$$ is not a unique identifier.
> +	# Use mktemp instead.
> +	tmp=`mktemp`
> +
>  	here=`pwd`
> -	tmp=/tmp/$$
>  	status=1	# failure is the default!
>  
>  	_register_cleanup _cleanup

  reply	other threads:[~2025-01-28  7:23 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-16 23:21 [PATCHBOMB] fstests: fix check-parallel and get all the xfs 6.13 changes merged Darrick J. Wong
2025-01-16 23:23 ` [PATCHSET 1/7] fstests: random fixes for v2025.01.12 Darrick J. Wong
2025-01-16 23:25   ` [PATCH 01/23] generic/476: fix fsstress process management Darrick J. Wong
2025-01-21  3:03     ` Dave Chinner
2025-01-16 23:25   ` [PATCH 02/23] metadump: make non-local function variables more obvious Darrick J. Wong
2025-01-21  3:06     ` Dave Chinner
2025-01-16 23:25   ` [PATCH 03/23] metadump: fix cleanup for v1 metadump testing Darrick J. Wong
2025-01-21  3:07     ` Dave Chinner
2025-01-16 23:26   ` [PATCH 04/23] generic/482: _run_fsstress needs the test filesystem Darrick J. Wong
2025-01-21  3:12     ` Dave Chinner
2025-01-22  3:27       ` Darrick J. Wong
2025-01-16 23:26   ` [PATCH 05/23] generic/019: don't fail if fio crashes while shutting down Darrick J. Wong
2025-01-21  3:12     ` Dave Chinner
2025-01-16 23:26   ` [PATCH 06/23] fuzzy: do not set _FSSTRESS_PID when exercising fsx Darrick J. Wong
2025-01-21  3:13     ` Dave Chinner
2025-01-16 23:27   ` [PATCH 07/23] common/rc: create a wrapper for the su command Darrick J. Wong
2025-01-16 23:27   ` [PATCH 08/23] common: fix pkill by running test program in a separate session Darrick J. Wong
2025-01-21  3:28     ` Dave Chinner
2025-01-22  4:24       ` Darrick J. Wong
2025-01-22  6:08         ` Dave Chinner
2025-01-22  7:05           ` Darrick J. Wong
2025-01-22  9:42             ` Dave Chinner
2025-01-22 21:46               ` Darrick J. Wong
2025-01-23  1:16                 ` Dave Chinner
2025-01-28  4:34                   ` Dave Chinner
2025-01-28  7:23                     ` Darrick J. Wong [this message]
2025-01-28 20:39                       ` Dave Chinner
2025-01-29  3:13                         ` Darrick J. Wong
2025-01-29  6:06                           ` Dave Chinner
2025-01-29  7:33                             ` Darrick J. Wong
2025-01-16 23:27   ` [PATCH 09/23] unmount: resume logging of stdout and stderr for filtering Darrick J. Wong
2025-01-21  3:52     ` Dave Chinner
2025-01-22  3:29       ` Darrick J. Wong
2025-01-16 23:27   ` [PATCH 10/23] mkfs: don't hardcode log size Darrick J. Wong
2025-01-21  3:58     ` Dave Chinner
2025-01-21 12:44       ` Theodore Ts'o
2025-01-21 22:05         ` Dave Chinner
2025-01-22  3:40           ` Darrick J. Wong
2025-01-22  3:36         ` Darrick J. Wong
2025-01-22  3:30       ` Darrick J. Wong
2025-01-16 23:28   ` [PATCH 11/23] common/xfs: find loop devices for non-blockdevs passed to _prepare_for_eio_shutdown Darrick J. Wong
2025-01-21  4:37     ` Dave Chinner
2025-01-22  4:05       ` Darrick J. Wong
2025-01-22  5:21         ` Dave Chinner
2025-01-16 23:28   ` [PATCH 12/23] preamble: fix missing _kill_fsstress Darrick J. Wong
2025-01-21  4:37     ` Dave Chinner
2025-01-16 23:28   ` [PATCH 13/23] generic/650: revert SOAK DURATION changes Darrick J. Wong
2025-01-21  4:57     ` Dave Chinner
2025-01-21 13:00       ` Theodore Ts'o
2025-01-21 22:15         ` Dave Chinner
2025-01-22  3:51           ` Darrick J. Wong
2025-01-22  4:08           ` Theodore Ts'o
2025-01-22  6:01             ` Dave Chinner
2025-01-22  7:02               ` Darrick J. Wong
2025-01-22  3:49       ` Darrick J. Wong
2025-01-22  4:12         ` Dave Chinner
2025-01-22  4:37           ` Darrick J. Wong
2025-01-16 23:28   ` [PATCH 14/23] generic/032: fix pinned mount failure Darrick J. Wong
2025-01-21  5:03     ` Dave Chinner
2025-01-22  4:08       ` Darrick J. Wong
2025-01-22  4:19         ` Dave Chinner
2025-01-16 23:29   ` [PATCH 15/23] fuzzy: stop __stress_scrub_fsx_loop if fsx fails Darrick J. Wong
2025-01-16 23:29   ` [PATCH 16/23] fuzzy: don't use readarray for xfsfind output Darrick J. Wong
2025-01-16 23:29   ` [PATCH 17/23] fuzzy: always stop the scrub fsstress loop on error Darrick J. Wong
2025-01-16 23:29   ` [PATCH 18/23] fuzzy: port fsx and fsstress loop to use --duration Darrick J. Wong
2025-01-16 23:30   ` [PATCH 19/23] common/rc: don't copy fsstress to $TEST_DIR Darrick J. Wong
2025-01-21  5:05     ` Dave Chinner
2025-01-22  3:52       ` Darrick J. Wong
2025-01-16 23:30   ` [PATCH 20/23] fix _require_scratch_duperemove ordering Darrick J. Wong
2025-01-16 23:30   ` [PATCH 21/23] fsstress: fix a memory leak Darrick J. Wong
2025-01-16 23:30   ` [PATCH 22/23] fsx: fix leaked log file pointer Darrick J. Wong
2025-01-16 23:31   ` [PATCH 23/23] build: initialize stack variables to zero by default Darrick J. Wong
2025-01-16 23:23 ` [PATCHSET 2/7] fstests: fix logwrites zeroing Darrick J. Wong
2025-01-16 23:31   ` [PATCH 1/3] logwrites: warn if we don't think read after discard returns zeroes Darrick J. Wong
2025-01-16 23:31   ` [PATCH 2/3] logwrites: use BLKZEROOUT if it's available Darrick J. Wong
2025-01-16 23:31   ` [PATCH 3/3] logwrites: only use BLKDISCARD if we know discard zeroes data Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 3/7] fstests: enable metadir Darrick J. Wong
2025-01-16 23:32   ` [PATCH 01/11] various: fix finding metadata inode numbers when metadir is enabled Darrick J. Wong
2025-01-16 23:32   ` [PATCH 02/11] xfs/{030,033,178}: forcibly disable metadata directory trees Darrick J. Wong
2025-01-16 23:32   ` [PATCH 03/11] common/repair: patch up repair sb inode value complaints Darrick J. Wong
2025-01-16 23:32   ` [PATCH 04/11] xfs/206: update for metadata directory support Darrick J. Wong
2025-01-16 23:33   ` [PATCH 05/11] xfs/{050,144,153,299,330}: update quota reports to handle metadir trees Darrick J. Wong
2025-01-16 23:33   ` [PATCH 06/11] xfs/509: adjust inumbers accounting for metadata directories Darrick J. Wong
2025-01-16 23:33   ` [PATCH 07/11] xfs: create fuzz tests " Darrick J. Wong
2025-01-16 23:34   ` [PATCH 08/11] xfs/163: bigger fs for metadir Darrick J. Wong
2025-01-16 23:34   ` [PATCH 09/11] xfs/122: disable this test for any codebase that knows about metadir Darrick J. Wong
2025-01-16 23:34   ` [PATCH 10/11] scrub: race metapath online fsck with fsstress Darrick J. Wong
2025-01-16 23:34   ` [PATCH 11/11] xfs: test metapath repairs Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 4/7] fstests: make protofiles less janky Darrick J. Wong
2025-01-16 23:35   ` [PATCH 1/1] fstests: test mkfs.xfs protofiles with xattr support Darrick J. Wong
2025-03-02 13:15     ` Zorro Lang
2025-03-04 17:42       ` Darrick J. Wong
2025-03-04 18:00         ` Zorro Lang
2025-03-04 18:21           ` Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 5/7] fstests: shard the realtime section Darrick J. Wong
2025-01-16 23:35   ` [PATCH 01/14] common/populate: refactor caching of metadumps to a helper Darrick J. Wong
2025-01-16 23:35   ` [PATCH 02/14] common/{fuzzy,populate}: use _scratch_xfs_mdrestore Darrick J. Wong
2025-01-16 23:35   ` [PATCH 03/14] fuzzy: stress data and rt sections of xfs filesystems equally Darrick J. Wong
2025-01-16 23:36   ` [PATCH 04/14] common/ext4: reformat external logs during mdrestore operations Darrick J. Wong
2025-01-16 23:36   ` [PATCH 05/14] common/populate: use metadump v2 format by default for fs metadata snapshots Darrick J. Wong
2025-01-16 23:36   ` [PATCH 06/14] punch-alternating: detect xfs realtime files with large allocation units Darrick J. Wong
2025-01-16 23:36   ` [PATCH 07/14] xfs/206: update mkfs filtering for rt groups feature Darrick J. Wong
2025-01-16 23:37   ` [PATCH 08/14] common: pass the realtime device to xfs_db when possible Darrick J. Wong
2025-01-16 23:37   ` [PATCH 09/14] xfs/185: update for rtgroups Darrick J. Wong
2025-01-16 23:37   ` [PATCH 10/14] xfs/449: update test to know about xfs_db -R Darrick J. Wong
2025-01-16 23:37   ` [PATCH 11/14] xfs/271,xfs/556: fix tests to deal with rtgroups output in bmap/fsmap commands Darrick J. Wong
2025-01-16 23:38   ` [PATCH 12/14] common/xfs: capture realtime devices during metadump/mdrestore Darrick J. Wong
2025-01-16 23:38   ` [PATCH 13/14] common/fuzzy: adapt the scrub stress tests to support rtgroups Darrick J. Wong
2025-01-16 23:38   ` [PATCH 14/14] xfs: fix fuzz tests of rtgroups bitmap and summary files Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 6/7] fstests: store quota files in the metadir Darrick J. Wong
2025-01-16 23:38   ` [PATCH 1/4] xfs: update tests for " Darrick J. Wong
2025-01-16 23:39   ` [PATCH 2/4] xfs: test persistent quota flags Darrick J. Wong
2025-01-16 23:39   ` [PATCH 3/4] xfs: fix quota detection in fuzz tests Darrick J. Wong
2025-01-16 23:39   ` [PATCH 4/4] xfs: fix tests for persistent qflags Darrick J. Wong
2025-01-16 23:25 ` [PATCHSET v6.2 7/7] fstests: enable quota for realtime volumes Darrick J. Wong
2025-01-16 23:40   ` [PATCH 1/3] common: enable testing of realtime quota when supported Darrick J. Wong
2025-01-16 23:40   ` [PATCH 2/3] xfs: fix quota tests to adapt to realtime quota Darrick J. Wong
2025-01-16 23:40   ` [PATCH 3/3] xfs: regression testing of quota on the realtime device Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250128072352.GP3557553@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=fstests@vger.kernel.org \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    --cc=zlang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox