All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: zlang@redhat.com, hch@lst.de, fstests@vger.kernel.org,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH 08/23] common: fix pkill by running test program in a separate session
Date: Tue, 21 Jan 2025 14:28:26 +1100	[thread overview]
Message-ID: <Z48UWiVlRmaBe3cY@dread.disaster.area> (raw)
In-Reply-To: <173706974197.1927324.9208284704325894988.stgit@frogsfrogsfrogs>

On Thu, Jan 16, 2025 at 03:27:15PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Run each test program with a separate session id so that we can tell
> pkill to kill all processes of a given name, but only within our own
> session id.  This /should/ suffice to run multiple fstests on the same
> machine without one instance shooting down processes of another
> instance.
> 
> This fixes a general problem with using "pkill --parent" -- if the
> process being targeted is not a direct descendant of the bash script
> calling pkill, then pkill will not do anything.  The scrub stress tests
> make use of multiple background subshells, which is how a ^C in the
> parent process fails to result in fsx/fsstress being killed.

Yeah, 'pkill --parent' was the best I had managed to come up that
mostly worked, not because it perfect. That was something I wanted
feedback on before merge because it still had problems...

> This is necessary to fix SOAK_DURATION runtime constraints for all the
> scrub stress tests.  However, there is a cost -- the test program no
> longer runs with the same controlling tty as ./check, which means that
> ^Z doesn't work and SIGINT/SIGQUIT are set to SIG_IGN.  IOWs, if a test
> wants to kill its subprocesses, it must use another signal such as
> SIGPIPE.  Fortunately, bash doesn't whine about children dying due to
> fatal signals if the children run in a different session id.
> 
> I also explored alternate designs, and this was the least unsatisfying:
> 
> a) Setting the process group didn't work because background subshells
> are assigned a new group id.

Yup, tried that.

> b) Constraining the pkill/pgrep search to a cgroup could work, but we'd
> have to set up a cgroup in which to run the fstest.

thought about that, too, and considered if systemd scopes could do
that, but ...

> 
> c) Putting test subprocesses in a systemd sub-scope and telling systemd
> to kill the sub-scope could work because ./check can already use it to
> ensure that all child processes of a test are killed.  However, this is
> an *optional* feature, which means that we'd have to require systemd.

... requiring systemd was somewhat of a show-stopper for testing
older distros.

> d) Constraining the pkill/pgrep search to a particular mount namespace
> could work, but we already have tests that set up their own mount
> namespaces, which means the constrained pgrep will not find all child
> processes of a test.

*nod*.

> e) Constraining to any other type of namespace (uts, pid, etc) might not
> work because those namespaces might not be enabled.

*nod*

I also tried modifying fsstress to catch and propagate signals and a
couple of other ways of managing processes in the stress code, but
all ended up having significantly worse downsides than using 'pkill
--parent'.

I was aware of session IDs, but I've never used them before and
hadn't gone down the rabbit hole of working out how to use them when
I posted the initial RFC patchset.

> f) Revert check-parallel and go back to one fstests instance per system.
> Zorro already chose not to revert.
> 
> So.  Change _run_seq to create a the ./$seq process with a new session
> id, update _su calls to use the same session as the parent test, update
> all the pkill sites to use a wrapper so that we only target processes
> created by *this* instance of fstests, and update SIGINT to SIGPIPE.

Yeah, that's definitely cleaner.

.....

> @@ -1173,13 +1173,11 @@ _scratch_xfs_stress_scrub_cleanup() {
>  	rm -f "$runningfile"
>  	echo "Cleaning up scrub stress run at $(date)" >> $seqres.full
>  
> -	# Send SIGINT so that bash won't print a 'Terminated' message that
> -	# distorts the golden output.
>  	echo "Killing stressor processes at $(date)" >> $seqres.full
> -	_kill_fsstress
> -	pkill -INT --parent $$ xfs_io >> $seqres.full 2>&1
> -	pkill -INT --parent $$ fsx >> $seqres.full 2>&1
> -	pkill -INT --parent $$ xfs_scrub >> $seqres.full 2>&1
> +	_pkill --echo -PIPE fsstress >> $seqres.full 2>&1
> +	_pkill --echo -PIPE xfs_io >> $seqres.full 2>&1
> +	_pkill --echo -PIPE fsx >> $seqres.full 2>&1
> +	_pkill --echo -PIPE xfs_scrub >> $seqres.full 2>&1

Removing _kill_fsstress is wrong - the fsstress process has already
been renamed, so open coding 'pkill fsstress' may not match. The
_kill_fsstress() code gets changed to do the right thing here:

> @@ -69,7 +75,7 @@ _kill_fsstress()
>  	if [ -n "$_FSSTRESS_PID" ]; then
>  		# use SIGPIPE to avoid "Killed" messages from bash
>  		echo "killing $_FSSTRESS_BIN" >> $seqres.full
> -		pkill -PIPE $_FSSTRESS_BIN >> $seqres.full 2>&1
> +		_pkill -PIPE $_FSSTRESS_BIN >> $seqres.full 2>&1
>  		_wait_for_fsstress
>  		return $?
>  	fi

Then in the next patch when the _FSSTRESS_BIN workaround goes away,
_kill_fsstress() is exactly what you open coded in
_scratch_xfs_stress_scrub_cleanup()....

i.e. common/fuzzy really shouldn't open code the fsstress process
management - it should use the wrapper like everything else does.

Everything else in the patch looks good.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2025-01-21  3:28 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-16 23:21 [PATCHBOMB] fstests: fix check-parallel and get all the xfs 6.13 changes merged Darrick J. Wong
2025-01-16 23:23 ` [PATCHSET 1/7] fstests: random fixes for v2025.01.12 Darrick J. Wong
2025-01-16 23:25   ` [PATCH 01/23] generic/476: fix fsstress process management Darrick J. Wong
2025-01-21  3:03     ` Dave Chinner
2025-01-16 23:25   ` [PATCH 02/23] metadump: make non-local function variables more obvious Darrick J. Wong
2025-01-21  3:06     ` Dave Chinner
2025-01-16 23:25   ` [PATCH 03/23] metadump: fix cleanup for v1 metadump testing Darrick J. Wong
2025-01-21  3:07     ` Dave Chinner
2025-01-16 23:26   ` [PATCH 04/23] generic/482: _run_fsstress needs the test filesystem Darrick J. Wong
2025-01-21  3:12     ` Dave Chinner
2025-01-22  3:27       ` Darrick J. Wong
2025-01-16 23:26   ` [PATCH 05/23] generic/019: don't fail if fio crashes while shutting down Darrick J. Wong
2025-01-21  3:12     ` Dave Chinner
2025-01-16 23:26   ` [PATCH 06/23] fuzzy: do not set _FSSTRESS_PID when exercising fsx Darrick J. Wong
2025-01-21  3:13     ` Dave Chinner
2025-01-16 23:27   ` [PATCH 07/23] common/rc: create a wrapper for the su command Darrick J. Wong
2025-01-16 23:27   ` [PATCH 08/23] common: fix pkill by running test program in a separate session Darrick J. Wong
2025-01-21  3:28     ` Dave Chinner [this message]
2025-01-22  4:24       ` Darrick J. Wong
2025-01-22  6:08         ` Dave Chinner
2025-01-22  7:05           ` Darrick J. Wong
2025-01-22  9:42             ` Dave Chinner
2025-01-22 21:46               ` Darrick J. Wong
2025-01-23  1:16                 ` Dave Chinner
2025-01-28  4:34                   ` Dave Chinner
2025-01-28  7:23                     ` Darrick J. Wong
2025-01-28 20:39                       ` Dave Chinner
2025-01-29  3:13                         ` Darrick J. Wong
2025-01-29  6:06                           ` Dave Chinner
2025-01-29  7:33                             ` Darrick J. Wong
2025-01-16 23:27   ` [PATCH 09/23] unmount: resume logging of stdout and stderr for filtering Darrick J. Wong
2025-01-21  3:52     ` Dave Chinner
2025-01-22  3:29       ` Darrick J. Wong
2025-01-16 23:27   ` [PATCH 10/23] mkfs: don't hardcode log size Darrick J. Wong
2025-01-21  3:58     ` Dave Chinner
2025-01-21 12:44       ` Theodore Ts'o
2025-01-21 22:05         ` Dave Chinner
2025-01-22  3:40           ` Darrick J. Wong
2025-01-22  3:36         ` Darrick J. Wong
2025-01-22  3:30       ` Darrick J. Wong
2025-01-16 23:28   ` [PATCH 11/23] common/xfs: find loop devices for non-blockdevs passed to _prepare_for_eio_shutdown Darrick J. Wong
2025-01-21  4:37     ` Dave Chinner
2025-01-22  4:05       ` Darrick J. Wong
2025-01-22  5:21         ` Dave Chinner
2025-01-16 23:28   ` [PATCH 12/23] preamble: fix missing _kill_fsstress Darrick J. Wong
2025-01-21  4:37     ` Dave Chinner
2025-01-16 23:28   ` [PATCH 13/23] generic/650: revert SOAK DURATION changes Darrick J. Wong
2025-01-21  4:57     ` Dave Chinner
2025-01-21 13:00       ` Theodore Ts'o
2025-01-21 22:15         ` Dave Chinner
2025-01-22  3:51           ` Darrick J. Wong
2025-01-22  4:08           ` Theodore Ts'o
2025-01-22  6:01             ` Dave Chinner
2025-01-22  7:02               ` Darrick J. Wong
2025-01-22  3:49       ` Darrick J. Wong
2025-01-22  4:12         ` Dave Chinner
2025-01-22  4:37           ` Darrick J. Wong
2025-01-16 23:28   ` [PATCH 14/23] generic/032: fix pinned mount failure Darrick J. Wong
2025-01-21  5:03     ` Dave Chinner
2025-01-22  4:08       ` Darrick J. Wong
2025-01-22  4:19         ` Dave Chinner
2025-01-16 23:29   ` [PATCH 15/23] fuzzy: stop __stress_scrub_fsx_loop if fsx fails Darrick J. Wong
2025-01-16 23:29   ` [PATCH 16/23] fuzzy: don't use readarray for xfsfind output Darrick J. Wong
2025-01-16 23:29   ` [PATCH 17/23] fuzzy: always stop the scrub fsstress loop on error Darrick J. Wong
2025-01-16 23:29   ` [PATCH 18/23] fuzzy: port fsx and fsstress loop to use --duration Darrick J. Wong
2025-01-16 23:30   ` [PATCH 19/23] common/rc: don't copy fsstress to $TEST_DIR Darrick J. Wong
2025-01-21  5:05     ` Dave Chinner
2025-01-22  3:52       ` Darrick J. Wong
2025-01-16 23:30   ` [PATCH 20/23] fix _require_scratch_duperemove ordering Darrick J. Wong
2025-01-16 23:30   ` [PATCH 21/23] fsstress: fix a memory leak Darrick J. Wong
2025-01-16 23:30   ` [PATCH 22/23] fsx: fix leaked log file pointer Darrick J. Wong
2025-01-16 23:31   ` [PATCH 23/23] build: initialize stack variables to zero by default Darrick J. Wong
2025-01-16 23:23 ` [PATCHSET 2/7] fstests: fix logwrites zeroing Darrick J. Wong
2025-01-16 23:31   ` [PATCH 1/3] logwrites: warn if we don't think read after discard returns zeroes Darrick J. Wong
2025-01-16 23:31   ` [PATCH 2/3] logwrites: use BLKZEROOUT if it's available Darrick J. Wong
2025-01-16 23:31   ` [PATCH 3/3] logwrites: only use BLKDISCARD if we know discard zeroes data Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 3/7] fstests: enable metadir Darrick J. Wong
2025-01-16 23:32   ` [PATCH 01/11] various: fix finding metadata inode numbers when metadir is enabled Darrick J. Wong
2025-01-16 23:32   ` [PATCH 02/11] xfs/{030,033,178}: forcibly disable metadata directory trees Darrick J. Wong
2025-01-16 23:32   ` [PATCH 03/11] common/repair: patch up repair sb inode value complaints Darrick J. Wong
2025-01-16 23:32   ` [PATCH 04/11] xfs/206: update for metadata directory support Darrick J. Wong
2025-01-16 23:33   ` [PATCH 05/11] xfs/{050,144,153,299,330}: update quota reports to handle metadir trees Darrick J. Wong
2025-01-16 23:33   ` [PATCH 06/11] xfs/509: adjust inumbers accounting for metadata directories Darrick J. Wong
2025-01-16 23:33   ` [PATCH 07/11] xfs: create fuzz tests " Darrick J. Wong
2025-01-16 23:34   ` [PATCH 08/11] xfs/163: bigger fs for metadir Darrick J. Wong
2025-01-16 23:34   ` [PATCH 09/11] xfs/122: disable this test for any codebase that knows about metadir Darrick J. Wong
2025-01-16 23:34   ` [PATCH 10/11] scrub: race metapath online fsck with fsstress Darrick J. Wong
2025-01-16 23:34   ` [PATCH 11/11] xfs: test metapath repairs Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 4/7] fstests: make protofiles less janky Darrick J. Wong
2025-01-16 23:35   ` [PATCH 1/1] fstests: test mkfs.xfs protofiles with xattr support Darrick J. Wong
2025-03-02 13:15     ` Zorro Lang
2025-03-04 17:42       ` Darrick J. Wong
2025-03-04 18:00         ` Zorro Lang
2025-03-04 18:21           ` Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 5/7] fstests: shard the realtime section Darrick J. Wong
2025-01-16 23:35   ` [PATCH 01/14] common/populate: refactor caching of metadumps to a helper Darrick J. Wong
2025-01-16 23:35   ` [PATCH 02/14] common/{fuzzy,populate}: use _scratch_xfs_mdrestore Darrick J. Wong
2025-01-16 23:35   ` [PATCH 03/14] fuzzy: stress data and rt sections of xfs filesystems equally Darrick J. Wong
2025-01-16 23:36   ` [PATCH 04/14] common/ext4: reformat external logs during mdrestore operations Darrick J. Wong
2025-01-16 23:36   ` [PATCH 05/14] common/populate: use metadump v2 format by default for fs metadata snapshots Darrick J. Wong
2025-01-16 23:36   ` [PATCH 06/14] punch-alternating: detect xfs realtime files with large allocation units Darrick J. Wong
2025-01-16 23:36   ` [PATCH 07/14] xfs/206: update mkfs filtering for rt groups feature Darrick J. Wong
2025-01-16 23:37   ` [PATCH 08/14] common: pass the realtime device to xfs_db when possible Darrick J. Wong
2025-01-16 23:37   ` [PATCH 09/14] xfs/185: update for rtgroups Darrick J. Wong
2025-01-16 23:37   ` [PATCH 10/14] xfs/449: update test to know about xfs_db -R Darrick J. Wong
2025-01-16 23:37   ` [PATCH 11/14] xfs/271,xfs/556: fix tests to deal with rtgroups output in bmap/fsmap commands Darrick J. Wong
2025-01-16 23:38   ` [PATCH 12/14] common/xfs: capture realtime devices during metadump/mdrestore Darrick J. Wong
2025-01-16 23:38   ` [PATCH 13/14] common/fuzzy: adapt the scrub stress tests to support rtgroups Darrick J. Wong
2025-01-16 23:38   ` [PATCH 14/14] xfs: fix fuzz tests of rtgroups bitmap and summary files Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 6/7] fstests: store quota files in the metadir Darrick J. Wong
2025-01-16 23:38   ` [PATCH 1/4] xfs: update tests for " Darrick J. Wong
2025-01-16 23:39   ` [PATCH 2/4] xfs: test persistent quota flags Darrick J. Wong
2025-01-16 23:39   ` [PATCH 3/4] xfs: fix quota detection in fuzz tests Darrick J. Wong
2025-01-16 23:39   ` [PATCH 4/4] xfs: fix tests for persistent qflags Darrick J. Wong
2025-01-16 23:25 ` [PATCHSET v6.2 7/7] fstests: enable quota for realtime volumes Darrick J. Wong
2025-01-16 23:40   ` [PATCH 1/3] common: enable testing of realtime quota when supported Darrick J. Wong
2025-01-16 23:40   ` [PATCH 2/3] xfs: fix quota tests to adapt to realtime quota Darrick J. Wong
2025-01-16 23:40   ` [PATCH 3/3] xfs: regression testing of quota on the realtime device Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z48UWiVlRmaBe3cY@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=fstests@vger.kernel.org \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    --cc=zlang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.