public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: zlang@redhat.com
Cc: dchinner@redhat.com, fstests@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: [PATCH v3.1 15/34] check: run tests in a private pid/mount namespace
Date: Fri, 14 Feb 2025 13:13:41 -0800	[thread overview]
Message-ID: <20250214211341.GG21799@frogsfrogsfrogs> (raw)
In-Reply-To: <173933094584.1758477.17381421804809266222.stgit@frogsfrogsfrogs>

From: Darrick J. Wong <djwong@kernel.org>

As mentioned in the previous patch, trying to isolate processes from
separate test instances through the use of distinct Unix process
sessions is annoying due to the many complications with signal handling.

Instead, we could just use nsexec to run the test program with a private
pid namespace so that each test instance can only see its own processes;
and private mount namespace so that tests writing to /tmp cannot clobber
other tests or the stuff running on the main system.  Further, the
process created by the clone(CLONE_NEWPID) call is considered the init
process of that pid namespace, so all processes will be SIGKILL'd when
the init process terminates, so we no longer need systemd scopes for
externally enforced cleanup.

However, it's not guaranteed that a particular kernel has pid and mount
namespaces enabled.  Mount (2.4.19) and pid (2.6.24) namespaces have
been around for a long time, but there's no hard requirement for the
latter to be enabled in the kernel.  Therefore, this bugfix slips
namespace support in alongside the session id thing.

Declaring CONFIG_PID_NS=n a deprecated configuration and removing
support should be a separate conversation, not something that I have to
do in a bug fix to get mainline QA back up.

Note that the new helper cannot unmount the /proc it inherits before
mounting a pidns-specific /proc because generic/504 relies on being able
to read the init_pid_ns (aka systemwide) version of /proc/locks to find
a file lock that was taken and leaked by a process.

Cc: <fstests@vger.kernel.org> # v2024.12.08
Fixes: 8973af00ec212f ("fstests: cleanup fsstress process management")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
v3.1: add makefile bits per maintainer request
---
 check               |   34 +++++++++++++++++++++++-----------
 common/rc           |   12 ++++++++++--
 src/nsexec.c        |   18 +++++++++++++++---
 tests/generic/504   |   15 +++++++++++++--
 tools/Makefile      |    3 ++-
 tools/run_privatens |   28 ++++++++++++++++++++++++++++
 6 files changed, 91 insertions(+), 19 deletions(-)
 create mode 100755 tools/run_privatens

diff --git a/check b/check
index ef8a8c3b31b3e6..22718e7b4d3c98 100755
--- a/check
+++ b/check
@@ -674,6 +674,11 @@ _stash_test_status() {
 	esac
 }
 
+# Can we run in a private pid/mount namespace?
+HAVE_PRIVATENS=
+./tools/run_privatens bash -c "exit 77"
+test $? -eq 77 && HAVE_PRIVATENS=yes
+
 # Can we run systemd scopes?
 HAVE_SYSTEMD_SCOPES=
 systemctl reset-failed "fstests-check" &>/dev/null
@@ -691,22 +696,29 @@ _adjust_oom_score -500
 # the system runs out of memory it'll be the test that gets killed and not the
 # test framework.  The test is run in a separate process without any of our
 # functions, so we open-code adjusting the OOM score.
-#
-# If systemd is available, run the entire test script in a scope so that we can
-# kill all subprocesses of the test if it fails to clean up after itself.  This
-# is essential for ensuring that the post-test unmount succeeds.  Note that
-# systemd doesn't automatically remove transient scopes that fail to terminate
-# when systemd tells them to terminate (e.g. programs stuck in D state when
-# systemd sends SIGKILL), so we use reset-failed to tear down the scope.
-#
-# Use setsid to run the test program with a separate session id so that we
-# can pkill only the processes started by this test.
 _run_seq() {
 	local res
 	unset CHILDPID
 	unset FSTESTS_ISOL	# set by tools/run_seq_*
 
-	if [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then
+	if [ -n "${HAVE_PRIVATENS}" ]; then
+		# If pid and mount namespaces are available, run the whole test
+		# inside them so that the test cannot access any process or
+		# /tmp contents that it does not itself create.  The ./$seq
+		# process is considered the "init" process of the pid
+		# namespace, so all subprocesses will be sent SIGKILL when it
+		# terminates.
+		./tools/run_privatens "./$seq"
+		res=$?
+	elif [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then
+		# If systemd is available, run the entire test script in a
+		# scope so that we can kill all subprocesses of the test if it
+		# fails to clean up after itself.  This is essential for
+		# ensuring that the post-test unmount succeeds.  Note that
+		# systemd doesn't automatically remove transient scopes that
+		# fail to terminate when systemd tells them to terminate (e.g.
+		# programs stuck in D state when systemd sends SIGKILL), so we
+		# use reset-failed to tear down the scope.
 		local unit="$(systemd-escape "fs$seq").scope"
 		systemctl reset-failed "${unit}" &> /dev/null
 		systemd-run --quiet --unit "${unit}" --scope \
diff --git a/common/rc b/common/rc
index f2fbe15104d7ba..91df9242ecef5f 100644
--- a/common/rc
+++ b/common/rc
@@ -33,7 +33,11 @@ _test_sync()
 # Kill only the processes started by this test.
 _pkill()
 {
-	pkill --session 0 "$@"
+	if [ "$FSTESTS_ISOL" = "setsid" ]; then
+		pkill --session 0 "$@"
+	else
+		pkill "$@"
+	fi
 }
 
 # Common execution handling for fsstress invocation.
@@ -2736,7 +2740,11 @@ _require_user_exists()
 # not, passing $SHELL in this manner works both for "su" and "su -c cmd".
 _su()
 {
-	su --session-command $SHELL "$@"
+	if [ "$FSTESTS_ISOL" = "setsid" ]; then
+		su --session-command $SHELL "$@"
+	else
+		su "$@"
+	fi
 }
 
 # check if a user exists and is able to execute commands.
diff --git a/src/nsexec.c b/src/nsexec.c
index 750d52df129716..5c0bc922153514 100644
--- a/src/nsexec.c
+++ b/src/nsexec.c
@@ -54,6 +54,7 @@ usage(char *pname)
     fpe("            If -M or -G is specified, -U is required\n");
     fpe("-s          Set uid/gid to 0 in the new user namespace\n");
     fpe("-v          Display verbose messages\n");
+    fpe("-z          Return child's return value\n");
     fpe("\n");
     fpe("Map strings for -M and -G consist of records of the form:\n");
     fpe("\n");
@@ -144,6 +145,8 @@ int
 main(int argc, char *argv[])
 {
     int flags, opt;
+    int return_child_status = 0;
+    int child_status = 0;
     pid_t child_pid;
     struct child_args args;
     char *uid_map, *gid_map;
@@ -161,7 +164,7 @@ main(int argc, char *argv[])
     setid = 0;
     gid_map = NULL;
     uid_map = NULL;
-    while ((opt = getopt(argc, argv, "+imnpuUM:G:vs")) != -1) {
+    while ((opt = getopt(argc, argv, "+imnpuUM:G:vsz")) != -1) {
         switch (opt) {
         case 'i': flags |= CLONE_NEWIPC;        break;
         case 'm': flags |= CLONE_NEWNS;         break;
@@ -173,6 +176,7 @@ main(int argc, char *argv[])
         case 'G': gid_map = optarg;             break;
         case 'U': flags |= CLONE_NEWUSER;       break;
         case 's': setid = 1;                    break;
+        case 'z': return_child_status = 1;      break;
         default:  usage(argv[0]);
         }
     }
@@ -229,11 +233,19 @@ main(int argc, char *argv[])
 
     close(args.pipe_fd[1]);
 
-    if (waitpid(child_pid, NULL, 0) == -1)      /* Wait for child */
+    if (waitpid(child_pid, &child_status, 0) == -1)      /* Wait for child */
         errExit("waitpid");
 
     if (verbose)
-        printf("%s: terminating\n", argv[0]);
+        printf("%s: terminating %d\n", argv[0], WEXITSTATUS(child_status));
+
+    if (return_child_status) {
+        if (WIFEXITED(child_status))
+            exit(WEXITSTATUS(child_status));
+        if (WIFSIGNALED(child_status))
+            exit(WTERMSIG(child_status) + 128); /* like sh */
+	exit(EXIT_FAILURE);
+    }
 
     exit(EXIT_SUCCESS);
 }
diff --git a/tests/generic/504 b/tests/generic/504
index 271c040e7b842a..611e6c283e215a 100755
--- a/tests/generic/504
+++ b/tests/generic/504
@@ -18,7 +18,7 @@ _cleanup()
 {
 	exec {test_fd}<&-
 	cd /
-	rm -f $tmp.*
+	rm -r -f $tmp.*
 }
 
 # Import common functions.
@@ -35,13 +35,24 @@ echo inode $tf_inode >> $seqres.full
 
 # Create new fd by exec
 exec {test_fd}> $testfile
-# flock locks the fd then exits, we should see the lock info even the owner is dead
+# flock locks the fd then exits, we should see the lock info even though the
+# owner is dead.  If we're using pid namespace isolation we have to move /proc
+# so that we can access the /proc/locks from the init_pid_ns.
+if [ "$FSTESTS_ISOL" = "privatens" ]; then
+	move_proc="$tmp.procdir"
+	mkdir -p "$move_proc"
+	mount --move /proc "$move_proc"
+fi
 flock -x $test_fd
 cat /proc/locks >> $seqres.full
 
 # Checking
 grep -q ":$tf_inode " /proc/locks || echo "lock info not found"
 
+if [ -n "$move_proc" ]; then
+	mount --move "$move_proc" /proc
+fi
+
 # success, all done
 status=0
 echo "Silence is golden"
diff --git a/tools/Makefile b/tools/Makefile
index 4e42db4ad8b12d..2d502884fbbc93 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -7,7 +7,8 @@ include $(TOPDIR)/include/builddefs
 
 TOOLS_DIR = tools
 helpers=\
-	run_setsid
+	run_setsid \
+	run_privatens
 
 include $(BUILDRULES)
 
diff --git a/tools/run_privatens b/tools/run_privatens
new file mode 100755
index 00000000000000..df94974ab30c3c
--- /dev/null
+++ b/tools/run_privatens
@@ -0,0 +1,28 @@
+#!/bin/bash
+
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2025 Oracle.  All Rights Reserved.
+#
+# Try starting things in a private pid/mount namespace with a private /tmp
+# and /proc so that child process trees cannot interfere with each other.
+
+if [ -n "${FSTESTS_ISOL}" ]; then
+	for path in /proc /tmp; do
+		mount --make-private "$path"
+	done
+	mount -t proc proc /proc
+	mount -t tmpfs tmpfs /tmp
+
+	# Allow the test to become a target of the oom killer
+	oom_knob="/proc/self/oom_score_adj"
+	test -w "${oom_knob}" && echo 250 > "${oom_knob}"
+
+	exec "$@"
+fi
+
+if [ -z "$1" ] || [ "$1" = "--help" ]; then
+	echo "Usage: $0 command [args...]"
+	exit 1
+fi
+
+FSTESTS_ISOL=privatens exec "$(dirname "$0")/../src/nsexec" -z -m -p "$0" "$@"

  parent reply	other threads:[~2025-02-14 21:13 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-12  3:30 [PATCHSET v3] fstests: random fixes for v2025.02.02 Darrick J. Wong
2025-02-12  3:30 ` [PATCH 01/34] generic/476: fix fsstress process management Darrick J. Wong
2025-02-12  3:31 ` [PATCH 02/34] metadump: make non-local function variables more obvious Darrick J. Wong
2025-02-12  3:31 ` [PATCH 03/34] metadump: fix cleanup for v1 metadump testing Darrick J. Wong
2025-02-12  3:31 ` [PATCH 04/34] generic/019: don't fail if fio crashes while shutting down Darrick J. Wong
2025-02-12  3:31 ` [PATCH 05/34] fuzzy: do not set _FSSTRESS_PID when exercising fsx Darrick J. Wong
2025-02-12  3:32 ` [PATCH 06/34] common/rc: revert recursive unmount in _clear_mount_stack Darrick J. Wong
2025-02-12  3:32 ` [PATCH 07/34] common/dump: don't replace pids arbitrarily Darrick J. Wong
2025-02-12  3:32 ` [PATCH 08/34] common/populate: correct the parent pointer name creation formulae Darrick J. Wong
2025-02-12  3:33 ` [PATCH 09/34] generic/759,760: fix MADV_COLLAPSE detection and inclusion Darrick J. Wong
2025-02-12 18:31   ` Joanne Koong
2025-02-12  3:33 ` [PATCH 10/34] generic/759,760: skip test if we can't set up a hugepage for IO Darrick J. Wong
2025-02-12 18:39   ` Joanne Koong
2025-02-12  3:33 ` [PATCH 11/34] common/rc: create a wrapper for the su command Darrick J. Wong
2025-02-12  3:33 ` [PATCH 12/34] fuzzy: kill subprocesses with SIGPIPE, not SIGINT Darrick J. Wong
2025-02-12  4:45   ` Dave Chinner
2025-02-12  6:05     ` Darrick J. Wong
2025-02-12  6:10   ` Darrick J. Wong
2025-02-12  3:34 ` [PATCH 13/34] common/rc: hoist pkill to a helper function Darrick J. Wong
2025-02-12  3:34 ` [PATCH 14/34] common: fix pkill by running test program in a separate session Darrick J. Wong
2025-02-14 17:34   ` Zorro Lang
2025-02-14 17:56     ` Darrick J. Wong
2025-02-14 20:50       ` Theodore Ts'o
2025-02-14 21:11         ` Darrick J. Wong
2025-02-15 13:34         ` Zorro Lang
2025-02-14 21:12   ` [PATCH 13.99/34] tools: add a Makefile " Darrick J. Wong
2025-02-15 13:16     ` Zorro Lang
2025-02-14 21:13   ` [PATCH v3.1 14/34] common: fix pkill by running test program in a " Darrick J. Wong
2025-02-15 13:22     ` Zorro Lang
2025-02-15 16:54       ` Darrick J. Wong
2025-02-12  3:34 ` [PATCH 15/34] check: run tests in a private pid/mount namespace Darrick J. Wong
2025-02-14 17:36   ` Zorro Lang
2025-02-14 21:13   ` Darrick J. Wong [this message]
2025-02-25 11:27     ` [PATCH v3.1 " Shinichiro Kawasaki
2025-02-25 15:49       ` Darrick J. Wong
2025-02-25 21:29         ` Dave Chinner
2025-02-26  2:13           ` Shinichiro Kawasaki
2025-02-26  2:18             ` Darrick J. Wong
2025-02-12  3:34 ` [PATCH 16/34] check: deprecate using process sessions to isolate test instances Darrick J. Wong
2025-02-12  3:35 ` [PATCH 17/34] common/rc: don't copy fsstress to $TEST_DIR Darrick J. Wong
2025-02-12  3:35 ` [PATCH 18/34] unmount: resume logging of stdout and stderr for filtering Darrick J. Wong
2025-02-12  3:35 ` [PATCH 19/34] mkfs: don't hardcode log size Darrick J. Wong
2025-02-12  3:35 ` [PATCH 20/34] common/rc: return mount_ret in _try_scratch_mount Darrick J. Wong
2025-02-12  3:36 ` [PATCH 21/34] preamble: fix missing _kill_fsstress Darrick J. Wong
2025-02-12  3:36 ` [PATCH 22/34] generic/650: revert SOAK DURATION changes Darrick J. Wong
2025-02-12  3:36 ` [PATCH 23/34] generic/032: fix pinned mount failure Darrick J. Wong
2025-02-12  3:36 ` [PATCH 24/34] fuzzy: stop __stress_scrub_fsx_loop if fsx fails Darrick J. Wong
2025-02-12  3:37 ` [PATCH 25/34] fuzzy: don't use readarray for xfsfind output Darrick J. Wong
2025-02-12  3:37 ` [PATCH 26/34] fuzzy: always stop the scrub fsstress loop on error Darrick J. Wong
2025-02-12  3:37 ` [PATCH 27/34] fuzzy: port fsx and fsstress loop to use --duration Darrick J. Wong
2025-02-12  4:47   ` Dave Chinner
2025-02-12  3:37 ` [PATCH 28/34] fix _require_scratch_duperemove ordering Darrick J. Wong
2025-02-12  3:38 ` [PATCH 29/34] fsstress: fix a memory leak Darrick J. Wong
2025-02-12  3:38 ` [PATCH 30/34] fsx: fix leaked log file pointer Darrick J. Wong
2025-02-12  3:38 ` [PATCH 31/34] misc: don't put nr_cpus into the fsstress -n argument Darrick J. Wong
2025-02-12  3:39 ` [PATCH 32/34] common/config: add $here to FSSTRESS_PROG Darrick J. Wong
2025-02-12  3:39 ` [PATCH 33/34] config: add FSX_PROG variable Darrick J. Wong
2025-02-12  3:39 ` [PATCH 34/34] build: initialize stack variables to zero by default Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250214211341.GG21799@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=dchinner@redhat.com \
    --cc=fstests@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=zlang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox